Introduction
In this article, you will learn how the Windows boot process works.
Troubleshooting a no-boot issue requires knowledge of the boot process. The process is complex and involves hundreds of system files. Each file has a specific function in the boot process, and the failure of any one of them can result in a no-boot situation.
Phase 1 – x86 and x64 Initial (Preboot)
The POST (Power-On Self-Test)
- The power supply is switched on.
- The Microprocessor timer chip begins generating a reset to the CPU.
- The power supply performs a self-test.
- After the successful execution of the POST, the system BIOS then checks the first hard disk that is powered up.
- The motherboard ROM BIOS attempts to access the first boot device specified in the BIOS. (This is typically user-configurable and can be edited using the BIOS configuration utility.)
- The ROM BIOS reads cylinder 0, head 0, and sector 1 of the first boot device.
- The ROM BIOS loads that sector into memory and tests it.
- For a floppy drive, the first sector is a FAT boot sector.
- For a hard drive, the first sector is the Master Boot Record (MBR).
- When booting from the hard drive, the ROM BIOS looks at the last two bytes of sector 1 and verifies that they are 55h AAh, the signature known as the End of Sector Marker.
- If the signature bytes do not match the End of Sector Marker, the system assumes that the MBR is corrupted or that the hard drive has never been partitioned. This invokes BIOS Interrupt 18h, which displays a BIOS-vendor-specific error such as “Operating System not found”.
- If the BIOS finds that the last two bytes match the End of Sector Marker, the MBR program executes.
- The active partition’s boot sector loads and looks for NTLDR (legacy) or BootMgr (Vista and later). The contents of the boot sector are entirely dependent on the format of the partition.
- The role of the boot-sector code is to give NT just enough information about the structure and format of a logical drive to enable it to read the NTLDR file from the root directory.
- NTLDR is loaded into memory and executed and the Boot Loader phase begins.
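The signature check described above can be sketched in a few lines of Python. This is only an illustration of the branch the BIOS takes, assuming a 512-byte sector has already been read into memory; the function names and the returned strings are made up for this example.

```python
SECTOR_SIZE = 512
SIGNATURE_OFFSET = 510                 # last two bytes of the 512-byte sector
END_OF_SECTOR_MARKER = b"\x55\xaa"     # bytes 55h AAh, in on-disk order


def has_boot_signature(sector: bytes) -> bool:
    """Return True if the sector ends with the End of Sector Marker."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(sector)}")
    return sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] == END_OF_SECTOR_MARKER


def bios_boot_decision(sector: bytes) -> str:
    """Mimic the branch described above (labels are illustrative only)."""
    if has_boot_signature(sector):
        return "execute MBR"
    # Missing signature: corrupt MBR or a disk that was never partitioned.
    return "INT 18h"
```

A sector of all zeros takes the "INT 18h" branch, while the same sector with 55h AAh written into its last two bytes takes the "execute MBR" branch.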
Phase 2 – Boot Loader
NTLDR
- NTLDR begins its existence while the system is executing in an x86-operating mode called Real-Mode.
- In Real-Mode, no virtual-to-physical translation of memory addresses occurs, which means that programs interpret the addresses they use as physical addresses, and only the first 1 MB of the computer’s physical memory is accessible.
- NTLDR is a very small file, so it easily fits into this memory space.
- The first action NTLDR takes is to switch the system from Real-Mode to Protected-Mode. Still, no virtual-to-physical translation occurs at this point in the boot process, but a full 32 bits of memory addressing becomes accessible.
- After it is in Protected-Mode, NTLDR can access all of the physical memory. After creating enough page tables to make memory below 1MB accessible with paging turned on, NTLDR enables paging. Protected-Mode with paging enabled is the mode in which NT executes in normal operation.
- NTLDR then performs the following steps:
- NTLDR starts the FAT and NTFS mini file system drivers contained in its own code. Unlike the boot sector’s file system code, NTLDR’s code can read subdirectories.
- NTLDR reads HIBERFIL.SYS to determine whether it contains a valid hibernation image. If it does, the image is loaded into memory and the system resumes from hibernation instead of continuing the normal boot.
- If no valid hibernation image is found, NTLDR parses the BOOT.INI file. If BOOT.INI references more than one operating system, the boot loader screen is displayed.
- NTLDR loads NTOSKRNL.EXE and HAL.DLL into memory.
- NTLDR loads the System registry hive, \Windows\System32\Config\System, so that it can determine which device drivers need to be loaded to accomplish the boot.
- NTLDR begins to build the HKLM\SYSTEM\CURRENTCONTROLSET key.
- NTLDR reads the HKLM\SYSTEM\SELECT registry key.
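To make the BOOT.INI step more concrete, here is a minimal sketch of how the file could be parsed and how the "more than one operating system" test could be expressed. The sample file content, the ARC paths in it, and the function names are assumptions for illustration; this is not NTLDR's actual parser.

```python
def parse_boot_ini(text: str) -> dict:
    """Split boot.ini text into {section name: {key: value}}."""
    sections: dict = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().lower()
            sections[current] = {}
        elif current is not None:
            # Split on the first "=" only; values may contain further "=".
            key, _, value = line.partition("=")
            sections[current][key.strip()] = value.strip()
    return sections


def shows_boot_menu(text: str) -> bool:
    """True when [operating systems] lists more than one entry."""
    entries = parse_boot_ini(text).get("operating systems", {})
    return len(entries) > 1


# Hypothetical two-entry boot.ini used only for this example.
SAMPLE_BOOT_INI = r"""
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect
multi(0)disk(0)rdisk(0)partition(2)\WINNT="Microsoft Windows 2000 Professional" /fastdetect
"""
```

With the sample above, `shows_boot_menu(SAMPLE_BOOT_INI)` is true because two entries are listed; with a single entry it would be false and the default entry would simply be started.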
The Windows Vista Boot Process
Windows Vista boots to the Boot Manager (BootMgr).
Boot Process in Windows Vista
- The system is powered on.
- The MBR on the system drive is read from sector 0, and the active partition is located.
- The boot sector of the active partition is read, and BootMgr is loaded.
- BootMgr reads the BCD file in the \Boot directory, gathers information about operating systems installed on the computer, and displays a boot menu if necessary.
- If a Windows Vista operating system is selected, BootMgr transfers control to WINLOAD.exe; in the case of a resume operation, WINRESUME.exe is called.
- If a down-level operating system is chosen, BootMgr transfers control to NTLDR or the down-level loader.
- WINLOAD.exe initializes memory and loads drivers set to start at boot, and then transfers control to the kernel.
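The hand-off logic in the steps above can be summarized as a small dispatch table. This is a simplified sketch: the `kind` labels ("vista", "resume", "legacy") are hypothetical names chosen for this example and are not real BCD element names.

```python
from dataclasses import dataclass


@dataclass
class BootEntry:
    description: str
    kind: str  # hypothetical label: "vista", "resume", or "legacy"


# Which loader BootMgr hands control to, per the steps described above.
LOADERS = {
    "vista": "winload.exe",      # normal start of Windows Vista or later
    "resume": "winresume.exe",   # resume from hibernation
    "legacy": "ntldr",           # down-level operating system
}


def needs_boot_menu(entries: list) -> bool:
    """A menu is only necessary when there is more than one entry."""
    return len(entries) > 1


def loader_for(entry: BootEntry) -> str:
    """Return the loader that receives control for the chosen entry."""
    return LOADERS[entry.kind]
```

For example, `loader_for(BootEntry("Windows Vista", "vista"))` gives `winload.exe`, while an entry labeled "legacy" is passed to `ntldr`.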
Phase 3 – The Kernel and Executive Subsystems
- On Vista and later, when Winload calls Ntoskrnl, it passes a pointer to the memory tables Winload generated to describe the physical memory in the system, a pointer to the in-memory copy of the HARDWARE and SYSTEM registry hives, a pointer to the list of boot drivers Winload loaded, as well as various other information related to the boot processing performed up to this point.
- Ntoskrnl.exe performs the following steps:
- The Kernel displays either the Windows startup screen with the progress bar or the blue Windows NT startup screen with progress dots depending on OS.
- The Kernel creates the HKLM\HARDWARE key from data passed to it from NTLDR.
- The Kernel scans the registry for device drivers that have a Start value of 0x1.
- The I/O manager initialization now takes place.
- The I/O manager first initializes various internal structures and creates the driver and device object types. It then calls the Plug and Play manager, power manager, and the HAL to begin the various stages of dynamic device enumeration and initialization. Then the Windows Management Instrumentation (WMI) subsystem is initialized, which provides WMI support for device drivers that adhere to the Windows Driver Model (WDM). Next, all the boot-start drivers are called to perform their driver-specific initialization, and the system-start device drivers are loaded and initialized. Finally, the MS-DOS device names are created as symbolic links in the object manager’s namespace.
- The Kernel executes SMSS.EXE, the Session Manager Subsystem. SMSS is responsible for creating the user-mode environment that provides the visible interface to Windows.
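The registry Start values mentioned in this article (0, 1, 2, and 4) determine when, or whether, a driver or service is started. The sketch below models them as an enumeration and orders a set of drivers by the phase in which they would start. The driver names in the usage example are invented for illustration.

```python
from enum import IntEnum


class StartType(IntEnum):
    BOOT = 0       # loaded by the boot loader (NTLDR or Winload)
    SYSTEM = 1     # loaded during kernel and I/O manager initialization
    AUTOMATIC = 2  # started automatically later in the boot
    MANUAL = 3     # started on demand
    DISABLED = 4   # never started


def boot_load_order(drivers: dict) -> list:
    """Return names of drivers started during boot, earliest phase first.

    `drivers` maps a driver name to its Start value. Manual and disabled
    entries are left out because they are not started as part of the boot.
    """
    startable = {
        name: StartType(start)
        for name, start in drivers.items()
        if StartType(start) <= StartType.AUTOMATIC
    }
    return sorted(startable, key=lambda name: (startable[name], name))
```

Calling `boot_load_order({"example_auto": 2, "example_boot": 0, "example_off": 4, "example_sys": 1})` yields the boot-start driver first, then the system-start driver, then the automatic one; the disabled entry is dropped.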
SMSS.EXE
- Creating LPC port objects and threads for client requests.
- Defining symbolic links for MS-DOS-based devices.
- Performing any delayed file-rename operations or pending file deletions.
- Opening known DLLs.
- Creating additional paging files if any are specified.
- Initializing the registry.
- Creating the system environment variables.
- Loading the Kernel-mode portion of the Win32k subsystem.
- Starting subsystem processes.
- Starting the logon process (WINLOGON.EXE).
- After SMSS.EXE has performed its required tasks, it remains in memory waiting for either Winlogon or Csrss to end.
- SMSS.EXE reads the Subsystems key again and looks for the Windows value. This loads the Windows 32-bit subsystem (CSRSS.EXE). CSRSS.EXE is the user-mode portion of the Win32 subsystem (Win32k.sys is the kernel-mode portion, which is loaded as part of the Csrss initialization process). Csrss is an abbreviation for Client/Server Run-time Subsystem; it is an essential subsystem that must be running at all times. CSRSS.EXE is responsible for console windows, for creating and deleting threads, and for some parts of the 16-bit virtual MS-DOS environment.
- The Windows subsystem then starts the Winlogon process. Winlogon is the process responsible for managing user logon and logoff. Winlogon responds when the user presses CTRL+ALT+DEL, at which point it shows the security dialog box.
- SMSS.EXE loads and initializes all drivers with a Start value of 0x2 (Automatic).
- The progress bar is (finally) set to 100%, and the Winlogon phase begins.
Phase 4 – The Winlogon Phase
Interactive logon (as opposed to network logon) occurs through the interaction of the logon process (Winlogon), Lsass, one or more authentication packages, and the SAM or Active Directory.
Troubleshooting Tips:
Power-on self-test (POST)
- A failure at this stage requires the involvement of the hardware vendor.
- A power outage or a faulty hardware component is the usual cause of failure here.
- If there is an error message, we can continue troubleshooting according to that message (and be logical).
- Replacing registry hives when the error message during POST is “no drive found” does not make sense.
- Checking the BIOS settings and the boot order can help.
- Sometimes the RAID controller has been accidentally disabled in the BIOS, which also leads to a no-boot issue at this stage.
- Troubleshooting in this phase primarily revolves around hardware and BIOS settings.
Detection of MBR on disk
- Once the POST is done, an open handle is acquired on the disk on which the OS is installed.
- 0x0000007B (INACCESSIBLE_BOOT_DEVICE) is one of the most common stop codes seen when acquiring this handle fails (for example, because of a file system filter driver).
- The Intel boot code is read from sector 0.
- If the Intel boot code is corrupt, the issue can often be resolved by running the fixboot/fixmbr commands (bootrec /fixboot and bootrec /fixmbr on Vista and later).
- After the Intel boot code, the partition information is read.
- If the partition table is missing, the issue becomes even more complex, because we then have to use DiskProbe on a working machine to repair the partition table.
- In many cases, both the Intel boot code and the partition table are present, but the boot process still fails because the partition is not marked active.
- We can always use diskpart to list all the partitions and check whether the correct one is marked active.
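The "marked active" check can also be done by hand on a copy of sector 0. The sketch below reads the four 16-byte entries of a classic MBR partition table (starting at byte offset 446) and reports which entries carry the active flag 80h. It assumes the sector bytes have already been obtained somehow, for example from a disk image; the function and key names are chosen for this example.

```python
import struct

PARTITION_TABLE_OFFSET = 446   # the table follows the MBR boot code
ENTRY_SIZE = 16
ENTRY_COUNT = 4
ACTIVE_FLAG = 0x80


def read_partition_table(sector: bytes) -> list:
    """Return the in-use partition entries found in an MBR sector."""
    if len(sector) < PARTITION_TABLE_OFFSET + ENTRY_COUNT * ENTRY_SIZE:
        raise ValueError("sector is too short to hold a partition table")
    entries = []
    for index in range(ENTRY_COUNT):
        offset = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        # status byte, 3 CHS bytes (skipped), type byte, 3 CHS bytes
        # (skipped), first LBA, sector count; all little-endian.
        status, ptype, first_lba, count = struct.unpack_from(
            "<B3xB3xII", sector, offset
        )
        if ptype == 0:          # type 00h marks an unused slot
            continue
        entries.append({
            "index": index,
            "active": status == ACTIVE_FLAG,
            "type": ptype,
            "first_lba": first_lba,
            "sectors": count,
        })
    return entries


def active_partitions(sector: bytes) -> list:
    """Return the table indexes (0-3) of entries marked active."""
    return [e["index"] for e in read_partition_table(sector) if e["active"]]
```

A healthy MBR disk normally has exactly one active entry; an empty result from `active_partitions` matches the "partition is not marked active" failure described above.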
Detection of boot.ini/BCD
- Incorrect entry in boot.ini/BCD also leads to no boot issues.
- Troubleshooting boot.ini is simple as it is a simple text file with limited entries in it.
- Troubleshooting BCD requires additional knowledge of BCD commands.
- Sometimes the machine boots into the recovery environment on every start.
- We can edit the BCD to disable the recovery environment.
- Sometimes, bootmgr.efi is missing from the system reserved partition.
- In this case, the BCD must be changed to point to the correct bootmgr.
- If we receive errors for missing or corrupt drivers when the driver is in fact neither missing nor corrupt, we can disable driver signature enforcement.
- The BCD may also need to be changed to point to the correct directory and the correct location of Winload.exe.
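As a rough illustration of the BCD changes listed above, the following sketch only builds bcdedit command lines; it does not run them. The `{default}` identifier and the `C:` partition are assumptions and must be replaced with the values that apply to the affected machine.

```python
def bcdedit_set(identifier: str, element: str, value: str) -> list:
    """Build (but do not run) a 'bcdedit /set' command as an argument list."""
    return ["bcdedit", "/set", identifier, element, value]


def disable_recovery(identifier: str = "{default}") -> list:
    """Command that stops the entry from booting into the recovery environment."""
    return bcdedit_set(identifier, "recoveryenabled", "No")


def point_to_winload(identifier: str = "{default}", partition: str = "C:") -> list:
    """Commands that point the entry at Winload.exe on the given partition."""
    return [
        bcdedit_set(identifier, "device", f"partition={partition}"),
        bcdedit_set(identifier, "osdevice", f"partition={partition}"),
        bcdedit_set(identifier, "path", r"\Windows\system32\winload.exe"),
    ]


def as_command_line(args: list) -> str:
    """Join an argument list for display, e.g. in a troubleshooting note."""
    return " ".join(args)
```

Printing `as_command_line(disable_recovery())` shows the command to type at an elevated prompt. Keeping the commands as data makes it easy to review them before anything is changed in the BCD store.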
For failures in the following phases, the machine's registry hives are of great help, and we need to make changes to registry keys.
Loading and initialization of boot services/drivers (registry start-up value 0)
- After the splash screen has been displayed for some time, the boot services come into the picture.
- Any failure at this stage will result in an error message/stop code for a particular file/driver.
- The file can be replaced with a copy from a working machine.
- If the above step is taken on Windows Server 2008/2008 R2, disabling driver signature enforcement is a must.
- If replacing one file results in an error for another file, and then another, it is time to stop troubleshooting this way.
- There is an alternative way to resolve the issue at that point, which is not documented here.
- An incorrect RAID driver in the boot image can also lead to this problem.
- Running Startup Repair (startrep) and the offline SFC command (sfc with /offbootdir) can help in fixing any corruption.
- Swapping registry hives can help us here.
Loading and initialization of System services/drivers (registry start-up value 1)
- Failure at this stage again will result in an error message/stop code.
- Based on the stop code, troubleshooting can be performed.
- Third-party drivers are at fault most of the time (anti-virus and backup applications).
- We can capture the system registry and then make changes to system service/drivers.
- Running Startup Repair (startrep) and the offline SFC command (sfc with /offbootdir) can help in fixing any corruption.
- Swapping registry hives can help us here.
Loading and initialization of Automatic services/drivers (registry start-up value 2)
- The best thing to do at this stage is to check whether we can ping the machine.
- If we can, we can connect to the services and registry of the problem machine remotely from another machine.
- This makes it easier to identify the cause of the crash and hence to resolve it.
- Many times, the machine is stuck at “Applying computer settings” at this stage.
- The first thing that should be done is to follow KB 2004121 (this can be followed for both 2008 and 2008 R2).
- Swapping hives can help here, but it should be the last step.
- If the machine is stuck at configuring updates, we can try renaming pending.xml and deleting the PendingXmlIdentifier and AdvancedInstallersNeedResolving entries from the COMPONENTS hive.
- The easier method is to disable the Trusted Installer service (set its registry Start value to 4).
- Once the machine boots up, we still have to troubleshoot the servicing issue.
- But at least the machine is back in production.
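The pending.xml rename mentioned above is a plain file operation and can be scripted when the affected Windows installation is reachable offline. The sketch below assumes the installation's Windows directory is available at some path (for example, from a recovery environment or a mounted disk) and that pending.xml sits in its winsxs subdirectory; the function name and the ".old" suffix are choices made for this example.

```python
from pathlib import Path
from typing import Optional


def rename_pending_xml(windows_dir: str, suffix: str = ".old") -> Optional[Path]:
    """Rename <windows_dir>/winsxs/pending.xml to pending.xml<suffix>.

    Returns the new path, or None when there is no pending.xml to rename.
    """
    pending = Path(windows_dir) / "winsxs" / "pending.xml"
    if not pending.is_file():
        return None
    target = pending.with_name(pending.name + suffix)
    if target.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"{target} already exists")
    pending.rename(target)
    return target
```

Renaming rather than deleting keeps the original file available in case the change has to be reverted while the servicing issue is investigated.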