# Delving into the Linux boot process for an ARM SoC

Ajay Kumar, Thiagu Ramalingam
FDS S/W solutions - Samsung Semiconductor India Research

## **CONTENTS**

- ARMv8 SoC basic architecture
- SoC internal memory and bootup
- Bootloader
- Setup and Initialize the RAM
- Copy images to main memory
- Decompressing the kernel image
- Kernel image header
- Prepare for Jumping into Kernel
- Deciding CPU boot configuration
- Jumping into Kernel: primary\_entry
- Arch Independent Kernel Starting Point
- Process 0
- The First Processor Activation
- setup\_arch
- Scheduler Initialization
- SMP on ARM SOC
- irq\_init and time\_init System Timer
- rest\_init

# **Assumptions**

- ARMv8 SoC
- Hypervisor not used
- BL0, BL1, etc.: the bootloader "stages" are conceptual only.
- Microcontrollers handling initial SoC boot aren't covered

## **ARMv8 SoC basic architecture**

- Example of a simple (complex?) big.LITTLE ARM SoC.
- A SoC is basically an organization of various components:
  - CPU clusters
  - System buses
  - Memory controller
  - Main memory
  - Other subsystems (display, GPU, peripherals, host controllers, etc.)
- The SoC also contains components such as clocks, power switches and power domains for sub-blocks.

# **SoC internal memory**

- Apart from main memory, the SoC has a ROM (read-only memory) that contains minimal code to set up the system for loading the next-stage binary. This code is executed upon CA block reset.
- It might also have an SRAM (volatile memory), which can help in executing the initial C routines.

## **ROM code**

- The ROM code does minimal initialization of the SRAM block and copies the bootloader (BL0) from storage/flash memory to SRAM.
- Runs in EL3 mode
- Interrupts are mostly disabled at this stage
- Powers up core clocks and power domains
- Sets up the C environment in SRAM for BL0 execution

# **Setup and Initialize the RAM**

- The bootloader BL0, now executing from SRAM, can further initialize the system clocks and power domains and, most importantly, initialize main memory.
- The bootloader is expected to find and initialize all RAM that the kernel will use. It does this in a machine-dependent manner.

## **BL1 and Secure Monitor**

- Once the primary bootloader BL0 has initialized main memory, it can load a secondary bootloader (BL1), which executes from main memory.
- BL1 initializes the system for supporting Linux boot and loads the other binaries needed for Linux boot from storage to main memory. It can have interrupts enabled.
- BL0/BL1 should also keep Secure Monitor code in place for handling secure accesses, which are requested through the SMC (Secure Monitor Call) instruction.

# Copy images to main memory

- DTB: Device Tree Blob, a description of the hardware in device tree format
- Image: the actual kernel binary
- Ramdisk: initial RAM disk, a minimal rootfs loaded before the actual root file system is mounted. Required to execute init scripts
- All are loaded into memory by BL1

## **Device Tree Blob**

- Description of the hardware in device tree format
- Contains memory-mapped addresses and information about CPU, memory, GPIO, clocks, peripherals, etc.
- Before the kernel is executed, the bootloader selects the proper device tree file and passes it as an argument to the kernel
- The dtb must be placed on an 8-byte boundary and must not exceed 2 megabytes in size. Because the dtb will be mapped cacheable using blocks of up to 2 megabytes in size, it must not be placed within any 2M region which must be mapped with any specific attributes.

# Decompressing the kernel image

- Image: the actual kernel binary; Image.gz: the compressed kernel binary
- The AArch64 kernel does not currently provide a decompressor and therefore requires decompression (gzip etc.) to be performed by the boot loader if a compressed Image target (e.g. Image.gz) is used.
  For bootloaders that do not implement this requirement, the uncompressed Image target is available instead.

## Kernel image header

```
The decompressed kernel image contains a 64-byte header as follows::

  u32 code0;                    /* Executable code */
  u32 code1;                    /* Executable code */
  u64 text_offset;              /* Image load offset, little endian */
  u64 image_size;               /* Effective Image size, little endian */
  u64 flags;                    /* kernel flags, little endian */
  u64 res2      = 0;            /* reserved */
  u64 res3      = 0;            /* reserved */
  u64 res4      = 0;            /* reserved */
  u32 magic     = 0x644d5241;   /* Magic number, little endian, "ARM\x64" */
  u32 res5;                     /* reserved (used for PE COFF offset) */
```

- code0/code1: Start of text section (executable code)
- text\_offset: Obsolete
- image\_size: Effective Image size
- Over the years, a few of these fields have become obsolete (e.g. text\_offset)

## Kernel image header flags

- Bit [0] Kernel endianness: 1 if big-endian, 0 if little-endian.
- Bits [1-2] Kernel page size: 0 unspecified, 1 4K, 2 16K, 3 64K
- Bit [3] Kernel physical placement
  - 0: 2MB aligned base should be as close as possible to the base of DRAM, since memory below it is not accessible via the linear mapping
  - 1: 2MB aligned base may be anywhere in physical memory
- Bits [4-63] Reserved.

## Kernel Header dump

```
00000000: 4d5a 40fa ff3f 6214 0000 0000 0000 0000  MZ@...?b......
00000010: 0000 3102 0000 0000 0a00 0000 0000 0000
...
00000030: 0000 0000 0000 0000 4152 4d64 4000 0000
00000040: 5045 0000 64aa 0200 0000 0000 0000 0000
00000050: 0000 0000 a000 0602 0b02 0214 0000 9401
00000060: 0000 9c00 0000 0000 5c23 9001 0000 0100
00000070: 0000 0000 0000 0000 0000 0100 0002 0000
...
00000090: 0000 3102 0000 0100 0000 0000 0a00 0000
...
000000f0: 0000 0000 0000 0000 2e74 6578 7400 0000
00000100: 0000 9401 0000 0100 0000 9401 0000 0100
...
00000120: 2e64 6174 6100 0000 0000 9c00 0000 9501  .data.....
00000130: 00d2 9200 0000 9501 0000 0000 0000 0000
00000140: 0000 0000 4000 00c0 1f20 03d5 1f20 03d5
00000150: 1f20 03d5 1f20 03d5 1f20 03d5 1f20 03d5
00000160: 1f20 03d5 1f20 03d5 1f20 03d5 1f20 03d5
```

# **Prepare for Jumping into Kernel**

- After placing the kernel image, what remains is setting up the rest of the environment for jumping into the kernel.
- Before jumping into the kernel:
  - Disable DMA-capable devices so that memory does not get corrupted
  - CPU mode:
    - Primary CPU general-purpose register settings:
      - x0 = physical address of device tree blob (dtb) in system RAM.
      - x1 = 0, x2 = 0, x3 = 0 (reserved for future use)
    - Secondary CPU general-purpose register settings:
      - x0 = 0, x1 = 0, x2 = 0, x3 = 0 (reserved for future use)
    - All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, IRQ and FIQ)
  - The MMU must be off.
  - Caches: The instruction cache may be on or off, and must not hold any stale entries corresponding to the loaded kernel image.

# Prepare for Jumping into Kernel (contd...)

## Before jumping into the kernel (contd...):

- Architected timers: The timers used at the different exception levels have to be initialized. CNTFRQ must be programmed with the timer frequency and CNTVOFF must be programmed with a consistent value on all CPUs.
- Coherency: All CPUs to be booted by the kernel must be part of the same coherency domain on entry to the kernel. This may require IMPLEMENTATION DEFINED initialization to enable the receiving of maintenance operations on each CPU. For ARMv8 Linux, all CPUs under SMP fall into the same Inner Shareable domain.
- System registers: All writable architected system registers at or below the exception level where the kernel image will be entered must be initialized by software at a higher exception level to prevent execution in an UNKNOWN state.
- The requirements described above for CPU mode, caches, MMUs, architected timers, coherency and system registers apply to all CPUs.
- All CPUs must enter the kernel in the same exception level.

# **Deciding CPU boot configuration**

- The primary CPU jumps directly to the first instruction of the kernel image.
- The device tree blob passed by this CPU must contain an 'enable-method' property for each cpu node.
- "psci" enable-method:
  - The kernel will issue CPU\_ON calls as described in the Power State Coordination Interface (PSCI) specification
  - Secure monitor code (ATF) takes care of powering up the CPU internally
  - Platforms mostly use the PSCI method.
- "spin-table" enable-method:
  - CPUs must have a 'cpu-release-addr' property in their cpu node
  - These CPUs should spin outside of the kernel in a reserved area of memory, polling their cpu-release-addr location
  - A wfe instruction may be inserted to reduce the overhead of the busy-loop, and a sev will be issued by the primary CPU.

# **Jumping into Kernel**

- Once the bootloader BL1 has performed all necessary SoC initialization (clocks, power domains) and prepared the environment described above, it jumps to the kernel.
- Let's take the example of coreboot:

```
    /* May update bl31_params if necessary. */
    void *bl31_plat_params = soc_get_bl31_plat_params(&bl31_params);

    /* MMU disable will flush cache, so passed params land in memory. */
    raw_write_daif(SPSR_EXCEPTION_MASK);
    mmu_disable();
    bl31_entry(&bl31_params, bl31_plat_params);
    die("BL31 returned!");
```

Source: src/arch/arm64/arm\_tf.c, 3rdparty/arm-trusted-firmware/bl31/bl31\_main.c

# **Snapshot before jumping to kernel**

- CPU 0: EL1
- CMU, PMU on
- MMU off
- Data cache off, instruction cache may be kept on
- Binaries (BL1, Image, dtb, ramdisk) placed in memory at their respective addresses, adhering to their respective constraints.

# head.S: primary\_entry: Kernel entry point:

```
/*
 * Kernel startup entry point.
 * ---------------------------
 *
 * The requirements are:
 *   MMU = off, D-cache = off, I-cache = on or off,
 *   x0 = physical address to the FDT blob.
 *
 * This code is mostly position independent so you call this at
 * __pa(PAGE_OFFSET).
 *
 * Note that the callee-saved registers are used for storing variables
 * that are useful before the MMU is enabled. The allocations are described
 * in the entry routines.
 */
    __HEAD
    /*
     * DO NOT MODIFY. Image header expected by Linux boot-loaders.
     */
    efi_signature_nop               // special NOP to identity as PE/COFF executable
    b       primary_entry           // branch to kernel start, magic
    .quad   0                       // Image load offset from start of RAM, little-endian
    le64sym _kernel_size_le         // Effective size of kernel image, little-endian
    le64sym _kernel_flags_le        // Informative flags, little-endian
    .quad   0                       // reserved
    .quad   0                       // reserved
    .quad   0                       // reserved
    .ascii  ARM64_IMAGE_MAGIC       // Magic number
    .long   .Lpe_header_offset      // Offset to the PE header.

    __EFI_PE_HEADER

    __INIT

    /*
     * The following callee saved general purpose registers are used on the
     * primary lowlevel boot path:
     *
     *  Register   Scope                                    Purpose
     *  x21        primary_entry() .. start_kernel()        FDT pointer passed at boot in x0
     *  x23        primary_entry() .. start_kernel()        physical misalignment/KASLR offset
     *  x28        __create_page_tables()                   callee preserved temp register
     *  x19/x20    __primary_switch()                       callee preserved temp registers
     *  x24        __primary_switch() .. relocate_kernel()  current RELR displacement
     */
```

primary\_entry (or stext in earlier versions of the Linux kernel) is the entry point of the arm64 architecture (arch/arm64/kernel/head.S).

# arch/arm64/kernel/head.S: primary\_entry:

- preserve\_boot\_args: Preserve the arguments passed by the bootloader in x0-x3 (x21 = x0 = FDT)
- init\_kernel\_el: Set up the CPU based on the current exception level (EL1/EL2) and return w0 = cpu\_boot\_mode
- KASLR (Kernel Address Space Layout Randomization) setting.
- set\_cpu\_boot\_mode\_flag: Sets the \_\_boot\_cpu\_mode flag depending on the CPU boot mode passed in w0, for later usage.
- \_\_create\_page\_tables: Set up the initial page tables
  - Identity mapping for the MMU-enable code (low address, TTBR0): idmap\_pg\_dir
  - Mapping for the kernel image (TTBR1): init\_pg\_dir

# arch/arm64/mm/proc.S

## \_\_cpu\_setup:

- Initialize the processor for turning the MMU on: clear the TLB, set the sizes of virtual and physical addresses, enable VM features.
- Prepares the TCR (Translation Control Register) and SCTLR (System Control Register) values accordingly.

#### \_\_primary\_switch:

- Set the page table addresses: TTBR0 (idmap\_pg\_dir), TTBR1 (init\_pg\_dir)
- \_\_enable\_mmu: check and configure the required page granule, turn the MMU on
- Relocate the kernel if KASLR is in use
- Call \_\_primary\_switched:
  - Assign the EL1 vector table, clear BSS, set up the kernel stack, create the FDT mapping, call start\_kernel

# **Arch Independent Kernel Starting Point**

The kernel is always booted by architecture-specific code. Execution is then passed to the start\_kernel function, which is responsible for common kernel initialization and is the architecture-independent kernel starting point. The main purpose of start\_kernel is to finish the kernel initialization process and launch the first init process.
```
asmlinkage __visible void __init start_kernel(void)
{
    char *command_line;
    char *after_dashes;

    set_task_stack_end_magic(&init_task);
    smp_setup_processor_id();
    debug_objects_early_init();

    cgroup_init_early();

    local_irq_disable();
    early_boot_irqs_disabled = true;

    boot_cpu_init();
    page_address_init();
    pr_notice("%s", linux_banner);
    early_security_init();
    setup_arch(&command_line);
    setup_command_line(command_line);
    setup_nr_cpu_ids();
    setup_per_cpu_areas();
    smp_prepare_boot_cpu();    /* arch-specific boot-cpu hooks */
    boot_cpu_hotplug_init();

    build_all_zonelists(NULL);
    page_alloc_init();

    pr_notice("Kernel command line: %s\n", boot_command_line);
    /* ... */
}
```

Source: init/main.c

# **Kernel Creating Process 0**

```
asmlinkage __visible void __init start_kernel(void)
{
    char *command_line;
    char *after_dashes;

    set_task_stack_end_magic(&init_task);
    smp_setup_processor_id();
    debug_objects_early_init();
    /* ... */
```

- init\_task is the initial task structure. A task structure stores all the information about a process.
- Process 0 is statically defined. It is the only process that is created neither by kernel\_thread nor by fork.
- set\_task\_stack\_end\_magic writes a magic value at the stack boundary of init\_task (process 0) so that a stack overflow can be detected.

## **The First Processor Activation**

- boot\_cpu\_init() initializes the various CPU masks for the bootstrap processor.
- The processor id is obtained from:
  - int cpu = smp\_processor\_id();
- The given CPU is then marked online, active, present and possible:
  - set\_cpu\_online(cpu, true);
  - set\_cpu\_active(cpu, true);
  - set\_cpu\_present(cpu, true);
  - set\_cpu\_possible(cpu, true);
- cpu\_possible: the set of CPU IDs that can be plugged in at any time during the life of that system boot
- cpu\_present: the CPUs that are currently plugged in
- cpu\_online: a subset of cpu\_present that indicates the CPUs available for scheduling

# setup\_arch()

- early\_ioremap\_init:
  - For early users of early\_ioremap(paddr, size)
- setup\_machine\_fdt:
  - Parse 'bootargs' from the DT 'chosen' node
  - Parse the physical memory base and size, which are added to the memblock subsystem
  - Parse the machine model
- parse\_early\_param:
  - early\_param("mem", early\_mem);
  - early\_param("earlycon", param\_setup\_earlycon);
  - early\_param("debug", debug\_kernel);
- **cpu\_uninstall\_idmap**: Remove idmap\_pg\_dir from TTBR0\_EL1 and invalidate the TLB
- arm64\_memblock\_init:
  - Reserve the memory used by the kernel image
  - Reserve the memory specified in the DT, with special initialization if any

# setup\_arch() contd(...)

#### paging\_init / bootmem\_init

- Remap the kernel sections (\_text, \_rodata, \_data, etc.) with fine-grained permissions per segment in swapper\_pg\_dir
- Create the linear mapping for the available physical memory blocks
- Switch the page table to swapper\_pg\_dir
- Build memory zones (usually only one DMA zone for ARM64)

## psci\_init

- Firmware interface implementing the CPU power-related operations specified by the ARM PSCI spec
- Includes CPU\_ON, CPU\_OFF, CPU\_SUSPEND, MIGRATE, etc.

## **Scheduler Initialization**

- The scheduler subsystem is one of the core subsystems of the kernel. It is responsible for allocating the CPU resources of the system sensibly, and it must handle the scheduling requirements of many different, complex types of tasks.
- The kernel has five scheduling classes. In order of priority from high to low: stop, deadline (DL), real-time (RT), fair (CFS) and idle.
- Scheduler initialization happens relatively late in start\_kernel. By then memory initialization has completed, so sched\_init can already call kzalloc and other memory allocation functions.
- sched\_init initializes, for each CPU, the run queue (RQ), the global default bandwidth of DL / RT and the run queue of each scheduling class, and registers the CFS softirq.

## SMP on ARM SOC

- A symmetric multiprocessor (SMP) system is a multiprocessor system with centralized shared memory, called main memory (MM), operating under a single operating system with two or more homogeneous processors.
- Most of the SMP code is not architecture dependent (it lives in the kernel directory).
- A few SMP functions are related to the SoC:
  - smp\_init\_cpus():
    - Sets up the set of possible CPUs (via set\_cpu\_possible()).
    - Can be removed if the CPU topology is up to date in the device tree.
    - Called very early during the boot process (from setup\_arch()).
  - smp\_prepare\_cpus():
    - Enables coherency.
    - Initializes the cpu\_possible map.
    - Prepares the resources (power, RAM, clock...).
    - Called early during the boot process (before the initcalls but after setup\_arch()).
  - smp\_secondary\_init():
    - Performs platform-specific initialization of the specified CPU.
    - Called from secondary\_start\_kernel() on the CPU which has just been started.
  - smp\_boot\_secondary():
    - Actually boots a secondary CPU identified by the CPU number given as parameter.
    - Called from cpu\_up() on the booting CPU.

# irq\_init and time\_init() System Timer

## irq\_init

- init\_irq\_stacks: Set up the per-CPU IRQ stacks
- irqchip\_init → of\_irq\_init(\_\_irqchip\_of\_table): Initialize the GIC controllers.
  Scans the device tree for matching interrupt controller nodes and calls their initialization functions.

## time\_init

- The function time\_init() selects and initializes the system timer.
- The system timer is a device that can be configured to periodically interrupt a processor at a predefined frequency.
- One particular application of the timer is process scheduling:
  - A scheduler needs to measure how long each process has been executing and use this information to select the next process to run.
  - This measurement is based on timer interrupts.

# rest\_init()

- start\_kernel() initializes dozens of kernel subsystems and ends by calling rest\_init().
- rest\_init(), in turn, spawns kernel\_init(), which goes on to become the very first user space process.
  - Its process id is 1. It becomes the direct or indirect ancestor of all user space processes.
- It also spawns the kthreadd process (normally process id 2), which is the parent of all kernel threads.
- Finally, it runs **cpu\_idle**(), the idle loop that takes over the CPU whenever no other process is using it.
- kernel\_init() will start any additional CPU cores.
- If an initial RAM disk is defined, it will decompress and mount it.
- Then it loads the device drivers, mounts the root file system in read-only mode and finally calls the init process (normally /sbin/init).
## Before we end:

- For a complete understanding, please refer to the following references:
- References:
  - ARMv8 architecture: https://developer.arm.com/documentation/den0024/a
  - Kernel source: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/
  - Booting on ARM64: https://www.kernel.org/doc/html/latest/arm64/booting.html
  - Analyzing Linux boot process: https://opensource.com/article/18/1/analyzing-linux-boot-process
  - SMP boot in Linux: https://developer.arm.com/documentation/den0013/d/Multi-core-processors/Booting-SMP-systems/SMP-boot-in-Linux
  - Memory Layout on AArch64 Linux: https://www.kernel.org/doc/html/latest/arm64/memory.html