NOTICE: This page has been moved to http://elinux.org/Kernel_XIP
This content may be out of date.
Please do not edit this page here; instead, make
all future edits to this page on the new Embedded Linux Wiki.

This page describes the use of kernel Execute-In-Place (XIP) as a boot-time reduction technique.


Description

When the kernel is executed in place (XIP) directly from flash, the bootloader does not have to (1) read the kernel from flash, (2) decompress it, or (3) write it to RAM.

How to implement or use


See KernelXIPInstructionsForOMAP.

Expected Improvement - about 0.5 seconds

The expected improvement from using this technique depends on the size of the kernel and on the time needed to load it from persistent storage and decompress it.
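As a rough illustration of that dependence, the sketch below estimates the savings from an assumed copy throughput, an assumed decompression time, and an assumed amount of extra time the XIP kernel still spends (XIP kernels can initialize more slowly because code is fetched from flash, as the Arctic III case study below shows, and .data still has to be copied). The function name and every parameter are hypothetical; substitute numbers measured on your own board.

```c
#include <assert.h>

/* Hypothetical first-order estimate of the boot time XIP saves, in ms.
 * All inputs are assumptions; replace them with measurements from your board. */
static double estimated_xip_savings_ms(
    double image_kib,        /* size of the image the non-XIP bootloader copies */
    double copy_kib_per_ms,  /* combined flash-read and RAM-write throughput */
    double decompress_ms,    /* 0 if the non-XIP image is uncompressed */
    double xip_extra_ms)     /* time XIP still spends: slower init from flash, .data copy */
{
    assert(copy_kib_per_ms > 0.0);
    double copy_ms = image_kib / copy_kib_per_ms; /* the copy step XIP skips */
    return copy_ms + decompress_ms - xip_extra_ms;
}
```

For example, with made-up inputs of a 1024 KiB image copied at 16 KiB/ms, 400 ms of decompression, and 50 ms of extra XIP time, the estimate is 64 + 400 - 50 = 414 ms.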

In general, time savings of about 0.5 seconds have been observed (see the case studies below).

Resources

Projects

Specifications


Patches

[ARM PATCH] 2154/2: XIP kernel for ARM

Patch from Nicolas Pitre

This patch allows the kernel to be configured for XIP.
A lot of people are already using semi-hacked-up XIP patches, so
it is a good idea to have a generic and clean implementation
supporting all ARM targets. The patch isn't too intrusive.

It involves:

- modifying the kernel entry code to map separate .text and .data
  sections in the initial page table, as well as relocating
  .data to RAM when needed

- modifying the linker script to account for the different VMA and
  LMA for .data, as well as making sure that .init.data gets
  relocated to RAM

- adding the final kernel mapping with a new MT_ROM mem type

- distinguishing between XIP and non-XIP for bootmem and memory
  resource declaration

- and adding proper target handling to Makefiles.
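The linker-script change listed above gives .data two addresses: a VMA in RAM, where it lives while the kernel runs, and an LMA in flash, where its initial contents are stored. A minimal sketch of the resulting address arithmetic follows; the `section_map` type, the `lma_of` helper, and the addresses used with it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one output section whose run address (VMA)
 * differs from its load address (LMA), as .data does in an XIP kernel. */
struct section_map {
    uintptr_t vma_start; /* where .data lives while the kernel runs (RAM) */
    uintptr_t lma_start; /* where .data's initial contents are stored (flash) */
    uintptr_t size;      /* section size in bytes */
};

/* Return the flash address holding the initial value of the byte that runs
 * at run_addr; the offset within the section is the same in both places. */
static uintptr_t lma_of(const struct section_map *s, uintptr_t run_addr)
{
    assert(run_addr >= s->vma_start && run_addr - s->vma_start < s->size);
    return s->lma_start + (run_addr - s->vma_start);
}
```

With a made-up .data section running at 0xC0008000 and stored in flash at 0x00300000, the byte that runs at 0xC0008010 starts out at 0x00300010.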

While at it, this also cleans up the kernel boot code a bit
so the kernel can now be compiled for any address in RAM,
removing the need for a relation between the kernel address and
the start of RAM. It also adds some more comments.

And finally, the _text, _etext, _end and similar variables are now
declared extern void instead of extern char, or even extern int.
That allows operations on their addresses directly without any
cast, and trying to use their value by mistake would yield an
error, which is a good thing.

Tested both configurations, XIP and non-XIP, the latter
producing a kernel for execution from RAM just as before.

Signed-off-by: Nicolas Pitre
Signed-off-by: Russell King

Case Studies

Case 1 - XIP on Arctic III PowerPC board

XIP was used on a PowerPC board, with the following results:

Table of boot-up times:

Boot Stage                          Non-XIP Time   XIP Time
Copy kernel to RAM                  85 ms          12 ms *
Decompress kernel                   453 ms         0 ms
Kernel time to initialize           819 ms         882 ms
(time to first user space program)
Total kernel boot time              1357 ms        894 ms
Reduction                           -              463 ms

* The data segment still has to be copied to RAM.
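The Total and Reduction rows follow directly from the per-stage times. A small C check of that arithmetic, with the figures copied from the Case 1 table (the array and function names are illustrative only):

```c
#include <assert.h>

/* Boot-stage times in ms for the Arctic III PowerPC board (Case 1). */
enum { COPY, DECOMPRESS, INIT, N_STAGES };

static const int non_xip_ms[N_STAGES] = { 85, 453, 819 };
static const int xip_ms[N_STAGES]     = { 12,   0, 882 }; /* 12 ms: data segment copy only */

/* Total kernel boot time is the sum of the three stages. */
static int total_ms(const int stages[N_STAGES])
{
    int sum = 0;
    for (int i = 0; i < N_STAGES; i++)
        sum += stages[i];
    return sum;
}

/* Reduction = non-XIP total minus XIP total; the check mirrors the Total row. */
static int reduction_ms(void)
{
    assert(total_ms(non_xip_ms) == 1357 && total_ms(xip_ms) == 894);
    return total_ms(non_xip_ms) - total_ms(xip_ms);
}
```

Most of the gain comes from skipping decompression (453 ms); the XIP kernel actually spends 63 ms longer initializing.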

Thanks to Todd Poynor of MontaVista for providing this information.

Case 2 - XIP on OMAP Innovator

XIP was used on a TI OMAP (Innovator board), with the following results:

Boot Stage                          Non-XIP Time             Non-XIP Time             XIP Time
                                    (kernel compressed)      (kernel not compressed)
Copy kernel to RAM                  56 ms                    120 ms                   0 ms
Decompress kernel                   545 ms                   0 ms                     0 ms
Kernel time to initialize           88 ms                    208 ms                   110 ms
(time to first user space program)
Total kernel boot time              689 ms                   208 ms                   110 ms
Reduction                           -                        481 ms                   579 ms

Thanks to Hiroyuki Machida of Sony for providing this information.

Case 3 - comparing NOR XIP with OneNAND quick-copy to RAM

Dongjun Shin of Samsung Electronics reports:

As I mentioned in the AG meeting, we have done some boot time measurements on the OMAP 5912 target platform (OSK5912 from Spectrum Digital). We did this experiment to identify the timing gap between NOR XIP and NAND shadowing. Here are the results (all times are in microseconds).

The "XIP tuning" column means that we changed the NOR interface setting of the OMAP (EMIFS) so that synchronous reads are used instead of the default asynchronous reads.

In the OneNAND case, only the first 1 KB of OneNAND can be used as an XIP region, and we used a 1 KB IPL to load U-Boot. "Shadowing" means that the kernel is copied to RAM.

The kernel initialization time is broken into two phases because we used a timer register for measurement, and that timer is initialized during kernel boot. Just add the values for the two phases to get the total kernel boot time.

Boot stage (times in microseconds)            NOR          NOR      OneNAND      OneNAND
                                              XIP          XIP    shadowing    shadowing
                                           normal       tuning   compressed uncompressed
Boot loader CPU frequency                  96 MHz       96 MHz       96 MHz       96 MHz
Boot loader (IPL)                               0            0        5,999        5,999
Boot loader (u-boot)                      388,146      372,538      356,821      356,810
Copy kernel to RAM                              0            0       35,029       56,884
Decompress kernel                               0            0    1,178,481            0
Kernel time to initialize, phase 1         18,964       12,826        9,091        9,119
Kernel time to initialize, phase 2         61,176       51,263       50,118       50,126
Total                                     468,287      436,626    1,635,540      478,938
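Following the note that the two kernel initialization phases are simply added, the per-column totals can be recomputed from the Case 3 measurements. The sketch below does that in C; the array layout, enum names, and helper names are illustrative and not part of the original report.

```c
#include <assert.h>

/* Case 3 measurements in microseconds. Column order: NOR XIP normal,
 * NOR XIP tuning, OneNAND shadowing compressed, OneNAND shadowing uncompressed. */
enum { NOR_XIP_NORMAL, NOR_XIP_TUNING, ONENAND_COMPRESSED, ONENAND_UNCOMPRESSED, N_COLS };
enum { IPL, UBOOT, COPY, DECOMPRESS, INIT_PHASE1, INIT_PHASE2, N_STAGES };

static const long stage_us[N_STAGES][N_COLS] = {
    [IPL]         = {      0,      0,    5999,   5999 },
    [UBOOT]       = { 388146, 372538,  356821, 356810 },
    [COPY]        = {      0,      0,   35029,  56884 },
    [DECOMPRESS]  = {      0,      0, 1178481,      0 },
    [INIT_PHASE1] = {  18964,  12826,    9091,   9119 },
    [INIT_PHASE2] = {  61176,  51263,   50118,  50126 },
};

/* Total kernel initialization time: the two timer phases are simply added. */
static long kernel_init_us(int col)
{
    assert(col >= 0 && col < N_COLS);
    return stage_us[INIT_PHASE1][col] + stage_us[INIT_PHASE2][col];
}

/* Sum of every boot stage for one column. */
static long column_total_us(int col)
{
    assert(col >= 0 && col < N_COLS);
    long sum = 0;
    for (int s = 0; s < N_STAGES; s++)
        sum += stage_us[s][col];
    return sum;
}
```

The recomputed column sums (468,286, 436,627, 1,635,539 and 478,938 µs) differ from the reported Total row by at most 1 µs.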

Questions

TimRiker asks:

Implementation Notes (from the field)

Notes on configuring Linux for XIP (for PPC)

Using XIP with U-Boot on Arm

Wolfgang Denk, the primary author of the U-Boot bootloader, wrote the following:

>> Yes. But... _does_ mkimage -x put a header on the front of it?

Yes, it does.

>>>> * You program the resulting image at 0x10004000.
>>>>
>>>> What is programmed at 0x10004000? The xipImage code or the U-Boot header?
>>
>> The U-Boot header, yes. That's wrong. But how do I use mkimage -x, then?
>> Is the offset caused by the header known?

Yes. The U-Boot header is 64 bytes.

U-Boot expects (and verifies) that the entry point is equal to the
load address plus the size of the U-Boot header.
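Because the legacy U-Boot header has a fixed size of 64 bytes, the entry point for an XIP image is simple arithmetic. The sketch below models the header on U-Boot's legacy `image_header_t` (include/image.h), where multi-byte fields are stored big-endian in a real image; the struct name, helper names, and the use of 0x10004000 (the address from the thread) are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define IMAGE_NAME_LEN 32

/* Layout modeled on U-Boot's legacy image header: seven 32-bit words,
 * four single-byte type fields, and a 32-byte name. */
struct legacy_image_header {
    uint32_t ih_magic;  /* image magic number */
    uint32_t ih_hcrc;   /* header CRC */
    uint32_t ih_time;   /* creation timestamp */
    uint32_t ih_size;   /* image data size */
    uint32_t ih_load;   /* load address */
    uint32_t ih_ep;     /* entry point */
    uint32_t ih_dcrc;   /* image data CRC */
    uint8_t  ih_os, ih_arch, ih_type, ih_comp;
    uint8_t  ih_name[IMAGE_NAME_LEN];
};

static_assert(sizeof(struct legacy_image_header) == 64, "U-Boot header is 64 bytes");

/* For an XIP image the code starts right after the header. */
static uint32_t xip_entry_point(uint32_t load_addr)
{
    return load_addr + (uint32_t)sizeof(struct legacy_image_header);
}

/* Mirrors the check U-Boot performs: entry point == load address + header size. */
static int xip_header_ok(uint32_t load_addr, uint32_t entry_point)
{
    return entry_point == xip_entry_point(load_addr);
}
```

If the image (header included) is programmed at 0x10004000, the first kernel instruction sits at 0x10004040, so that is the entry point mkimage must record.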

Lots more details are in the thread (split across months in the archives):

How to determine offsets for sections

On Fri, 21 Oct 2005, Sreeni wrote:
>> Hi,
>>
>> I have a MontaVista XIP kernel running on ARM, and my kernel will be in
>> flash. Since it's XIP, I know that the ".text" portion of the
>> kernel will be executed from flash but that ".data" needs to be placed
>> in SDRAM. Now my question is: at what offset will this data be
>> placed?
>>
>> My SDRAM physical address starts at 3000_0000 and flash starts at
>> 0100_0000. When I allocate a global variable in a kernel module and
>> check its actual physical address using virt_to_phys,
>> it gives me an address in the range 0100_0000 ~ 0600_0000, which
>> is my flash (PAGE_OFFSET doesn't work in the case of XIP).
>>
>> Can you please help me find the physical address of my .data
>> portion in this situation?
>>
>> Thanks
>> Shree
>>

I don't know about the ARM in particular, but if you look
in ../arch/arm/boot/compressed/vmlinux.lds.in, you will see
that this linker file simply allocates the start address
of each section as the next available address. The same
is true of ../arch/arm/boot/bootp.lds. If you need the
code, the data elements, and the stack to be accessed at
specific physical offsets, you modify the linker files.

Note that "." means "right here", just like '$' in many
assemblers. You can specify a physical offset simply
as:

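ENTRY(_start)
SECTIONS
{
    . = 0x01000000;              /* like this for code (flash) */
    .text   : { *(.text*) }
    .rodata : { *(.rodata*) }
    . = 0x30000000;              /* like this for data (SDRAM) */
    /* AT() stores the initial .data image in flash right after .rodata */
    .data   : AT(LOADADDR(.rodata) + SIZEOF(.rodata)) { *(.data*) }
    .bss    : { *(.bss*) *(COMMON) }
}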

In the above, we have put .rodata (initialized ASCII stuff)
right after the code in the .text section. You may need to
extract this from the binary blob to put into your NVRAM.

Also, any initialized data needs to be relocated to your
writable SDRAM, and the .bss stuff needs to be zeroed.
This is non-trivial. You may want to create a ".reloc"
section that contains your initialized data, put it
in your flash, and relocate it at startup.
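The relocation step described above can be sketched as a hosted simulation: byte arrays stand in for flash and SDRAM, and a hypothetical helper copies the initialized data to its run location and zeroes .bss. On a real target the pointers and sizes would come from linker-defined symbols whose names depend on your linker script, and this code would have to run before anything reads .data or .bss.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical startup helper: copy .data's initial image from its load
 * address (flash) to its run address (SDRAM), then clear .bss. */
static void relocate_data_and_clear_bss(const unsigned char *data_load, /* LMA in flash */
                                        unsigned char *data_run,        /* VMA in SDRAM */
                                        size_t data_size,
                                        unsigned char *bss_start,
                                        size_t bss_size)
{
    assert(data_load != NULL && data_run != NULL && bss_start != NULL);
    memcpy(data_run, data_load, data_size); /* initialized data -> SDRAM */
    memset(bss_start, 0, bss_size);         /* .bss must start out zeroed */
}
```

In the simulation, a 4-byte "flash" image is copied into the first half of an 8-byte "RAM" buffer, and the second half, playing the role of .bss, is cleared.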

Basically, executing in place is BAD. Flash should exist
in some little window where the code gets sucked out and
loaded at the correct offset in RAM; then you jump
there and close the little window. RAM, even SDRAM,
is cheaper than NAND FLASH. You can still boot instantly,
as I have shown.

Cheers,
Dick Johnson

KernelXIP (last edited 2008-05-07 18:22:22 by localhost)