[original file is preempt070222celfjambo13.pdf]
[see the original file for figures]
[translated by ikoma]

2007/2/22 CELF Jamboree #13

[slide 1]---------------------------- = ----------------------------
Porting of PREEMPT_RT
-- Progress Report --

Katsuya Matsubara
Igel Co., Ltd.
Funded by Renesas Solutions Corp.

Technology Consulting Company IGEL CO., Ltd.

[slide 2]---------------------------- = ----------------------------
Background

# Want to implement device drivers at user level
  * Easy to develop
  * Fewer system crashes caused by driver bugs
  * Closer cooperation with applications
# Some issues
  * Access to I/O memory, physical memory
  * Receiving interrupt requests (IRQ)
  * Response time
  * ...
# New features of kernel 2.6
  * NPTL (Native POSIX Thread Library)
  * Scheduler improvement (O(1) scheduler etc.)
  * Kernel preemption
  * ...

[slide 3]---------------------------- = ----------------------------
History

1. Studied response of threads
2. Designed ULDD (User Level Device Driver) framework
3. Implemented prototype of ULDD
4. Studied realtime support of Linux

Today I report on porting a 2.6.16 kernel with PREEMPT_RT applied
to the Renesas SH4-based board RTS7751R2D

[slide 4]---------------------------- = ----------------------------
4. Realtime Support of Linux

# Complete realtime
  * Guarantee the deadlines of all tasks
    => Difficult to graft onto existing (non-realtime) OSes
  * Nested OS, Dual OS/Dual Core, ART-Linux
# Partial realtime
  * Make a best effort to meet the deadlines of some tasks
  * Most realtime tasks use only limited resources
    => Minimize latency = preemption
    => Even lower latency = realtime preemption
  * CONFIG_PREEMPT, CONFIG_PREEMPT_RT

[slide 5]---------------------------- = ----------------------------
Preemption in Linux

# Timing of preemption
  * at return from interrupt handling
  * at return from system call
  * when a task sleeps voluntarily
  => At return, set the stack pointer to that of another task, and pop
# On 2.6 without CONFIG_PREEMPT / 2.4, preemption (task switching)
  does NOT occur in kernel mode (interrupt handling DOES)
  * At return from an interrupt taken in kernel mode, control returns
    to the original point (because the code does not support preemption)

[slide 6]---------------------------- = ----------------------------
[see the original file for figure]
2.4 / 2.6 without CONFIG_PREEMPT

(Timing diagram. Labels: user / kernel; Task A, Task B, Interrupt
Processing; Data wait (sleep); H/W Interrupt; Interrupt disabled region;
wakes up Task B)

Realized soft realtime of 10ms (tick) grain
(Large latency may occur, depending on the condition in kernel mode,
such as I/O)

[slide 7]---------------------------- = ----------------------------
[see the original file for figure]
CONFIG_PREEMPT

(Timing diagram. Labels: user / kernel; Task A, Task B, Interrupt
Processing; Data wait (sleep); H/W Interrupt; Interrupt disabled region;
wakes up Task B)

Realized soft realtime of 100us grain
(Latency may occur, depending on the lock condition)

[slide 8]---------------------------- = ----------------------------
(CONFIG_)PREEMPT_RT

# Linux realtime preemption patch by Ingo Molnar et al.
  * Reduces latency by increasing preemption opportunities, even during
    interrupt handling or in critical regions, and realizes soft realtime
    with a granularity of 20-30 microseconds
  * Provides priority inheritance locks to prevent priority inversion
  * Refactors and optimizes other parts of the code
  * http://people.redhat.com/mingo/realtime-preempt/

[slide 9]---------------------------- = ----------------------------
[see the original file for figure]
PREEMPT_RT

(Timing diagram. Labels: user / kernel; Task A, Task B, Interrupt
Processing; Data wait (sleep); H/W Interrupt; Interrupt disabled region;
wakes up Task B)

[slide 10]---------------------------- = ----------------------------
Features Added/Modified with PREEMPT_RT

i.   Preemption in critical regions
ii.  Preemption in interrupt handlers
iii. Preemption in interrupt disabled code regions
iv.  Priority inheritance of spinlock and semaphore
v.   Other optimization

[slide 11]---------------------------- = ----------------------------
Porting PREEMPT_RT

# Currently supported architectures
  * Intel x86, PowerPC, ARM, MIPS, etc.
# Supporting a new architecture
  * Modification in arch/ARCHITECTURE, include/asm-ARCHITECTURE
    - Add "raw_" or "compat_" to locks and semaphores where interrupts
      and preemption should be disabled
  * Modify (infrequently used) device drivers
    - Can the interrupt handler run in process context?
    - Review locks and interrupt disabled regions
  * Set SA_NODELAY at the registration of interrupt handlers that must
    be executed in interrupt context (timer handler etc.)
  * Review processing related to scheduler calls
  * Add other processing specific to the architecture
    (if defined(ARCHITECTURE))

[slide 12]---------------------------- = ----------------------------
Procedure to Apply PREEMPT_RT

1. Obtain and expand kernel source from kernel.org
2. Apply RT patch
3. Modify .config
4. Review architecture dependent code and device driver code
5. Compile and run
(Repeat 4. until it becomes stable)

[slide 13]---------------------------- = ----------------------------
1. Expanding Kernel Source

# No need to explain... (^^;
# Tip: as is often the case, the RT patch cannot be applied to 2.6.x.y
# Apply a patch for the target (e.g. linuxsh) if necessary

[slide 14]---------------------------- = ----------------------------
2. Applying RT Patch

# Download the patch from Ingo's web page
  * http://people.redhat.com/mingo/realtime-preempt/
# The patch touches 687 files under 181 directories (including nested
  ones) (in the case of 2.6.16-rt)

[slide 15]---------------------------- = ----------------------------
[see the original file for screen shot]
3. Modify .config

# Enable PREEMPT_RT

[slide 16]---------------------------- = ----------------------------
4. Review Target Dependent Code
*** The hardest part ***

# Because the behaviors below change, architecture dependent code
  (arch/, include/asm-*/) and device driver code (drivers/) have to
  be reviewed
  * Spinlocks
  * Interrupt disabling
  * Interrupt handlers
  * Scheduler related processing
  * etc.
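
One of the changed behaviors listed above, interrupt disabling, can be
modeled in user space. The sketch below is only an analogy, not kernel
code: SIGUSR1 stands in for a hardware interrupt, model_soft_irq_save()
stands in for local_irq_save() as it behaves under PREEMPT_RT (nothing
is really masked), and model_raw_irq_save() stands in for
raw_local_irq_save(). All function names here are hypothetical, and a
single-threaded POSIX process is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical user-space model: SIGUSR1 plays the role of a H/W interrupt. */
static volatile sig_atomic_t in_critical; /* 1 while the "atomic" region runs */
static volatile sig_atomic_t intrusions;  /* handler runs seen inside the region */

static void fake_irq_handler(int sig)
{
    (void)sig;
    if (in_critical)
        intrusions++;
}

/* Model of local_irq_save() under PREEMPT_RT: only saves state, masks nothing. */
static void model_soft_irq_save(sigset_t *flags)
{
    sigprocmask(SIG_SETMASK, NULL, flags);
}

/* Model of raw_local_irq_save(): the "interrupt" is really masked. */
static void model_raw_irq_save(sigset_t *flags)
{
    sigset_t block;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, flags);
}

/* Model of (raw_)local_irq_restore(): put the saved mask back. */
static void model_irq_restore(const sigset_t *flags)
{
    sigprocmask(SIG_SETMASK, flags, NULL);
}

/*
 * Run one "atomic" region while an interrupt arrives in the middle of it.
 * Returns how many times the handler ran inside the region.
 */
int count_intrusions(int use_raw)
{
    struct sigaction sa;
    sigset_t unblock, flags;
    int rc;

    sa.sa_handler = fake_irq_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;

    /* Start from a known state: SIGUSR1 is deliverable. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    intrusions = 0;
    if (use_raw)
        model_raw_irq_save(&flags);
    else
        model_soft_irq_save(&flags);

    in_critical = 1;
    raise(SIGUSR1);            /* the "interrupt" arrives mid-region */
    in_critical = 0;

    model_irq_restore(&flags); /* a masked signal is delivered here, outside the region */
    return (int)intrusions;
}
```

In this model, count_intrusions(0) reports that the handler ran inside
the region, while count_intrusions(1) reports that it did not, because
the masked signal is delivered only when the mask is restored.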
[slide 17]---------------------------- = ----------------------------
Review of Spinlocks

# Spinlocks sleep
  => Must check whether it is OK to sleep wherever a spinlock is used
# Some cases where review is required
  * Spinlocks in interrupt handlers
  * Spinlocks taken while interrupts or preemption are disabled
  * Atomic processing in the critical region created by a spinlock

[slide 18]---------------------------- = ----------------------------
Review of Disabling Interrupt

# Disabling interrupts does not actually disable them
  => A fix is required where interrupts must really be disabled
# Some cases where review is required
  * Interrupt disabling to realize atomic processing
  * Interrupt disabling to ensure that interrupt handlers do not run

[slide 19]---------------------------- = ----------------------------
Review of Interrupt Handlers

# Interrupt handlers run in process context
  => Need to judge whether latency and preemption are acceptable in
     handler processing
# Some cases where review is required
  * Processing where exclusive control is expected
  * Processing where preemption (and further interrupts) are expected
    not to occur
  * Processing where (large) latency is not allowed

[slide 20]---------------------------- = ----------------------------
[see the original file for code block]
Example: include/asm-sh/atomic.h

[code block]
    static __inline__ void atomic_add(int i, atomic_t *v)
    {
            unsigned long flags;

            local_irq_save(flags);
            *(long *)v += i;
            local_irq_restore(flags);
    }

    static __inline__ void atomic_sub(int i, atomic_t *v)
    {
            unsigned long flags;

            local_irq_save(flags);
            *(long *)v -= i;
            local_irq_restore(flags);
    }

# Interrupts are not disabled with PREEMPT_RT
  * Sharing an atomic variable with an interrupt handler causes trouble
  => Change these to raw_local_irq_save/restore()

[slide 21]---------------------------- = ----------------------------
5. Compile and Check If It Works

# No need to explain, either?
# If it does not run stably, you have to repeat reviewing the code and
  checking whether it works

[slide 22]---------------------------- = ----------------------------
Work on RTS7751R2D

# Linux 2.6.16 + linuxsh patch + Patch for RTS7751R2D + patch-2.6.16-rt29
  * Rewrote architecture dependent code (thanks to Lineo Solutions)
  * Added raw_ and compat_ where needed and removed redundant ones
  * Reviewed device driver code
  * Fixed other troubles

[slide 23]---------------------------- = ----------------------------
[see the original file for code block]
Digression: Oops in get_wchan()

# Looking at code ...

[code block]
    unsigned long get_wchan(struct task_struct *p)
    {
            unsigned long schedule_frame;
            unsigned long pc;

            PRINTK(KERN_DEBUG, "%s: 1\n", __FUNCTION__);
            if(!p || p==current || p->state==TASK_RUNNING)
                    return 0;

            /*
             * The same comment as on an Alpha applies here, too ...
             */
            PRINTK(KERN_DEBUG, "%s: 2\n", __FUNCTION__);
            pc = thread_saved_pc(p);

# As it is called only from procfs and seems to be used only by the ps
  command, the problem was fixed (tentatively) by always returning zero

[slide 24]---------------------------- = ----------------------------
[see the original file for code block]
get_wchan() for alpha

# It says the code is ugly... The relationship to preemption is unknown

[code block]
    unsigned long
    get_wchan(struct task_struct *p)
    {
            unsigned long scheduler_frame;
            unsigned long pc;
            if (!p || p==current || p->state==TASK_RUNNING)
                    return 0;
            /*
             * This one depends on the frame size of schedule(). Do a
             * "disass schedule" in gdb to find the frame size. Also the
             * code assumes that sleep_on() follows immediately after
             * interruptible_sleep_on() and that add_timer() follows
             * immediately after interruptible_sleep(). Ugly, isn't it?
             * Maybe adding a wchan field to task_struct would be better,
             * after all...
             */

            pc = thread_saved_pc(p);

[slide 25]---------------------------- = ----------------------------
Aha, it worked! ... Maybe?

# Hard to determine if it is "stable"
  * Have to find hard-to-detect race conditions or deadlocks, as the
    behavior of critical regions and interrupt handlers changes
  * Evaluate with all devices under heavy load?
  * Hard to determine if it is "too inclined to be stable"
# Hard to determine the effect
  * With light load, the benefit cannot be seen. Throughput may
    deteriorate.
  * Depending on how it was ported (if the opportunities for preemption
    were reduced), the effect may not be seen
  <= Want to evaluate the throughput deterioration under ordinary
     conditions as well as the improvement in latency

[slide 26]---------------------------- = ----------------------------
Summary and ToDo

# Ported (am porting) PREEMPT_RT onto a SuperH board
# To publish the patch so that the community can evaluate it
# To support other boards (devices)
# To send it to Ingo Molnar or LKML to have it merged into the RT patch?
# To evaluate the effect on ULDD

---------------------------- = ----------------------------