[original file is celf_nptl-1.pdf by Kumagai, Toshiba]
[translated by ikoma]
CELF Technical Jamboree #16
Aug. 31, 2007
Copyright 2007, Toshiba Corporation.
Recent new features in NPTL
NPTL: Native POSIX Thread Library
Hiroki Kumagai
Core Technology Center
Toshiba Corpoartion
Aug 31st, 2007
2
Introduction
* In Jamboree #10 last year, I presented basic performance on MIPS, comparing NPTL and linuxthreads
* Today I am presenting NPTL functions which have become available in the last year
3
What happend in a last year
+-------+------------------------------+--------------------------------+ | | Linux kernel | glibc/NPTL | +-------+------------------------------+--------------------------------+ |2006/6 | 2.6.17 | | | | Rubust Mutex | | | | (futex) | | +-------+------------------------------+--------------------------------+ |2006/9 | 2.6.18 | 2.5 | | | Priority Inheritance Mutex | +----------------------------+ | | | (futex) | | Robust Mutex | | | | | | Priority Inheritance Mutex |<--- Today's topics | | | | Priority Protect Mutex | | | | | +----------------------------+ | +-------+------------------------------+--------------------------------+ |2006/11| 2.6.19 | | +-------+------------------------------+--------------------------------+ |2007/2 | 2.6.20 | | +-------+------------------------------+--------------------------------+ |2007/4 | 2.6.21 | | +-------+------------------------------+--------------------------------+ |2007/5 | | 2.6 | +-------+------------------------------+--------------------------------+ |2007/7 | 2.6.22 | 2.5.1 / 2.6.1 | | | fix "future priority based | fix "robust mutex does | | | wakeup" (non-PI) | not work if owner died with | | | | multiple waiters" bugzilla | | | | #4512 | +-------+------------------------------+--------------------------------+
4
Contents
abbrev.:
PI = Priority Inheritance
PP = Priority Protect* 1. Priority (Inheritance/Protect) Mutex
- Solves priority inversion problem of user space threads
-> Better responsiveness expected for high priority threads
<- Reports on implementation comaprison and performance evaluation
* 2. Robust Mutex
- Mutex owned by deleted context can be recovered
-> Applicable to system consruction which synchronize with processes
<- Reports on evaluation of interprocess syncronization
5
Priority (Inheritance / Protect) Mutex
6
Priority (Inheritance / Protect) Mutex
+-----------+---------------------------------+---------------------------------+ | | Priority Inheritance Mutex | Priority Protect Mutex | +-----------+---------------------------------+---------------------------------+ | behavior | Thread with a lock is executed | Thread with a lock is executed | | | with the priority of the thread*| with the specified ceiling | | | waiting for the lock | priority* | | | (* if the waiting thread has | (* When locks nested, the | | | higher priority) | highest priority) | +-----------+---------------------------------+---------------------------------+ | features | - Can bound priority inversion | - Can bound priority inversion | | | - No need to change priority | - Fixed cost necessary for | | | unless lock contention arises | changing priority. | | | | - Difficult to determine the | | | | value of ceiling priority | +-----------+---------------------------------+---------------------------------+ | supported | kernel 2.6.18 or later | glibc 2.5 or later | | in | glibc 2.5 or later | | +-----------+---------------------------------+---------------------------------+ | note | 1. Wake up order does not | 1. Not compatible with robust | | | correspond to the priority order| mutex in NPTL | | | for non-PI Mutex (non priority | | | | inheritance mutex) | 2. Can not be used in threads | | | "futex priority based wakeup" | of SCHED_OTHER class | | | (fixed in kernel 2.6.22 or | | | | later) | | | | Implementation of priority | | | | inheritance (plist support) | | | | was applied also to ordinary | | | | futex and fixed | | +-----------+---------------------------------+---------------------------------+
7
Priority Inversion (PI Mutex)
[see original file for figure]
8
Priority Inversion (PP Mutex)
[see original file for figure]
9
Priority Inversion Test - PI
* Tested the effect for priority inversion by giving disturbance load to real system
- Disturbance load
- Added a task in best effort class and added a task which repeats locking/unlocking Mutex used in message queue
- Measured target
- Time between sending and receiving on message queue with distrbance load (worst value)
10
Priority Inversion Test - PI
[see original file for graphs]
+---------------------------------+
| histogram |
| (priority inheritance: no) |
| PI OFF |
|f |
+-----------------------------+ |r Max 805[ms] |<-- getting worse
| histogram | |e | more than 50 times
| (priority inheritance: no) |=====>|q |
| non competing equiv. load | | data intervals |
| | +---------------------------------+
|f |
|r Max 16[ms] | +---------------------------------+
|e | | historgram |
|q |=====>| (priority inheritance: yes) |
| data intervals | | PI ON |
+-----------------------------+ |f |
|r Max 23[ms] |<-- no significant
|e | deterioration
|q | (effect proved)
| data intervals |
+---------------------------------+11
Mutex Implementation in NPTL
* Compared implementations of two processing patterns – fast path
- fast path
- When no lock contention arises (no conention)
– slow path
- slow path
When wait and wakeup required (wait & wakeup)
12
Mutex Implementation in NPTL (fast path)
"var" is a lock variable (mutex->__data.__lock)
var = (presumed value) value after change
+-------------+----------------------+-------------------------------------+
| fast path | user (glibc) | kernel |
+------+------+----------------------+-------------------------------------+
|normal|lock |var = (0) 1 *| |
|mutex +------+----------------------+-------------------------------------+
| |unlock|var = (1) 0 *| *: process complete in user space |
+------+------+----------------------+-------------------------------------+
|PI |lock |var = (0) TID *| |
|mutex +------+----------------------+-------------------------------------+
| |unlock|var = (TID) 0 *| |
+------+------+----------------------+-------------------------------------+
|PP |lock |sched_setscheduler |priority changed to ceiling priority |
|mutex | |var = (ceil) ceil | 1 | |
| +------+----------------------+-------------------------------------+
| |unlock|var = (var) ceil #| |
| | |sched_setscheduler |priority changed to base priority |
+------+------+----------------------+-------------------------------------+
#: whenever ceiling prio > thread prio,
priority change operation is inserted
13
Mutex Implementation in NPTL (slow path)
+-------------+----------------------+-------------------------------------+ | slow path | user (glibc) | kernel | +------+------+----------------------+-------------------------------------+ |normal| |var = (1) 2 *| | |mutex |lock |futex(FUTEX_WAIT) *| schedule | | | |var = (0) 2 *| | | +------+----------------------+-------------------------------------+ | |unlock|var = (*) 0 *| | | | |futex(FUTEX_WAKE) *| wake_up | +------+------+----------------------+-------------------------------------+ |PI | |futex(FUTEX_LOCK_RI) | var = (var) var | FUTEX_WAITERS | |mutex |lock | |#priority boosting | | | | |#rt_mutex lock | | +------+----------------------+-------------------------------------+ | | |futex(FUTEX_UNLOCK_RI)| var = (var) TID | FUTEX_WAITERS |(TID is waiting task) | |unlock| |#rt_mutex unlock | | | | |#priority unboosting | +------+------+----------------------+-------------------------------------+ |PP | |sched_setscheduler *| priority changed to ceiling priority| |mutex |lock |var=(ceil|1) ceil|2 *| | | | |futex(FUTEX_WAIT) *| schedule | | | |var=(ceil) ceil | 2 *| | | +------+----------------------+-------------------------------------+ | | |var = (var) ceil *| | | |unlock|futex(FUTEX_WAKE) *| wake_up | | | |sched_setscheduler *| priority changed to base priority | +------+------+----------------------+-------------------------------------+ *: same futex used for synchronization #: rt_mutex is in the backend, and here priority is inherited
14
Mutex Performance
[see original file for figure]
Call overhead Wait & wakeup
(fast path) (slow path)
+----------------------+ +----------------------------------------------+
| metric: time / N | | metric: time / (M + N) |
| +-----------------+ | | +------------------+ +-------------------+ |
| | process A | | | | process B | | process C | |
start| | | | start| | lock(1) | | lock(2) | |
----------------------------| -------------------------------------|--------------|
^ | | | | ^ | | | | |(go first) | |
| | |=================| | | | |==================| |======V============| |
| | | while(!stop) | | | | | while(!stop) | | while(!stop) | |
time | | lock(0) | | time | | lock((N+2)%3) | | lock(M%3) | |
-10s | | unlock(0) | | -10s | | unlock((N+1)%3<==> unlock((M+2)%3)| |
| | | N++ | | | | | N++ | | M++ | |
| | |=================| | | | |==================| |===================| |
V | | | | V | | | | | |
----------------------------| ----------------------------------------------------|
end | +-----------------+ | end| +------------------+ +-------------------+ |
+----------------------+ +----------------------------------------------+
15
Mutex Peformance
[see original file for graph (in English)]
16
Summary -- Priority (Inheritance/Protect) Mutex --
+--------------------+-----------------------------+-----------------------------+
| | Priority Inheritance Mutex | Priority Protect Mutex |
+--------------------+-----------------------------+-----------------------------+
| when no lock |comparing with usual mutex: | comparing with usual mutex: |
| contention | a little bit slower | considerably slower |
| (fast path) | approx. 1.5 times | approx. 5-15 times (*) |
| +-----------------------------+-----------------------------+
| | cause: | cause: |
| | - Obtaining TID in glibc | - Internal lock processing |
| | | - Changing priority |
+--------------------+-----------------------------+-----------------------------+
| when synch | comparing with usual mutex: | comparing with usual mutex: |
| operration arises | a little bit slower | almost same |
| (slow path) | approx. 1.6 times | approx. 1.2 times |
| +-----------------------------+-----------------------------+
| |cause: |cause: |
| | - PI processing in kernel | |
+--------------------+-----------------------------+-----------------------------+
| conclusion | As the frequency of lock conflict is unlikely to be high |
| |(comparing with non conflicting cases; poorly implemented),|
| | considering fast path more important, there is no reason |
| | to use PP |
+--------------------+-----------------------------------------------------------+
(*): Depending on the relation to ceiling priority17
Robust Mutex
18
Robust Mutex
+--------------+---------------------------------------------------+
| | Robust Mutex |
+--------------+---------------------------------------------------+
| Behavior | When a thread terminated while holding a lock, |
| | other threads can detect the state (owner dead) |
+--------------+---------------------------------------------------+
| Features | Reliable synchronization of the system combining |
| | processes and threads, can be constructed |
+--------------+---------------------------------------------------+
| Supported in | kernel 2.6.17 - |
| | glibc 2.5 - |
+--------------+---------------------------------------------------+
| Note | 1. If owner dead arises while more than one |
| | threads are waiting a lock, wait of the |
| | second and later threads can not be released |
| | "robust mutex does not work if owner died |
| | with multiple waiters" |
| | http://sources.redhat.com/bugzilla #4512 |
| | (fixed in glibc 2.5.1, 2.6.1) |
| | 2. Robust Mutex does not work under the condition |
| | where only main thread runs with static link |
| | and no nptl/init.o is not linked |
| | (For example, a program which does just mutex |
| lock/unlock in main function) |
| | 3. POSIX specification in progress |
+--------------+---------------------------------------------------+
19
Robust Mutex - process flow
[see original file for flow chart (in English)]
20
Inter-Process Synchronization with Robust Mutex
* Here, we focus on usage, not function, of Robust Mutex
- Usage: interprocess synchronization
* What I want to do:
- Evaluate interprocess synchronization based on Robus Mutex, comparing with conventional method (IPC)
- Mutual exclution
- Message communication
21
Inter-Process Mutual exclution with Robust Mutex
* Mutual exclusion
- Comparing:
- SysV IPC Semaphore (semX) with SEM_UNDO*
- *: Resource is released when process terminated
- Robust mutex
- Allocate synchronization variable by mmap'ing shared memory
- SysV IPC Semaphore (semX) with SEM_UNDO*
- Performance
- Call overhead
- No resource contention, single process, call overhead
Context switch (Wait & wakeup)
- With resource contention, switch between two processes
- Call overhead
22
Inter-Process Mutual exclution with Robust Mutex
[see original file for figure]
e.g.) mutex (lock: P, unlock: V in semaphore)
Call overhead Wait & Wakeup
+---------------------------------------------------------+
| Same evaluation method as the case of Mutex Performance |
| |
| mutex lock: semaphore P |
| mutex unlock: semaphore V |
+---------------------------------------------------------+
[see slide 14]
23
Inter-Process Mutual execution Performance
[see original file for graph (in English)]
24
Inter-Process Mutual execution Performance
[see original file for graph] For systems with low lock contention, Robust Mutex is more efficient. (When lock conflicts, IPC shows better performance) In many cases, as lock contention rate is low, Robust Mutex is a better choice.
25
Inter-Process Mutual execution with Robust Mutex
* Summary
+--------------------------+---------------------------+----------------------------+ | | Robust Mutex | SysV IPC Semaphore | +--------------------------+---------------------------+----------------------------+ | When no lock contends | 1.8[us] superior | 4.3[us] inferior | |(Fast Path) | slow because: | slow because: | | | | - system call is mandatory| +--------------------------+---------------------------+----------------------------+ | When synchoronization | 8.5[us] inferior | 6.6[us] superior | | arises | slow because: | slow because: | | (Slow Path) | - Queue management etc. | | | | (Processing in user space)| | +--------------------------+---------------------------+----------------------------+ | Other than performance | - Can recover synchronized| - Automatic release of | | | resource, and thus freer | synchronized resource is | | | - Priority inheritance can| possible (SEM_UNDO) | | | be used together | | +--------------------------+---------------------------+----------------------------+ | Conclusion | Robust Mutex is more useful, from a standpint of | | | performance and function. | +--------------------------+---------------------------+----------------------------+
26
Inter-Process Message Communication with Robust Mutex
* Message Communication
- Compared:
- Posix Message Queue* (mq_X)
*: For measurement, upper limit of total message buffer changed to about 2MB
- MQ_BYTES_MAX 2101248 @ include/linux/mqueue.h
- implemented as system call; N:N communication; message priority
- Message Queue with Robust Mutex + Sync Method
mu/cv ... Robust Mutex & Conditional variable
mu/sm ... Robust Mutex & Semaphore
- Shared memory mmap'ed to allocate synchronization variable
- Posix Message Queue* (mq_X)
- Performance
Context switch (wait & wakeup + msg copy)
- Bi-directional message transmission using two queues between two processes
27
Inter-Process Message Communication with Robust Mutex
* Message Queue with Robust Mutex + Sync Method
mu/cv: conditional variable method mu/sm: semaphore method +---------------------------------------------------------------------------------+ | mq_send(,char *msg, size_t msg_size,) | +----------------------------------------+----------------------------------------+ | pthread_mutex_lock(); | pthread_mutex_lock(); | | memcpy(shared memory, msg, msg_size); | memcpy(shared memory, msg, msg_size); | | count++; | count++; | | pthread_cond_broadcast(); | | | pthread_mutex_unlock(); | pthread_mutex_unlock(); | | | sem_post(); | +----------------------------------------+----------------------------------------+ | mq_receive(, char *msg, size_t msg_size,) | +----------------------------------------+----------------------------------------+ | pthread_mutex_lock(); | sem_wait(); | | while(!count) | pthread_mutex_lock(); | | pthread_cond_wait(); | | | memcpy(msg, shared memory, msg_size); | memcpy(msg, shared memory, msg_size); | | count--; | count--; | | pthread_mutex_unlock | pthread_mutex_unlock | +----------------------------------------+----------------------------------------+
28
Inter-Process Message Communication with Robust Mutex
[see original file for figure]
Message transfer throughput
Metric: msg_size * time / ( N + M )
+------------------+ +-------------------+
| process B | | process C |
| | |+ - - - - - - - - +|
start | | || send(0) ||
| | |+ - - - - - - - - +|
-------------------------------------|--------------
^ | | | |(go first) |
| |==================| |======V============|
| | while(!stop) | | while(!stop) |
time | lock((N+2)%3) | | lock(M%3) |
-10s | unlock((N+1)%3<==> unlock((M+2)%3)|
| | N++ | | M++ |
| |==================| |===================|
V | | | |
----------------------------------------------------
end +------------------+ +-------------------+
29
Inter-Process Message Communication
* Performance Results
[see original file for graph]
30
Inter-Process Message Communication with Robust Mutex
* Summary
+-------------+--------------------------------------+---------------------+ | | Rubust Mutex | Posix Message Queue | | +----------------------+---------------+ | | | conditional variable | semaphore | | | | mu/cv | mu/sm | | +-------------+----------------------+---------------+---------------------+ | Throughput | fair | good | excellent | +-------------+----------------------+---------------+---------------------+ | Other than | Specification customizable | Superior in security| | performance | |(e.g. access control)| +-------------+--------------------------------------+---------------------+ | Conclusion | - IPC shows the best results | | | - For communication with particular spec(1:1 communication)| | | implementation with thread library can be useful | | | - By implementing Robust Mutex + futex, the performance | | | level of Posix message queue may be achieved!? | +-------------+------------------------------------------------------------+
31
Summary
* Priority (inheritance/protect) Mutex
- Basic concept through implementation, performance explained
- PP may be unnecessary to use for performance
* Robust Mutex
- Evaluated for application to interprocess synchronization
- (In this evaluation environment)
- Robust Mutex for mutual exclusion
- IPC for message communication
32
TOSHIBA
Leading Innovation >>>
