[original file is celf_nptl-1.pdf by Kumagai, Toshiba]
[translated by ikoma]


CELF Technical Jamboree #16
Aug. 31, 2007


Copyright 2007, Toshiba Corporation.

Recent new features in NPTL

NPTL: Native POSIX Thread Library

Hiroki Kumagai
Core Technology Center
Toshiba Corpoartion

Aug 31st, 2007


2

Introduction

* In Jamboree #10 last year, I presented basic performance on MIPS, comparing NPTL and linuxthreads

* Today I am presenting NPTL functions which have become available in the last year


3

What happend in a last year

+-------+------------------------------+--------------------------------+
|       | Linux kernel                 | glibc/NPTL                     |
+-------+------------------------------+--------------------------------+
|2006/6 | 2.6.17                       |                                |
|       |   Rubust Mutex               |                                |
|       |     (futex)                  |                                |
+-------+------------------------------+--------------------------------+
|2006/9 | 2.6.18                       | 2.5                            |
|       |   Priority Inheritance Mutex | +----------------------------+ |
|       |     (futex)                  | | Robust Mutex               | |
|       |                              | | Priority Inheritance Mutex |<--- Today's topics
|       |                              | | Priority Protect Mutex     | |
|       |                              | +----------------------------+ |
+-------+------------------------------+--------------------------------+
|2006/11| 2.6.19                       |                                |
+-------+------------------------------+--------------------------------+
|2007/2 | 2.6.20                       |                                |
+-------+------------------------------+--------------------------------+
|2007/4 | 2.6.21                       |                                |
+-------+------------------------------+--------------------------------+
|2007/5 |                              | 2.6                            |
+-------+------------------------------+--------------------------------+
|2007/7 | 2.6.22                       | 2.5.1 / 2.6.1                  |
|       |   fix "future priority based |   fix "robust mutex does       |
|       |  wakeup" (non-PI)            |  not work if owner died with   |
|       |                              |  multiple waiters" bugzilla    |
|       |                              |  #4512                         |
+-------+------------------------------+--------------------------------+


4

Contents

                               abbrev.:
                                 PI = Priority Inheritance
                                 PP = Priority Protect

* 1. Priority (Inheritance/Protect) Mutex

* 2. Robust Mutex


5

Priority (Inheritance / Protect) Mutex


6

Priority (Inheritance / Protect) Mutex

+-----------+---------------------------------+---------------------------------+
|           | Priority Inheritance Mutex      | Priority Protect Mutex          |
+-----------+---------------------------------+---------------------------------+
| behavior  | Thread with a lock is executed  | Thread with a lock is executed  |
|           | with the priority of the thread*| with the specified ceiling      |
|           | waiting for the lock            | priority*                       |
|           | (* if the waiting thread has    | (* When locks nested, the       |
|           | higher priority)                | highest priority)               |
+-----------+---------------------------------+---------------------------------+
| features  | - Can bound priority inversion  | - Can bound priority inversion  |
|           | - No need to change priority    | - Fixed cost necessary for      |
|           |   unless lock contention arises |   changing priority.            |
|           |                                 | - Difficult to determine the    |
|           |                                 |   value of ceiling priority     |
+-----------+---------------------------------+---------------------------------+
| supported | kernel 2.6.18 or later          | glibc 2.5 or later              |
| in        | glibc 2.5 or later              |                                 |
+-----------+---------------------------------+---------------------------------+
| note      | 1. Wake up order does not       | 1. Not compatible with robust   |
|           | correspond to the priority order| mutex in NPTL                   |
|           | for non-PI Mutex (non priority  |                                 |
|           | inheritance mutex)              | 2. Can not be used in threads   |
|           |  "futex priority based wakeup"  | of SCHED_OTHER class            |
|           |  (fixed in kernel 2.6.22 or     |                                 |
|           |   later)                        |                                 |
|           |   Implementation of priority    |                                 |
|           |   inheritance (plist support)   |                                 |
|           |   was applied also to ordinary  |                                 |
|           |   futex and fixed               |                                 |
+-----------+---------------------------------+---------------------------------+


7

Priority Inversion (PI Mutex)

[see original file for figure]


8

Priority Inversion (PP Mutex)

[see original file for figure]


9

Priority Inversion Test - PI

* Tested the effect for priority inversion by giving disturbance load to real system


10

Priority Inversion Test - PI

[see original file for graphs]
                                     +---------------------------------+
                                     |             histogram           |
                                     |  (priority inheritance: no)     |
                                     |                          PI OFF |
                                     |f                                |
+-----------------------------+      |r           Max 805[ms]          |<-- getting worse
|          histogram          |      |e                                |  more than 50 times
|  (priority inheritance: no) |=====>|q                                |
|  non competing equiv. load  |      |          data intervals         |
|                             |      +---------------------------------+
|f                            |
|r         Max 16[ms]         |      +---------------------------------+
|e                            |      |             historgram          |
|q                            |=====>|  (priority inheritance: yes)    |
|       data intervals        |      |                          PI ON  |
+-----------------------------+      |f                                |
                                     |r           Max 23[ms]           |<-- no significant
                                     |e                                |    deterioration
                                     |q                                |   (effect proved)
                                     |          data intervals         |
                                     +---------------------------------+


11

Mutex Implementation in NPTL

* Compared implementations of two processing patterns – fast path

– slow path


12

Mutex Implementation in NPTL (fast path)

"var" is a lock variable (mutex->__data.__lock)

                                       var = (presumed value) value after change
+-------------+----------------------+-------------------------------------+
| fast path   | user (glibc)         | kernel                              |
+------+------+----------------------+-------------------------------------+
|normal|lock  |var = (0) 1          *|                                     |
|mutex +------+----------------------+-------------------------------------+
|      |unlock|var = (1) 0          *|  *: process complete in user space  |
+------+------+----------------------+-------------------------------------+
|PI    |lock  |var = (0) TID        *|                                     |
|mutex +------+----------------------+-------------------------------------+
|      |unlock|var = (TID) 0        *|                                     |
+------+------+----------------------+-------------------------------------+
|PP    |lock  |sched_setscheduler    |priority changed to ceiling priority |
|mutex |      |var = (ceil) ceil | 1 |                                     |
|      +------+----------------------+-------------------------------------+
|      |unlock|var = (var) ceil     #|                                     |
|      |      |sched_setscheduler    |priority changed to base priority    |
+------+------+----------------------+-------------------------------------+
                               #: whenever ceiling prio > thread prio,
                                  priority change operation is inserted


13

Mutex Implementation in NPTL (slow path)

+-------------+----------------------+-------------------------------------+
| slow path   | user (glibc)         | kernel                              |
+------+------+----------------------+-------------------------------------+
|normal|      |var = (1) 2          *|                                     |
|mutex |lock  |futex(FUTEX_WAIT)    *| schedule                            |
|      |      |var = (0) 2          *|                                     |
|      +------+----------------------+-------------------------------------+
|      |unlock|var = (*) 0          *|                                     |
|      |      |futex(FUTEX_WAKE)    *| wake_up                             |
+------+------+----------------------+-------------------------------------+
|PI    |      |futex(FUTEX_LOCK_RI)  | var = (var) var | FUTEX_WAITERS     |
|mutex |lock  |                      |#priority boosting                   |
|      |      |                      |#rt_mutex lock                       |
|      +------+----------------------+-------------------------------------+
|      |      |futex(FUTEX_UNLOCK_RI)| var = (var) TID | FUTEX_WAITERS     |(TID is waiting task)
|      |unlock|                      |#rt_mutex unlock                     |
|      |      |                      |#priority unboosting                 |
+------+------+----------------------+-------------------------------------+
|PP    |      |sched_setscheduler   *| priority changed to ceiling priority|
|mutex |lock  |var=(ceil|1) ceil|2  *|                                     |
|      |      |futex(FUTEX_WAIT)    *| schedule                            |
|      |      |var=(ceil) ceil | 2  *|                                     |
|      +------+----------------------+-------------------------------------+
|      |      |var = (var) ceil     *|                                     |
|      |unlock|futex(FUTEX_WAKE)    *| wake_up                             |
|      |      |sched_setscheduler   *| priority changed to base priority   |
+------+------+----------------------+-------------------------------------+

 *: same futex used for synchronization
 #: rt_mutex is in the backend, and here priority is inherited


14

Mutex Performance

[see original file for figure]


    Call overhead             Wait & wakeup
    (fast path)               (slow path)

     +----------------------+       +----------------------------------------------+
     |  metric: time / N    |       |  metric: time / (M + N)                      |
     |  +-----------------+ |       |  +------------------+  +-------------------+ |
     |  | process A       | |       |  | process B        |  | process C         | |
start|  |                 | |  start|  |  lock(1)         |  |  lock(2)          | |
----------------------------|  -------------------------------------|--------------|
  ^  |  |                 | |    ^  |  |                  |  |      |(go first)  | |
  |  |  |=================| |    |  |  |==================|  |======V============| |
  |  |  | while(!stop)    | |    |  |  | while(!stop)     |  | while(!stop)      | |
time |  |    lock(0)      | |  time |  |    lock((N+2)%3) |  |    lock(M%3)      | |
-10s |  |    unlock(0)    | |  -10s |  |    unlock((N+1)%3<==>    unlock((M+2)%3)| |
  |  |  |    N++          | |    |  |  |    N++           |  |    M++            | |
  |  |  |=================| |    |  |  |==================|  |===================| |
  V  |  |                 | |    V  |  |                  |  |                   | |
----------------------------|  ----------------------------------------------------|
end  |  +-----------------+ |    end|  +------------------+  +-------------------+ |
     +----------------------+       +----------------------------------------------+


15

Mutex Peformance

[see original file for graph (in English)]


16

Summary -- Priority (Inheritance/Protect) Mutex --

+--------------------+-----------------------------+-----------------------------+
|                    | Priority Inheritance Mutex  | Priority Protect Mutex      |
+--------------------+-----------------------------+-----------------------------+
| when no lock       |comparing with usual mutex:  | comparing with usual mutex: |
| contention         | a little bit slower         | considerably slower         |
| (fast path)        | approx. 1.5 times           | approx. 5-15 times (*)      |
|                    +-----------------------------+-----------------------------+
|                    | cause:                      | cause:                      |
|                    | - Obtaining TID in glibc    | - Internal lock processing  |
|                    |                             | - Changing priority         |
+--------------------+-----------------------------+-----------------------------+
| when synch         | comparing with usual mutex: | comparing with usual mutex: |
| operration arises  | a little bit slower         | almost same                 |
| (slow path)        | approx. 1.6 times           | approx. 1.2 times           |
|                    +-----------------------------+-----------------------------+
|                    |cause:                       |cause:                       |
|                    | - PI processing in kernel   |                             |
+--------------------+-----------------------------+-----------------------------+
| conclusion         | As the frequency of lock conflict is unlikely to be high  |
|                    |(comparing with non conflicting cases; poorly implemented),|
|                    | considering fast path more important, there is no reason  |
|                    | to use PP                                                 |
+--------------------+-----------------------------------------------------------+

                                (*): Depending on the relation to ceiling priority


17

Robust Mutex


18

Robust Mutex

+--------------+---------------------------------------------------+
|              | Robust Mutex                                      |
+--------------+---------------------------------------------------+
| Behavior     | When a thread terminated while holding a lock,    |
|              | other threads can detect the state (owner dead)   |
+--------------+---------------------------------------------------+
| Features     | Reliable synchronization of the system combining  |
|              | processes and threads, can be constructed         |
+--------------+---------------------------------------------------+
| Supported in | kernel 2.6.17 -                                   |
|              | glibc 2.5 -                                       |
+--------------+---------------------------------------------------+
| Note         | 1. If owner dead arises while more than one       |
|              |    threads are waiting a lock, wait of the        |
|              |    second and later threads can not be released   |
|              |    "robust mutex does not work if owner died      |
|              |    with multiple waiters"                         |
|              | http://sources.redhat.com/bugzilla #4512          |
|              |                (fixed in glibc 2.5.1, 2.6.1)      |
|              | 2. Robust Mutex does not work under the condition |
|              |    where only main thread runs with static link   |
|              |    and no nptl/init.o is not linked               |
|              |    (For example, a program which does just mutex  |
               |    lock/unlock in main function)                  |
|              | 3. POSIX specification in progress                |
+--------------+---------------------------------------------------+


19

Robust Mutex - process flow

[see original file for flow chart (in English)]


20

Inter-Process Synchronization with Robust Mutex

* Here, we focus on usage, not function, of Robust Mutex

* What I want to do:


21

Inter-Process Mutual exclution with Robust Mutex

* Mutual exclusion


22

Inter-Process Mutual exclution with Robust Mutex

[see original file for figure]


e.g.) mutex (lock: P, unlock: V in semaphore)

        Call overhead                Wait & Wakeup

   +---------------------------------------------------------+
   | Same evaluation method as the case of Mutex Performance |
   |                                                         |
   |     mutex lock:    semaphore P                          |
   |     mutex unlock:  semaphore V                          |
   +---------------------------------------------------------+

                                        [see slide 14]


23

Inter-Process Mutual execution Performance

[see original file for graph (in English)]


24

Inter-Process Mutual execution Performance

[see original file for graph]


For systems with low lock contention,
Robust Mutex is more efficient.
(When lock conflicts, IPC shows better performance)

In many cases, as lock contention rate is low,
Robust Mutex is a better choice.


25

Inter-Process Mutual execution with Robust Mutex

* Summary

+--------------------------+---------------------------+----------------------------+
|                          | Robust Mutex              | SysV IPC Semaphore         |
+--------------------------+---------------------------+----------------------------+
| When no lock contends    | 1.8[us]       superior    | 4.3[us]      inferior      |
|(Fast Path)               | slow because:             | slow because:              |
|                          |                           |  - system call is mandatory|
+--------------------------+---------------------------+----------------------------+
| When synchoronization    | 8.5[us]       inferior    | 6.6[us]      superior      |
| arises                   | slow because:             | slow because:              |
| (Slow Path)              |  - Queue management etc.  |                            |
|                          | (Processing in user space)|                            |
+--------------------------+---------------------------+----------------------------+
| Other than performance   | - Can recover synchronized| - Automatic release of     |
|                          | resource, and thus freer  | synchronized resource is   |
|                          | - Priority inheritance can| possible (SEM_UNDO)        |
|                          | be used together          |                            |
+--------------------------+---------------------------+----------------------------+
| Conclusion               | Robust Mutex is more useful, from a standpint of       |
|                          | performance and function.                              |
+--------------------------+---------------------------+----------------------------+


26

Inter-Process Message Communication with Robust Mutex

* Message Communication


27

Inter-Process Message Communication with Robust Mutex

* Message Queue with Robust Mutex + Sync Method

mu/cv: conditional variable method        mu/sm: semaphore method

+---------------------------------------------------------------------------------+
| mq_send(,char *msg, size_t msg_size,)                                           |
+----------------------------------------+----------------------------------------+
| pthread_mutex_lock();                  | pthread_mutex_lock();                  |
|  memcpy(shared memory, msg, msg_size); |  memcpy(shared memory, msg, msg_size); |
|  count++;                              |  count++;                              |
|  pthread_cond_broadcast();             |                                        |
| pthread_mutex_unlock();                | pthread_mutex_unlock();                |
|                                        | sem_post();                            |
+----------------------------------------+----------------------------------------+
| mq_receive(, char *msg, size_t msg_size,)                                       |
+----------------------------------------+----------------------------------------+
| pthread_mutex_lock();                  | sem_wait();                            |
|  while(!count)                         | pthread_mutex_lock();                  |
|   pthread_cond_wait();                 |                                        |
|  memcpy(msg, shared memory, msg_size); |  memcpy(msg, shared memory, msg_size); |
|  count--;                              |  count--;                              |
| pthread_mutex_unlock                   | pthread_mutex_unlock                   |
+----------------------------------------+----------------------------------------+


28

Inter-Process Message Communication with Robust Mutex

[see original file for figure]


Message transfer throughput

  Metric:  msg_size * time / ( N + M )

         +------------------+  +-------------------+
         | process B        |  | process C         |
         |                  |  |+ - - - - - - - - +|
 start   |                  |  || send(0)         ||
         |                  |  |+ - - - - - - - - +|
 -------------------------------------|--------------
   ^     |                  |  |      |(go first)  |
   |     |==================|  |======V============|
   |     | while(!stop)     |  | while(!stop)      |
 time    |    lock((N+2)%3) |  |    lock(M%3)      |
 -10s    |    unlock((N+1)%3<==>    unlock((M+2)%3)|
   |     |    N++           |  |    M++            |
   |     |==================|  |===================|
   V     |                  |  |                   |
 ----------------------------------------------------
   end   +------------------+  +-------------------+


29

Inter-Process Message Communication

* Performance Results

[see original file for graph]


30

Inter-Process Message Communication with Robust Mutex

* Summary

+-------------+--------------------------------------+---------------------+
|             | Rubust Mutex                         | Posix Message Queue |
|             +----------------------+---------------+                     |
|             | conditional variable | semaphore     |                     |
|             | mu/cv                | mu/sm         |                     |
+-------------+----------------------+---------------+---------------------+
| Throughput  | fair                 | good          | excellent           |
+-------------+----------------------+---------------+---------------------+
| Other than  | Specification customizable           | Superior in security|
| performance |                                      |(e.g. access control)|
+-------------+--------------------------------------+---------------------+
| Conclusion  | - IPC shows the best results                               |
|             | - For communication with particular spec(1:1 communication)|
|             |   implementation with thread library can be useful         |
|             | - By implementing Robust Mutex + futex, the performance    |
|             |   level of Posix message queue may be achieved!?           |
+-------------+------------------------------------------------------------+


31

Summary

* Priority (inheritance/protect) Mutex

* Robust Mutex


32

TOSHIBA
Leading Innovation >>>


Jamboree16Nptl (最終更新日時 2008-07-24 12:57:08 更新者 207)