Bon Voyage

CS8803 Lesson 8 Kernel & User Level Threads

Concept Notes / Operating Systems

nangkyeong 2019. 10. 1. 15:38

Introduction to Operating Systems by Georgia Tech
https://www.udacity.com/course/introduction-to-operating-systems--ud923

 


These are my notes on the Kernel / User Level Threads portion of Lesson 8, Thread Design Considerations.


 

 

 

1. Kernel vs. User Level Threads

  • The OS kernel itself is multi-threaded; it maintains
    • thread abstractions
    • scheduling, synchronization, ...
  • A user-level library provides
    • thread abstraction
    • scheduling, synchronization, ...
  • Mappings between user-level and kernel-level threads (tracked via the Process Control Block, PCB)
    • 1:1
    • M:1
    • M:M

 

 

2. Thread Data Structures - At Scale

The thread library keeps track of all ULTs (User Level Threads) that make up a single process.

  • a relationship exists with the PCB, which represents the address space
    — it keeps track of which KLT is executing on behalf of each process
    — and vice versa

KLTs (Kernel Level Threads) maintain a relationship with the CPUs

  • on a multi-CPU system, a per-CPU data structure exists
    — it keeps track of which KLTs are related to that CPU
    — for a CPU: a pointer to its current thread, and the threads that typically execute there...

Multiple KLTs can support a single user-level process.
When the kernel needs to schedule or context switch among KLTs that belong to different processes:

  • it determines that the KLTs point to different PCBs,
    so the KLTs have different virtual address mappings
  • then it can decide whether or not to context switch
  • in the process, it will save the entire PCB state of the first KLT, then context switch to the next KLT

 

 

3. Thread-related Data Structures

When multiple KLTs belong to the same address space, a portion of the PCB preserves the 'light process state' for the KLTs of a user-level process.

  • the PCB can be separated into 'hard' process state and 'light' process state

— the light process state includes the signal mask, system call arguments, ...

 

4. Rationale for multiple Data Structures

  1. Single PCB

    • one large, contiguous data structure

    • private for each entity: maintains every single thread's information (even though some of it is shared)

    • on a context switch, the whole PCB is saved and restored for the new process

    • since it's a single structure, any change means updating the whole thing

    • Downsides*
      limited scalability, overheads, performance, flexibility

  2. Multiple Data Structures

    • smaller data structures, maintained via pointers to smaller data elements

    • easier to share portions of the information

    • context switch saves and restores only the parts that are needed

    • a modification impacts only a subset of the data elements,
      so the user-level library updates only a portion of the state

    • Advantages*

      improved scalability, overheads, performance, flexibility

 

 

5. User-level Thread Data Structures

Based on "Implementing Lightweight Threads" by Stein & Shah

  • when a thread is created, a thread id (tid) is assigned: an index into a table of pointers to per-thread data structures
  • a thread data structure contains:
    execution context, registers, signal mask, stack pointer, thread-local storage (private storage for the thread function's variables, allocated at compile time by the compiler), stack
  • the size of the thread data structure is known at compile time, so the structures can be laid out contiguously: this achieves locality, makes scheduling easier, ...
  • however, the thread library doesn't control stack growth: it's possible for one thread to overwrite another thread's data structure, and the error is only detected when the other thread gets to run

Solution: add a separate region called a 'red zone' to each thread's stack.

The red zone is a portion of the virtual address space that is not allocated.
When a stack grows enough to write into the red zone, the OS causes a fault → debugging becomes much easier, since the fault is attributed to the thread that actually overflowed.

 

 

6. Kernel-level Thread Data Structures

  1. Process: maintains information about each process

    • list of the kernel-level threads executing within it
    • mappings of the virtual address space in which the process executes
    • user credentials: e.g. whether the user has the right to access a file
    • signal handlers: information on how to respond to certain events occurring in the OS
  2. Light-Weight Process (LWP): information about a subset of the process

    • information about one or more user-level threads running in the context of the process

    • their user-level registers and system call arguments

    • resource usage info for the corresponding kernel-level thread:
      at the OS level, the kernel tracks resource use on a per-kernel-thread basis

    • signal mask

    • similar to the ULT structure, but visible only to the kernel → the OS-level scheduler needs it when scheduling
      not needed when the process is not running, so it's swappable under memory pressure
      → allows the system to support a larger number of threads in a smaller memory footprint*

  3. Kernel-level Threads

    • kernel-level registers

    • stack pointer

    • scheduling info (class, ...)

    • pointers to the associated LWP, Process, and CPU structures

    • information about a kernel-level thread and its execution context, which is always needed:
      some OS services require this information even when the thread is not active and the process is not running (e.g. scheduling information deciding whether to activate the thread) ⇒ not swappable*

  4. CPU

    • the thread that's currently scheduled, and a list of other kernel-level threads that ran there

    • dispatching information:
      how to actually execute the procedure for dispatching a thread

    • interrupt handling information:
      how to respond to interrupts from the peripheral devices

      on SPARC, a dedicated register points to the current thread at any given point in time, so the current thread's register context is always directly at hand

 

 

7. Basic Thread Management Interactions

Interactions necessary for efficient management of threads.

Example with a multi-threaded process:

  1. Assume the process requests 2 kernel-level threads.
  2. When the process starts, the kernel gives it a default number of KLTs,
    so the process requests additional KLTs using set_concurrency.
  3. If an executing KLT blocks for some reason (e.g. an I/O operation), it is moved to the wait queue for that particular operation;
    the corresponding ULT then also has to wait for completion.
  4. The kernel sends a notification signal that a KLT is about to block; the process can then request more LWPs or KLTs via a system call.
  5. The user-level library then schedules the remaining runnable ULTs onto the associated LWPs or KLTs.

Neither the user-level library nor the kernel knows what is happening in the other.
→ they use system calls and signals to interact and coordinate

 

 

8. Thread Management Visibility and Design

1) Single CPU

  • Problem: lack of visibility between the kernel and the user-level thread management
    • at the user level, the library makes scheduling decisions that the kernel doesn't see
    • these decisions change the user-to-kernel-level thread mappings
  • the 1:1 model helps address some of these issues

There are multiple reasons why control should be passed to the UL library scheduler, namely when:

  • ULTs explicitly yield
  • a timer set by the UL library expires
  • ULTs call library functions like lock/unlock (synchronization)
  • blocked threads become runnable

2) Multiple CPUs

Suppose there are three threads at the user level with priority T3 > T2 > T1,
T3 wants a mutex that T2 has locked, and the other CPU is executing T1.

When T2 unlocks the mutex, T3 becomes runnable.

The kernel-level thread running on T2's CPU then signals the other CPU (the one running T1) to run the library code locally, so that the UL library there can reschedule threads according to thread priority.

 

9. Destroying Threads

  • instead of destroying threads (it takes time...), reuse them!

How are they reused?

When a thread is done and exits:

  1. mark the thread as being on 'death row'
  2. threads on death row are periodically destroyed (freed) by a reaper thread
  3. if a request comes in before a thread is actually destroyed,
    its thread structures/stacks are reused ⇒ performance gains!