Sunday, April 7, 2019

Overview of Multithreading

Multithreading is the ability to run multiple threads in a program at the same time. For the purposes of this discussion, a thread is the smallest unit of execution. A larger unit of execution to be aware of is the task, from which the word Multitasking stems; tasks are also termed processes. At the OS level we could call this multiprocessing, but that word is also used to refer to hardware with multiple processors. The often confused terms 'Parallel', 'Concurrent', and 'Synchronous' are all related to Multithreading and Multitasking.

Knotted Threads

Photo by Francesco Ungaro from Pexels
As a computer user, you can run more than one application at the same time, regardless of the computer you use. Be it Windows, Mac, or Ubuntu, you can work on a Word or OpenOffice text document alongside a spreadsheet application and several sessions of your preferred browser. This is possible because the Operating System is capable of running more than one task/process at the same time.
Any application is a collection of programs, and a process is nothing but the active execution of those programs. A process is an isolated area in the computer's memory; this area houses the program's instructions along with its data. A modern OS can execute more than one process, that is, deal with more than one isolated area in memory. If the microprocessor has only one core, the OS rapidly swaps processes in and out, giving each process a turn on the processor. The OS schedules the processes, deciding when and how large a share of processor time each one gets. This is called time-slicing.
The PRAM - Parallel Random Access Machine

Spawning a new process is costly: memory must be allocated, swaps tracked, and resources such as open files and network sockets managed. So instead of building big applications out of many collaborating processes, multithreading offers an easier way.
Threads run within a process, so the resources of the process are shared by its threads. From the operating system's point of view, the "cost" of multithreading is therefore considerably lower than that of multiple processes. Multithreading still incurs the cost of context switching between threads, and the OS has to keep track of each thread.
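To make the shared-memory point concrete, here is a minimal sketch, written in Java only because the post does not name a language (the class and key names are made up for illustration): two worker threads read the same map that lives on the process's heap, with no copying or inter-process communication involved.

import java.util.Map;

public class SharedMemoryDemo {
    // One map on the process heap; every thread of this process can see it.
    private static final Map<String, String> CONFIG =
            Map.of("host", "localhost", "port", "8080");

    public static void main(String[] args) throws InterruptedException {
        Runnable reader = () ->
                System.out.println(Thread.currentThread().getName()
                        + " reads host=" + CONFIG.get("host"));

        Thread t1 = new Thread(reader, "worker-1");
        Thread t2 = new Thread(reader, "worker-2");
        t1.start();
        t2.start();
        t1.join();   // wait for both workers before the process exits
        t2.join();
    }
}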
Concurrency, in the context of multithreading, is the ability of more than one thread to make progress at the same time. On a single-core or single-processor system, the OS scheduler swaps threads so fast that it gives the appearance of simultaneous execution even though the threads actually run one after another. On multi-core or multi-processor systems, threads really do run at the same time across the cores, which provides a genuine performance gain.
Parallelism is when a program splits a single activity across multiple threads and all of those threads execute simultaneously to finish the activity; this is called task-level parallelism. There is also processor-level parallelism, such as instruction-level parallelism, which may or may not be exposed through programming language libraries.
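As a rough illustration of task-level parallelism, again assuming Java, the sketch below splits the sum of one array into slices, one per available core, and each worker thread sums its own slice at the same time.

import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.LongStream;

public class ParallelSum {
    public static void main(String[] args) throws InterruptedException {
        long[] numbers = LongStream.rangeClosed(1, 10_000_000).toArray();
        int cores = Runtime.getRuntime().availableProcessors();
        AtomicLong total = new AtomicLong();

        Thread[] workers = new Thread[cores];
        int chunk = numbers.length / cores;
        for (int i = 0; i < cores; i++) {
            int start = i * chunk;
            int end = (i == cores - 1) ? numbers.length : start + chunk;
            workers[i] = new Thread(() -> {
                long partial = 0;                 // sum this thread's slice only
                for (int j = start; j < end; j++) {
                    partial += numbers[j];
                }
                total.addAndGet(partial);         // combine the slice's result safely
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();                             // wait for every slice to finish
        }
        System.out.println("Sum = " + total.get());
    }
}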
Synchronization comes into play when more than one thread shares the same data. Processes are isolated, so their memory does not overlap; this also makes communication between two processes difficult, adding to the "cost". Threads share the same memory, so data can easily be shared between them, but this brings the problem of inadvertent modification of data by different threads. Threads are synchronized when they know and comply with agreed rules before modifying shared data.
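A minimal Java sketch of those "rules": both threads must acquire the same lock (via a synchronized block) before touching the shared counter, so increments are never lost to interleaving. Without the block, the final value would usually come out lower than expected.

public class SynchronizedCounter {
    private static long counter = 0;                 // data shared by both threads
    private static final Object LOCK = new Object(); // the rule: hold this lock to modify counter

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                synchronized (LOCK) {                // only one thread may be in here at a time
                    counter++;
                }
            }
        };

        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();

        // Always 2,000,000 with synchronization; typically less without it.
        System.out.println("counter = " + counter);
    }
}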
Threads in a PRAM

Multithreading allows a program to achieve high performance and throughput, especially on modern hardware, as most machines are equipped with multi-core CPUs. In technical terms, every thread has its own execution stack, maintained by the OS, which is swapped along with local data when the context switches between threads. While each thread has its own execution stack, data is mostly shared amongst the threads of a single process. Working with thread frameworks is like a tightrope walk for programmers: the housekeeping required between multiple threads is the responsibility of the application programmer. The programming language of choice usually provides the threading libraries needed for such housekeeping when developing multithreaded programs.
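As one example of such library support (shown here in Java; other languages have equivalents), the java.util.concurrent package takes over much of this housekeeping: an ExecutorService owns a pool of threads, schedules submitted tasks onto them, and releases the threads when asked.

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadPoolDemo {
    public static void main(String[] args) throws Exception {
        // A fixed pool sized to the machine's cores; the library manages the threads.
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

        List<Callable<String>> tasks = List.of(
                () -> "task 1 ran on " + Thread.currentThread().getName(),
                () -> "task 2 ran on " + Thread.currentThread().getName(),
                () -> "task 3 ran on " + Thread.currentThread().getName());

        for (Future<String> result : pool.invokeAll(tasks)) {
            System.out.println(result.get());   // blocks until that task is done
        }
        pool.shutdown();                        // release the pool's threads
    }
}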

Tightrope walk

Photo by Greg Rosenke on Unsplash

So what are the key differences between Multitasking and Multithreading:

# | Multitasking | Multithreading
1 | Multiple applications run at the same time; each application is at least one task. | Multiple parts of a single program run simultaneously; every program has at least one thread, the "Main" thread.
2 | The OS switches between multiple processes, giving each application some time on the microprocessor. | The OS switches between multiple threads of the same process, and has to keep track of each thread's execution stack and execution context.
3 | The cost of multitasking is high because each process runs in its own address space (process isolation); inter-process communication and switching between address spaces are expensive in CPU and OS terms. | The cost of threading is comparatively lower because context switching happens within the address space of a single process and data is mostly shared between the threads.
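Row 1's point that every program starts with at least a "Main" thread is easy to observe; the short Java sketch below (class and thread names are just for illustration) prints which thread runs main() and then hands a second print to a new thread.

public class MainThreadDemo {
    public static void main(String[] args) {
        // The JVM starts this method on a thread named "main".
        System.out.println("Running on: " + Thread.currentThread().getName());

        // Any further threads are created from that main thread.
        new Thread(() ->
                System.out.println("And this runs on: " + Thread.currentThread().getName()),
                "worker").start();
    }
}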
