OK. Today I’m taking a break from work. Yeah, it is Sunday, but I have been busy with work for my company over the weekend. So, I decided to give some time to myself.
So, I had a chance to think about Grand Central Dispatch.
Strangely, I have long had a big interest in multi-processing and parallel systems. So, when Apple introduced Grand Central Dispatch, aka GCD, I became interested in it.
There are many blog posts about what GCD is, and how to write code using GCD.
However, have you ever thought about how GCD distributes threads or processes across CPUs and CPU cores?
As you know, an OS without GCD can already distribute work to multiple CPUs and CPU cores. Then, how is GCD different?
We also have other choices for writing multi-threaded code, such as OpenMP.
How different are they?
Many thoughts flew through my brain. I know what GCD is. However, do I really know it?
Actually, without understanding the code that implements GCD in the compiler, framework and OS, I think it is not really possible to understand it fully. However, hardly anybody gets hold of such code just to understand a specific technology. So we, or at least I, depend on some blurry idea that explains the technology we are interested in.
So, here I would like to separate the different faces of GCD.
There are two aspects of GCD, I think.
(1) Methods that make multi-threaded code easy to write
(2) Transparent distribution of threads and processes across CPUs and CPU cores
Actually, I think GCD is about (1), not (2).
With GCD, you don’t need to worry nearly as much about synchronization, organizing your instance methods or class methods, and many other things. You don’t need to spend much time figuring out, “Will these lines of code run without any logical flaw under threads?”.
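To make (1) a bit more concrete, here is a minimal sketch in C of what handing a loop to GCD can look like. The names square_job, square_one and square_all are made up by me for illustration. dispatch_get_global_queue and dispatch_apply_f are libdispatch calls; I use the function-pointer variant instead of blocks so that it stays plain C. The #else branch is only my own fallback, so that the file still builds on a system without libdispatch.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __APPLE__
#include <dispatch/dispatch.h>   /* GCD (libdispatch) */
#endif

/* Hypothetical context passed to every iteration. */
struct square_job {
    const int *in;
    long *out;
};

/* Work for one index. Each call writes only out[i], so no lock is needed. */
static void square_one(void *ctx, size_t i)
{
    struct square_job *job = ctx;
    job->out[i] = (long)job->in[i] * job->in[i];
}

/* Squares n elements. With GCD, the system decides how many threads to use. */
void square_all(const int *in, long *out, size_t n)
{
    assert(in != NULL && out != NULL);
    struct square_job job = { in, out };
#ifdef __APPLE__
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    /* Runs square_one for i = 0..n-1 and returns when all calls are done. */
    dispatch_apply_f(n, queue, &job, square_one);
#else
    /* Assumed fallback without libdispatch: a plain serial loop. */
    for (size_t i = 0; i < n; i++) {
        square_one(&job, i);
    }
#endif
}
```

Notice that no thread is created or joined by hand. A call like square_all(in, out, n) returns only after every element has been processed.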
Then how is it different from OpenMP?
As you can see from my previous posts about OpenMP, it looks like functionality at the compiler or preprocessor level: you annotate the code with directives. I don’t recall it accurately, but when I first read about it, I think it was described as such. So, for code designed with a single thread in mind, you can set how many threads you want, which variables are to be shared among threads, and so on. However, it also supports choosing an optimal number of threads for the system it runs on. So, I assume that it also generates some code in the final object code to check those things. So, it is also dynamic.
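As a rough sketch of what I mean, assuming a compiler with OpenMP support (for example gcc with -fopenmp): the function names sum_array and default_thread_count are hypothetical. The directive states the thread count, the shared variable, and how the per-thread sums are combined. If OpenMP is not enabled, the directive is ignored and the loop simply runs on one thread.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>   /* only present when OpenMP is enabled at compile time */
#endif

/* Sums n integers, asking OpenMP for a specific number of threads. */
long sum_array(const int *data, long n, int requested_threads)
{
    assert(n >= 0 && requested_threads > 0);
    long total = 0;
    /* num_threads: how many threads we request.
       shared(data): all threads read the same array.
       reduction(+:total): each thread keeps a private sum, added up at the end. */
    #pragma omp parallel for num_threads(requested_threads) shared(data) reduction(+:total)
    for (long i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* The "dynamic" part: ask the runtime how many threads it would use by default. */
int default_thread_count(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;   /* no OpenMP runtime, so everything is single-threaded */
#endif
}
```

The loop body is still the single-threaded code. Only the directive above it is new, which is why it feels like a compiler-level feature to me.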
However, GCD is mostly dynamic. I’m not sure if it uses OpenMP internally in some situations. I have never read any document about the relationship between OpenMP and GCD. Probably GCD is not based on OpenMP at all. Then there must be a reason the Apple people invented a new mechanism, GCD, instead of using or enhancing the existing OpenMP.
OK. So, my discussion of (1) is more or less done.
Now, think about (2).
If there is already a decent solution for making multi-threaded code easier to write, GCD should also provide a better distribution mechanism for multi-thread/multi-processor environments. For the last generation of Pentium 4 and the current i5 and i7 cores (which, I believe, bring the Pentium 4 mechanism back to the current CPU family), Intel provides conceptual diagrams of how threads are handled and put into the processing pipeline. Based on those diagrams, I got the impression that the CPU somehow has the capability to load threads and processes onto each pipeline that flows into each CPU core. If that is true, how efficient was it?
When I attended WWDC 2005 or 2006, when the Intel Mac was first announced, I listened to an Apple person who explained the Intel Mac to developers in a room. He told us that there were problems in Intel CPUs at that time. Although Pentium 4 used multiple cores and hyperthreading, the Intel architecture was said to have the problem of a dirty pipeline, giving up most of the elements in the pipeline very often. If you learn about parallel systems, you will know that you sometimes need to throw away everything that is already queued in a pipeline. So, the efficiency of pipelining, or even super-pipelining, is diminished. I believe Intel finally fixed those problems and reintroduced those things in the current i5 and i7 cores.
Then, does GCD completely depend on the CPU’s capability of distributing units of work?
Probably GCD doesn’t do anything at this level. Probably it is just there to help developers write multi-threaded code more easily.
(How about multi-processing, then?)
Then, if GCD does nothing at this level, what is the fundamental difference in approach between GCD and OpenMP?
Is using GCD more efficient from a software engineering point of view? Is it more efficient than OpenMP? Flexibility will be better with GCD. Portability will be better with OpenMP.
GCD will be able to provide a much better way than OpenMP if it starts to provide “Grand Grand Central Dispatch”, which is described in one of my previous posts. However, I’m not sure if it is feasible, because GPUs and CPUs are very different monsters. To make it possible, they would need a compiler, or at least a preprocessor, that generates two different versions of the same code, one for the CPU and another for the GPU.
ADDED : On Apple’s page, this is mentioned :
GCD has a multicore execution engine that reads the queues created by each application and assigns work from the queues to the threads it is managing. GCD manages threads based on the number of cores available and the demands being made at any given point in time by the applications on the system. Because the system is responsible for managing the threads used to execute blocks, the same application code runs efficiently on single-processor machines, large multiprocessor servers, and everything in between.
So, it means that GCD also handles part (2). Then how will a system with GCD differ from a system without it?
ADDED : Here is an article about OpenMP and GCD : Cocoa for Scientists (XXXI): All Aboard Grand Central