Difference in Concurrency Model in MacOS X and MS Windows (4)

12 05 2008

I have not posted for about a week because I have been helping with some work for Good Hands for Globe (지구촌 공생회). The main purpose of this blog is to organize my thoughts, to let me return to the state of mind I was in when I wrote a post, and to context-switch quickly back to its subject, so I could simply post whenever I feel like it. However, since I made the blog public to several search engines, more people visit than I expected, and I have come to feel a certain “sense of duty”; posting is starting to feel like “homework”.
I also plan to write about OpenMP and about implementing an aligned malloc, but first I would like to finish this series on synchronization.

  • pthread (POSIX threads)

Before talking about synchronization on Mac OS X, pthread should be mentioned briefly. pthread is the de facto standard threading API in Unix environments. Moreover, Mac OS X, which is derived from BSD, uses pthread at its lower levels, so the synchronization model of Objective-C and Cocoa cannot be discussed without it.

Good pthread tutorials can be found on the following two web sites:

- POSIX Threads Programming

- Linux Tutorial : POSIX Threads Libraries

Let’s take a quick look at an example from the sites above.


#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *functionC(void *arg);
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
int  counter = 0;

int main(void)
{
   int rc1, rc2;
   pthread_t thread1, thread2;

   /* Create independent threads each of which will execute functionC */

   if( (rc1=pthread_create( &thread1, NULL, functionC, NULL)) )
   {
      printf("Thread creation failed: %d\n", rc1);
      exit(EXIT_FAILURE);
   }

   if( (rc2=pthread_create( &thread2, NULL, functionC, NULL)) )
   {
      printf("Thread creation failed: %d\n", rc2);
      exit(EXIT_FAILURE);
   }

   /* Wait till threads are complete before main continues. Unless we  */
   /* wait we run the risk of executing an exit which will terminate   */
   /* the process and all threads before the threads have completed.   */

   pthread_join( thread1, NULL);
   pthread_join( thread2, NULL);

   exit(0);
}

/* Thread entry point: pthread_create() requires the void *(*)(void *) signature. */
void *functionC(void *arg)
{
   (void)arg;   /* unused */

   pthread_mutex_lock( &mutex1 );
   counter++;
   printf("Counter value: %d\n",counter);
   pthread_mutex_unlock( &mutex1 );

   return NULL;
}

functionC() shows briefly how a mutex is used in a threaded function. The pthread model is so simple that this one example is enough to understand most of how its functions and data are organized. The function locks the mutex variable mutex1, does its work, and then unlocks it with pthread_mutex_unlock(). It does not use a separate wait function like WaitForSingleObject() in Win32. As mentioned in the previous post, this is much closer to the MFC model. Probably MFC was designed so that developers familiar with this industry-standard threading model could approach it easily.

Then, what about the critical section? As the two web pages above show, pthread has no dedicated data type like “CRITICAL_SECTION”. Originally, a critical section is simply a code block whose execution is regulated by lock and unlock calls on a synchronization variable such as a mutex, so no special type is needed. Conceptually, I find it hard to understand why MS created a separate CRITICAL_SECTION type. According to MSDN, a CRITICAL_SECTION can be used only by the threads of a single process and is faster than a mutex object in that case, so I guess MS introduced it as an optimization.

It is interesting that pthread, a basic API comparable to Win32, offers the same level of abstraction as MFC, which is a framework on top of the Win32 API.

  • Synchronization in Objective-C

Synchronization support is built into the Objective-C language itself in the form of the @synchronized() directive. To use it with GCC 3.3 and later, the -fobjc-exceptions option must be passed to gcc.

Let’s take a look at some examples.


- (void)criticalMethod
{
    @synchronized(self) {

        // Critical code.
        ...

    }
}

Or, the current selector, i.e. the method, can be used as the mutex.


- (void)criticalMethod
{
    @synchronized(NSStringFromSelector(_cmd)) {

        // Critical code.
        ...

    }
}

Isn’t it very simple? This is even simpler than the CRITICAL_SECTION of Win32.

  • Synchronization in Cocoa

Let’s take a look at a basic example that uses a lock.


BOOL moreToDo = YES;

NSLock *theLock = [[NSLock alloc] init];
…

while (moreToDo) {
    /* Do another increment of calculation */

    /* until there’s no more to do. */

    if ([theLock tryLock]) {

        /* Update display used by all threads. */

        [theLock unlock];
    }
}

tryLock is the non-blocking counterpart of lock: it acquires the lock if it is free and returns NO immediately otherwise, much like pthread_mutex_trylock(). As this example shows, NSLock closely resembles its pthread equivalent. Cocoa sits on top of BSD functions, so it naturally follows the programming model that those functions and pthread present.
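The same non-blocking pattern can be written directly against pthread. The following is a minimal sketch, not code from Apple or from the tutorials; the names display_lock, display_updates and try_update_display are made up for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state, named for this sketch only. */
pthread_mutex_t display_lock = PTHREAD_MUTEX_INITIALIZER;
int             display_updates = 0;

/* Try to update the shared display once without ever blocking.
 * Returns true if the lock was free and the update was performed. */
bool try_update_display(void)
{
    /* pthread_mutex_trylock() returns 0 on success and a non-zero
     * error code (EBUSY) when the mutex is already held. */
    if (pthread_mutex_trylock(&display_lock) != 0)
        return false;

    display_updates++;                      /* protected work */
    pthread_mutex_unlock(&display_lock);
    return true;
}
```

A worker loop would call try_update_display() on each iteration and simply carry on with its calculation when it returns false, which mirrors the [theLock tryLock] loop above.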

In addition, Cocoa provides various lock types. They are not fundamentally different locks; they are categorized by how the basic lock is used. Apple’s documentation lists Mutex, Recursive Lock, Read-Write Lock, Distributed Lock, and Spin Lock, plus a condition lock built on the NSCondition class.
Strictly speaking, a mutex by itself is not a lock. The lock classes acquire a lock on a semaphore value such as a mutex. By listing the mutex among the lock types, the documentation confuses its readers.
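To illustrate that these categories are variations on the basic lock, here is a hedged sketch of a recursive lock built from an ordinary pthread mutex with a different attribute. The names recursive_lock, nested_calls, recursive_lock_init and nested_work are assumptions of this sketch, and it says nothing about how Cocoa implements NSRecursiveLock internally.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes PTHREAD_MUTEX_RECURSIVE */
#include <pthread.h>

pthread_mutex_t recursive_lock;   /* hypothetical name for this sketch */
int             nested_calls = 0;

/* Initialise recursive_lock as a recursive mutex; returns 0 on success. */
int recursive_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&recursive_lock, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Re-acquires the lock once per level from the same thread.
 * With a plain (non-recursive) mutex this would not be safe. */
void nested_work(int levels)
{
    pthread_mutex_lock(&recursive_lock);
    nested_calls++;
    if (levels > 1)
        nested_work(levels - 1);
    pthread_mutex_unlock(&recursive_lock);
}
```

The only difference from the plain mutex used earlier is the attribute passed to pthread_mutex_init(); the lock and unlock calls are unchanged.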

For usage examples, please refer to the following documentation.

http://developer.apple.com/documentation/Cocoa/Conceptual/Multithreading/ThreadSafety/chapter_5_section_7.html#//apple_ref/doc/uid/10000057i-CH8-SW17

pthread also has condition variables, and the various lock types that Cocoa provides can be implemented on top of the basic pthread lock. So, overall, the synchronization model on the Mac is the same as that of pthread.
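As a rough illustration of the pthread condition mechanism, the sketch below builds a one-shot “gate” from a mutex and a condition variable. gate_t, gate_init, gate_wait, gate_open and gate_demo are hypothetical names invented here; this is one possible arrangement, not the way NSCondition is implemented.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical one-shot gate: waiters block until someone opens it. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            ready;
} gate_t;

void gate_init(gate_t *g)
{
    pthread_mutex_init(&g->mutex, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->ready = false;
}

void gate_wait(gate_t *g)
{
    pthread_mutex_lock(&g->mutex);
    while (!g->ready)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&g->cond, &g->mutex);
    pthread_mutex_unlock(&g->mutex);
}

void gate_open(gate_t *g)
{
    pthread_mutex_lock(&g->mutex);
    g->ready = true;
    pthread_cond_broadcast(&g->cond);       /* wake every waiter */
    pthread_mutex_unlock(&g->mutex);
}

/* Thread body used by the demo: opens the gate it is given. */
static void *opener(void *arg)
{
    gate_open((gate_t *)arg);
    return NULL;
}

/* Returns true once the calling thread has been released by the opener. */
bool gate_demo(void)
{
    gate_t    g;
    pthread_t t;

    gate_init(&g);
    if (pthread_create(&t, NULL, opener, &g) != 0)
        return false;

    gate_wait(&g);
    pthread_join(t, NULL);

    bool result = g.ready;
    pthread_cond_destroy(&g.cond);
    pthread_mutex_destroy(&g.mutex);
    return result;
}
```

The predicate (ready) is always checked under the mutex, so the demo works whether the opener thread runs before or after the waiter reaches gate_wait().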

In fact, you can simply use pthread directly instead.

What is interesting here is the Spin Lock. It is the simplest kind of lock: it keeps polling to see whether it can acquire the lock on the mutex, so the caller does not proceed until it succeeds. This is similar to calling WaitForSingleObject() with INFINITE as its second (timeout) parameter, except that WaitForSingleObject() puts the thread to sleep instead of polling.
NSLock offers more flexibility through tryLock: if the lock cannot be acquired, the thread is not blocked and can skip the protected block right away. With WaitForSingleObject(), passing 0 as the second parameter has the same effect.
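The polling behaviour of a spin lock can be sketched in a few lines with C11 atomics. This is a generic textbook-style spin lock, not the one Apple ships; spin, shared_total, spin_lock, spin_unlock and run_adders are names assumed for the example.

```c
#include <pthread.h>
#include <stdatomic.h>

#define ADDER_THREADS 4
#define ADDS_PER_THREAD 10000

atomic_flag spin = ATOMIC_FLAG_INIT;   /* clear = unlocked, set = locked */
long        shared_total = 0;

void spin_lock(void)
{
    /* Keep polling until the flag was previously clear, i.e. we set it. */
    while (atomic_flag_test_and_set_explicit(&spin, memory_order_acquire))
        ;   /* busy-wait */
}

void spin_unlock(void)
{
    atomic_flag_clear_explicit(&spin, memory_order_release);
}

/* Thread body: increments the shared total under the spin lock. */
static void *adder(void *arg)
{
    (void)arg;
    for (int i = 0; i < ADDS_PER_THREAD; i++) {
        spin_lock();
        shared_total++;
        spin_unlock();
    }
    return NULL;
}

/* Runs the adder threads and returns the final total. */
long run_adders(void)
{
    pthread_t threads[ADDER_THREADS];

    for (int i = 0; i < ADDER_THREADS; i++)
        pthread_create(&threads[i], NULL, adder, NULL);
    for (int i = 0; i < ADDER_THREADS; i++)
        pthread_join(threads[i], NULL);

    return shared_total;
}
```

Because a waiting thread burns CPU while it polls, a spin lock only makes sense when the protected section is very short, as it is here.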

As you can see, Cocoa and Unix perform synchronization with only two concepts, the mutex (or semaphore) and the lock, whereas Win32 uses a separate function such as WaitForSingleObject(). That function is really the equivalent of lock(), so I don’t understand why MS gave it a new name instead of the usual lock(), although MFC does resolve the issue.

(I would like to mention that the C++/C#/CLI case is similar to Cocoa.)

So far, we have looked at the synchronization models on each platform. Laid side by side like this, they all look very similar. However, people who write code for these platforms for the first time, comparing the MSDN, Apple, and pthread documentation, can get confused. I hope these posts help them.




Objective-C without Cocoa

2 05 2008

Some people think that writing code without Cocoa is impossible, or at least meaningless. Although many articles on Objective-C mention that it is a superset of C, many people don’t really understand what that means.

Some even ask, “Is it possible to use functions from libraries written in C with Objective-C code?”.

Quite a few people are curious about why Objective-C didn’t become popular, and some answer that it is because of the fall of NeXT. Their reasoning is that NeXT failed to make its system popular. Is it really so?

I don’t think so. Just like the standard C library or the C++ library, anyone interested in an Objective-C standard library could have built their own.

However, that didn’t happen. You can use the standard C library in Objective-C code, so the original authors of Objective-C probably thought it would be OK not to have a “standard” library for Objective-C.

Anyway, the truth is that you can write code with Objective-C and the standard C library. You can write your own root class analogous to NSObject, or you can inherit from Object. (Actually, Object was the root class before NSObject took that role.)

gcc provides some header files to help with coding in Objective-C:

  • encoding.h
  • hash.h
  • NXConstStr.h
  • objc.h
  • objc-api.h
  • objc-list.h
  • Object.h
  • Protocol.h
  • sarray.h
  • thr.h
  • typedstream.h

Among them, application programmers will be most interested in Object.h, Protocol.h, thr.h, hash.h, and NXConstStr.h. Object.h declares a class, Object, which plays the role of NSObject; it responds to messages such as init, initialize, free, and alloc. thr.h is for threading.

So, with those header files, you can write code in Objective-C.

Objective-C is really a small addition to C, yet it has the genuine power of object-oriented programming.

It doesn’t support operator overloading or metaprogramming, but it does support dynamism, message forwarding, remote messaging, and flexible extension of classes.

As for reusability, I personally think Objective-C is better than C++.

Now, here is a simple Objective-C code without Cocoa or Foundation.

To compile it, issue :

    gcc -g -O -c MyClass.m main.m

    gcc -fgnu-runtime -fobjc-exceptions MyClass.o main.o -lobjc

File : MyClass.h


#import <objc/Object.h>
#import <objc/objc-api.h>
#import <objc/NXConstStr.h>
#import <objc/thr.h>

int gWait;

@interface MyClass : Object
{
    objc_mutex_t pMutex;

    NXConstantString *description;
    int value;
}

- (void) setDescription:(NXConstantString *)theText;
- (void) description;
- (void) doSomethingAtomically;
- (void) doSomethingAtomically1;
- free;
@end

File : MyClass.m


#import "MyClass.h"
#import <stdio.h>
#include <windows.h>

extern int gWait;

@implementation MyClass

- (void)setDescription:(NXConstantString *)theText
{
    description = [theText copy];
    value = 0;
}

- (void)description
{
    printf("%s", [description cString]);
}

- (void) doSomethingAtomically
{
    int i;

    printf("in Thread function\n" );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("1st : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    objc_mutex_unlock( pMutex );

    Sleep( 3 );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("2nd : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    objc_mutex_unlock( pMutex );

    Sleep( 3 );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("3rd : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    objc_mutex_unlock( pMutex );

    gWait++;

}

- (void) doSomethingAtomically1
{
    int i;

    printf("\t in Thread function 2\n" );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("\t 2-1st : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    objc_mutex_unlock( pMutex );

    Sleep( 3 );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("\t 2-2nd : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    objc_mutex_unlock( pMutex );

    Sleep( 3 );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("\t 2-3rd : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    objc_mutex_unlock( pMutex );

    gWait++;
}

- (id)init
{
    if( [super init] != nil )
    {
        pMutex = objc_mutex_allocate();
        description = nil;
    }

    gWait = 0;

    return self;
}

- free
{
    printf("being deallocated..\n");

    objc_mutex_deallocate( pMutex );

    if( description )
    {
        [description free];
    }

    // Let the root class release the object itself.
    return [super free];
}
@end

File : main.m


#import <stdio.h>
#import "MyClass.h"
#include <windows.h>

void isMultithreaded(void)
{
    printf("The program is multithreaded\n");
}

int main( void )
{
    MyClass *myObject = [[MyClass alloc] init];

    [myObject setDescription:@"Hello, World\n"];
    [myObject description];

    objc_set_thread_callback( isMultithreaded);

    objc_thread_detach( @selector(doSomethingAtomically), myObject, nil);
    objc_thread_detach( @selector(doSomethingAtomically1), myObject, nil);
    objc_thread_detach( @selector(doSomethingAtomically), myObject, nil);

    objc_thread_yield();

    // Three threads were detached above, so wait for all three to finish.
    while( gWait < 3 )
    {
        //printf("main : Shall I sleep?\n");
        Sleep(5);
    }

    [myObject free];

   return 0;
}

NOTE: The code above was written in the MinGW environment.




How to solve weirdness of the high resolution counter

26 04 2008

In a previous post, some issues with QueryPerformanceCounter() were discussed.

Fortunately, I found a very good blog, Zooba’s Blog, on the problems of using counters like rdtsc and QueryPerformanceCounter. I think rdtsc is not a good choice, because either additional processing time is needed to obtain the CPU frequency that must be used together with the rdtsc result, or only an approximate frequency is available by looking it up in the registry.
So, the remaining option is to use QueryPerformanceCounter.

There are two issues to solve.

  1. To guarantee that the timing starts and ends exactly where you want.
     Because of optimization, instructions may be reordered, so your “Start Measuring” command can end up earlier or later than intended.

  2. To obtain a reliable count.
     As discussed in the previous post, QueryPerformanceCounter does not return a reliable count on multi-processor or multi-core systems.

To solve the 1st problem, a special kind of instruction called a “serializing instruction” should be executed.
According to Zooba’s Blog, there are 3 of them: iret, rsm, and cpuid.
However, iret and rsm change the instruction pointer, so they are out. cpuid just retrieves information about the CPU, so it does no harm.
(What is a “serializing instruction”? It is an instruction which forces execution to be serialized: all instructions already in the pipeline are completed before the serializing instruction, such as cpuid, is processed. So you can ensure that the instructions that start and stop the measurement run where they are expected to.)

The 2nd issue arises especially when your CPU is a multi-core processor or the system has multiple processors. It also happens when your CPU has SpeedStep technology.
However, as mentioned in Zooba’s Blog, the SpeedStep effect is minimal, because the code whose performance you want to measure will make your CPU sweat enough in most cases. So the most troublesome case is the multi-core, multi-processor one.
How to solve this problem? It is also explained in Zooba’s blog. (Thank you, Zooba!)
If you pin the code that calls QueryPerformanceCounter() to a specific processor, it will return a reliable result. SetProcessAffinityMask() or SetThreadAffinityMask() can be used for this.

So, here is the code example.


// performance_measure.h
#ifndef PERFORMANCE_MEASURE
#define PERFORMANCE_MEASURE

#define DECLARE_GLOBAL_FOR_PEFORMANCE_MEASURE()\
    LARGE_INTEGER g_Start_Counter, g_End_Counter, g_Frequency;\
    DWORD_PTR g_Old_ProcessAffinityMask, g_New_ProcessAffinityMask, g_SystemAffinityMask;\
    HANDLE hCurrentProcess;

DECLARE_GLOBAL_FOR_PEFORMANCE_MEASURE();

inline void INIT_PERFORMANCE_MEASURE( void )
{
    hCurrentProcess = GetCurrentProcess();
    // Remember the original affinity mask so that it can be restored later.
    GetProcessAffinityMask( hCurrentProcess, &g_Old_ProcessAffinityMask, &g_SystemAffinityMask );

    QueryPerformanceFrequency( &g_Frequency );
}

inline void START_PERFORMANCE_MEASURE( void )
{
    int CPUInfo[4];

    // Serializing instruction
    __cpuid( CPUInfo, 0 );  // used the intrinsic version of the cpuid

    // Read the counter on the first processor only.
    // The mask is passed by value, not by address.
    g_New_ProcessAffinityMask = 0x01;
    SetProcessAffinityMask( hCurrentProcess, g_New_ProcessAffinityMask );

    QueryPerformanceCounter( &g_Start_Counter );

    // Revert to the original affinity mask
    SetProcessAffinityMask( hCurrentProcess, g_Old_ProcessAffinityMask );
}

inline void STOP_PERFORMANCE_MEASURE( void )
{
    int CPUInfo[4];

    __cpuid( CPUInfo, 0 );  // Serializing instruction
    SetProcessAffinityMask( hCurrentProcess, g_New_ProcessAffinityMask );

    QueryPerformanceCounter( &g_End_Counter );

    // Revert to the original affinity mask
    SetProcessAffinityMask( hCurrentProcess, g_Old_ProcessAffinityMask );
}

double GET_PERFORMANCE_MEASURE( void )
{
    return ((double)g_End_Counter.QuadPart - (double)g_Start_Counter.QuadPart)/(double)g_Frequency.QuadPart;
}

#endif

Use the code above in your own code like this.


#include <windows.h>
#include <intrin.h>
#include <cstdio>   // printf
#include <ctime>    // clock, CLOCKS_PER_SEC
using namespace std;

// This header file contains above code
#include "performance_measure.h"

void matrix_multiplication( void )
{
    ...

    printf("Single\n");

    INIT_PERFORMANCE_MEASURE();

    START_PERFORMANCE_MEASURE();

    start_t = clock();

    for( iteration = 0; iteration < 90000; iteration++ )
    {
        for( i = 0; i < 8; i++ )
            for( j = 0; j < 8; j++ )
            {
                temp = 0;
                for( k = 0; k < 8; k++ )
                {
                    temp += matA[i][k]*matB[k][j];
                }
                matC[i][j] = temp;
            }
    }
    duration_t = clock() - start_t;

    STOP_PERFORMANCE_MEASURE();

    printf("Duration = %f (%f)\n", (double)duration_t/CLOCKS_PER_SEC,
        GET_PERFORMANCE_MEASURE() );
}

Now, you will get a reliable result.
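The arithmetic that turns two counter readings and a frequency into seconds can be sketched in portable C. In this sketch the function name elapsed_seconds and its parameters are assumptions; the int64_t values stand in for the QuadPart fields of the LARGE_INTEGER globals. It subtracts the raw counts as integers before converting to double, which avoids rounding two large counter values separately.

```c
#include <stdint.h>

/* Convert a pair of raw counter readings into elapsed seconds.
 * counts_per_second plays the role of the performance-counter frequency.
 * Returns 0.0 if the frequency is not positive. */
double elapsed_seconds(int64_t start_count, int64_t end_count,
                       int64_t counts_per_second)
{
    if (counts_per_second <= 0)
        return 0.0;

    /* Integer subtraction first, then a single conversion to double. */
    return (double)(end_count - start_count) / (double)counts_per_second;
}
```

For example, readings of 1000 and 4000 at a frequency of 1000 counts per second correspond to 3 seconds.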

Phew….




OBJC_API_VERSION and __OBJC2__

26 04 2008

I would like to post an answer from the objc-language mailing list.

On Apr 25, 2008, at 2:06 PM, JongAm Park wrote:
With C++, there is a macro __cplusplus. Is there anything analogous for Objective-C 2.0?
Xcode 3.0 supports converting pre-2.0 code to Obj-C 2.0. However, there are people who still use pre-2.0 Obj-C, and even those who use 2.0 may want to maintain code compatibility with an old compiler.
So, if there were a macro like __OBJC_2_0__, it would be very helpful for writing code appropriate for both pre-2.0 and 2.0 compilers.

So, the code will be like :

#ifdef __OBJC_2_0__
statements in Objective-C 2.0 syntax
#else
statement in Obj-C pre 2.0 syntax
#endif

There are no macros that match the available syntax. The available flags are:

OBJC_API_VERSION
This is set based on the low-level runtime API available. OBJC_API_VERSION==0 is the legacy API. OBJC_API_VERSION==2 means the function-based API added in Leopard is available. This is closer to what you want; it helps because it will disallow the new syntax when the deployment target is pre-Leopard, but if you’re compiling for 10.5+ then it doesn’t tell you whether your compiler knows about the new syntax.

__OBJC2__
This is set based on the ABI version (i.e. the metadata format on disk). This distinguishes between the legacy (i386+ppc) version and the modern (x86_64+ppc64) version. This isn’t what you want.

You could use OBJC_API_VERSION if your supported combinations are:
* Old compiler with deployment target older than Leopard
* New compiler with any deployment target

Thank you, Mr. Parker for the information.




Difference in Concurrency Model in MacOS X and the Windows (3)

25 04 2008

3. Event

Windows is built on an event-driven model. Therefore, events play a very important role in the Windows environment, and many programs use them heavily, whether they create their own events or use the ones provided by the OS. Let’s first look at an example of how events are used.


int _tmain(int argc, _TCHAR* argv[])
{
    HANDLE hThread[kMaxThreads];

    int i;

    initEvent();

    for( i = 0; i < kMaxThreads; i++ )
    {
	// Threads wait on their events and trigger events for others.
        hThread[i] = CreateThread( NULL, 0, doMultiThreadWay, 0, 0, &gThreadID[i] );

        if( hThread[i] == NULL )
        {
		…
            ExitProcess(i);
        }
        else
        {
		…
        }
    }

   // Until now, all threads are created and wait for their events.

   // set the 1st event, gEvents[0], or fire an event.
    SetEvent( gEvents[0] );

    // Wait until all threads have terminated
    WaitForMultipleObjects( kMaxThreads, hThread, TRUE, INFINITE );

    // Close all thread handles
    for( i = 0; i < kMaxThreads; i++ )
        CloseHandle( hThread[i] );

    destroyEvent();

	return 0;
}

// This is how the events are initialized.
void initEvent( void )
{
    int i;

    for( i = 0; i < kMaxThreads; i++ )
    {
	// Auto-reset events: each one is reset automatically once a waiting thread is released.
        gEvents[i] = CreateEvent( NULL, FALSE, FALSE, NULL );

        if( gEvents[i] == NULL )
            outputString( __T("Error in creating events\n"), FOREGROUND_RED | FOREGROUND_INTENSITY );
    }
}

// Threading function
DWORD WINAPI doMultiThreadWay( LPVOID lpParam )
{
    TCHAR msgBuf[kBuffSize];
    size_t cchStringSize;
    DWORD dwChars;
    DWORD threadID;
    DWORD dwWaitResult;
    WORD textColor;

    int i;

    threadID = GetCurrentThreadId();
    if( threadID == gThreadID[0] )
        textColor = FOREGROUND_GREEN | FOREGROUND_RED;
    else if( threadID == gThreadID[1] )
        textColor = FOREGROUND_BLUE | FOREGROUND_RED;
    else
        textColor = FOREGROUND_BLUE | FOREGROUND_GREEN;

    // Thread safe way of outputting
    StringCchPrintf( msgBuf, kBuffSize, __T("doMultiThreadWay (%d)\n"), threadID );
    outputString( msgBuf, textColor );

    for( i = 0; i < 5; i++ )
    {
	// Each thread waits for its own event.
        if( threadID == gThreadID[0] )
        {
            dwWaitResult = WaitForSingleObject( gEvents[0], INFINITE );
            outputString(__T("First thread says \"Do It\" to the second thread\n"), textColor );
		// An event, e.g. gEvents[0], is automatically reset.
            SetEvent( gEvents[1] );
        }
        else if ( threadID == gThreadID[1] )
        {
            dwWaitResult = WaitForSingleObject( gEvents[1], INFINITE );
            outputString(__T("Second thread says \"Do It\" to the third thread\n"), textColor );
            SetEvent( gEvents[2] );
        }
        else
        {
            dwWaitResult = WaitForSingleObject( gEvents[2], INFINITE );
            outputString(__T("Third thread says \"Do It\" to the First thread\n\n"), textColor );
            SetEvent( gEvents[0] );
        }

    }

    return 0;

}

What the threading function does is wait for its own event and, once that event is triggered, fire the next event so that the next thread wakes up. The threads are thus woken up one by one.
This is exactly the effect that events are meant to achieve.

If you don’t want to read the whole text above, the screenshot shows at a glance what the code does.

[Screenshot: What the code does.]

As always, events could also be implemented using a mutex or semaphore. However, using events simplifies things.
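To make that claim concrete, here is a hedged sketch of an auto-reset event built from a pthread mutex and condition variable, driving the same three-thread relay as the Win32 example. event_t, event_init, event_set, event_wait, ring_worker and run_ring are hypothetical names for this sketch, and it does not try to reproduce every detail of the Win32 event semantics.

```c
#include <pthread.h>
#include <stdbool.h>

#define RING_SIZE 3
#define ROUNDS    5

/* Minimal auto-reset event: one waiter is released per event_set(). */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            signaled;
} event_t;

void event_init(event_t *e)
{
    pthread_mutex_init(&e->mutex, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

void event_set(event_t *e)
{
    pthread_mutex_lock(&e->mutex);
    e->signaled = true;
    pthread_cond_signal(&e->cond);
    pthread_mutex_unlock(&e->mutex);
}

void event_wait(event_t *e)
{
    pthread_mutex_lock(&e->mutex);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->mutex);
    e->signaled = false;                 /* auto-reset on wake-up */
    pthread_mutex_unlock(&e->mutex);
}

event_t ring_events[RING_SIZE];
int     ring_ids[RING_SIZE];
int     ring_log[RING_SIZE * ROUNDS];    /* order in which threads ran */
int     ring_log_len = 0;

/* Each worker waits for its own event, records its turn, wakes the next. */
static void *ring_worker(void *arg)
{
    int id = *(int *)arg;
    for (int r = 0; r < ROUNDS; r++) {
        event_wait(&ring_events[id]);
        ring_log[ring_log_len++] = id;   /* only the token holder runs here */
        event_set(&ring_events[(id + 1) % RING_SIZE]);
    }
    return NULL;
}

/* Starts the relay and returns how many turns were recorded. */
int run_ring(void)
{
    pthread_t threads[RING_SIZE];

    for (int i = 0; i < RING_SIZE; i++) {
        event_init(&ring_events[i]);
        ring_ids[i] = i;
    }
    for (int i = 0; i < RING_SIZE; i++)
        pthread_create(&threads[i], NULL, ring_worker, &ring_ids[i]);

    event_set(&ring_events[0]);          /* fire the first event */

    for (int i = 0; i < RING_SIZE; i++)
        pthread_join(threads[i], NULL);
    return ring_log_len;
}
```

Because the signaled flag is checked under the mutex, a thread that calls event_wait() after its event was already set does not miss the wake-up, which is the property the relay depends on.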

The Win32 synchronization model examined so far has a distinctive characteristic: you declare and set up a synchronization object such as a mutex, semaphore, or event, and then wait for the corresponding condition with a function like WaitForSingleObject() or WaitForMultipleObjects(). (Critical sections are the exception; they use EnterCriticalSection() and LeaveCriticalSection() instead.) This is the Win32 synchronization programming model to keep in mind, and it is what makes Windows different from other OSes like Unix, which have no such functions. So, a student who learned multiprocessing and parallel computing on Unix or another non-Windows OS can be confused at first.
However, it is also an easy, logically designed model, and there is no real difficulty in learning it.

Windows multithreading (MFC)

MFC wraps the Win32 data types and behaviour described above in classes that are easy to use. In other words, it is a framework.

However, the MFC wrappers for synchronization do more than just wrap things for convenience.
They make the synchronization model of Windows look similar to that of Unix.
Let’s take a look at an example.


// Global Mutex Object
CMutex g_m;
int g_C;

UINT ThreadFunction1(LPVOID lParam)
{
    // Create object for Single Lock using the mutex
    CSingleLock lock(&g_m);

	// try obtaining a lock.
    lock.Lock();

    // code block protected by the lock.
	...

	// release the lock
    lock.Unlock();

    return 0;
}

UINT ThreadFunction2(LPVOID lParam)
{
    // Create a single-lock object using the mutex
    CSingleLock lock(&g_m);

   // If the other thread already obtained the lock, this thread will wait here.
    lock.Lock();

    // code block protected by the lock.
	...

    lock.Unlock();

    return 0;
}

The place where Lock() is called corresponds to the line where WaitForSingleObject() would be used in Win32.
MFC provides the same programming model regardless of the locking variable, whether it is a critical section, a mutex, a semaphore, or an event. For example, if g_m in the code above is declared as a CCriticalSection instead of a CMutex, the rest of the code can be used almost unchanged. In other words, MFC offers a unified model for the different locking variables.
This is the major difference between the Win32 model and the MFC model.

Anyway, the overall pattern of locking and then unlocking is what resembles the Unix model.

So far, we have looked at how synchronization works on Windows.
In the next post, let’s look at the Objective-C and Cocoa case.




Difference in Concurrency Model in MacOS X and MS Windows (2)

23 04 2008

This post is the 2nd part of a post from a while ago. As I promised before, this series of posts is written in English and Korean.

OK. It is time to return to this issue, “multi-threading design” on Windows and Mac. When I studied multi-threading and synchronization on Windows after learning Unix, it was a little confusing. Although the Windows facilities are easy to learn and quite similar to those on Unix, there are some differences, which come from how the functions and facilities are designed.
Basically they share the same model. However, they present it in slightly different ways.

In this post, the facilities that Windows provides for multi-threading are presented, along with a brief look at how to use them. In the next post, the approach Apple takes with Objective-C and Cocoa will be explained.

1. Synchronization in Win32

1.1 Critical Section

The critical section seems to me to be the simplest synchronization method. By enclosing a code block between calls to two functions, it enables mutually exclusive access to the block.


 for( i = 0; i < 5; i++ )
 {
#ifdef USE_CRITICAL_SECTION
 	EnterCriticalSection( &gCriticalSection );

        // Thread safe way of outputting
        StringCchPrintf( msgBuf, kBuffSize, __T("doMultiThreadWay (%d) : %d\n"), threadID, i );
        outputString( msgBuf, textColor );

        LeaveCriticalSection( &gCriticalSection );
#endif
}

EnterCriticalSection() and LeaveCriticalSection() are those two functions.

1.2 Mutex ( Mutually Exclusive Semaphore ) & Semaphore

Windows provides special functions for handling a mutex or, more generally, a semaphore: CreateMutex(), CreateSemaphore(), WaitForSingleObject(), WaitForMultipleObjects(), ReleaseMutex(), and ReleaseSemaphore().

Mutexes and semaphores are created by calling CreateMutex() and CreateSemaphore(), respectively. Once created, they can be used to regulate access to a code block, as seen below.


        dwWaitResult = WaitForSingleObject( gMutex, 5000L );
        switch( dwWaitResult )
        {
        case    WAIT_OBJECT_0:
                __try
                {
                    // Thread safe way of outputting
                    StringCchPrintf( msgBuf, kBuffSize, __T("doMultiThreadWay (%d) : %d\n"), threadID, i );
                    outputString( msgBuf, textColor );
                }
                __finally
                {
                    if( !ReleaseMutex( gMutex ) )
                    {
                        // Save old attribute for a console
                        WORD wPrevColorAttrs = normalTextCsbiInfo.wAttributes;

                        // Now, write in Red
                        if( !SetConsoleTextAttribute( hStdout, FOREGROUND_RED ) )
                        {
                            MessageBox( NULL, __T("SetConsoleTextAttribute"), __T("Console Error"), MB_OK );
                        }
                        else
                        {
                            // Thread safe way of outputting
                            StringCchPrintf( msgBuf, kBuffSize, __T("doMultiThreadWay (%d) : Error in releasing mutex\n"), threadID );
                            outputString( msgBuf, FOREGROUND_GREEN );

                            // Restore the previous console attribute
                            if( !SetConsoleTextAttribute( hStdout, wPrevColorAttrs ) )
                            {
                                MessageBox( NULL, __T("SetConsoleTextAttribute"), __T("Console Error"), MB_OK );
                            }
                        }
                    }
                }
                // Leave the switch only after the __finally block has finished,
                // instead of jumping out of it with break
                break;

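        case WAIT_TIMEOUT:
            // The mutex was not acquired within 5000 ms, so nothing is printed
            break;

        case WAIT_ABANDONED:
            // The previous owner ended without releasing the mutex. This thread
            // now owns it, so real code should check the shared state and release it.
            break;
        }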

A thread that has passed the WaitForSingleObject() line owns the mutex, and it gives the mutex up by calling ReleaseMutex(). The next thread waiting at the WaitForSingleObject() line then gets the mutex, which blocks other threads from getting it, and proceeds.

Simple, isn’t it?
What differs somewhat from the Unix model is the use of calls like WaitForSingleObject(). Still, this Windows approach is quite easy to understand and to work with.

At this point, you may wonder why the critical section is necessary. A critical section can be implemented with a mutex, so why is there a separate critical section? In fact, some OSes don’t have the critical section. To understand the differences and similarities, please read the MSDN document Critical Section Objects.

“A critical section object provides synchronization similar to that provided by a mutex object, except that a critical section can be used only by the threads of a single process. Event, mutex, and semaphore objects can also be used in a single-process application, but critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization (a processor-specific test and set instruction). Like a mutex object, a critical section object can be owned by only one thread at a time, which makes it useful for protecting a shared resource from simultaneous access. Unlike a mutex object, there is no way to tell whether a critical section has been abandoned.”

The clear difference is that a critical section can be used only by the threads of a single process, and in that case it is faster.

This critical section is a good example of the things that confuse us when we develop on several different OSes. A few lines above, I said that some OSes don’t have the critical section. To be more accurate, I should revise that statement: it is wrong. The concept of a critical section exists on every multiprocess, multithreading OS. If you use a mutex to force atomic access to a code block, that block is a critical section. The critical section mentioned on the MSDN page, on the other hand, is not the critical section as a general concept but Microsoft’s special structure, CRITICAL_SECTION, which is created with code like this:


// The same object that EnterCriticalSection() and LeaveCriticalSection() use
CRITICAL_SECTION gCriticalSection;
InitializeCriticalSectionAndSpinCount( &gCriticalSection, 0x80000400 );

So when you move from Windows to another OS, it is safer not to think, “Oh, there is no critical section on this OS.”
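To make the point concrete, here is a minimal sketch of the same critical section written once and built on either API. The names lock_t, lock_enter(), lock_leave(), bump_counter() and run_workers() are made up for this illustration and do not come from the code earlier in this post. On Windows the lock is a CRITICAL_SECTION, elsewhere it is a pthread mutex. Error checking is left out to keep it short.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef CRITICAL_SECTION lock_t;
static void lock_init( lock_t *l )    { InitializeCriticalSection( l ); }
static void lock_enter( lock_t *l )   { EnterCriticalSection( l ); }
static void lock_leave( lock_t *l )   { LeaveCriticalSection( l ); }
static void lock_destroy( lock_t *l ) { DeleteCriticalSection( l ); }
#else
#include <pthread.h>
typedef pthread_mutex_t lock_t;
static void lock_init( lock_t *l )    { pthread_mutex_init( l, NULL ); }
static void lock_enter( lock_t *l )   { pthread_mutex_lock( l ); }
static void lock_leave( lock_t *l )   { pthread_mutex_unlock( l ); }
static void lock_destroy( lock_t *l ) { pthread_mutex_destroy( l ); }
#endif

enum { kThreads = 4, kLoops = 10000 };

static lock_t gLock;
static long   gCounter = 0;

/* Every thread runs this. The region between lock_enter() and
   lock_leave() is the critical section in the general sense. */
static void bump_counter( void )
{
    int i;
    for( i = 0; i < kLoops; i++ )
    {
        lock_enter( &gLock );
        gCounter++;
        lock_leave( &gLock );
    }
}

#ifdef _WIN32
static DWORD WINAPI worker( LPVOID arg ) { (void)arg; bump_counter(); return 0; }
#else
static void *worker( void *arg ) { (void)arg; bump_counter(); return NULL; }
#endif

/* Start kThreads workers, wait for all of them, and return the final count. */
long run_workers( void )
{
    int i;
#ifdef _WIN32
    HANDLE threads[kThreads];
#else
    pthread_t threads[kThreads];
#endif

    gCounter = 0;
    lock_init( &gLock );

#ifdef _WIN32
    for( i = 0; i < kThreads; i++ )
        threads[i] = CreateThread( NULL, 0, worker, NULL, 0, NULL );
    WaitForMultipleObjects( kThreads, threads, TRUE, INFINITE );
    for( i = 0; i < kThreads; i++ )
        CloseHandle( threads[i] );
#else
    for( i = 0; i < kThreads; i++ )
        pthread_create( &threads[i], NULL, worker, NULL );
    for( i = 0; i < kThreads; i++ )
        pthread_join( threads[i], NULL );
#endif

    lock_destroy( &gLock );
    return gCounter;
}
```

Calling run_workers() from main() should give kThreads * kLoops, because no increment is lost while the lock is held. Only the type behind lock_t changes between the two systems. The critical section itself looks the same.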




Impressive battery life of 2007 Nov. version of MacBook

20 04 2008

Sometimes it is relaxing to go back to a non-technical subject.
I have used the base model of the MacBook for a few months. Usually I don’t believe manufacturers’ claims about battery life. If they say that a battery lasts for 4 hours, a new one will usually last about 2 hours and 30 minutes, and more realistically 2 hours. After you use that kind of notebook computer for about a year, it settles at around 1 hour and 30 minutes.

However, the MacBook is somehow a different story.
I charged my machine last Sunday, and I turned it on today.
The remaining battery level is as follows:

It loses very little charge while idle, and it really does last more than 4 hours in actual use.

Is the battery really different from those used in Windows notebooks? Or is the Mac OS X power manager really that good?




QueryPerformanceCounter() equivalent on Mac OS X

20 04 2008

Timers are quite an issue for people who need to process images in real time or who want to measure very fast code.

Because QueryPerformanceCounter() and QueryPerformanceFrequency() were discussed in my previous post, one may ask, “Is there a similar function for Mac OS X?”

Yeah.. Actually, my blog stats showed that a few people searched for that term.

I found functions such as mach_absolute_time() and mach_timebase_info().

You can read a very nice explanation here at MacResearch and here at Apple’s Q&A page.
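As a rough sketch of how the two calls fit together: mach_absolute_time() returns raw ticks, and mach_timebase_info() fills in a numer/denom pair that converts ticks to nanoseconds. The helper names ticks_to_ns() and elapsed_ns() are my own for this illustration. The Mac-only part is wrapped in #ifdef __APPLE__ so that the conversion can be checked anywhere.

```c
#include <stdint.h>

/* Convert raw ticks to nanoseconds: ns = ticks * numer / denom.
   For very large tick counts the multiplication can overflow, so
   apply it to a difference of two readings, not to a raw reading. */
uint64_t ticks_to_ns( uint64_t ticks, uint32_t numer, uint32_t denom )
{
    return ticks * numer / denom;
}

#ifdef __APPLE__
#include <mach/mach_time.h>

/* Nanoseconds between two mach_absolute_time() readings. */
uint64_t elapsed_ns( uint64_t start, uint64_t end )
{
    mach_timebase_info_data_t info;
    mach_timebase_info( &info );    /* fills info.numer and info.denom */
    return ticks_to_ns( end - start, info.numer, info.denom );
}
#endif
```

On a Mac, you would read mach_absolute_time() before and after the code to measure and pass both readings to elapsed_ns(). This pairing plays the role that QueryPerformanceCounter() and QueryPerformanceFrequency() play on Windows.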




Weirdness of the High Resolution Counter, i.e. QueryPerformanceCounter()

19 04 2008

Most of the time, using clock() to measure the performance of a code block is enough.
However, there are cases where you want to compare two blocks that are logically identical but implemented differently.
Let’s assume that you want to compare the performance of the intrinsic version of strcpy with your own implementation of strcpy written in SIMD instructions.
In most cases, tick-based functions like clock() and GetTickCount() will not reveal the difference between them.

So you decide to use a high-performance, or high-resolution, timer. Windows provides these two functions for that purpose.

1. QueryPerformanceCounter( LARGE_INTEGER *pVal )
This function is like clock(). The value stored at the location pointed to by pVal is a number of counts, just as clock() returns a number of ticks.

2. QueryPerformanceFrequency( LARGE_INTEGER *pVal )
This returns how many counts occur per second.

So, the duration of time can be obtained by


    LARGE_INTEGER aVal, aFreq;
    __int64 duration_in_time;

    QueryPerformanceCounter( &aVal );       // current counter value, in counts
    QueryPerformanceFrequency( &aFreq );    // counts per second
    duration_in_time = aVal.QuadPart / aFreq.QuadPart;    // whole seconds (integer division)
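To time a particular block rather than read the counter once, the usual pattern is to take two readings and divide their difference by the frequency. The sketch below shows that arithmetic. The names elapsed_seconds() and time_block() are hypothetical helpers of mine, not Win32 functions, and the Windows-only part is guarded so that the arithmetic can be checked on any system.

```c
#include <stdint.h>

/* start and end are two counter readings, freq is counts per second.
   Dividing as double keeps the fractional part of a second. */
double elapsed_seconds( int64_t start, int64_t end, int64_t freq )
{
    return (double)( end - start ) / (double)freq;
}

#ifdef _WIN32
#include <windows.h>

/* Seconds spent inside fn(), measured with the performance counter. */
double time_block( void (*fn)( void ) )
{
    LARGE_INTEGER start, end, freq;

    QueryPerformanceFrequency( &freq );
    QueryPerformanceCounter( &start );
    fn();
    QueryPerformanceCounter( &end );

    return elapsed_seconds( start.QuadPart, end.QuadPart, freq.QuadPart );
}
#endif
```

Because only the difference of two readings is used, the absolute size of the counter value matters much less.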

However, this has some glitches with contemporary CPUs.

Before mentioning the glitches, let’s take a look at how LARGE_INTEGER is declared.


typedef union _LARGE_INTEGER {
    struct {
        DWORD LowPart;
        LONG HighPart;
    };
    struct {
        DWORD LowPart;
        LONG HighPart;
    } u;
    LONGLONG QuadPart;
} LARGE_INTEGER;

LONGLONG is the __int64 type. So if your compiler and CPU support a 64-bit data type, you can access the content of a LARGE_INTEGER through QuadPart.

The 1st glitch is that the returned value appeared to exceed the range of the 64-bit QuadPart, because current CPUs are so fast.
(If you search on Google, you will find web pages where people explain that it exceeds the 32-bit boundary, and they recommend using a 64-bit data type.
In my runs it looked as if it exceeded even the 64-bit boundary, although a signed 64-bit count should take decades to overflow even at GHz rates, so the odd values may have another cause.)
So you can probably use unsigned __int64 instead.
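As a quick sanity check on the overflow question, the arithmetic can be sketched as follows. It assumes a counter that starts at zero and ticks at a fixed rate. The function name years_until_overflow() is made up for this illustration.

```c
#include <stdint.h>

/* Years until a counter that starts at zero passes INT64_MAX
   when it advances counts_per_second times every second. */
double years_until_overflow( double counts_per_second )
{
    const double seconds_per_year = 365.25 * 24.0 * 60.0 * 60.0;
    return ( (double)INT64_MAX / counts_per_second ) / seconds_per_year;
}
```

For a counter running at 3 GHz this comes out at roughly 97 years. So a strange-looking value is more likely to come from how the number is formatted or divided than from a real 64-bit overflow.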

The 2nd glitch is that the result does not print properly when you use %I64d for aVal.QuadPart/aFreq.QuadPart.
Even %Lf doesn’t solve the problem. These specifiers are meant for 64-bit integers and for real numbers, respectively. Then how can the value be displayed properly?


printf("%f", (double)aVal.QuadPart/(double)aFreq.QuadPart);

double is also a 64-bit real number type, and it works. Dividing as double also keeps the fractional part that the integer division throws away.

The 3rd glitch is the real glitch.
Let’s take a look at this screenshot from a real run of the code.

Hmm… Why is the high-performance counter not reliable?
Searching on Google, I found a clue that this is due to SpeedStep or a similar technology that changes the CPU speed on demand.
Because the counter has such a high resolution, it is exposed to this glitch.
I read somewhere in Intel’s forum that Intel or MS was working on making the call measure on the FSB side instead of the inner core of the CPU.
By doing so, it is said, the function would return a more reliable value even when a CPU’s battery-saving technology is in use.

I assume that the GetTickCount() Win32 API function is also based on clock(). However, it seems to display the expected result reliably.
clock()/CLOCKS_PER_SEC displays 2 and, from time to time, 1.9…

GetTickCount() probably has the lowest resolution.
However, one convenient side of GetTickCount() is that it returns a value in milliseconds, which helps if you want “time” instead of a number of ticks.
So you don’t need to divide it by a constant like CLOCKS_PER_SEC. In that case, it should be renamed GetTickTime().
Well.. the function name is misleading again, but it is the brainchild of MS.
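For comparison, here is a small sketch of the division that clock() needs before it gives milliseconds like GetTickCount() does. The helper names clock_ticks_to_ms() and time_with_clock() are my own. Note that standard C defines clock() as processor time, so the result is not always the same as wall-clock time.

```c
#include <time.h>

/* Convert a clock() tick count into milliseconds. */
double clock_ticks_to_ms( clock_t ticks )
{
    return (double)ticks * 1000.0 / (double)CLOCKS_PER_SEC;
}

/* Milliseconds that fn() took, as measured by clock(). */
double time_with_clock( void (*fn)( void ) )
{
    clock_t start = clock();
    fn();
    return clock_ticks_to_ms( clock() - start );
}
```

With GetTickCount() the subtraction of two readings is already in milliseconds, so the conversion step disappears.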

Finally, here is a screenshot when all of them return good results. :)




The GCC that comes with Mac OS X 10.4.x and 10.5.x doesn’t support flexible array members

11 04 2008

According to the GCC manual, it supports flexible array members.


    struct foo { int x; int y[]; };
    struct bar { struct foo z; };

    struct foo a = { 1, { 2, 3, 4 } };        // Valid.
    struct bar b = { { 1, { 2, 3, 4 } } };    // Invalid.
    struct bar c = { { 1, { } } };            // Valid.

The lines commented as valid should compile without any error. However, the GCC 4.x installed with Xcode 2.5 and 3.x, on OS X 10.4.x and 10.5.x respectively, doesn’t compile them without errors.
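Separately from the static initializers above, which are a GNU extension, the standard C99 way to use a flexible array member is to allocate the struct dynamically with extra room for the array. Here is a minimal sketch of that. The function foo_create() is a hypothetical helper, not something from the GCC manual.

```c
#include <stdlib.h>
#include <string.h>

struct foo { int x; int y[]; };    /* y is the flexible array member */

/* Allocate a struct foo with room for count elements in y,
   then copy the given values into it. Returns NULL on failure. */
struct foo *foo_create( int x, const int *values, size_t count )
{
    struct foo *p = malloc( sizeof( struct foo ) + count * sizeof( p->y[0] ) );
    if( p == NULL )
        return NULL;

    p->x = x;
    memcpy( p->y, values, count * sizeof( p->y[0] ) );
    return p;
}
```

The caller releases the object with free(). Since sizeof( struct foo ) does not include the array, the extra count * sizeof( int ) bytes have to be requested explicitly.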

I reported this bug to Apple.

NEW on April 22, 2008
I got a message from Apple.

This is a follow-up to Bug ID# 5857390. Engineering has determined that this issue behaves as intended based on the following information:

Page 232 of the GCC manual available at http://gcc.gnu.org/onlinedocs/gcc-4.2.3/gcc.pdf states that:

“To avoid undue complication and confusion with initialization of deeply nested arrays, we simply disallow any non-empty initialization except when the structure is the top-level object.”

Compiling the example on page 232 will produce an error based on the two disallowed statements specifically marked as “Invalid”. Compiling the file with only the “Valid” lines works correctly.