Using bootcamped Windows without any key mapping

31 05 2008

Many people complain about the different key mappings on Windows and Mac, and about Windows keys that do not exist on Mac keyboards.

With an extended keyboard, which most people use, there is no such issue, because every current Macintosh keyboard has the extended part except for the small wireless keyboard.

So, many people try installing key mappers on Windows. However, Apple provides good information on how to use its keyboards with Windows, with very nice pictures of various Apple keyboards, including the small Apple Wireless Keyboard.

If you click the links on the “Note:” line, you can find specific information for each keyboard type.

However, if you are a MacBook user, not a MacBook Pro user, you will ask, “How about my MacBook?”, because Apple gives no explanation for the MacBook. The Wireless Keyboard looks exactly like the MacBook’s. The information behind that “Note:” link explains quite a few useful key mappings, but not all of them. For example, Home, End, Page Up and Page Down are keys I use very frequently when I write code on Windows. (I don’t like the position of those keys, though. As on Mac OS X, “Command-<Left arrow>”, “Command-<Right arrow>”, “Command-<Up arrow>” and “Command-<Down arrow>” are more intuitive and more convenient to use.) So, here is the good news. On bootcamped Windows, you can use “fn” + <arrow keys> for those. You can find more information about them on the page for MacBook Pros. The MacBook follows the MacBook Pro as far as the keyboard mapping is concerned.

 





AVI file format : inconsistent interpretation of size fields

26 05 2008

I have started again on a kind of AVI RIFF format viewer for Macintosh, because the Mac is used a lot in the video industry and handling the AVI file format is crucial there. So, it would be nice to have such a program for the Macintosh. Although there is a command-line RIFF viewer, a GUI version does not seem to exist.

Anyway, while I was parsing the RIFF format, I found something that is inconsistent and not well documented. Let’s take a look at the basic format related to what I describe.
Difference in size fields for movi and idx1 chunk

The value of the size field in the movi chunk means the size of the data. For example, for a 01dc block, the size field contains the size of the data that follows it.

On the other hand, the idx1 chunk has its own 01dc entries, and each entry has its own size field. Here the size field does not tell you how many bytes the entry itself occupies: every entry in idx1 is 16 bytes as a whole (its FourCC such as 01dc, the size field, and the remaining fields), whereas in the movi chunk the size field gives the actual length of the data that follows.

So, be careful when parsing the RIFF format.
I’m checking whether each field is parsed correctly. Most of them are not documented.
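As a rough sketch of the two layouts, the C code below reads a generic chunk header (as used inside movi) and one 16-byte idx1 record from a byte buffer. The struct and function names are made up for this illustration. The field order of the index record (chunk ID, flags, offset, size) follows Microsoft's description of the AVI index entry, and the even-byte padding is the usual RIFF convention. How a particular file fills the offset and size fields should still be checked against real data.

```c
#include <stdint.h>
#include <string.h>

/* Generic RIFF chunk header: a FourCC followed by the size of the data
   that comes after the header (the 8 header bytes are not counted). */
typedef struct {
    char     fourcc[5];   /* 4 characters plus a terminating NUL */
    uint32_t data_size;   /* stored little-endian in the file */
} ChunkHeader;

/* One idx1 record: always 16 bytes in the file, whatever `size` says. */
typedef struct {
    char     fourcc[5];
    uint32_t flags;
    uint32_t offset;
    uint32_t size;
} IndexEntry;

/* Reads a 32-bit little-endian value independent of host byte order. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses the 8 bytes at p as a chunk header. */
ChunkHeader parse_chunk_header(const unsigned char *p)
{
    ChunkHeader h;
    memcpy(h.fourcc, p, 4);
    h.fourcc[4] = '\0';
    h.data_size = read_le32(p + 4);
    return h;
}

/* Parses the 16 bytes at p as one idx1 record. */
IndexEntry parse_index_entry(const unsigned char *p)
{
    IndexEntry e;
    memcpy(e.fourcc, p, 4);
    e.fourcc[4] = '\0';
    e.flags  = read_le32(p + 4);
    e.offset = read_le32(p + 8);
    e.size   = read_le32(p + 12);
    return e;
}

/* Bytes from the start of this chunk to the start of the next one:
   8-byte header + data + one pad byte when the data size is odd. */
uint32_t bytes_to_next_chunk(const ChunkHeader *h)
{
    return 8u + h->data_size + (h->data_size & 1u);
}
```

When walking the movi list you advance by bytes_to_next_chunk(), while in idx1 you always advance by 16 bytes per record.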





Why OpenMP?

23 05 2008

   When I attended WWDC, i.e. Apple’s Worldwide Developers Conference, a few years ago, if I remember correctly, some people raised their hands and asked when OpenMP support would be included in the GCC provided by Apple. At that time, I didn’t understand why it was important. I thought OpenMP and MPI were for special markets, like high-performance scientific computing and data analysis. I thought they were in a league of their own.
   Also, I didn’t understand why we needed another multiprocessing/multithreading API when we already had pthread and other message-passing APIs. I confess I thought, “Why should programmers learn another threading API? I don’t want to do so!”

   However, about a month ago, I found out that OpenMP, at least, can boost the performance of any individual programmer’s code very “easily”. I would like to put emphasis on “easily”. In my opinion, when a new library is announced, it should be easy to try without sacrificing your precious time.
OpenMP turned out to be in that category.

   At first it looks like a collection of macros which use the pthread functions. However, it is actually built into compilers like the Visual C++ compiler and GCC 4.2.x. In other words, you need a compiler which supports OpenMP.

The great features of OpenMP are :

  1. Very easy to use: there are not many new keywords to memorize, and they are very straightforward.
  2. Enables applying a “fine” level of multithreading without hassle.
  3. You can use almost the same source code for the single-threaded version and the multithreaded version, no matter how many threads you want to create.

Let’s talk about them more to get a better idea of what I mean.

1. The OpenMP keywords are very easy to learn. They are quite clean and don’t introduce new concepts. There are only a couple of keywords, and you can try them very easily without modifying your logic much. Simply enclose the logic which should be handled by threads in braces and put the OpenMP directive in front of it. That’s it!

2. When code is written in a multithreaded way, it is usually coarse-grained, because it is tedious to write multithreaded code in a fine-grained way: typically you just create a new thread which runs a function as its thread function. But with OpenMP, you can easily slice a time-consuming for-loop and give the pieces to their own threads.

3. If OpenMP allowed you to write multithreaded code but made you change your code a lot, it would not be useful. OpenMP lets you convert a single-threaded version of code into a multithreaded version by adding its directives. Usually there is no need to change the structure of the existing code. If you want to use 3 threads instead of 2, you can just specify the number of threads without changing the existing code structure!
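The three points can be seen in a very small sketch. The function below (sum_array is just a name chosen for this illustration) is an ordinary serial loop with one OpenMP directive in front of it. It assumes the compiler is run with OpenMP enabled; if it is not, the pragma is simply ignored and the same source runs single-threaded with the same result.

```c
/* Sums data[0..n-1].
   With OpenMP enabled, the loop iterations are divided among `threads`
   threads (threads must be positive) and the partial sums are combined
   by the reduction clause. Without OpenMP the pragma is ignored and the
   loop runs serially. */
long sum_array(const int *data, int n, int threads)
{
    long sum = 0;
    int i;

    #pragma omp parallel for reduction(+:sum) num_threads(threads)
    for (i = 0; i < n; i++)
        sum += data[i];

    return sum;
}
```

Changing from 2 threads to 3 only changes the last argument, e.g. sum_array(data, n, 3); the loop itself stays untouched.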

Also, it is quite handy for current multicore processors. The main target of OpenMP is multiprocessor or multicore machines, i.e. one computer. On the other hand, MPI is for distributed environments.

Here is my sample code which uses OpenMP. It shows how fast code can be if OpenMP is used. I also tried SIMD instructions to see whether they can achieve better performance than multithreading. I think SIMD is more efficient than multithreading, because there is no overhead for creating and maintaining multiple threads. However, my code sample shows that poorly designed SIMD code is slower than simpler but multithreaded code.


// OpenMP.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <omp.h>
#include <cstdio>
#include <cmath>
#include <ctime>
#include <cstring>
#include <windows.h>
#include <intrin.h>
using namespace std;

#include "performance_measure.h"

#define NUM_THREADS 4
#define NUM_START 1
#define NUM_END 10

void test(int val)
{
    #pragma omp parallel if (val) num_threads(val)
    if (omp_in_parallel())
        #pragma omp single
        printf_s("val = %d, parallelized with %d threads\n",
                 val, omp_get_num_threads());
    else
        printf_s("val = %d, serialized\n", val);
}

void AnotherTest( void )
{
   int i, nRet = 0, nSum = 0, nStart = NUM_START, nEnd = NUM_END;
   int nThreads = 0, nTmp = nStart + nEnd;
   unsigned uTmp = (unsigned((abs(nStart - nEnd) + 1)) *
                               unsigned(abs(nTmp))) / 2;
   int nSumCalc = uTmp;

   if (nTmp < 0)
      nSumCalc = -nSumCalc;

   omp_set_num_threads(NUM_THREADS);

   #pragma omp parallel default(none) private(i) shared(nSum, nThreads, nStart, nEnd)
   {
      #pragma omp master
      nThreads = omp_get_num_threads();

      #pragma omp for
      for (i=nStart; i<=nEnd; ++i) {
            #pragma omp atomic
            nSum += i;
      }
   }

   if  (nThreads == NUM_THREADS) {
      printf_s("%d OpenMP threads were used.\n", NUM_THREADS);
      nRet = 0;
   }
   else {
      printf_s("Expected %d OpenMP threads, but %d were used.\n",
               NUM_THREADS, nThreads);
      nRet = 1;
   }

   if (nSum != nSumCalc) {
      printf_s("The sum of %d through %d should be %d, "
               "but %d was reported!\n",
               NUM_START, NUM_END, nSumCalc, nSum);
      nRet = 1;
   }
   else
      printf_s("The sum of %d through %d is %d\n",
               NUM_START, NUM_END, nSum);

}

void test2(int iter)
{
    #pragma omp ordered
    printf_s("test2() iteration %d by thread ID %d\n", iter, omp_get_thread_num());
}

void AnotherTest2( void )
{
    int i;
    #pragma omp parallel
    {
        #pragma omp for ordered
        for (i = 0 ; i < 5 ; i++)
            test2(i);
    }

}

/*
 * taylor.c
 *
 * This program calculates the value of e*pi by first calculating e
 * and pi by their taylor expansions and then multiplying them
 * together.
 */

#define num_steps 20000000

void sequential_taylor( void )
{
  double start, stop; /* times of beginning and end of procedure */
  double e, pi, factorial, product;
  int i;

  printf("Sequential Taylor\n");
  /* start the timer */
  start = clock();

  /* First we calculate e from its taylor expansion */
  printf("e started\n");
  e = 1;
  factorial = 1; /* rather than recalculating the factorial from
                    scratch each iteration we keep it in this variable
                    and multiply it by i each iteration. */
  for (i = 1; i<num_steps; i++) {
    factorial *= i;
    e += 1.0/factorial;
  }
  printf("e done\n");

  /* Then we calculate pi from its taylor expansion */
  printf("pi started\n");

  pi = 0;
  for (i = 0; i < num_steps*10; i++) {
    /* we want 1/1 - 1/3 + 1/5 - 1/7 etc.
       therefore we count by fours (0, 4, 8, 12...) and take
         1/(0+1) =  1/1
       - 1/(0+3) = -1/3
         1/(4+1) =  1/5
       - 1/(4+3) = -1/7 and so on */
    pi += 1.0/(i*4.0 + 1.0);
    pi -= 1.0/(i*4.0 + 3.0);
  }
  pi = pi * 4.0;
  printf("pi done\n");

    product = e * pi;

  stop = clock();

  printf("Reached result %f in %.3f seconds\n", product, (stop-start)/CLOCKS_PER_SEC);

}

void parallel_taylor( void )
{
  double start, stop; /* times of beginning and end of procedure */
  double e, pi, factorial, product;
  int i;

  printf("Parallel Taylor\n");

  /* start the timer */
  start = clock();

  /* Now there is no first and second; e and pi are calculated at the
     same time. The loop counter i is used by both sections, so it must
     be private to each thread. */
#pragma omp parallel sections private(i)
  {
#pragma omp section
    {
      printf("e started\n");
      e = 1;
      factorial = 1; /* rather than recalculating the factorial from
                        scratch each iteration we keep it in this variable
                        and multiply it by i each iteration. */
      for (i = 1; i<num_steps; i++) {
	factorial *= i;
	e += 1.0/factorial;
      }
      printf("e done\n");
    } /* e section */

#pragma omp section
    {
      /* In this thread we calculate pi expansion */
      printf("pi started\n");

      pi = 0;
      for (i = 0; i < num_steps*10; i++) {
	/* we want 1/1 - 1/3 + 1/5 - 1/7 etc.
	   therefore we count by fours (0, 4, 8, 12...) and take
             1/(0+1) =  1/1
	   - 1/(0+3) = -1/3
             1/(4+1) =  1/5
	   - 1/(4+3) = -1/7 and so on */
	pi += 1.0/(i*4.0 + 1.0);
	pi -= 1.0/(i*4.0 + 3.0);
      }
      pi = pi * 4.0;
      printf("pi done\n");
    } /* pi section */

  } /* omp sections */
  /* at this point the threads should rejoin */

  product = e * pi;

  stop = clock();

  printf("Reached result %f in %.3f seconds\n", product, (stop-start)/CLOCKS_PER_SEC);

}

void display_matrix( unsigned char mat[][8] )
{
    int i, j;
    for( i = 0 ; i < 8; i++ )
    {
        for( j = 0 ; j < 8; j++ )
            printf("%3d ", mat[i][j] );
        printf("\n");
    }
    printf("\n");

}

const double PI = 3.141592;

void matrix_multiplication( void )
{

    unsigned __int64 time_stamp_start, time_stamp_duration;

    __declspec(align(16)) unsigned char matA[8][8] = {  { 0, 1, 2, 3, 4, 5, 6, 7 },
                        { 0, 0, 10, 11, 12, 13, 14, 15 },
                        { 0, 0, 0, 3, 4, 5, 6, 7 },
                        { 0, 0, 0, 0, 12, 13, 14, 15 },
                        { 0, 0, 0, 0, 0, 5, 6, 7 },
                        { 0, 0, 0, 0, 0, 0, 14, 15 },
                        { 0, 0, 0, 0, 0, 0, 0, 7 },
                        { 0, 0, 0, 0, 0, 0, 0, 0 }};
    __declspec(align(16)) unsigned char matB[8][8] = {  { 0, 0, 0, 0, 0, 0, 0, 0 },
                        { 0, 0, 0, 0, 0, 0, 0, 7 },
                        { 0, 0, 0, 0, 0, 0, 14, 15 },
                        { 0, 0, 0, 0, 0, 5, 6, 7 },
                        { 0, 0, 0, 0, 12, 13, 14, 15 },
                        { 0, 0, 0, 3, 4, 5, 6, 7 },
                        { 0, 0, 10, 11, 12, 13, 14, 15 },
                        { 0, 1, 2, 3, 4, 5, 6, 7 }};

    __declspec(align(16)) unsigned char matC[8][8];

    memset( matC, 0, 64*sizeof(char) );

    int i, j, k;
    int iteration;
    int nThreads;
    unsigned char temp;
    clock_t start_t, duration_t;

    printf("Single\n");

    INIT_PERFORMANCE_MEASURE();

    START_PERFORMANCE_MEASURE();

    start_t = clock();

    for( iteration = 0; iteration < 90000; iteration++ )
    {
        for( i = 0; i < 8; i++ )
            for( j = 0; j < 8; j++ )
            {
                temp = 0;
                for( k = 0; k < 8; k++ )
                {
                    temp += matA[i][k]*matB[k][j];
                }
                matC[i][j] = temp;
            }
    }
    duration_t = clock() - start_t;

    STOP_PERFORMANCE_MEASURE();

    printf("Duration = %f (%f)\n", (double)duration_t/CLOCKS_PER_SEC,
        GET_PERFORMANCE_MEASURE() );

    display_matrix( matC );
    memset( matC, 0, 64*sizeof(unsigned char) );

    printf("Multithreaded 2\n");

    START_PERFORMANCE_MEASURE();
    start_t = clock();

    #pragma omp parallel default(none) private(i,j,k,temp) shared(nThreads, matA, matB, matC) num_threads(2)
    {
        #pragma omp master
        nThreads = omp_get_num_threads();

        #pragma omp for
        for( iteration = 0; iteration < 90000; iteration++ )
        {
            for( i = 0; i < 8; i++ )
                for( j = 0; j < 8; j++ )
                {
                    temp = 0;
                    for( k = 0; k < 8; k++ )
                    {
                        temp += matA[i][k]*matB[k][j];
                    }
                    matC[i][j] = temp;
                }
        }
   }
    duration_t = clock() - start_t;

    STOP_PERFORMANCE_MEASURE();

    printf("Duration = %f (%f)\n", (double)duration_t/CLOCKS_PER_SEC, GET_PERFORMANCE_MEASURE() );

    display_matrix( matC );
    printf("nThreads : %d\n", nThreads);
    printf("\n");

    /////////////////////////////////////////////////////////////////////////////
    printf("Using SIMD\n");

    __m128i a, b, c, zero, high_temp, low_temp, high_temp2, low_temp2, temp_128, temp2_128;

    start_t = clock();

    zero = _mm_setzero_si128();
    for( iteration = 0; iteration < 90000; iteration++ )
    {
        for( i = 0; i < 8; i++ )
        {
            // ith row
            a = _mm_set_epi16( matA[i][7], matA[i][6],
                              matA[i][5], matA[i][4],
                              matA[i][3], matA[i][2],
                              matA[i][1], matA[i][0] );

            //a = _mm_unpacklo_epi8( a, zero ); // Now they are in 16bits

            for( j = 0; j < 8; j++ )
            {
                b = _mm_set_epi16( matB[7][j], matB[6][j], matB[5][j], matB[4][j],
                               matB[3][j], matB[2][j], matB[1][j], matB[0][j]);

                //b = _mm_unpacklo_epi8( b, zero ); // Now they are in 16bits

                low_temp = _mm_mullo_epi16( a, b );
                high_temp = _mm_mulhi_epi16( a, b );

                low_temp2 = _mm_unpacklo_epi16( low_temp, high_temp ); // 32bits
                high_temp2 = _mm_unpackhi_epi16( low_temp, high_temp ); // 32bits

                temp_128 = _mm_add_epi32( low_temp2, high_temp2 );
                temp2_128 = _mm_shuffle_epi32( temp_128, 0x4E );

                temp_128 = _mm_add_epi32( temp_128, temp2_128 );
                temp2_128 = _mm_shuffle_epi32( temp_128, 0xB1 );

                temp_128 = _mm_add_epi32( temp_128, temp2_128 );

                matC[i][j] = temp_128.m128i_i8[0];

            }
        }
    }
    duration_t = clock() - start_t;
    printf("Duration = %f\n", (double)duration_t/CLOCKS_PER_SEC );

    display_matrix( matC );
    printf("\n");

    /////////////////////////////////////////////////////////////////////////////
    printf("Using SIMD with threads\n");
    start_t = clock();

    zero = _mm_setzero_si128();
#pragma omp parallel default(none) private(i, j, a, b, low_temp, high_temp, low_temp2, high_temp2, temp_128, temp2_128) shared( nThreads, matA, matB, matC, zero )  num_threads(2)
{
#pragma omp master
        nThreads = omp_get_num_threads();

#pragma omp for
    for( iteration = 0; iteration < 90000; iteration++ )
    {
        for( i = 0; i < 8; i++ )
        {
            // ith row
            a = _mm_set_epi16( matA[i][7], matA[i][6],
                              matA[i][5], matA[i][4],
                              matA[i][3], matA[i][2],
                              matA[i][1], matA[i][0] );

            //a = _mm_unpacklo_epi8( a, zero ); // Now they are in 16bits

            for( j = 0; j < 8; j++ )
            {
                b = _mm_set_epi16( matB[7][j], matB[6][j], matB[5][j], matB[4][j],
                               matB[3][j], matB[2][j], matB[1][j], matB[0][j]);

                //b = _mm_unpacklo_epi8( b, zero ); // Now they are in 16bits

                low_temp = _mm_mullo_epi16( a, b );
                high_temp = _mm_mulhi_epi16( a, b );

                low_temp2 = _mm_unpacklo_epi16( low_temp, high_temp ); // 32bits
                high_temp2 = _mm_unpackhi_epi16( low_temp, high_temp ); // 32bits

                temp_128 = _mm_add_epi32( low_temp2, high_temp2 );
                temp2_128 = _mm_shuffle_epi32( temp_128, 0x4E );

                temp_128 = _mm_add_epi32( temp_128, temp2_128 );
                temp2_128 = _mm_shuffle_epi32( temp_128, 0xB1 );

                temp_128 = _mm_add_epi32( temp_128, temp2_128 );

                matC[i][j] = temp_128.m128i_i8[0];
            }
        }
    }
}
    duration_t = clock() - start_t;
    printf("Duration = %f\n", (double)duration_t/CLOCKS_PER_SEC );

    display_matrix( matC );
    printf("nThreads : %d\n", nThreads);
    printf("\n");

}

int _tmain(int argc, _TCHAR* argv[])
{
    //test(0);
    //test(3);

    //AnotherTest();
    //printf("\n\n");
    //AnotherTest2();

    //sequential_taylor();
    //parallel_taylor();

    matrix_multiplication();

	return 0;
}

 

By the way, you can enable OpenMP in Visual C++ with the /openmp compiler option (in the project properties: C/C++ > Language > OpenMP Support).

For GCC, version 4.2.x or above is required, together with the -fopenmp option. For Mac OS X, if you log in to the ADC web site, you can download version 4.2.3(?) or above. It is still a kind of beta.

If you want to know more about the OpenMP, visit :
http://openmp.org/wp/
http://en.wikipedia.org/wiki/OpenMP

GOMP is the OpenMP implementation for C/C++ and Fortran 95 in the GNU Compiler Collection, a.k.a. GCC.
http://gcc.gnu.org/projects/gomp/





Variable argument list bug in Visual C++ 2005 library

22 05 2008

Today, I found a bug in the Visual C++ 2005 standard library related to variable argument lists. The problem is that va_arg() doesn’t return the correct value.


#include "stdafx.h"
#include <stdarg.h>

void var_tester( char *aString, ... )
{
    int num_arg = 1;
    va_list argument_ptr;
    int aVal;

    va_start( argument_ptr, aString );

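    // Note: the callers below pass no terminating 0, so this loop reads
    // past the last real argument; what va_arg() returns then is undefined.
    while( (aVal = va_arg( argument_ptr, int )) != 0 )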
    {
        num_arg++;
        printf("%d st arg = %X\n", num_arg, aVal );
    }

    va_end( argument_ptr );
}

int _tmain(int argc, _TCHAR* argv[])
{
    var_tester( "Hmm..", 1, 2, 3 );
    printf("\n");
    var_tester( "Hmm..", 1, 2, 3, 4 );

    printf("\n");
    var_tester( "Hmm..", 1, 3 );

    printf("\n");
    var_tester( "Hmm..", 1, 2, 3, 4, 5 );

	return 0;
}

If the code is run under the debugger, it works correctly. But if it is launched without debugging, it doesn’t.

Here are the screenshots.
Correct!

And.. here is the wrong one.
Wrong!

Update : han9kin left a comment saying that this is not a bug. The Unix man page explains it better. However, I would like to put the MSDN explanation here.

“va_arg retrieves a value of type from the location given by arg_ptr and increments arg_ptr to point to the next argument in the list, using the size of type to determine where the next argument starts. va_arg can be used any number of times within the function to retrieve arguments from the list.”

In the code sample that follows it, they pass -1 as the last parameter, check whether -1 is retrieved, and if so exit the va_arg() loop.

And this hot fix, “FIX: The va_arg function returns an incorrect value in a Visual C++ 2005 application”, doesn’t explain what it fixes specifically.
Can anyone tell me what “the va_arg function returns an incorrect value” means?

Anyway, the 1st sample at this site checks the value against NULL, and a sample at this site passes the number of parameters as its 1st parameter.
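The two conventions mentioned here can be sketched as follows. This is only an illustration, not the MSDN sample itself: the function names and the choice of -1 as the sentinel are assumptions. The key point is that va_arg() has no way to know where the list ends, so the caller must tell the function, either with an end marker or with a count.

```c
#include <stdarg.h>

/* Convention 1: the caller ends the list with a sentinel value (-1 here).
   A real argument equal to the sentinel would end the list early. */
int sum_until_sentinel(int first, ...)
{
    va_list ap;
    int total = 0;
    int v = first;

    va_start(ap, first);
    while (v != -1) {
        total += v;
        v = va_arg(ap, int);   /* only read while the sentinel is unseen */
    }
    va_end(ap);
    return total;
}

/* Convention 2: the first parameter says how many arguments follow. */
int sum_counted(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call looks like sum_until_sentinel(1, 2, 3, -1) or sum_counted(3, 1, 2, 3). Neither function ever reads past the arguments the caller actually passed.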

Additionally, this site explains an interesting topic about variable parameters.





How does google do its magic?

21 05 2008

I just tried searching for my latest post using Google. Wow! It found my page instantly! Although I know that Google is very fast, I didn’t think there had been enough time to index my post.

The most amazing Google features are :

  1. Fast search : It is really fast!
  2. Accuracy of search : When Google was first announced, I was so surprised, because it felt as if there were a lot of people inside Google’s computers picking out the pages I was really looking for.

Well… I found two web pages which seem to explain how Google works.




Back to the basic : Pointer to Array, Array of Pointers…

21 05 2008

   To beginners in C/C++, one of the most confusing concepts is the pointer. Although the concept of the pointer itself is quite understandable, the notation for pointers and arrays seems to confuse people. When I was a freshman, other students kept mumbling, “Pointer to Pointer”, “Pointer to Array”, “Array of Pointers”, “Array Pointer” or “Pointer of Array”. Because the terms are in English, they confused me too. Pointer to Pointer is the easiest to understand, but the others were very confusing, because Korean is different from English. Once I had settled on the terms “Pointer to Array” and “Array of Pointers”, some of my friends* would approach me and ask what an “Array Pointer” or a “Pointer of Array” was. Semantically, “Array Pointer” means a pointer which is made for an array, and “Pointer of Array” means “Pointer to Array”. So the three terms “Array Pointer”, “Pointer to Array” and “Pointer of Array” were the most confusing. In Korean, “to…” can mean the same as “of…”.

   When I came to the U.S., students from other countries didn’t seem to be confused by those terms. Probably some of them, like “Pointer of Array”, simply sound awkward. I don’t know if that is also true for native English speakers.

   When I was interviewed by U.S. companies, I was quite impressed that they asked about every detail of C/C++. The questions seemed quite creative, and they gave me the impression that these companies knew how to manage projects and programmers. The way they interview is totally different from the way it is done in Korea. (Nowadays Google has imported its interview style to Korea, and I have heard that Haan soft interviews like American companies.) They asked me how to make a 2D array dynamically, and I answered with my best solution. (I did lots of experiments with dynamic arrays when I made a 3D graphics engine, and found a way which was very flexible and very powerful. This code will be introduced later.) However, they said that my answer was wrong, and showed me how to make a “Pointer to Array” and build 2D arrays with it. Later I found a book which contained the questions they had asked me. Hmm.. So the interview questions turned out not to be creative after all.
So, I told them why using a pointer to array was not flexible, and why my approach was better. They nodded in agreement. However, when I couldn’t answer things which I actually knew but had forgotten because I had been away from those problems for a while, they assumed that I didn’t know them, although I just needed to be reminded. I’m quite an experienced programmer, but my English limited any quick answers or chatting. So, it probably gave the impression that I was not compelling.

   Anyway, let’s go back to the basics. It’s time to refresh things again!
In this post, I will show how to make a “Pointer to Array” and an “Array of Pointers”, and will also show the shortcomings of array notation and pointer notation for handling multi-dimensional arrays.

   Let’s start with array notation and pointer notation.


void outputUsingArray( int array[][4], int n_rows, int n_cols )
{
	int i, j;

	printf("Output Using array\n");
	for( i = 0; i < n_rows; i++ )
	{
		for( j = 0; j < n_cols; j++ )
		{
			// Either can be used.
			//printf("%2d ", array[i][j] );
			printf("%2d ", *( *(array + i) + j ) );
		}
		printf("\n");
	}
	printf("\n");
}

void outputUsingPointer( int (*array)[4], int n_rows, int n_cols )
{
	int i, j;

	printf("Output Using Pointer to Array i.e. int (*array)[4]\n");
	for( i = 0; i < n_rows; i++ )
	{
		for( j = 0; j < n_cols; j++ )
		{
			printf("%2d ", *(*(array+i) + j ) );
		}
		printf("\n");
	}
	printf("\n");
}

They are used like this :


int _tmain(int argc, _TCHAR* argv[])
{
	int array[4][4] = { { 0, 1, 2, 3 },
						{ 4, 5, 6, 7 },
						{ 8, 9, 10, 11 },
						{ 12, 13, 14, 15 } };

	outputUsingPointer( (int (*)[4])array, 4, 4 );

	outputUsingArray( array, 4, 4 );

   The problem with array notation is that the dimension of the array is static, so functions like outputUsingArray() can be used only for one specific dimension.
However, accessing an array using array notation is convenient.
Pointer-to-array notation can be used as in outputUsingPointer(). However, it has the same problem as outputUsingArray().

   If a plain pointer to the first element is passed instead, and the function is designed for it, the term n_cols*i has to be used to reach row i.
This is flexible, because one function can access arrays of any dimension.
However, you can’t use array notation.
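To make the n_cols*i term concrete, here is a small sketch. It assumes the matrix is stored row by row in one contiguous block, as a static int array[4][4] is, and the function names are invented for this example.

```c
/* Treats `data` as an n_rows x n_cols matrix stored row by row.
   Row i starts n_cols*i elements after the beginning, so element
   (i, j) lives at data + n_cols*i + j. */
int element_at(const int *data, int n_cols, int i, int j)
{
    return *(data + n_cols * i + j);
}

/* Adds up every element; works for any number of rows and columns. */
long sum_matrix(const int *data, int n_rows, int n_cols)
{
    long total = 0;
    int i, j;

    for (i = 0; i < n_rows; i++)
        for (j = 0; j < n_cols; j++)
            total += element_at(data, n_cols, i, j);

    return total;
}
```

A 4x4 static array would be passed as sum_matrix(&array[0][0], 4, 4), and a 2x3 one as sum_matrix(&m[0][0], 2, 3), with the same function.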

   By the way, before presenting the most flexible solution, let’s think about making a dynamic array using an array of pointers.


	printf("Using array of pointers -- Half Dynamic\n");
	printf("------------------------\n");
	int *array3[4];
	int i;

	for( i = 0; i < 4; i++ )
		*(array3+i) = (int *)malloc( 4*sizeof( int ) );

	array3[0][0] = 0; array3[0][1] = 2; array3[0][2] = 3; array3[0][3] = 4;
	array3[1][0] = 0, array3[1][1] = 2, array3[1][2] = 3, array3[1][3] = 4;
	array3[2][0] = 0, array3[2][1] = 2, array3[2][2] = 3, array3[2][3] = 4;
	array3[3][0] = 0, array3[3][1] = 2, array3[3][2] = 3, array3[3][3] = 4;

	outputUsingPointer3( array3, 4, 4 );
	outputUsingArray3( array3, 4, 4 );

Then, outputUsingPointer3() and outputUsingArray3() are :


void outputUsingPointer3( int **array, int n_rows, int n_cols )
{
	int i, j;

	printf("Output Using Pointer to Pointer i.e. int **array\n");
	for( i = 0; i < n_rows; i++ )
	{
		for( j = 0; j < n_cols; j++ )
		{
			printf("%2d ", *(*(array+i) + j ) );
		}
		printf("\n");
	}
	printf("\n");
}
void outputUsingArray3( int **array, int n_rows, int n_cols )
{
	int i, j;

	printf("Output Using Array i.e. int array[][]\n");
	for( i = 0; i < n_rows; i++ )
	{
		for( j = 0; j < n_cols; j++ )
		{
			printf("%2d ", array[i][j]);
		}
		printf("\n");
	}
	printf("\n");
}

   As you can see, the array can be passed as a pointer to pointer, and you can easily access it using either array notation or pointer notation. It is quite flexible.
However, the number of rows is fixed when such an array is declared. So, I call it “Half Dynamic”.

   By the way, can’t you pass the static array declared as int array[4][4] to outputUsingPointer3() with a cast, the way it was cast to (int (*)[4]) above? No, it is not possible. The Visual C++ 2005 compiler doesn’t allow it.
It would also be handy if the array could be passed as a plain pointer and cast to (int (*)[n_cols]) inside the function. Then the function could be used for arrays of any dimension. However, that is not possible either: even in a cast, the compiler doesn’t allow a variable as the dimension. GCC probably allows it, because it supports variable-length arrays, which let variables set the dimensions of arrays (standard in C99, but not available in Visual C++ 2005).
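For compilers that do accept a run-time dimension, the cast discussed above might look like the sketch below. It assumes a C compiler with C99 variable-length array types (GCC in C mode, for example); it is not valid C++ and would not build with Visual C++ 2005. The function name is invented for this example.

```c
/* `data` points at the first element of a matrix stored row by row.
   The cast to a pointer-to-array whose size is only known at run time
   relies on C99 variably modified types. */
int cell_at(const int *data, int n_cols, int i, int j)
{
    const int (*rows)[n_cols] = (const int (*)[n_cols])data;

    return rows[i][j];   /* plain array notation, any column count */
}
```

With this, cell_at(&array[0][0], 4, 2, 3) reads array[2][3] from a 4-column array, and the same function serves a 3-column array by passing 3.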

casting doesn’t work

Now, it’s time to introduce the most flexible dynamic array.


	printf("Using Pointer to Pointer -- Fully Dynamic\n");
	printf("------------------------\n");
	int **array2;

	array2 = (int **)malloc( 4* sizeof( int * ) );

	// each row is a 1D array
	for( i = 0; i < 4; i++ )
		*(array2+i) = (int *)malloc( 4*sizeof( int ) );

	array2[0][0] = 0; array2[0][1] = 2; array2[0][2] = 3; array2[0][3] = 4;
	array2[1][0] = 0, array2[1][1] = 2, array2[1][2] = 3, array2[1][3] = 4;
	array2[2][0] = 0, array2[2][1] = 2, array2[2][2] = 3, array2[2][3] = 4;
	array2[3][0] = 0, array2[3][1] = 2, array2[3][2] = 3, array2[3][3] = 4;

	outputUsingPointer3( array2, 4, 4 );
	outputUsingArray3( array2, 4, 4 );

   By using a pointer to pointer, instead of a pointer to array, to declare a multidimensional array, you get the convenience of array notation and the power of pointer notation, as well as flexibility. Here, outputUsingPointer3() and outputUsingArray3() are the same as in the example above. In short, you can use whichever notation is convenient in any case.
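The snippet above allocates the rows but never releases them. As a hedged sketch of how the same pointer-to-pointer layout could be wrapped, the two helpers below allocate and free such an array for any size; alloc_matrix and free_matrix are names invented here, not part of the original program.

```c
#include <stdlib.h>

/* Allocates n_rows row pointers, then one zero-filled row of n_cols
   ints per pointer. Returns NULL if any allocation fails, after
   releasing whatever had already been allocated. */
int **alloc_matrix(int n_rows, int n_cols)
{
    int **m = (int **)malloc((size_t)n_rows * sizeof(int *));
    int i;

    if (m == NULL)
        return NULL;

    for (i = 0; i < n_rows; i++) {
        m[i] = (int *)calloc((size_t)n_cols, sizeof(int));
        if (m[i] == NULL) {
            while (i-- > 0)      /* free rows i-1 down to 0 */
                free(m[i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Frees every row, then the array of row pointers. */
void free_matrix(int **m, int n_rows)
{
    int i;

    if (m == NULL)
        return;
    for (i = 0; i < n_rows; i++)
        free(m[i]);
    free(m);
}
```

The result can be used exactly like array2 above, e.g. m[2][3] = 7, and handed to functions that take int **.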

   Using a pointer to pointer is semantically easy to understand and the most powerful.
This is the approach I explained in one of my previous interviews. In most books, only the pointer to array is explained, and I found that many people have difficulty understanding “Pointer to Array” and “Array of Pointers.”

   If you have difficulty memorizing which notation is for a pointer to array and which is for an array of pointers, try this :

int (*array)[4] : the parentheses are evaluated first, so * binds to array before the [4] on the right does. So, it is a POINTER to [4].
Again.. It is a POINTER. So, it is a pointer to array.

int *array[4] : [] has higher precedence than *, so it is an ARRAY of pointers.
It is not a pointer, it is an ARRAY.
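One way to convince yourself of the difference is to measure the two declarations. The sketch below uses helper names made up for this post: the first function shows that a pointer to array is a single pointer which jumps a whole row when incremented, the second that an array of pointers really stores four pointers.

```c
#include <stddef.h>

/* Pointer to array: ONE pointer. Adding 1 to it skips a whole int[4],
   so the distance in bytes is 4 * sizeof(int). */
size_t pointer_to_array_step(void)
{
    int rows[2][4];
    int (*p)[4] = rows;          /* points at rows[0] */

    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Array of pointers: FOUR separate pointers stored side by side,
   so its size is 4 * sizeof(int *). */
size_t array_of_pointers_size(void)
{
    int *a[4];

    return sizeof a;
}
```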





Dynamic typing and returned value in Objective-C

20 05 2008

I tried writing this code for something.

GInt.h is :


#import "Generic.h"

@interface GInt : Generic {

	int value;
}

- (void)add:(id)valObj;
- (int)value;
- (void)setData:(int)aVal;
- (NSString *)description;

@end

And GFloat.h is :


#import "Generic.h"

@interface GFloat : Generic {

	float value;
}

- (void)add:(id)valObj;
- (float)value;
- (void)setData:(float)aVal;
- (NSString *)description;
@end

Implementation for the GFloat is :


#import "GFloat.h"

@implementation GFloat
- (void)add:(id)valObj
{
	float temp_value;

        // This yields error
	temp_value = [valObj value];

	value += temp_value;
}

- (float)value
{
	return value;
}

- (void)setData:(float)aVal
{
	value = aVal;
}

// (description is omitted here.)
@end

In the add: method, [valObj value] returns something strange.
When the value message is sent to valObj, the Objective-C runtime successfully delivers it to the GFloat object, which returns a float value. However, on the caller’s side, i.e. in add:, a wrong value arrives.
Isn’t it strange? I confirmed that the method itself returns a correct float value. But when the program counter is back at the caller, a wrong value is there!

Then, how to solve it?


	temp_value = (float)[valObj value];

Hmm.. it does not solve the problem.
Actually, this solves it.


	temp_value = [(GFloat *)valObj value];

By the way, Apple’s Objective-C manual explains it this way :

Return and Argument Types
In general, methods in different classes that have the same selector (the same
name) must also share the same return and argument types. This constraint is
imposed by the compiler to allow dynamic binding. Because the class of a
message receiver (and therefore class-specific details about the method it’s
asked to perform), can’t be known at compile time, the compiler must treat all
methods with the same name alike. When it prepares information on method return
and argument types for the runtime system, it creates just one method
description for each method selector.

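So, the original code was wrong, because the return types of the value method in GInt and GFloat are different. Since valObj is typed id, the compiler has only the selector to go by, and it cannot tell whether the result should be read as an int or as a float.
However, GCC 3.4.5 from MinGW doesn’t raise any error, while GCC 4.0.1 from Apple does.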

Let’s summarize it.

  • With dynamic typing, the correct method is called.
  • With dynamic typing, the caller may receive a wrong return value when classes declare the same selector with different return types. So, do not depend on it.

Be careful when using the value returned by a dynamically typed object.
I reported this issue to the GCC bug tracker as bug number 26283.
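
The rule "the caller must know the real return type" can be sketched in plain C. This is only a rough analogue, not how the Objective-C runtime works. The names Kind, Generic, value_as_float and generic_add are made up for this illustration. A tag plays the role of the class, and the caller picks the getter whose return type matches the tag, which is the same thing the (GFloat *) cast does above.

```c
/* Hypothetical C analogue of GInt and GFloat: the tag says which
   getter, and therefore which return type, applies to the object. */
typedef enum { KIND_INT, KIND_FLOAT } Kind;

typedef struct {
    Kind kind;
    union {
        int   i;
        float f;
    } data;
} Generic;

/* Two getters with the same purpose but different return types. */
static int   int_value(const Generic *g)   { return g->data.i; }
static float float_value(const Generic *g) { return g->data.f; }

/* Choose the getter by tag, so the type the caller expects always
   matches the type the callee really returns. */
float value_as_float(const Generic *g)
{
    if (g->kind == KIND_INT)
        return (float)int_value(g);
    return float_value(g);
}

/* Counterpart of add: the other operand is read through the
   type-safe accessor, never through a guessed signature. */
void generic_add(Generic *self, const Generic *other)
{
    if (self->kind == KIND_INT)
        self->data.i += (int)value_as_float(other);
    else
        self->data.f += value_as_float(other);
}
```

Here the conversion happens on a value that was already read with the right type, so adding an int object to a float object gives the expected sum.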





Difference in Concurrency Model in MacOS X and MS Windows (4)

12 05 2008

I have not posted properly for about a week, because I have been helping with work for “Good Hands for Globe”. The main purpose of this blog is to organize my thoughts, so that I can return to the state of mind I was in when I wrote a post, review it, and context-switch quickly to its subject. So I could simply post whenever I feel like it. However, since I made this blog public to several search engines, more people visit than I expected, and I have come to feel a kind of “duty”, so posting has become like “homework”.
By the way, I also plan to write about OpenMP and an implementation of aligned malloc, but first I would like to finish this “synchronization” series.

  • pthread (POSIX threads)

Before talking about synchronization on Mac OS X, pthread should be mentioned briefly, because it is the de facto standard for threading in Unix environments. Moreover, Mac OS X, which is based on BSD, uses pthread in its lower layers. Therefore the synchronization model of Objective-C and Cocoa cannot be discussed without pthread.

There are good tutorials on pthread at the following two web sites:

- POSIX Threads Programming

- Linux Tutorial : POSIX Threads Libraries

Let’s take a quick look at an example from the sites above.


#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *functionC(void *arg);
pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
int  counter = 0;

int main(void)
{
   int rc1, rc2;
   pthread_t thread1, thread2;

   /* Create independent threads each of which will execute functionC */

   if( (rc1=pthread_create( &thread1, NULL, &functionC, NULL)) )
   {
      printf("Thread creation failed: %d\n", rc1);
   }

   if( (rc2=pthread_create( &thread2, NULL, &functionC, NULL)) )
   {
      printf("Thread creation failed: %d\n", rc2);
   }

   /* Wait till threads are complete before main continues. Unless we  */
   /* wait we run the risk of executing an exit which will terminate   */
   /* the process and all threads before the threads have completed.   */

   pthread_join( thread1, NULL);
   pthread_join( thread2, NULL);

   exit(0);
}

/* Thread entry point: increments the shared counter under mutex1. */
void *functionC(void *arg)
{
   (void)arg;   /* unused */
   pthread_mutex_lock( &mutex1 );
   counter++;
   printf("Counter value: %d\n",counter);
   pthread_mutex_unlock( &mutex1 );
   return NULL;
}

functionC() shows how to use a mutex in a threaded function. The pthread model is so simple that this example alone is enough to understand most of how its functions and data are organized and how to use them. The function locks the mutex variable mutex1, does something, and then releases the lock with pthread_mutex_unlock(). It does not use a separate function like WaitForSingleObject(), as Win32 does. As mentioned in the previous post, this is very similar to the MFC model. Probably MFC was designed so that programmers familiar with this industry-standard threading could pick it up easily.

Then, what about the critical section? As the two web pages above show, pthread has no special data type like “CRITICAL_SECTION”. Originally, a critical section is simply a code block guarded by lock() and unlock() calls on a synchronization variable such as a mutex, so no special data type is needed. Conceptually, I find it hard to understand why MS made a separate CRITICAL_SECTION type. As MSDN explains, CRITICAL_SECTION is faster within a single process on a single machine, so I guess MS created it for that optimization.

It is interesting that pthread, which is a basic API like Win32, offers the same level of abstraction as MFC, which is a framework on top of the Win32 API.

  • Synchronization in Objective-C

Objective-C supports synchronization in the language itself, through @synchronized(). To use it with GCC 3.3 and later, the option -fobjc-exceptions must be passed to gcc.

Let’s take a look at some examples.


- (void)criticalMethod
{
    @synchronized(self) {

        // Critical code.
        ...

    }
}

Or, the current selector, i.e. the method, can be used as the mutex.


- (void)criticalMethod
{
    @synchronized(NSStringFromSelector(_cmd)) {

        // Critical code.
        ...

    }
}

Isn’t it very simple? This is even simpler than the CRITICAL_SECTION of Win32.

  • Synchronization in Cocoa

Let’s take a look at a basic example that uses a lock.


BOOL moreToDo = YES;

NSLock *theLock = [[NSLock alloc] init];
...

while (moreToDo) {
    /* Do another increment of calculation */

    /* until there’s no more to do. */

    if ([theLock tryLock]) {

        /* Update display used by all threads. */

        [theLock unlock];
    }
}

Using tryLock here is conceptually the same as using lock, except that tryLock returns immediately instead of waiting when the lock is not available. The two correspond to pthread_mutex_trylock() and pthread_mutex_lock(). Anyway, as this example shows, it closely resembles pthread. Cocoa sits on top of BSD functions, so it naturally follows the programming model that those functions and pthread present.

In addition to that, Cocoa provides various lock types. They are not fundamentally different locks; they are categorized by how the basic lock is used. They are Mutex, Recursive Lock, Read-Write Lock, Distributed Lock and Spin Lock, and there is also a condition lock using the NSCondition class.
Strictly speaking, a mutex itself is not a lock. The lock classes take a lock on a semaphore-like value such as a mutex. By listing the mutex here, Apple’s document confuses its readers.

For how to use them, please read this documentation:

http://developer.apple.com/documentation/Cocoa/Conceptual/Multithreading/ThreadSafety/chapter_5_section_7.html#//apple_ref/doc/uid/10000057i-CH8-SW17

pthread also has a condition lock, and all the kinds of lock that Cocoa provides can be implemented using the basic lock of pthread. So, overall, the synchronization model of the Mac is the same as that of pthread.

Actually, you can use the pthread ones directly instead.
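
As one illustration, a recursive lock can be had from a plain pthread mutex by setting its type attribute. This is only a sketch under my own assumptions, not how Cocoa implements its recursive lock. The helper names recursive_lock_init and lock_nested are made up; the pthread calls themselves are standard POSIX.

```c
/* Ask for the POSIX.1-2008 interfaces so that
   PTHREAD_MUTEX_RECURSIVE is declared on strict compilers. */
#define _XOPEN_SOURCE 700
#include <pthread.h>

/* Hypothetical helper: initialise *m as a recursive mutex.
   Returns 0 on success or a pthread error code. */
int recursive_lock_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical helper: lock the same mutex `depth` times from one
   thread, then unlock it as many times as it was locked.
   Returns 0 if every lock call succeeded. */
int lock_nested(pthread_mutex_t *m, int depth)
{
    int held = 0;
    int rc = 0;
    while (held < depth && (rc = pthread_mutex_lock(m)) == 0)
        held++;
    while (held > 0) {
        pthread_mutex_unlock(m);
        held--;
    }
    return rc;
}
```

With a default mutex, locking a second time from the same thread would not be safe; with the recursive type each lock just needs a matching unlock.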

What is interesting here is the Spin Lock. It is the most common lock: it keeps polling to see whether it can acquire the lock on the mutex, and the caller is blocked meanwhile. To the caller, this is similar to WaitForSingleObject() with INFINITE as its second parameter, the timeout.
NSLock gives flexibility through tryLock: if the lock cannot be acquired, the protected block is simply skipped and the thread is not blocked. With WaitForSingleObject(), passing 0 as the second parameter has the same effect.
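
The pthread counterpart of tryLock is pthread_mutex_trylock(), which returns EBUSY instead of waiting. Here is a minimal sketch of the "skip the block if the lock is busy" pattern. The helper names run_if_free and bump are made up for this illustration.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: run work(arg) only if the mutex is free right now.
   Returns 1 if the work ran, 0 if the mutex was busy, -1 on any other error. */
int run_if_free(pthread_mutex_t *m, void (*work)(void *), void *arg)
{
    int rc = pthread_mutex_trylock(m);   /* never blocks */
    if (rc == EBUSY)
        return 0;                        /* someone else holds the lock */
    if (rc != 0)
        return -1;
    work(arg);                           /* the critical section */
    pthread_mutex_unlock(m);
    return 1;
}

/* Example work function: increments the int that arg points to. */
void bump(void *arg)
{
    ++*(int *)arg;
}
```

A caller can loop on run_if_free() and do other work whenever it returns 0, which is the same shape as the NSLock example above.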

As you can see, Cocoa and Unix perform synchronization with only two concepts, mutex (or semaphore) and lock, while Win32 uses a separate function, WaitForSingleObject(). Actually, that function is analogous to lock(). I don’t understand why MS invented a new name instead of using the usual lock(), although MFC takes care of it.

(I would like to mention that the C++/C#/CLI case is similar to Cocoa.)

So far, we have looked at the synchronization models of each platform. Put side by side like this, they all look the same, or very similar. However, people who write code for these platforms for the first time, comparing the MSDN, Apple and pthread documents, can be confused. I hope these posts help them.





Objective-C without Cocoa

2 05 2008

There are people who think that writing code in Objective-C without Cocoa is impossible, or at least meaningless. Although many articles on Objective-C mention that it is a superset of C, many people don’t really understand what that means.

Some even ask, “Is it possible to use functions from libraries written in C with Objective-C code?”.

Quite a few people are curious about why Objective-C didn’t become popular, and some answer that it is because of NeXT’s fall. Their reasoning is based on NeXT’s failure to make its system popular. Is it really so?

I don’t think so. Just as with the standard C library or the C++ library, anyone interested in an Objective-C standard library could have built one.

However, that didn’t happen. You can use the standard C library in Objective-C code, so the original authors of Objective-C probably thought it would be OK not to have a “standard” library for Objective-C.

Anyway, the truth is that you can write code with Objective-C and the standard C library. You can write your own root class analogous to NSObject, or you can inherit from Object. (Actually, Object was the root class before NSObject took that role.)

The gcc provides some header files to help coding in Objective-C :

  • encoding.h
  • hash.h
  • NXConstStr.h
  • objc.h
  • objc-api.h
  • objc-list.h
  • Object.h
  • Protocol.h
  • sarray.h
  • thr.h
  • typedstream.h

Among them, application programmers will be interested in Object.h, Protocol.h, thr.h, hash.h, and NXConstStr.h. Object.h declares a class, Object, which is like NSObject. Thus, it has methods like init, initialize, free, alloc, and so on. thr.h is for threading.

So, with those header files, you can write code in Objective-C.

Objective-C is really a small addition to C, yet it has the genuine power of Object-Oriented Programming.

Although it doesn’t support operator overloading or metaprogramming, it supports dynamism, message forwarding, remote messaging, and flexible extension of classes.

As for reusability, I personally think that Objective-C is better than C++.

Now, here is a simple Objective-C program that uses neither Cocoa nor Foundation.

To compile it, issue :

    gcc -g -O -c MyClass.m main.m

    gcc -fgnu-runtime -fobjc-exceptions MyClass.o main.o -lobjc

File : MyClass.h


#import <objc/Object.h>
#import <objc/objc-api.h>
#import <objc/NXConstStr.h>
#import <objc/thr.h>

/* Number of finished threads. Defined in MyClass.m. */
extern int gWait;

@interface MyClass : Object
{
    objc_mutex_t pMutex;

    NXConstantString *description;
    int value;
}

- (void) setDescription:(NXConstantString *)theText;
- (void) description;
- (void) doSomethingAtomically;
- (void) doSomethingAtomically1;
- free;
@end

File : MyClass.m


#import "MyClass.h"
#import <stdio.h>
#include <windows.h>

/* The single definition of the counter declared in MyClass.h. */
int gWait;

@implementation MyClass

- (void)setDescription:(NXConstantString *)theText
{
    description = [theText copy];
    value = 0;
}

- (void)description
{
    printf("%s", [description cString]);
}

- (void) doSomethingAtomically
{
    int i;

    printf("in Thread function\n" );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("1st : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    objc_mutex_unlock( pMutex );

    Sleep( 3 );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("2nd : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    objc_mutex_unlock( pMutex );

    Sleep( 3 );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("3rd : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    gWait++;    /* count this thread as finished while the mutex is still held */
    objc_mutex_unlock( pMutex );

}

- (void) doSomethingAtomically1
{
    int i;

    printf("\t in Thread function 2\n" );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("\t 2-1st : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    objc_mutex_unlock( pMutex );

    Sleep( 3 );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("\t 2-2nd : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    objc_mutex_unlock( pMutex );

    Sleep( 3 );

    objc_mutex_lock( pMutex );

        for( i = 0; i < 5; i++ )
        {
            value++;
            printf("\t 2-3rd : Thread %d's value = %d\n", objc_thread_id(), value );
        }
    printf("\n");
    gWait++;    /* count this thread as finished while the mutex is still held */
    objc_mutex_unlock( pMutex );
}

- (id)init
{
    if( [super init] != nil )
    {
        pMutex = objc_mutex_allocate();
        description = nil;
    }

    gWait = 0;

    return self;
}

- free
{
    printf("being deallocated..\n");

    objc_mutex_deallocate( pMutex );

    if( description )
    {
        [description free];
    }

    /* Let Object release the instance itself. */
    return [super free];
}
@end

File : main.m


#import <stdio.h>
#import "MyClass.h"
#include <windows.h>

void isMultithreaded(void)
{
    printf("The program is multithreaded\n");
}

int main( void )
{
    MyClass *myObject = [[MyClass alloc] init];

    [myObject setDescription:@"Hello, World\n"];
    [myObject description];

    objc_set_thread_callback( isMultithreaded);

    objc_thread_detach( @selector(doSomethingAtomically), myObject, nil);
    objc_thread_detach( @selector(doSomethingAtomically1), myObject, nil);
    objc_thread_detach( @selector(doSomethingAtomically), myObject, nil);

    objc_thread_yield();

    while( gWait < 3 )   /* three threads were detached above */
    {
        //printf("main : Shall I sleep?\n");
        Sleep(5);
    }

    [myObject free];

   return 0;
}

NOTE : The code above was written in a MinGW environment.