Weirdness of the High Resolution Counter, i.e. QueryPerformanceCounter()

Most of the time, using clock() to measure the performance of a block of code is enough.
However, there are cases where you want to compare two logically identical but differently implemented blocks.
Let’s assume that you want to compare the performance of the intrinsic version of strcpy against your own strcpy written with SIMD instructions.
In most cases, tick-based functions like clock() and GetTickCount() will not reveal the difference between them.

So, you decide to use a high-performance, or high-resolution, timer. Windows provides two functions for that purpose.

1. QueryPerformanceCounter( LARGE_INTEGER *pVal )
This function is like clock(): the value returned in the location pointed to by pVal is a number of counts, just as clock() returns a number of ticks.

2. QueryPerformanceFrequency( LARGE_INTEGER *pVal )
This returns how many times the counter increments per second.

So, a counter value can be converted to seconds like this:

    LARGE_INTEGER aVal, aFreq;
    __int64 duration_in_time;
    
    QueryPerformanceCounter( &aVal );
    QueryPerformanceFrequency( &aFreq );
    duration_in_time = aVal.QuadPart / aFreq.QuadPart;

However, this has some glitches on contemporary CPUs.

Before getting to the glitches, let’s take a look at how LARGE_INTEGER is declared.

typedef union _LARGE_INTEGER {
    struct {
        DWORD LowPart;
        LONG HighPart;
    };
    struct {
        DWORD LowPart;
        LONG HighPart;
    } u;
    LONGLONG QuadPart;
} LARGE_INTEGER;

LONGLONG is the __int64 type. So, if your compiler and CPU support 64-bit integers, you can access the whole content of the LARGE_INTEGER through QuadPart.

The 1st glitch is that the returned count easily exceeds the 32-bit boundary, because current CPUs are so fast.
(At a frequency in the GHz range, a 32-bit counter wraps within a couple of seconds, so a plain 32-bit variable is useless here.
The signed 64-bit QuadPart, on the other hand, has room for centuries’ worth of counts, so in practice it will not overflow;
if you are still worried, you can use unsigned __int64 for the extra bit.)

The 2nd glitch is that you can’t print the result properly. %I64d on aVal.QuadPart/aFreq.QuadPart prints an integer, with the fractional part of the division already truncated.
%Lf doesn’t solve the problem either: passing a 64-bit integer where printf expects a floating-point value is undefined. Then how to display it properly?

printf("%f", (double)aVal.QuadPart/(double)aFreq.QuadPart);

Cast both operands to double first: double is a 64-bit floating-point type, the division keeps its fractional part, and %f prints it correctly.

The 3rd glitch is the real glitch.
Let’s take a look at this screenshot from a real invocation of the code.

Hmm… why is the high resolution counter not reliable?
By searching on Google, I found a clue: it is due to SpeedStep or a similar technology that changes the CPU speed on demand.
Because the counter has such high resolution, these frequency changes show up directly in the measurements.
I read somewhere on Intel’s forum that Intel or MS was working on making the call measure on the FSB side instead of the inner core of the CPU.
By doing so, it is said, the function would return more reliable values even when a CPU’s battery-saving technology kicks in.

I assume that the GetTickCount() Win32 API function is based on the same clock as clock(). However, it displays the expected result seemingly reliably.
clock()/CLOCKS_PER_SEC displays 2 and 1.9… from time to time.

GetTickCount() probably has the lowest resolution of the three.
However, one convenient aspect of GetTickCount() is that it returns a value already in milliseconds, if what you want is “time” instead of a number of ticks.
So you don’t need to divide by some constant like CLOCKS_PER_SEC. By that logic it should have been named GetTickTime().
Well… the function name misleads again, but it is the brain-child of MS.

Finally, here is a screenshot when all of them return good results. :)
