Implementation of aligned memory allocation

Thanks to advances in CPU architecture, you have a supercomputer in your house. Getting the full benefit from it, however, takes some technique. Some of these techniques are new, while others are very traditional.

One such traditional technique is aligned memory allocation. A CPU accesses memory most efficiently in units of a certain size, so it is better if the allocated memory starts on a boundary of that size. For example, suppose a CPU is most efficient when memory is aligned on 4-byte boundaries. Accessing data at an address that is a multiple of 4 is then faster than accessing it at an address such as 0x03 or 0x05 (and on some architectures, misaligned access is not allowed at all).
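To make "aligned" concrete: when the alignment is a power of two, an address is aligned exactly when its low bits are zero. The helper name `is_aligned` below is hypothetical, and the sketch assumes `alignment` is a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: true if addr is a multiple of alignment.
// Assumes alignment is a power of two, so (alignment - 1) is a mask
// of the low bits that must all be zero.
static bool is_aligned(uintptr_t addr, size_t alignment)
{
    return (addr & (uintptr_t)(alignment - 1)) == 0;
}
```

With 4-byte alignment, `is_aligned(0x08, 4)` is true, while `is_aligned(0x03, 4)` and `is_aligned(0x05, 4)` are false.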

The standard malloc() on both Unix and Windows returns memory aligned only for the largest fundamental type (typically 8 or 16 bytes). For larger boundaries, POSIX systems provide posix_memalign(), and Windows provides _aligned_malloc().
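If the platform facilities are all you need, a thin wrapper over them might look like the sketch below. The wrapper names `platform_aligned_alloc` and `platform_aligned_free` are hypothetical; `_aligned_malloc()`/`_aligned_free()` are the Windows CRT functions and `posix_memalign()` is the POSIX one. It assumes `alignment` is a power of two and, for POSIX, a multiple of `sizeof(void *)`.

```c
#include <stddef.h>
#include <stdint.h>   // uintptr_t, handy for checking the returned address
#include <stdlib.h>   // posix_memalign() and free() on POSIX systems
#ifdef _WIN32
#include <malloc.h>   // _aligned_malloc(), _aligned_free()
#endif

// Hypothetical wrapper: returns size bytes aligned to alignment, or NULL.
void *platform_aligned_alloc(size_t size, size_t alignment)
{
#ifdef _WIN32
    return _aligned_malloc(size, alignment);
#else
    void *p = NULL;
    // posix_memalign() returns 0 on success, or an error code
    // (EINVAL for a bad alignment, ENOMEM when out of memory).
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
#endif
}

// Hypothetical wrapper: each allocator needs its matching release function.
void platform_aligned_free(void *p)
{
#ifdef _WIN32
    _aligned_free(p);   // memory from _aligned_malloc() must not go to free()
#else
    free(p);            // memory from posix_memalign() is released with free()
#endif
}
```

The asymmetry in the free function is the practical reason a custom allocator must also come with its own matching free.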

So how do you write an aligned malloc() function yourself? There may be special cases where you want your own implementation, although I can't think of one offhand. Let's figure out how it works by looking at an existing implementation. (If you search for one on Google, you will find that most are similar.)


#include <stdint.h>   // uintptr_t
#include <stdio.h>    // printf (debug trace)
#include <stdlib.h>   // malloc, free

// size : the size of allocated memory
//        The actual size of allocation will be greater than this size.
// alignment : the alignment boundary; must be a power of two
//             (and at least sizeof(void *) so the stored pointer is itself aligned)
void *aligned_memory_alloc( size_t size, size_t alignment )
{
	void *pa, *ptr;

	// 1
	pa=malloc((size+alignment-1)+sizeof(void *));
	if(!pa)
		return NULL;

	// 2
	ptr=(void*)( ((uintptr_t)pa+sizeof(void *)+alignment-1)&~(uintptr_t)(alignment-1) );

	// 3
	*((void **)ptr-1)=pa;

	// debug trace; %zu and %p print size_t and pointers correctly on 32- and 64-bit
	printf("aligned_memory_alloc(%zu,%zu)=%p\n", size, alignment, ptr);

	return ptr;
}

The point is to allocate more space than requested and return a pointer to a position inside that block. That position is the aligned location.

At 1, the total space to allocate is:

size ; the space the user wants
+ (alignment-1) ; the farthest the start may have to move to reach an aligned address
+ sizeof (void *) ; a slot just before the aligned address that stores
; the start address of the whole block (pa), needed later by free()
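The same sum as a function, as a sketch: the name `aligned_request_size` is hypothetical, and it assumes `alignment` is at least 1. It also adds an overflow check that the original does not have, since a huge `size` could otherwise wrap around and produce a too-small request.

```c
#include <stddef.h>
#include <stdint.h>   // SIZE_MAX

// Hypothetical helper: bytes requested from malloc() at step 1,
// i.e. size + (alignment-1) + sizeof(void *).
// Returns 0 if the sum would not fit in size_t, so the caller can fail
// instead of silently allocating a block that is too small.
size_t aligned_request_size(size_t size, size_t alignment)
{
    size_t extra = (alignment - 1) + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return 0;   // would overflow
    return size + extra;
}
```

For example, on a 64-bit system (8-byte pointers), 100 bytes with 16-byte alignment need 100 + 15 + 8 = 123 bytes.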

The size part is obvious. The sizeof (void *) part is what makes this design work: without it, there would be nowhere to record pa, and the free function would have no way to find the address that malloc() originally returned.

For the (alignment-1) part, the original post illustrates 4-byte alignment with a picture.

In it, red arrows show where the start must move if malloc() returns address 1, 2 or 3 (it moves to 4), or 5, 6 or 7 (it moves to 8). So the start can shift forward by at most 3 bytes, and that many extra bytes, i.e. (alignment-1), must be reserved.
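The rounding described above can be written as a one-line function. This is a sketch with a hypothetical name, `align_up`, assuming `alignment` is a power of two.

```c
#include <stdint.h>

// Hypothetical helper: round addr up to the next multiple of alignment
// (a power of two). Adding alignment-1 pushes any unaligned address past
// the next boundary; the mask then clears the low bits to land on it.
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

With alignment 4, addresses 1, 2 and 3 round up to 4, addresses 5, 6 and 7 round up to 8, and an already aligned address such as 4 or 8 stays where it is, so the shift never exceeds 3.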

At 2, ptr is set to the aligned location. It first skips sizeof(void *) bytes past pa to reserve room for storing pa, the start of the whole block, and then rounds that address up to the next multiple of alignment, just as in the 1, 2, 3 to 4 example above: adding (alignment-1) pushes the address past the next boundary, and ANDing with the one's complement of (alignment-1) clears the low bits. For 4-byte alignment, ~(alignment-1) is ...11111100 in binary, so the AND clears the lowest 2 bits. This trick only works when alignment is a power of two.
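To see why the pointer slot always fits and the user's size bytes always fit, the sketch below repeats step 2 on a plain integer standing in for pa and reports how far ptr lands past it. The name `aligned_offset` is hypothetical; `alignment` is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: apply the step-2 formula to an integer address pa
// and return how many bytes the aligned pointer lies past pa.
size_t aligned_offset(uintptr_t pa, size_t alignment)
{
    uintptr_t ptr = (pa + sizeof(void *) + alignment - 1)
                    & ~(uintptr_t)(alignment - 1);
    return (size_t)(ptr - pa);
}
```

The offset is always between sizeof(void *) and sizeof(void *) + alignment - 1. The lower bound leaves room for the stored pointer; the upper bound means ptr + size never runs past the size + (alignment-1) + sizeof(void *) bytes requested at step 1. For example, `aligned_offset(0x1000, 16)` is 16 on both 32-bit and 64-bit systems.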

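At 3, the address pa, which points to the start of the whole allocated block, is saved in the pointer-sized slot immediately before ptr. The cast to (void **) matters for two reasons: pointer arithmetic on void * is not allowed in standard C, and with a void ** the expression ptr-1 steps back exactly sizeof(void *) bytes to a slot that holds a void *, so the slot's address is a pointer to a pointer.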
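An equivalent way to write step 3, shown as a sketch with hypothetical helper names `store_base` and `load_base`: `(void **)ptr - 1` is the same address as `(char *)ptr - sizeof(void *)`. Copying through memcpy() also avoids requiring that slot to be pointer-aligned, which matters if someone passes an alignment smaller than sizeof(void *).

```c
#include <string.h>

// Hypothetical helpers: save and read the original malloc() pointer in the
// sizeof(void *) bytes immediately before the aligned pointer.
// (char *) is used for the arithmetic because arithmetic on void * is not
// standard C; memcpy() tolerates an unaligned slot.
void store_base(void *ptr, void *pa)
{
    memcpy((char *)ptr - sizeof(void *), &pa, sizeof pa);
}

void *load_base(void *ptr)
{
    void *pa;
    memcpy(&pa, (char *)ptr - sizeof(void *), sizeof pa);
    return pa;
}
```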

And finally it returns the aligned memory location, ptr.

What about freeing? You cannot pass the aligned address to free(); free() must receive exactly the pointer that malloc() returned, so the whole block has to be released through pa. That is why the start address of the whole block was saved above.

void aligned_free( void *ptr )
{
    // debug trace; %p prints the full pointer on 32- and 64-bit
    printf("aligned_free(%p)\n", ptr);
	if(ptr)
		free(*((void **)ptr-1));   // release the original malloc() block, pa
}
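A quick way to check the pair is to allocate with several alignments, verify each address, write to the whole requested range, and free it. The sketch below repeats both functions without the debug printf so it is self-contained; `aligned_alloc_smoke_test` is a hypothetical name, and the alignments chosen are all powers of two no smaller than sizeof(void *) on common 32- and 64-bit systems.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Compact copy of aligned_memory_alloc()/aligned_free() above, minus the printf.
void *aligned_memory_alloc(size_t size, size_t alignment)
{
    void *pa = malloc(size + alignment - 1 + sizeof(void *));
    if (!pa)
        return NULL;
    void *ptr = (void *)(((uintptr_t)pa + sizeof(void *) + alignment - 1)
                         & ~(uintptr_t)(alignment - 1));
    *((void **)ptr - 1) = pa;   // remember where the whole block starts
    return ptr;
}

void aligned_free(void *ptr)
{
    if (ptr)
        free(*((void **)ptr - 1));
}

// Hypothetical smoke test: returns 1 if every block is aligned and writable.
int aligned_alloc_smoke_test(void)
{
    static const size_t alignments[] = { 8, 16, 32, 64, 4096 };
    for (size_t i = 0; i < sizeof alignments / sizeof alignments[0]; i++) {
        size_t a = alignments[i];
        unsigned char *p = aligned_memory_alloc(100, a);
        if (!p)
            return 0;
        int ok = ((uintptr_t)p & (a - 1)) == 0;
        memset(p, 0xAB, 100);   // the whole requested range must be usable
        aligned_free(p);
        if (!ok)
            return 0;
    }
    return 1;
}
```

Running such a check under a memory checker (for example AddressSanitizer or Valgrind) would also flag a free() of the wrong address or a write past the end of the block.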

Exactly one pointer width before the aligned address (4 bytes on a 32-bit CPU, 8 bytes on a 64-bit CPU), the block stores the start address of the whole allocation.
Using sizeof(void *) and (void **) to locate that slot, instead of a hard-coded 4 or 8, keeps the code correct on both 32-bit and 64-bit architectures, and on any other platform, since the compiler supplies the pointer size.
