Memory alignment… run, you fools!
Memory alignment is probably quite an obscure matter for most C/C++ programmers. It was obscure for me too, until I began writing Obi a couple of years ago.
In this post I will explain what memory alignment is, why it matters for some, and how you can deal with it when using Eigen (the math library used by Obi).
What is it?
When you allocate memory, either explicitly on the heap or on the stack, the address where your reserved memory region resides is just a number. This number identifies the first byte of the region, as bytes are the minimum addressable memory unit.
For most types, this address will be a multiple of 2, 4, 8, 16, or a higher power of 2. This alignment is dictated by the processor architecture on a type-by-type basis, so different data types can have different default alignments. Your compiler will try to keep everything nicely aligned by automatically adding padding bytes at the end or in the middle of your structures.
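To make the padding business less abstract, here is a minimal sketch that asks the compiler what it actually did. The Sample struct and the print_layout() helper are made up for this post; the exact numbers depend on your platform, so I'm not promising any particular output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical struct, only here to show where padding ends up.
struct Sample {
    char tag;    // 1 byte
    int  count;  // usually 4 bytes, so padding is typically inserted before it
    char flag;   // 1 byte, typically followed by trailing padding
};

// Prints the alignment, size and member offsets chosen by the compiler.
inline void print_layout() {
    // Every member has to start at a multiple of its own alignment.
    assert(offsetof(Sample, count) % alignof(int) == 0);

    std::printf("alignof(int)     = %zu\n", alignof(int));
    std::printf("alignof(Sample)  = %zu\n", alignof(Sample));
    std::printf("sizeof(Sample)   = %zu\n", sizeof(Sample));
    std::printf("offset of count  = %zu\n", offsetof(Sample, count));
    std::printf("offset of flag   = %zu\n", offsetof(Sample, flag));
}
```

On a typical desktop compiler you will probably see that sizeof(Sample) is larger than the sum of its members. Those extra bytes are the padding.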
All of this is for performance reasons: the memory is divided into “blocks” of several bytes. When you ask for something stored in a certain memory address, the whole block containing this address is retrieved from memory. (Caches and memory locality are also related to this, but we won’t engage this beast now).
Now, imagine an object that could fit in a single 4-byte block, but has been placed so that its first half is in one block and its second half is in the next one (2-byte alignment). Instead of fetching just 1 block, the processor will have to fetch 2 to retrieve the object. Now suppose you have a large array of these 4-byte unaligned folks, and you’re iterating through it… see the problem, right?
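The block-counting argument above is just integer arithmetic, so it can be sketched in a few lines. The blocks_touched() function is a name I made up for illustration, and the "blocks" are the same simplified model used in this post, not a description of any particular processor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of block_size-byte blocks touched by an object of `size` bytes
// that starts at `address` (simplified model: blocks start at multiples
// of block_size).
inline std::size_t blocks_touched(std::uintptr_t address,
                                  std::size_t size,
                                  std::size_t block_size) {
    assert(size > 0 && block_size > 0);
    const std::uintptr_t first = address / block_size;               // block holding the first byte
    const std::uintptr_t last  = (address + size - 1) / block_size;  // block holding the last byte
    return static_cast<std::size_t>(last - first + 1);
}
```

With 4-byte blocks, a 4-byte object at address 8 touches one block (blocks_touched(8, 4, 4) is 1), while the same object at address 10 straddles two (blocks_touched(10, 4, 4) is 2).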
Why would I care about alignment?
Unless you’re writing extremely performance-critical code or dealing with vector instructions, you can leave all of this in your compiler’s hands. Some architectures (older ARM cores, for instance) don’t even support unaligned memory accesses in the general case, so there’s just no choice there.
Why do you care about alignment?
Now, when writing Obi I had to work with vector instructions. These are special assembly instructions that allow you to perform the exact same arithmetic/memory operation on multiple data at the same time. For instance, if you have to add together two points in 3D (3 floats each), using a vector instruction you can perform the 3 additions in the time you’d normally be able to perform just one.
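If you have never seen a vector instruction from C++, here is a minimal sketch using SSE intrinsics, assuming an x86 compiler that defines __SSE__ (or _M_X64 on MSVC). The function names and the ALIGN_DEMO_HAS_SSE macro are mine. Note that the 3D points are padded to 4 floats, and that I use the unaligned load/store variants on purpose so the example is safe on any input. On other architectures it simply falls back to the scalar loop.

```cpp
#include <cassert>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define ALIGN_DEMO_HAS_SSE 1
#endif

// Plain version: four separate additions.
inline void add4_scalar(const float* a, const float* b, float* out) {
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Vector version: one instruction adds all four lanes at once.
inline void add4_vector(const float* a, const float* b, float* out) {
    assert(a && b && out);
#if defined(ALIGN_DEMO_HAS_SSE)
    const __m128 va = _mm_loadu_ps(a);        // load 4 floats, no alignment required
    const __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));   // 4 additions in a single instruction
#else
    add4_scalar(a, b, out);                   // fallback when SSE is not available
#endif
}
```

Both functions produce the same result. The difference is only in how many instructions it takes to get there.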
When working with large quantities of the same stuff (particles and constraints, in Obi’s case) where you must perform the same calculations on all of them, vectorization can make a huge difference in performance. Vectorization is orthogonal to multithreading. This means you can distribute your work among multiple CPU cores and still have each one perform calculations on multiple data at a time.
But here’s the caveat: most vector instructions work on 2 or 4 operands, which must be laid out in memory in a very specific way. Single-precision numbers (floats) usually take 4 bytes each, and double-precision numbers (doubles) take 8 bytes. A vector instruction that operates on 4 consecutive floats needs the first one to be 4 x 4 = 16-byte aligned. The same goes for an instruction that operates on 2 doubles (2 x 8 = 16).
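Checking whether a pointer meets that 16-byte requirement is cheap. A minimal sketch, where is_aligned() is a helper name I made up (it assumes the alignment is a power of two and that converting a pointer to an integer gives you the address, which holds on the usual platforms):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if `p` is a multiple of `alignment`, which must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

For example, given alignas(16) float v[8], the pointer v is 16-byte aligned, v + 1 is only 4-byte aligned, and v + 4 is 16-byte aligned again because 4 floats x 4 bytes = 16.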
Well, what could go wrong?
What happens if you try to use a vector instruction that expects aligned data on a group of floats that is 4-byte, but not 16-byte aligned? BOOM. Your program just crashed. Most 64-bit systems allocate 16-byte aligned memory by default. So do some 32-bit systems. However, 32-bit Windows only guarantees 8-byte alignment.
So when dealing with vector instruction sets (SSE, AVX), correct memory alignment is absolutely crucial. Especially on Windows. The slightest alignment error can easily go undetected, and your program will continue to run perfectly on most systems. But it will crash someday on someone’s computer, and debugging it will be fun, to say the least.
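If you can't trust the default allocator to give you 16-byte aligned memory, one portable workaround is to over-allocate and then pick an aligned address inside the buffer. Here is a minimal sketch using the standard std::align function. The align_within() helper is hypothetical, and this is just one way of doing it, not what Obi or Eigen do internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Returns a pointer inside `storage` that is aligned to `alignment` and has
// room for `bytes` bytes after it, or nullptr if the buffer is too small.
// The caller should size `storage` as bytes + alignment - 1 to be safe.
inline void* align_within(std::vector<unsigned char>& storage,
                          std::size_t bytes,
                          std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
    void* p = storage.data();
    std::size_t space = storage.size();
    // std::align bumps p forward to the next aligned address if the request fits.
    return std::align(alignment, bytes, p, space);
}
```

To get room for 16 aligned floats (64 bytes) you would create a buffer of 64 + 15 bytes and call align_within(buffer, 64, 16). The returned pointer is only valid as long as the buffer is alive and not resized.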
How does Eigen deal with this?
Eigen will automatically use vectorization on fixed-size vectors and matrices whose size allows it (a multiple of 16 bytes, such as Vector4f or Matrix4f), unless you specify otherwise. There is no need to deal with intrinsics or assembly yourself; for the most part it all just works. However, there are some notable pitfalls that you must be aware of:
- If you declare a struct or class that contains fixed-size Eigen member variables, you must use the EIGEN_MAKE_ALIGNED_OPERATOR_NEW macro. This tells Eigen that the whole structure must be aligned when dynamically allocated, in order to keep its members aligned too.
- If you use STL containers to store objects of a fixed-size Eigen type (std::vector<Eigen::Quaternionf>, for instance), you must pass an aligned allocator to it (Eigen provides Eigen::aligned_allocator for this). This will tell the container how to properly allocate memory for these guys.
- NEVER, EVER, copy raw unaligned memory into Eigen types using memcpy(). Always use Eigen’s Map. By default it assumes the source memory is unaligned, and deals with it accordingly.