There was a great talk on quaternions at GDC 2023 by Hamish Todd. I stayed in Room 2001 all day with Prof. Squirrel and those talks were amazing. You should definitely check the GDC Vault if you missed them!
This project is still at an early stage, so there’s not much to share yet, but here are some benchmarks I ran.
The benchmarks are done on a Terrans Force AMD 27X7SH1 laptop. Here’s the environment:
- CPU: AMD Ryzen 7 3700X 8-Core Processor 3.59 GHz
- RAM: 16GB DDR4 3200MHz
- Windows Version: Windows 10 Pro Version 22H2 OS build 19045.2728
Compiler flags impact these benchmark results A LOT. If you force your structs to use 16-byte alignment (alignas(16)), the compiler might do a much better job at vectorization than your handwritten SIMD intrinsics.
The big lesson I learnt from this is to run as many benchmarks as you can with different compiler flags.
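To make the alignment point concrete, here is a minimal sketch of a plain (non-intrinsic) struct with forced 16-byte alignment. The names Vec4Aligned and add are made up for illustration and are not taken from this project; whether the compiler actually vectorizes add depends on the compiler and flags, so it is worth checking the generated assembly.

```cpp
#include <cstddef>

// Hypothetical 4-float vector; alignas(16) guarantees every instance starts
// on a 16-byte boundary, so the compiler is free to use aligned 128-bit
// loads and stores on it.
struct alignas(16) Vec4Aligned {
    float x, y, z, w;
};

static_assert(alignof(Vec4Aligned) == 16, "expected 16-byte alignment");
static_assert(sizeof(Vec4Aligned) == 16, "expected four packed floats");

// Plain scalar code with no intrinsics. With optimizations enabled, the
// compiler may turn the four additions into a single SIMD add.
inline Vec4Aligned add(const Vec4Aligned& a, const Vec4Aligned& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}
```

Without alignas(16), the same struct would typically only be aligned to 4 bytes, which is one reason the compiler may be more conservative with it.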
Vectors
Normalization
Name | Average Time |
---|---|
Plain Normalize | 5.1861 ns |
SIMD Normalize (_mm_sqrt_ps + _mm_div_ps) | 2.26811 ns |
SIMD Fast Normalize (_mm_rsqrt_ps + _mm_mul_ps) | 1.19543 ns |
Cross Product
Name | Average Time |
---|---|
Plain Cross Product | 4.63804 ns |
SIMD Cross Product | 0.840373 ns |
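For reference, a common way to write a SIMD cross product uses three shuffles, two multiplies and one subtract. This is a sketch of that well-known trick under the assumption of a 16-byte aligned (x, y, z, w) layout; it may or may not match the implementation measured above, and Vec4 and cross_simd are hypothetical names.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical vector type; w is ignored by the cross product.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

inline Vec4 cross_simd(const Vec4& a, const Vec4& b) {
    __m128 va = _mm_load_ps(&a.x);
    __m128 vb = _mm_load_ps(&b.x);
    // Rotate lanes to (y, z, x, w).
    __m128 a_yzx = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 b_yzx = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 0, 2, 1));
    // c = a * b.yzx - a.yzx * b, which holds (cross.z, cross.x, cross.y, 0).
    __m128 c = _mm_sub_ps(_mm_mul_ps(va, b_yzx), _mm_mul_ps(a_yzx, vb));
    // Rotate once more to put the components back in (x, y, z) order.
    __m128 r = _mm_shuffle_ps(c, c, _MM_SHUFFLE(3, 0, 2, 1));
    Vec4 out;
    _mm_store_ps(&out.x, r);
    return out;
}
```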
Matrices
Matrix Multiplication
Name | Average Time |
---|---|
Plain Matrix Multiplication | 19.6323 ns |
SIMD Matrix Multiplication | 4.29214 ns |
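A typical SSE formulation of 4x4 matrix multiplication builds each result column as a linear combination of the left matrix's columns. The sketch below assumes column-major storage (as glm uses) and a hypothetical Mat4 type; it is meant to illustrate the idea rather than reproduce the benchmarked code.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical column-major 4x4 matrix: element (row, col) is m[col * 4 + row].
struct alignas(16) Mat4 {
    float m[16];
};

inline Mat4 mul_simd(const Mat4& a, const Mat4& b) {
    // Load the four columns of a once.
    const __m128 a0 = _mm_load_ps(a.m + 0);
    const __m128 a1 = _mm_load_ps(a.m + 4);
    const __m128 a2 = _mm_load_ps(a.m + 8);
    const __m128 a3 = _mm_load_ps(a.m + 12);

    Mat4 r;
    for (int c = 0; c < 4; ++c) {
        const float* bc = b.m + 4 * c;  // column c of b
        // Result column c = a0*b[0][c] + a1*b[1][c] + a2*b[2][c] + a3*b[3][c].
        __m128 col = _mm_mul_ps(a0, _mm_set1_ps(bc[0]));
        col = _mm_add_ps(col, _mm_mul_ps(a1, _mm_set1_ps(bc[1])));
        col = _mm_add_ps(col, _mm_mul_ps(a2, _mm_set1_ps(bc[2])));
        col = _mm_add_ps(col, _mm_mul_ps(a3, _mm_set1_ps(bc[3])));
        _mm_store_ps(r.m + 4 * c, col);
    }
    return r;
}
```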
LookAt Matrix
Name | Average Time |
---|---|
glm::lookAt | 28.2944 ns |
SIMD LookAt Matrix | 1.91781 ns |
Perspective Matrix
This one is kind of funny because I didn’t really do anything SIMD other than using an aligned struct. There might be something wrong with glm’s implementation, or I might be using glm::perspective the wrong way.
Name | Average Time |
---|---|
glm::perspective | 16.4587 ns |
SIMD Perspective Matrix | 1.90413 ns |
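Since the "SIMD" version here is described as just an aligned struct, a plain implementation might look roughly like the sketch below. It assumes the conventional OpenGL-style projection (right-handed, depth mapped to -1..1, column-major storage); whether that matches the version benchmarked above is an assumption, and Mat4 and perspective are hypothetical names.

```cpp
#include <cmath>

// Hypothetical column-major 4x4 matrix: element (row, col) is m[col * 4 + row].
struct alignas(16) Mat4 {
    float m[16];
};

// fovy is the vertical field of view in radians.
inline Mat4 perspective(float fovy, float aspect, float z_near, float z_far) {
    const float f = 1.0f / std::tan(fovy * 0.5f);
    Mat4 r{};  // value-initialization zeroes all 16 entries
    r.m[0] = f / aspect;
    r.m[5] = f;
    r.m[10] = -(z_far + z_near) / (z_far - z_near);
    r.m[11] = -1.0f;
    r.m[14] = -(2.0f * z_far * z_near) / (z_far - z_near);
    return r;
}
```

Only five entries are non-zero, so there is very little arithmetic to vectorize; most of the cost is the tangent, the divisions and the stores.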
Quaternions
Quaternions are very funny. At first I thought I was doing a fantastic job optimizing with SIMD intrinsics. Then I added alignas(16) to those plain quaternion structs, and suddenly the compiler-vectorized versions were blazing fast 😥.
My guess is that I was using too many _mm_shuffle_ps calls. I might need to do some profiling with AMD uProf later.
Hamilton Product
Name | Average Time |
---|---|
Plain Hamilton Product | 0.240584 ns |
SIMD Hamilton Product | 1.27795 ns |
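For context, the "plain" side of this comparison could be as simple as the scalar Hamilton product below on a 16-byte aligned struct. This is a sketch with hypothetical names (Quat, hamilton) and an assumed (x, y, z, w) layout with w as the real part, not the project's actual code.

```cpp
// Hypothetical quaternion type: (x, y, z) is the vector part, w the real part.
struct alignas(16) Quat {
    float x, y, z, w;
};

// Scalar Hamilton product a * b with no intrinsics; the alignment leaves the
// compiler free to vectorize it if it decides that is profitable.
inline Quat hamilton(const Quat& a, const Quat& b) {
    return {
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,  // x
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,  // y
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,  // z
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z   // w
    };
}
```

A quick sanity check is that i * j gives k while j * i gives -k, since the product is not commutative.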
Matrix Conversion
Name | Average Time |
---|---|
Plain Matrix Conversion | 1.91547 ns |
SIMD Matrix Conversion | 1.95097 ns |