Performance Optimization

Results of a performance optimization study on both PowerPC and Core Duo machines. The same two functions were each run 100 times, and the best time for each was recorded as changes were made to the code and compiler flags.
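The measurement approach described above (run the function many times, keep the fastest run) can be sketched roughly as follows. This is only an illustration, not the harness used for the study: the helper name bestOfRunsMs and the choice of std::chrono::steady_clock are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper (not from the original study): calls `work` `runs`
// times and returns the fastest wall-clock time in milliseconds.
template <typename Fn>
double bestOfRunsMs(Fn&& work, int runs = 100) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = Clock::now();
        work();
        const auto stop = Clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best) best = ms;  // keep only the fastest run
    }
    return best;
}
```

Taking the minimum rather than the mean filters out runs that were disturbed by other activity on the machine, which is presumably why the best time is the one recorded.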

The "Sum" test sums 10,000 vectors (c = a + b).
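As a rough picture of what the "Sum" test might look like before any SIMD work, here is a minimal scalar sketch. The Vec4 type, its four-float layout, and the sumVectors function are assumptions made for illustration; the original Vector class is not shown in this write-up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical four-component vector; the real Vector class may differ.
struct Vec4 {
    float x, y, z, w;
};

// Component-wise addition, marked inline so the compiler can fold it
// into the loop below.
inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    return Vec4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// Computes c[i] = a[i] + b[i] for every element.
void sumVectors(const std::vector<Vec4>& a,
                const std::vector<Vec4>& b,
                std::vector<Vec4>& c) {
    assert(a.size() == b.size());
    c.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
}
```

With 10,000 elements in each input, one call to sumVectors would correspond to one run of the test under these assumptions.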

The "Diffuse" test runs a fluid diffusion pass on a 2D array of vectors.
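The exact diffusion code is not shown here, so the following is only a guess at its general shape: a simple explicit pass in which each interior cell moves toward the sum of its four neighbors. The getNeighborSum name comes from the results table; the Grid and Velocity types, the rate parameter, and the update formula are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-cell vector and row-major 2D grid.
struct Velocity {
    float x, y;
};

struct Grid {
    int width, height;
    std::vector<Velocity> cells;
    Grid(int w, int h)
        : width(w), height(h),
          cells(static_cast<std::size_t>(w) * h, Velocity{0.0f, 0.0f}) {}
    Velocity& at(int x, int y) {
        return cells[static_cast<std::size_t>(y) * width + x];
    }
    const Velocity& at(int x, int y) const {
        return cells[static_cast<std::size_t>(y) * width + x];
    }
};

// Sum of the four edge-adjacent neighbors of an interior cell.
inline Velocity getNeighborSum(const Grid& g, int x, int y) {
    const Velocity& l = g.at(x - 1, y);
    const Velocity& r = g.at(x + 1, y);
    const Velocity& u = g.at(x, y - 1);
    const Velocity& d = g.at(x, y + 1);
    return Velocity{l.x + r.x + u.x + d.x, l.y + r.y + u.y + d.y};
}

// One diffusion pass: new = c + rate * (neighborSum - 4 * c).
// Border cells are copied through unchanged.
void diffuse(const Grid& src, Grid& dst, float rate) {
    assert(src.width == dst.width && src.height == dst.height);
    dst.cells = src.cells;
    for (int y = 1; y < src.height - 1; ++y) {
        for (int x = 1; x < src.width - 1; ++x) {
            const Velocity c = src.at(x, y);
            const Velocity n = getNeighborSum(src, x, y);
            dst.at(x, y) = Velocity{c.x + rate * (n.x - 4.0f * c.x),
                                    c.y + rate * (n.y - 4.0f * c.y)};
        }
    }
}
```

The update is a multiply followed by an add per component, which is the kind of expression a fused multiply-add such as vec_madd can cover in a hand-tuned AltiVec version.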

PowerPC (G5, 1.8 GHz)

Change                            Sum      Diffuse
Baseline                          28 ms    48 ms
Switch to vFloat type             68 ms    116 ms
'inline' Vector ctor              69 ms    128 ms
AltiVec Vector functions          27 ms    62 ms
'inline' AltiVec functions        25 ms    58 ms
'inline' getNeighborSum()         25 ms    38 ms
Hand tune diffuse with vec_madd   n/a      23 ms
-mtune=G5                         24 ms    22 ms
-ffast-math=16                    24 ms    22 ms
-falign-loops=16                  24 ms    22 ms

Intel (Core Duo, 2 GHz)

Change        Sum      Diffuse
Baseline      43 ms    81 ms
Inline SSE    18 ms    29 ms
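The "Inline SSE" row is not broken down further, so the following is only a sketch of what an inlined SSE addition could look like, not the code that produced these numbers. The addFloats name, the flat float-array layout, and the SKETCH_HAS_SSE macro are assumptions; the intrinsics (_mm_loadu_ps, _mm_add_ps, _mm_storeu_ps) are the standard ones from xmmintrin.h, and a scalar fallback is included for compilers without SSE.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Computes c[i] = a[i] + b[i] for `count` floats, four at a time when
// SSE is available. `count` must be a multiple of 4.
inline void addFloats(const float* a, const float* b, float* c,
                      std::size_t count) {
    assert(count % 4 == 0);
#if SKETCH_HAS_SSE
    for (std::size_t i = 0; i < count; i += 4) {
        // Unaligned loads keep the sketch safe for any input pointer.
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
#else
    // Scalar fallback for targets without SSE.
    for (std::size_t i = 0; i < count; ++i) {
        c[i] = a[i] + b[i];
    }
#endif
}
```

If the data is known to be 16-byte aligned, the aligned variants _mm_load_ps and _mm_store_ps could be used instead of the unaligned ones.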

About Sean

Sean is a Technical Director at High Moon Studios where he creates console video games.