Blog Archive

Friday, May 30, 2008

A program to measure integer, multi-threading and context-switching performance. More info at

www.alexeilebedev.com/benchmark

You can see for yourself how slow your Intel quad core chip really is.

Wednesday, May 28, 2008

Opteron 285 context switch twice as fast as Xeon E5345

Here is a test I ran under Windows XP 64-bit on two machines: one with two dual-core Opteron 285s @ 2.6 GHz, the other with two Xeon E5345s (quad-core @ 2.33 GHz).

Two threads each sit in an alertable WaitForMultipleObjectsEx with an infinite timeout.

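We kick things off with QueueUserAPC(funcA, threadA, 0). From then on, each APC queues the other:

// Runs on thread A; hands control to thread B.
VOID CALLBACK funcA(ULONG_PTR param) {
    QueueUserAPC(funcB, threadB, 0);
}
// Runs on thread B; hands control back to thread A.
VOID CALLBACK funcB(ULONG_PTR param) {
    QueueUserAPC(funcA, threadA, 0);
}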

This basically measures context-switching speed. When run in a single thread on the AMD machine, each function gets invoked about 550K times/sec.

With two threads, the AMD machine does 140K calls/sec for each function.

The Intel machine can only pull in 62K calls/sec.

Now, this is a very realistic kind of load for a multi-threaded app. A real-life app is not spending all of its time doing integer math, but in fact it does do a lot of context switching, if only to/from the OS. When every 62 context switches eat up a whole millisecond of your time, it's very hard to build something both multi-threaded and fast.
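To put the rates above in per-call terms, here is a small sketch in C. The helper names (usec_per_call, report, print_post_figures) are made up for illustration and are not part of the benchmark. It assumes, as this post does, that each call corresponds to one forced switch.

```c
#include <assert.h>
#include <stdio.h>

/* Microseconds spent per call at a given rate (calls per second). */
double usec_per_call(double calls_per_sec) {
    assert(calls_per_sec > 0.0);
    return 1e6 / calls_per_sec;
}

/* Print one row: label, measured rate, and the derived per-call cost. */
void report(const char *label, double calls_per_sec) {
    printf("%-34s %8.0f calls/sec  %6.2f us/call\n",
           label, calls_per_sec, usec_per_call(calls_per_sec));
}

/* The three rates quoted above (Windows XP 64-bit). */
void print_post_figures(void) {
    report("Opteron 285, single thread", 550000.0);
    report("Opteron 285, two threads", 140000.0);
    report("Xeon E5345, two threads", 62000.0);
}
```

Calling print_post_figures() from main prints one row per measurement. At 62K calls/sec, each call costs a little over 16 microseconds.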

By the way, in case you wonder, this IS the fastest way to force a context switch that I know of. I also tried passing a 1-byte token between two threads over two anonymous pipes (i.e. each thread does a blocking read of 1 byte, and then writes it back for the other thread to receive). The results are not inspiring -- about 80K tokens/sec per thread on AMD, even fewer on Xeon.
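For anyone who wants to try the pipe variant, here is a rough sketch of the same idea. Note the assumptions: this is a POSIX analog (pipe and pthreads) rather than the Windows anonymous pipes used for the numbers above, and the names echo_args, echo_thread and pipe_ping_pong are made up for illustration. It is not the code that produced the measurements.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>

/* The echo thread owns these two pipe ends and closes them when done. */
struct echo_args {
    int in_fd;        /* read end: tokens arriving from the main thread */
    int out_fd;       /* write end: tokens going back to the main thread */
    long round_trips; /* how many tokens to echo */
};

/* Block on a 1-byte read, then write the byte straight back. */
static void *echo_thread(void *p) {
    struct echo_args *a = p;
    char token;
    for (long i = 0; i < a->round_trips; i++) {
        if (read(a->in_fd, &token, 1) != 1) break;
        if (write(a->out_fd, &token, 1) != 1) break;
    }
    close(a->in_fd);
    close(a->out_fd);
    return NULL;
}

/* Bounce a 1-byte token between two threads over two pipes.
   Returns the number of completed round trips, or -1 if setup fails.
   If elapsed_sec is non-NULL, it receives the wall-clock time taken. */
long pipe_ping_pong(long round_trips, double *elapsed_sec) {
    assert(round_trips > 0);
    int a_to_b[2], b_to_a[2];
    if (pipe(a_to_b) != 0) return -1;
    if (pipe(b_to_a) != 0) {
        close(a_to_b[0]); close(a_to_b[1]);
        return -1;
    }

    struct echo_args args = { a_to_b[0], b_to_a[1], round_trips };
    pthread_t echo;
    if (pthread_create(&echo, NULL, echo_thread, &args) != 0) {
        close(a_to_b[0]); close(a_to_b[1]);
        close(b_to_a[0]); close(b_to_a[1]);
        return -1;
    }

    struct timespec t0, t1;
    char token = 'x';
    long done = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < round_trips) {
        if (write(a_to_b[1], &token, 1) != 1) break; /* wake the echo thread */
        if (read(b_to_a[0], &token, 1) != 1) break;  /* sleep until it answers */
        done++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Closing our write end gives the echo thread EOF if we stopped early. */
    close(a_to_b[1]);
    pthread_join(echo, NULL);
    close(b_to_a[0]);

    if (elapsed_sec)
        *elapsed_sec = (double)(t1.tv_sec - t0.tv_sec)
                     + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return done;
}
```

Build with -pthread and call something like pipe_ping_pong(100000, &secs). Dividing the returned count by secs gives round trips per second, where each round trip forces at least two switches when the threads share a core. Numbers from a POSIX system are not directly comparable with the Windows figures in this post.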

More numbers:
Dual AMD Opteron 285 (2x dual-core, 2.6 GHz), Windows 32-bit: 105K calls/sec
Dual Xeon X5365 (2x quad-core, 3.00 GHz): 101K calls/sec

The conclusion: something about Intel processors really makes context switching slow.
Another conclusion: whenever possible, use x64 code. It's not the "large memory" benefit, it's the extra 8 registers that will make your code fly.