Here is a test I ran under Windows XP 64-bit on two machines: one with two dual-core Opterons @ 2.6 GHz, the other with two Xeon E5345s (quad-core @ 2.33 GHz).
Two threads, each sitting in an alertable WaitForMultipleObjectsEx with an infinite timeout.
We call QueueUserAPC(funcA, threadA) to start the ping-pong:
funcA() {
    QueueUserAPC(funcB, threadB);
}
funcB() {
    QueueUserAPC(funcA, threadA);
}
This basically measures context-switching speed. When the whole ping-pong runs in a single thread on the AMD machine (so no switch is needed), each function gets invoked about 550K times/sec.
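For reference, here is a minimal, self-contained sketch of what such a harness could look like. This is a reconstruction, not the original code: the thread/function names, the 5-second measuring window, and the per-function counters are illustrative assumptions.

#include <windows.h>
#include <stdio.h>

static HANDLE g_threadA, g_threadB;      /* worker thread handles        */
static volatile LONG g_countA, g_countB; /* APC invocations per function */

static VOID CALLBACK funcB(ULONG_PTR unused);

/* Each APC just re-queues the other APC onto the other thread. */
static VOID CALLBACK funcA(ULONG_PTR unused)
{
    (void)unused;
    InterlockedIncrement(&g_countA);
    QueueUserAPC(funcB, g_threadB, 0);
}

static VOID CALLBACK funcB(ULONG_PTR unused)
{
    (void)unused;
    InterlockedIncrement(&g_countB);
    QueueUserAPC(funcA, g_threadA, 0);
}

/* The only alertable wait in each thread: APCs are dispatched here. */
static DWORD WINAPI workerThread(LPVOID param)
{
    HANDLE neverSignaled = (HANDLE)param;
    for (;;)
        WaitForMultipleObjectsEx(1, &neverSignaled, FALSE, INFINITE, TRUE);
    return 0;
}

int main(void)
{
    HANDLE ev = CreateEvent(NULL, TRUE, FALSE, NULL); /* never signaled */
    g_threadA = CreateThread(NULL, 0, workerThread, ev, 0, NULL);
    g_threadB = CreateThread(NULL, 0, workerThread, ev, 0, NULL);

    QueueUserAPC(funcA, g_threadA, 0);   /* kick off the ping-pong */
    Sleep(5000);                         /* measure for 5 seconds  */

    printf("funcA: %ld calls/sec, funcB: %ld calls/sec\n",
           g_countA / 5, g_countB / 5);
    return 0;
}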
With two threads, the AMD machine does 140K calls/sec for each function.
The Intel machine can only pull in 62K calls/sec.
Now, this is a very realistic kind of load for a multi-threaded app. A real-life app doesn't spend all of its time doing integer math; it does a lot of context switching, if only to and from the OS. When every 62 context switches eat up a whole millisecond (roughly 16 microseconds per switch), it's very hard to build something that is both multi-threaded and fast.
By the way, in case you wonder, this IS the fastest way to force a context switch that I know of. I also tried passing a 1-byte token between the two threads over a pair of anonymous pipes (each thread does a blocking read of 1 byte, then writes the byte back for the other thread to receive). The results are not inspiring -- about 80K tokens/sec per thread on the AMD machine, even fewer on the Xeon.
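For comparison, here is a sketch of that pipe-based ping-pong. Again a reconstruction, not the original test: the pipe pairing, the 5-second window, and all names are illustrative assumptions.

#include <windows.h>
#include <stdio.h>

static HANDLE g_readAB, g_writeAB;  /* pipe: A -> B          */
static HANDLE g_readBA, g_writeBA;  /* pipe: B -> A          */
static volatile LONG g_tokens;      /* round trips seen by A */

static DWORD WINAPI threadA(LPVOID unused)
{
    BYTE token = 1; DWORD n;
    (void)unused;
    WriteFile(g_writeAB, &token, 1, &n, NULL);   /* start the game */
    for (;;) {
        ReadFile(g_readBA, &token, 1, &n, NULL); /* blocking read  */
        InterlockedIncrement(&g_tokens);
        WriteFile(g_writeAB, &token, 1, &n, NULL);
    }
    return 0;
}

static DWORD WINAPI threadB(LPVOID unused)
{
    BYTE token; DWORD n;
    (void)unused;
    for (;;) {
        ReadFile(g_readAB, &token, 1, &n, NULL);
        WriteFile(g_writeBA, &token, 1, &n, NULL);
    }
    return 0;
}

int main(void)
{
    CreatePipe(&g_readAB, &g_writeAB, NULL, 0);
    CreatePipe(&g_readBA, &g_writeBA, NULL, 0);
    CreateThread(NULL, 0, threadA, NULL, 0, NULL);
    CreateThread(NULL, 0, threadB, NULL, 0, NULL);
    Sleep(5000);
    printf("%ld tokens/sec per thread\n", g_tokens / 5);
    return 0;
}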
More numbers: Dual AMD Opteron 285 (2x dual-core, 2.6 GHz), Windows 32-bit: 105K calls/sec.
Dual Xeon X5365 (2x quad-core, 3.00 GHz): 101K calls/sec.
The conclusion: something about Intel processors really makes context switching slow.
Another conclusion: whenever possible, use x64 code. It's not the "large memory" benefit; it's the extra 8 general-purpose registers that will make your code fly.
How did you get around "If you perform an alertable wait inside an APC, it will recursively dispatch the APCs. This can cause a stack overflow"?
The answer is that you don't perform an alertable wait inside an APC. Each thread enters an alertable wait in only one place; call that the main loop.
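In code, the pattern looks roughly like this (a sketch; the stop event and the names are my own illustration): the top-level loop is the only place the thread waits alertably, and the APC handlers never call SleepEx or any WaitFor*Ex with the alertable flag set, so dispatch can never recurse.

#include <windows.h>

static VOID CALLBACK myApc(ULONG_PTR param)
{
    /* Do the work here; queue further APCs if you like, but never
       perform an alertable wait from inside an APC handler.       */
    (void)param;
}

static DWORD WINAPI apcMainLoop(LPVOID param)
{
    HANDLE stopEvent = (HANDLE)param;   /* signaled at shutdown */
    for (;;) {
        /* The one and only alertable wait in this thread. */
        DWORD r = WaitForMultipleObjectsEx(1, &stopEvent, FALSE,
                                           INFINITE, TRUE);
        if (r == WAIT_OBJECT_0)
            break;                      /* stop requested */
        /* r == WAIT_IO_COMPLETION: one or more APCs just ran; keep waiting. */
    }
    return 0;
}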