Wednesday, May 28, 2008

Opteron 285 context switch twice as fast as Xeon E5345

Here is a test I ran under Windows XP 64-bit on two machines: one with two dual-core Opteron 285s @ 2.6 GHz, the other with two Xeon E5345s (quad-core, Core microarchitecture @ 2.33 GHz).

Two threads each sit in WaitForMultipleObjectsEx with an infinite timeout, as an alertable wait (bAlertable = TRUE) so that queued APCs can run.

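We start the ping-pong by calling QueueUserAPC(funcA, hThreadA, 0). From then on, each APC queues the other:

static HANDLE hThreadA, hThreadB;   /* handles of the two waiting threads */
VOID CALLBACK funcB(ULONG_PTR param);

VOID CALLBACK funcA(ULONG_PTR param) {
    QueueUserAPC(funcB, hThreadB, 0);   /* funcB will run on thread B */
}
VOID CALLBACK funcB(ULONG_PTR param) {
    QueueUserAPC(funcA, hThreadA, 0);   /* funcA will run on thread A */
}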

Since each APC has to run on the other thread, this ping-pong essentially measures context-switch speed. When both functions run on a single thread on the AMD machine, each one is invoked about 550K times/sec.

With two threads, the AMD machine does 140K calls/sec for each function.

The Intel machine can only pull in 62K calls/sec.
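To put those rates in time-per-call terms: the interval between two successive calls of the same function is 1/r. With two threads, each interval is one full A→B→A round trip, which contains two cross-thread handoffs because the threads strictly alternate.

```latex
\begin{aligned}
t &= 1/r \\
t_{\text{AMD, 1 thread}} &= 1/550\,000\ \text{s} \approx 1.8\ \mu\text{s} \\
t_{\text{AMD, 2 threads}} &= 1/140\,000\ \text{s} \approx 7.1\ \mu\text{s} \\
t_{\text{Xeon, 2 threads}} &= 1/62\,000\ \text{s} \approx 16.1\ \mu\text{s} \\
t_{\text{Xeon}} / t_{\text{AMD}} &= 140\,000 / 62\,000 \approx 2.26
\end{aligned}
```

Per handoff, that works out to roughly 3.6 µs on the Opteron versus 8.1 µs on the Xeon.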

Now, this is a very realistic kind of load for a multi-threaded app. A real-life app is not spending all of its time doing integer math, but in fact it does do a lot of context switching, if only to/from the OS. When every 62 context switches eat up a whole millisecond of your time, it's very hard to build something both multi-threaded and fast.

By the way, in case you're wondering, this IS the fastest way to force a context switch that I know of. I tried passing a 1-byte token between two threads over two anonymous pipes (i.e. each thread does a blocking 1-byte read, then writes the byte back for the other thread to receive). The results are not inspiring: about 80K tokens/sec per thread on AMD, even fewer on Xeon.

More numbers:
Dual AMD Opteron 285 (2x dual core @ 2.6 GHz), Windows 32-bit: 105K calls/sec
Dual Xeon X5365 (2x quad core @ 3.00 GHz): 101K calls/sec

The conclusion: something about Intel processors really makes context switching slow.
Another conclusion: whenever possible, use x64 code. The benefit isn't the larger address space; it's the 8 extra general-purpose registers that will make your code fly.

2 comments:

  1. How did you get around this warning: "If you perform an alertable wait inside an APC, it will recursively dispatch the APCs. This can cause a stack overflow."

  2. The answer is that you don't perform an alertable wait inside an APC. Each thread enters an alertable wait in only one place: call that the main loop. The APC itself just queues the next APC and returns.