HYPERTHREADING
Looking for the best floating point performance per
dollar, we decided to try out a dual processor
Intel set-up to see what we could get.
The details:
After getting the whole thing for just under $2800
by buying the individual parts (i.e. those listed
plus a case, a SCSI hard drive, a floppy drive, a CD
drive, a mouse and a keyboard), I assembled
everything and attempted to
install Red Hat 7.2. This didn't work, and after
a bit of a search I attempted the
fix recommended by Tyan. After
farting around with that for a while, I decided to
download and burn the three ISO images comprising
the recently released Red Hat 7.3, which contained the newer
kernel recommended by Tyan. This worked like a charm.
After booting up this screamer for the first time
and running top to see if both processors
were running, I was quite surprised to see four
processors up and running. Just to be sure, I opened
up the case to see if any extra processors had been
hidden somewhere. Nope.
After another search on the web, I found an answer:
hyperthreading, an Intel technology that lets a single
physical processor appear to the operating system as more than one logical processor, or "sibling". The current Linux kernel
supports two siblings per CPU, although apparently more are allowed by the specification.
More can be gleaned from a
hyperthreading thread at the Linux Kernel
Mailing List archive, including a statement by Alan
Cox that hyperthreading performance improvement is
typically 10-30%. This has thus far proved correct
in my benchmarking tests.
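For anyone who wants to check for siblings without opening the case, here is a minimal Python sketch that counts logical versus physical processors from /proc/cpuinfo-style text. It assumes the kernel reports a "physical id" field for each logical processor; recent x86 Linux kernels do, but whether the Red Hat 7.3 kernel did is an assumption. The function name count_processors and the SAMPLE text are made up for illustration and are not output captured from the machine described here.

```python
import os


def count_processors(cpuinfo_text):
    """Return (logical, physical) processor counts from /proc/cpuinfo-style text."""
    logical = 0
    physical_ids = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines carry no key/value pair
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical += 1  # one "processor" entry per logical CPU
        elif key == "physical id":
            physical_ids.add(value)  # siblings share the same physical id
    # Assumption: if no "physical id" field is reported, packages cannot be
    # told apart, so fall back to the logical count.
    physical = len(physical_ids) if physical_ids else logical
    return logical, physical


# Illustrative sample only: two physical packages with two siblings each.
SAMPLE = """\
processor   : 0
physical id : 0

processor   : 1
physical id : 0

processor   : 2
physical id : 3

processor   : 3
physical id : 3
"""

if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE
    logical, physical = count_processors(text)
    print("logical processors: ", logical)
    print("physical processors:", physical)
```

If the logical count is twice the physical count, each physical processor is most likely presenting two siblings.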
Speaking of which, I'm using the
ROMS (Regional Ocean Modeling System) code,
written in Fortran, for benchmarking. Why? Because
it's the code I'm presently using for research.
It compiled and ran using the Intel Fortran Compiler with just
a minor bit of tweaking, and the performance I've seen
so far is most impressive.
A single P4 2.2 GHz processor is almost exactly four
times faster than a single 800 MHz P3 processor (with
the same compilation options), indicating that the
Xeon is doing quite a bit better than just a linear
performance increase due to increased processor speed.
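To put a rough number on that, here is a small Python sketch of the arithmetic. It takes the "almost exactly four times" figure above at face value as 4.0, which is an approximation, and the function name per_clock_gain is just something made up for illustration.

```python
def per_clock_gain(speedup, new_mhz, old_mhz):
    """Split an observed speedup into the clock ratio and whatever is left over."""
    clock_ratio = new_mhz / old_mhz
    return clock_ratio, speedup / clock_ratio


# 2.2 GHz P4 versus 800 MHz P3, with the observed speedup taken as roughly 4x.
clock_ratio, gain = per_clock_gain(4.0, 2200.0, 800.0)
print(f"clock ratio: {clock_ratio:.2f}")             # 2.75
print(f"gain beyond clock scaling: {gain:.2f}x")     # 1.45
```

So clock speed alone accounts for a factor of 2.75, and the remaining factor of about 1.45 has to come from something other than raw clock speed.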
Then I started tweaking the compiler options, with
the following results:
- the vectorization switch (-xW) increased performance typically by around 25%
- the -O2 optimization switch typically produced
better results than the -O3 switch, probably due to
the complex additional -O3 optimization attempts
conflicting with other optimization switches
- the -openmp SMP parallelization switch (which
enables the
OpenMP statements buried in the code to be
compiled) showed little effect with 2 processors
specified, but showed about a 10-25% increase
with 4 specified, so apparently the hyperthreading
is working better than the OpenMP
Another interesting tidbit about hyperthreading
is that Linux was the first OS implementation thereof,
which apparently came about when Intel submitted
a kernel patch.
Microsoft and Intel are having problems getting
together on this one,
purportedly because of a battle over license
fees, i.e. whether each
hyperthreaded processor "sibling" is considered a separate
processor when it comes to charging fees.
Maybe I'll run some of the standard benchmarks to
give a better feel as to this box's relative
performance, although it's already most impressive
running the ROMS code as compared to other machines
on which we've been running it.
posted by Steven Baum
6/12/2002 10:22:31 AM