
The history of my 'SETI Wall' Distributed Computing farm

SETI wall

How did it all start ?

It all started in about mid 1999 when the place I used to work at threw out a pile of old PC base units along with a big box of (unopened) Windows 98se disks. I picked up as many PC's as I could carry (for £5 each = OK, so they were Pentium MMX 233's with 16Mb of SIMM, a 3c509 (ISA) network card - but they all had CD drives and some had even been 'upgraded' to a 500 Mb hard disk !). I also deprived the dustmen of all the Win98se License packs.

What to do with this windfall ? - well I was already running SETI on my 'home' PC and had heard of the 'DC Farm' systems being put together in the USA so I decided to have a go !

My first system was simply 8 PC's wired together with a low cost (10Mbps) Hub - they all had hard disks but only one had keyboard/mouse/monitor - and this acted as the 'central store' = it had a dial-up Modem connection to the Internet and most nights I would (manually) upload between 1 and 5 (depending on crashes etc) completed work units and download some new ones.

In those days it took about 25-30 Hrs to complete each SETI work-unit :-)

This is how it remained for about 6 months .... and then I discovered eBay ! The rest, as they say, is history.

A dozen or so old motherboards and an 18 port Hub later and I was building my very first 'Seti-Wall'.

My first true Node

The shelves in my study were still being used for other things when I started and, since I only had one 18 port Hub, I was somewhat limited :-) - but I soon realized that fitting each new motherboard with a hard disk was going to be both expensive and noisy !
Image
So I spent the next 6 months using a DOS Floppy to boot Windows 98se 'images' across the network from a 'share' onto a RAM disk on the Node. This was achieved without the benefit of any co-operation from Microsoft's software, which seems to have been designed to prevent any such thing = perhaps because MS wants to force you to pay for 'real' Server plus Client licences plus 'add-ons' such as BOOTP & Windows Terminal Server.

In the first photo you will see a K6-III running at 333MHz and capable of completing a SETI 'wu' in approx 33 hours. Since it's running 100% 24x7, the 'stock' heatsink/fan assembly has been replaced with one double the size. The motherboard is a 'diskless node' using a floppy disk to do the actual network boot. The floppy is sitting on a 'small form factor' style PSU. The motherboard is supported by 4 bolts above a sheet of 'anti-static' bubble wrap.

Image
By the end of 2002 I had two 8 port hubs & one 32 port 'switch' and although not all the ports were filled the 'wall' was :-). The old MMX 233 boxes (in photo below, left) were also "crunching wu's" although they now had AMD K6-III 333 inside (which were, rather disappointingly, only slightly faster than the original Pentium MMX 233's).

Image
So how did I do it ? Well, you can either read about some of the problems I encountered below or dive straight into the detailed Step by Step "How To" guide (using a bootable Floppy Disk).

Following a major upgrade away from P2/3's and AMD K6's, I started to use Compact Flash (CF) cards (and USB sticks) as boot devices (which you can read about much later).

 

Questions and Answers

Microsoft says Windows 98 can't (won't :-) ) run from a RAM disk

So how did I get Windows 98 to run from a RAM disk ?

Well, Windows 98 doesn't know it's running from a RAM disk. It just thinks it's running from a 'Compacted' hard disk. At floppy boot time, both the RAM disk driver and the (16 bit) Compaction driver load well before Windows even thinks about getting started. So all attempts by Windows to access the 'physical' c:\ drive have to go via the Compaction driver - and this in turn goes via the RAM disk driver (as you will see later from what's on the boot floppy).

What about the 16 bit boot objections ?

Q) "But wait a moment - booting into DOS and booting into Windows involve totally different processes (DOS is 16 bit, Windows 32 bit) - surely there's no way DOS can suddenly switch over to Windows ?"

A) Although Windows 98 is a 32 bit operating system, it is running 'on top of' DOS - and although the boot processes are indeed different, 'win.com' does the trick of 'launching' Windows from DOS.

This, by the way, is not true for NT or 2000 - these are 'real' 32 bit Operating Systems with non-DOS boot sequences and there is no such thing as 'NT.com' [winnt.exe and winnt32.exe are the SETUP programs used to INSTALL Windows NT/XP].

See also "what's on the boot floppy" (page after Next>>)

What about the Windows boot Registry set-up sequence ?

Q) "OK, I've heard about win.com - but I know for a fact that 'Windows DOS' is not the same as 'DOS DOS' - among other things, Windows 98 'DOS' mode prepares for later 32 bit running by examining the Motherboard BIOS and Hardware during the Windows DOS boot-up sequence and creating values for the Registry to pick up later. So why doesn't the Registry initialisation process get 'upset' and cause Windows launch to fail ?"

A) No idea - maybe because the 'Node image' was created on the 'Node Hardware' itself and, since good old Windows 'Plug & Pray' has to 'cope' with being moved to a different motherboard, perhaps, when no Registry values are found, it just drops back to the 'last used' set (which, of course, are OK since it's actually still on the same Hardware) ?

You are networking Windows with the 16 bit drivers loaded by DOS ??

Q) "Windows needs 'proper' Windows network drivers .. how can it cope with DOS drivers ?"

A) It doesn't - the 16bit DOS network drivers are unloaded from RAM when the 'net stop' command is issued. So when Windows launches, no DOS network drivers are present (and IFSHLP will have 'reserved' the crucial low RAM for Windows to use).

Running from RAM disk means nothing gets 'saved' when the power fails !

Q) "How, since the RAM disk contents are lost every time the power is cut, do I ever get any results ?"

A) True - the Node RAM contents are indeed lost whenever there is a power interruption. This is why each Node installs SETI onto a mapped 'share' on the 'Server' drive - so when the SETI software auto-saves 'work in progress' it's saved to the mapped 'share' - which is the 'Server' hard disk - and NOT on the RAM disk on the Node. The Node motherboard BIOS is (of course) set to restart when power is restored = and since the floppy disk is in the drive, DOS will boot-up, re-fetch the Windows RAM disk image, cross boot to Windows, link to the mapped share and, with a short-cut to SETI in the 'StartUp' folder, it will then continue processing from where it left off !
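For anyone who wants to see the 'save to the share, resume after power-up' idea in miniature, here is a rough Python sketch. It is NOT how the SETI client stores its state (the file name, the JSON format and the 'steps' are all made up for illustration) and a temporary folder stands in for the mapped 'Server' share.

```python
import json
import os
import tempfile


def load_progress(checkpoint_path):
    """Return the last saved step, or 0 if no checkpoint exists yet."""
    try:
        with open(checkpoint_path) as fh:
            return json.load(fh)["step"]
    except FileNotFoundError:
        return 0


def save_progress(checkpoint_path, step):
    """Write to a temp file, then rename it over the old checkpoint,
    so a reader never picks up a partly written file."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump({"step": step}, fh)
    os.replace(tmp_path, checkpoint_path)


def crunch(checkpoint_path, total_steps, stop_after=None):
    """Work from the last checkpoint; stop_after fakes a power cut."""
    step = load_progress(checkpoint_path)
    done_this_run = 0
    while step < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return step  # the "power cut"
        step += 1  # stand-in for a slice of real number crunching
        done_this_run += 1
        save_progress(checkpoint_path, step)
    return step


if __name__ == "__main__":
    share = tempfile.mkdtemp()  # hypothetical stand-in for the Server share
    path = os.path.join(share, "node01.json")
    print(crunch(path, 10, stop_after=4))  # interrupted part way through
    print(crunch(path, 10))  # "reboot": carries on from the checkpoint
```

The point is simply that nothing the Node needs to remember lives on the Node itself, so wiping its RAM disk costs only the work done since the last save.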

Why use a Switch (and not a Hub) ?

Q) "I note you have switched to switches .. what advantage does this give ?"

A) Actually, none. I got the 10/100 Switch cheap on eBay at the time I was moving** from a 10Mbps network to 100Mbps. With a 100Mbps network, you aren't going to see much difference between a switch & a hub (in fact, a hub might even be faster since there is going to be some overhead as the switch decides 'where does this packet need to go ?').

** The only reason I moved to 100Mbps was to speed up the boot process (moving a 30Mb 'image' across the network is reduced to a few seconds) - and I only did that after a series of power cut-outs led to frustratingly long delays as 30+ nodes all tried to boot-up at the same time. The network is not a bottleneck during normal running - SETI 'save' of work in progress is only a few bytes and new SETI wu's (work units) are only about 256kb each (and each Node only needs to fetch a new wu to process every day or so :-) )
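As a rough sanity check on those boot times, here is a back-of-envelope sum in Python. It assumes the '30Mb' image means 30 megabytes, that every node pulls its own full copy through the one link into the Server, and that there is no protocol overhead or collisions - so real figures will be worse than these 'ideal' ones.

```python
def transfer_seconds(size_megabytes, link_megabits_per_s, nodes=1):
    """Ideal time for `nodes` machines to each pull one image over a shared link."""
    total_megabits = size_megabytes * 8 * nodes
    return total_megabits / link_megabits_per_s


for link in (10, 100):
    one = transfer_seconds(30, link)
    many = transfer_seconds(30, link, nodes=30)
    print(f"{link} Mbps: one node {one:.1f} s, 30 nodes {many / 60:.1f} min")

# 10 Mbps: one node 24.0 s, 30 nodes 12.0 min
# 100 Mbps: one node 2.4 s, 30 nodes 1.2 min
```

So a single node was never the problem; it is 30 of them all booting at once after a power cut that turns seconds into many minutes on the slower network.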

What's the best Processor for SETI ?

Q) "I've heard that SETI depends heavily on the FPU efficiency .. and that some vendors' CPU's give odd results ?"

A) Quick answer = the 'best' CPU is whatever your existing motherboard(s) will take and whatever is cheap on eBay :-) Also, I never 'over-clocked' my CPU's, so I have never seen any 'odd results' ..

[the following was true late 2001/early 2002 when this was first written - see comments added later]
OK, Pentium III's are best, AMD K6-III are the worst (they have a totally crap FPU (Floating Point Unit) and the K6-III-333 was only marginally better than a P2-233).

By 2002, AMD chips were much better but had moved to expensive DDR RAM and suchlike .. so the Intel Pentium range (PII, III & Celeron) still 'wins' when it comes to costs. As of November 2002, the best 'bang for the buck' was, without doubt, the 1GHz Celeron (you can get them at 30 to 40 quid) - their 100MHz FSB also means a cheap motherboard (133MHz is faster but not significantly enough to warrant the additional cost).

A quick note on the Intel Xeon. In the 'old days' the Xeon was "The God Speed King of Seti 'crunching'" = its large on-chip cache allowed wu's to be crunched 'within the cache'. However, with the release of the v3.03 SETI Client, processing 'overflowed' the big Xeon L2 cache, so the cache no longer made much difference and Xeon's were no longer worth the extra cost of the CPU chip (let alone the massive extra cost of a 'server' level motherboard).

[The following comment was added in 2006]

Funnily enough, when HyperThreading came out, the Xeon chips suddenly became desirable again. At typical CPU speeds, the SETI Clients (v3.03 and later) were limited by sequential Floating Point bottlenecks. So when HyperThreading allowed a second wu to be processed 'in parallel' with the first, it could make use of the idle Floating Point cycles otherwise lost waiting for the results of the previous calculation. Together with the 'cheap' dual Xeon motherboards (from ASUS) that became available at roughly the same time, the cost of a dual Xeon CPU Node (both running HyperThreaded, so 4 SETI wu's at a time) came in at well under the cost of 4 'normal' (single CPU, single thread) Nodes.

Unfortunately, Windows 98 has never supported multiple CPUs, let alone HyperThreading, so I was never tempted to move over to Xeon's "on the Wall" although I did fit a couple of stand-alone boxes with Windows XP pro (of which more later) running on dual HT'd Xeons .. which 'outperformed' the nodes running on the 'wall' by such a margin that I actually started to turn a few of the slower ones off.

What did it all cost to build ?

A) Very approx (2002 prices) :-
Motherboard = £20 (average)
128Mb RAM = £12
PIII 450/Cel 533 = £28 (average)
Heat-sink & fan = £5 (must have good quality fans - they make less noise !)
PSU = £5
NIC 3c905c = £3 (approx - I got 12 for 30 quid)
Floppy drive (new) = £7
Cables (Floppy Disk data, Ethernet Cat5 & PSU mains) = £3 (£1 ea from the computer Fairs)
98 Licence = £0 (Your_computer/ Fairs / eBay will have Windows98se at anything from £20 to £50)
===============
Total approx = £83

Add in 'part costs' (shared Switch, Server, KB/Mouse/Monitor/KVM), and the cost of a few 'premium' parts (a couple of £35 1GHz Celeron's) plus an allowance for 'wastage' (blown motherboards) and you get a 'ball park' of ...
.... "less than £100 per node".
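If you want to re-run the sums with your own prices, a few lines of Python will do it. The part prices below are just my 2002 figures from the list above; the 30 node count is only an example (taken from the '30+ nodes' on the wall) and `wall_cost` is a made-up helper name.

```python
# Per-node part prices in GBP (2002 figures from the list above)
parts = {
    "Motherboard": 20,
    "128Mb RAM": 12,
    "PIII 450/Cel 533": 28,
    "Heat-sink & fan": 5,
    "PSU": 5,
    "NIC 3c905c": 3,
    "Floppy drive": 7,
    "Cables": 3,
    "98 Licence": 0,
}

node_total = sum(parts.values())
headroom = 100 - node_total  # what is left per node for shared kit and wastage


def wall_cost(nodes, per_node=node_total):
    """Parts cost for a whole wall, before any shared or 'premium' extras."""
    return nodes * per_node


print(f"Per node: {node_total} GBP, headroom under 100 GBP: {headroom} GBP")
print(f"30 nodes, parts only: {wall_cost(30)} GBP")
```

The 'less than £100 per node' ball park therefore leaves about £17 a node to cover the shared Switch, Server, KVM, the odd premium CPU and the blown motherboards.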

What are the best power supplies ?

A) I used PSU's that were as small as possible. The more unused 'watts' you have, the less efficient the PSU is at producing the 'watts' you are using (and hence the more heat). I have mainly used 135 watt units I picked up as a 'box load' at a computer Fair = no-one wants them & you can pay less than £5 each if you buy a dozen at a time ('Tell you what mate, how about 40 quid for the whole box ?' ).

WARNING - make sure your PSU's are NOT 'system pulls' = check VERY carefully that they are 'brand new' / 'old stock' / 'un-used' ... PSU's are usually the first thing to go wrong with any PC - get some 'used' PSU's and you WILL end up 'frying' a motherboard/CPU or two whilst trying to work out which are the 'duffs' (I know, I did :-) ).

NB - for those of you with fancy 'Earth Leakage Detector' house Mains power circuits (ELCB's, RCD's etc) - playing with all those switch mode PSU's you ARE going to 'trip' your house power about once a month - this can be a real pain ! I now run my workroom off its own protected 'spur' before the main trip - when they trip, they only take out the 'SETI Wall'.

The 'Server' itself is on a UPS .. I learnt this lesson after losing 30+ wu's when I tripped the power, took out the Server's hard drive and lost about 500+ hrs of work :-(

What does it all cost to run ?

Image
A1) I don't like to think about it :-(

A2) I save money on the heating because my seti-wall keeps the house warm ... and the cat loves it (all those nice warm boxes to sit on).

[added late 2008] Increases in fuel costs during 2007/8 prompted a gradual 'turning off' of the 'less efficient' (i.e. slower) w98 Nodes. With SETI moving to BOINC (and dropping support for Windows 98) the final Win98 Node was turned off in late 2008.

Since then the "SETI Wall" has been cold and silent and I have only been running SETI on the 4 or 5 desktop PC boxes that are also used for 'normal' family web browsing and media store/replay functions.

NOTE SETI should never be run on a Laptop = SETI will 100% load the CPU and this will soon lead to overheating.

What else do you have to take into account ?

You need to get it all insured and get a couple of Smoke Detectors. I also have a big CO2 Fire Extinguisher by the door, just in case the worst happens.


Click "Next >>" (in the Navigation Bar, left) for my Step by Step "How To build a SETI Compute Node" guide

Next page :- Building a Compute Node
