Posted to java-user@lucene.apache.org by Dmitri Bichko <db...@aveopharma.com> on 2005/09/09 20:28:52 UTC

Hardware recommendation

Hi,

I'm putting together a cheap indexing server for an "explorative" lucene
project and had a few questions about which route to go.

I am going with a Socket 939 platform - does it make sense to get the
dual core Athlon 64 X2, or is it better to stick with a faster clocked
"plain" Athlon 64?

Also, would Lucene benefit from running in 64 bit mode, or does it
prefer "compatibility" 32 bit?

I figure most indexing apps will be heavily IO bound, so I am stressing
that, while staying with commodity components, so:

WD SATA disks (250GB, 16MB cache, SATAII 3Gb/s)
starting out with 4 of these (plus system disks), on the onboard
controller (RAID0)

If need be I can add two disk cages, 5 disks each with two decent SATA
RAID controllers (64/128MB cache, NCQ, that sort of thing); the nForce4
PCI-Express should stand up to this, I'm hoping.
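
For a rough sense of scale, the raw capacity of these two layouts can be sketched as below. This is only back-of-envelope arithmetic, not a sizing recommendation: it assumes the cage disks would be the same 250GB model and that everything is striped as RAID0, it ignores filesystem overhead, and the function name is purely illustrative.

```python
# Back-of-envelope raw capacity for the disk layouts described above.
DISK_GB = 250  # per-disk capacity from the post (decimal GB)


def raid0_capacity_gb(num_disks, disk_gb=DISK_GB):
    """RAID0 stripes across equal-sized disks with no redundancy,
    so raw capacity is simply the sum of the member disks."""
    return num_disks * disk_gb


# Starting configuration: 4 data disks on the onboard controller.
initial_gb = raid0_capacity_gb(4)

# Assumption: two cages of 5 disks each, filled with the same 250GB model.
expanded_gb = raid0_capacity_gb(4 + 2 * 5)

print(f"initial:  {initial_gb} GB")
print(f"expanded: {expanded_gb} GB")
```

Note that RAID0 offers no redundancy, so a single disk failure loses the whole array; that matters less if the indexes can be rebuilt from the source data.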

And of course I am limited to 4GB RAM.

I have three main applications in mind:

Indexing PubMed/Medline article abstracts: this would be an index of
about 15 million records with a couple of identifier fields, a title and
a 1-3 paragraph abstract.  Mostly the searches will be keyword searches
on the text fields.  Potentially I could add full-length papers to this
as well (a lot fewer records though).

Second one is indexing a couple hundred thousand MS Office documents and
PDF files (Google Appliance sort of thing).

And finally a genetic database repository a la LuceGene, or SRS.  This
would have more complex records (i.e. many fields, but little data in
each), which are mostly retrieved by unique identifier (very little
text searching).  This would probably run to a few tens of millions of
records, maybe around 100 million eventually.

Given these applications, what else should I be thinking about,
hardware-wise?

Thanks,
Dmitri