Posted to common-user@hadoop.apache.org by Dan Segel <da...@gmail.com> on 2008/06/05 15:12:31 UTC

Gigablast.com search engine, 10 billion pages!

Our ultimate goal is basically to replicate the gigablast.com search engine.
They claim to have fewer than 500 servers holding 10 billion pages indexed,
spidered, and updated on a routine basis. I am looking at indexing 500
million pages per node across a total of 20 nodes. Each node will feature 2
quad-core processors, 4 TB of storage (at RAID 5), and 32 GB of RAM. I
believe this can be done, but how many searches per second do you think
would be realistic in this setup? We are aiming for roughly 25 searches per
second, spread across the 20 nodes... I could really use some advice on
this one.
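For what it's worth, a quick back-of-envelope sketch of the numbers above (illustrative arithmetic only; real index size and query cost depend heavily on the engine and the average document):

```python
# Back-of-envelope capacity math from the figures in the message above.
# The per-page byte budget and per-node query rate are derived numbers,
# not measurements.

TB = 10**12  # decimal terabyte

pages_per_node = 500_000_000
nodes = 20
storage_per_node = 4 * TB   # the stated 4 TB RAID 5 volume per node
target_qps = 25             # the stated cluster-wide search rate

total_pages = pages_per_node * nodes
bytes_per_page = storage_per_node / pages_per_node
qps_per_node = target_qps / nodes

print(f"total pages:        {total_pages:,}")      # 10,000,000,000
print(f"budget per page:    {bytes_per_page:.0f} bytes (~8 KB)")
print(f"queries/sec/node:   {qps_per_node}")       # 1.25
```

So each node would have roughly 8 KB of disk per page for the index plus any cached content, and would only need to sustain about 1.25 queries per second if queries fan out evenly; whether the latter holds depends on whether each query must touch all 20 nodes or only a subset.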

Thanks,
D. Segel