Posted to user@nutch.apache.org by Ian Reardon <ir...@gmail.com> on 2005/05/04 19:15:10 UTC
Some Nutch Questions
I would like to build a search engine based on a handful of hand-picked
sites from a specific domain. I have a few questions.
How many documents can I fit on a single-server implementation (2-CPU
Xeon)? With disk space being irrelevant, approximately how many documents
can I have on a single node with respectable search performance?
My idea is to have a handful of sites that I judge for quality and to
index them on a regular basis, maybe once a month. I would like to
add new sites over time. Does this sound feasible with Nutch?
What method would be best for this type of application? I set up Nutch
and crawled a very small sample using method 1 in the tutorial,
"Intranet crawl", but I was unable to get the whole-web crawl to work.
What is the -dmozfile flag? I don't want to base this on DMOZ. If
anyone could point me to documentation or a tutorial that better
explains whole-web crawling, I would appreciate it. Thanks a lot.