Posted to user@nutch.apache.org by David Webster <tr...@loxinfo.co.th> on 2005/09/26 09:01:33 UTC

New SE

Hello, 

I've been subscribed to this list for several months and have read
numerous posts, although most sail merrily above my noggin. I hope this
is not an inappropriate post.

I want to start a search engine (SE) and have done quite a lot of thinking
about it, although I'm not a programmer by any stretch of the imagination,
and my budget is, um, "challenged".

The SE I want is a clustering SE for travel, covering specific regions.

- I want to spider and index as many pages as possible (I think) on
*one server*
- I want to exclude *all* predominant affiliate sites and all directory
sites.
- I *think* I only want to spider to 3 levels, as the site should be
about travel resources more so than detailed information.
- I believe updating the information (respidering) every 30 days is
sufficient, maybe even every 60 days?
- I think that the easiest part is to set up Nutch and get it working,
and the harder part is configuring the crawler (or the indexer?) to
include only those URLs that fit the requirements I have defined above.
- One specific section of the DMOZ index would probably be okay for
seeding the database
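
To make the requirements above concrete, here is a minimal sketch in Python of the fetch decision they imply. It is not Nutch code and does not use any Nutch API. The host names, the function names, and the reading of "3 levels" as three link hops from a seed page are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse

# All values below are hypothetical placeholders, not taken from Nutch.
MAX_DEPTH = 3                          # assumed: link hops from a seed page
RECRAWL_INTERVAL = timedelta(days=30)  # could be 60 days instead
EXCLUDED_HOSTS = {"affiliate.example.com", "directory.example.org"}


def host_is_excluded(url):
    """Return True if the URL's host is an excluded site or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in EXCLUDED_HOSTS)


def should_fetch(url, depth, last_fetched, now):
    """Apply the depth limit, the exclusion list, and the recrawl interval."""
    if depth > MAX_DEPTH:
        return False
    if host_is_excluded(url):
        return False
    # A page fetched less than one interval ago is not due yet.
    if last_fetched is not None and now - last_fetched < RECRAWL_INTERVAL:
        return False
    return True
```

In this sketch, a page that has never been fetched is passed with `last_fetched=None`, so only the depth limit and the exclusion list apply to it.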

So I'm hoping to get:

- Some reasonable comments on my plan
- A price from a consultant (free works too! ;) to get Nutch +
clustering set up and running
- A price from a consultant to configure the spider/indexer.

Thanks and best regards,

Dave W.