Posted to user@nutch.apache.org by Vertical Search <ve...@gmail.com> on 2006/03/12 05:37:09 UTC

Crawling sites -- Question

I was wondering if I could get some input on the few things listed below.
I am currently working on a vertical search domain.
For example: Hotels, Restaurants and
1. First of all, starting from the home page, I should be able to crawl at least 3
levels deep. For some reason, I am not able to right now. I am not sure whether any
special config is required.
2. I would like to mimic a user search on the website, just as when a user enters
a keyword and clicks on "Search".
     Can I get this functionality? How do I go about it?
3. How do I get rid of URLs at each level that do not need to be crawled? For
instance, on each page, links going back to the home page, sending a mail,
advertisements, etc.
I know I could limit the crawl to a domain. But within the domain, I should be able to
ignore some URLs. I know I could do this in crawl-urlfilter.txt.
But if someone could help me, that would be great... or at least point me to the right
documentation...
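
To make point 3 more concrete, here is a rough Python sketch of the kind of
filtering I have in mind. It assumes rules written as "+regex" (accept) or
"-regex" (reject), where the first matching rule decides and a URL that matches
no rule is dropped, which is how I understand crawl-urlfilter.txt to behave.
The patterns, the example.com domain and the function names are made up for
illustration; this is not Nutch code.

```python
import re

# Hypothetical rules in "+regex" / "-regex" style; patterns and the
# example.com domain are invented for this illustration.
RULES = r"""
# skip mail links and advertisement pages
-^mailto:
-/ads?/
# skip links that only lead back to the home page
-^http://www\.example\.com/(index\.html)?$
# accept everything else inside the domain
+^http://([a-z0-9]*\.)*example\.com/
"""


def parse_rules(text):
    """Turn rule lines into (accept, compiled_pattern) pairs, in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """First matching rule wins; a URL matching no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):  # assumption: match anywhere in the URL
            return accept
    return False


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in (
        "http://www.example.com/hotels/paris.html",
        "http://www.example.com/",
        "http://www.example.com/ads/banner.html",
        "mailto:info@example.com",
    ):
        print(url, "->", "crawl" if accepts(url, rules) else "skip")
```

Because the first match wins, the reject rules have to come before the broad
accept rule for the domain; otherwise the accept rule would catch everything.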

Thanks in advance.