
Posted to user@nutch.apache.org by Mohammad Monirul Hoque <im...@yahoo.com> on 2008/09/03 06:53:39 UTC

problems: crawling specific domain

Hi,

How can I crawl only a specific domain (like www.yellowpages.co.za)? What do I have to change to make this work correctly? I tried changing crawl-urlfilter.txt, but after a while Nutch started crawling outside my domain.
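To make the filtering behaviour concrete, here is a minimal Java sketch of a crawl-urlfilter.txt-style filter. It assumes the usual rule format: each line is `+` (accept) or `-` (reject) followed by a regular expression, the first matching rule decides, and a URL that matches no rule is rejected. The class name `DomainUrlFilter` is hypothetical and this is not Nutch's own filter class. The accept rule is modelled on the `+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/` line in the stock file, with the domain filled in and its dots escaped, followed by a catch-all `-.`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a first-match-wins URL filter in the style of
 * crawl-urlfilter.txt. Not Nutch's actual implementation.
 */
public class DomainUrlFilter {
    // One parsed rule: accept ('+') or reject ('-') plus its regex.
    private record Rule(boolean accept, Pattern pattern) {}

    private final List<Rule> rules = new ArrayList<>();

    // Parse rule lines; blank lines and '#' comments are skipped.
    public DomainUrlFilter(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // The first rule whose regex is found in the URL decides; no match means reject.
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(url).find()) return rule.accept();
        }
        return false;
    }

    public static void main(String[] args) {
        DomainUrlFilter filter = new DomainUrlFilter(List.of(
            "# accept hosts under yellowpages.co.za",
            "+^http://([a-z0-9]*\\.)*yellowpages\\.co\\.za/",
            "# skip everything else",
            "-."
        ));
        String[] urls = {
            "http://www.yellowpages.co.za/search?q=plumber",
            "http://yellowpages.co.za/",
            "http://www.example.com/",
            "http://www.yellowpages.co.za.example.com/"
        };
        for (String url : urls) {
            System.out.println((filter.accepts(url) ? "ACCEPT " : "REJECT ") + url);
        }
    }
}
```

Because the first matching rule wins, the domain rule has to come before the catch-all `-.`; the trailing `/` in the accept pattern is what keeps look-alike hosts such as `www.yellowpages.co.za.example.com` out.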

I am using Nutch 0.9 in standalone mode (without Hadoop). Can anyone give me some idea of how to merge the indexes from different crawls into a single index?
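Nutch and Lucene have their own tooling for merging real indexes, which this does not reproduce. Purely as a conceptual illustration, assuming each crawl's index is modelled as a map from a term to the set of URLs containing it, merging comes down to unioning the posting sets term by term. `IndexMergeSketch` and its sample data are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical toy model of merging inverted indexes from several crawls:
 * term -> set of document URLs. Not Nutch or Lucene code.
 */
public class IndexMergeSketch {
    // Union the postings of every input index into a new, sorted index.
    // A URL present in more than one crawl appears only once per term.
    @SafeVarargs
    public static Map<String, Set<String>> merge(Map<String, Set<String>>... indexes) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map<String, Set<String>> index : indexes) {
            for (Map.Entry<String, Set<String>> entry : index.entrySet()) {
                merged.computeIfAbsent(entry.getKey(), k -> new TreeSet<>())
                      .addAll(entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> crawlA = Map.of(
            "plumber", Set.of("http://www.yellowpages.co.za/a"),
            "cape", Set.of("http://www.yellowpages.co.za/a"));
        Map<String, Set<String>> crawlB = Map.of(
            "plumber", Set.of("http://www.yellowpages.co.za/a", "http://www.yellowpages.co.za/b"));
        // Prints: {cape=[http://www.yellowpages.co.za/a], plumber=[http://www.yellowpages.co.za/a, http://www.yellowpages.co.za/b]}
        System.out.println(merge(crawlA, crawlB));
    }
}
```

A real index merge also has to deal with duplicate documents and scoring, which this toy ignores.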

Regards.
--mohammad monirul hoque

Re: problems: crawling specific domain

Posted by David Jashi <da...@jashi.ge>.

Have you tried this tutorial?
http://wiki.apache.org/nutch/Nutch_0%2e9_Crawl_Script_Tutorial
For crawling a single site, see part 4 of:
http://peterpuwang.googlepages.com/NutchGuideForDummies.htm


-- 
with best regards,
David Jashi
Web development EO,
Caucasus Online
+995(32)970368
David@Jashi.ge