Posted to user@nutch.apache.org by lewis john mcgibbney <le...@gmail.com> on 2011/07/11 15:55:51 UTC

Re: Error Network is unreachable in Nutch 1.3

Hi,

Please see this new tutorial [1] for configuring Nutch 1.3. If you are
familiar/comfortable working with Solr for indexing, you will find it
no problem.
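
For reference, a minimal sketch of the commands from that tutorial, run
from runtime/local and assuming a Solr instance at
http://localhost:8983/solr/ (that URL, -topN 5, and the directory names
are examples, not requirements):

    # One-shot crawl that also indexes into Solr; without -solr you get the
    # "solrUrl is not set, indexing will be skipped..." message quoted below.
    bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5

    # Alternatively, index an already-completed crawl into Solr afterwards:
    bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*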

If you need to stick with Lucene and the web application front end, then
please stay with Nutch 1.2 or earlier.

[1] http://wiki.apache.org/nutch/RunningNutchAndSolr
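
Note also that Nutch 1.3 will not fetch anything until http.agent.name is
set; a minimal conf/nutch-site.xml sketch (the agent string is a
placeholder of your choosing):

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>http.agent.name</name>
        <!-- any identifying string for your crawler; required by the fetcher -->
        <value>MyNutchSpider</value>
      </property>
    </configuration>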



On Mon, Jul 11, 2011 at 3:02 PM, Yusniel Hidalgo Delgado
<yh...@uci.cu> wrote:

> Hello.
> I'm trying to run Nutch 1.3 on my LAN, following the NutchTutorial wiki
> page. When I run it with these command-line options: nutch crawl urls
> -dir crawl -depth 3, I get the following output:
>
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> solrUrl=null
> Injector: starting at 2011-07-11 09:35:37
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-11 09:35:40, elapsed: 00:00:03
> Generator: starting at 2011-07-11 09:35:40
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: Partitioning selected urls for politeness.
> Generator: segment: crawl/segments/20110711093542
> Generator: finished at 2011-07-11 09:35:43, elapsed: 00:00:03
> Fetcher: starting at 2011-07-11 09:35:43
> Fetcher: segment: crawl/segments/20110711093542
> Fetcher: threads: 10
> QueueFeeder finished: total 2 records + hit by time limit :0
> fetching http://FIRST SITE/
> fetching http://SECOND SITE/
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=3
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=3
> fetch of http://FIRST SITE/ failed with:
> java.net.ConnectException: Network is unreachable
> -finishing thread FetcherThread, activeThreads=1
> fetch of http://SECOND SITE/ failed with:
> java.net.ConnectException: Network is unreachable
> -finishing thread FetcherThread, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> -activeThreads=0
> Fetcher: finished at 2011-07-11 09:35:45, elapsed: 00:00:02
> ParseSegment: starting at 2011-07-11 09:35:45
> ParseSegment: segment: crawl/segments/20110711093542
> ParseSegment: finished at 2011-07-11 09:35:47, elapsed: 00:00:01
> CrawlDb update: starting at 2011-07-11 09:35:47
> CrawlDb update: db: crawl/crawldb
> CrawlDb update: segments: [crawl/segments/20110711093542]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: true
> CrawlDb update: URL filtering: true
> CrawlDb update: Merging segment data into db.
> CrawlDb update: finished at 2011-07-11 09:35:48, elapsed: 00:00:01
> Generator: starting at 2011-07-11 09:35:48
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=1 - no more URLs to fetch.
> LinkDb: starting at 2011-07-11 09:35:49
> LinkDb: linkdb: crawl/linkdb
> LinkDb: URL normalize: true
> LinkDb: URL filter: true
> LinkDb: adding segment: file:/home/yusniel/Programas/nutch-1.3/runtime/local/bin/crawl/segments/20110711093542
> LinkDb: finished at 2011-07-11 09:35:50, elapsed: 00:00:01
> crawl finished: crawl
>
> According to this output, the problem is with network access; however, I
> can reach those same web sites from Firefox. I'm running Debian testing.
>
> Greetings.
>
>


-- 
Lewis

Re: Error Network is unreachable in Nutch 1.3

Posted by Yusniel Hidalgo Delgado <yh...@uci.cu>.
Thank you very much Lewis. Greetings.
