Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2018/05/30 10:30:00 UTC
[jira] [Resolved] (NUTCH-2588) Getting status code x01 (unfetched) on more than 80% crawled urls
[ https://issues.apache.org/jira/browse/NUTCH-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2588.
------------------------------------
Resolution: Not A Problem
Hi [~usama_], please subscribe to the [Nutch user mailing list|http://nutch.apache.org/mailing_lists.html] and ask for further help there. Thanks!
> Getting status code x01 (unfetched) on more than 80% crawled urls
> -----------------------------------------------------------------
>
> Key: NUTCH-2588
> URL: https://issues.apache.org/jira/browse/NUTCH-2588
> Project: Nutch
> Issue Type: Bug
> Components: crawldb, fetcher
> Affects Versions: 2.3.1
> Environment: I am using Apache Nutch 2.3.1 with Hadoop 2.7.6 and HBase 0.98.8-hadoop2.
> Operating System: Ubuntu 16.04
> Reporter: Usama Tahir
> Priority: Major
>
> When I run Nutch with external links enabled, a seed list of 10 URLs and 5 rounds, using the command
> bin/crawl <seed_path> <db> [<solr url>] <number of rounds>
> I use the default topN value, which is 50000.
> The process completes in 11 to 12 hours and generates about 280,000 URL rows.
> When we analyze the HBase table and check the status codes of all URLs, about 242,000 URLs have status code x01 (unfetched).
> This means that 242,000 of the 280,000 URLs Nutch extracted remain unfetched.
> After some debugging of Nutch and analysis of its logs, I found that the URLs with status code x01 are never even attempted.
> Is this a bug in Nutch or a configuration issue?
> Kindly resolve my issue as soon as possible.
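The figures in the report can be sanity-checked with a little arithmetic. The sketch below assumes (this is not stated in the resolution) that each generate step of bin/crawl selects at most topN URLs, and that the first round can only select from the 10 injected seeds. Under those assumptions it computes the unfetched share and a lower bound on how many discovered rows could never have been generated at all, regardless of configuration. The function names are hypothetical helpers for illustration only.

```python
# Sanity check of the numbers reported in NUTCH-2588.
# Assumptions (not from the issue itself): each generate round selects at
# most top_n URLs, and round 1 can only select from the injected seeds.

def unfetched_share(unfetched: int, total: int) -> float:
    """Fraction of rows still carrying status x01 (unfetched)."""
    return unfetched / total


def max_generated(rounds: int, top_n: int, seeds: int) -> int:
    """Hypothetical upper bound on URLs generated across all rounds."""
    first_round = min(seeds, top_n)      # round 1 sees only the seed list
    later_rounds = (rounds - 1) * top_n  # later rounds capped at top_n each
    return first_round + later_rounds


# Figures taken from the report.
total_rows = 280_000
unfetched = 242_000
rounds, top_n, seeds = 5, 50_000, 10

share = unfetched_share(unfetched, total_rows)
bound = max_generated(rounds, top_n, seeds)
other_status = total_rows - unfetched
never_generated_min = max(0, total_rows - bound)

print(f"unfetched share: {share:.1%}")
print(f"rows with any other status: {other_status}")
print(f"max URLs generated over {rounds} rounds: {bound}")
print(f"rows that could never have been generated: at least {never_generated_min}")
```

With the reported values this gives an unfetched share of about 86.4% and an upper bound of 200,010 generated URLs, so, if the assumptions hold, at least 79,990 of the unfetched rows were simply never selected by any generate round. This is only a lower bound; it does not explain the rest of the gap.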
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)