Posted to dev@nutch.apache.org by Omkar Reddy <om...@apache.org> on 2017/08/09 09:56:42 UTC

Regarding checksum error in hadoop in my latest PR.

Hello dev@,

I am facing an EOFException in TestGenerator.java and I cannot figure out
how to resolve it. The exception is as follows:


   2017-08-09 12:57:06,026 WARN fs.FSInputChecker
   (ChecksumFileSystem.java:<init>(157)) - Problem opening checksum file:
   file:/tmp/hadoop-omreddy/mapred/temp/generate-temp-16af5bc1-1a80-412b-b0ca-481c82877f3b/fetchlist-0/part-r-00000.
   Ignoring exception:
   java.io.EOFException


I cannot understand the reason for it. This PR[1] is part of an effort
to upgrade Nutch to the new MapReduce API.

Please find the detailed log of the test here[0]. Any suggestions/help
would be appreciated.

Thanks,
Omkar.

[0] https://paste.apache.org/e1cQ
[1] https://github.com/apache/nutch/pull/188

Re: Regarding checksum error in hadoop in my latest PR.

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Omkar,

the test fails because nothing is generated, i.e. the test detected broken functionality. :)

The best approach is to run the generator from the command line and check whether it behaves as
expected, e.g. (with a non-empty CrawlDb):

  $ bin/nutch generate path/to/crawldb path/to/segments/
  ...
  Generator: segment: path/to/segments/20170811152636
  Generator: finished at 2017-08-11 15:26:37, elapsed: 00:00:03

  $ tree path/to/segments/
  path/to/segments/20170811152636
  `-- 20170811152636
      `-- crawl_generate

The folder crawl_generate is empty!  Generator is complex, with multiple steps working in temporary
folders. Possibly it's only the final copying which is broken, but I see no other way than to debug
the fetch list generation to find out what the reason is.
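
The empty-folder check can be repeated quickly after each change with a small shell function.
This is only a minimal sketch: the function name empty_fetchlists is made up for illustration,
and it assumes that fetch lists are written as files named part-* below crawl_generate (the
part-r-00000 in the reported warning suggests this naming, but it may differ in your setup).

```shell
#!/bin/sh
# Print every crawl_generate directory below the given segments directory
# that contains no fetch list part file.
# Assumption: part files are named "part-*"; adjust the pattern if needed.
empty_fetchlists() {
  segments_dir="$1"
  find "$segments_dir" -type d -name crawl_generate | while read -r dir; do
    # Look for at least one part file anywhere below crawl_generate.
    first_part="$(find "$dir" -type f -name 'part-*' | head -n 1)"
    if [ -z "$first_part" ]; then
      echo "$dir"
    fi
  done
}
```

Calling empty_fetchlists path/to/segments/ after a generate run prints nothing if every segment
has a fetch list; any path it prints is a segment where the generator wrote no output.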

I strongly recommend also testing all other tools from the command line. For most of them, just
run a sample crawl via bin/crawl.

Best,
Sebastian

