Posted to user@nutch.apache.org by John Thornton <po...@john.thornton.name> on 2018/03/16 12:45:59 UTC
Fetcher error when running on Amazon EMR with S3
Hello,
I'm currently running Nutch on Amazon EMR 5.12.0 with Hadoop 2.8.3, using
S3 (EMRFS) as the filesystem. If I build the latest version from the
master branch and run a crawl in distributed mode, I get a fetcher error
like:
fetcher.Fetcher: Fetcher: java.lang.IllegalArgumentException: Wrong
FS: s3:..., expected: hdfs://...
This problem was reported in NUTCH-2494 and fixed in PR-274, and indeed when
I run the same crawl using a build of commit 87c7a2e it works without
error. So my question is: has a regression been introduced, or am I missing
something?
Regards,
John
Re: Fetcher error when running on Amazon EMR with S3
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi John,
The recent master has seen an upgrade to the new MapReduce API (NUTCH-2375).
It was a huge change and is already known to have introduced some issues.
For production, it's recommended to use 1.14 and patch it if necessary.
Could you open a new issue at
https://issues.apache.org/jira/projects/NUTCH
and provide the detailed stack trace there?
Thanks,
Sebastian
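
For readers unfamiliar with the "Wrong FS" exception quoted in the original
message: Hadoop raises it when a path is handed to a filesystem object whose
scheme does not match the path's. A common trigger is code that always asks
for the cluster's default filesystem (for example via FileSystem.get(conf))
and then uses it with an s3:// path, where resolving the filesystem from the
path itself (path.getFileSystem(conf)) would have worked. Whether that is the
cause here is not confirmed in this thread. The sketch below is only a
stand-alone model of that check built on java.net.URI. It is not Hadoop or
Nutch code, and the class names, host name and bucket name are hypothetical.

```java
import java.net.URI;

// Stand-alone model of a scheme check. All names are hypothetical;
// this is NOT Hadoop or Nutch source code.
class WrongFsSketch {

    /** A toy "filesystem" that only accepts paths carrying its own scheme. */
    static final class ToyFileSystem {
        private final String rootUri;
        private final String scheme;

        ToyFileSystem(String rootUri) {
            this.rootUri = rootUri;
            this.scheme = URI.create(rootUri).getScheme();
        }

        /** Rejects a qualified path whose scheme differs from this filesystem's. */
        void checkPath(String path) {
            String pathScheme = URI.create(path).getScheme();
            // Unqualified paths (no scheme) are treated as belonging to this filesystem.
            if (pathScheme != null && !pathScheme.equalsIgnoreCase(scheme)) {
                throw new IllegalArgumentException(
                    "Wrong FS: " + path + ", expected: " + rootUri);
            }
        }
    }

    /** Picks the filesystem from the path's own scheme, else uses the default. */
    static ToyFileSystem forPath(String path, String defaultFsUri) {
        URI uri = URI.create(path);
        if (uri.getScheme() == null) {
            return new ToyFileSystem(defaultFsUri);
        }
        String authority = uri.getAuthority();
        String root = uri.getScheme() + ":" + (authority != null ? "//" + authority : "/");
        return new ToyFileSystem(root);
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://namenode:8020";              // hypothetical default
        String segment = "s3://example-bucket/crawl/segments";  // hypothetical path

        // Pattern 1: always use the default filesystem, which rejects the S3 path.
        try {
            new ToyFileSystem(defaultFs).checkPath(segment);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Pattern 2: derive the filesystem from the path, so the check passes.
        forPath(segment, defaultFs).checkPath(segment);
        System.out.println("resolved from path: ok");
    }
}
```

In real Hadoop code the second pattern corresponds to calling
path.getFileSystem(conf) wherever a job touches a path that may live outside
the default filesystem.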