Posted to dev@nutch.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/06/17 13:38:05 UTC
[jira] [Commented] (NUTCH-2281) Support non-default FileSystem
[ https://issues.apache.org/jira/browse/NUTCH-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336098#comment-15336098 ]
ASF GitHub Bot commented on NUTCH-2281:
---------------------------------------
GitHub user sebastian-nagel opened a pull request:
https://github.com/apache/nutch/pull/119
NUTCH-2281 Support non-default FileSystem
Instead of
FileSystem fs = FileSystem.get(getConf());
use
FileSystem fs = path.getFileSystem(getConf());
to avoid a mismatch between FileSystem and Path.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sebastian-nagel/nutch NUTCH-2281
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nutch/pull/119.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #119
----
commit 7e776d745fe4989bea4ca24ce333c159b5e3ac1a
Author: Sebastian Nagel <sn...@apache.org>
Date: 2016-06-17T13:31:38Z
NUTCH-2281 Support non-default FileSystem
Instead of
FileSystem fs = FileSystem.get(getConf());
use
FileSystem fs = path.getFileSystem(getConf());
to avoid a mismatch between FileSystem and Path.
----
> Support non-default FileSystem
> ------------------------------
>
> Key: NUTCH-2281
> URL: https://issues.apache.org/jira/browse/NUTCH-2281
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.12
> Reporter: Sebastian Nagel
> Fix For: 1.13
>
>
> If a path (input or output) does not belong to the configured default FileSystem, various Nutch tools may raise an exception like
> {noformat}
> Exception in ... java.lang.IllegalArgumentException: Wrong FS: s3a://..., expected: hdfs://...
> {noformat}
> This is fixed by getting a reference to the FileSystem from the Path object
> {noformat}
> FileSystem fs = path.getFileSystem(getConf());
> {noformat}
> instead of
> {noformat}
> FileSystem fs = FileSystem.get(getConf());
> {noformat}
> A given path (e.g., {{s3a://...}}) may not belong to the default file system ({{hdfs://}} or {{file://}} in local mode), and simple checks such as {{fs.exists(path)}} will then fail. Cf. [FileSystem.checkPath(path)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#checkPath(org.apache.hadoop.fs.Path)], and compare [FileSystem.get(conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#get(org.apache.hadoop.conf.Configuration)] with [FileSystem.get(URI,conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html#get(java.net.URI,%20org.apache.hadoop.conf.Configuration)], which is called by [Path.getFileSystem(conf)|https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/Path.html#getFileSystem%28org.apache.hadoop.conf.Configuration%29].
> Note that the FileSystem for input and output may be different, e.g., read from HDFS and write to S3.
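
The mismatch described in the issue can be illustrated with a small, self-contained sketch. It uses only the JDK and does not call Hadoop: the class SketchFileSystem, its methods, the name node address and the bucket name are all hypothetical stand-ins, and the check compares URI schemes only, which is a simplification of what FileSystem.checkPath(path) may verify. getDefault(...) plays the role of FileSystem.get(conf), and forPath(...) plays the role of path.getFileSystem(conf).

```java
import java.net.URI;

/** Simplified, hypothetical stand-in for a Hadoop FileSystem; not the real class. */
final class SketchFileSystem {
    private final String scheme;

    SketchFileSystem(String scheme) {
        this.scheme = scheme;
    }

    /** Models FileSystem.get(conf): always returns the configured default. */
    static SketchFileSystem getDefault(String defaultFsUri) {
        return new SketchFileSystem(URI.create(defaultFsUri).getScheme());
    }

    /**
     * Models path.getFileSystem(conf): chosen from the path's own scheme,
     * falling back to the default for paths without a scheme.
     */
    static SketchFileSystem forPath(String path, String defaultFsUri) {
        String pathScheme = URI.create(path).getScheme();
        return pathScheme == null
            ? getDefault(defaultFsUri)
            : new SketchFileSystem(pathScheme);
    }

    /** Models the scheme guard that calls such as exists(path) run first. */
    void checkPath(String path) {
        String pathScheme = URI.create(path).getScheme();
        if (pathScheme != null && !pathScheme.equalsIgnoreCase(scheme)) {
            throw new IllegalArgumentException(
                "Wrong FS: " + path + ", expected: " + scheme + "://");
        }
    }

    String scheme() {
        return scheme;
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://namenode:8020";       // hypothetical default file system
        String output = "s3a://bucket/crawl/segments";   // hypothetical output path

        try {
            // Old pattern: the default file system is asked about a foreign path.
            getDefault(defaultFs).checkPath(output);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // New pattern: the file system is resolved from the path itself.
        forPath(output, defaultFs).checkPath(output);
    }
}
```

Because input and output may live on different file systems, each path would be resolved separately under this pattern, once for the input path and once for the output path, rather than sharing one FileSystem reference.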
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)