Posted to dev@nutch.apache.org by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/11/22 13:23:00 UTC

[jira] [Updated] (NUTCH-2331) REST API Fetch fails to retrieve HDFS path on distributed mode

     [ https://issues.apache.org/jira/browse/NUTCH-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-2331:
-----------------------------------
    Affects Version/s: 1.15

> REST API Fetch fails to retrieve HDFS path on distributed mode
> --------------------------------------------------------------
>
>                 Key: NUTCH-2331
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2331
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, REST_api
>    Affects Versions: 1.15
>            Reporter: Sujen Shah
>            Assignee: Sujen Shah
>            Priority: Major
>
> Currently in the REST API, if the user specifies only the crawlId and not the absolute path of the segment to fetch, the fetcher finds the most recently generated segment and uses it.
> However, this fallback currently works only in local mode, because of how it is implemented at https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/fetcher/Fetcher.java#L562-L573.
> These lines need to be updated so that the fetcher can read the directory and list its files on HDFS.
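A minimal sketch of the "find the latest segment" step, assuming segment directories keep the Generator's 14-digit timestamp names (yyyyMMddHHmmss), so the newest segment is the greatest name when compared as a string. The class and method names (LatestSegment, pick, looksLikeSegment) are hypothetical and not part of Nutch, and this sketch selects by name rather than by modification time. It takes the directory listing as a plain collection, so it does not depend on java.io.File; in Nutch the names could instead come from Hadoop's FileSystem, which resolves both local (file://) and HDFS (hdfs://) paths.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not Nutch API): picks the newest segment from a listing.
class LatestSegment {

  // Assumption: segment names are 14-digit timestamps (yyyyMMddHHmmss).
  static boolean looksLikeSegment(String name) {
    return name.length() == 14 && name.chars().allMatch(Character::isDigit);
  }

  // Fixed-width timestamps sort chronologically when compared as strings,
  // so the maximum name is the most recently generated segment.
  static Optional<String> pick(Collection<String> dirNames) {
    return dirNames.stream()
        .filter(LatestSegment::looksLikeSegment)
        .max(Comparator.naturalOrder());
  }

  // In Nutch, dirNames could be filled from Hadoop's FileSystem instead of java.io.File, e.g.:
  //   FileSystem fs = segmentsDir.getFileSystem(conf);
  //   for (FileStatus st : fs.listStatus(segmentsDir))
  //     if (st.isDirectory()) names.add(st.getPath().getName());
  // which works for both file:// and hdfs:// paths.

  public static void main(String[] args) {
    List<String> names =
        List.of("20191120101500", "20191122132300", "_tmp", "20191121080000");
    System.out.println(pick(names).orElse("<none>")); // prints 20191122132300
  }
}
```

Picking by name keeps the choice independent of file modification times; the Hadoop listing in the comment is illustrative only and would need the job's Configuration.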



--
This message was sent by Atlassian Jira
(v8.3.4#803005)