Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2015/05/14 22:56:59 UTC

[jira] [Created] (NUTCH-2010) Implement isFetchingInProgress Utility Function in Fetcher

Lewis John McGibbney created NUTCH-2010:
-------------------------------------------

             Summary: Implement isFetchingInProgress Utility Function in Fetcher
                 Key: NUTCH-2010
                 URL: https://issues.apache.org/jira/browse/NUTCH-2010
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.10
            Reporter: Lewis John McGibbney
             Fix For: 1.11


The aim here is to stop (without killing) a Nutch crawl if the data being fetched is not of value to the user. The user can infer this by implementing some visualization on top of the backported REST API for Nutch trunk (this could probably also be done with the 2.X REST API).
I suggest that we implement a convenience utility function, potentially in Fetcher.java, which would look something like the following:
{code}
public static boolean isFetchingInProgress() {
  return fetchingInProgress;
}
{code}
The fetchingInProgress flag should be set to true whenever fetcher threads are working and set to false whenever all fetcher threads are unoccupied and back in the pool.
This would be a powerful mechanism for determining whether a crawl can be stopped without corrupting data, as currently happens when a fetcher task is interrupted.
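
One way the flag could be kept up to date is to derive it from a count of busy fetcher threads rather than storing a separate boolean. The following is only a rough sketch under that assumption: the class name FetcherActivitySketch and the threadStarted/threadFinished hooks are hypothetical and are not part of the existing Nutch code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only; this is not the actual Nutch Fetcher class.
class FetcherActivitySketch {

  // Number of fetcher threads currently doing work (assumed bookkeeping).
  private static final AtomicInteger activeThreads = new AtomicInteger(0);

  // Hypothetical hook: a fetcher thread calls this when it starts working.
  static void threadStarted() {
    activeThreads.incrementAndGet();
  }

  // Hypothetical hook: a fetcher thread calls this when it becomes idle again.
  static void threadFinished() {
    activeThreads.decrementAndGet();
  }

  // True while at least one fetcher thread is working, false once all are idle.
  public static boolean isFetchingInProgress() {
    return activeThreads.get() > 0;
  }
}
```

A caller (for example, a REST endpoint) could then poll isFetchingInProgress() and only request a stop once it returns false. Deriving the value from a counter avoids having to decide which thread is responsible for flipping a shared boolean.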


