
Posted to dev@manifoldcf.apache.org by "Julien Massiera (JIRA)" <ji...@apache.org> on 2019/06/18 15:14:00 UTC

[jira] [Created] (CONNECTORS-1612) Postpone files in SMBException

Julien Massiera created CONNECTORS-1612:
-------------------------------------------

             Summary: Postpone files in SMBException
                 Key: CONNECTORS-1612
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1612
             Project: ManifoldCF
          Issue Type: Improvement
          Components: JCIFS connector
    Affects Versions: ManifoldCF 2.12
            Reporter: Julien Massiera


When crawling with the JCIFS connector, some unexpected errors may raise a generic "SMBException", which is caught by MCF.
The current behavior is for the job to abort after a few retries.
Although SMBException is a generic class of error, we think it is worthwhile, before aborting the job, to postpone the problematic files and first try the ones already in the pipeline. This way, the job can move on before developers have to investigate the particular problems. More precisely, the algorithm could look like the following:
Whenever a job encounters an error that is not clearly identified:
1. It immediately retries once.
2. If the retry succeeds, the crawl moves on as usual.
3. If the retry fails, the job moves this document to the current end of the processing pipeline and crawls the remaining documents. It sets the attempt counter for this document to 2.
4. When the job encounters this document again, it tries again. If it succeeds, the crawl moves on as usual. If it fails, the job moves this document to the current end of the processing pipeline, increments the counter by 1, and doubles the delay between two attempts.
5. We iterate until the maximum number of attempts allowed by the crawl for the problematic document has been reached. If the document still fails at that point, the crawl is aborted. With this behavior, a job is still aborted on critical errors, but at least we are able to crawl as many non-problematic documents as possible before the failure.
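
To make the proposal easier to discuss, here is a minimal Java sketch of the steps above. It is not ManifoldCF code: the class PostponeSketch, the Fetcher interface, the Result holder and the parameters maxAttempts and initialDelayMillis are all hypothetical names introduced for illustration. The delay is only accumulated in a counter rather than actually waited for, and the pipeline is modeled as a simple in-memory queue, which is an assumption about how such a scheme might be organized, not a description of the real job queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the postpone-and-retry proposal; not ManifoldCF API. */
class PostponeSketch {

    /** Stand-in for fetching one document; throws on an unidentified error. */
    interface Fetcher {
        void fetch(String documentId) throws Exception;
    }

    /** Outcome of a simulated crawl. */
    static final class Result {
        final List<String> crawled = new ArrayList<>();
        String abortedOn = null;      // stays null when the crawl completes
        long totalDelayMillis = 0;    // delay that would have been waited
    }

    /** Assumes maxAttempts >= 2, since step 1 always retries once immediately. */
    static Result crawl(List<String> documentIds, Fetcher fetcher,
                        int maxAttempts, long initialDelayMillis) {
        Deque<String> queue = new ArrayDeque<>(documentIds);
        Map<String, Integer> attempts = new HashMap<>();
        Map<String, Long> delays = new HashMap<>();
        Result result = new Result();

        while (!queue.isEmpty()) {
            String doc = queue.pollFirst();
            int made = attempts.getOrDefault(doc, 0);

            if (made > 0) {
                // Postponed document: account for its current delay,
                // then double it for the next encounter (step 4).
                long delay = delays.get(doc);
                result.totalDelayMillis += delay;
                delays.put(doc, delay * 2);
            }

            boolean ok = tryFetch(fetcher, doc);
            made++;
            if (!ok && made == 1) {
                // Step 1: first failure triggers one immediate retry.
                ok = tryFetch(fetcher, doc);
                made++;
            }

            if (ok) {
                result.crawled.add(doc);      // step 2: move on as usual
                continue;
            }
            if (made >= maxAttempts) {
                result.abortedOn = doc;       // step 5: give up, abort the crawl
                return result;
            }
            // Steps 3 and 4: remember the attempt count and postpone to the end.
            attempts.put(doc, made);
            delays.putIfAbsent(doc, initialDelayMillis);
            queue.addLast(doc);
        }
        return result;
    }

    private static boolean tryFetch(Fetcher fetcher, String doc) {
        try {
            fetcher.fetch(doc);
            return true;
        } catch (Exception e) {
            return false;   // any failure counts as "not clearly identified" here
        }
    }

    public static void main(String[] args) {
        int[] failuresLeft = {3};   // document "b" fails three times, then succeeds
        Result r = crawl(Arrays.asList("a", "b", "c"), id -> {
            if (id.equals("b") && failuresLeft[0]-- > 0) {
                throw new Exception("simulated SMB error");
            }
        }, 5, 100L);
        System.out.println("crawled=" + r.crawled + " abortedOn=" + r.abortedOn);
    }
}
```

In this sketch a failing document never blocks the documents queued behind it: it is only revisited after everything that was already in the queue has been tried, and the crawl is aborted only once that document has used up its maxAttempts. A real implementation would presumably record a "not before" time on the document instead of summing delays, but that detail is left open here.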



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)