Posted to dev@manifoldcf.apache.org by "Nguyen Huu Nhat (Jira)" <ji...@apache.org> on 2022/08/25 02:38:00 UTC

[jira] [Created] (CONNECTORS-1724) When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.

Nguyen Huu Nhat created CONNECTORS-1724:
-------------------------------------------

             Summary: When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.
                 Key: CONNECTORS-1724
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1724
             Project: ManifoldCF
          Issue Type: Bug
            Reporter: Nguyen Huu Nhat


Hi there,

I have found an issue in the Generic Repository Connector that is currently not handled, and I would like to suggest the following fix to its source code.
For details about this issue, please refer to the information below:

h3. +*1. Connector name*+

Generic Repository Connector

h3. +*2. Issue*+

When the Generic Repository Connector calls the REST API with _action=seed_ and an error occurs, the corresponding error handling is not executed. As a result, the ManifoldCF crawling job freezes at status *Starting up* and no error message is output.
 * When this issue happens in the Generic Repository Connector, the seed phase of jobs in other repositories also freezes (perhaps because the seed thread itself is frozen).
 * Even after ManifoldCF is restarted, the same issue happens again because jobs are executed automatically.
 * A temporary workaround is to abort the job and recheck the connection.

h3. +*3. Reproduction*+

h4. *Reproduction method:*
 * When setting up the Generic repository connection, set a non-existent entry point (e.g. [http://localhost/no*exist/]). Then define a job that uses that connection and run it.
 * 10 minutes or more after the job is started, its status is still *Starting up*, and the job does not end abnormally with a connection error or time-out.

h4. *Reproduction steps:*
 * Create a Generic repository connection with the following settings:
 ** On the *Entry Point* tab, set a non-existent entry point (e.g. [http://localhost/no*exist/])
 * Create a job using the above Generic repository connection
 * Start the created job and keep track of its status
 ** The job freezes with the following information:
 *** Status: Starting up
 *** Start Time: Not started
 *** Documents: 0
 ** No new events appear in *Document Status*
 ** No errors get logged in manifoldcf.log

h3. +*4. Cause*+

In the *GenericConnector$ExecuteSeedingThread* class, the *seedBuffer.signalDone()* method is called only when the returned HTTP status code is 200.
 * When the connector cannot connect to the REST API (that is, the returned HTTP status code is not 200), *seedBuffer.signalDone()* is not called.
 ** As a result, the *complete* flag is never set to _true_.
 ** Because the *complete* flag is not _true_ and *buffer.size()* is 0, the job is stuck in the *wait()* call inside the while loop of the *XThreadStringBuffer#fetch()* method.

([https://github.com/apache/manifoldcf/blob/release-2.22.1/framework/connector-common/src/main/java/org/apache/manifoldcf/connectorcommon/common/XThreadStringBuffer.java#L78])
{code:java}
    while (buffer.size() == 0 && !complete)
      wait();
{code}

⇒ This is why the job freezes at status *Starting up*.
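
The mechanism can be illustrated with a minimal, self-contained sketch. It is not ManifoldCF code: *SketchBuffer*, *SeedingSketch*, *simulateFailedRequest()* and *consumerFinishes()* are hypothetical names, and the buffer only imitates the wait loop quoted above. The producer always fails, and the flag _signalInFinally_ switches between calling *signalDone()* only on success and calling it in a finally block.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical stand-in for XThreadStringBuffer (not the real class). */
class SketchBuffer {
  private final List<String> buffer = new ArrayList<>();
  private boolean complete = false;

  public synchronized void add(String s) {
    buffer.add(s);
    notifyAll();
  }

  /** Marks the producer as finished and wakes up any waiting consumer. */
  public synchronized void signalDone() {
    complete = true;
    notifyAll();
  }

  /** Returns the next item, or null once the producer is done and the buffer is empty. */
  public synchronized String fetch() throws InterruptedException {
    while (buffer.size() == 0 && !complete)
      wait();
    return buffer.isEmpty() ? null : buffer.remove(0);
  }
}

class SeedingSketch {
  /** Assumption: stands in for a REST call that cannot connect or returns a non-200 status. */
  static void simulateFailedRequest() {
    throw new IllegalStateException("simulated connection failure");
  }

  /** Runs a failing producer and reports whether the consumer finished within the timeout. */
  static boolean consumerFinishes(boolean signalInFinally, long timeoutMillis) {
    SketchBuffer seedBuffer = new SketchBuffer();
    Thread producer = new Thread(() -> {
      try {
        simulateFailedRequest();
        // Success-only placement: never reached when the request fails.
        if (!signalInFinally) seedBuffer.signalDone();
      } catch (RuntimeException e) {
        // The failure is swallowed here to keep the sketch short.
      } finally {
        // Suggested placement: reached for every outcome.
        if (signalInFinally) seedBuffer.signalDone();
      }
    });
    Thread consumer = new Thread(() -> {
      try {
        while (seedBuffer.fetch() != null) { /* drain seeds */ }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.setDaemon(true); // lets the JVM exit even if the consumer is stuck
    producer.start();
    consumer.start();
    try {
      producer.join();
      consumer.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    boolean finished = !consumer.isAlive();
    consumer.interrupt(); // clean up a consumer that is still waiting
    return finished;
  }

  public static void main(String[] args) {
    System.out.println("signalDone only on success: " + consumerFinishes(false, 500));
    System.out.println("signalDone in finally:      " + consumerFinishes(true, 500));
  }
}
```

In this sketch the first call should report that the consumer is still waiting after the timeout, which corresponds to the frozen seed phase, while the second call should report that the consumer finished.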

h3. +*5. Solution*+

To resolve this issue, we suggest the following changes:
 * *seedBuffer.signalDone()* should be called for every HTTP response status.
 * Moreover, when the HTTP status code is not 200, a ManifoldCFException is thrown. The *finishUp()* method of the *GenericConnector$ExecuteSeedingThread* class has no handling for ManifoldCFException, so handling for this exception should be added.

h3. +*6. Suggested source code (based on release 2.22.1)*+

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1151]
{code:java}
-         seedBuffer.signalDone();
        } finally {
          EntityUtils.consume(response.getEntity());
          method.releaseConnection();
+         seedBuffer.signalDone();
        }
{code}

[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1120]
{code:java}
        if (thr instanceof RuntimeException) {
          throw (RuntimeException) thr;
        } else if (thr instanceof Error) {
          throw (Error) thr;
+       } else if (thr instanceof ManifoldCFException) {
+         throw (ManifoldCFException) thr;
        } else {
          throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
        }
{code}
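
The effect of the added branch can be shown in isolation with a small sketch. It is only an illustration under assumptions: *SketchCheckedException* is a hypothetical stand-in for ManifoldCFException, *SketchSeedingThread* and *outcome()* are invented names, and the real *finishUp()* signature may differ. The worker thread records whatever it throws, and *finishUp()* rethrows it on the calling thread with its original type.

```java
/** Hypothetical stand-in for ManifoldCFException (a checked exception). */
class SketchCheckedException extends Exception {
  SketchCheckedException(String message) {
    super(message);
  }
}

/** Hypothetical worker that records a failure for the caller to pick up later. */
class SketchSeedingThread extends Thread {
  private final boolean fail;
  private volatile Throwable exception;

  SketchSeedingThread(boolean fail) {
    this.fail = fail;
    setDaemon(true);
  }

  @Override
  public void run() {
    try {
      // Assumption: a non-200 response is reported as a checked exception.
      if (fail) throw new SketchCheckedException("simulated non-200 response");
    } catch (Throwable t) {
      exception = t; // remembered so finishUp() can rethrow it
    }
  }

  /** Waits for the worker and rethrows its failure, keeping the original type. */
  void finishUp() throws InterruptedException, SketchCheckedException {
    join();
    Throwable thr = exception;
    if (thr == null) return;
    if (thr instanceof RuntimeException) {
      throw (RuntimeException) thr;
    } else if (thr instanceof Error) {
      throw (Error) thr;
    } else if (thr instanceof SketchCheckedException) {
      // Counterpart of the suggested ManifoldCFException branch.
      throw (SketchCheckedException) thr;
    } else {
      throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
    }
  }

  /** Runs one worker and summarizes what the caller observes. */
  static String outcome(boolean fail) {
    SketchSeedingThread t = new SketchSeedingThread(fail);
    t.start();
    try {
      t.finishUp();
      return "ok";
    } catch (SketchCheckedException e) {
      return "checked: " + e.getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  public static void main(String[] args) {
    System.out.println(outcome(false));
    System.out.println(outcome(true));
  }
}
```

Without the extra branch, the checked exception would fall through to the final else and reach the caller wrapped in a RuntimeException with the "Unhandled exception of type" message instead of its own type.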


