Posted to dev@manifoldcf.apache.org by "Nguyen Huu Nhat (Jira)" <ji...@apache.org> on 2022/08/25 02:38:00 UTC
[jira] [Created] (CONNECTORS-1724) When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.
Nguyen Huu Nhat created CONNECTORS-1724:
-------------------------------------------
Summary: When the REST API cannot be connected, job using the Generic Repository Connector would be freezed.
Key: CONNECTORS-1724
URL: https://issues.apache.org/jira/browse/CONNECTORS-1724
Project: ManifoldCF
Issue Type: Bug
Reporter: Nguyen Huu Nhat
Hi there,
I would like to report an issue in the Generic Repository Connector that is still unhandled in use, and to suggest the following fix to its source code.
For details about this issue, please refer to the information below:
h3. +*1. Connector name*+
Generic Repository Connector
h3. +*2. Issue*+
When the Generic Repository Connector calls the REST API with _action=seed_ and an error occurs, the corresponding error handling is not executed. As a result, the ManifoldCF crawling job freezes at status *Starting up* and no error message is output.
* When this issue happens in the Generic Repository Connector, the seed phase of jobs in other repositories also freezes (perhaps the seed thread itself is frozen).
* Even after ManifoldCF is restarted, the same issue happens again, because jobs are executed automatically.
* A temporary workaround is to abort the job and recheck the connection.
h3. +*3. Reproduction*+
h4. *Reproduction method:*
* When configuring the Generic repository connection, set a non-existent entry point (e.g. [http://localhost/no*exist/]). Then define a job that uses that connection and run it.
* 10 minutes or more after the job is started, its status is still *Starting up*; the job does not end abnormally with a connection error or a time-out.
h4. *Reproduction steps:*
* Create a Generic repository connection with the following settings:
** On the *Entry Point* tab, set a non-existent entry point (e.g. [http://localhost/no*exist/])
* Create a job using the above Generic repository connection
* Start the created job and keep track of its status
** The job freezes with the following information:
*** Status: Starting up
*** Start Time: Not started
*** Documents: 0
** No new events appear in *Document Status*
** No errors get logged in manifoldcf.log
h3. +*4. Cause*+
In the *GenericConnector$ExecuteSeedingThread* class, the *seedBuffer.signalDone()* method is called only when the returned HTTP status code is 200.
* When the connector is not able to connect to the REST API, which means that the returned HTTP status code is not 200, the *seedBuffer.signalDone()* method is not called.
** As a result, the *complete* flag is never set to _true_.
** Because the *complete* flag is not _true_ and *buffer.size()* is 0, the job is stuck in *wait()*, inside the while loop of the *XThreadStringBuffer#fetch()* method.
([https://github.com/apache/manifoldcf/blob/release-2.22.1/framework/connector-common/src/main/java/org/apache/manifoldcf/connectorcommon/common/XThreadStringBuffer.java#L78])
{code:java}
while (buffer.size() == 0 && !complete)
  wait();
{code}
⇒ This is why the job is frozen at status *Starting up*.
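The blocking behaviour described above can be reproduced in isolation. The following is only a minimal sketch, not the ManifoldCF code: *SketchBuffer* is a hypothetical stand-in for XThreadStringBuffer that keeps just the wait loop quoted above, and *SeedHangDemo* with its 500 ms join time-out is an assumption added so that the demonstration terminates instead of hanging.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for XThreadStringBuffer; only the wait loop is modelled.
class SketchBuffer {
  private final Queue<String> buffer = new ArrayDeque<>();
  private boolean complete = false;

  public synchronized void add(String s) {
    buffer.add(s);
    notifyAll();
  }

  public synchronized void signalDone() {
    complete = true;
    notifyAll();
  }

  // Same loop condition as the quoted fetch(): blocks until data arrives or completion is signalled.
  public synchronized String fetch() throws InterruptedException {
    while (buffer.size() == 0 && !complete)
      wait();
    return buffer.poll(); // null when the buffer is empty and complete
  }
}

class SeedHangDemo {
  // Returns true if a consumer calling fetch() on an empty buffer returns within 500 ms.
  static boolean consumerReturns(boolean callSignalDone) {
    SketchBuffer buf = new SketchBuffer();
    Thread consumer = new Thread(() -> {
      try {
        buf.fetch();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.setDaemon(true); // do not keep the JVM alive if the consumer is stuck
    consumer.start();
    if (callSignalDone) {
      buf.signalDone();
    }
    try {
      consumer.join(500);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    boolean returned = !consumer.isAlive();
    consumer.interrupt(); // release the consumer if it is still waiting
    return returned;
  }

  public static void main(String[] args) {
    System.out.println("with signalDone:    " + consumerReturns(true));
    System.out.println("without signalDone: " + consumerReturns(false));
  }
}
```

In this sketch the consumer returns only when *signalDone()* is called; without it, the thread stays inside *wait()*, which mirrors the frozen *Starting up* status.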
h3. +*5. Solution*+
To resolve this issue, we suggest the following:
* The *seedBuffer.signalDone()* method should be called for every HTTP response status.
* Moreover, when the HTTP status code is not 200, a ManifoldCFException is thrown. The *finishUp()* method of the *GenericConnector$ExecuteSeedingThread* class does not handle ManifoldCFException, so handling for this exception should be added.
h3. +*6. Suggested source code (based on release 2.22.1)*+
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1151]
{code:java}
-   seedBuffer.signalDone();
  } finally {
    EntityUtils.consume(response.getEntity());
    method.releaseConnection();
+   seedBuffer.signalDone();
  }
{code}
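The effect of moving the call into the *finally* block can be illustrated with a small standalone sketch. Everything in it is an assumption made for illustration: *SeedFinallySketch*, *runSeeding* and *consumerReleased* are hypothetical names, a *CountDownLatch* plays the role of *seedBuffer.signalDone()*, and an *IllegalStateException* stands in for the exception the connector throws on a non-200 status.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class SeedFinallySketch {
  // Hypothetical seeding step: fails on any status other than 200,
  // but always signals completion because the signal is in finally.
  static void runSeeding(int httpStatus, CountDownLatch done) {
    try {
      if (httpStatus != 200) {
        throw new IllegalStateException("Unexpected HTTP status: " + httpStatus);
      }
      // On success, seed entries would be added to the buffer here.
    } finally {
      done.countDown(); // stand-in for seedBuffer.signalDone()
    }
  }

  // Returns true if a waiting consumer is released within 500 ms.
  static boolean consumerReleased(int httpStatus) {
    CountDownLatch done = new CountDownLatch(1);
    Thread producer = new Thread(() -> {
      try {
        runSeeding(httpStatus, done);
      } catch (IllegalStateException e) {
        // In the real thread the failure would be recorded for finishUp() to rethrow.
      }
    });
    producer.start();
    try {
      return done.await(500, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("status 200 releases consumer: " + consumerReleased(200));
    System.out.println("status 404 releases consumer: " + consumerReleased(404));
  }
}
```

Because the signal is sent on both the success path and the failure path, the waiting side is released in either case and the error can then be reported rather than silently blocking.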
[https://github.com/apache/manifoldcf/blob/release-2.22.1/connectors/generic/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/generic/GenericConnector.java#L1120]
{code:java}
  if (thr instanceof RuntimeException) {
    throw (RuntimeException) thr;
  } else if (thr instanceof Error) {
    throw (Error) thr;
+ } else if (thr instanceof ManifoldCFException) {
+   throw (ManifoldCFException) thr;
  } else {
    throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
  }
{code}
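The rethrow chain above can be tried out in isolation with the following sketch. It is not the connector's code: *SketchCheckedException* is a hypothetical checked exception standing in for ManifoldCFException, and the null check and the *surfacedType* helper are assumptions added only so that the behaviour can be observed.

```java
// Hypothetical checked exception standing in for ManifoldCFException.
class SketchCheckedException extends Exception {
  SketchCheckedException(String message) {
    super(message);
  }
}

class FinishUpSketch {
  // Rethrows a failure captured by a worker thread, keeping its type where it is known.
  static void finishUp(Throwable thr) throws SketchCheckedException {
    if (thr == null) {
      return; // nothing was captured (assumed guard, for the sketch only)
    }
    if (thr instanceof RuntimeException) {
      throw (RuntimeException) thr;
    } else if (thr instanceof Error) {
      throw (Error) thr;
    } else if (thr instanceof SketchCheckedException) {
      throw (SketchCheckedException) thr; // the added branch: keep the checked type
    } else {
      throw new RuntimeException("Unhandled exception of type: " + thr.getClass().getName(), thr);
    }
  }

  // Helper for the demo: simple name of the exception type that finishUp() surfaces.
  static String surfacedType(Throwable thr) {
    try {
      finishUp(thr);
      return "none";
    } catch (Throwable t) {
      return t.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(surfacedType(new SketchCheckedException("seed failed")));
    System.out.println(surfacedType(new java.io.IOException("other failure")));
    System.out.println(surfacedType(null));
  }
}
```

With the added branch, the checked exception reaches the caller under its own type instead of being wrapped in a generic RuntimeException, so the caller can handle it like any other connector error.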