Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2014/02/20 10:00:38 UTC

[jira] [Commented] (CONNECTORS-897) IO exception during indexing: missing CR

    [ https://issues.apache.org/jira/browse/CONNECTORS-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906775#comment-13906775 ] 

Karl Wright commented on CONNECTORS-897:
----------------------------------------

All I can tell from this trace is that the exception arises while HttpClient is handling chunked I/O.  HttpClient is called by the SolrJ library, which in turn is called by the Solr output connector, so the failure is quite far down the call chain.

My suspicion, since nobody else has reported this, is that your Solr instance (or the app server it is running under) is configured to reject posts that are too large, and that it answers such posts with a bit of HTML that HttpClient of course does not recognize as a valid response to a chunked request.  (I believe this is actually the default configuration for later versions of Solr.)  The only way to really debug it is to turn on HttpClient wire debugging and crawl one of the affected documents.  You do this by editing logging.ini and adding lines pertaining to HttpClient.  Search for "HttpClient wire debugging" and you should find the documentation.
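
As a rough sketch of what those lines might look like: this assumes logging.ini uses log4j 1.x properties syntax and that the bundled HttpClient is a 4.x release (the org.apache.http packages in the stack trace suggest so). The logger names are the ones HttpClient 4.x documents for its wire and header logs; check them against the HttpClient logging guide for your version before relying on them.

```
# Assumption: logging.ini is a log4j 1.x properties file.
# Log the raw bytes HttpClient sends to and receives from Solr.
log4j.logger.org.apache.http.wire=DEBUG
# Log request and response headers as well.
log4j.logger.org.apache.http.headers=DEBUG
```

Wire logging is very verbose, so it is probably best to enable it only while crawling a single affected document and to remove the lines afterwards.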

If it is not obvious from the logs what is going on, please let us know.

> IO exception during indexing: missing CR
> ----------------------------------------
>
>                 Key: CONNECTORS-897
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-897
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: CMIS connector
>    Affects Versions: ManifoldCF 1.4.1
>         Environment: Windows 7 with bundled mcf 1.4.1, solr 4.6
>            Reporter: lalit
>
> Hi,
> I have downloaded the mcf (ManifoldCF) 1.4.1 source from the tag site and built it as per the instructions. I am now using the CMIS connector to connect to an Alfresco repo, with Solr as the output channel.
> When I crawl the Alfresco repo for indexing into Solr, I get this error in the mcf logs whenever mcf crawls any media such as an image or video. I have also added adm4.1.jar and xmpcore.jar to ..\contrib\extraction\lib.
> ERROR 2014-02-20 12:50:45,251 (Worker thread '3') - Exception tossed: Repeated service interruptions - failure processing document: missing CR
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: missing CR
> 	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:586)
> Caused by: java.io.IOException: missing CR
> 	at sun.net.www.http.ChunkedInputStream.processRaw(Unknown Source)
> 	at sun.net.www.http.ChunkedInputStream.readAheadBlocking(Unknown Source)
> 	at sun.net.www.http.ChunkedInputStream.readAhead(Unknown Source)
> 	at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
> 	at java.io.FilterInputStream.read(Unknown Source)
> 	at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
> 	at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
> 	at org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:69)
> 	at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211)
> 	at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229)
> 	at org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:186)
> 	at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> 	at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> 	at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> 	at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> 	at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> 	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> 	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> 	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
> 	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> 	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> 	at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:291)
> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
> 	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> 	at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:919)
> Regards,
> Lalit.


