Posted to dev@manifoldcf.apache.org by Ahmet Arslan <io...@yahoo.com> on 2013/01/14 14:22:47 UTC

Repeated service interruptions - failure processing document: null

Hello,

I am indexing a SharePoint 2010 instance using mcf-trunk (At revision 1432907)

There is no problem with a Document library that contains Word, Excel, and similar documents.

However, I receive the following errors with a Document library that has *.aspx files in it.

Status of Jobs => Error: Repeated service interruptions - failure processing document: null

 WARN 2013-01-14 15:00:12,720 (Worker thread '13') - Service interruption reported for job 1358009105156 connection 'iknow': IO exception during indexing: null
ERROR 2013-01-14 15:00:12,763 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: null
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
Caused by: org.apache.http.client.ClientProtocolException
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
	at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
	... 6 more
Caused by: java.net.SocketException: Broken pipe
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
	at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
	at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
	at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
	at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
	at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
	at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
	at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
	at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
	... 8 more
	
Status of Jobs => Error: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
	
	ERROR 2013-01-14 15:10:42,074 (Worker thread '15') - Exception tossed: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
	at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
	at org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
	at org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
	at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
	at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
	at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
	at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
	at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
	at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
	
On the Solr side I see:

INFO: Creating new http client, config:maxConnections=200&maxConnectionsPerHost=8
2013-01-14 15:18:21.775:WARN:oejh.HttpParser:Full [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616 ...long long chars ... 2B656B6970{}
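
For reference, the HttpParser "Full" warning (g=6144) together with the 413 "FULL head" response above suggests that the request line plus headers overflowed Jetty's header buffer, rather than the POST body limit. A minimal, purely illustrative Java sketch of raising that limit on an embedded Jetty 8 connector (an assumed setup; with the stock Solr example the equivalent knob would typically live in etc/jetty.xml):

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.nio.SelectChannelConnector;

public class BiggerHeaderBuffer {
  public static void main(String[] args) throws Exception {
    Server server = new Server();
    SelectChannelConnector connector = new SelectChannelConnector();
    connector.setPort(8983);
    // The warning above shows a 6144-byte buffer; allow much larger request
    // lines/headers so long update URLs are not rejected with 413.
    connector.setRequestHeaderSize(65536);
    connector.setRequestBufferSize(65536);
    server.addConnector(connector);
    server.start();
    server.join();
  }
}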

Thanks,
Ahmet

Re: Repeated service interruptions - failure processing document: null

Posted by Karl Wright <da...@gmail.com>.
It's also possible that the getACLs() problem has to do with these
files.  Apparently you can't get the permissions for them.  In that
case, if security is on, we can't index them, because we can't get
valid ACLs.

Karl

On Mon, Jan 14, 2013 at 11:46 AM, Karl Wright <da...@gmail.com> wrote:
> Hi Ahmet,
>
> We could specifically treat .aspx files specially, so that they are
> considered to never have any content.  But are there cases where
> someone might want to index any content that these URLs might return?
> Specifically, what do .aspx "files" typically contain, when found in a
> SharePoint hierarchy?
>
> Karl
>
> On Mon, Jan 14, 2013 at 11:37 AM, Ahmet Arslan <io...@yahoo.com> wrote:
>> Hi Karl,
>>
>> Now 39 aspx files (out of 130) are indexed. Job didn't get killed. No exceptions in the log.
>>
>> I increased the maximum POST size of solr/jetty but that 39 number didn't increased.
>>
>> I will check the size of remaining 130 - 39 *.aspx files.
>>
>> Actually I am mapping extracted content of this aspx files to a ignored dynamic field. (fmap.content=content_ignored) I don't use them. I am only interested in metadata of these aspx files. It would be great if there is a setting  to just grab metadata. Similar to Lists.
>>
>> Thanks,
>> Ahmet
>>
>> --- On Mon, 1/14/13, Karl Wright <da...@gmail.com> wrote:
>>
>>> From: Karl Wright <da...@gmail.com>
>>> Subject: Re: Repeated service interruptions - failure processing document: null
>>> To: dev@manifoldcf.apache.org
>>> Date: Monday, January 14, 2013, 5:46 PM
>>> I checked in a fix for this ticket on
>>> trunk.  Please let me know if it
>>> resolves this issue.
>>>
>>> Karl
>>>
>>> On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright <da...@gmail.com>
>>> wrote:
>>> > This is because httpclient is retrying on error for
>>> three times by
>>> > default.  This has to be disabled in the Solr
>>> connector, or the rest
>>> > of the logic won't work right.
>>> >
>>> > I've opened a ticket (CONNECTORS-610) for this problem
>>> too.
>>> >
>>> > Karl
>>> >
>>> > On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan <io...@yahoo.com>
>>> wrote:
>>> >> Hi Karl,
>>> >>
>>> >> Thanks for quick fix.
>>> >>
>>> >> I am still seeing the following error after 'svn
>>> up' and 'ant build'
>>> >>
>>> >> ERROR 2013-01-14 17:09:41,949 (Worker thread '6') -
>>> Exception tossed: Repeated service interruptions - failure
>>> processing document: null
>>> >>
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>>> Repeated service interruptions - failure processing
>>> document: null
>>> >>         at
>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>>> >> Caused by:
>>> org.apache.http.client.ClientProtocolException
>>> >>         at
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>>> >>         at
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>>> >>         at
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>>> >>         at
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>>> >>         at
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>> >>         at
>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>> >>         at
>>> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
>>> >> Caused by:
>>> org.apache.http.client.NonRepeatableRequestException: Cannot
>>> retry request with a non-repeatable request entity.
>>> The cause lists the reason the original request failed.
>>> >>         at
>>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>>> >>         at
>>> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>>> >>         at
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>>> >>         ... 6 more
>>> >> Caused by: java.net.SocketException: Broken pipe
>>> >>         at
>>> java.net.SocketOutputStream.socketWrite0(Native Method)
>>> >>         at
>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>> >>         at
>>> java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>> >>         at
>>> org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>>> >>         at
>>> org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>>> >>         at
>>> org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>>> >>         at
>>> org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>>> >>         at
>>> org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>>> >>         at
>>> org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>>> >>         at
>>> org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>>> >>         at
>>> org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>>> >>         at
>>> org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>>> >>         at
>>> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>>> >>         at
>>> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>>> >>         at
>>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>>> >>         ... 8 more
>>> >>
>>> >>
>>> >>
>>> >> --- On Mon, 1/14/13, Karl Wright <da...@gmail.com>
>>> wrote:
>>> >>
>>> >>> From: Karl Wright <da...@gmail.com>
>>> >>> Subject: Re: Repeated service interruptions -
>>> failure processing document: null
>>> >>> To: dev@manifoldcf.apache.org
>>> >>> Date: Monday, January 14, 2013, 3:30 PM
>>> >>> Hi Ahmet,
>>> >>>
>>> >>> The exception that seems to be causing the
>>> abort is a socket
>>> >>> exception
>>> >>> coming from a socket write:
>>> >>>
>>> >>> > Caused by: java.net.SocketException:
>>> Broken pipe
>>> >>>
>>> >>> This makes sense in light of the http code
>>> returned from
>>> >>> Solr, which
>>> >>> was 413:  http://www.checkupdown.com/status/E413.html .
>>> >>>
>>> >>> So there is nothing actually *wrong* with the
>>> .aspx
>>> >>> documents, but
>>> >>> they are just way too big, and Solr is
>>> rejecting them for
>>> >>> that reason.
>>> >>>
>>> >>> Clearly, though, the Solr connector should
>>> recognize this
>>> >>> code as
>>> >>> meaning "never retry", so instead of killing
>>> the job, it
>>> >>> should just
>>> >>> skip the document.  I'll open a ticket for
>>> that now.
>>> >>>
>>> >>> Karl
>>> >>>
>>> >>>
>>> >>> On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan
>>> <io...@yahoo.com>
>>> >>> wrote:
>>> >>> > Hello,
>>> >>> >
>>> >>> > I am indexing a SharePoint 2010 instance
>>> using
>>> >>> mcf-trunk (At revision 1432907)
>>> >>> >
>>> >>> > There is no problem with a Document
>>> library that
>>> >>> contains word excel etc.
>>> >>> >
>>> >>> > However, I receive the following errors
>>> with a Document
>>> >>> library that has *.aspx files in it.
>>> >>> >
>>> >>> > Status of Jobs => Error: Repeated
>>> service
>>> >>> interruptions - failure processing document:
>>> null
>>> >>> >
>>> >>> >  WARN 2013-01-14 15:00:12,720 (Worker
>>> thread '13')
>>> >>> - Service interruption reported for job
>>> 1358009105156
>>> >>> connection 'iknow': IO exception during
>>> indexing: null
>>> >>> > ERROR 2013-01-14 15:00:12,763 (Worker
>>> thread '13') -
>>> >>> Exception tossed: Repeated service
>>> interruptions - failure
>>> >>> processing document: null
>>> >>> >
>>> >>>
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>>> >>> Repeated service interruptions - failure
>>> processing
>>> >>> document: null
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>>> >>> > Caused by:
>>> >>> org.apache.http.client.ClientProtocolException
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>>> >>> >         at
>>> >>>
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>>> >>> >         at
>>> >>>
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>> >>> >         at
>>> >>>
>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
>>> >>> > Caused by:
>>> >>>
>>> org.apache.http.client.NonRepeatableRequestException:
>>> Cannot
>>> >>> retry request with a non-repeatable request
>>> entity.
>>> >>> The cause lists the reason the original request
>>> failed.
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>>> >>> >         ...
>>> 6 more
>>> >>> > Caused by: java.net.SocketException:
>>> Broken pipe
>>> >>> >         at
>>> >>> java.net.SocketOutputStream.socketWrite0(Native
>>> Method)
>>> >>> >         at
>>> >>>
>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>> >>> >         at
>>> >>>
>>> java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>>> >>> >         at
>>> >>>
>>> org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>>> >>> >         at
>>> >>>
>>> org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>>> >>> >         at
>>> >>>
>>> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>>> >>> >         at
>>> >>>
>>> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>>> >>> >         at
>>> >>>
>>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>>> >>> >         ...
>>> 8 more
>>> >>> >
>>> >>> > Status of Jobs => Error: Unhandled Solr
>>> exception
>>> >>> during indexing (0): Server at http://localhost:8983/solr/all returned non ok
>>> >>> status:413, message:FULL head
>>> >>> >
>>> >>> >
>>>    ERROR 2013-01-14
>>> >>> 15:10:42,074 (Worker thread '15') - Exception
>>> tossed:
>>> >>> Unhandled Solr exception during indexing (0):
>>> Server at http://localhost:8983/solr/all returned
>>> non ok
>>> >>> status:413, message:FULL head
>>> >>> >
>>> >>>
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>>> >>> Unhandled Solr exception during indexing (0):
>>> Server at http://localhost:8983/solr/all returned
>>> non ok
>>> >>> status:413, message:FULL head
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>>> >>> >         at
>>> >>>
>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
>>> >>> >
>>> >>> > On the solr side I see :
>>> >>> >
>>> >>> > INFO: Creating new http client,
>>> >>>
>>> config:maxConnections=200&maxConnectionsPerHost=8
>>> >>> > 2013-01-14
>>> 15:18:21.775:WARN:oejh.HttpParser:Full
>>> >>>
>>> [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616
>>> >>> ...long long chars ... 2B656B6970{}
>>> >>> >
>>> >>> > Thanks,
>>> >>> > Ahmet
>>> >>>
>>>

Re: Repeated service interruptions - failure processing document: null

Posted by Karl Wright <da...@gmail.com>.
Hi Ahmet,

We could treat .aspx files specially, so that they are
considered to never have any content.  But are there cases where
someone might want to index any content that these URLs might return?
Specifically, what do .aspx "files" typically contain, when found in a
SharePoint hierarchy?

Karl

On Mon, Jan 14, 2013 at 11:37 AM, Ahmet Arslan <io...@yahoo.com> wrote:
> Hi Karl,
>
> Now 39 aspx files (out of 130) are indexed. Job didn't get killed. No exceptions in the log.
>
> I increased the maximum POST size of solr/jetty but that 39 number didn't increased.
>
> I will check the size of remaining 130 - 39 *.aspx files.
>
> Actually I am mapping extracted content of this aspx files to a ignored dynamic field. (fmap.content=content_ignored) I don't use them. I am only interested in metadata of these aspx files. It would be great if there is a setting  to just grab metadata. Similar to Lists.
>
> Thanks,
> Ahmet
>
> --- On Mon, 1/14/13, Karl Wright <da...@gmail.com> wrote:
>
>> From: Karl Wright <da...@gmail.com>
>> Subject: Re: Repeated service interruptions - failure processing document: null
>> To: dev@manifoldcf.apache.org
>> Date: Monday, January 14, 2013, 5:46 PM
>> I checked in a fix for this ticket on
>> trunk.  Please let me know if it
>> resolves this issue.
>>
>> Karl
>>
>> On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright <da...@gmail.com>
>> wrote:
>> > This is because httpclient is retrying on error for
>> three times by
>> > default.  This has to be disabled in the Solr
>> connector, or the rest
>> > of the logic won't work right.
>> >
>> > I've opened a ticket (CONNECTORS-610) for this problem
>> too.
>> >
>> > Karl
>> >
>> > On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan <io...@yahoo.com>
>> wrote:
>> >> Hi Karl,
>> >>
>> >> Thanks for quick fix.
>> >>
>> >> I am still seeing the following error after 'svn
>> up' and 'ant build'
>> >>
>> >> ERROR 2013-01-14 17:09:41,949 (Worker thread '6') -
>> Exception tossed: Repeated service interruptions - failure
>> processing document: null
>> >>
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>> Repeated service interruptions - failure processing
>> document: null
>> >>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>> >> Caused by:
>> org.apache.http.client.ClientProtocolException
>> >>         at
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>> >>         at
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>> >>         at
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>> >>         at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>> >>         at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>> >>         at
>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>> >>         at
>> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
>> >> Caused by:
>> org.apache.http.client.NonRepeatableRequestException: Cannot
>> retry request with a non-repeatable request entity.
>> The cause lists the reason the original request failed.
>> >>         at
>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>> >>         at
>> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>> >>         at
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>> >>         ... 6 more
>> >> Caused by: java.net.SocketException: Broken pipe
>> >>         at
>> java.net.SocketOutputStream.socketWrite0(Native Method)
>> >>         at
>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>> >>         at
>> java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>> >>         at
>> org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>> >>         at
>> org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>> >>         at
>> org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>> >>         at
>> org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>> >>         at
>> org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>> >>         at
>> org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>> >>         at
>> org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>> >>         at
>> org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>> >>         at
>> org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>> >>         at
>> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>> >>         at
>> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>> >>         at
>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>> >>         ... 8 more
>> >>
>> >>
>> >>
>> >> --- On Mon, 1/14/13, Karl Wright <da...@gmail.com>
>> wrote:
>> >>
>> >>> From: Karl Wright <da...@gmail.com>
>> >>> Subject: Re: Repeated service interruptions -
>> failure processing document: null
>> >>> To: dev@manifoldcf.apache.org
>> >>> Date: Monday, January 14, 2013, 3:30 PM
>> >>> Hi Ahmet,
>> >>>
>> >>> The exception that seems to be causing the
>> abort is a socket
>> >>> exception
>> >>> coming from a socket write:
>> >>>
>> >>> > Caused by: java.net.SocketException:
>> Broken pipe
>> >>>
>> >>> This makes sense in light of the http code
>> returned from
>> >>> Solr, which
>> >>> was 413:  http://www.checkupdown.com/status/E413.html .
>> >>>
>> >>> So there is nothing actually *wrong* with the
>> .aspx
>> >>> documents, but
>> >>> they are just way too big, and Solr is
>> rejecting them for
>> >>> that reason.
>> >>>
>> >>> Clearly, though, the Solr connector should
>> recognize this
>> >>> code as
>> >>> meaning "never retry", so instead of killing
>> the job, it
>> >>> should just
>> >>> skip the document.  I'll open a ticket for
>> that now.
>> >>>
>> >>> Karl
>> >>>
>> >>>
>> >>> On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan
>> <io...@yahoo.com>
>> >>> wrote:
>> >>> > Hello,
>> >>> >
>> >>> > I am indexing a SharePoint 2010 instance
>> using
>> >>> mcf-trunk (At revision 1432907)
>> >>> >
>> >>> > There is no problem with a Document
>> library that
>> >>> contains word excel etc.
>> >>> >
>> >>> > However, I receive the following errors
>> with a Document
>> >>> library that has *.aspx files in it.
>> >>> >
>> >>> > Status of Jobs => Error: Repeated
>> service
>> >>> interruptions - failure processing document:
>> null
>> >>> >
>> >>> >  WARN 2013-01-14 15:00:12,720 (Worker
>> thread '13')
>> >>> - Service interruption reported for job
>> 1358009105156
>> >>> connection 'iknow': IO exception during
>> indexing: null
>> >>> > ERROR 2013-01-14 15:00:12,763 (Worker
>> thread '13') -
>> >>> Exception tossed: Repeated service
>> interruptions - failure
>> >>> processing document: null
>> >>> >
>> >>>
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>> >>> Repeated service interruptions - failure
>> processing
>> >>> document: null
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>> >>> > Caused by:
>> >>> org.apache.http.client.ClientProtocolException
>> >>> >         at
>> >>>
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>> >>> >         at
>> >>>
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>> >>> >         at
>> >>>
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>> >>> >         at
>> >>>
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>> >>> >         at
>> >>>
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>> >>> >         at
>> >>>
>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
>> >>> > Caused by:
>> >>>
>> org.apache.http.client.NonRepeatableRequestException:
>> Cannot
>> >>> retry request with a non-repeatable request
>> entity.
>> >>> The cause lists the reason the original request
>> failed.
>> >>> >         at
>> >>>
>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>> >>> >         at
>> >>>
>> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>> >>> >         at
>> >>>
>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>> >>> >         ...
>> 6 more
>> >>> > Caused by: java.net.SocketException:
>> Broken pipe
>> >>> >         at
>> >>> java.net.SocketOutputStream.socketWrite0(Native
>> Method)
>> >>> >         at
>> >>>
>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>> >>> >         at
>> >>>
>> java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>> >>> >         at
>> >>>
>> org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>> >>> >         at
>> >>>
>> org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>> >>> >         at
>> >>>
>> org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>> >>> >         at
>> >>>
>> org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>> >>> >         at
>> >>>
>> org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>> >>> >         at
>> >>>
>> org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>> >>> >         at
>> >>>
>> org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>> >>> >         at
>> >>>
>> org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>> >>> >         at
>> >>>
>> org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>> >>> >         at
>> >>>
>> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>> >>> >         at
>> >>>
>> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>> >>> >         at
>> >>>
>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>> >>> >         ...
>> 8 more
>> >>> >
>> >>> > Status of Jobs => Error: Unhandled Solr
>> exception
>> >>> during indexing (0): Server at http://localhost:8983/solr/all returned non ok
>> >>> status:413, message:FULL head
>> >>> >
>> >>> >
>>    ERROR 2013-01-14
>> >>> 15:10:42,074 (Worker thread '15') - Exception
>> tossed:
>> >>> Unhandled Solr exception during indexing (0):
>> Server at http://localhost:8983/solr/all returned
>> non ok
>> >>> status:413, message:FULL head
>> >>> >
>> >>>
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>> >>> Unhandled Solr exception during indexing (0):
>> Server at http://localhost:8983/solr/all returned
>> non ok
>> >>> status:413, message:FULL head
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>> >>> >         at
>> >>>
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
>> >>> >
>> >>> > On the solr side I see :
>> >>> >
>> >>> > INFO: Creating new http client,
>> >>>
>> config:maxConnections=200&maxConnectionsPerHost=8
>> >>> > 2013-01-14
>> 15:18:21.775:WARN:oejh.HttpParser:Full
>> >>>
>> [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616
>> >>> ...long long chars ... 2B656B6970{}
>> >>> >
>> >>> > Thanks,
>> >>> > Ahmet
>> >>>
>>

Re: Repeated service interruptions - failure processing document: null

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Karl,

Now 39 .aspx files (out of 130) are indexed. The job didn't get killed, and there are no exceptions in the log.

I increased the maximum POST size of Solr/Jetty, but that count of 39 didn't increase.

I will check the size of the remaining 91 (130 - 39) *.aspx files.

Actually, I am mapping the extracted content of these aspx files to an ignored dynamic field (fmap.content=content_ignored); I don't use it. I am only interested in the metadata of these aspx files. It would be great if there were a setting to grab just the metadata, similar to Lists.
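
For context, that mapping amounts to something like the following request on the Solr side (a hedged SolrJ sketch, not the connector's own code; the handler path and field names are the usual extracting-handler defaults, and addFile's signature can differ slightly between SolrJ versions):

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractMetadataOnly {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/all");
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.setParam("literal.id", "example-sharepoint-page");   // hypothetical id
    // Route the extracted body to a throwaway field so only Tika metadata is kept.
    up.setParam("fmap.content", "content_ignored");
    up.addFile(new File("Page.aspx"), "text/html");
    up.process(server);
    server.commit();
  }
}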

Thanks,
Ahmet

--- On Mon, 1/14/13, Karl Wright <da...@gmail.com> wrote:

> From: Karl Wright <da...@gmail.com>
> Subject: Re: Repeated service interruptions - failure processing document: null
> To: dev@manifoldcf.apache.org
> Date: Monday, January 14, 2013, 5:46 PM
> I checked in a fix for this ticket on
> trunk.  Please let me know if it
> resolves this issue.
> 
> Karl
> 
> On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright <da...@gmail.com>
> wrote:
> > This is because httpclient is retrying on error for
> three times by
> > default.  This has to be disabled in the Solr
> connector, or the rest
> > of the logic won't work right.
> >
> > I've opened a ticket (CONNECTORS-610) for this problem
> too.
> >
> > Karl
> >
> > On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan <io...@yahoo.com>
> wrote:
> >> Hi Karl,
> >>
> >> Thanks for quick fix.
> >>
> >> I am still seeing the following error after 'svn
> up' and 'ant build'
> >>
> >> ERROR 2013-01-14 17:09:41,949 (Worker thread '6') -
> Exception tossed: Repeated service interruptions - failure
> processing document: null
> >>
> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
> Repeated service interruptions - failure processing
> document: null
> >>         at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> >> Caused by:
> org.apache.http.client.ClientProtocolException
> >>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> >>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> >>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> >>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> >>         at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>         at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> >>         at
> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
> >> Caused by:
> org.apache.http.client.NonRepeatableRequestException: Cannot
> retry request with a non-repeatable request entity. 
> The cause lists the reason the original request failed.
> >>         at
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
> >>         at
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
> >>         at
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> >>         ... 6 more
> >> Caused by: java.net.SocketException: Broken pipe
> >>         at
> java.net.SocketOutputStream.socketWrite0(Native Method)
> >>         at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >>         at
> java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >>         at
> org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
> >>         at
> org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
> >>         at
> org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
> >>         at
> org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
> >>         at
> org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> >>         at
> org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> >>         at
> org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> >>         at
> org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> >>         at
> org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> >>         at
> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> >>         at
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> >>         at
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
> >>         ... 8 more
> >>
> >>
> >>
> >> --- On Mon, 1/14/13, Karl Wright <da...@gmail.com>
> wrote:
> >>
> >>> From: Karl Wright <da...@gmail.com>
> >>> Subject: Re: Repeated service interruptions -
> failure processing document: null
> >>> To: dev@manifoldcf.apache.org
> >>> Date: Monday, January 14, 2013, 3:30 PM
> >>> Hi Ahmet,
> >>>
> >>> The exception that seems to be causing the
> abort is a socket
> >>> exception
> >>> coming from a socket write:
> >>>
> >>> > Caused by: java.net.SocketException:
> Broken pipe
> >>>
> >>> This makes sense in light of the http code
> returned from
> >>> Solr, which
> >>> was 413:  http://www.checkupdown.com/status/E413.html .
> >>>
> >>> So there is nothing actually *wrong* with the
> .aspx
> >>> documents, but
> >>> they are just way too big, and Solr is
> rejecting them for
> >>> that reason.
> >>>
> >>> Clearly, though, the Solr connector should
> recognize this
> >>> code as
> >>> meaning "never retry", so instead of killing
> the job, it
> >>> should just
> >>> skip the document.  I'll open a ticket for
> that now.
> >>>
> >>> Karl
> >>>
> >>>
> >>> On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan
> <io...@yahoo.com>
> >>> wrote:
> >>> > Hello,
> >>> >
> >>> > I am indexing a SharePoint 2010 instance
> using
> >>> mcf-trunk (At revision 1432907)
> >>> >
> >>> > There is no problem with a Document
> library that
> >>> contains word excel etc.
> >>> >
> >>> > However, I receive the following errors
> with a Document
> >>> library that has *.aspx files in it.
> >>> >
> >>> > Status of Jobs => Error: Repeated
> service
> >>> interruptions - failure processing document:
> null
> >>> >
> >>> >  WARN 2013-01-14 15:00:12,720 (Worker
> thread '13')
> >>> - Service interruption reported for job
> 1358009105156
> >>> connection 'iknow': IO exception during
> indexing: null
> >>> > ERROR 2013-01-14 15:00:12,763 (Worker
> thread '13') -
> >>> Exception tossed: Repeated service
> interruptions - failure
> >>> processing document: null
> >>> >
> >>>
> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
> >>> Repeated service interruptions - failure
> processing
> >>> document: null
> >>> >         at
> >>>
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> >>> > Caused by:
> >>> org.apache.http.client.ClientProtocolException
> >>> >         at
> >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
> >>> >         at
> >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
> >>> >         at
> >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
> >>> >         at
> >>>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
> >>> >         at
> >>>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>> >         at
> >>>
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
> >>> > Caused by:
> >>>
> org.apache.http.client.NonRepeatableRequestException:
> Cannot
> >>> retry request with a non-repeatable request
> entity.
> >>> The cause lists the reason the original request
> failed.
> >>> >         at
> >>>
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
> >>> >         at
> >>>
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
> >>> >         at
> >>>
> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
> >>> >         ...
> 6 more
> >>> > Caused by: java.net.SocketException:
> Broken pipe
> >>> >         at
> >>> java.net.SocketOutputStream.socketWrite0(Native
> Method)
> >>> >         at
> >>>
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> >>> >         at
> >>>
> java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> >>> >         at
> >>>
> org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
> >>> >         at
> >>>
> org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
> >>> >         at
> >>>
> org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
> >>> >         at
> >>>
> org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
> >>> >         at
> >>>
> org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
> >>> >         at
> >>>
> org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
> >>> >         at
> >>>
> org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
> >>> >         at
> >>>
> org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
> >>> >         at
> >>>
> org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
> >>> >         at
> >>>
> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
> >>> >         at
> >>>
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> >>> >         at
> >>>
> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
> >>> >         ...
> 8 more
> >>> >
> >>> > Status of Jobs => Error: Unhandled Solr
> exception
> >>> during indexing (0): Server at http://localhost:8983/solr/all returned non ok
> >>> status:413, message:FULL head
> >>> >
> >>> >     
>    ERROR 2013-01-14
> >>> 15:10:42,074 (Worker thread '15') - Exception
> tossed:
> >>> Unhandled Solr exception during indexing (0):
> Server at http://localhost:8983/solr/all returned
> non ok
> >>> status:413, message:FULL head
> >>> >
> >>>
> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
> >>> Unhandled Solr exception during indexing (0):
> Server at http://localhost:8983/solr/all returned
> non ok
> >>> status:413, message:FULL head
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
> >>> >         at
> >>>
> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
> >>> >         at
> >>>
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
> >>> >         at
> >>>
> org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
> >>> >         at
> >>>
> org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
> >>> >         at
> >>>
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
> >>> >
> >>> > On the solr side I see :
> >>> >
> >>> > INFO: Creating new http client,
> >>>
> config:maxConnections=200&maxConnectionsPerHost=8
> >>> > 2013-01-14
> 15:18:21.775:WARN:oejh.HttpParser:Full
> >>>
> [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616
> >>> ...long long chars ... 2B656B6970{}
> >>> >
> >>> > Thanks,
> >>> > Ahmet
> >>>
> 

Re: Repeated service interruptions - failure processing document: null

Posted by Karl Wright <da...@gmail.com>.
I checked in a fix for this ticket on trunk.  Please let me know if it
resolves this issue.
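
Roughly, the behavior the fix is after looks like this (an illustrative sketch only, not the actual patch): an HTTP 413 is permanent for that document, so it should be skipped rather than retried or allowed to abort the job.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.common.SolrException;

public class IndexOrSkip {
  /** Returns true if the document was indexed, false if it should simply be skipped. */
  public static boolean send(SolrServer server, AbstractUpdateRequest req) throws Exception {
    try {
      req.process(server);
      return true;
    } catch (SolrException e) {
      if (e.code() == 413) {
        // Request entity too large: retrying can never succeed, so skip the document.
        return false;
      }
      throw e;
    }
  }
}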

Karl

On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright <da...@gmail.com> wrote:
> This is because httpclient is retrying on error for three times by
> default.  This has to be disabled in the Solr connector, or the rest
> of the logic won't work right.
>
> I've opened a ticket (CONNECTORS-610) for this problem too.
>
> Karl
>
> On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan <io...@yahoo.com> wrote:
>> Hi Karl,
>>
>> Thanks for quick fix.
>>
>> I am still seeing the following error after 'svn up' and 'ant build'
>>
>> ERROR 2013-01-14 17:09:41,949 (Worker thread '6') - Exception tossed: Repeated service interruptions - failure processing document: null
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
>>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>> Caused by: org.apache.http.client.ClientProtocolException
>>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
>> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
>>         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>>         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>>         ... 6 more
>> Caused by: java.net.SocketException: Broken pipe
>>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>         at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>>         at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>>         at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>>         at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>>         at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>>         at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>>         at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>>         at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>>         at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>>         at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>>         at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>>         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>>         ... 8 more
>>
>>
>>
>> --- On Mon, 1/14/13, Karl Wright <da...@gmail.com> wrote:
>>
>>> From: Karl Wright <da...@gmail.com>
>>> Subject: Re: Repeated service interruptions - failure processing document: null
>>> To: dev@manifoldcf.apache.org
>>> Date: Monday, January 14, 2013, 3:30 PM
>>> Hi Ahmet,
>>>
>>> The exception that seems to be causing the abort is a socket
>>> exception
>>> coming from a socket write:
>>>
>>> > Caused by: java.net.SocketException: Broken pipe
>>>
>>> This makes sense in light of the http code returned from
>>> Solr, which
>>> was 413:  http://www.checkupdown.com/status/E413.html .
>>>
>>> So there is nothing actually *wrong* with the .aspx
>>> documents, but
>>> they are just way too big, and Solr is rejecting them for
>>> that reason.
>>>
>>> Clearly, though, the Solr connector should recognize this
>>> code as
>>> meaning "never retry", so instead of killing the job, it
>>> should just
>>> skip the document.  I'll open a ticket for that now.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan <io...@yahoo.com>
>>> wrote:
>>> > Hello,
>>> >
>>> > I am indexing a SharePoint 2010 instance using
>>> mcf-trunk (At revision 1432907)
>>> >
>>> > There is no problem with a Document library that
>>> contains word excel etc.
>>> >
>>> > However, I receive the following errors with a Document
>>> library that has *.aspx files in it.
>>> >
>>> > Status of Jobs => Error: Repeated service
>>> interruptions - failure processing document: null
>>> >
>>> >  WARN 2013-01-14 15:00:12,720 (Worker thread '13')
>>> - Service interruption reported for job 1358009105156
>>> connection 'iknow': IO exception during indexing: null
>>> > ERROR 2013-01-14 15:00:12,763 (Worker thread '13') -
>>> Exception tossed: Repeated service interruptions - failure
>>> processing document: null
>>> >
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>>> Repeated service interruptions - failure processing
>>> document: null
>>> >         at
>>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
>>> > Caused by:
>>> org.apache.http.client.ClientProtocolException
>>> >         at
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>>> >         at
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>>> >         at
>>> org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>>> >         at
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>>> >         at
>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>>> >         at
>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>> >         at
>>> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
>>> > Caused by:
>>> org.apache.http.client.NonRepeatableRequestException: Cannot
>>> retry request with a non-repeatable request entity.
>>> The cause lists the reason the original request failed.
>>> >         at
>>> org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>>> >         at
>>> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>>> >         at

Re: Repeated service interruptions - failure processing document: null

Posted by Karl Wright <da...@gmail.com>.
This is because httpclient retries on error three times by default.
This has to be disabled in the Solr connector, or the rest of the logic
won't work right.
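
As a rough illustration (not the actual CONNECTORS-610 patch), disabling
retries on the HttpClient 4.x instance that SolrJ uses comes down to a retry
handler with zero retries; the factory class below and its wiring into the
connector are hypothetical, only the HttpClient and SolrJ calls are real API:

    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class NoRetrySolrServerFactory
    {
      // Build an HttpSolrServer whose underlying HttpClient never retries, so a
      // failed POST surfaces immediately instead of triggering the
      // NonRepeatableRequestException seen in the traces above.
      public static HttpSolrServer create(String solrUrl)
      {
        DefaultHttpClient httpClient = new DefaultHttpClient();
        // 0 retries; false = do not resend requests whose entity was already written.
        httpClient.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(0, false));
        return new HttpSolrServer(solrUrl, httpClient);
      }
    }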

I've opened a ticket (CONNECTORS-610) for this problem too.

Karl


Re: Repeated service interruptions - failure processing document: null

Posted by Karl Wright <da...@gmail.com>.
CONNECTORS-609

Karl


Re: Repeated service interruptions - failure processing document: null

Posted by Karl Wright <da...@gmail.com>.
Hmm, this makes no sense.

The code is this:

      com.microsoft.sharepoint.webpartpages.GetPermissionCollectionResponseGetPermissionCollectionResult aclResult = aclCall.getPermissionCollection( encodedRelativePath, "Item" );
      org.apache.axis.message.MessageElement[] aclList = aclResult.get_any();

It's the second line that fails.  So aclResult is coming back null
from aclCall.getPermissionCollection().  No error, nothing, just a
null result??

I can hack this, of course, but like I said it isn't making much
sense.  Can you perhaps turn on wire debugging and attach the output
to the CONNECTORS-611 ticket please?
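
A minimal sketch of the defensive null check implied by "I can hack this" --
the two small interfaces below stand in for the Axis-generated stubs so the
sketch compiles on its own; they are not the real generated types, and this is
not the committed fix:

    import org.apache.axis.message.MessageElement;

    public class AclNullGuardSketch
    {
      // Stand-ins for the Axis-generated permission-collection stub types.
      interface PermissionCollectionResult
      {
        MessageElement[] get_any();
      }

      interface PermissionsCall
      {
        PermissionCollectionResult getPermissionCollection(String relativePath, String objectType)
          throws Exception;
      }

      // Returns the ACL elements, or an empty array when the web service silently
      // hands back null, instead of letting a NullPointerException kill the worker thread.
      public static MessageElement[] fetchAcls(PermissionsCall aclCall, String encodedRelativePath)
        throws Exception
      {
        PermissionCollectionResult aclResult =
          aclCall.getPermissionCollection(encodedRelativePath, "Item");
        if (aclResult == null)
          return new MessageElement[0];
        return aclResult.get_any();
      }
    }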

Karl



Re: Repeated service interruptions - failure processing document: null

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi,

If I enable security (Active Directory), the job seems to hang and I get this too:

FATAL 2013-01-14 17:13:46,871 (Worker thread '15') - Error tossed: null
java.lang.NullPointerException
	at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getDocumentACLs(SPSProxyHelper.java:324)
	at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.getDocumentVersions(SharePointRepository.java:929)
	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:321)

Previous stack traces were produced with security=disabled.

thanks,
Ahmet


Re: Repeated service interruptions - failure processing document: null

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Karl,

Thanks for the quick fix.

I am still seeing the following error after 'svn up' and 'ant build'

ERROR 2013-01-14 17:09:41,949 (Worker thread '6') - Exception tossed: Repeated service interruptions - failure processing document: null
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
	at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
Caused by: org.apache.http.client.ClientProtocolException
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
	at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:790)
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
	... 6 more
Caused by: java.net.SocketException: Broken pipe
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
	at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
	at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
	at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
	at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
	at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
	at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
	at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
	at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
	... 8 more




Re: Repeated service interruptions - failure processing document: null

Posted by Karl Wright <da...@gmail.com>.
Hi Ahmet,

The exception that seems to be causing the abort is a socket exception
coming from a socket write:

> Caused by: java.net.SocketException: Broken pipe

This makes sense in light of the http code returned from Solr, which
was 413:  http://www.checkupdown.com/status/E413.html .

So there is nothing actually *wrong* with the .aspx documents, but
they are just way too big, and Solr is rejecting them for that reason.

Clearly, though, the Solr connector should recognize this code as
meaning "never retry", so instead of killing the job, it should just
skip the document.  I'll open a ticket for that now.
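
As a rough sketch of that "never retry" classification -- purely illustrative,
not the actual connector change, and the class name is made up -- the decision
boils down to something like:

    public final class SolrIndexingErrorPolicy
    {
      private SolrIndexingErrorPolicy() {}

      // True when the HTTP status from Solr means the document can never be
      // indexed as-is (e.g. 413 Request Entity Too Large), so the connector
      // should skip it rather than retry or abort the job.
      public static boolean isPermanentRejection(int httpStatus)
      {
        return httpStatus == 413;
      }
    }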

Karl


On Mon, Jan 14, 2013 at 8:22 AM, Ahmet Arslan <io...@yahoo.com> wrote:
> Hello,
>
> I am indexing a SharePoint 2010 instance using mcf-trunk (At revision 1432907)
>
> There is no problem with a Document library that contains word excel etc.
>
> However, I receive the following errors with a Document library that has *.aspx files in it.
>
> Status of Jobs => Error: Repeated service interruptions - failure processing document: null
>
>  WARN 2013-01-14 15:00:12,720 (Worker thread '13') - Service interruption reported for job 1358009105156 connection 'iknow': IO exception during indexing: null
> ERROR 2013-01-14 15:00:12,763 (Worker thread '13') - Exception tossed: Repeated service interruptions - failure processing document: null
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: null
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:585)
> Caused by: org.apache.http.client.ClientProtocolException
>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:768)
> Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.  The cause lists the reason the original request failed.
>         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:692)
>         at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:523)
>         at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>         ... 6 more
> Caused by: java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:169)
>         at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:110)
>         at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:165)
>         at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:92)
>         at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
>         at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
>         at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
>         at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
>         at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:197)
>         at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
>         at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>         at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:718)
>         ... 8 more
>
> Status of Jobs => Error: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
>
>         ERROR 2013-01-14 15:10:42,074 (Worker thread '15') - Exception tossed: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unhandled Solr exception during indexing (0): Server at http://localhost:8983/solr/all returned non ok status:413, message:FULL head
>         at org.apache.manifoldcf.agents.output.solr.HttpPoster.handleSolrException(HttpPoster.java:360)
>         at org.apache.manifoldcf.agents.output.solr.HttpPoster.indexPost(HttpPoster.java:477)
>         at org.apache.manifoldcf.agents.output.solr.SolrConnector.addOrReplaceDocument(SolrConnector.java:594)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1559)
>         at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
>
> On the solr side I see :
>
> INFO: Creating new http client, config:maxConnections=200&maxConnectionsPerHost=8
> 2013-01-14 15:18:21.775:WARN:oejh.HttpParser:Full [671412972,-1,m=5,g=6144,p=6144,c=6144]={2F736F6C722F616 ...long long chars ... 2B656B6970{}
>
> Thanks,
> Ahmet