Posted to user@manifoldcf.apache.org by Shinichiro Abe <sh...@gmail.com> on 2012/08/01 11:48:22 UTC

Re: Repeated service interruptions

Hi Karl,

I still have a problem.
I reduced the maximum number of connections to 2.
I rebooted the file server, but not the domain controller.
When I configured the paths[1], the log showed no errors
and the ShareDrive connector crawled the files successfully.
When I set the path configuration to the default (matching *),
the log showed the "all pipe instances are busy" error.
Both path configurations pointed to the same location.

Also, when this error occurred, the ingest log showed that
HttpPoster was waiting for the response stream,
couldn't get a response from Solr,
and threw a SocketTimeoutException.
I increased jcifs.smb.client.responseTimeout,
but the exception was still thrown.
On the Solr side, Jetty threw a SocketException (socket write error).
I'm checking the Solr logs now.
Solr may be doing something wrong when running /update/extract.

Have you seen anything like this?
Does the path matching configuration affect those errors?

[1]Paths Tab:
Include  directory(s)  matching  /01* 

P.S. 
Thank you for fixing CONNECTORS-494.
I checked the trunk code, and it worked well.

Thank you,
Shinichiro Abe
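
As an illustration of the path question in the message above, here is a minimal sketch of how a narrower include pattern selects fewer directories than the default "*". It uses Python's generic glob-style matching, which is only an assumption standing in for ManifoldCF's own path matching rules; the directory names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_directories(directories, pattern):
    """Return the directories whose path matches the glob-style pattern."""
    return [d for d in directories if fnmatchcase(d, pattern)]


# Hypothetical directory names under the same share.
directories = ["/0100", "/0101", "/0200", "/archive"]

narrow = select_directories(directories, "/01*")  # like the Paths tab rule [1]
broad = select_directories(directories, "*")      # like the default rule

print(narrow)
print(broad)
```

If the files that trigger the errors live only in directories that the narrow pattern skips, the two configurations would behave differently even though both start from the same location.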

On 2012/07/24, at 22:13, Karl Wright wrote:

> Hi Abe-san,
> 
> Did you figure out what the problem was?
> 
> Karl
> 
> On Thu, Jul 19, 2012 at 5:52 AM, Karl Wright <da...@gmail.com> wrote:
>> Hi Abe-san,
>> 
>> Sometimes what looks like a server error can actually be due to the
>> domain controller.  I wonder if the domain controller needs to be
>> rebooted?
>> 
>> Karl
>> 
>> On Thu, Jul 19, 2012 at 5:12 AM, Shinichiro Abe
>> <sh...@gmail.com> wrote:
>>> Hi Karl,
>>> Thank you for the reply.
>>> I reduced the maximum number of connections from 10
>>> to 5, but that didn't avoid the busy error. I'll try reducing it further.
>>> Thank you.
>>> Shinichiro Abe
>>> 
>>> On 2012/07/19, at 15:55, Karl Wright wrote:
>>> 
>>>> Hi Abe-san,
>>>> 
>>>> The "all pipe instances are busy" error is coming from the Windows
>>>> server you are trying to crawl.  I don't know what is happening there
>>>> but here are some possibilities:
>>>> 
>>>> (1) The Windows server is just overloaded; you can try reducing the
>>>> maximum number of connections to 2 or 3 to see if that helps.
>>>> (2) The Windows server needs rebooting.
>>>> 
>>>> Thanks,
>>>> Karl
>>>> 
>>>> On Wed, Jul 18, 2012 at 10:09 PM, Shinichiro Abe
>>>> <sh...@gmail.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> I am using the Windows shares connector and ran a job.
>>>>> The job aborted instead of finishing normally, and the job's status said:
>>>>> Error: Repeated service interruptions - failure processing document: Read timed out
>>>>> 
>>>>> Why was the job aborted? I use ManifoldCF 0.5.1 and the latest version of jcifs.jar.
>>>>> Is the crawled server busy? I think the server where MCF is installed is not busy,
>>>>> but the other servers that MCF crawls seem to be busy.
>>>>> How can I run the job without errors? What's wrong?
>>>>> 
>>>>> 
>>>>> The connector logs:
>>>>> 
>>>>> WARN 2012-07-12 16:28:52,648 (Worker thread '19') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.
>>>>>       at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
>>>>>       at jcifs.smb.SmbTransport.send(SmbTransport.java:663)
>>>>> ..
>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - JCIFS: Possibly transient exception detected on attempt 3 while getting share security: All pipe instances are busy.
>>>>> ..
>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - JCIFS: 'Busy' response when getting document version for smb://XX.XX.XX.XX/D$/abcde/1234/123456789/e123456789a.pdf: retrying...
>>>>> ..
>>>>> WARN 2012-07-12 16:36:37,585 (Worker thread '19') - Pre-ingest service interruption reported for job 1342076182624 connection 'Windows shares': Timeout or other service interruption: All pipe instances are busy.
>>>>> ..
>>>>> WARN 2012-07-12 19:14:30,335 (Worker thread '19') - Service interruption reported for job 1342076182624 connection 'Windows shares': Ingestion API socket timeout exception waiting for response code: Read timed out; ingestion will be retried again later
>>>>> ..
>>>>> WARN 2012-07-12 20:43:50,210 (Worker thread '19') - Service interruption reported for job 1342076182624 connection 'Windows shares': Ingestion API socket timeout exception waiting for response code: Read timed out; ingestion will be retried again later
>>>>> ..
>>>>> ERROR 2012-07-12 20:43:50,210 (Worker thread '19') - Exception tossed: Repeated service interruptions - failure processing document: Read timed out
>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Repeated service interruptions - failure processing document: Read timed out
>>>>>       at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:606)
>>>>> Caused by: java.net.SocketTimeoutException: Read timed out
>>>>>       at java.net.SocketInputStream.socketRead0(Native Method)
>>>>>       at java.net.SocketInputStream.read(Unknown Source)
>>>>>       at java.net.SocketInputStream.read(Unknown Source)
>>>>>       at org.apache.manifoldcf.agents.output.solr.HttpPoster.readLine(HttpPoster.java:571)
>>>>>       at org.apache.manifoldcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:598)
>>>>> 
>>>>> Thanks in advance,
>>>>> Shinichiro Abe
>>>
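
The connector log quoted above mixes two kinds of interruption: "All pipe instances are busy" while reading from the share, and "Read timed out" while waiting for the ingestion response. A minimal sketch for telling them apart when scanning a longer log follows. The regular expression assumes only the line layout visible in the quoted excerpt, and the labels "crawl" and "ingest" are names chosen here for illustration, not ManifoldCF terminology.

```python
import re
from collections import Counter

# Matches lines shaped like the quoted excerpt:
# LEVEL date time (Worker thread 'N') - message
LINE_RE = re.compile(
    r"^(?P<level>WARN|ERROR) "
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\(Worker thread '(?P<thread>\d+)'\) - (?P<message>.*)$"
)


def classify(message):
    """Label a log message by the side of the pipeline it points at."""
    if "All pipe instances are busy" in message:
        return "crawl"   # reported while talking to the file share
    if "Read timed out" in message:
        return "ingest"  # reported while waiting for the output connection
    return "other"


def summarize(log_text):
    """Count classified WARN/ERROR lines; stack-trace lines are ignored."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[classify(match.group("message"))] += 1
    return dict(counts)


SAMPLE_LOG = """
WARN 2012-07-12 16:28:52,648 (Worker thread '19') - JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.
      at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
WARN 2012-07-12 16:36:37,585 (Worker thread '19') - Pre-ingest service interruption reported for job 1342076182624 connection 'Windows shares': Timeout or other service interruption: All pipe instances are busy.
WARN 2012-07-12 19:14:30,335 (Worker thread '19') - Service interruption reported for job 1342076182624 connection 'Windows shares': Ingestion API socket timeout exception waiting for response code: Read timed out; ingestion will be retried again later
ERROR 2012-07-12 20:43:50,210 (Worker thread '19') - Exception tossed: Repeated service interruptions - failure processing document: Read timed out
"""

print(summarize(SAMPLE_LOG))
```

Separating the counts this way makes it easier to see that the final ERROR in the excerpt belongs to the ingestion side rather than to the share itself.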

Re: Repeated service interruptions

Posted by Shinichiro Abe <sh...@gmail.com>.

Hi Shigeki-san,

I do not know the root cause, but when I looked at the Solr log,
there were some exceptions raised while indexing certain files.
I excluded these files from indexing,
and as a result I could crawl successfully.
If you check your Solr log, you may find something similar.

Regards,
Shinichiro Abe

On 2012/09/05, at 14:37, Shigeki Kobayashi wrote:

> Hi Abe-san
> 
> I've just run into the same problem as you did, and I am having trouble figuring out how to solve it.
> 
> Did you figure out how to get rid of this problem? If so, it would be nice if you could share how you did it.
> 
> 
> Regards,
> 
> Shigeki

Re: Repeated service interruptions

Posted by Shigeki Kobayashi <sh...@g.softbank.co.jp>.

Hi Abe-san

I've just run into the same problem as you did, and I am having trouble
figuring out how to solve it.

Did you figure out how to get rid of this problem? If so, it would be nice
if you could share how you did it.


Regards,

Shigeki

2012/8/2 Shinichiro Abe <sh...@gmail.com>

> Thanks very much for the help!
> I understand.
> Shinichiro Abe

Re: Repeated service interruptions

Posted by Shinichiro Abe <sh...@gmail.com>.

Thanks very much for the help!
I understand. 
Shinichiro Abe

On 2012/08/01, at 19:35, Karl Wright wrote:

> On Wed, Aug 1, 2012 at 5:48 AM, Shinichiro Abe
> <sh...@gmail.com> wrote:
>> Hi Karl,
>> 
>> I still have a problem.
>> I reduced the maximum number of connections to 2.
>> I rebooted the file server, but not the domain controller.
>> When I configured the paths[1], the log showed no errors
>> and the ShareDrive connector crawled the files successfully.
>> When I set the path configuration to the default (matching *),
>> the log showed the "all pipe instances are busy" error.
>> Both path configurations pointed to the same location.
>>
>> Also, when this error occurred, the ingest log showed that
>> HttpPoster was waiting for the response stream,
>> couldn't get a response from Solr,
>> and threw a SocketTimeoutException.
>> I increased jcifs.smb.client.responseTimeout,
>> but the exception was still thrown.
>> On the Solr side, Jetty threw a SocketException (socket write error).
>> I'm checking the Solr logs now.
>> Solr may be doing something wrong when running /update/extract.
>> 
> 
> If Solr threw the exception this sounds likely.
> 
>> Have you seen anything like this?
>> Does the path matching configuration affect those errors?
>> 
>> [1]Paths Tab:
>> Include  directory(s)  matching  /01*
>> 
> 
> This should have nothing to do with socket exceptions, except possibly
> that the crawler winds up trying to read a file that isn't actually a
> file but is something else, like a named pipe or something.  This
> typically doesn't happen if the server is a Windows machine but if it
> is a Samba server I could imagine something like that happening.
> 
> Karl

Re: Repeated service interruptions

Posted by Karl Wright <da...@gmail.com>.

On Wed, Aug 1, 2012 at 5:48 AM, Shinichiro Abe
<sh...@gmail.com> wrote:
> Hi Karl,
>
> I still have a problem.
> I reduced the maximum number of connections to 2.
> I rebooted the file server, but not the domain controller.
> When I configured the paths[1], the log showed no errors
> and the ShareDrive connector crawled the files successfully.
> When I set the path configuration to the default (matching *),
> the log showed the "all pipe instances are busy" error.
> Both path configurations pointed to the same location.
>
> Also, when this error occurred, the ingest log showed that
> HttpPoster was waiting for the response stream,
> couldn't get a response from Solr,
> and threw a SocketTimeoutException.
> I increased jcifs.smb.client.responseTimeout,
> but the exception was still thrown.
> On the Solr side, Jetty threw a SocketException (socket write error).
> I'm checking the Solr logs now.
> Solr may be doing something wrong when running /update/extract.
>

If Solr threw the exception this sounds likely.

> Have you seen anything like this?
> Does the path matching configuration affect those errors?
>
> [1]Paths Tab:
> Include  directory(s)  matching  /01*
>

This should have nothing to do with socket exceptions, except possibly
that the crawler winds up trying to read a file that isn't actually a
file but is something else, like a named pipe or something.  This
typically doesn't happen if the server is a Windows machine but if it
is a Samba server I could imagine something like that happening.

Karl
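
To make the "Read timed out" symptom discussed in this reply concrete, here is a minimal sketch of a client that sends a request and then waits for a response that never arrives. It is a generic socket illustration, not ManifoldCF's HttpPoster code. The local server, the request bytes, and the function name are all hypothetical stand-ins for a Solr instance that has stalled while handling a document.

```python
import socket
import threading


def read_with_timeout(timeout_seconds):
    """Talk to a local server that accepts but never replies.

    Returns True if the client's read timed out, False if data arrived.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    done = threading.Event()

    def silent_server():
        conn, _ = server.accept()
        done.wait()   # hold the connection open without sending anything
        conn.close()

    thread = threading.Thread(target=silent_server, daemon=True)
    thread.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=timeout_seconds)
    try:
        client.sendall(b"POST /update/extract HTTP/1.1\r\n\r\n")
        client.recv(1024)  # blocks until data arrives or the timeout expires
        return False
    except socket.timeout:
        return True
    finally:
        client.close()
        done.set()
        thread.join(timeout=1)
        server.close()


print(read_with_timeout(0.2))
```

The timeout that fires here belongs to the client's connection to the indexing server. Since jcifs.smb.client.responseTimeout is a JCIFS client setting for the SMB connection to the file share, raising it would presumably not change when a read on this separate connection gives up.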
