Posted to user@manifoldcf.apache.org by Erlend Garåsen <e....@usit.uio.no> on 2012/04/26 16:19:37 UTC

Ingestion API socket timeout exception waiting for response code

It seems that MCF cannot delete documents from Solr. A timeout occurs, 
and the job stops after a while.

This is what I can see from the log:
  WARN 2012-04-20 18:24:30,373 (Worker thread '16') - Service 
interruption reported for job 1327930125433 connection 'Web crawler': 
Ingestion API socket timeout exception waiting for response code: Read 
timed out; ingestion will be retried again later
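
To see how often this warning occurs and for which jobs, the MCF log can be scanned with a short script. The following is a minimal Python sketch, not part of MCF. The pattern is modelled only on the single warning quoted above, and it assumes the warning is one physical line in the log file (it is wrapped here by the mail client).

```python
import re
from collections import Counter

# Pattern modelled on the one WARN line quoted above; other MCF versions
# may format the message differently (assumption).
TIMEOUT_RE = re.compile(
    r"WARN (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\(Worker thread '(?P<thread>\d+)'\) - Service interruption reported "
    r"for job (?P<job>\d+) connection '(?P<conn>[^']*)': "
    r"(?P<msg>.*socket timeout.*)"
)

def find_timeouts(lines):
    """Yield a dict of named fields for every line matching the warning."""
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            yield m.groupdict()

def count_by_job(lines):
    """Count matching warnings per job ID."""
    return Counter(hit["job"] for hit in find_timeouts(lines))

# The warning from this message, joined into a single line.
sample = (
    " WARN 2012-04-20 18:24:30,373 (Worker thread '16') - Service "
    "interruption reported for job 1327930125433 connection 'Web crawler': "
    "Ingestion API socket timeout exception waiting for response code: Read "
    "timed out; ingestion will be retried again later"
)
print(count_by_job([sample]))
```

In practice the function would be fed an open log file instead of the sample list, for example `count_by_job(open(path))` with `path` pointing at the local MCF log.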

If I take a further look in Simple History, it seems that this error is 
related to document deletion.

I have tried to delete the document manually by using curl from the same 
server MCF is installed on, in case we have some access restrictions, but 
the curl request succeeded.
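
For reference, a manual deletion of this kind can also be scripted. The sketch below is a hedged Python equivalent, not what MCF itself sends. It assumes Solr's XML delete-by-id message posted to /update with HTTP basic authentication; the URL, document ID, username and password are placeholders. The explicit timeout makes a hanging response surface as an exception instead of an indefinite wait.

```python
import base64
import urllib.request
from xml.sax.saxutils import escape

def build_delete_payload(doc_id):
    """Return a Solr XML delete-by-id message as UTF-8 bytes."""
    return ("<delete><id>%s</id></delete>" % escape(doc_id)).encode("utf-8")

def build_delete_request(update_url, doc_id, user, password):
    """Build (but do not send) a POST request for the /update handler."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return urllib.request.Request(
        update_url,
        data=build_delete_payload(doc_id),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # Basic authentication is an assumption about the server setup.
            "Authorization": "Basic " + token.decode("ascii"),
        },
        method="POST",
    )

def delete_document(update_url, doc_id, user, password, timeout=30):
    """Send the delete and return the HTTP status code.

    If the server does not answer within `timeout` seconds, an exception
    is raised rather than blocking forever.
    """
    req = build_delete_request(update_url, doc_id, user, password)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status

# Hypothetical usage; host, path and credentials are placeholders:
# delete_document("https://solr.example.org/solr/update",
#                 "http://www.example.org/page.html", "user", "secret")
```

A deletion sent this way only becomes visible to searches after a commit on the Solr side.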

We do not have any problems with adding documents; the timeout only occurs 
while deleting them.

I have checked our Solr configuration. MCF does use the correct path for 
document deletion, i.e. /update.

The realm, username and password for our Solr server are entered 
correctly, and the SSL certificate is valid as well.

Erlend

-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050

Re: Ingestion API socket timeout exception waiting for response code

Posted by Karl Wright <da...@gmail.com>.
Thanks for the update!
Karl

On Mon, May 7, 2012 at 7:15 AM, Erlend Garåsen <e....@usit.uio.no> wrote:
>
> Document deletion works perfectly after I reinstalled the SSL certificate
> and reentered the username and password to our Solr server. So I think this
> issue has been solved.
>
> Erlend

Re: Ingestion API socket timeout exception waiting for response code

Posted by Erlend Garåsen <e....@usit.uio.no>.
Document deletion works perfectly after I reinstalled the SSL 
certificate and reentered the username and password to our Solr server. 
So I think this issue has been solved.

Erlend

On 27.04.12 12.11, Erlend Garåsen wrote:
>
> Many thanks for your suggestions and help, Karl. Using a filesystem
> crawl was actually a good idea for debugging/testing. Installing a new
> version of Solr on our test server is not that easy for many reasons,
> mainly because it is under the control of another division dealing with
> servers at the uni, even though I can get root access. Anyway, according
> to the logs on our Solr 3.2 server, it seems that MCF successfully
> managed to delete one test document I removed:
> [2012-04-27 11:18:33.092] {delete=[file:/tmp/mcf/docs/app_lasso.pdf]} 0 7
> [2012-04-27 11:18:33.092] [] webapp=/solr path=/update params={} status=0 QTime=7
>
> The result code is 200 according to Simple History in MCF.
>
> I entered the passwords once again for the Solr servers into the Solr
> output configuration, deleted and uploaded our SSL certificate once
> again before I did the filesystem test. I should have performed the
> tests prior to the password updates.
>
> The crawl will start again later today at 6 pm on our production server,
> so I will try to figure out whether we still have problems later. I'm
> going to Scotland later this evening for some days without my laptop, so
> I cannot check the status of my crawl before I'm back, but I'll let my
> colleague watch the logs.
>
> Erlend



Re: Ingestion API socket timeout exception waiting for response code

Posted by Erlend Garåsen <e....@usit.uio.no>.
Many thanks for your suggestions and help, Karl. Using a filesystem 
crawl was actually a good idea for debugging/testing. Installing a new 
version of Solr on our test server is not that easy for many reasons, 
mainly because it is under the control of another division dealing with 
servers at the uni, even though I can get root access. Anyway, according 
to the logs on our Solr 3.2 server, it seems that MCF successfully 
managed to delete one test document I removed:
[2012-04-27 11:18:33.092] {delete=[file:/tmp/mcf/docs/app_lasso.pdf]} 0 7
[2012-04-27 11:18:33.092] [] webapp=/solr path=/update params={} status=0 QTime=7
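
Checking the Solr log for deletions like this can be automated when more documents are involved. The following Python sketch is only an illustration: the pattern is modelled on the first log line above, the two trailing numbers are assumed to be the status and the elapsed time in milliseconds (they agree with status=0 QTime=7 in the second line), and splitting IDs on commas is an assumption that fails for IDs containing commas.

```python
import re

# Modelled on the "{delete=[...]} 0 7" line above; other Solr versions
# or logging setups may use a different layout (assumption).
DELETE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \{delete=\[(?P<ids>[^\]]*)\]\} "
    r"(?P<status>\d+) (?P<qtime>\d+)"
)

def parse_delete(line):
    """Return the fields of a delete log line, or None if it is not one."""
    m = DELETE_RE.search(line)
    if m is None:
        return None
    ids = [s.strip() for s in m.group("ids").split(",") if s.strip()]
    return {
        "ts": m.group("ts"),
        "ids": ids,
        "status": int(m.group("status")),
        "qtime_ms": int(m.group("qtime")),
    }

line = "[2012-04-27 11:18:33.092] {delete=[file:/tmp/mcf/docs/app_lasso.pdf]} 0 7"
print(parse_delete(line))
```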

The result code is 200 according to Simple History in MCF.

I entered the passwords once again for the Solr servers into the Solr 
output configuration, deleted and uploaded our SSL certificate once 
again before I did the filesystem test. I should have performed the 
tests prior to the password updates.

The crawl will start again later today at 6 pm on our production server, 
so I will try to figure out whether we still have problems later. I'm 
going to Scotland later this evening for some days without my laptop, so 
I cannot check the status of my crawl before I'm back, but I'll let my 
colleague watch the logs.

Erlend

On 26.04.12 21.14, Karl Wright wrote:
> Hi Erlend,
>
> I had some time today and was able to verify that everything worked
> fine against what I have currently on my laptop, which is Solr 3.2.
> The second job run looks like this:
>
> 04-26-2012 15:11:44.154  job end                   1335467343879(test)                0  1
> 04-26-2012 15:11:34.159  document deletion (solr)  file:/C:/testcrawl/there.txt  200  0  117
> 04-26-2012 15:11:24.690  read document             C:\testcrawl                  OK   0  1
> 04-26-2012 15:11:24.494  job start                 1335467343879(test)                0  1
>
> So it appears that either something changed in Solr, or SSL support is
> broken, or your network is not permitting a valid HTTP response for
> some reason.
>
> Karl



Re: Ingestion API socket timeout exception waiting for response code

Posted by Karl Wright <da...@gmail.com>.
Hi Erlend,

I had some time today and was able to verify that everything worked
fine against what I have currently on my laptop, which is Solr 3.2.
The second job run looks like this:

04-26-2012 15:11:44.154  job end                   1335467343879(test)                0  1
04-26-2012 15:11:34.159  document deletion (solr)  file:/C:/testcrawl/there.txt  200  0  117
04-26-2012 15:11:24.690  read document             C:\testcrawl                  OK   0  1
04-26-2012 15:11:24.494  job start                 1335467343879(test)                0  1

So it appears that either something changed in Solr, or SSL support is
broken, or your network is not permitting a valid HTTP response for
some reason.
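
One way to narrow down which of these causes applies is to attempt the connection step by step and record where it fails. The sketch below is a generic Python illustration, not an MCF facility. The host and port passed to `probe` are placeholders, and the stage and category names are made up for this example.

```python
import socket
import ssl

def classify_failure(exc):
    """Map an exception to a coarse failure category.

    Order matters: ssl.SSLError and socket.timeout are both OSError
    subclasses, so the more specific types are checked first.
    """
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate"
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, ssl.SSLError):
        return "tls"
    if isinstance(exc, OSError):
        return "network"
    return "other"

def probe(host, port=443, timeout=10.0):
    """Try a TCP connect, then a TLS handshake; return (stage, category)."""
    stage = "connect"
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            stage = "handshake"
            ctx = ssl.create_default_context()
            with ctx.wrap_socket(raw, server_hostname=host):
                return ("done", "ok")
    except Exception as exc:
        return (stage, classify_failure(exc))

# Hypothetical usage; the host name is a placeholder:
# print(probe("solr.example.org", 443))
```

A result such as ("handshake", "certificate") would point at the certificate, while ("connect", "timeout") would point at the network path. A successful probe followed by a read timeout on the actual request suggests the server accepted the connection but never sent a response.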

Karl


On Thu, Apr 26, 2012 at 11:10 AM, Karl Wright <da...@gmail.com> wrote:
> Hi Erlend,
>
> Can you try the following:
>
> (1) Make a fresh Solr checkout of 3.6 or whatever Solr version you are
> using, and build it
> (2) Start it
> (3) Run a simple filesystem crawl using a Solr connection that is
> created with the default values
> (4) Delete a file in your filesystem that was crawled
> (5) Crawl again
>
> Does the deletion happen OK?
>
> AFAIK, nothing has changed in the Solr connector that should affect
> the ability to delete.  This test will confirm that it is still
> working.
>
> Thanks,
> Karl

Re: Ingestion API socket timeout exception waiting for response code

Posted by Karl Wright <da...@gmail.com>.
Hi Erlend,

Can you try the following:

(1) Make a fresh Solr checkout of 3.6 or whatever Solr version you are
using, and build it
(2) Start it
(3) Run a simple filesystem crawl using a Solr connection that is
created with the default values
(4) Delete a file in your filesystem that was crawled
(5) Crawl again

Does the deletion happen OK?

AFAIK, nothing has changed in the Solr connector that should affect
the ability to delete.  This test will confirm that it is still
working.

Thanks,
Karl


On Thu, Apr 26, 2012 at 10:19 AM, Erlend Garåsen
<e....@usit.uio.no> wrote:
> It seems that MCF cannot delete documents from Solr. A timeout occurs, and
> the job stops after a while.
>
> This is what I can see from the log:
>  WARN 2012-04-20 18:24:30,373 (Worker thread '16') - Service interruption
> reported for job 1327930125433 connection 'Web crawler': Ingestion API
> socket timeout exception waiting for response code: Read timed out;
> ingestion will be retried again later
>
> If I take a further look in Simple History, it seems that this error is
> related to document deletion.
>
> I have tried to delete the document manually by using curl from the same
> server MCF is installed on, in case we have some access restrictions, but
> the curl request succeeded.
>
> We do not have any problems with adding documents; the timeout only occurs
> while deleting them.
>
> I have checked our Solr configuration. MCF does use the correct path for
> document deletion, i.e. /update.
>
> The realm, username and password for our Solr server are entered
> correctly, and the SSL certificate is valid as well.
>
> Erlend
>
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050