Posted to user@manifoldcf.apache.org by Erlend Garåsen <e....@usit.uio.no> on 2012/05/07 13:15:40 UTC

Re: Ingestion API socket timeout exception waiting for response code

Document deletion works perfectly after I reinstalled the SSL 
certificate and reentered the username and password to our Solr server. 
So I think this issue has been solved.

Erlend
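
For anyone hitting the same timeout: the manual curl test described further down the thread can also be scripted, which makes it easy to repeat after the certificate or credentials change. The following is only a minimal sketch, not what MCF does internally. It assumes Solr accepts an XML delete-by-id message on /update (the path named in this thread) and that the server uses HTTP basic authentication. The host name, user name, password and CA file in the usage comment are hypothetical placeholders.

```python
import base64
import ssl
import urllib.request
from xml.sax.saxutils import escape


def build_delete_body(doc_id):
    """Return a Solr XML delete-by-id message for a single document."""
    return "<delete><id>" + escape(doc_id) + "</id></delete>"


def build_delete_request(update_url, doc_id, username, password):
    """Build (but do not send) the POST request carrying the delete."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        update_url,
        data=build_delete_body(doc_id).encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


def delete_document(update_url, doc_id, username, password,
                    ca_file=None, timeout=30.0):
    """Send the delete and return the HTTP status code.

    If the server never answers, urlopen raises a timeout error after
    `timeout` seconds instead of hanging. A certificate problem raises
    an SSL-related error before any request is sent.
    """
    context = ssl.create_default_context(cafile=ca_file)
    request = build_delete_request(update_url, doc_id, username, password)
    with urllib.request.urlopen(request, timeout=timeout,
                                context=context) as response:
        return response.status


# Example call (hypothetical host, credentials and CA file):
# delete_document("https://solr.example.org/solr/update",
#                 "file:/tmp/mcf/docs/app_lasso.pdf",
#                 "user", "secret", ca_file="/path/to/ca.pem")
```

A 200 status here, with the same certificate and credentials that MCF is configured with, suggests the problem is on the MCF side rather than in Solr or the network. Note that the deletion only becomes visible in search results after a commit.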

On 27.04.12 12.11, Erlend Garåsen wrote:
>
> Many thanks for your suggestions and help, Karl. Using a filesystem
> crawl was a good idea for debugging and testing. Installing a new
> version of Solr on our test server is not easy, mainly because the
> server is managed by another division at the university, even though
> I can get root access. Anyway, according to the logs on our Solr 3.2
> server, MCF successfully deleted one test document I removed:
> [2012-04-27 11:18:33.092] {delete=[file:/tmp/mcf/docs/app_lasso.pdf]} 0 7
> [2012-04-27 11:18:33.092] [] webapp=/solr path=/update params={} status=0 QTime=7
>
> The result code is 200 according to Simple History in MCF.
>
> I entered the passwords once again for the Solr servers into the Solr
> output configuration, deleted and uploaded our SSL certificate once
> again before I did the filesystem test. I should have performed the
> tests prior to the password updates.
>
> The crawl will start again later today at 6 pm on our production server,
> so I will try to figure out whether we still have problems later. I'm
> going to Scotland later this evening for some days without my laptop, so
> I cannot check the status of my crawl before I'm back, but I'll let my
> colleague watch the logs.
>
> Erlend
>
> On 26.04.12 21.14, Karl Wright wrote:
>> Hi Erlend,
>>
>> I had some time today and was able to verify that everything worked
>> fine against what I have currently on my laptop, which is Solr 3.2.
>> The second job run looks like this:
>>
>> 04-26-2012 15:11:44.154 job end 1335467343879(test) 0 1
>> 04-26-2012 15:11:34.159 document deletion (solr) file:/C:/testcrawl/there.txt 200 0 117
>> 04-26-2012 15:11:24.690 read document C:\testcrawl OK 0 1
>> 04-26-2012 15:11:24.494 job start 1335467343879(test) 0 1
>>
>> So it appears that either something changed in Solr, or SSL support is
>> broken, or your network is not permitting a valid HTTP response for
>> some reason.
>>
>> Karl
>>
>>
>> On Thu, Apr 26, 2012 at 11:10 AM, Karl Wright <da...@gmail.com> wrote:
>>> Hi Erlend,
>>>
>>> Can you try the following:
>>>
>>> (1) Make a fresh Solr checkout of 3.6 or whatever Solr version you are
>>> using, and build it
>>> (2) Start it
>>> (3) Run a simple filesystem crawl using a Solr connection that is
>>> created with the default values
>>> (4) Delete a file in your filesystem that was crawled
>>> (5) Crawl again
>>>
>>> Does the deletion happen OK?
>>>
>>> AFAIK, nothing has changed in the Solr connector that should affect
>>> the ability to delete. This test will confirm that it is still
>>> working.
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Thu, Apr 26, 2012 at 10:19 AM, Erlend Garåsen
>>> <e....@usit.uio.no> wrote:
>>>> It seems that MCF cannot delete documents from Solr. A timeout occurs,
>>>> and the job stops after a while.
>>>>
>>>> This is what I can see from the log:
>>>> WARN 2012-04-20 18:24:30,373 (Worker thread '16') - Service interruption
>>>> reported for job 1327930125433 connection 'Web crawler': Ingestion API
>>>> socket timeout exception waiting for response code: Read timed out;
>>>> ingestion will be retried again later
>>>>
>>>> If I take a further look in Simple History, it seems that this error is
>>>> related to document deletion.
>>>>
>>>> I have tried to delete the document manually by using curl from the
>>>> same server MCF is installed on, in case we have some access
>>>> restrictions, but curl succeeded.
>>>>
>>>> We do not have any problems with adding documents; the timeout only
>>>> occurs while deleting documents.
>>>>
>>>> I have checked our Solr configuration. MCF does use the correct path
>>>> for document deletion, i.e. /update.
>>>>
>>>> The realm, username and password for our Solr server are entered
>>>> correctly, and the SSL certificate is valid as well.
>>>>
>>>> Erlend
>>>>
>>>> --
>>>> Erlend Garåsen
>>>> Center for Information Technology Services
>>>> University of Oslo
>>>> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>>>> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
>
>


-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
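
To see how often the timeout from this thread occurs, and for which job and connection, the MCF log can be scanned for the warning quoted above. This is a rough sketch under two assumptions: the pattern is modelled on the single WARN line shown in this thread, and each log entry sits on one physical line (the mail client wrapped it here). The log file name in the usage comment is a hypothetical placeholder.

```python
import re
from collections import Counter

# Assumed pattern, modelled on the one WARN line quoted in this thread.
TIMEOUT_PATTERN = re.compile(
    r"Service interruption reported for job (?P<job>\d+) "
    r"connection '(?P<connection>[^']*)': "
    r"Ingestion API socket timeout exception"
)


def count_timeouts(log_lines):
    """Count ingestion socket timeouts per (job id, connection name)."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_PATTERN.search(line)
        if match:
            counts[(match.group("job"), match.group("connection"))] += 1
    return counts


# Example use (hypothetical log file name):
# with open("manifoldcf.log", encoding="utf-8") as handle:
#     for (job, connection), n in count_timeouts(handle).items():
#         print(job, connection, n)
```

Comparing the timestamps of the matching lines with the entries in Simple History should show whether the timeouts line up with document deletions, as they did in this case.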

Re: Ingestion API socket timeout exception waiting for response code

Posted by Karl Wright <da...@gmail.com>.
Thanks for the update!
Karl
