Posted to user@manifoldcf.apache.org by Roland Everaert <re...@gmail.com> on 2013/10/17 14:17:05 UTC

A less picky manifoldCF about solr errors?

Hi,

I helped a customer deploy Solr + ManifoldCF to index files from a Windows
share drive. But every time Solr sends back an error message, the
ManifoldCF jobs abort, which is not really convenient for hours-long
indexing runs.

So is it possible to configure ManifoldCF so that it doesn't stop every
time Solr returns an HTTP code other than 200?


Thanks,


Roland.

Re: A less picky manifoldCF about solr errors?

Posted by Roland Everaert <re...@gmail.com>.
Hi,

My customer managed to reproduce the error. Here is the exception:

ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.NoSuchMethodError:
org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V


According to the Solr mailing list, Solr (or Tika) is bundled with the
wrong version of a jar. The customer is currently testing with the new
version of the jar. I am waiting for their results. I will open a JIRA issue.
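
A NoSuchMethodError like the one above usually means that an older copy of
the class was loaded than the one the caller was compiled against. As a rough
check, the following Java sketch reports which jar a class was loaded from and
whether it has the expected method. The class JarMethodCheck and its helper
names are made up for illustration; only the commons-compress class and method
names come from the trace. To be meaningful it would have to run with the same
classpath as the Solr webapp.

```java
import java.security.CodeSource;

public class JarMethodCheck {
    /** True if the class can be loaded and has a public method with this name and parameter types. */
    public static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class.forName(className).getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Location (usually a jar URL) the class was loaded from, or a short note if unknown. */
    public static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class and method taken from the exception; "(Z)V" means one boolean argument, void return.
        String cls = "org.apache.commons.compress.compressors.CompressorStreamFactory";
        System.out.println("loaded from: " + loadedFrom(cls));
        System.out.println("has setDecompressConcatenated(boolean): "
                + hasMethod(cls, "setDecompressConcatenated", boolean.class));
    }
}
```

If the reported location is an unexpected jar, or the method is reported as
missing, that would be consistent with the wrong-jar-version explanation.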


Regards,


Roland.



On Thu, Oct 17, 2013 at 2:48 PM, Karl Wright <da...@gmail.com> wrote:

> Please let me know what the actual exception trace is.  Thanks!
> Karl

Re: A less picky manifoldCF about solr errors?

Posted by Karl Wright <da...@gmail.com>.
Please let me know what the actual exception trace is.  Thanks!
Karl


On Thu, Oct 17, 2013 at 8:47 AM, Roland Everaert <re...@gmail.com> wrote:

> We already do that. But Solr is still raising exceptions for some file
> types. I have to wait for the customer to provide me with the corresponding
> Solr log and the message received by the MCF job.
>
>
> Regards,
>
>
> Roland.

Re: A less picky manifoldCF about solr errors?

Posted by Roland Everaert <re...@gmail.com>.
We already do that. But Solr is still raising exceptions for some file
types. I have to wait for the customer to provide me with the corresponding
Solr log and the message received by the MCF job.


Regards,


Roland.


On Thu, Oct 17, 2013 at 2:41 PM, Karl Wright <da...@gmail.com> wrote:

> Ah, here it is:
>
> http://lucene.472066.n3.nabble.com/ignoreTikaException-value-td3645906.html
>
> Karl

Re: A less picky manifoldCF about solr errors?

Posted by Karl Wright <da...@gmail.com>.
Ah, here it is:

http://lucene.472066.n3.nabble.com/ignoreTikaException-value-td3645906.html
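
For reference, ignoreTikaException can be passed as a request parameter to
Solr's extracting handler (or set as a default for that handler in
solrconfig.xml). The sketch below only builds such a request URL. It assumes
the handler is mapped at /update/extract; the class name, base URL, and
document id are placeholders, not values from this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractUrlSketch {
    /** URL-encodes one query-string component as UTF-8. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds an extract request URL that asks Solr to ignore Tika exceptions. */
    public static String buildExtractUrl(String solrBase, String docId) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", docId);            // id field for the extracted document
        params.put("ignoreTikaException", "true");  // the setting discussed in the linked post
        StringBuilder url = new StringBuilder(solrBase).append("/update/extract");
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Placeholder base URL and document id.
        System.out.println(buildExtractUrl("http://localhost:8983/solr", "doc1"));
    }
}
```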

Karl



On Thu, Oct 17, 2013 at 8:39 AM, Karl Wright <da...@gmail.com> wrote:

> Hi Roland,
>
> Usually 500 errors are from Tika (aka Solr Cell).  If that's what you are
> seeing, there is a way to disable them.  I don't remember precisely what
> you do, but it has been posted to this list (and others) so a google search
> should find that for you.
>
> Thanks!
> Karl

Re: A less picky manifoldCF about solr errors?

Posted by Karl Wright <da...@gmail.com>.
Hi Roland,

Usually 500 errors are from Tika (aka Solr Cell).  If that's what you are
seeing, there is a way to disable them.  I don't remember precisely what
you do, but it has been posted to this list (and others) so a google search
should find that for you.

Thanks!
Karl



On Thu, Oct 17, 2013 at 8:37 AM, Roland Everaert <re...@gmail.com> wrote:

> So far we have only had to deal with HTTP code 500, because Solr was not
> able to process some file types. We managed to tell Solr to ignore Tika
> exceptions. This helps us quite a lot, but Solr still has problems
> processing some file types, and I have not yet found a way to tell Solr to
> basically skip errors while still logging them.
>
> I will check with the customer to get the error, but it showed up
> yesterday, they have since continued with the indexing (we are still at
> the initial indexing of the repository), and the logs with the errors have
> disappeared.
>
>
> Thanks for your support,
>
>
> Roland.

Re: A less picky manifoldCF about solr errors?

Posted by Roland Everaert <re...@gmail.com>.
So far we have only had to deal with HTTP code 500, because Solr was not
able to process some file types. We managed to tell Solr to ignore Tika
exceptions. This helps us quite a lot, but Solr still has problems
processing some file types, and I have not yet found a way to tell Solr to
basically skip errors while still logging them.

I will check with the customer to get the error, but it showed up
yesterday, they have since continued with the indexing (we are still at
the initial indexing of the repository), and the logs with the errors have
disappeared.


Thanks for your support,


Roland.



On Thu, Oct 17, 2013 at 2:22 PM, Karl Wright <da...@gmail.com> wrote:

> Hi Roland,
>
> It depends on what the error code is.  There is quite a bit of logic in
> the Solr connector (and in ManifoldCF itself) for handling errors of
> different kinds.  Fundamentally there are two main kinds of error condition
> - one which causes a retry (and can, if so specified, cause either the
> offending document to be skipped or the job aborted) and another which
> always causes a job to abort.  The Solr connector has to decide based on
> limited information exactly what to do.  General HTTP error codes such as
> "500" errors, for example, contain little information and look just the
> same whether the error represents a document Tika is unhappy with, or
> something more fundamental, like a complete misconfiguration of Solr.
>
> If you can provide more detailed information as to the kind of error(s)
> you are seeing then we can advise you further.
>
> Karl

Re: A less picky manifoldCF about solr errors?

Posted by Karl Wright <da...@gmail.com>.
Hi Roland,

It depends on what the error code is.  There is quite a bit of logic in the
Solr connector (and in ManifoldCF itself) for handling errors of different
kinds.  Fundamentally there are two main kinds of error condition - one
which causes a retry (and can, if so specified, cause either the offending
document to be skipped or the job aborted) and another which always causes
a job to abort.  The Solr connector has to decide based on limited
information exactly what to do.  General HTTP error codes such as "500"
errors, for example, contain little information and look just the same
whether the error represents a document Tika is unhappy with, or something
more fundamental, like a complete misconfiguration of Solr.

If you can provide more detailed information as to the kind of error(s) you
are seeing then we can advise you further.
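
To make the two kinds of error condition concrete, here is a minimal Java
sketch. It is not the Solr connector's actual code: the class, the enum, the
retry count, and in particular the rule that only 5xx responses are retried
are all assumptions made for illustration.

```java
import java.util.function.IntSupplier;

public class ErrorPolicySketch {
    public enum Outcome { INDEXED, SKIPPED, JOB_ABORTED }

    /** Assumed rule: 5xx looks possibly transient, so retry; anything else is treated as fatal. */
    static boolean isRetryable(int httpStatus) {
        return httpStatus >= 500;
    }

    /**
     * Sends one document (sendToSolr returns the HTTP status) and applies the policy:
     * retryable errors are retried up to maxAttempts, then the document is either
     * skipped or the job aborted, depending on the flag; other errors abort at once.
     */
    public static Outcome process(IntSupplier sendToSolr, int maxAttempts,
                                  boolean abortWhenRetriesExhausted) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int status = sendToSolr.getAsInt();
            if (status == 200) {
                return Outcome.INDEXED;
            }
            if (!isRetryable(status)) {
                return Outcome.JOB_ABORTED; // second kind: always aborts the job
            }
            // first kind: fall through and retry
        }
        return abortWhenRetriesExhausted ? Outcome.JOB_ABORTED : Outcome.SKIPPED;
    }

    public static void main(String[] args) {
        // A document that always fails with 500, under the "skip" setting.
        System.out.println(process(() -> 500, 3, false));
    }
}
```

The sketch also shows the difficulty described above: a 500 caused by one bad
document and a 500 caused by a broken Solr setup take exactly the same path,
so the skip-or-abort choice has to be made without knowing which one it is.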

Karl



On Thu, Oct 17, 2013 at 8:17 AM, Roland Everaert <re...@gmail.com> wrote:

> Hi,
>
> I helped a customer deploy Solr + ManifoldCF to index files from a
> Windows share drive. But every time Solr sends back an error message,
> the ManifoldCF jobs abort, which is not really convenient for hours-long
> indexing runs.
>
> So is it possible to configure ManifoldCF so that it doesn't stop
> every time Solr returns an HTTP code other than 200?
>
>
> Thanks,
>
>
> Roland.
>