Posted to solr-user@lucene.apache.org by Rahul Warawdekar <ra...@gmail.com> on 2011/05/26 15:22:45 UTC

Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

Hi All,

I am using Solr 3.1 for one of our search-based applications.
We are using DIH to index our data and TikaEntityProcessor to index
attachments.
Currently we are running into an issue while extracting content from one of
our MS Excel 2007 files, using TikaEntityProcessor.

The problem is that TikaEntityProcessor hangs without throwing any exception,
which in turn causes indexing on the server to hang.

Has anyone run into a similar issue with TikaEntityProcessor?

Also, does anyone know of a way to skip a file that behaves like this
and move on to the next document to be indexed?

-- 
Thanks and Regards
Rahul A. Warawdekar

Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

Posted by Rahul Warawdekar <ra...@gmail.com>.
Hi Markus,

It is Tika: I tried parsing the file with stand-alone Tika.
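Since the hang is inside Tika itself, one option is to bound the parse
call with a deadline so a stuck file can be logged and skipped. This is
a minimal sketch, assuming you control the code that calls the parser
(for example, a pre-extraction step run before indexing) and can wrap
it in a Callable<String>. The names TimedParse and parseWithTimeout are
hypothetical, and the actual Tika call is left as a placeholder so the
block compiles with only the JDK. Note that Future.cancel(true) only
interrupts the worker thread; a parser stuck in a loop that ignores
interrupts keeps running, which is why the worker threads are daemons.

```java
import java.util.Optional;
import java.util.concurrent.*;

public class TimedParse {
    // Daemon threads, so a parse that ignores interruption cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-parse");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs the extraction task and waits at most the given time.
     * Returns the extracted text, or Optional.empty() if the task timed out
     * (or returned null), so the caller can skip the file and move on.
     * Exceptions thrown by the task surface as ExecutionException.
     */
    public static Optional<String> parseWithTimeout(Callable<String> task, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        Future<String> future = POOL.submit(task);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            // Interrupts the worker; a parser that never checks for interrupts keeps running.
            future.cancel(true);
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholders for the real extraction (e.g. a call into Tika, not shown here).
        Callable<String> quick = () -> "cell text";
        Callable<String> hung = () -> { Thread.sleep(10_000); return "never"; };

        System.out.println(parseWithTimeout(quick, 1, TimeUnit.SECONDS).orElse("<skipped>"));
        System.out.println(parseWithTimeout(hung, 200, TimeUnit.MILLISECONDS).orElse("<skipped>"));
    }
}
```

Run as-is, this prints "cell text" and then "<skipped>".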

On 5/26/11, Markus Jelsma <ma...@openindex.io> wrote:
> Can you rule out Tika or Solr by trying to parse the file with a stand-alone
> Tika?

-- 
Thanks and Regards
Rahul A. Warawdekar

Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

Posted by Markus Jelsma <ma...@openindex.io>.
Can you rule out Tika or Solr by trying to parse the file with a stand-alone 
Tika?

Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

Posted by Gora Mohanty <go...@mimirtech.com>.
On Thu, May 26, 2011 at 6:52 PM, Rahul Warawdekar
<ra...@gmail.com> wrote:
> Hi All,
>
> I am using Solr 3.1 for one of our search based applications.
> We are using DIH to index our data and TikaEntityProcessor to index
> attachments.
> Currently we are running into an issue while extracting content from one of
> our MS Excel 2007 files, using TikaEntityProcessor.
[...]

We have not done this with Tika, but we have run into similar
issues while converting Microsoft Word documents externally
before indexing them into Solr. In our case, it turned out
that these documents referred to external URLs, which were
not always accessible to our converter sitting behind
a firewall.

> Also, does someone know of a way to just skip this type of behaviour for
> that file and move to the next document to be indexed ?
[...]

This is probably not of much help to you, but what we ended
up doing was killing any conversion process that ran longer
than a set maximum time.
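The kill-on-timeout approach can be sketched as running each conversion
in its own process and forcibly destroying it once it exceeds a
deadline. This is a minimal sketch using only JDK APIs (ProcessBuilder,
Process.waitFor(long, TimeUnit), Process.destroyForcibly()). The class
name KillOnTimeout and the example command line (a Tika app jar run
with --text on a file named report.xlsx) are assumptions for
illustration, not something taken from this thread. Killing a separate
process reliably stops a hung converter, whereas interrupting a thread
only helps if the code being run checks for interruption.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class KillOnTimeout {
    /**
     * Runs an external conversion command, writing its output to outFile.
     * Returns true if it exited with status 0 within the time limit;
     * otherwise kills the process (if still running) and returns false,
     * so the caller can skip the file and index the next document.
     */
    public static boolean runWithTimeout(List<String> command, File outFile, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .redirectErrorStream(true)   // merge stderr into stdout
                .redirectOutput(outFile)     // a file, so the child never blocks on a full pipe
                .start();
        if (!p.waitFor(timeout, unit)) {
            p.destroyForcibly();             // the maximum time was exceeded
            p.waitFor();                     // reap the killed process
            return false;
        }
        return p.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical converter invocation; substitute your own tool and file.
        List<String> cmd = List.of("java", "-jar", "tika-app.jar", "--text", "report.xlsx");
        boolean ok = runWithTimeout(cmd, new File("report.txt"), 60, TimeUnit.SECONDS);
        System.out.println(ok ? "converted" : "skipped");
    }
}
```

Each file pays the cost of starting a new process here, so this trades
throughput for isolation: one pathological spreadsheet can no longer
stall the whole indexing run.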

Regards,
Gora