Posted to solr-user@lucene.apache.org by Kerwin <ke...@gmail.com> on 2009/11/21 10:06:37 UTC

Issue Indexing zip file content in Solr 1.4

Hi,

Has anyone faced this issue? If so, why is Tika 0.4 bundled with Solr 1.4?
Shouldn't it be Tika 0.5 instead?

Problem:
I have a zip file containing multiple files of different formats.
I am trying to index the zip file content with Solr 1.4, but the
AutoDetectParser context is not being passed by the ExtractingDocumentLoader
in the current 1.4 distribution. As a result I am unable to index the zip
file content, since an empty parser is being created. After indexing the
file, only the package entries are displayed as content.
I replaced the Tika 0.4 that comes with the Solr 1.4 distribution with Tika
0.5, along with some other POI jars, and this seems to work: the context is
now being passed and the delegate parser is able to delegate to the correct
parser.

In Tika 0.4 the AutoDetectParser does not create the context, but in Tika
0.5 it creates the context before calling the parse method.
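
To illustrate the difference, here is a minimal Java sketch of the delegation
pattern. It is a simplified model and not Tika's actual code: the class names
(ContextDelegationSketch, DetectingParser, ArchiveParser, EMPTY_PARSER), the
plain map used as a context, and the in-memory "zip" are all hypothetical
stand-ins. The registersItself flag models whether the detecting parser puts
itself into the context before parsing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of context-based parser delegation.
// NOT Tika's source; every name below is a hypothetical stand-in.
class ContextDelegationSketch {

    interface Parser {
        // Appends whatever text it can extract from `content` to `out`.
        void parse(Object content, List<String> out, Map<Class<?>, Object> context);
    }

    // Fallback used when the context holds no parser: extracts nothing.
    static final Parser EMPTY_PARSER = (content, out, context) -> { };

    // Parses an "archive", modelled here as a map of entry name -> entry text.
    static final class ArchiveParser implements Parser {
        @Override
        public void parse(Object content, List<String> out, Map<Class<?>, Object> context) {
            // Look up the parser to use for the entries; fall back to the empty one.
            Object found = context.get(Parser.class);
            Parser delegate = (found instanceof Parser) ? (Parser) found : EMPTY_PARSER;
            Map<?, ?> entries = (Map<?, ?>) content;
            for (Map.Entry<?, ?> entry : entries.entrySet()) {
                out.add(String.valueOf(entry.getKey()));    // entry name is always emitted
                delegate.parse(entry.getValue(), out, context); // content only if delegate is real
            }
        }
    }

    // Picks a parser by content type; optionally registers itself in the context.
    static final class DetectingParser implements Parser {
        private final boolean registersItself;

        DetectingParser(boolean registersItself) {
            this.registersItself = registersItself;
        }

        @Override
        public void parse(Object content, List<String> out, Map<Class<?>, Object> context) {
            if (registersItself) {
                context.put(Parser.class, this); // the step described for Tika 0.5
            }
            if (content instanceof Map) {
                new ArchiveParser().parse(content, out, context);
            } else {
                out.add(String.valueOf(content)); // plain text: emit as-is
            }
        }
    }

    // Runs the model on a two-entry archive and returns the extracted strings.
    static List<String> index(boolean registersItself) {
        Map<String, String> zip = new LinkedHashMap<>();
        zip.put("a.txt", "hello");
        zip.put("b.txt", "world");
        List<String> out = new ArrayList<>();
        new DetectingParser(registersItself).parse(zip, out, new HashMap<>());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(index(false)); // prints [a.txt, b.txt]
        System.out.println(index(true));  // prints [a.txt, hello, b.txt, world]
    }
}
```

With the flag off, only the entry names come back, which matches the symptom
I see with the bundled jars; with it on, the entry contents are extracted too.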

Am I missing something? Please advise.

Re: Issue Indexing zip file content in Solr 1.4

Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 21, 2009, at 4:06 AM, Kerwin wrote:

> Hi,
> 
> Has anyone faced this issue? If so, why is Tika 0.4 bundled with Solr 1.4?
> Shouldn't it be Tika 0.5 instead?

0.5 was released after Solr 1.4.  See https://issues.apache.org/jira/browse/SOLR-1567

> 
> Problem:
> I have a zip file containing multiple files of different formats.
> I am trying to index the zip file content with Solr 1.4, but the
> AutoDetectParser context is not being passed by the ExtractingDocumentLoader
> in the current 1.4 distribution. As a result I am unable to index the zip
> file content, since an empty parser is being created. After indexing the
> file, only the package entries are displayed as content.
> I replaced the Tika 0.4 that comes with the Solr 1.4 distribution with Tika
> 0.5, along with some other POI jars, and this seems to work: the context is
> now being passed and the delegate parser is able to delegate to the correct
> parser.
> 
> In Tika 0.4 the AutoDetectParser does not create the context, but in Tika
> 0.5 it creates the context before calling the parse method.
> 
> Am I missing something? Please advise.

Sounds like we just need to upgrade.  What you did is perfectly reasonable.