Posted to dev@lucene.apache.org by "Markus Schuch (JIRA)" <ji...@apache.org> on 2018/04/30 19:40:00 UTC

[jira] [Commented] (SOLR-2416) Solr Cell fails to index Zip file contents

    [ https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458950#comment-16458950 ] 

Markus Schuch commented on SOLR-2416:
-------------------------------------

I just tested ZIP extraction with 7.3.0 and can confirm that, due to the new default behavior of Tika 1.15+, the Extracting Request Handler now extracts the text of the embedded documents as well, not only the file names as stated in the issue description.

So this was fixed with SOLR-10335.
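
For anyone who wants to repeat the check, here is a minimal sketch in Python. It is not the exact procedure used for the test above. It builds a small ZIP in memory and prepares an extract-only request for the Extracting Request Handler. The base URL, the core name "mycore", and the helper names are assumptions made for illustration; adjust them to your setup.

```python
import io
import urllib.parse
import urllib.request
import zipfile


def build_sample_zip() -> bytes:
    """Create a small in-memory ZIP with two text members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", "alpha content inside the archive")
        zf.writestr("docs/b.txt", "beta content inside the archive")
    return buf.getvalue()


def extract_url(base_url: str, core: str) -> str:
    """Build an /update/extract URL that only returns the extracted
    content (extractOnly=true) instead of indexing it."""
    params = urllib.parse.urlencode({"extractOnly": "true", "wt": "json"})
    return f"{base_url}/{core}/update/extract?{params}"


def post_zip(url: str, payload: bytes) -> str:
    """POST the ZIP bytes to Solr and return the raw response body.
    Requires a running Solr instance with the extraction handler enabled."""
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/zip"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling post_zip(extract_url("http://localhost:8983/solr", "mycore"), build_sample_zip()) against a local instance should show whether the response contains the text of the archive members or only their names.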

> Solr Cell fails to index Zip file contents
> ------------------------------------------
>
>                 Key: SOLR-2416
>                 URL: https://issues.apache.org/jira/browse/SOLR-2416
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.4.1
>            Reporter: Jayendra Patil
>            Priority: Major
>             Fix For: 6.0
>
>         Attachments: SOLR-2416_ExtractingDocumentLoader.patch, SOLR-4216.patch
>
>
> Working with the latest Solr trunk code, it seems the Tika handlers for Solr Cell (ExtractingDocumentLoader.java) and the Data Import Handler (TikaEntityProcessor.java) fail to index the zip file contents again.
> They index only the file names.
> This issue was addressed some time back, late last year, but seems to have reappeared with the latest code.
> The Jira issue for the Data Import Handler part, with the patch and the test case: https://issues.apache.org/jira/browse/SOLR-2332.


