Posted to dev@lucene.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2011/06/09 13:29:59 UTC

[jira] [Commented] (SOLR-2550) Apache Solr needs an updated TIKA version in its extraction libraries

    [ https://issues.apache.org/jira/browse/SOLR-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046476#comment-13046476 ] 

Jan Høydahl commented on SOLR-2550:
-----------------------------------

There will probably be no 1.4.2 release. I recommend voting for SOLR-2372 to get Tika 0.9 into Solr 3.3, and then upgrading to 3.3 (which is trivial).

> Apache Solr needs an updated TIKA version in its extraction libraries
> ---------------------------------------------------------------------
>
>                 Key: SOLR-2550
>                 URL: https://issues.apache.org/jira/browse/SOLR-2550
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.4.1
>            Reporter: Surendranadh Puranam
>            Priority: Critical
>              Labels: extraction, indexing, pdf, secure
>             Fix For: 1.4.2
>
>
> There are issues with some PDF documents when they are indexed (extracted?). The underlying issue is being fixed by PDFBox in version 1.1.0, but Apache Solr 1.4.1 does not ship the latest version of these jars, which causes the failures. The Solr 1.4.1 distribution includes tika-parsers 0.4, which has to be updated to version 0.9.
> Reference for the issue and the solution: https://issues.apache.org/jira/browse/PDFBOX-617
