Posted to dev@lucene.apache.org by "Surendranadh Puranam (JIRA)" <ji...@apache.org> on 2011/05/27 14:22:47 UTC

[jira] [Created] (SOLR-2550) Apache Solr needs an updated TIKA version in its extraction libraries

Apache Solr needs an updated TIKA version in its extraction libraries
---------------------------------------------------------------------

                 Key: SOLR-2550
                 URL: https://issues.apache.org/jira/browse/SOLR-2550
             Project: Solr
          Issue Type: Bug
          Components: contrib - Solr Cell (Tika extraction)
    Affects Versions: 1.4.1
            Reporter: Surendranadh Puranam
            Priority: Critical
             Fix For: 1.4.2


Some PDF documents fail when they are indexed (or, more precisely, extracted). The underlying issue is fixed in PDFBox 1.1.0, but Apache Solr 1.4.1 does not ship the latest version of these jars, which causes the failures. The Solr 1.4.1 distribution bundles tika-parsers 0.4, which needs to be updated to version 0.9.

Reference for the issue and the solution: https://issues.apache.org/jira/browse/PDFBOX-617



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] [Commented] (SOLR-2550) Apache Solr needs an updated TIKA version in its extraction libraries

Posted by "Jan Høydahl (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/SOLR-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046476#comment-13046476 ] 

Jan Høydahl commented on SOLR-2550:
-----------------------------------

There will probably be no 1.4.2 release. I recommend voting for SOLR-2372 to get Tika 0.9 into Solr 3.3, and then upgrading to 3.3 (which is trivial).

> Apache Solr needs an updated TIKA version in its extraction libraries
> ---------------------------------------------------------------------
>
>                 Key: SOLR-2550
>                 URL: https://issues.apache.org/jira/browse/SOLR-2550
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.4.1
>            Reporter: Surendranadh Puranam
>            Priority: Critical
>              Labels: extraction, indexing, pdf, secure
>             Fix For: 1.4.2


[jira] [Resolved] (SOLR-2550) Apache Solr needs an updated TIKA version in its extraction libraries

Posted by "Steven Rowe (Resolved) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/SOLR-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe resolved SOLR-2550.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 3.1
         Assignee: Steven Rowe

Solr Cell was upgraded to Tika 0.8, which included PDFBox 1.1.0, in the Solr 3.1 release.

                
