Posted to dev@lucene.apache.org by "Jan Høydahl (Created JIRA)" <ji...@apache.org> on 2011/11/16 01:11:52 UTC

[jira] [Created] (SOLR-2901) Upgrade Solr to Tika 1.0

Upgrade Solr to Tika 1.0
------------------------

                 Key: SOLR-2901
                 URL: https://issues.apache.org/jira/browse/SOLR-2901
             Project: Solr
          Issue Type: Improvement
          Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
            Reporter: Jan Høydahl


Tika 1.0 was released November 7th and includes a number of improvements: http://tika.apache.org/1.0/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187531#comment-13187531 ] 

Jan Høydahl commented on SOLR-2901:
-----------------------------------

Thanks for looking at it. I'd have preferred the old Spanish text to still be detected as Spanish :) Yet more proof that the Tika algorithm is not very strong on short texts in very similar languages, but as you say, "we knew that".
                
> Upgrade Solr to Tika 1.0
> ------------------------
>
>                 Key: SOLR-2901
>                 URL: https://issues.apache.org/jira/browse/SOLR-2901
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - LangId, contrib - Solr Cell (Tika extraction)
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 3.6, 4.0
>
>         Attachments: SOLR-2901.patch
>
>
> Tika 1.0 was released November 7th and includes a number of improvements: http://tika.apache.org/1.0/



[jira] [Updated] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2901:
------------------------------

    Attachment: SOLR-2901.patch

Fixes bug in the new stream.type code
                


[jira] [Resolved] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Resolved JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl resolved SOLR-2901.
-------------------------------

    Resolution: Fixed

Checked in jdom-1.0.jar with LICENSE and NOTICE files in both 3.x and trunk.
                


[jira] [Updated] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2901:
------------------------------

    Attachment: SOLR-2901.patch

Cleaned up excess imports. I think it's good to go.
                


[jira] [Commented] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188489#comment-13188489 ] 

Jan Høydahl commented on SOLR-2901:
-----------------------------------

Could someone fix the classpath config for IntelliJ IDEA in dev-tools?
                


[jira] [Commented] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187435#comment-13187435 ] 

Robert Muir commented on SOLR-2901:
-----------------------------------

Patch seems to work... though the test is more evidence, in addition to Mike's experiments,
that something is seriously up with Spanish/Galician and Tika's detector :)

                


[jira] [Issue Comment Edited] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Issue Comment Edited JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191331#comment-13191331 ] 

Jan Høydahl edited comment on SOLR-2901 at 1/23/12 10:32 PM:
-------------------------------------------------------------

New patch. Bumps Tika version in CHANGES files. Replaces the deprecated getParser(mt) (second take, this time using DefaultParser):
{noformat}
-      parser = config.getParser(mt);
+      parser = new DefaultParser().getParsers().get(mt);
{noformat}
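
The lookup in the diff is a plain map access: getParsers() returns a map keyed by media type, and a map returns null for a type nobody registered, so the caller has to handle the missing-parser case itself. Below is a minimal sketch of that pattern in plain Java, runnable without Tika on the classpath. ParserLookupSketch, its methods, and the registered entries are hypothetical stand-ins for illustration, not Tika API, and the empty-versus-populated contrast is only an assumption about why the second take switched classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; none of these names are Tika API.
final class ParserLookupSketch {

    // Models a registry that was constructed without any parsers.
    static Map<String, String> emptyRegistry() {
        return Collections.emptyMap();
    }

    // Models a registry pre-populated with parsers (entries are made up).
    static Map<String, String> populatedRegistry() {
        Map<String, String> parsers = new HashMap<>();
        parsers.put("application/pdf", "pdf-parser");
        parsers.put("text/html", "html-parser");
        return parsers;
    }

    // Map.get returns null for an unknown key, so fail loudly instead of
    // handing a null parser back to the caller.
    static String lookup(Map<String, String> parsers, String mediaType) {
        String parser = parsers.get(mediaType);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for " + mediaType);
        }
        return parser;
    }

    public static void main(String[] args) {
        System.out.println(lookup(populatedRegistry(), "application/pdf"));
        try {
            lookup(emptyRegistry(), "application/pdf");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whichever class supplies the map, an explicit null check like the one in lookup() is what keeps a wrong media type from surfacing later as a NullPointerException.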

                
      was (Author: janhoy):
    New patch. Bumps Tika version in CHANGES files. Replaces deprecated getParser(mt) with what I believe is equivalent:
{noformat}
-      parser = config.getParser(mt);
+      parser = new CompositeParser().getParsers().get(mt);
{noformat}

Ready for commit?
                  


[jira] [Assigned] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Assigned JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl reassigned SOLR-2901:
---------------------------------

    Assignee: Jan Høydahl
    


[jira] [Updated] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2901:
------------------------------

    Attachment: SOLR-2901.patch

First patch version.

* Tika 1.0 removes previously deprecated methods, so this patch changes how the API is used in a few places.
* For MailEntityProcessor we also improve detection by passing the part's fileName in as Metadata.
* For ExtractingDocumentLoader we now provide the stream's content type as a hint in Metadata, but this has not been tested extensively.
* Added tests for the newly detected languages.
* Updated the Eclipse classpath file to point to the new jars. Nothing done for other IDEs.

One place still uses a deprecated method: in ExtractingDocumentLoader, where we say parser = config.getParser(mediaType). I did not find the new equivalent.
                


[jira] [Updated] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2901:
------------------------------

    Fix Version/s: 4.0
                   3.6
    


[jira] [Updated] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2901:
------------------------------

    Attachment: SOLR-2901.patch

New patch. Bumps Tika version in CHANGES files. Replaces deprecated getParser(mt) with what I believe is equivalent:
{noformat}
-      parser = config.getParser(mt);
+      parser = new CompositeParser().getParsers().get(mt);
{noformat}

Ready for commit?
                


[jira] [Commented] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Commented JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187416#comment-13187416 ] 

Jan Høydahl commented on SOLR-2901:
-----------------------------------

If you want to try this patch, you also need three jars to be put in contrib/extraction/lib:
http://dl.dropbox.com/u/20080302/tikajars/commons-compress-1.3.jar
http://dl.dropbox.com/u/20080302/tikajars/tika-core-1.0.jar
http://dl.dropbox.com/u/20080302/tikajars/tika-parsers-1.0.jar
                


[jira] [Updated] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2901:
------------------------------

    Attachment: SOLR-2901.patch

Small change: respect a potential custom Tika config also when loading the parser for stream.type. Added a few tests for the exceptional case of a wrong stream.type.
                


[jira] [Resolved] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Resolved JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl resolved SOLR-2901.
-------------------------------

    Resolution: Fixed

Checked in to trunk and merged back to 3x
                


[jira] [Reopened] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Reopened JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl reopened SOLR-2901:
-------------------------------


Re-opening, as the {{jdom-1.0.jar}} must also be included, as a dependency for {{Rome}} used by {{FeedParser}}
                


[jira] [Commented] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Steven Rowe (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188621#comment-13188621 ] 

Steven Rowe commented on SOLR-2901:
-----------------------------------

bq. Could someone fix the classpath config for IntelliJ IDEA in dev-tools?

IntelliJ IDEA effectively grabs {{**/lib/*.jar}} for its classpath (where {{**}} refers to all modules with {{lib/}} directories), rather than referring to explicitly named jar files, so as long as you rename jars (or add or remove jars, for that matter) in library directories that were already there, nothing needs to be done.
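
The effect of a pattern-based classpath can be reproduced with the JDK's own glob matcher. This is a minimal sketch, assuming Java's glob syntax is close enough to the pattern above for illustration. LibJarGlob and the sample paths are hypothetical, and this is not how IDEA itself is implemented.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: selects the paths that a "**/lib/*.jar" classpath
// pattern would pick up, whatever the jar files happen to be called.
final class LibJarGlob {

    // "**" may span several directories; "*" stays within one path element.
    private static final PathMatcher MATCHER =
            FileSystems.getDefault().getPathMatcher("glob:**/lib/*.jar");

    static List<String> onClasspath(List<String> relativePaths) {
        List<String> hits = new ArrayList<>();
        for (String path : relativePaths) {
            if (MATCHER.matches(Paths.get(path))) {
                hits.add(path);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample paths are made up; the old and the new jar name both match.
        List<String> paths = Arrays.asList(
                "solr/contrib/extraction/lib/tika-core-0.10.jar",
                "solr/contrib/extraction/lib/tika-core-1.0.jar",
                "solr/contrib/extraction/src/java/Foo.java");
        for (String hit : onClasspath(paths)) {
            System.out.println(hit);
        }
    }
}
```

Because the match depends only on the directory and the .jar suffix, swapping one jar version for another in an existing lib/ directory changes the result set without touching the pattern.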

However, the Maven configuration will need fixing, since there, by contrast, dependency versions are declared explicitly. In {{dev-tools/maven/pom.xml.template}}, the {{tika.version}} property should be changed from {{<tika.version>0.10</tika.version>}} to {{<tika.version>1.0</tika.version>}}. (This property is used in both the {{tika-core}} and the {{tika-parsers}} dependency version declarations in the {{<dependencyManagement>}} section of the same file.) The {{commons-compress}} dependency is handled through Maven's transitive dependency mechanism, since it is declared as a dependency in the {{tika-parsers}} POM, so no configuration change is required for it.
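
As a rough picture of the change described above, the relevant parts of the template might look like the fragment below. Only the property name, its old and new values, the two artifact names, and the {{<dependencyManagement>}} section come from the comment. The surrounding layout is an assumption and is not copied from the actual file.

```xml
<!-- Illustrative fragment only; the layout is assumed, not copied from
     dev-tools/maven/pom.xml.template. -->
<properties>
  <tika.version>1.0</tika.version> <!-- was 0.10 -->
</properties>

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
      <version>${tika.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>${tika.version}</version>
    </dependency>
    <!-- no commons-compress entry: it arrives transitively via tika-parsers -->
  </dependencies>
</dependencyManagement>
```

Keeping the version in a single property means both Tika artifacts move together when the value is bumped.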
                


[jira] [Updated] (SOLR-2901) Upgrade Solr to Tika 1.0

Posted by "Jan Høydahl (Updated JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Høydahl updated SOLR-2901:
------------------------------

    Attachment: SOLR-2901.patch

This even includes the pom.xml.template change :)
                