Posted to dev@tika.apache.org by "Antoni Mylka (Commented) (JIRA)" <ji...@apache.org> on 2011/12/20 13:43:30 UTC

[jira] [Commented] (TIKA-686) Split tika-parsers into separate components

    [ https://issues.apache.org/jira/browse/TIKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173147#comment-13173147 ] 

Antoni Mylka commented on TIKA-686:
-----------------------------------

Why keep this issue open?

A PdfParser has appeared in PDFBox (PDFBOX-1132). Keeping both hardly makes sense, and the duplication has already been identified as a problem (TIKA-810). Pushing parsers upstream covers Ken's use case ("I'm in favor of anything that helps with avoiding dependencies on POI"). We agree to keep the dependency from tika-parsers to POI (doubts about that were dispelled in http://mail-archives.apache.org/mod_mbox/tika-dev/201112.mbox/%3C4EEBA9CA.9030900%40gmail.com%3E). With this dependency in place, it will be possible to use the Maven exclusion construct, exactly as described in my "I like exclusions better" post. So all known use cases are covered.

Since we can't actually remove the PdfParser from Tika now (that would definitely be a backward-incompatible change), we should deprecate it, remove it from the /META-INF/services/org.apache.tika.parser.Parser file, and replace the implementation with a delegation to the PDFBox version. That, however, falls within the scope of TIKA-810.
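
To make the deprecate-and-delegate idea concrete, here is a rough sketch. It does not use the real Tika or PDFBox APIs: TextParser, UpstreamPdfParser and LegacyPdfParser are hypothetical stand-ins, and the actual classes and signatures would be settled in TIKA-810.

```java
public class DelegationSketch {

    /** Hypothetical stand-in for a parser interface; NOT the real Tika Parser API. */
    public interface TextParser {
        String parse(String input);
    }

    /** Hypothetical stand-in for the parser that now lives upstream (PDFBox side). */
    public static class UpstreamPdfParser implements TextParser {
        @Override
        public String parse(String input) {
            // Placeholder behaviour so the delegation is observable.
            return "upstream:" + input;
        }
    }

    /**
     * The old class stays in place so existing callers keep compiling,
     * but it is marked deprecated and only forwards to the upstream parser.
     */
    @Deprecated
    public static class LegacyPdfParser implements TextParser {
        private final TextParser delegate = new UpstreamPdfParser();

        @Override
        public String parse(String input) {
            return delegate.parse(input);
        }
    }
}
```

In this arrangement the legacy class would no longer be listed in the services file, so automatic discovery would pick up only the upstream parser, while code that instantiates the old class directly would keep working.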

Anyway, this can be closed. The discussion can continue in TIKA-810 and in some new issue for POI.

WDYT?
                
> Split tika-parsers into separate components
> -------------------------------------------
>
>                 Key: TIKA-686
>                 URL: https://issues.apache.org/jira/browse/TIKA-686
>             Project: Tika
>          Issue Type: Wish
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Christopher Currie
>            Priority: Minor
>
> The email thread [1] from two years ago that led to splitting Tika into separate components also suggested splitting tika-parsers into separate components based on dependencies. This would be extremely useful, especially in cases where a given parser has no dependencies beyond tika-core. Please consider refactoring the parsers into separate components for 1.0.
> [1] http://markmail.org/message/tavirkqhn6r2szrz
