Posted to dev@tika.apache.org by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2012/01/16 13:20:39 UTC

[jira] [Created] (TIKA-841) User supplied parsers should be preferred

User supplied parsers should be preferred
-----------------------------------------

                 Key: TIKA-841
                 URL: https://issues.apache.org/jira/browse/TIKA-841
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.0
            Reporter: Nick Burch


Currently, user-supplied Detectors are preferred over built-in ones, via logic in DefaultDetector. This lets users easily add their own detectors, which are used in preference, and makes it easy to override the built-in ones.

However, there is no such logic for Parsers. Instead, for a given mimetype, the last parser in the DefaultParser / CompositeParser list that supports it will be used (the map holds only one entry per type, so last in wins). This makes it hard for users to override the parser for a type that the built-in parsers support, as it isn't predictable where in the list their parsers will go.
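
To illustrate the "last in wins" behaviour described above, here is a minimal sketch. It does not use the real Tika classes: NamedParser and index() are hypothetical stand-ins, and the only assumption carried over from the description is that the lookup map keeps a single parser per type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LastInWins {
    // Hypothetical stand-in for a parser: a name plus the types it claims.
    static final class NamedParser {
        final String name;
        final List<String> types;

        NamedParser(String name, String... types) {
            this.name = name;
            this.types = Arrays.asList(types);
        }
    }

    // One entry per type: a later parser silently replaces an earlier one.
    static Map<String, NamedParser> index(List<NamedParser> parsers) {
        Map<String, NamedParser> byType = new HashMap<>();
        for (NamedParser parser : parsers) {
            for (String type : parser.types) {
                byType.put(type, parser);
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        NamedParser builtin = new NamedParser("builtin", "application/pdf");
        NamedParser user = new NamedParser("user", "application/pdf");

        // The winner depends purely on list order.
        System.out.println(index(Arrays.asList(user, builtin)).get("application/pdf").name);
        System.out.println(index(Arrays.asList(builtin, user)).get("application/pdf").name);
    }
}
```

With the user parser listed first, the built-in one wins; with it listed last, the user one wins. So unless the list order is controlled, the override is a matter of luck.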

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TIKA-841) User supplied parsers should be preferred

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188469#comment-13188469 ] 

Nick Burch commented on TIKA-841:
---------------------------------

Fixed in r1232902, using code similar to that in DefaultDetector.
                


        

[jira] [Resolved] (TIKA-841) User supplied parsers should be preferred

Posted by "Nick Burch (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch resolved TIKA-841.
-----------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
    


        

[jira] [Commented] (TIKA-841) User supplied parsers should be preferred

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186894#comment-13186894 ] 

Nick Burch commented on TIKA-841:
---------------------------------

I propose fixing this by adding logic to DefaultParser similar to that in DefaultDetector. This would apply only to the ServiceLoader constructor, and would ensure that user parsers go last in the list, where they take precedence. The (MediaTypeRegistry, List<Parser>) constructor will still let people control their own ordering if they want.
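
A rough sketch of the ordering idea, for illustration only. This is not the committed code: it sorts class names rather than Parser instances, and it assumes that "built in" can be recognised by the org.apache.tika. package prefix. The class com.example.MyPdfParser is a made-up user parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class UserParsersLast {
    // Assumption: built-in parsers are identified by this package prefix.
    static final String BUILTIN_PREFIX = "org.apache.tika.";

    // Built-in names sort before everything else; ties are broken by name
    // so that the resulting order is stable and predictable.
    static final Comparator<String> BUILTIN_FIRST = (a, b) -> {
        boolean aBuiltin = a.startsWith(BUILTIN_PREFIX);
        boolean bBuiltin = b.startsWith(BUILTIN_PREFIX);
        if (aBuiltin == bBuiltin) {
            return a.compareTo(b);
        }
        return aBuiltin ? -1 : 1;
    };

    // Returns a sorted copy, leaving the input list untouched.
    static List<String> order(List<String> classNames) {
        List<String> sorted = new ArrayList<>(classNames);
        sorted.sort(BUILTIN_FIRST);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> discovered = Arrays.asList(
                "com.example.MyPdfParser", // hypothetical user parser
                "org.apache.tika.parser.txt.TXTParser",
                "org.apache.tika.parser.pdf.PDFParser");
        for (String name : order(discovered)) {
            System.out.println(name);
        }
    }
}
```

Because user parsers end up at the tail of the list, a last-in-wins map would then pick them over a built-in parser for the same type.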
                
