Posted to dev@tika.apache.org by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/09/14 18:44:34 UTC

[jira] Commented: (TIKA-514) Provide constructor for AutoDetectParser that has explicit list of supported parsers

    [ https://issues.apache.org/jira/browse/TIKA-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909317#action_12909317 ] 

Ken Krugler commented on TIKA-514:
----------------------------------

As Jukka noted, the CompositeParser class could be cleaned up, now that parsers self-describe their supported types.

{quote}
BTW, the need to pass a MediaType->Parser map to
CompositeParser.setParsers() is a remnant of the time when we didn't
have the Parser.getSupportedTypes() method. Nowadays it would probably
be better to simply pass a collection of parsers and use
getSupportedTypes() calls for dispatch during CompositeParser.parse().
{quote}
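
A minimal sketch of that idea follows. It uses simplified stand-in types (SimpleParser, FixedTypeParser, SimpleCompositeParser, and plain strings for media types) rather than the real Tika interfaces, so every name and signature here is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Parser: each parser reports the types it handles.
interface SimpleParser {
    Set<String> getSupportedTypes();
    String parse(String content);
}

// Hypothetical parser that handles exactly one media type (for this sketch only).
class FixedTypeParser implements SimpleParser {
    private final String type;

    FixedTypeParser(String type) {
        this.type = type;
    }

    public Set<String> getSupportedTypes() {
        return Collections.singleton(type);
    }

    public String parse(String content) {
        return type + ":" + content;
    }
}

// Hypothetical stand-in for CompositeParser: it is given a collection of
// parsers instead of a media type -> parser map.
class SimpleCompositeParser {
    private final List<SimpleParser> parsers;

    SimpleCompositeParser(Collection<? extends SimpleParser> parsers) {
        this.parsers = new ArrayList<>(parsers);
    }

    // Dispatch at parse time by asking each parser what it supports.
    // The first parser that claims the type wins.
    String parse(String mediaType, String content) {
        for (SimpleParser parser : parsers) {
            if (parser.getSupportedTypes().contains(mediaType)) {
                return parser.parse(content);
            }
        }
        throw new IllegalArgumentException("No parser for " + mediaType);
    }
}
```

With this shape, callers never have to build the type map themselves, and adding a parser is just a matter of adding it to the collection. A real implementation would probably want to cache the lookup rather than scan the list on every call.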


> Provide constructor for AutoDetectParser that has explicit list of supported parsers
> ------------------------------------------------------------------------------------
>
>                 Key: TIKA-514
>                 URL: https://issues.apache.org/jira/browse/TIKA-514
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 0.7
>            Reporter: Ken Krugler
>            Assignee: Ken Krugler
>             Fix For: 0.8
>
>
> To reduce the size of the Tika dependency chain, it's useful to exclude the supporting jars for types that you don't need to process (e.g. Microsoft docs, PDFs). This can easily remove 20MB of third-party jars.
> With 0.8-SNAPSHOT, the TikaConfig(Classpath) constructor now finds and instantiates all Parser-based classes found on the classpath, which can trigger errors when third-party jars are missing.
> One solution, as proposed by Jukka, is to provide an alternative constructor for AutoDetectParser which includes the list of supported parsers, and avoids creating the default TikaConfig.
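
The difference between the two approaches can be illustrated with a small sketch. The types below (DocParser, PlainTextParser, ParserLoading) and the class name com.example.MissingPdfParser are hypothetical stand-ins, not Tika classes. loadByName only mimics "instantiate everything that was discovered" and is not a description of how TikaConfig is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parser; not the real Tika Parser interface.
interface DocParser {
    String name();
}

// Hypothetical parser with no third-party dependencies.
class PlainTextParser implements DocParser {
    public String name() {
        return "text";
    }
}

class ParserLoading {
    // Mimics discovery-style loading: every listed class is loaded and
    // instantiated, so a single class that cannot be found fails the whole call.
    static List<DocParser> loadByName(List<String> classNames)
            throws ReflectiveOperationException {
        List<DocParser> parsers = new ArrayList<>();
        for (String className : classNames) {
            Class<?> type = Class.forName(className); // ClassNotFoundException if absent
            parsers.add((DocParser) type.getDeclaredConstructor().newInstance());
        }
        return parsers;
    }

    // The explicit alternative: the caller passes exactly the parsers it wants,
    // so classes for excluded formats are never touched.
    static List<DocParser> explicit(DocParser... parsers) {
        return new ArrayList<>(Arrays.asList(parsers));
    }
}
```

In this sketch, calling ParserLoading.loadByName with "com.example.MissingPdfParser" in the list fails with a ClassNotFoundException, while ParserLoading.explicit(new PlainTextParser()) succeeds regardless of which other jars are present.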
