Posted to dev@tika.apache.org by "Bertrand Delacretaz (JIRA)" <ji...@apache.org> on 2007/10/15 08:48:50 UTC

[jira] Commented: (TIKA-67) Add an auto-detecting Parser implementation

    [ https://issues.apache.org/jira/browse/TIKA-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534750 ] 

Bertrand Delacretaz commented on TIKA-67:
-----------------------------------------

I haven't looked at the implementation details, but I like the idea.

> Add an auto-detecting Parser implementation
> -------------------------------------------
>
>                 Key: TIKA-67
>                 URL: https://issues.apache.org/jira/browse/TIKA-67
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.1-incubator
>
>         Attachments: TIKA-67.patch
>
>
> We should have an AutoDetectParser class that uses the MIME framework to automatically detect the type of the document being parsed, and that dispatches the parsing task to the parser class configured for the detected MIME type.
> The class would work like this:
>     InputStream stream = ...;
>     ContentHandler handler = ...;
>     Metadata metadata = new Metadata();
>     metadata.set(Metadata.CONTENT_TYPE, ...); // optional content type hint
>     metadata.set("filename", ...); // optional file name hint
>     AutoDetectParser parser = new AutoDetectParser();
>     parser.setConfig(...); // optional TikaConfig configuration
>     parser.parse(stream, handler, metadata);
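
The dispatch idea described in the issue can be sketched roughly as follows. This is only an illustration and not the attached patch or Tika's actual API: SketchParser, AutoDetectSketch, the plain Map standing in for Metadata, the hard-coded "%PDF" magic check, the file extensions and the order in which hints are consulted are all assumptions made for the example.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a parser; this is not Tika's Parser interface. */
interface SketchParser {
    String parse(InputStream stream, Map<String, String> metadata);
}

/** Illustrative auto-detecting dispatcher; every name here is an assumption. */
class AutoDetectSketch {
    static final String CONTENT_TYPE = "Content-Type";
    static final String FILENAME = "filename";
    static final String DEFAULT_TYPE = "application/octet-stream";

    private final Map<String, SketchParser> parsers = new HashMap<>();
    private final SketchParser fallback;

    AutoDetectSketch(SketchParser fallback) {
        this.fallback = fallback;
    }

    /** Associates a MIME type with the parser that should handle it. */
    void register(String mimeType, SketchParser parser) {
        parsers.put(mimeType, parser);
    }

    /** Detects the type, records it in the metadata, and delegates parsing. */
    String parse(InputStream stream, Map<String, String> metadata) {
        // Mark/reset support is needed so detection does not consume the bytes.
        InputStream in = stream.markSupported() ? stream : new BufferedInputStream(stream);
        String type;
        try {
            type = detect(in, metadata);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        metadata.put(CONTENT_TYPE, type);
        return parsers.getOrDefault(type, fallback).parse(in, metadata);
    }

    /** Assumed order: magic bytes, then file name hint, then content type hint. */
    static String detect(InputStream in, Map<String, String> metadata) throws IOException {
        byte[] head = new byte[4];
        in.mark(head.length);
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream ended before four bytes were available
            }
            n += r;
        }
        in.reset(); // rewind so the chosen parser sees the whole stream

        String magic = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        if (magic.equals("%PDF")) {
            return "application/pdf";
        }
        String name = metadata.get(FILENAME);
        if (name != null) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".txt")) {
                return "text/plain";
            }
            if (lower.endsWith(".html")) {
                return "text/html";
            }
        }
        String hint = metadata.get(CONTENT_TYPE);
        return hint != null ? hint : DEFAULT_TYPE;
    }
}
```

A caller would register one parser per MIME type, put any optional hints into the map, and call parse(stream, metadata). The detected type is written back under "Content-Type", and unknown types go to the fallback parser. A real implementation would presumably take its type definitions and parser mappings from the MIME framework and TikaConfig instead of hard-coding them.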
