Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2007/10/15 00:03:50 UTC

[jira] Updated: (TIKA-67) Add an auto-detecting Parser implementation

     [ https://issues.apache.org/jira/browse/TIKA-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-67:
------------------------------

    Attachment: TIKA-67.patch

The attached patch contains an implementation of the proposed AutoDetectParser class.

> Add an auto-detecting Parser implementation
> -------------------------------------------
>
>                 Key: TIKA-67
>                 URL: https://issues.apache.org/jira/browse/TIKA-67
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.1-incubator
>
>         Attachments: TIKA-67.patch
>
>
> We should have an AutoDetectParser class that uses the MIME framework to automatically detect the type of the document being parsed, and that dispatches the parsing task to the parser class configured for the detected MIME type.
> The class would work like this:
>     InputStream stream = ...;
>     ContentHandler handler = ...;
>     Metadata metadata = new Metadata();
>     metadata.set(Metadata.CONTENT_TYPE, ...); // optional content type hint
>     metadata.set("filename", ...); // optional file name hint
>     AutoDetectParser parser = new AutoDetectParser();
>     parser.setConfig(...); // optional TikaConfig configuration
>     parser.parse(stream, handler, metadata);
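
The detect-then-dispatch idea in the description can be illustrated with a small, self-contained Java sketch. It is not the code in TIKA-67.patch and it does not use the Tika API. Everything in it is a hypothetical stand-in chosen for illustration: the SimpleParser interface, the AutoDetectSketch class, the two magic-byte checks, the tiny extension table, and the detection order (magic bytes, then the content type hint, then the file name).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of "detect the type, then dispatch"; this is NOT the Tika API.
class AutoDetectSketch {

    // Hypothetical minimal parser contract standing in for a real Parser interface.
    interface SimpleParser {
        String parse(InputStream stream) throws IOException;
    }

    // Number of leading bytes inspected for magic numbers (assumed value).
    private static final int MAGIC_LENGTH = 8;

    private final Map<String, SimpleParser> parsers = new HashMap<>();
    private final SimpleParser fallback;

    AutoDetectSketch(SimpleParser fallback) {
        this.fallback = fallback;
    }

    // Plays the role of the configured "MIME type -> parser" mapping.
    void register(String mimeType, SimpleParser parser) {
        parsers.put(mimeType, parser);
    }

    // Assumed detection order: magic bytes, then content type hint, then file name.
    static String detect(byte[] head, String contentTypeHint, String fileName) {
        if (startsWith(head, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        if (startsWith(head, new byte[] {'P', 'K', 3, 4})) {
            return "application/zip";
        }
        if (contentTypeHint != null && !contentTypeHint.isEmpty()) {
            return contentTypeHint;
        }
        if (fileName != null) {
            String name = fileName.toLowerCase(Locale.ROOT);
            if (name.endsWith(".txt")) {
                return "text/plain";
            }
            if (name.endsWith(".html") || name.endsWith(".htm")) {
                return "text/html";
            }
        }
        return "application/octet-stream";
    }

    // Peeks at the first bytes without consuming them, then hands the whole
    // stream to the parser registered for the detected type (or the fallback).
    String parse(InputStream stream, String contentTypeHint, String fileName) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(stream);
        buffered.mark(MAGIC_LENGTH);
        byte[] buf = new byte[MAGIC_LENGTH];
        int n = buffered.readNBytes(buf, 0, MAGIC_LENGTH);
        buffered.reset(); // rewind so the chosen parser sees the stream from the start
        String type = detect(Arrays.copyOf(buf, n), contentTypeHint, fileName);
        return parsers.getOrDefault(type, fallback).parse(buffered);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a stream that starts with "%PDF-" goes to whatever parser is registered for application/pdf regardless of the hints, and a type with no registered parser falls back to the default one. Peeking with mark/reset keeps detection from consuming bytes the real parser needs. The proposal itself would rely on the MIME framework and the TikaConfig parser configuration rather than hard-coded tables like these.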

-- 
This message is automatically generated by JIRA.