Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/06/12 18:02:20 UTC

[jira] [Commented] (TIKA-1120) Enable direct use of org.apache.tika.mime.MediaType.detect(...)

    [ https://issues.apache.org/jira/browse/TIKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681353#comment-13681353 ] 

Nick Burch commented on TIKA-1120:
----------------------------------

The latest detection documentation is at <https://tika.apache.org/1.3/detection.html>. The URL you referenced is for an older version of Tika (0.10).

I don't think people should really be doing what your code does, i.e. instantiating MimeTypes directly. You should really be going to a TikaConfig object <http://tika.apache.org/1.3/api/org/apache/tika/config/TikaConfig.html>, and getting either a Detector or the mime types registry from that.
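
For illustration, a minimal sketch of that approach, assuming Tika 1.x with tika-core on the classpath. The class name DetectWithTikaConfig and the helper detectType are hypothetical names used only for this example; the Tika calls are TikaConfig.getDefaultConfig(), TikaConfig.getDetector() and Detector.detect(InputStream, Metadata). It is a sketch under those assumptions, not an official recommendation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;

// Hypothetical example class; not part of Tika.
class DetectWithTikaConfig {

    // Detects the media type of a stream, using the file name as an extra hint.
    // The stream must support mark/reset (wrap it in a BufferedInputStream if it does not).
    static String detectType(InputStream in, String fileName) throws IOException {
        // The default config loads the bundled mime type definitions,
        // which a bare "new MimeTypes()" does not.
        TikaConfig config = TikaConfig.getDefaultConfig();
        Detector detector = config.getDetector();

        Metadata md = new Metadata();
        if (fileName != null) {
            // Assumption: Tika 1.x, where this constant is available on Metadata.
            md.set(Metadata.RESOURCE_NAME_KEY, fileName);
        }

        MediaType type = detector.detect(in, md);
        return type.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "hello".getBytes(StandardCharsets.UTF_8);
        // ByteArrayInputStream supports mark/reset, so no extra buffering is needed here.
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(detectType(in, "notes.txt"));
        }
    }
}
```

If the mime types registry is wanted instead of a Detector, TikaConfig also exposes getMimeRepository() and getMediaTypeRegistry(), both backed by the same loaded definitions.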

Are you able to suggest some tweaks to the most recent documentation that would make this clearer for someone in your situation?
                
> Enable direct use of org.apache.tika.mime.MediaType.detect(...)
> ---------------------------------------------------------------
>
>                 Key: TIKA-1120
>                 URL: https://issues.apache.org/jira/browse/TIKA-1120
>             Project: Tika
>          Issue Type: Wish
>          Components: mime
>    Affects Versions: 1.3
>            Reporter: Oliver Kopp
>            Priority: Minor
>
> When using mime type detection, the classes allow the following use:
>     try (InputStream is = theInputStream;
>          BufferedInputStream bis = new BufferedInputStream(is);) {
>         MimeTypes mt = new MimeTypes();
>         Metadata md = new Metadata();
>         md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
>         MediaType mediaType = mt.detect(bis, null);
>         return mediaType.toString();
>     }
> When debugging this, the MimeTypes class instantiates its internal patterns with an empty MediaTypeRegistry. Therefore, getDefaultMimeTypes() is never called and thus tika-mimetypes.xml is never read.
> Is it possible to enable direct usage of MediaType.detect()? For example, by adding a new constructor where the MediaTypeRegistry can be set?
> If not, the code comments (or the documentation at https://tika.apache.org/0.10/detection.html) should point out that MimeTypes() should not be instantiated directly for mime type detection, but that the detectors should be used instead. Possibly, a minimal example should be added to make the usage clear.
> The following example works here:
>     try (InputStream is = theInputStream;
>             BufferedInputStream bis = new BufferedInputStream(is);) {
>         AutoDetectParser parser = new AutoDetectParser();
>         Detector detector = parser.getDetector();
>         Metadata md = new Metadata();
>         md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
>         MediaType mediaType = detector.detect(bis, md);
>         return mediaType.toString();
>     }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira