Posted to dev@tika.apache.org by "Nick Burch (Jira)" <ji...@apache.org> on 2020/09/11 08:00:00 UTC

[jira] [Commented] (TIKA-3195) Inconsistent result of tika.detect(InputStream) and tika.detect(TikaInputStream)

    [ https://issues.apache.org/jira/browse/TIKA-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194067#comment-17194067 ] 

Nick Burch commented on TIKA-3195:
----------------------------------

This is expected behaviour. Ogg is a container format, so the flavour/subtype of an Ogg file cannot be determined from the file header alone. To do that, you need to open the file and check which substreams it contains. That requires the whole file (or at least most of it) to be read and then made available again for parsing. TikaInputStream supports that, so container-based detection is enabled. A regular InputStream doesn't, so it only supports mime-based file header detection.
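
To illustrate why the header alone only gets you as far as application/ogg, here is a minimal sketch of header-only sniffing on a plain InputStream. It is not Tika's implementation: the class HeaderSniffSketch and the method detectFromHeader are hypothetical names, and the only format fact it relies on is that an Ogg stream begins with the capture pattern "OggS". The stream is marked, a few bytes are peeked, and the stream is reset, so nothing beyond the header is ever seen.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of Tika.
class HeaderSniffSketch {
    // An Ogg stream starts with the 4-byte capture pattern "OggS".
    private static final byte[] OGG_MAGIC = "OggS".getBytes(StandardCharsets.US_ASCII);

    /**
     * Peeks at the first bytes of the stream and rewinds it afterwards.
     * Only the header is inspected, so the result cannot distinguish
     * audio, video, or other Ogg subtypes.
     */
    static String detectFromHeader(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(OGG_MAGIC.length);              // remember the current position
        byte[] head = new byte[OGG_MAGIC.length];
        int n = in.readNBytes(head, 0, head.length);
        in.reset();                             // hand the stream back untouched
        if (n == OGG_MAGIC.length && Arrays.equals(head, OGG_MAGIC)) {
            return "application/ogg";           // generic container type only
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) throws IOException {
        byte[] fake = "OggS-followed-by-page-data".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(fake))) {
            System.out.println(detectFromHeader(in));
        }
    }
}
```

Telling video/ogg apart from other Ogg flavours would require reading well past the mark limit and then rewinding, which is the capability a plain InputStream lacks and TikaInputStream provides.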

> Inconsistent result of tika.detect(InputStream) and tika.detect(TikaInputStream)
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-3195
>                 URL: https://issues.apache.org/jira/browse/TIKA-3195
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.24.1
>            Reporter: xiaojie
>            Priority: Major
>
> We tried to detect the type of Ogg video files; samples can be found at [https://filesamples.com/formats/ogv].
> We noticed that Tika returns a different result depending on the kind of stream passed to detect:
> {code:java}
> tikaDetectedType = tika.detect(inputStream);
> // output: application/ogg
> try (TikaInputStream tikaInputStream = TikaInputStream.get(inputStream)) {
>   tikaDetectedType = tika.detect(tikaInputStream);
> }
> // output: video/ogg
> {code}
> The expected result is video/ogg.
> Is this the expected behavior?
>  


