Posted to user@tika.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/11/18 17:34:58 UTC

MimeType detection and fall back

Hi,

I'm using the MimeTypes method public MimeType getMimeType(String name, byte[] data)

And have noticed a peculiarity in the code:
        // First, try to get the mime-type from the content
        MimeType mimeType = getMimeType(data); ///////// HERE

        // If no mime-type found, then try to get the mime-type from
        // the document name
        if (mimeType == null) {
            mimeType = getMimeType(name);
        }

        return mimeType; 


The issue is that the fallback is never reached: the first call to getMimeType() (marked with HERE) always returns a non-null value, so the name-based lookup never runs.

My test case: I feed it a .mbox file, and it is reported as text/plain instead of application/mbox.
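
To make the expected behaviour concrete, here is a minimal, self-contained sketch of one possible fallback. It assumes that a generic content result (text/plain or application/octet-stream) should count as "no specific match", so that the name is still consulted. Everything in it is hypothetical and for illustration only: the class MimeFallbackSketch, the helpers detectByContent() and detectByName(), and the tiny extension list are not part of Tika, and this is not how Tika itself resolves types.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrative sketch only: none of these names come from Tika.
final class MimeFallbackSketch {
    static final String OCTET_STREAM = "application/octet-stream";
    static final String TEXT_PLAIN = "text/plain";

    // Stand-in for content sniffing. Like the behaviour described above,
    // it never returns null: unrecognised input gets a generic type.
    static String detectByContent(byte[] data) {
        if (data.length == 0) {
            return OCTET_STREAM;
        }
        if (startsWith(data, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        for (byte b : data) {
            if (b == 0) {
                return OCTET_STREAM; // a NUL byte suggests binary data
            }
        }
        return TEXT_PLAIN;
    }

    // Stand-in for a name-based lookup; returns null for unknown extensions.
    static String detectByName(String name) {
        if (name == null) {
            return null;
        }
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".mbox")) {
            return "application/mbox";
        }
        if (lower.endsWith(".pdf")) {
            return "application/pdf";
        }
        return null;
    }

    // Trust a specific content match; otherwise let the name refine the
    // generic result, and keep the generic result if the name is unknown.
    static String detect(String name, byte[] data) {
        String byContent = detectByContent(data);
        boolean generic = TEXT_PLAIN.equals(byContent) || OCTET_STREAM.equals(byContent);
        if (!generic) {
            return byContent;
        }
        String byName = detectByName(name);
        return byName != null ? byName : byContent;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] mbox = "From sender@example.org Thu Nov 18 17:34:58 2010\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect("archive.mbox", mbox)); // prints application/mbox
    }
}
```

With this ordering, a specific content match still wins over a misleading file name, while text-like content in a .mbox file resolves to application/mbox rather than text/plain.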

I'm on Tika 0.8.

Thanks,
Grant

RE: MimeType detection and fall back

Posted by Jukka Zitting <jz...@adobe.com>.
Hi,

From: Grant Ingersoll [mailto:gsingers@apache.org]
> Shall I open a bug for this?

Sorry for the late reply! I filed TIKA-566 as a more general improvement issue that also covers this problem, and I will look at fixing it in a moment.

The getMimeType() methods in the MimeTypes class have been somewhat neglected lately after the introduction of the Detector interface and the Tika.detect() convenience methods. I'd like to deprecate the getMimeType() methods once we have equivalent or better alternatives in the Tika façade class.

BR,

Jukka Zitting

Re: MimeType detection and fall back

Posted by Grant Ingersoll <gs...@apache.org>.
Shall I open a bug for this?
