Posted to user@tika.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/11/18 17:34:58 UTC
MimeType detection and fall back
Hi,
I'm using the following method in MimeTypes:

    public MimeType getMimeType(String name, byte[] data)

and have noticed a peculiarity in its code:

    // First, try to get the mime-type from the content
    MimeType mimeType = getMimeType(data); // <-- HERE

    // If no mime-type found, then try to get the mime-type from
    // the document name
    if (mimeType == null) {
        mimeType = getMimeType(name);
    }

    return mimeType;

The issue is that the fallback condition is never met, because the first call to getMimeType() (marked with HERE) always returns a non-null value.
My test case: I am feeding it a .mbox file, and it is detected as text/plain instead of application/mbox.
I'm on Tika 0.8.
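
To make the problem concrete, here is a minimal, self-contained sketch of the fallback I would expect. It does not use Tika at all: the class FallbackSketch, its methods, and the toy detection rules are hypothetical stand-ins, and the sample mbox line is made up. The sketch assumes that the content-based lookup returns a generic default type (text/plain or application/octet-stream) rather than null, which is what the behaviour described above suggests.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical stand-in for the lookup logic; none of these names are Tika's API.
class FallbackSketch {
    static final String OCTET_STREAM = "application/octet-stream";
    static final String TEXT_PLAIN = "text/plain";

    // Toy content sniffer. Like the behaviour described above, it never returns null.
    static String detectByContent(byte[] data) {
        if (data.length >= 4 && data[0] == '%' && data[1] == 'P'
                && data[2] == 'D' && data[3] == 'F') {
            return "application/pdf";
        }
        for (byte b : data) {
            // Control characters other than tab, LF and CR suggest binary data.
            if (b >= 0 && b < 0x20 && b != '\t' && b != '\n' && b != '\r') {
                return OCTET_STREAM;
            }
        }
        return TEXT_PLAIN;
    }

    // Toy name-based lookup; returns null when the extension is unknown.
    static String detectByName(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".mbox")) return "application/mbox";
        if (lower.endsWith(".pdf")) return "application/pdf";
        return null;
    }

    // A generic default carries no real information, so treat it like "not found".
    static boolean isGenericDefault(String type) {
        return type == null || type.equals(OCTET_STREAM) || type.equals(TEXT_PLAIN);
    }

    // Content first; fall back to the name when the content result is only a default.
    static String detect(String name, byte[] data) {
        String byContent = detectByContent(data);
        if (!isGenericDefault(byContent)) {
            return byContent;
        }
        String byName = detectByName(name);
        return byName != null ? byName : byContent;
    }

    public static void main(String[] args) {
        byte[] mbox = "From grant@example.org Thu Nov 18 17:34:58 2010\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectByContent(mbox));      // content alone
        System.out.println(detect("inbox.mbox", mbox)); // content, then name
    }
}
```

With a null check alone, the name is never consulted; checking for a generic default instead lets the .mbox extension win over text/plain while a specific content match (the PDF magic bytes here) still takes priority.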
Thanks,
Grant
RE: MimeType detection and fall back
Posted by Jukka Zitting <jz...@adobe.com>.
Hi,
From: Grant Ingersoll [mailto:gsingers@apache.org]
> Shall I open a bug for this?
Sorry for the late reply! I filed TIKA-566 as a more general improvement issue that also covers this problem, and I will look at fixing it in a moment.
The getMimeType() methods in the MimeTypes class have been somewhat neglected lately after the introduction of the Detector interface and the Tika.detect() convenience methods. I'd like to deprecate the getMimeType() methods once we have equivalent or better alternatives in the Tika façade class.
BR,
Jukka Zitting
Re: MimeType detection and fall back
Posted by Grant Ingersoll <gs...@apache.org>.
Shall I open a bug for this?