Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/02/27 01:22:06 UTC
[jira] Resolved: (TIKA-384) incorrect mime type detection when
Metadata.RESOURCE_NAME_KEY set
[ https://issues.apache.org/jira/browse/TIKA-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-384.
--------------------------------
Resolution: Invalid
Assignee: Jukka Zitting
This is how the type detection is supposed to work. The text/css type is essentially a more accurate subtype of text/plain, and the added filename information allows the detection code to return the more accurate type as a result to the caller.
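
The rule described above can be illustrated with a minimal, standalone sketch. It does not use Tika's classes: the class name, the two lookup tables, and the refine() helper are all hypothetical and exist only to show the idea that a filename-based hint replaces the content-based type only when the hint is a descendant (a more specific subtype) of that type.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Standalone illustration only; none of these names come from Tika.
class TypeRefinementSketch {

    // Hypothetical hierarchy: child type -> more general parent type.
    static final Map<String, String> PARENT = new HashMap<>();
    // Hypothetical filename-extension hints.
    static final Map<String, String> BY_EXTENSION = new HashMap<>();

    static {
        PARENT.put("text/css", "text/plain");
        PARENT.put("text/plain", "application/octet-stream");
        PARENT.put("image/png", "application/octet-stream");
        BY_EXTENSION.put("css", "text/css");
        BY_EXTENSION.put("png", "image/png");
    }

    // True if 'ancestor' appears anywhere above 'type' in the hierarchy.
    static boolean isDescendantOf(String type, String ancestor) {
        for (String p = PARENT.get(type); p != null; p = PARENT.get(p)) {
            if (p.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Start from the content-based type and accept the filename hint
    // only if it narrows that type; otherwise keep the original type.
    static String refine(String contentType, String resourceName) {
        if (resourceName == null) {
            return contentType;
        }
        int dot = resourceName.lastIndexOf('.');
        if (dot < 0) {
            return contentType;
        }
        String ext = resourceName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String hint = BY_EXTENSION.get(ext);
        if (hint != null && isDescendantOf(hint, contentType)) {
            return hint;
        }
        return contentType;
    }

    public static void main(String[] args) {
        System.out.println(refine("text/plain", "style.css"));   // text/css
        System.out.println(refine("text/plain", "picture.png")); // text/plain
        System.out.println(refine("text/plain", null));          // text/plain
    }
}
```

In this sketch a conflicting hint (a .png name on text content) is ignored, while a compatible one (a .css name on text content) narrows the result, which matches the behaviour the reporter observed.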
> incorrect mime type detection when Metadata.RESOURCE_NAME_KEY set
> -----------------------------------------------------------------
>
> Key: TIKA-384
> URL: https://issues.apache.org/jira/browse/TIKA-384
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 0.6
> Environment: Java: 1.6.0_17; Java HotSpot(TM) Client VM 14.3-b01
> System: Windows XP version 5.1 running on x86; Cp1252; en_GB (nb)
> Reporter: Jim Kay
> Assignee: Jukka Zitting
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> When Metadata.RESOURCE_NAME_KEY is set, as in:
> metadata.set(Metadata.RESOURCE_NAME_KEY, f.getCanonicalPath())
> the incorrect mime type is returned.
> I was trying to add .csv files as a type by editing the XML mime type definitions. When I ran a .csv file (and, for comparison, a .css file) through TikaGUI, both were parsed successfully as text.
> In my AutoDetectParser example I had set the RESOURCE_NAME_KEY to f.getCanonicalPath() (this code was copied - I don't know what it does). In this example .css and .csv were NOT identified as text/plain.
> The issue is in MimeTypes with the following code:
> String resourceName = metadata.get(Metadata.RESOURCE_NAME_KEY);
> if (resourceName != null) {
>     String name = null;
>     ...
>     ...
>     if (name != null) {
>         MimeType hint = getMimeType(name);
>         if (hint.isDescendantOf(type)) {
>             type = hint;
>         }
>     }
> }
> If the RESOURCE_NAME_KEY is not null, the code ultimately resets type to hint; however, hint is text/css. So the correct identification of type as text/plain is overwritten.