Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/02/27 01:22:06 UTC

[jira] Resolved: (TIKA-384) incorrect mime type detection when Metadata.RESOURCE_NAME_KEY set

     [ https://issues.apache.org/jira/browse/TIKA-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-384.
--------------------------------

    Resolution: Invalid
      Assignee: Jukka Zitting

This is how the type detection is supposed to work. The text/css type is essentially a more accurate subtype of text/plain, and the added filename information allows the detection code to return the more accurate type as a result to the caller.
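
As a rough illustration of that rule, here is a minimal, self-contained sketch. It does not use Tika's classes: the class name TypeRefinementSketch, the refine method, and the small type hierarchy table are all hypothetical and exist only for this example. The idea it shows is that a filename-based hint replaces the content-based type only when the hint is a descendant (a more specific subtype) of what was already detected.

```java
import java.util.HashMap;
import java.util.Map;

public class TypeRefinementSketch {
    // Toy hierarchy: child type -> parent type.
    // This table is an assumption for illustration, not Tika's registry.
    private static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("text/css", "text/plain");
        PARENT.put("text/plain", "application/octet-stream");
        PARENT.put("application/pdf", "application/octet-stream");
    }

    // True if 'ancestor' appears somewhere above 'type' in the toy hierarchy.
    static boolean isDescendantOf(String type, String ancestor) {
        String current = PARENT.get(type);
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = PARENT.get(current);
        }
        return false;
    }

    // Keep the content-based type unless the name hint is a more specific subtype of it.
    static String refine(String contentType, String nameHint) {
        if (nameHint != null && isDescendantOf(nameHint, contentType)) {
            return nameHint;
        }
        return contentType;
    }

    public static void main(String[] args) {
        System.out.println(refine("text/plain", "text/css"));      // hint is a subtype: text/css
        System.out.println(refine("application/pdf", "text/css")); // hint unrelated: application/pdf
        System.out.println(refine("text/plain", null));            // no resource name: text/plain
    }
}
```

In this sketch a .css name can only narrow text/plain to text/css; it cannot override an unrelated content-based result such as application/pdf.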

> incorrect mime type detection when Metadata.RESOURCE_NAME_KEY set
> -----------------------------------------------------------------
>
>                 Key: TIKA-384
>                 URL: https://issues.apache.org/jira/browse/TIKA-384
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 0.6
>         Environment: Java: 1.6.0_17; Java HotSpot(TM) Client VM 14.3-b01
> System: Windows XP version 5.1 running on x86; Cp1252; en_GB (nb)
>            Reporter: Jim Kay
>            Assignee: Jukka Zitting
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When Metadata.RESOURCE_NAME_KEY is set, as in:
> metadata.set(Metadata.RESOURCE_NAME_KEY, f.getCanonicalPath())
> the incorrect mime type is detected.
> I was trying to add .csv files as a type by editing the XML mime types. When I ran a .csv file (and, for comparison, a .css file) through TikaGUI, both were parsed successfully as text.
> In my AutoDetectParser example I had set the RESOURCE_NAME_KEY to f.getCanonicalPath() (this code was copied - I don't know what it does). In this example .css and .csv were NOT identified as text/plain.
> The issue is in MimeTypes with the following code:
>         String resourceName = metadata.get(Metadata.RESOURCE_NAME_KEY);
>         if (resourceName != null) {
>             String name = null;
> ...
> ...
>             if (name != null) {
>                 MimeType hint = getMimeType(name);
>                 if (hint.isDescendantOf(type)) {
>                     type = hint;
>                 }
>             }
>         }
> If the RESOURCE_NAME_KEY is not null, the code ultimately resets type to hint; however, hint is text/css, so the correct identification of type as text/plain is overwritten.
