Posted to dev@tika.apache.org by "Simon Tyler (JIRA)" <ji...@apache.org> on 2010/03/26 21:22:27 UTC

[jira] Commented: (TIKA-391) Intermittent errors detecting xls files

    [ https://issues.apache.org/jira/browse/TIKA-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850334#action_12850334 ] 

Simon Tyler commented on TIKA-391:
----------------------------------


Having just done some Tika performance testing, I should say that this fix slows Tika down noticeably (50% slower).

This is because the fix causes Tika to look through all magic mime matches to get the set of matches rather than stopping at the first.

I did try using the priority as a stopping point, i.e. returning all matches of equal highest priority, but this does not solve the problem of the hints not being used correctly.

An optimal fix would be to move the use of the hints into the getMimeType method and return the first magic mime match that also matches the hints. If no match agrees with the hints, you would need to remember the first match and return that instead.
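
To make the idea concrete, here is a rough sketch of that loop. It is not the actual MimeTypes code: the HintAwareDetector class, the MagicRule type, the detect signature and the hintTypes set (the types suggested by the resource name) are all made up for illustration, and the rules are assumed to be already sorted by priority. It also assumes application/octet-stream is the fallback when nothing matches.

```java
import java.util.List;
import java.util.Set;

public class HintAwareDetector {

    /** Hypothetical stand-in for a magic rule: a byte prefix mapped to a type name. */
    public static final class MagicRule {
        final byte[] prefix;
        final String type;

        public MagicRule(byte[] prefix, String type) {
            this.prefix = prefix;
            this.type = type;
        }

        boolean matches(byte[] data) {
            if (data.length < prefix.length) {
                return false;
            }
            for (int i = 0; i < prefix.length; i++) {
                if (data[i] != prefix[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Walks the rules in order (assumed sorted by priority) and stops at the
     * first magic match that the hints also agree with. The first magic match
     * of any kind is remembered as the fallback.
     */
    public static String detect(byte[] data, List<MagicRule> rules, Set<String> hintTypes) {
        String firstMatch = null;
        for (MagicRule rule : rules) {
            if (!rule.matches(data)) {
                continue;
            }
            if (hintTypes.contains(rule.type)) {
                return rule.type; // magic and hint agree: stop early
            }
            if (firstMatch == null) {
                firstMatch = rule.type; // remember in case no hint matches
            }
        }
        // Assumed default when no magic rule matches at all.
        return firstMatch != null ? firstMatch : "application/octet-stream";
    }
}
```

With two rules that share the same leading bytes, one for application/msword and one for application/vnd.ms-excel, a hint of application/vnd.ms-excel makes the loop return the Excel type every time, while an empty hint set returns whichever rule comes first. In the common case the loop still exits early, so it should avoid most of the cost of collecting every match.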

Simon

> Intermittent errors detecting xls files
> ---------------------------------------
>
>                 Key: TIKA-391
>                 URL: https://issues.apache.org/jira/browse/TIKA-391
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 0.6
>            Reporter: Simon Tyler
>         Attachments: MimeTypes.java
>
>
> I am doing some testing of Tika 0.6 and noticed some odd results for the testEXCEL.xls file included in the test suite. 
> 100 calls to the following code:
>  
>             is = new BufferedInputStream(new FileInputStream(filename));
>  
>             Metadata metadata = new Metadata();
>             metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
>  
>             String type = tika.detect(is, metadata);
>  
> Results in different matches as application/msword or application/vnd.ms-excel seemingly at random.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.