Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/07/29 19:03:17 UTC

[jira] Commented: (TIKA-391) Intermittent errors detecting xls files

    [ https://issues.apache.org/jira/browse/TIKA-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893700#action_12893700 ] 

Nick Burch commented on TIKA-391:
---------------------------------

I've done some work on this, committed in r980508.

As part of this, I've made compareTo more intelligent, largely along the lines you suggested. I've also made sure that the Magic and MagicDetector objects both carry the correct mime type, which helps with the sorting and avoids confusion when debugging!
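To illustrate why a smarter compareTo matters: if two different magic rules can compare as equal, the "best" match depends on the order in which candidates happen to be collected, which can vary from run to run. The sketch below shows one way to make the ordering total. It is a minimal illustration, not Tika's actual code: MagicOrdering, MagicRule, best() and the priority/length values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MagicOrdering {

    // Hypothetical stand-in for a magic rule; NOT Tika's Magic class.
    static final class MagicRule implements Comparable<MagicRule> {
        final String mimeType;   // the type this rule identifies
        final int priority;      // higher priority ranks first
        final int patternLength; // longer (more specific) pattern ranks first on equal priority

        MagicRule(String mimeType, int priority, int patternLength) {
            this.mimeType = mimeType;
            this.priority = priority;
            this.patternLength = patternLength;
        }

        @Override
        public int compareTo(MagicRule other) {
            // Descending by priority...
            int byPriority = Integer.compare(other.priority, priority);
            if (byPriority != 0) {
                return byPriority;
            }
            // ...then descending by pattern length...
            int byLength = Integer.compare(other.patternLength, patternLength);
            if (byLength != 0) {
                return byLength;
            }
            // ...then by mime type name, so two rules for different types never
            // compare as equal and the winner no longer depends on input order.
            return mimeType.compareTo(other.mimeType);
        }
    }

    /** Returns the mime type of the highest-ranked matching rule, or null if none matched. */
    static String best(List<MagicRule> matches) {
        if (matches.isEmpty()) {
            return null;
        }
        List<MagicRule> sorted = new ArrayList<>(matches);
        Collections.sort(sorted);
        return sorted.get(0).mimeType;
    }

    public static void main(String[] args) {
        // Hypothetical rules that both match the same OLE2 header bytes.
        MagicRule word = new MagicRule("application/msword", 50, 8);
        MagicRule excel = new MagicRule("application/vnd.ms-excel", 50, 8);
        // Same winner whichever order the candidates arrive in.
        System.out.println(best(Arrays.asList(word, excel)));
        System.out.println(best(Arrays.asList(excel, word)));
    }
}
```

Both lines print application/msword. The name tie-break makes the answer repeatable, but when two rules match identical header bytes it cannot make the answer correct for an .xls file. Magic ordering alone cannot separate formats that share a header.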

I've added a test showing that a test Excel file can be detected repeatedly without getting the wrong answer. However, for reliable detection of OLE2 documents you should use the new ContainerAwareDetector: mime magic detection won't always work on OLE2 documents, because we can't be sure where in the file the distinguishing magic bytes will be.
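Background on why container awareness helps: .doc and .xls files are both OLE2 (Compound File Binary) containers that begin with the same 8-byte signature, and what tells them apart is the set of streams stored inside the container's directory. The sketch below contrasts the two steps. It is an illustration of the general idea, not Tika's ContainerAwareDetector: Ole2Sketch, isOle2() and classify() are hypothetical, the root entry names are assumed to have been read already (for example with Apache POI's POIFS), and the name table covers only a few well-known streams.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Ole2Sketch {

    // Every OLE2 file starts with these 8 bytes, whether it holds a
    // Word document, an Excel workbook, or something else entirely.
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** Magic-style check: tells us "this is OLE2" but nothing about what is inside. */
    static boolean isOle2(byte[] head) {
        return head.length >= OLE2_SIGNATURE.length
            && Arrays.equals(Arrays.copyOf(head, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    /**
     * Container-aware step: classify by the names of the top-level streams.
     * Returns null for an OLE2 container this sketch does not recognise.
     */
    static String classify(Set<String> rootEntryNames) {
        if (rootEntryNames.contains("Workbook") || rootEntryNames.contains("Book")) {
            return "application/vnd.ms-excel"; // Excel 97+ ("Workbook") or 5/95 ("Book")
        }
        if (rootEntryNames.contains("WordDocument")) {
            return "application/msword";
        }
        if (rootEntryNames.contains("PowerPoint Document")) {
            return "application/vnd.ms-powerpoint";
        }
        return null;
    }

    public static void main(String[] args) {
        // A header that starts with the shared signature, as both .doc and .xls do.
        byte[] head = Arrays.copyOf(OLE2_SIGNATURE, 512);
        System.out.println(isOle2(head));
        System.out.println(classify(new HashSet<>(Arrays.asList("Workbook"))));
        System.out.println(classify(new HashSet<>(Arrays.asList("WordDocument", "1Table"))));
    }
}
```

This prints true, then application/vnd.ms-excel, then application/msword: the header check alone says only "OLE2", while the directory contents settle the actual type.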

> Intermittent errors detecting xls files
> ---------------------------------------
>
>                 Key: TIKA-391
>                 URL: https://issues.apache.org/jira/browse/TIKA-391
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 0.6
>            Reporter: Simon Tyler
>            Assignee: Chris A. Mattmann
>             Fix For: 0.8
>
>         Attachments: MimeTypes.java
>
>
> I have been testing Tika 0.6 and noticed some odd results for the testEXCEL.xls file included in the test suite.
> 100 calls to the following code:
>  
>             is = new BufferedInputStream(new FileInputStream(filename));
>  
>             Metadata metadata = new Metadata();
>             metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
>  
>             String type = tika.detect(is, metadata);
>  
> result in different matches, either application/msword or application/vnd.ms-excel, seemingly at random.
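The repeated-detection check described in the report can be captured in a small harness that runs detection many times on fresh streams over the same bytes and tallies the answers; a stable detector yields exactly one key. This is a sketch under assumptions: StreamDetector is a hypothetical interface standing in for whatever detector is under test (a real run would wrap Tika's detector), and the demo uses a fake detector rather than Tika itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;

public class RepeatDetection {

    /** Hypothetical detector abstraction; wrap the real detector under test here. */
    interface StreamDetector {
        String detect(InputStream in, String resourceName) throws IOException;
    }

    /** Runs detection `runs` times and counts how often each mime type is returned. */
    static Map<String, Integer> tally(StreamDetector detector, byte[] data, String name, int runs)
            throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < runs; i++) {
            // A fresh stream per call, closed afterwards (the quoted snippet
            // does not show its stream being closed).
            try (InputStream in = new ByteArrayInputStream(data)) {
                counts.merge(detector.detect(in, name), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};
        // Fake detector that always answers the same way.
        StreamDetector fixed = (in, name) -> "application/vnd.ms-excel";
        System.out.println(tally(fixed, data, "testEXCEL.xls", 100));
    }
}
```

With the fake detector this prints {application/vnd.ms-excel=100}; a detector with the behaviour reported above would instead produce two keys, one for application/msword and one for application/vnd.ms-excel.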

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.