Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/09/16 12:21:33 UTC
[jira] Commented: (TIKA-516) Excel 5 files are inconsistently detected as either "application/msword" or "application/vnd.ms-excel"
[ https://issues.apache.org/jira/browse/TIKA-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910075#action_12910075 ]
Nick Burch commented on TIKA-516:
---------------------------------
Have you tried a recent svn checkout or nightly build? Something similar to this has been fixed since 0.7.
Also, if you're working with container formats and detection, you might want to switch to the ContainerAwareDetector. It can correctly detect the type of container-based formats like Word and Excel, even without the filename.
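Why container inspection helps can be shown with a small, self-contained sketch. It is not Tika's ContainerAwareDetector API: the class and method names (Ole2ContainerSketch, isOle2, detectFromEntries) are hypothetical, and the sketch assumes the top-level OLE2 entry names have already been listed by some OLE2 reader (for example Apache POI). Word and Excel binary files share the same OLE2 magic bytes, so the header alone cannot separate them. The names of the streams inside the container can: Word stores "WordDocument", Excel 97+ stores "Workbook", and Excel 5/95 stores "Book".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not Tika's API): telling Word from Excel by looking
 * inside the OLE2 container instead of relying on magic bytes alone.
 */
public class Ole2ContainerSketch {

    // Every OLE2 (Compound Document) file, whether .doc or .xls, starts with these 8 bytes.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** Magic-byte check: only says "this is an OLE2 container", not which application wrote it. */
    static boolean isOle2(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /**
     * Container check based on top-level stream names. Listing those names
     * (i.e. actually reading the OLE2 directory) is assumed and outside this sketch.
     */
    static String detectFromEntries(Set<String> entryNames) {
        if (entryNames.contains("WordDocument")) {
            return "application/msword";
        }
        // Excel 97+ keeps its data in a "Workbook" stream; Excel 5/95 uses "Book".
        if (entryNames.contains("Workbook") || entryNames.contains("Book")) {
            return "application/vnd.ms-excel";
        }
        // No known stream found: fall back to Tika's generic OLE2 type.
        return "application/x-tika-msoffice";
    }

    public static void main(String[] args) {
        // A 512-byte header that begins with the OLE2 magic (rest zero-padded).
        byte[] header = Arrays.copyOf(OLE2_MAGIC, 512);
        System.out.println(isOle2(header));

        // Entry names as they might appear in an Excel 5 file like the attached one.
        Set<String> excel5 = new HashSet<>(Arrays.asList("Book", "\u0005SummaryInformation"));
        System.out.println(detectFromEntries(excel5));
    }
}
```

Running main prints "true" followed by "application/vnd.ms-excel". The magic check alone would give the same answer for a .doc file; only the entry-name check distinguishes the two, and it does so without needing a filename.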
> Excel 5 files are inconsistently detected as either "application/msword" or "application/vnd.ms-excel"
> ------------------------------------------------------------------------------------------------------
>
> Key: TIKA-516
> URL: https://issues.apache.org/jira/browse/TIKA-516
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.7
> Reporter: Victor Kazakov
> Priority: Minor
> Attachments: excel5.xls
>
>
> When the AutoDetectParser is run on an Excel 5 file, it inconsistently detects it as either "application/msword" or "application/vnd.ms-excel".
> See the following code:
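> import java.io.File;
> import java.io.FileInputStream;
> import java.io.InputStream;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.xml.sax.helpers.DefaultHandler;
>
> public class Excel5DetectionTest {
>     public static void main(String[] args) throws Exception {
>         File file = new File("excel5.xls");
>         for (int i = 0; i < 10; i++) {
>             // Open a fresh stream on every run and close it before the next one
>             InputStream stream = new FileInputStream(file);
>             try {
>                 AutoDetectParser parser = new AutoDetectParser();
>                 Metadata metadata = new Metadata();
>                 // Pass the filename as a detection hint
>                 metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
>                 parser.parse(stream, new DefaultHandler(), metadata);
>                 System.out.println(metadata.get(Metadata.CONTENT_TYPE));
>             } finally {
>                 stream.close();
>             }
>         }
>     }
> }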
> An example output is:
> application/vnd.ms-excel
> application/msword
> application/msword
> application/vnd.ms-excel
> application/vnd.ms-excel
> application/vnd.ms-excel
> application/vnd.ms-excel
> application/msword
> application/vnd.ms-excel
> application/msword
> The Excel 5 file I used is attached to this bug.