Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/03/05 11:35:42 UTC

[jira] [Commented] (TIKA-1256) Windows 07 excel ".xlsx" file Tika 1.4 api is detecting wrong mimetype.

    [ https://issues.apache.org/jira/browse/TIKA-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920723#comment-13920723 ] 

Nick Burch commented on TIKA-1256:
----------------------------------

For container-based formats (like OOXML), you need both Tika Core and Tika Parsers (plus their dependencies) for accurate detection. Mime magic alone isn't enough to identify which file type (e.g. .xlsx) is inside the container. We need either a hint from the filename, or the parsers jars, which provide the container detectors.
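
To illustrate why magic alone falls short, here is a minimal sketch using only java.util.zip. It is not Tika's actual detector code: the class ContainerSniff, its method names, and the helper sampleZip are hypothetical, and the check for the workbook content type in [Content_Types].xml is a simplified assumption about how one might look inside an OOXML package. The point is that both a plain ZIP and an .xlsx start with the same "PK" signature, so only opening the container tells them apart.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not part of Tika.
public class ContainerSniff {

    static final String XLSX =
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    // The generic type reported in the issue when no container detector is available.
    static final String GENERIC_OOXML = "application/x-tika-ooxml";

    // Magic-only view: every ZIP-based file begins with the bytes 'P' 'K' 3 4.
    static boolean looksLikeZip(byte[] data) {
        return data.length >= 4
                && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
    }

    // Container view: open the ZIP and inspect [Content_Types].xml.
    static String detectInsideContainer(byte[] data) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"[Content_Types].xml".equals(entry.getName())) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = zip.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                String xml = new String(out.toByteArray(), StandardCharsets.UTF_8);
                // Simplified assumption: a workbook part marks the package as .xlsx.
                return xml.contains(XLSX + ".main+xml") ? XLSX : GENERIC_OOXML;
            }
        }
        return "application/zip"; // a ZIP, but not an OOXML package
    }

    // Builds a one-entry ZIP in memory so the sketch needs no files on disk.
    static byte[] sampleZip(String entryName, String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String types = "<Types><Override PartName=\"/xl/workbook.xml\" ContentType=\""
                + XLSX + ".main+xml\"/></Types>";
        byte[] xlsx = sampleZip("[Content_Types].xml", types);
        byte[] plain = sampleZip("readme.txt", "hello");

        // Both pass the magic check; only the container check distinguishes them.
        System.out.println(looksLikeZip(xlsx) + " " + detectInsideContainer(xlsx));
        System.out.println(looksLikeZip(plain) + " " + detectInsideContainer(plain));
    }
}
```

In a real project the parsers jars (and their dependencies) on the classpath do this work for you, which is why a packaged jar that is missing them can fall back to the generic type.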

> Windows 07  excel ".xlsx" file Tika 1.4 api is detecting wrong mimetype. 
> -------------------------------------------------------------------------
>
>                 Key: TIKA-1256
>                 URL: https://issues.apache.org/jira/browse/TIKA-1256
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.4
>            Reporter: Kavitha
>
> I am using the Tika 1.4 jars in a standalone project. 
> When I run it from Eclipse, Tika 1.4 detects the correct mimetype. 
> When I build a jar file from my project and run it standalone from the command prompt, it detects the wrong mimetype.
> My code is below:
> Parser parser = new AutoDetectParser();
> InputStream stream = new FileInputStream(file);
> int writeUnlimited = -1;
> ContentHandler contentHandler = new BodyContentHandler(writeUnlimited);
> Metadata metadata = new Metadata();
> parser.parse(stream, contentHandler, metadata, new ParseContext());
> mimeType = metadata.get(Metadata.CONTENT_TYPE);
> logger.info("Correct MimeType value for '" + file.getName() + "' file is: " + mimeType);
> Output from eclipse is
> Correct MimeType value for 'CIQ_83517.xlsx' file is: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
> Output from command prompt
> Correct MimeType value for 'CIQ_83517.xlsx' file is: application/x-tika-ooxml
> I have only the Tika 1.4 jar and its dependent jar files.
> Is this an issue with my code, or does the Tika 1.4 jar have an issue?
> I am using Java 1.6.
> Thanks for your help


