Posted to dev@tika.apache.org by "Victor Kazakov (JIRA)" <ji...@apache.org> on 2010/08/18 00:38:16 UTC
[jira] Created: (TIKA-484) xlsx files created with open office are
detected as application/zip
xlsx files created with open office are detected as application/zip
-------------------------------------------------------------------
Key: TIKA-484
URL: https://issues.apache.org/jira/browse/TIKA-484
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.7
Environment: Ubuntu
Reporter: Victor Kazakov
Priority: Minor
Create an xlsx file in OpenOffice.
Parse the file using an org.apache.tika.parser.AutoDetectParser.
It gets recognized as a zip file.
Note: I have only tried this with OpenOffice running on Ubuntu.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (TIKA-484) xlsx files created with open office are
detected as application/zip
Posted by "Victor Kazakov (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victor Kazakov updated TIKA-484:
--------------------------------
Attachment: openofficexlsxfile.xlsx
A simple xlsx file made in OpenOffice. This file gets detected as a zip file by the org.apache.tika.parser.AutoDetectParser.
[jira] Commented: (TIKA-484) xlsx files created with open office
are detected as application/zip
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899774#action_12899774 ]
Jukka Zitting commented on TIKA-484:
------------------------------------
Can you attach a simple example document that shows the problem?
[jira] Resolved: (TIKA-484) xlsx files created with open office are
detected as application/zip
Posted by "Victor Kazakov (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victor Kazakov resolved TIKA-484.
---------------------------------
Resolution: Not A Problem
[jira] Commented: (TIKA-484) xlsx files created with open office
are detected as application/zip
Posted by "Victor Kazakov (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909542#action_12909542 ]
Victor Kazakov commented on TIKA-484:
-------------------------------------
I passed the file name to the parser through the metadata, and it was then able to figure out the file type properly:
metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
Thank you.
[jira] Commented: (TIKA-484) xlsx files created with open office
are detected as application/zip
Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906555#action_12906555 ]
Nick Burch commented on TIKA-484:
---------------------------------
I've just tried this file with Tika-App (which passes the filename into the detector), and it gets the content type correct:
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
When working with container-based files such as .xlsx, you either need to pass in the file name or use the ContainerAwareDetector. If you ask the normal mime-magic detector without a filename hint, it won't be able to figure it out.
Could you please confirm what steps you're taking that cause it to not work for you, and ensure you are passing in the filename?
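To illustrate why the magic bytes alone are not enough, here is a minimal sketch that uses only the JDK and not Tika itself. The class and method names (ContainerSniffSketch, detectByMagic, detectWithName, detectByContainer, fakeXlsx) are hypothetical. Treating the presence of an "xl/workbook.xml" entry as the sign of a spreadsheet is a simplifying assumption, not a description of how Tika's ContainerAwareDetector actually decides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch, not Tika code: three ways of typing a zip-based file.
class ContainerSniffSketch {
    static final String ZIP = "application/zip";
    static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    // Magic-only check: a zip-based file with entries starts with "PK\003\004",
    // so an .xlsx is indistinguishable from a plain zip at this level.
    static String detectByMagic(byte[] data) {
        boolean zip = data.length >= 4
            && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
        return zip ? ZIP : "application/octet-stream";
    }

    // Magic plus a file name hint: the extension refines the generic zip type.
    static String detectWithName(byte[] data, String name) {
        String type = detectByMagic(data);
        boolean xlsxName = name != null
            && name.toLowerCase(Locale.ROOT).endsWith(".xlsx");
        return ZIP.equals(type) && xlsxName ? XLSX : type;
    }

    // Container-aware check: open the zip and look at its entries.
    // Assumption: an "xl/workbook.xml" entry marks a spreadsheet.
    static String detectByContainer(byte[] data) throws IOException {
        String type = detectByMagic(data);
        if (!ZIP.equals(type)) {
            return type;
        }
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(data))) {
            for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
                if (e.getName().equals("xl/workbook.xml")) {
                    return XLSX;
                }
            }
        }
        return ZIP;
    }

    // Builds a tiny in-memory zip that mimics the layout of an .xlsx package.
    static byte[] fakeXlsx() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : new String[] {"[Content_Types].xml", "xl/workbook.xml"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = fakeXlsx();
        System.out.println("magic only:      " + detectByMagic(data));
        System.out.println("with file name:  " + detectWithName(data, "sheet.xlsx"));
        System.out.println("container-aware: " + detectByContainer(data));
    }
}
```

In this sketch the magic-only check can only report application/zip, while either the file name hint or a look inside the container yields the spreadsheet type. This mirrors the two options described above.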
[jira] Issue Comment Edited: (TIKA-484) xlsx files created with
open office are detected as application/zip
Posted by "Victor Kazakov (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899910#action_12899910 ]
Victor Kazakov edited comment on TIKA-484 at 8/18/10 12:51 PM:
---------------------------------------------------------------
Attached a simple xlsx file made in OpenOffice. This file gets detected as a zip file by the org.apache.tika.parser.AutoDetectParser.
was (Author: kazvictor):
A simple xlsx file made in OpenOffice. This file gets detected as a zip file by the org.apache.tika.parser.AutoDetectParser.