Posted to dev@tika.apache.org by "Victor Kazakov (JIRA)" <ji...@apache.org> on 2010/08/18 00:38:16 UTC

[jira] Created: (TIKA-484) xlsx files created with open office are detected as application/zip

xlsx files created with open office are detected as application/zip
-------------------------------------------------------------------

                 Key: TIKA-484
                 URL: https://issues.apache.org/jira/browse/TIKA-484
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.7
         Environment: Ubuntu
            Reporter: Victor Kazakov
            Priority: Minor


Create an xlsx file in OpenOffice.
Parse the file using an org.apache.tika.parser.AutoDetectParser.
It gets recognized as a zip file.

Note: I have only tried this with OpenOffice running on Ubuntu.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TIKA-484) xlsx files created with open office are detected as application/zip

Posted by "Victor Kazakov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Kazakov updated TIKA-484:
--------------------------------

    Attachment: openofficexlsxfile.xlsx

A simple xlsx file made in OpenOffice. This file gets detected as a zip file by the org.apache.tika.parser.AutoDetectParser.



[jira] Commented: (TIKA-484) xlsx files created with open office are detected as application/zip

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899774#action_12899774 ] 

Jukka Zitting commented on TIKA-484:
------------------------------------

Can you attach a simple example document that shows the problem?



[jira] Resolved: (TIKA-484) xlsx files created with open office are detected as application/zip

Posted by "Victor Kazakov (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victor Kazakov resolved TIKA-484.
---------------------------------

    Resolution: Not A Problem



[jira] Commented: (TIKA-484) xlsx files created with open office are detected as application/zip

Posted by "Victor Kazakov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909542#action_12909542 ] 

Victor Kazakov commented on TIKA-484:
-------------------------------------

I passed the file name to the parser and it was able to properly figure out the file type.

metadata.set(Metadata.RESOURCE_NAME_KEY, filename);

Thank you.



[jira] Commented: (TIKA-484) xlsx files created with open office are detected as application/zip

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906555#action_12906555 ] 

Nick Burch commented on TIKA-484:
---------------------------------

I've just tried this file with Tika-App (which passes the filename into the detector), and it gets the content type correct:
  Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

When working with container-based files such as .xlsx, you need to either pass in the file name or use the ContainerAwareDetector. An .xlsx file is a zip archive, so it starts with the same magic bytes as any other zip. If you ask the normal mime-magic detector without a filename hint, it won't be able to figure it out.

Could you please confirm what steps you're taking that cause it to not work for you, and ensure you are passing in the filename?
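
To illustrate the point about container-based files, here is a minimal sketch using only the JDK. It is not Tika code and does not reflect how Tika's detectors are implemented. The class name ContainerSniffSketch, its methods, and the two-entry heuristic (looking for [Content_Types].xml and xl/workbook.xml) are assumptions made for illustration. It shows that a magic-byte check alone can only report application/zip, while a file name hint or a look inside the container can narrow the type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not part of Tika.
public class ContainerSniffSketch {
    static final String ZIP = "application/zip";
    static final String XLSX =
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

    /** Builds a small in-memory zip with the given entry names (sample data). */
    static byte[] buildZip(String... entryNames) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Magic-only check: a zip with entries starts with 'P', 'K', 3, 4. */
    static String detectByMagic(byte[] data) {
        boolean isZip = data.length >= 4
            && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
        return isZip ? ZIP : "application/octet-stream";
    }

    /** Magic check refined by a file name hint (extension only). */
    static String detectWithNameHint(byte[] data, String fileName) {
        String type = detectByMagic(data);
        if (ZIP.equals(type) && fileName != null
                && fileName.toLowerCase(Locale.ROOT).endsWith(".xlsx")) {
            return XLSX;
        }
        return type;
    }

    /** Magic check refined by listing the entries inside the zip container. */
    static String detectByContainer(byte[] data) throws IOException {
        String type = detectByMagic(data);
        if (!ZIP.equals(type)) {
            return type;
        }
        boolean hasContentTypes = false;
        boolean hasWorkbook = false;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals("[Content_Types].xml")) hasContentTypes = true;
                if (e.getName().equals("xl/workbook.xml")) hasWorkbook = true;
            }
        }
        // Simplified heuristic: both entries present means "looks like xlsx".
        return (hasContentTypes && hasWorkbook) ? XLSX : ZIP;
    }

    public static void main(String[] args) throws IOException {
        byte[] sheet = buildZip("[Content_Types].xml", "xl/workbook.xml");
        System.out.println(detectByMagic(sheet));
        System.out.println(detectWithNameHint(sheet, "openofficexlsxfile.xlsx"));
        System.out.println(detectByContainer(sheet));
    }
}
```

The name hint is cheap but trusts the extension. Looking inside the container costs a pass over the zip entries but works even when no file name is available.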



[jira] Issue Comment Edited: (TIKA-484) xlsx files created with open office are detected as application/zip

Posted by "Victor Kazakov (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899910#action_12899910 ] 

Victor Kazakov edited comment on TIKA-484 at 8/18/10 12:51 PM:
---------------------------------------------------------------

Attached a simple xlsx file made in OpenOffice. This file gets detected as a zip file by the org.apache.tika.parser.AutoDetectParser.

      was (Author: kazvictor):
    A simple xlsx file made in OpenOffice. This file gets detected as a zip file by the org.apache.tika.parser.AutoDetectParser.
  