Posted to dev@tika.apache.org by "John Mastarone (JIRA)" <ji...@apache.org> on 2012/05/28 04:38:22 UTC

[jira] [Created] (TIKA-935) TikaException thrown when trying to parse archive (*.ar) files

John Mastarone created TIKA-935:
-----------------------------------

             Summary: TikaException thrown when trying to parse archive (*.ar) files
                 Key: TIKA-935
                 URL: https://issues.apache.org/jira/browse/TIKA-935
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.2
         Environment: Windows 7
            Reporter: John Mastarone


A TikaException is thrown when either of the two .ar files from the parsers' test-documents folder is dropped into Tika-GUI.  Judging from the magic definitions at http://stuff.mit.edu/afs/athena/software/cygwin/cygwin_v1.3.2/usr/share/magic.mime , archive detection in the PackageExtractor class does not handle these files correctly, and a TarArchiveInputStream is incorrectly chosen by default.
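
For illustration only, here is a minimal sketch of the kind of check that tells an ar archive apart before a tar reader is tried. It relies on the standard ar global header, the 8 bytes "!<arch>\n" at the start of the file. The class name ArMagicSketch and the method looksLikeAr are hypothetical; this is not the code in TIKA-935.patch, and it does not reproduce how PackageExtractor is actually structured.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not the code from TIKA-935.patch.
final class ArMagicSketch {

    // Global header that opens a Unix ar archive (8 bytes).
    private static final byte[] AR_MAGIC =
            "!<arch>\n".getBytes(StandardCharsets.US_ASCII);

    private ArMagicSketch() {}

    /**
     * Peeks at the first 8 bytes and reports whether they match the ar
     * global header. The stream is reset afterwards, so it must support
     * mark/reset (for example a BufferedInputStream).
     */
    static boolean looksLikeAr(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(AR_MAGIC.length);
        try {
            byte[] head = new byte[AR_MAGIC.length];
            int total = 0;
            // read() may return fewer bytes than requested, so loop.
            while (total < head.length) {
                int read = in.read(head, total, head.length - total);
                if (read < 0) {
                    break; // stream ended before 8 bytes were available
                }
                total += read;
            }
            return total == head.length && Arrays.equals(head, AR_MAGIC);
        } finally {
            in.reset(); // leave the stream positioned where the caller had it
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream ar = new ByteArrayInputStream(
                "!<arch>\nhello.txt/".getBytes(StandardCharsets.US_ASCII));
        InputStream other = new ByteArrayInputStream(
                "not an archive".getBytes(StandardCharsets.US_ASCII));
        System.out.println(looksLikeAr(ar));    // true
        System.out.println(looksLikeAr(other)); // false
    }
}
```

With a check along these lines, a caller could select an ar reader when the header matches and fall back to other formats only when it does not, rather than defaulting to tar.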

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (TIKA-935) TikaException thrown when trying to parse archive (*.ar) files

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann reassigned TIKA-935:
--------------------------------------

    Assignee: Chris A. Mattmann
    


        

[jira] [Resolved] (TIKA-935) TikaException thrown when trying to parse archive (*.ar) files

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved TIKA-935.
------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.2

- patch applied in r1343137. Thanks John!
                


        

[jira] [Commented] (TIKA-935) TikaException thrown when trying to parse archive (*.ar) files

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284284#comment-13284284 ] 

Chris A. Mattmann commented on TIKA-935:
----------------------------------------

Oops, never mind, I see it's already there. OK, proceeding.
                


        

[jira] [Commented] (TIKA-935) TikaException thrown when trying to parse archive (*.ar) files

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284283#comment-13284283 ] 

Chris A. Mattmann commented on TIKA-935:
----------------------------------------

Hi Josh, looks like you are trying to test for a file in your ArParserTest.java file. Can you upload the test file too?
                


        

[jira] [Comment Edited] (TIKA-935) TikaException thrown when trying to parse archive (*.ar) files

Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284283#comment-13284283 ] 

Chris A. Mattmann edited comment on TIKA-935 at 5/28/12 4:20 AM:
-----------------------------------------------------------------

Hi John, looks like you are trying to test for a file in your ArParserTest.java file. Can you upload the test file too?
                
      was (Author: chrismattmann):
    Hi Josh, looks like you are trying to test for a file in your ArParserTest.java file. Can you upload the test file too?
                  


        

[jira] [Updated] (TIKA-935) TikaException thrown when trying to parse archive (*.ar) files

Posted by "John Mastarone (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Mastarone updated TIKA-935:
--------------------------------

    Attachment: ArParserTest.java
                TIKA-935.patch

Uploaded a patch that corrects the error in *.ar file detection, along with a new unit test class that uses the existing .ar files in the test-documents folder.  With this patch, parsing succeeds in the latest build, and the unit tests pass.
                
