Posted to dev@tika.apache.org by "John Mastarone (JIRA)" <ji...@apache.org> on 2012/05/28 04:38:22 UTC
[jira] [Created] (TIKA-935) TikaException thrown when trying to
parse archive (*.ar) files
John Mastarone created TIKA-935:
-----------------------------------
Summary: TikaException thrown when trying to parse archive (*.ar) files
Key: TIKA-935
URL: https://issues.apache.org/jira/browse/TIKA-935
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.2
Environment: Windows 7
Reporter: John Mastarone
A TikaException is thrown when either of the two .ar files from the parsers' test-documents folder is dropped into Tika-GUI. Judging from the magic definitions at http://stuff.mit.edu/afs/athena/software/cygwin/cygwin_v1.3.2/usr/share/magic.mime, archive detection for these files is done incorrectly in the PackageExtractor class, which then wrongly falls back to a TarArchiveInputStream by default.
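For illustration only, here is a minimal sketch of the kind of signature check the description implies. It is not the attached TIKA-935.patch and not Tika's actual PackageExtractor code; the class name ArSignature and its methods are hypothetical. The only fact it relies on is that an ar archive begins with the 8-byte global header "!<arch>" followed by a line feed, so a stream can be tested for that prefix before any tar fallback is considered.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Tika): recognises the ar global header. */
final class ArSignature {
    // Every ar archive starts with these 8 bytes: "!<arch>" plus a line feed.
    private static final byte[] MAGIC =
            "!<arch>\n".getBytes(StandardCharsets.US_ASCII);

    private ArSignature() {}

    /** Returns true if the first {@code length} bytes begin with the ar magic. */
    static boolean matches(byte[] prefix, int length) {
        if (prefix == null || length < MAGIC.length || prefix.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (prefix[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Peeks at a mark-supporting stream and rewinds it afterwards. */
    static boolean looksLikeAr(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MAGIC.length);
        try {
            byte[] buf = new byte[MAGIC.length];
            int n = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
            return matches(buf, n);
        } finally {
            in.reset(); // leave the stream positioned at the start for the real parser
        }
    }
}
```

A caller would wrap the input in a BufferedInputStream, call ArSignature.looksLikeAr(stream) first, and only try other archive types if it returns false.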
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (TIKA-935) TikaException thrown when trying to
parse archive (*.ar) files
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann reassigned TIKA-935:
--------------------------------------
Assignee: Chris A. Mattmann
[jira] [Resolved] (TIKA-935) TikaException thrown when trying to
parse archive (*.ar) files
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann resolved TIKA-935.
------------------------------------
Resolution: Fixed
Fix Version/s: 1.2
- patch applied in r1343137. Thanks John!
[jira] [Commented] (TIKA-935) TikaException thrown when trying to
parse archive (*.ar) files
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284284#comment-13284284 ]
Chris A. Mattmann commented on TIKA-935:
----------------------------------------
Oops, never mind, I see it's already there. OK, proceeding.
[jira] [Commented] (TIKA-935) TikaException thrown when trying to
parse archive (*.ar) files
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284283#comment-13284283 ]
Chris A. Mattmann commented on TIKA-935:
----------------------------------------
Hi Josh, looks like you are trying to test for a file in your ArParserTest.java file. Can you upload the test file too?
[jira] [Comment Edited] (TIKA-935) TikaException thrown when trying
to parse archive (*.ar) files
Posted by "Chris A. Mattmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284283#comment-13284283 ]
Chris A. Mattmann edited comment on TIKA-935 at 5/28/12 4:20 AM:
-----------------------------------------------------------------
Hi John, looks like you are trying to test for a file in your ArParserTest.java file. Can you upload the test file too?
was (Author: chrismattmann):
Hi Josh, looks like you are trying to test for a file in your ArParserTest.java file. Can you upload the test file too?
[jira] [Updated] (TIKA-935) TikaException thrown when trying to
parse archive (*.ar) files
Posted by "John Mastarone (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Mastarone updated TIKA-935:
--------------------------------
Attachment: ArParserTest.java
TIKA-935.patch
Uploaded a patch that corrects the error in *.ar file detection, along with a new unit test class that uses the existing .ar files in the test-documents folder. With this patch applied, parsing succeeds on the latest build, and the unit tests pass.