Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/12/08 16:16:00 UTC

[jira] [Comment Edited] (TIKA-2483) Using PackageParser in ForkParser causes NPE

    [ https://issues.apache.org/jira/browse/TIKA-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283780#comment-16283780 ] 

Tim Allison edited comment on TIKA-2483 at 12/8/17 4:15 PM:
------------------------------------------------------------

Regression tests in prep for 1.17 show that we need to add quite a few more specializations of zip and tar to check for, so that we avoid overwriting their more specific MIME types with "zip". Lots of files that were identified as kmz, tika-ooxml, etc., in 1.16 are now being identified as "zip" during the parse in 1.17-SNAPSHOT.
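A minimal sketch of the check described above, in plain Java and under stated assumptions: the class `ContentTypeGuard`, its methods, and the parent links in `PARENT` are hypothetical, illustrative stand-ins, not Tika's `PackageParser` code or its real MIME registry. The idea is that a container parser keeps an already-detected type when that type is the container type or a specialization of it, and only otherwise falls back to the generic container type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Tika's PackageParser code: decide whether a
// container parser may replace an already-detected content type with its
// own generic type (e.g. "application/zip").
class ContentTypeGuard {

    // Illustrative child -> parent links (assumed, not Tika's actual list);
    // a real registry built from the MIME definitions would supply these.
    static final Map<String, String> PARENT = new HashMap<>();
    static {
        PARENT.put("application/vnd.google-earth.kmz", "application/zip");
        PARENT.put("application/x-tika-ooxml", "application/zip");
        PARENT.put("application/x-gtar", "application/x-tar");
    }

    // True if 'type' equals 'base' or inherits from it through PARENT links.
    static boolean isSameOrSpecializationOf(String type, String base) {
        for (String t = type; t != null; t = PARENT.get(t)) {
            if (t.equals(base)) {
                return true;
            }
        }
        return false;
    }

    // Keep the detected type when it is already at least as specific as the
    // container type; otherwise fall back to the container's generic type.
    static String resolveType(String detected, String containerType) {
        if (detected != null && isSameOrSpecializationOf(detected, containerType)) {
            return detected;
        }
        return containerType;
    }

    public static void main(String[] args) {
        System.out.println(resolveType("application/vnd.google-earth.kmz", "application/zip"));
        System.out.println(resolveType("application/octet-stream", "application/zip"));
        System.out.println(resolveType(null, "application/x-tar"));
    }
}
```

Running `main` prints the kmz type unchanged, then `application/zip`, then `application/x-tar`.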

The current patch maintains the list semi-manually, which I abhor, but I added a test to make sure that PackageParser's list of specializations stays current with TikaConfig's default config.
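The kind of test mentioned above might look roughly like the following sketch. It assumes the type hierarchy can be modeled as a child-to-parent map; `REGISTRY` stands in for the hierarchy from TikaConfig's default config and `HARD_CODED` for PackageParser's list, and all names and entries are hypothetical. The check derives every transitive specialization of zip and tar from the registry and fails if the hard-coded list has drifted from it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical "keep the hard-coded list in sync" check. REGISTRY stands in
// for the default config's type hierarchy; HARD_CODED stands in for the list
// a parser carries because it cannot obtain the registry at parse time.
// Both are illustrative, not Tika's real data.
class SpecializationSyncCheck {

    static final Map<String, String> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("application/vnd.google-earth.kmz", "application/zip");
        REGISTRY.put("application/x-tika-ooxml", "application/zip");
        REGISTRY.put("application/x-gtar", "application/x-tar");
        REGISTRY.put("application/zip", "application/octet-stream");
        REGISTRY.put("application/x-tar", "application/octet-stream");
    }

    static final Set<String> HARD_CODED = Set.of(
            "application/vnd.google-earth.kmz",
            "application/x-tika-ooxml",
            "application/x-gtar");

    // Collect every registered type that descends (transitively) from one of the bases.
    static Set<String> specializationsOf(Map<String, String> registry, Set<String> bases) {
        Set<String> result = new HashSet<>();
        for (String type : registry.keySet()) {
            for (String p = registry.get(type); p != null; p = registry.get(p)) {
                if (bases.contains(p)) {
                    result.add(type);
                    break;
                }
            }
        }
        return result;
    }

    // Types present in exactly one of the two sets; empty means "in sync".
    static Set<String> drift(Set<String> hardCoded, Set<String> fromRegistry) {
        Set<String> diff = new HashSet<>(hardCoded);
        diff.addAll(fromRegistry);
        Set<String> common = new HashSet<>(hardCoded);
        common.retainAll(fromRegistry);
        diff.removeAll(common);
        return diff;
    }

    public static void main(String[] args) {
        Set<String> expected = specializationsOf(REGISTRY, Set.of("application/zip", "application/x-tar"));
        Set<String> mismatched = drift(HARD_CODED, expected);
        if (!mismatched.isEmpty()) {
            throw new AssertionError("hard-coded list is out of date: " + mismatched);
        }
        System.out.println("in sync");
    }
}
```

Adding a new zip subtype to `REGISTRY` without updating `HARD_CODED` makes `main` throw, which is the point of such a test.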

After 1.17 is released, we can work towards getting rid of the serialization of parsers in ForkParser, making TikaConfig serializable, or both. Until we do that, I don't see an elegant solution.


was (Author: tallison@mitre.org):
Regression tests in prep for 1.17 show that we need to add quite a few more specializations of zip and tar to check for and avoid overwriting. Lots of files that were identified as kmz, tika-ooxml, etc., were now being identified as "zip" during the parse.

The current patch maintains the list semi-manually, which I abhor, but I added a test to make sure that PackageParser's list of specializations stays current with TikaConfig's default config.

After 1.17 is released, we can work towards getting rid of the serialization of parsers in ForkParser, making TikaConfig serializable, or both. Until we do that, I don't see an elegant solution.

> Using PackageParser in ForkParser causes NPE
> --------------------------------------------
>
>                 Key: TIKA-2483
>                 URL: https://issues.apache.org/jira/browse/TIKA-2483
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.16
>            Reporter: TzeKai Lee
>         Attachments: testForkedPackageParsing.patch
>
>
> {quote}
> Caused by: java.lang.NullPointerException
>         at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:158)
>         at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
>         at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:78)
>         at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:242)
>         at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:379)
>         at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:165)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {quote}
> The mediaTypeRegistry handling code in parse() of PackageParser seems to cause the problem, because ForkParser cannot properly construct the default TikaConfig. Also, since TikaConfig is not serializable, there is no way to assign mediaTypeRegistry/bufferedMediaTypeRegistry before calling parse().
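The serialization constraint the reporter describes can be illustrated with plain Java serialization. In this sketch `Config`, `EagerParser`, and `LazyParser` are hypothetical stand-ins (not Tika classes), and the in-memory round trip only approximates how a parser is handed to a forked JVM. A non-serializable field pre-assigned on a serializable parser makes serialization fail with `NotSerializableException`; marking it `transient` lets serialization succeed, but the field arrives as `null` and has to be rebuilt on the other side, which is where a default config must be constructed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-ins, not Tika classes: Config plays the role of a
// non-serializable configuration object; the parsers play the role of a
// parser that has to be serialized before it reaches a child process.
class ForkSerializationSketch {

    static class Config { }  // deliberately NOT Serializable

    static class EagerParser implements Serializable {
        private static final long serialVersionUID = 1L;
        Config config = new Config();  // pre-assigned: breaks serialization
    }

    static class LazyParser implements Serializable {
        private static final long serialVersionUID = 1L;
        transient Config config = new Config();  // skipped by serialization
    }

    // Serialize and deserialize in memory, approximating the hand-off to a fork.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    // True if serializing a parser that holds a Config fails as expected.
    static boolean eagerFailsToSerialize() {
        try {
            roundTrip(new EagerParser());
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    // True if the transient Config is gone after the round trip (field
    // initializers do not run during deserialization).
    static boolean lazyConfigIsNullAfterFork() {
        try {
            LazyParser copy = (LazyParser) roundTrip(new LazyParser());
            return copy.config == null;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eagerFailsToSerialize());
        System.out.println(lazyConfigIsNullAfterFork());
    }
}
```

Both calls in `main` print `true`.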


