Posted to dev@tika.apache.org by "Kaka Lee (Jira)" <ji...@apache.org> on 2022/01/05 07:04:00 UTC
[jira] [Created] (TIKA-3639) NullPointerException throws when parsing zip file
Kaka Lee created TIKA-3639:
------------------------------
Summary: NullPointerException throws when parsing zip file
Key: TIKA-3639
URL: https://issues.apache.org/jira/browse/TIKA-3639
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 2.2.0
Reporter: Kaka Lee
Attachments: 123.zip, IWORKDocumentType.png, detectype.png, exception.png
Detecting the type of a zip file like the attached 123.zip always throws a NullPointerException. It can be reproduced with the following steps:
# Create a zip file containing an index.xml with minimal content:
{code:xml}
<?xml version='1.0' encoding='UTF-8' ?>
<index>
</index>
{code}
# Add the dependencies to pom.xml; the *key* dependency is *tika-parser-apple-module*:
{code:xml}
<dependencies>
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
<version>2.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parsers</artifactId>
<type>pom</type>
<version>2.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parser-apple-module</artifactId>
<version>2.2.1</version>
</dependency>
</dependencies>
{code}
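# Call tika.detect on an InputStream over the zip file; it throws an NPE:
{code:java}
import java.io.*;
import org.apache.tika.Tika;
Tika tika = new Tika();
try (InputStream stream = new FileInputStream("123.zip")) {
    String type = tika.detect(stream); // throws NullPointerException
}
{code}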
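Step 1 can also be scripted. The sketch below is an assumption, not necessarily how 123.zip was produced: it uses java.util.zip to write a zip with a single top-level index.xml entry holding the same minimal XML. The class name MakeReproZip and the default output name are illustrative only.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: builds a zip shaped like the one in step 1.
public class MakeReproZip {
    // Same minimal index.xml as in step 1: a root <index> element with no namespace.
    static final String INDEX_XML =
            "<?xml version='1.0' encoding='UTF-8' ?>\n<index>\n</index>\n";

    // Builds the zip in memory; assumes index.xml sits at the root of the archive.
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("index.xml"));
            zip.write(INDEX_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Writes the zip to the path given as the first argument, or "123.zip".
    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "123.zip");
        Files.write(out, buildZip());
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
{code}

Running it writes the file to the working directory, where the step 3 snippet can open it.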
Note that tika.detect(String name) works normally and returns "application/zip"; the NPE occurs only with tika.detect(InputStream stream).
It seems that when Tika detects a zip file through {*}IWorkPackageParser{*}, it parses index.xml and compares its root element with each IWORKDocumentType constant in turn (KEYNOTE for '.key', NUMBERS for '.numbers', PAGES for '.pages', and ENCRYPTED). When the root element matches none of KEYNOTE, NUMBERS or PAGES, the loop reaches ENCRYPTED, whose namespace is null, so type.getNamespace().equals(uri) throws an NPE.
The relevant source code:
{code:java}
KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation",
MediaType.application("vnd.apple.keynote")),
NUMBERS("http://developer.apple.com/namespaces/ls", "document",
MediaType.application("vnd.apple.numbers")),
PAGES("http://developer.apple.com/namespaces/sl", "document",
MediaType.application("vnd.apple.pages")),
ENCRYPTED(null, null, MediaType.application("x-tika-iworks-protected")); {code}
{code:java}
public static IWORKDocumentType detectType(InputStream stream) {
    QName qname = new XmlRootExtractor().extractRootElement(stream);
    if (qname != null) {
        String uri = qname.getNamespaceURI();
        String local = qname.getLocalPart();
        for (IWORKDocumentType type : values()) {
            // NPE here: for ENCRYPTED, getNamespace() returns null
            if (type.getNamespace().equals(uri) && type.getPart().equals(local)) {
                return type;
            }
        }
    }
    // ... (rest of detectType not shown)
}
{code}
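One possible way to make the loop tolerate ENCRYPTED's null namespace and part is to call equals() on the QName values instead, since QName never returns null from getNamespaceURI() (a missing namespace becomes ""). The self-contained sketch below is not the Tika code or an official patch: DocType is a hypothetical stand-in mirroring the constants above, and rootElement uses StAX as an assumed substitute for Tika's XmlRootExtractor.

{code:java}
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class IWorkTypeSketch {

    // Hypothetical stand-in for IWORKDocumentType, with ENCRYPTED's null namespace/part.
    enum DocType {
        KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation"),
        NUMBERS("http://developer.apple.com/namespaces/ls", "document"),
        PAGES("http://developer.apple.com/namespaces/sl", "document"),
        ENCRYPTED(null, null);

        final String namespace;
        final String part;

        DocType(String namespace, String part) {
            this.namespace = namespace;
            this.part = part;
        }

        // Null-safe loop: equals() is called on the non-null QName values,
        // so ENCRYPTED's null namespace simply fails to match.
        static DocType match(QName qname) {
            if (qname == null) {
                return null;
            }
            String uri = qname.getNamespaceURI(); // "" when the root has no namespace
            String local = qname.getLocalPart();
            for (DocType type : values()) {
                if (uri.equals(type.namespace) && local.equals(type.part)) {
                    return type;
                }
            }
            return null; // no iWork root element recognised
        }
    }

    // Assumed substitute for XmlRootExtractor: returns the QName of the first element.
    static QName rootElement(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        QName root = rootElement("<?xml version='1.0' encoding='UTF-8' ?><index></index>");
        System.out.println(root + " -> " + DocType.match(root));
    }
}
{code}

Running main prints "index -> null": the root <index> element has an empty namespace, matches no iWork type, and the lookup returns null instead of throwing.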
--
This message was sent by Atlassian Jira
(v8.20.1#820001)