Posted to dev@tika.apache.org by "Kaka Lee (Jira)" <ji...@apache.org> on 2022/01/05 07:04:00 UTC

[jira] [Created] (TIKA-3639) NullPointerException throws when parsing zip file

Kaka Lee created TIKA-3639:
------------------------------

             Summary: NullPointerException  throws when parsing zip file
                 Key: TIKA-3639
                 URL: https://issues.apache.org/jira/browse/TIKA-3639
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 2.2.0
            Reporter: Kaka Lee
         Attachments: 123.zip, IWORKDocumentType.png, detectype.png, exception.png

A NullPointerException is always thrown when detecting the type of a zip file. It can be reproduced with the following steps.
 # Create a zip file containing an index.xml. The XML is simple:
{code:java}
<?xml version='1.0' encoding='UTF-8' ?>
<index>
</index> {code}
 
 # Add the dependencies to pom.xml. The *key* dependency is *tika-parser-apple-module*:
{code:java}
<dependencies>
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>2.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <type>pom</type>
        <version>2.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parser-apple-module</artifactId>
        <version>2.2.1</version>
    </dependency>
</dependencies> {code}

 # Use tika.detect to detect the type of the zip file. It throws an NPE:
{code:java}
String filePath = "123.zip";
Tika tika = new Tika();
String type = tika.detect(new FileInputStream(new File(filePath)));{code}
 Note that tika.detect(String name) works normally and returns "application/zip". The NPE occurs only with tika.detect(InputStream stream).
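
As a rough sketch of step 1 above, a zip with a single index.xml entry can be built with only java.util.zip. The class and method names here (MinimalIndexZip, buildZip, firstEntryName) are made up for illustration, and the attached 123.zip may differ in details such as compression settings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Tika.
class MinimalIndexZip {
    // Same XML as in step 1: a root element with no namespace.
    static final String INDEX_XML =
            "<?xml version='1.0' encoding='UTF-8' ?>\n<index>\n</index>\n";

    /** Builds an in-memory zip archive whose only entry is index.xml. */
    static byte[] buildZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("index.xml"));
            zip.write(INDEX_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        // The zip stream is closed here, so the archive is complete.
        return bytes.toByteArray();
    }

    /** Returns the name of the first entry, to confirm the archive layout. */
    static String firstEntryName(byte[] zipBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zipBytes = buildZip();
        System.out.println(firstEntryName(zipBytes) + ", " + zipBytes.length + " bytes");
    }
}
```

The resulting bytes can be written to 123.zip, or wrapped in a ByteArrayInputStream and passed to tika.detect(InputStream) in step 3.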

 

It seems that when Tika handles a zip file through {*}IWorkPackageParser{*}, it parses index.xml and compares the root element against each document type in the enum below (Keynote, Numbers, Pages, and encrypted). When none of KEYNOTE, NUMBERS, or PAGES matches, the loop reaches ENCRYPTED, whose namespace is null, so the for-loop throws an NPE.

The source code is below:
{code:java}
KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation",
                MediaType.application("vnd.apple.keynote")),
NUMBERS("http://developer.apple.com/namespaces/ls", "document",
                MediaType.application("vnd.apple.numbers")),
PAGES("http://developer.apple.com/namespaces/sl", "document",
                MediaType.application("vnd.apple.pages")),
ENCRYPTED(null, null, MediaType.application("x-tika-iworks-protected")); {code}
{code:java}
public static IWORKDocumentType detectType(InputStream stream) {
    QName qname = new XmlRootExtractor().extractRootElement(stream);
    if (qname != null) {
        String uri = qname.getNamespaceURI();
        String local = qname.getLocalPart();
        for (IWORKDocumentType type : values()) {
            // ENCRYPTED has a null namespace, so getNamespace().equals(...) throws the NPE
            if (type.getNamespace().equals(uri) && type.getPart().equals(local)) {
                return type;
            }
        }
        // ... (excerpt ends here)
{code}
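
To illustrate the failure mode outside Tika, here is a minimal self-contained sketch. DocType is a hypothetical stand-in for IWORKDocumentType (media types omitted), matchUnsafe mirrors the loop above, and matchNullSafe shows one possible null-tolerant comparison using Objects.equals. This is only an assumption about how the check could be made safe, not necessarily the fix that Tika should or will adopt.

```java
import java.util.Objects;

// Hypothetical demo class, not part of Tika.
class NullSafeTypeMatch {
    // Stand-in for IWORKDocumentType with the same namespace/part values.
    enum DocType {
        KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation"),
        NUMBERS("http://developer.apple.com/namespaces/ls", "document"),
        PAGES("http://developer.apple.com/namespaces/sl", "document"),
        ENCRYPTED(null, null);

        final String namespace;
        final String part;

        DocType(String namespace, String part) {
            this.namespace = namespace;
            this.part = part;
        }
    }

    /** Mirrors the reported loop: dereferences namespace without a null check. */
    static DocType matchUnsafe(String uri, String local) {
        for (DocType type : DocType.values()) {
            if (type.namespace.equals(uri) && type.part.equals(local)) {
                return type;
            }
        }
        return null;
    }

    /** Same loop, but Objects.equals tolerates a null namespace or part. */
    static DocType matchNullSafe(String uri, String local) {
        for (DocType type : DocType.values()) {
            if (Objects.equals(type.namespace, uri) && Objects.equals(type.part, local)) {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A root element <index> without a namespace yields "" as the namespace URI.
        System.out.println(matchNullSafe("", "index"));
        try {
            matchUnsafe("", "index");
        } catch (NullPointerException e) {
            System.out.println("NPE from unchecked comparison");
        }
    }
}
```

A real Keynote, Numbers, or Pages root element returns from the loop before ENCRYPTED is reached, which matches the observation that the NPE only appears when none of the three types match.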
 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)