Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2022/01/05 12:27:00 UTC

[jira] [Assigned] (TIKA-3639) NullPointerException throws when parsing zip file

     [ https://issues.apache.org/jira/browse/TIKA-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison reassigned TIKA-3639:
---------------------------------

    Assignee: Tim Allison

> NullPointerException throws when parsing zip file
> --------------------------------------------------
>
>                 Key: TIKA-3639
>                 URL: https://issues.apache.org/jira/browse/TIKA-3639
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.2.0, 2.2.1
>            Reporter: Kaka Lee
>            Assignee: Tim Allison
>            Priority: Blocker
>         Attachments: 123.zip, IWORKDocumentType.png, detectype.png, exception.png
>
>
> A NullPointerException is always thrown when detecting the type of the attached zip file. It can be reproduced with the following steps.
>  # Create a zip file containing an index.xml with this simple content:
> {code:xml}
> <?xml version='1.0' encoding='UTF-8' ?>
> <index>
> </index> {code}
>  
>  # Add the following dependencies to pom.xml. The key dependency is *tika-parser-apple-module*.
> {code:xml}
> <dependencies>
>     <dependency>
>         <groupId>org.apache.tika</groupId>
>         <artifactId>tika-core</artifactId>
>         <version>2.2.1</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.tika</groupId>
>         <artifactId>tika-parsers</artifactId>
>         <type>pom</type>
>         <version>2.2.1</version>
>     </dependency>
>     <dependency>
>         <groupId>org.apache.tika</groupId>
>         <artifactId>tika-parser-apple-module</artifactId>
>         <version>2.2.1</version>
>     </dependency>
> </dependencies>
> {code}
>  # Call tika.detect on the zip file as an InputStream; it throws an NPE.
> {code:java}
> String filePath = "123.zip";
> Tika tika = new Tika();
> // Detecting from a stream triggers the NullPointerException
> String type = tika.detect(new FileInputStream(new File(filePath)));{code}
>  Note that tika.detect(String name) works normally and returns "application/zip"; the NPE occurs only with tika.detect(InputStream stream).
>  
> It seems that when Tika handles a zip file through {*}IWorkPackageParser{*}, it parses index.xml and compares its root element against the Keynote, Numbers, Pages, and encrypted document types defined in the enum below. When none of Keynote, Numbers, or Pages matches, the loop reaches ENCRYPTED, whose namespace is null, so the comparison in the for-loop throws an NPE.
> The source code is below:
> {code:java}
> KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation",
>                 MediaType.application("vnd.apple.keynote")),
> NUMBERS("http://developer.apple.com/namespaces/ls", "document",
>                 MediaType.application("vnd.apple.numbers")),
> PAGES("http://developer.apple.com/namespaces/sl", "document",
>                 MediaType.application("vnd.apple.pages")),
> ENCRYPTED(null, null, MediaType.application("x-tika-iworks-protected")); {code}
> {code:java}
> public static IWORKDocumentType detectType(InputStream stream) {
>     QName qname = new XmlRootExtractor().extractRootElement(stream);
>     if (qname != null) {
>         String uri = qname.getNamespaceURI();
>         String local = qname.getLocalPart();
>         for (IWORKDocumentType type : values()) {
>             // ENCRYPTED has a null namespace, so getNamespace().equals(uri) throws the NPE
>             if (type.getNamespace().equals(uri) && type.getPart().equals(local)) {
>                 return type;
>             }
>         }
>         // ... (the rest of the method is not included in the report)
> {code}
>  
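One way to make the comparison quoted above tolerate the null namespace of ENCRYPTED is to call equals on the values read from the XML root element rather than on the enum fields. The following is only a minimal sketch and not necessarily the change that was applied in Tika: DocType, match, and NullSafeMatchDemo are hypothetical stand-ins for IWORKDocumentType and detectType, and the example assumes that a root element without a namespace is reported with an empty-string URI.

```java
// Hypothetical stand-in for Tika's IWORKDocumentType; all names here are illustrative.
enum DocType {
    KEYNOTE("http://developer.apple.com/namespaces/keynote2", "presentation"),
    NUMBERS("http://developer.apple.com/namespaces/ls", "document"),
    PAGES("http://developer.apple.com/namespaces/sl", "document"),
    ENCRYPTED(null, null); // no namespace or root element, as in the report

    private final String namespace;
    private final String part;

    DocType(String namespace, String part) {
        this.namespace = namespace;
        this.part = part;
    }

    // Null-safe lookup: equals is called on the non-null arguments, so a
    // constant with a null namespace or part simply fails to match.
    static DocType match(String uri, String local) {
        if (uri == null || local == null) {
            return null;
        }
        for (DocType type : values()) {
            if (uri.equals(type.namespace) && local.equals(type.part)) {
                return type;
            }
        }
        return null;
    }
}

class NullSafeMatchDemo {
    public static void main(String[] args) {
        // Root element <index> without a namespace, as in the reported index.xml.
        System.out.println(DocType.match("", "index")); // prints "null"
        System.out.println(DocType.match("http://developer.apple.com/namespaces/sl", "document")); // prints "PAGES"
    }
}
```

With this ordering, an unrecognized root element such as index falls through the loop and yields null instead of raising an exception when the loop reaches ENCRYPTED.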

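Step 1 of the reproduction in the report can also be scripted instead of creating the zip by hand. This is a rough sketch using only the JDK; ReproZip, build, and firstEntryName are hypothetical names, and the resulting bytes are assumed to be equivalent to the attached 123.zip only in that they contain a single index.xml with the same content.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper that builds the test archive described in the report.
class ReproZip {
    static final String INDEX_XML =
            "<?xml version='1.0' encoding='UTF-8' ?>\n<index>\n</index>\n";

    // Builds an in-memory zip whose only entry is index.xml.
    static byte[] build() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("index.xml"));
            zip.write(INDEX_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads back the name of the first entry, as a sanity check on the archive.
    static String firstEntryName(byte[] zipBytes) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zipBytes = build();
        System.out.println(firstEntryName(zipBytes) + " (" + zipBytes.length + " bytes)");
        // With Tika on the classpath, the failing call from the report would be:
        // new Tika().detect(new ByteArrayInputStream(zipBytes));
    }
}
```

Keeping the archive in memory avoids depending on a file path, which makes the case easier to turn into a unit test.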


--
This message was sent by Atlassian Jira
(v8.20.1#820001)