Posted to dev@tika.apache.org by "Sachin Shaju (JIRA)" <ji...@apache.org> on 2016/05/04 07:52:12 UTC

[jira] [Created] (TIKA-1966) Issue in parsing iWorksDocument with Apache Tika

Sachin Shaju created TIKA-1966:
----------------------------------

             Summary: Issue in parsing iWorksDocument with Apache Tika
                 Key: TIKA-1966
                 URL: https://issues.apache.org/jira/browse/TIKA-1966
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.12
         Environment: Ubuntu 15
            Reporter: Sachin Shaju


I was trying to parse an iWork document with Apache Tika, but instead of the parsed content I get some other output from the content handler. The code snippet I used and the output I got are below.
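 import java.io.File;
 import java.io.FileInputStream;
 import java.io.IOException;
 import org.apache.tika.exception.TikaException;
 import org.apache.tika.metadata.Metadata;
 import org.apache.tika.parser.AutoDetectParser;
 import org.apache.tika.parser.ParseContext;
 import org.apache.tika.parser.Parser;
 import org.apache.tika.sax.BodyContentHandler;
 import org.xml.sax.SAXException;

 private void parseFile() {
     File file = new File("/home/user/tika/samples/budget.numbers");
     // try-with-resources closes the stream even if parsing fails
     try (FileInputStream inputStream = new FileInputStream(file)) {
         ParseContext context = new ParseContext();
         // -1 disables the write limit on the extracted text
         BodyContentHandler bodyHandler = new BodyContentHandler(-1);
         Parser parser = new AutoDetectParser();
         parser.parse(inputStream, bodyHandler, new Metadata(), context);
         System.out.println("Contents of the file :" + bodyHandler.toString());
     } catch (IOException | SAXException | TikaException e) {
         e.printStackTrace();
     }
 }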

Output :-

Contents of the file :
Index/Document.iwa
Index/ViewState.iwa
Index/CalculationEngine.iwa
Index/Tables/HeaderStorageBucket-2.iwa
Index/Tables/Tile.iwa
Index/Metadata.iwa
Metadata/Properties.plist
I'm able to detect the file type correctly using the Detector API, but I'm not getting the useful content out of the document.
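
The names in the output above look like the entries of a ZIP container rather than document text. To check that independently of Tika, here is a minimal sketch that lists the entries of a file using only java.util.zip. It assumes the .numbers file is a plain ZIP archive, which is not confirmed above; ListContainerEntries and listEntries are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ListContainerEntries {

    // Returns the entry names of a ZIP stream, in archive order.
    // Assumption: the input really is a ZIP archive; for any other
    // input ZipInputStream finds no entries and the list stays empty.
    static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java ListContainerEntries <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String name : listEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```

If running this on /home/user/tika/samples/budget.numbers prints the same names as the Tika output, the parser is probably treating the document as a generic package and reporting its entry names instead of extracting text from the .iwa files.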


