Posted to dev@tika.apache.org by "yousef abu elbeh (JIRA)" <ji...@apache.org> on 2017/01/26 13:56:24 UTC

[jira] [Updated] (TIKA-2252) could not parse document

     [ https://issues.apache.org/jira/browse/TIKA-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yousef abu elbeh updated TIKA-2252:
-----------------------------------
    Summary: could not parse document  (was: Parsing microsoft docs raise an error)

> could not parse document
> ------------------------
>
>                 Key: TIKA-2252
>                 URL: https://issues.apache.org/jira/browse/TIKA-2252
>             Project: Tika
>          Issue Type: Bug
>            Reporter: yousef abu elbeh
>
> Hi,
> I am using Tika to parse a document, but every time I get this error:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@124d02b2
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.tika.Tika.parseToString(Tika.java:527)
> at org.apache.tika.Tika.parseToString(Tika.java:602)
> at com.ligadata.datapreprocessing.fileutility.PSTReader.writePSTFile(PSTReader.java:79)
> at com.ligadata.datapreprocessing.fileutility.PSTReader.processFolder(PSTReader.java:55)
> at com.ligadata.datapreprocessing.fileutility.PSTReader.processFolder(PSTReader.java:45)
> at com.ligadata.datapreprocessing.fileutility.PSTReader.processFolder(PSTReader.java:45)
> at com.ligadata.datapreprocessing.fileutility.PSTReader.processFolder(PSTReader.java:45)
> at com.ligadata.datapreprocessing.fileutility.PSTReader.readPSTFile(PSTReader.java:27)
> at com.ligadata.datapreprocessing.emailextracter.MainClass.main(MainClass.java:61)
> Caused by: java.lang.IllegalArgumentException: Position 313856 past the end of the file
> at org.apache.poi.poifs.nio.FileBackedDataSource.read(FileBackedDataSource.java:88)
> at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt(NPOIFSFileSystem.java:484)
> at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:169)
> at org.apache.poi.poifs.filesystem.NPOIFSStream$StreamBlockByteBufferIterator.next(NPOIFSStream.java:142)
> at org.apache.poi.poifs.property.NPropertyTable.buildProperties(NPropertyTable.java:87)
> at org.apache.poi.poifs.property.NPropertyTable.<init>(NPropertyTable.java:66)
> at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readCoreContents(NPOIFSFileSystem.java:440)
> at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:235)
> at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:168)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:109)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 11 more
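The "Caused by" section shows where the failure happens: POI is building the OLE2 property table (the compound file's directory), before any document content is read. A block that the directory chain points to lies past the end of the input.

In the Compound File Binary format (MS-CFB), the header occupies the first sector slot, so sector n starts at byte (n + 1) * sectorSize. Assuming the usual 512-byte sectors of a version 3 file, position 313856 = 613 * 512, which is sector 612.

The Java sketch below is a hypothetical, stdlib-only diagnostic. CfbSectorCheck is not part of Tika or POI. It reads two header fields: the sector shift at offset 0x1E and the first directory sector at offset 0x30. It then checks whether that sector fits inside the file. It is a sketch under those assumptions, not POI's own validation logic.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/**
 * Hypothetical diagnostic helper (not part of Tika or POI): checks whether the
 * first directory ("property table") sector named in an OLE2 / CFB header
 * actually lies inside the file.
 */
final class CfbSectorCheck {
    // OLE2 / CFB magic number stored at offset 0 of every compound file.
    static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Byte offset of sector n: the header fills the first sector-sized slot,
    // so sector 0 begins right after it.
    static long sectorOffset(long sector, int sectorShift) {
        return (sector + 1) << sectorShift;
    }

    // Returns the byte offset of the first directory sector, or throws if the
    // data is not an OLE2 file or that sector starts past the end of the data.
    static long checkFirstDirectorySector(byte[] data) {
        if (data.length < 512 || !Arrays.equals(Arrays.copyOf(data, 8), SIGNATURE)) {
            throw new IllegalArgumentException("Not an OLE2 compound file");
        }
        ByteBuffer header = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int sectorShift = header.getShort(0x1E) & 0xFFFF; // 9 = 512-byte, 12 = 4096-byte sectors
        if (sectorShift != 9 && sectorShift != 12) {
            throw new IllegalArgumentException("Unsupported sector shift " + sectorShift);
        }
        long firstDirSector = header.getInt(0x30) & 0xFFFFFFFFL; // unsigned 32-bit sector index
        long offset = sectorOffset(firstDirSector, sectorShift);
        if (offset >= data.length) {
            throw new IllegalArgumentException("Directory sector " + firstDirSector
                + " starts at " + offset + ", past the end of the " + data.length + "-byte file");
        }
        return offset;
    }

    public static void main(String[] args) throws IOException {
        // With 512-byte sectors (shift 9), sector 612 starts at (612 + 1) * 512.
        System.out.println(sectorOffset(612, 9));
        if (args.length > 0) {
            byte[] data = Files.readAllBytes(Paths.get(args[0]));
            System.out.println("First directory sector at offset " + checkFirstDirectorySector(data));
        }
    }
}
```

After compiling, you could run it on an attachment saved to disk, for example `java CfbSectorCheck suspect.doc` (the file name is hypothetical). Note that it only checks the first directory sector. POI follows the whole directory chain through the FAT, so a file can pass this check and still fail the same way if a later sector is missing.

If the check does fail, the input is probably truncated or corrupt rather than triggering a parser bug. One example would be an attachment that was not fully extracted from the PST.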



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)