Posted to dev@tika.apache.org by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/03/22 22:24:11 UTC
[jira] [Closed] (TIKA-1460) Could not parse predefined CMAP file for 'Adobe-GBK1-UCS2'
[ https://issues.apache.org/jira/browse/TIKA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Palsulich closed TIKA-1460.
---------------------------------
Resolution: Cannot Reproduce
Closing as Cannot Reproduce, since it's been a month since my last comment and we don't have the file which reproduces the issue. Please reopen if you're still running into this!
> Could not parse predefined CMAP file for 'Adobe-GBK1-UCS2'
> ----------------------------------------------------------
>
> Key: TIKA-1460
> URL: https://issues.apache.org/jira/browse/TIKA-1460
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.3
> Environment: win7,myeclipse8.5
> Reporter: onyas
> Priority: Critical
>
> For some reason I could not upload the file, so here is the information.
> I also checked every version in the \org\apache\pdfbox\resources\cmap directory and did not find an 'Adobe-GBK1-UCS2' file.
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@d640af
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> Caused by: java.lang.IllegalArgumentException: Position 66048 past the end of the file
> at org.apache.poi.poifs.nio.FileBackedDataSource.read(FileBackedDataSource.java:50)
> at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt(NPOIFSFileSystem.java:420)
> at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readBAT(NPOIFSFileSystem.java:397)
> at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readCoreContents(NPOIFSFileSystem.java:356)
> at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:202)
> at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> ... 21 more
> The main code is:
> Parser parser = new AutoDetectParser();
> ContentHandler handler = new BodyContentHandler(getNum());
> Metadata metadata = new Metadata();
> ParseContext context = new ParseContext();
> InputStream stream = null;
> StringBuffer content = new StringBuffer();
> try {
>     stream = new FileInputStream(file);
>     parser.parse(stream, handler, metadata, context);
>     content.append(handler.toString());
>
>     if (StringUtils.isNotBlank(content.toString())) {
>         hasContent = true;
>         handler = null;
>         metadata = null;
>         context = null;
>     }
> } finally {
>     // Close the stream even when parse() throws.
>     if (stream != null) {
>         stream.close();
>     }
> }
> The exception is thrown at this line: parser.parse(stream, handler, metadata, context);
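The trace above shows the failure inside POI's OLE2 reader (reached through OfficeParser), not in PDFBox CMAP loading, so the wrapped cause is worth reporting when filing an issue like this. Here is a minimal sketch of walking the cause chain of a caught exception. The class name CauseChain is hypothetical, and a plain Exception stands in for the TikaException thrown by parser.parse so that the sketch needs nothing beyond the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika or POI.
final class CauseChain {
    private CauseChain() {}

    /** Returns the throwable followed by each nested cause, outermost first. */
    static List<Throwable> of(Throwable t) {
        List<Throwable> chain = new ArrayList<>();
        // The contains() check stops the loop if the causes ever form a cycle.
        while (t != null && !chain.contains(t)) {
            chain.add(t);
            t = t.getCause();
        }
        return chain;
    }

    /** Returns the innermost cause, or null when given null. */
    static Throwable root(Throwable t) {
        List<Throwable> chain = of(t);
        return chain.isEmpty() ? null : chain.get(chain.size() - 1);
    }

    public static void main(String[] args) {
        // Stand-ins that mirror the two levels seen in the quoted trace.
        Throwable inner =
                new IllegalArgumentException("Position 66048 past the end of the file");
        Throwable outer =
                new Exception("Unexpected RuntimeException from parser", inner);

        Throwable root = root(outer);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

In real code, the same helper could be called from a catch block around parser.parse to log the innermost exception alongside the name of the file being parsed.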
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)