Posted to dev@tika.apache.org by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/03/22 22:24:11 UTC

[jira] [Closed] (TIKA-1460) Could not parse predefined CMAP file for 'Adobe-GBK1-UCS2'

     [ https://issues.apache.org/jira/browse/TIKA-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Palsulich closed TIKA-1460.
---------------------------------
    Resolution: Cannot Reproduce

Closing as Cannot Reproduce, since it's been a month since my last comment and we don't have the file which reproduces the issue. Please reopen if you're still running into this!

> Could not parse predefined CMAP file for 'Adobe-GBK1-UCS2'
> ----------------------------------------------------------
>
>                 Key: TIKA-1460
>                 URL: https://issues.apache.org/jira/browse/TIKA-1460
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.3
>         Environment: win7,myeclipse8.5
>            Reporter: onyas
>            Priority: Critical
>
> For some reason, I could not upload the file. Here is the info:
> I checked all the versions in the \org\apache\pdfbox\resources\cmap directory and have not found the 'Adobe-GBK1-UCS2' file.
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@d640af
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> Caused by: java.lang.IllegalArgumentException: Position 66048 past the end of the file
> 	at org.apache.poi.poifs.nio.FileBackedDataSource.read(FileBackedDataSource.java:50)
> 	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.getBlockAt(NPOIFSFileSystem.java:420)
> 	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readBAT(NPOIFSFileSystem.java:397)
> 	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.readCoreContents(NPOIFSFileSystem.java:356)
> 	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:202)
> 	at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
> 	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:156)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	... 21 more
> The main code is:
> 		Parser parser = new AutoDetectParser();
> 		ContentHandler handler = new BodyContentHandler(getNum());
> 		Metadata metadata = new Metadata();
> 		ParseContext context = new ParseContext();
> 		InputStream stream = null;
> 		StringBuffer content = new StringBuffer();
> 		try {
> 			stream = new FileInputStream(file);
> 			if (stream != null) {
> 				parser.parse(stream, handler, metadata, context);
> 				content = content.append(handler);
> 				
> 				if(StringUtils.isNotBlank(content.toString())){
> 					hasContent = true;
> 					handler = null;
> 					metadata = null;
> 					context = null;
> 				}
> 			}
> The exception is thrown at this line: parser.parse(stream, handler, metadata, context);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)