Posted to dev@pdfbox.apache.org by "Wolfgang Kronberg (JIRA)" <ji...@apache.org> on 2013/03/25 19:21:18 UTC
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612945#comment-13612945 ] 

Wolfgang Kronberg commented on PDFBOX-940:
------------------------------------------

I still see this issue with 1.8.0 and 1.9.0-SNAPSHOT. In my case, the filename consists of binary rubbish, plus '-UCS2'.

Looking at the code of PDCIDFont.determineEncoding(), it seems to me that the error message is misleading:

                    cmap = parseCmap( resourceRootCMAP, ResourceLoader.loadResource( resourceName ));
                    if( cmap == null)
                    {
                        log.error("Error: Could not parse predefined CMAP file for '" + cidSystemInfo + "'" );
                    }

Obviously, the message is so harsh because parseCmap() on a predefined file (one bundled with pdfbox) must never fail; if it did, that would be a bug in pdfbox. Usually, however, the reason for this message is not a parsing failure but simply that there is no predefined file for the given resource name.

In my opinion, such a case should not be treated more harshly than the case where getCIDSystemInfo() yields null in the first place. PDCIDFont.determineEncoding() handles that case by silently calling super.determineEncoding(), which usually completes without any errors. Thus, in my opinion, the code snippet above should be changed to:

                    InputStream resIn = ResourceLoader.loadResource( resourceName );
                    if (resIn != null) {
                        cmap = parseCmap( resourceRootCMAP, resIn);
                        if( cmap == null)
                        {
                            log.error("Error: Could not parse predefined CMAP file for '" + cidSystemInfo + "'" );
                        }
                    } else {
                        super.determineEncoding();
                    }
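
To illustrate the control flow only, here is a minimal, self-contained sketch that does not use pdfbox at all. Everything in it is a hypothetical stand-in: loadResource() and parseCmap() are stubs that merely mimic the behaviour assumed above (null for a missing resource, null for a failed parse), and the resource names are made up. It separates the three outcomes so that only a genuine parse failure would deserve an error message:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class CmapLookupSketch {
    // The three situations the proposed change tells apart.
    enum Outcome { PARSED, PARSE_FAILED, FELL_BACK }

    // Hypothetical stand-in for the CMap files bundled with the library.
    static final Map<String, String> BUNDLED = new HashMap<>();
    static {
        BUNDLED.put("Known-UCS2", "begincmap endcmap"); // made-up, parses fine
        BUNDLED.put("Broken-UCS2", "");                 // made-up, bundled but unparsable
    }

    // Stub: returns null when no bundled resource has this name
    // (the behaviour the patch above assumes for the real loader).
    static InputStream loadResource(String name) {
        String body = BUNDLED.get(name);
        if (body == null) {
            return null;
        }
        return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
    }

    // Stub: treats an empty stream as a parse failure and returns null.
    static String parseCmap(InputStream in) {
        try {
            return in.read() == -1 ? null : "parsed-cmap";
        } catch (IOException e) {
            return null;
        }
    }

    static Outcome determineEncoding(String resourceName) {
        InputStream resIn = loadResource(resourceName);
        if (resIn == null) {
            // No predefined file: not an error, quietly use the fallback path.
            return Outcome.FELL_BACK;
        }
        String cmap = parseCmap(resIn);
        if (cmap == null) {
            // A bundled file exists but cannot be parsed: this is the real error.
            return Outcome.PARSE_FAILED;
        }
        return Outcome.PARSED;
    }

    public static void main(String[] args) {
        System.out.println(determineEncoding("Known-UCS2"));
        System.out.println(determineEncoding("Broken-UCS2"));
        System.out.println(determineEncoding("garbage-UCS2"));
    }
}
```

With this split, a resource name such as the garbage one I am seeing would take the FELL_BACK branch and produce no error at all.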


Anyway, the binary rubbish I observe probably points to some other bug, and I have not been able to pin that one down. I have loads of PDF documents exhibiting this bug, all of them unfortunately confidential. If any team member is interested, please email me so that I can provide some examples.

                
> [pdmodel.font.PDFont] Error: Could not parse predefined CMAP  file for 'PDFXC-Indentity0-0'
> -------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-940
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-940
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>         Environment: Tomcat 6.0.18, windows server 2003, pdfbox-1.4.0.jar
>            Reporter: krishna
>         Attachments: gen_preview1.png, oob_pdf.pdf, pdf fonts1.JPG, pdf fonts2.JPG, pdf fonts.JPG, pdf properties1.JPG, pdf properties2.JPG, pdf properties3.JPG
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Hi,
>    when i am trying to upload a pdf document the following error is thrown in the tomcat.. i am using pdfbox-1.4.0.jar..
> 17:29:33,465  ERROR [pdmodel.font.PDFont] Error: Could not parse predefined CMAP  file for 'PDFXC-Indentity0-0'
> please find the solution

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira