Posted to dev@pdfbox.apache.org by "krishna (JIRA)" <ji...@apache.org> on 2011/01/12 13:09:46 UTC

[jira] Created: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

[pdmodel.font.PDFont] Error: Could not parse predefined CMAP  file for 'PDFXC-Indentity0-0'
-------------------------------------------------------------------------------------------

                 Key: PDFBOX-940
                 URL: https://issues.apache.org/jira/browse/PDFBOX-940
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 1.4.0
         Environment: Tomcat 6.0.18, windows server 2003, pdfbox-1.4.0.jar
            Reporter: krishna


Hi,

   When I try to upload a PDF document, the following error is logged in Tomcat. I am using pdfbox-1.4.0.jar.

17:29:33,465  ERROR [pdmodel.font.PDFont] Error: Could not parse predefined CMAP  file for 'PDFXC-Indentity0-0'

Please help me find a solution.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Kevin Clark (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118852#comment-13118852 ] 

Kevin Clark commented on PDFBOX-940:
------------------------------------

I'm getting this via the Tika 0.10 release, which uses PDFBox 1.6.0.

 2011-10-01 16:48:27,586 (55308987) [Parser-thread-2] ERROR org.apache.pdfbox.pdmodel.font.PDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-0'

Can't upload the pdf for privacy reasons, unfortunately.
                

[jira] Updated: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "krishna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

krishna updated PDFBOX-940:
---------------------------

    Attachment: pdf fonts2.JPG
                pdf fonts1.JPG


[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Juhasz Istvan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055117#comment-13055117 ] 

Juhasz Istvan commented on PDFBOX-940:
--------------------------------------

SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS'
revision 1139575 (1.6.0-SNAPSHOT)
(pdf - embedded truetype (cid) font with encoding identity-h)

pdf>java -jar pdfbox-app-1.6.0-SNAPSHOT.jar ExtractText -debug a015.pdf a015.txt
Loading PDF a015.pdf
Time for loading: 0.062 seconds
Starting text extraction
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
INFO: cidSystemInfo: Adobe-UCS-0
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
INFO: resourceName: org/apache/pdfbox/resources/cmap/Adobe-Identity-UCS
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS'
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
INFO: cidSystemInfo: Adobe-UCS-0
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
INFO: resourceName: org/apache/pdfbox/resources/cmap/Adobe-Identity-UCS
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS'
Time for extraction: 0.984 seconds


[jira] Commented: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Gabriel Gravel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007162#comment-13007162 ] 

Gabriel Gravel commented on PDFBOX-940:
---------------------------------------

I have a similar problem. I used to have the following error when extracting text from a batch of PDF files using pdfbox 1.3.1:
ERROR   10 Mar 2011 00:22:44.038 [org.apache.pdfbox.pdmodel.font.PDFont] line:285 - Error: Could not parse predefined CMAP file for 'Adobe-UCS-0'
After reading the comments here, I have upgraded to 1.5.0 and am now having the following error:
ERROR   15 Mar 2011 14:31:10.195 [org.apache.pdfbox.pdmodel.font.PDCIDFont] line:324 - Error: Could not parse predefined CMAP file for 'Adobe-UCS-UCS2'

However, I still seem to be able to extract the text correctly from the file. Should I be worried about this error or can I ignore it altogether? Here's a link to one of the problematic files: http://cbpp-pcpe.phac-aspc.gc.ca/intervention_pdf/en/72.pdf

Thanks for your time




[jira] [Issue Comment Edited] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Antoni Mylka (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126212#comment-13126212 ] 

Antoni Mylka edited comment on PDFBOX-940 at 10/12/11 10:31 PM:
----------------------------------------------------------------

I stumbled upon the same problem, on a confidential file. In the process I think I found an issue: PDFBOX-1137.

I'm not a PDF expert, but in that file, I have the following PDF objects:

24 0 obj
<</Type/Font/Subtype/Type0/BaseFont/TT491A9C96tCID/Encoding 18 0 R/DescendantFonts[22 0 R]>>
endobj

22 0 obj
<</Subtype/CIDFontType2/CIDSystemInfo 23 0 R/BaseFont/XJXBKC+TT491A9C96tCID/Type/Font/Name/R22/FontDescriptor 21 0 R/DW 1000
/W[691[259]
724[677
626
626]
737[677]]/CIDToGIDMap/Identity
>>
endobj

18 0 obj
<</Type/CMap/Name/R18/WMode 0/CMapName/WinCharSetFFFF-H/CIDSystemInfo<<
/Registry(Adobe)
/Ordering(WinCharSetFFFF)
/Supplement 0
>>
/Filter/FlateDecode/Length 19 0 R>>stream
(binary content of the stream omitted for readability)
endstream
endobj

So there is an embedded CMAP for WinCharSetFFFF-H, a parent font which refers to the embedded CMAP as its encoding, and a child font with no encoding. Applying the PDFBOX-1137 patch allowed the CMAP to be parsed. 

Then, in the PDType0Font constructor, just after the descendant font is constructed, I added an if statement that makes the descendant font "inherit" the cmap from the parent font. This fixed the NPEs during text extraction, which happened because the cmap was missing:

descendentFont = PDFontFactory.createFont( descendantFontDictionary );
if (descendentFont.cmap == null) {
  descendentFont.cmap = this.cmap;
}

I don't even know if this makes sense. Is the descendant font supposed to "inherit" the encoding from the parent font? This "fixed" the visible errors, but the output I get is still garbled. It's supposed to be a text in traditional Chinese. Can anyone with more PDF knowledge take a look at this?
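
The name in the error messages in this thread (for example 'Adobe-WinCharSetFFFF-0') looks like the three CIDSystemInfo entries from the CMap object above joined with hyphens. A minimal Java sketch of that composition follows. It assumes registry-ordering-supplement is how the name is built; the class and method names are hypothetical and this is not PDFBox code.

```java
// Illustrative only: models how a name such as 'Adobe-WinCharSetFFFF-0'
// could be composed from the CIDSystemInfo dictionary entries.
// The class and method are hypothetical, not part of PDFBox.
final class CidSystemInfoSketch {

    /** Joins /Registry, /Ordering and /Supplement with hyphens (assumed format). */
    static String cidSystemInfoName(String registry, String ordering, int supplement) {
        return registry + "-" + ordering + "-" + supplement;
    }

    public static void main(String[] args) {
        // Values taken from object "18 0 obj" quoted in the comment.
        System.out.println(cidSystemInfoName("Adobe", "WinCharSetFFFF", 0));
    }
}
```

If this assumption holds, the lookup fails simply because no predefined CMap with that composed name is bundled, independent of the embedded CMap stream.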
                

[jira] Updated: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "krishna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

krishna updated PDFBOX-940:
---------------------------

    Attachment: pdf properties3.JPG
                pdf properties2.JPG
                pdf properties1.JPG


[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "krishna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011963#comment-13011963 ] 

krishna commented on PDFBOX-940:
--------------------------------

Hi Andreas,

For security reasons, I can't upload the document.

The error "Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'" was resolved in version 1.5.0, but the error "Could not parse predefined CMAP file for 'Adobe-UCS-UCS2'" is still present there.

Please check this error.


Thanks,
Murali


[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Arjohn Kampman (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128901#comment-13128901 ] 

Arjohn Kampman commented on PDFBOX-940:
---------------------------------------

I'm also seeing "Could not parse predefined CMAP file for 'Adobe-Identity-UCS'" error message on some files. Debugging this error with the current trunk (r1184806), I noticed that PDCIDFont.determineEncoding() starts with a cidSystemInfo value "Adobe-UCS-0" and replaces this with "Adobe-Identity-UCS" in the else-if-statement. This triggers the error message because there is no such cmap file.

However, considering that the first if-statement maps any cidSystemInfo values containing "Identity" to "Identity-H", I'm wondering: should "Adobe-UCS-0" be mapped to "Identity-H" rather than "Adobe-Identity-UCS"?
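
The branching described above can be sketched in Java as follows. This is a hypothetical reconstruction from the description in this comment, not the PDFBox source: the class, the method names and the exact else-if condition (`startsWith("Adobe-UCS-")`) are assumptions.

```java
// Hypothetical model of the name mapping discussed in this comment.
// NOT the PDFBox implementation; the else-if condition is an assumption.
final class CMapNameSketch {

    /** Mapping as described: "Adobe-UCS-0" ends up as "Adobe-Identity-UCS". */
    static String describedMapping(String cidSystemInfo) {
        if (cidSystemInfo.contains("Identity")) {
            return "Identity-H";
        } else if (cidSystemInfo.startsWith("Adobe-UCS-")) {
            return "Adobe-Identity-UCS"; // no cmap file with this name exists
        }
        return cidSystemInfo;
    }

    /** Alternative raised in the question: send the UCS case to "Identity-H" too. */
    static String proposedMapping(String cidSystemInfo) {
        if (cidSystemInfo.contains("Identity") || cidSystemInfo.startsWith("Adobe-UCS-")) {
            return "Identity-H";
        }
        return cidSystemInfo;
    }

    public static void main(String[] args) {
        String name = "Adobe-UCS-0";
        System.out.println(describedMapping(name)); // Adobe-Identity-UCS
        System.out.println(proposedMapping(name));  // Identity-H
    }
}
```

Whether "Identity-H" is the correct target for these fonts is exactly the open question; the sketch only shows where the two behaviours diverge.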
                

[jira] Commented: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Lars Torunski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001325#comment-13001325 ] 

Lars Torunski commented on PDFBOX-940:
--------------------------------------

Similar problem: 2011-03-02 08:30:24,126 [PWS-Index-Thread-569] ERROR org.apache.pdfbox.pdmodel.font.PDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-0'

Both PDFXC-Indentity0-0 and Adobe-WinCharSetFFFF-0 aren't available in org/apache/pdfbox/resources/cmap/
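
One way to check this yourself is to ask the class loader for the resource. The Java sketch below uses only the standard library; the class and method names are hypothetical, and the resource directory is the one quoted above. It only gives a meaningful answer when the PDFBox jar is on the classpath; without it, every name is reported as missing.

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative helper, not part of PDFBox: reports whether a CMap resource
// with the given name can be found on the current classpath.
final class CMapResourceCheck {

    // Resource directory as quoted in this thread.
    static final String CMAP_DIR = "org/apache/pdfbox/resources/cmap/";

    static boolean isBundled(String cmapName) {
        ClassLoader loader = CMapResourceCheck.class.getClassLoader();
        // getResourceAsStream returns null when the resource is absent;
        // try-with-resources skips closing a null resource.
        try (InputStream in = loader.getResourceAsStream(CMAP_DIR + cmapName)) {
            return in != null;
        } catch (IOException e) {
            return false; // closing the stream failed; treat as unavailable
        }
    }

    public static void main(String[] args) {
        String[] names = {"PDFXC-Indentity0-0", "Adobe-WinCharSetFFFF-0", "Identity-H"};
        for (String name : names) {
            System.out.println(name + " bundled: " + isBundled(name));
        }
    }
}
```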


[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Lars Torunski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427180#comment-13427180 ] 

Lars Torunski commented on PDFBOX-940:
--------------------------------------

My problem with Adobe-WinCharSetFFFF-UCS2 still exists in version 1.7.1:

2012-08-02 10:19:25,771 [PWS-Index-Thread-41] ERROR org.apache.pdfbox.pdmodel.font.PDCIDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'

                

[jira] [Issue Comment Edited] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Che-wei Kuo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016284#comment-13016284 ] 

Che-wei Kuo edited comment on PDFBOX-940 at 4/21/11 1:20 AM:
-------------------------------------------------------------

Dear all,


@Andreas Lehmkühler
Thank you for dealing with the original issue.
However, there are still some errors after I built it at revision 1088324.
The error message becomes:

"org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS' "

I'm not sure if it was the same issue.
Thanks.


Best Regards


      was (Author: gjwei):
    Dear all,


@Andreas Lehmkühler
Thank you for dealing with the original issue.
However, there are still some errors after I built it at revision 1088324.
The error message becomes:

"org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
Could not parse predefined CMAP file for 'Adobe-UCS-UCS2' "

I'm not sure if it was the same issue.
Thanks.


Best Regards

  

[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "MH (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142997#comment-13142997 ] 

MH commented on PDFBOX-940:
---------------------------

Same error message here:

SEVERE  Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS'

when the PDF has under or over content (e.g. a watermark). Without such content, the error does not appear. (PDFBox 1.6.0)
                

[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "krishna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022618#comment-13022618 ] 

krishna commented on PDFBOX-940:
--------------------------------

Hi

'PDFXC-Indentity0-0' was fixed in 1.5.0, but the 'Adobe-WinCharSetFFFF-UCS2' error is still present in 1.5.0.


[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Lars Torunski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011991#comment-13011991 ] 

Lars Torunski commented on PDFBOX-940:
--------------------------------------

With 1.5.0 the error

2011-03-28 10:38:20,207 [PWS-Index-Thread-35] ERROR org.apache.pdfbox.pdmodel.font.PDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-0'

doesn't occur anymore. But we are getting

2011-03-28 11:52:51,162 [PWS-Index-Thread-44] ERROR org.apache.pdfbox.pdmodel.font.PDCIDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'

with the same pdf file now.

I'm not allowed to attach the pdf here, but I can send you the pdf by email.


[jira] [Issue Comment Edited] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Henrique Nunes (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013531#comment-13013531 ] 

Henrique Nunes edited comment on PDFBOX-940 at 3/30/11 5:03 PM:
----------------------------------------------------------------

Hi. I'm having the same problem:

30/Mar/2011 17:15:10 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'

I'm using pdfbox-app-1.5.0.jar with Jython 2.5.2 on Windows 7 64bit

No problems when on Ubuntu 10.

UPDATE: I built pdfbox-app-1.6.0-SNAPSHOT from the latest sources and the issue persists.


[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "ECI (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209259#comment-13209259 ] 

ECI commented on PDFBOX-940:
----------------------------




I can reproduce the issue with PDFBox 1.4 as used by Apache Tika. My log file contains:

2012-01-25 16:47:03 ERROR 127.0.0.1 [PDFont:285] - Error: Could not parse predefined CMAP file for 'Adobe-UCS-0'

This happens with some PDF documents only.

                

        

[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Lars Torunski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011852#comment-13011852 ] 

Lars Torunski commented on PDFBOX-940:
--------------------------------------

Currently I can test the 1.5.0 version only.


[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Che-wei Kuo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022588#comment-13022588 ] 

Che-wei Kuo commented on PDFBOX-940:
------------------------------------

Sorry, I got the wrong error message.

It should be this one:

"org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding 
Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS' "



[jira] [Updated] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Henrique Nunes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henrique Nunes updated PDFBOX-940:
----------------------------------

    Attachment: gen_preview1.png
                oob_pdf.pdf

These are the files relevant for my comment below.


[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Henrique Nunes (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013531#comment-13013531 ] 

Henrique Nunes commented on PDFBOX-940:
---------------------------------------

Hi. I'm having the same problem:

30/Mar/2011 17:15:10 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'

I'm using pdfbox-app-1.5.0.jar


[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Joscha Feth (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054303#comment-13054303 ] 

Joscha Feth commented on PDFBOX-940:
------------------------------------

font.PDFont: Error: Could not parse predefined CMAP file for 'Adobe-UCS-0'

still appearing in 1.5.0


        

[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011654#comment-13011654 ] 

Andreas Lehmkühler commented on PDFBOX-940:
-------------------------------------------

I solved the issue with Gabriel's PDF in revision 1085755.

@Lars, krishna
Do you still have the described problem using the current trunk version? If the problem still persists, can you provide us with a sample pdf?


[jira] Updated: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "krishna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

krishna updated PDFBOX-940:
---------------------------

    Attachment: pdf fonts.JPG



[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Lars Torunski (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055163#comment-13055163 ] 

Lars Torunski commented on PDFBOX-940:
--------------------------------------

Can we close this issue for PDFXC-Indentity0-0 and Adobe-WinCharSetFFFF-0?

And create a new one for "UCS" with Adobe-Identity-UCS, Adobe-WinCharSetFFFF-UCS2, Adobe-UCS-0, Adobe-UCS-UCS2 etc.?


        

[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Che-wei Kuo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016284#comment-13016284 ] 

Che-wei Kuo commented on PDFBOX-940:
------------------------------------

Dear all,


@Andreas Lehmkühler
Thank you for dealing with the original issue.
However, there are still some errors after building revision 1088324.
The error message has become:

"org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
Could not parse predefined CMAP file for 'Adobe-UCS-UCS2' "

I'm not sure whether this is the same issue.
Thanks.


Best Regards



[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Antoni Mylka (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126212#comment-13126212 ] 

Antoni Mylka commented on PDFBOX-940:
-------------------------------------

I stumbled upon the same problem, on a confidential file. In the process I think I found an issue: PDFBOX-1137.

I'm not a PDF expert, but in that file, I have the following PDF objects:

24 0 obj
<</Type/Font/Subtype/Type0/BaseFont/TT491A9C96tCID/Encoding 18 0 R/DescendantFonts[22 0 R]>>
endobj

22 0 obj
<</Subtype/CIDFontType2/CIDSystemInfo 23 0 R/BaseFont/XJXBKC+TT491A9C96tCID/Type/Font/Name/R22/FontDescriptor 21 0 R/DW 1000
/W[691[259]
724[677
626
626]
737[677]]/CIDToGIDMap/Identity
>>
endobj

18 0 obj
<</Type/CMap/Name/R18/WMode 0/CMapName/WinCharSetFFFF-H/CIDSystemInfo<<
/Registry(Adobe)
/Ordering(WinCharSetFFFF)
/Supplement 0
>>
/Filter/FlateDecode/Length 19 0 R>>stream
endstream
endobj

So there is an embedded CMAP for WinCharSetFFFF-H, a parent font which refers to the embedded CMAP as its encoding, and a child font with no encoding. Applying the PDFBOX-1137 patch allowed the CMAP to be parsed. 
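The names in the error messages throughout this thread look as if they are assembled from the CIDSystemInfo entries shown above. The following is only a minimal sketch of that assumption, not PDFBox code: the class and method names are hypothetical, and the joining rule (Registry-Ordering-Supplement, or Registry-Ordering-UCS2 for the Unicode variant) is inferred from the logged names.

```java
// Hypothetical helper, not part of PDFBox: shows how the CMap names seen in
// the log messages could be derived from a CIDSystemInfo dictionary.
final class CMapNameSketch {

    // Assumed rule: Registry-Ordering-Supplement, e.g. Adobe-WinCharSetFFFF-0.
    static String predefinedName(String registry, String ordering, int supplement) {
        return registry + "-" + ordering + "-" + supplement;
    }

    // Assumed rule for the Unicode mapping: Registry-Ordering-UCS2.
    static String unicodeName(String registry, String ordering) {
        return registry + "-" + ordering + "-UCS2";
    }

    public static void main(String[] args) {
        // Values taken from object 18 in the comment above.
        System.out.println(predefinedName("Adobe", "WinCharSetFFFF", 0));
        System.out.println(unicodeName("Adobe", "WinCharSetFFFF"));
        // An empty Ordering would yield a name with a double hyphen.
        System.out.println(unicodeName("Adobe", ""));
    }
}
```

Under this assumption, a name such as 'Adobe--UCS2' would simply mean the Ordering entry was empty, and no predefined CMap file can exist for a name that is built from a custom ordering like WinCharSetFFFF.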

Then, in the PDType0Font constructor, just after the descendant font is constructed, I added an if that makes it "inherit" the cmap from the parent font. This fixed NPEs during text extraction, which happened because the cmap was missing:

descendentFont = PDFontFactory.createFont( descendantFontDictionary );
if (descendentFont.cmap == null) {
  descendentFont.cmap = this.cmap;
}

I don't even know if this makes sense. Is the descendant font supposed to "inherit" the encoding from the parent font? This "fixed" the visible errors, but the output I get is still garbled. It's supposed to be a text in traditional Chinese. Can anyone with more PDF knowledge take a look at this?
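The "inherit only when missing" idea from the snippet above can be tried in isolation. This is a minimal sketch with stand-in classes; Font, CMap and inheritIfMissing are hypothetical names and do not correspond to the real PDFBox types or fields.

```java
// Stand-alone illustration of the fallback described above; none of these
// classes are PDFBox classes.
final class CMapInheritanceSketch {

    /** Stand-in for a parsed CMap; only its name is kept. */
    static final class CMap {
        final String name;
        CMap(String name) { this.name = name; }
    }

    /** Stand-in for a font that may or may not carry its own CMap. */
    static final class Font {
        CMap cmap; // null when the font dictionary supplied no encoding
        Font(CMap cmap) { this.cmap = cmap; }
    }

    /** Copies the parent's CMap to the descendant only if it has none. */
    static void inheritIfMissing(Font parent, Font descendant) {
        if (descendant.cmap == null) {
            descendant.cmap = parent.cmap;
        }
    }

    public static void main(String[] args) {
        Font parent = new Font(new CMap("WinCharSetFFFF-H"));
        Font descendant = new Font(null);
        inheritIfMissing(parent, descendant);
        System.out.println(descendant.cmap.name);
    }
}
```

A descendant that already has its own CMap is left untouched, so the fallback only changes behaviour in the case that previously ended in a NullPointerException. Whether the inherited mapping is the correct one for decoding the text is a separate question, which may explain the garbled output.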
                

        

[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Tim Böhler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500245#comment-13500245 ] 

Tim Böhler commented on PDFBOX-940:
-----------------------------------

I can confirm that this is still not fixed in 1.7.1:
2012-11-19 14:56:14,036 ERROR [scheduler_Worker-10] [pdfbox.pdmodel.font.PDCIDFont] determineEncoding Error: Could not parse predefined CMAP file for 'Adobe--UCS2'
                

[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Alex Wajda (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509848#comment-13509848 ] 

Alex Wajda commented on PDFBOX-940:
-----------------------------------

We use 1.6.0 and the issue is there:

ERROR org.apache.pdfbox.pdmodel.font.PDCIDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'
                

[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Herm (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412691#comment-13412691 ] 

Herm  commented on PDFBOX-940:
------------------------------

Error still there in 1.6.0. Issue still open and unresolved.  When is a fix planned for this issue?
                

        

[jira] Commented: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002025#comment-13002025 ] 

Andreas Lehmkühler commented on PDFBOX-940:
-------------------------------------------

Please update to the newest version of PDFBox and try again. We added some text extraction improvements, including some fixes for the handling of CMaps. Attach a sample PDF if it still doesn't work.
