Posted to dev@pdfbox.apache.org by "krishna (JIRA)" <ji...@apache.org> on 2011/01/12 13:09:46 UTC
[jira] Created: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not
parse predefined CMAP file for 'PDFXC-Indentity0-0'
[pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'
-------------------------------------------------------------------------------------------
Key: PDFBOX-940
URL: https://issues.apache.org/jira/browse/PDFBOX-940
Project: PDFBox
Issue Type: Bug
Affects Versions: 1.4.0
Environment: Tomcat 6.0.18, windows server 2003, pdfbox-1.4.0.jar
Reporter: krishna
Hi,
When I try to upload a PDF document, the following error is logged in Tomcat. I am using pdfbox-1.4.0.jar.
17:29:33,465 ERROR [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Please help me find a solution.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Kevin Clark (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118852#comment-13118852 ]
Kevin Clark commented on PDFBOX-940:
------------------------------------
I'm getting this via the Tika 0.10 release, which uses PDFBox 1.6.0.
2011-10-01 16:48:27,586 (55308987) [Parser-thread-2] ERROR org.apache.pdfbox.pdmodel.font.PDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-0'
Can't upload the pdf for privacy reasons, unfortunately.
> [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'
> -------------------------------------------------------------------------------------------
>
> Key: PDFBOX-940
> URL: https://issues.apache.org/jira/browse/PDFBOX-940
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.4.0
> Environment: Tomcat 6.0.18, windows server 2003, pdfbox-1.4.0.jar
> Reporter: krishna
> Attachments: gen_preview1.png, oob_pdf.pdf, pdf fonts.JPG, pdf fonts1.JPG, pdf fonts2.JPG, pdf properties1.JPG, pdf properties2.JPG, pdf properties3.JPG
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Hi,
> when i am trying to upload a pdf document the following error is thrown in the tomcat.. i am using pdfbox-1.4.0.jar..
> 17:29:33,465 ERROR [pdmodel.font.PDFont] Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'
> please find the solution
[jira] Updated: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not
parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "krishna (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
krishna updated PDFBOX-940:
---------------------------
Attachment: pdf fonts2.JPG
pdf fonts1.JPG
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Juhasz Istvan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055117#comment-13055117 ]
Juhasz Istvan commented on PDFBOX-940:
--------------------------------------
SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS'
revision 1139575 (1.6.0-SNAPSHOT)
(pdf - embedded truetype (cid) font with encoding identity-h)
pdf>java -jar pdfbox-app-1.6.0-SNAPSHOT.jar ExtractText -debug a015.pdf a015.txt
Loading PDF a015.pdf
Time for loading: 0.062 seconds
Starting text extraction
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
INFO: cidSystemInfo: Adobe-UCS-0
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
INFO: resourceName: org/apache/pdfbox/resources/cmap/Adobe-Identity-UCS
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS'
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
INFO: cidSystemInfo: Adobe-UCS-0
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
INFO: resourceName: org/apache/pdfbox/resources/cmap/Adobe-Identity-UCS
2011.06.26. 18:38:39 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS'
Time for extraction: 0.984 seconds
[jira] Commented: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Gabriel Gravel (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007162#comment-13007162 ]
Gabriel Gravel commented on PDFBOX-940:
---------------------------------------
I have a similar problem. I used to have the following error when extracting text from a batch of PDF files using pdfbox 1.3.1:
ERROR 10 Mar 2011 00:22:44.038 [org.apache.pdfbox.pdmodel.font.PDFont] line:285 - Error: Could not parse predefined CMAP file for 'Adobe-UCS-0'
After reading the comments here, I have upgraded to 1.5.0 and am now having the following error:
ERROR 15 Mar 2011 14:31:10.195 [org.apache.pdfbox.pdmodel.font.PDCIDFont] line:324 - Error: Could not parse predefined CMAP file for 'Adobe-UCS-UCS2'
However, I still seem to be able to extract the text correctly from the file. Should I be worried about this error or can I ignore it altogether? Here's a link to one of the problematic files: http://cbpp-pcpe.phac-aspc.gc.ca/intervention_pdf/en/72.pdf
Thanks for your time
[jira] [Issue Comment Edited] (PDFBOX-940) [pdmodel.font.PDFont]
Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Antoni Mylka (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126212#comment-13126212 ]
Antoni Mylka edited comment on PDFBOX-940 at 10/12/11 10:31 PM:
----------------------------------------------------------------
I stumbled upon the same problem, on a confidential file. In the process I think I found an issue: PDFBOX-1137.
I'm not a PDF expert, but in that file, I have the following PDF objects:
24 0 obj
<</Type/Font/Subtype/Type0/BaseFont/TT491A9C96tCID/Encoding 18 0 R/DescendantFonts[22 0 R]>>
endobj
22 0 obj
<</Subtype/CIDFontType2/CIDSystemInfo 23 0 R/BaseFont/XJXBKC+TT491A9C96tCID/Type/Font/Name/R22/FontDescriptor 21 0 R/DW 1000
/W[691[259]
724[677
626
626]
737[677]]/CIDToGIDMap/Identity
>>
endobj
18 0 obj
<</Type/CMap/Name/R18/WMode 0/CMapName/WinCharSetFFFF-H/CIDSystemInfo<<
/Registry(Adobe)
/Ordering(WinCharSetFFFF)
/Supplement 0
>>
/Filter/FlateDecode/Length 19 0 R>>stream
(the binary content of the stream omitted for readability)
endstream
endobj
So there is an embedded CMAP for WinCharSetFFFF-H, a parent font which refers to the embedded CMAP as its encoding, and a child font with no encoding. Applying the PDFBOX-1137 patch allowed the CMAP to be parsed.
Then, in the PDType0Font constructor, just after the descendant font is constructed, I added an if that makes it "inherit" the cmap from the parent font when it has none of its own. This fixed the NPEs during text extraction, which happened because the cmap was missing:
    descendentFont = PDFontFactory.createFont( descendantFontDictionary );
    // fall back to the parent (Type0) font's cmap if the descendant has none
    if (descendentFont.cmap == null) {
        descendentFont.cmap = this.cmap;
    }
I don't even know if this makes sense. Is the descendant font supposed to "inherit" the encoding from the parent font? This "fixed" the visible errors, but the output I get is still garbled. It's supposed to be a text in traditional Chinese. Can anyone with more PDF knowledge take a look at this?
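The names in the error messages appear to follow the Registry-Ordering-Supplement pattern of the CIDSystemInfo dictionary: the object above (Adobe, WinCharSetFFFF, 0) lines up with the 'Adobe-WinCharSetFFFF-0' reported elsewhere in this thread. A minimal sketch of that composition, assuming this is how the lookup name is built; the class and method names are hypothetical and are not PDFBox API:

```java
final class CidSystemInfoName {
    // Joins the three CIDSystemInfo entries into the name seen in the log messages.
    // Assumption: the lookup key is "<Registry>-<Ordering>-<Supplement>".
    public static String compose(String registry, String ordering, int supplement) {
        return registry + "-" + ordering + "-" + supplement;
    }

    public static void main(String[] args) {
        // Values taken from object 18 0 quoted above.
        System.out.println(compose("Adobe", "WinCharSetFFFF", 0));
    }
}
```

If that assumption holds, the name in the error says which CIDSystemInfo the font declares, not which CMap stream is embedded in the file.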
[jira] Updated: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not
parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "krishna (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
krishna updated PDFBOX-940:
---------------------------
Attachment: pdf properties3.JPG
pdf properties2.JPG
pdf properties1.JPG
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "krishna (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011963#comment-13011963 ]
krishna commented on PDFBOX-940:
--------------------------------
Hi Andreas,
For security reasons, I can't upload the document.
The error "Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'" was resolved in version 1.5.0, but the error "Could not parse predefined CMAP file for 'Adobe-UCS-UCS2'" is still present there.
Please check this error.
Thanks,
Murali
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Arjohn Kampman (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128901#comment-13128901 ]
Arjohn Kampman commented on PDFBOX-940:
---------------------------------------
I'm also seeing "Could not parse predefined CMAP file for 'Adobe-Identity-UCS'" error message on some files. Debugging this error with the current trunk (r1184806), I noticed that PDCIDFont.determineEncoding() starts with a cidSystemInfo value "Adobe-UCS-0" and replaces this with "Adobe-Identity-UCS" in the else-if-statement. This triggers the error message because there is no such cmap file.
However, considering that the first if-statement maps any cidSystemInfo values containing "Identity" to "Identity-H", I'm wondering: should "Adobe-UCS-0" be mapped to "Identity-H" rather than "Adobe-Identity-UCS"?
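The mapping described above, and the alternative being asked about, can be sketched as follows. This is a reconstruction from the description in this comment, not a copy of PDCIDFont.determineEncoding(); the class and method names are hypothetical, and the startsWith("Adobe-UCS-") condition is an assumption about what the else-if tests.

```java
final class CMapNameSketch {
    // Behaviour as described in the comment: names containing "Identity" are
    // mapped first, then "Adobe-UCS-0" is rewritten to "Adobe-Identity-UCS".
    public static String describedMapping(String cidSystemInfo) {
        if (cidSystemInfo.contains("Identity")) {
            return "Identity-H";
        } else if (cidSystemInfo.startsWith("Adobe-UCS-")) { // assumed condition
            return "Adobe-Identity-UCS"; // per the comment, no such cmap file exists
        }
        return cidSystemInfo; // other cases are not modelled in this sketch
    }

    // The alternative raised in the question: treat Adobe-UCS-* like "Identity".
    public static String proposedMapping(String cidSystemInfo) {
        if (cidSystemInfo.contains("Identity") || cidSystemInfo.startsWith("Adobe-UCS-")) {
            return "Identity-H";
        }
        return cidSystemInfo;
    }

    public static void main(String[] args) {
        System.out.println(describedMapping("Adobe-UCS-0"));
        System.out.println(proposedMapping("Adobe-UCS-0"));
    }
}
```

Whether "Identity-H" is the right target for "Adobe-UCS-0" is exactly the open question here; the sketch only makes the two options concrete.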
[jira] Commented: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Lars Torunski (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001325#comment-13001325 ]
Lars Torunski commented on PDFBOX-940:
--------------------------------------
Similar problem: 2011-03-02 08:30:24,126 [PWS-Index-Thread-569] ERROR org.apache.pdfbox.pdmodel.font.PDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-0'
Both PDFXC-Indentity0-0 and Adobe-WinCharSetFFFF-0 aren't available in org/apache/pdfbox/resources/cmap/
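One way to confirm which CMap names are actually bundled is to ask the class loader for the resource. A minimal sketch using only the JDK; the class and method names are hypothetical, the directory is the one quoted above, and the result depends on which PDFBox jar (if any) is visible to the class loader:

```java
final class CMapResourceCheck {
    // Resource directory as quoted in this comment.
    static final String CMAP_DIR = "org/apache/pdfbox/resources/cmap/";

    // Builds the classpath location a predefined CMap would be loaded from.
    public static String resourcePath(String cmapName) {
        return CMAP_DIR + cmapName;
    }

    // True if the class loader of this class can see the given resource.
    public static boolean isOnClasspath(String resourcePath) {
        ClassLoader loader = CMapResourceCheck.class.getClassLoader();
        // getResource returns null when no such resource is visible
        return loader != null && loader.getResource(resourcePath) != null;
    }

    public static void main(String[] args) {
        String[] names = { "PDFXC-Indentity0-0", "Adobe-WinCharSetFFFF-0", "Identity-H" };
        for (String name : names) {
            System.out.println(name + " -> " + isOnClasspath(resourcePath(name)));
        }
    }
}
```

Inside a servlet container such as Tomcat, the class doing the lookup needs to live in the same web application as the PDFBox jar, otherwise its class loader may not see the resources.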
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Lars Torunski (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427180#comment-13427180 ]
Lars Torunski commented on PDFBOX-940:
--------------------------------------
My problem with Adobe-WinCharSetFFFF-UCS2 still exists in version 1.7.1:
2012-08-02 10:19:25,771 [PWS-Index-Thread-41] ERROR org.apache.pdfbox.pdmodel.font.PDCIDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'
[jira] [Issue Comment Edited] (PDFBOX-940) [pdmodel.font.PDFont]
Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Che-wei Kuo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016284#comment-13016284 ]
Che-wei Kuo edited comment on PDFBOX-940 at 4/21/11 1:20 AM:
-------------------------------------------------------------
Dear all,
@Andreas Lehmkühler
Thank you for dealing with the original issue.
However, there are still some errors after building revision 1088324.
The error message is now:
"org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS' "
I'm not sure if it was the same issue.
Thanks.
Best Regards
was (Author: gjwei):
Dear all,
@Andreas Lehmkühler
Thank you for dealing with the original issue.
However, there are still some errors after building revision 1088324.
The error message is now:
"org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
Could not parse predefined CMAP file for 'Adobe-UCS-UCS2' "
I'm not sure if it was the same issue.
Thanks.
Best Regards
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "MH (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142997#comment-13142997 ]
MH commented on PDFBOX-940:
---------------------------
Same error message here:
SEVERE Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS'
when the PDF has under or over content (e.g. a watermark). Without such content, the error does not appear. (PDFBox 1.6.0)
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "krishna (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022618#comment-13022618 ]
krishna commented on PDFBOX-940:
--------------------------------
Hi
The 'PDFXC-Indentity0-0' error was fixed in 1.5.0, but the 'Adobe-WinCharSetFFFF-UCS2' error is present in 1.5.0.
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Lars Torunski (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011991#comment-13011991 ]
Lars Torunski commented on PDFBOX-940:
--------------------------------------
With 1.5.0 the error
2011-03-28 10:38:20,207 [PWS-Index-Thread-35] ERROR org.apache.pdfbox.pdmodel.font.PDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-0'
doesn't occur anymore. But we are getting
2011-03-28 11:52:51,162 [PWS-Index-Thread-44] ERROR org.apache.pdfbox.pdmodel.font.PDCIDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'
with the same pdf file now.
I'm not allowed to attach the pdf here, but I can send you the pdf by email.
[jira] [Issue Comment Edited] (PDFBOX-940) [pdmodel.font.PDFont]
Error: Could not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Henrique Nunes (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013531#comment-13013531 ]
Henrique Nunes edited comment on PDFBOX-940 at 3/30/11 5:03 PM:
----------------------------------------------------------------
Hi. I'm having the same problem:
30/Mar/2011 17:15:10 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'
I'm using pdfbox-app-1.5.0.jar with Jython 2.5.2 on Windows 7 64bit
No problems when on Ubuntu 10.
UPDATE: I built pdfbox-app-1.6.0-SNAPSHOT from the latest sources and the issue persists.
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "ECI (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209259#comment-13209259 ]
ECI commented on PDFBOX-940:
----------------------------
I can reproduce the issue with PDFBox 1.4 as used by Apache Tika. My log file shows:
2012-01-25 16:47:03 ERROR 127.0.0.1 [PDFont:285] - Error: Could not parse predefined CMAP file for 'Adobe-UCS-0'
This happens with some PDF documents only.
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Lars Torunski (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011852#comment-13011852 ]
Lars Torunski commented on PDFBOX-940:
--------------------------------------
Currently I can test the 1.5.0 version only.
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Che-wei Kuo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022588#comment-13022588 ]
Che-wei Kuo commented on PDFBOX-940:
------------------------------------
Sorry, I got the wrong error message.
It should be this one:
"org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
Error: Could not parse predefined CMAP file for 'Adobe-Identity-UCS' "
[jira] [Updated] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Henrique Nunes (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Henrique Nunes updated PDFBOX-940:
----------------------------------
Attachment: gen_preview1.png
oob_pdf.pdf
These are the files relevant for my comment below.
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Henrique Nunes (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013531#comment-13013531 ]
Henrique Nunes commented on PDFBOX-940:
---------------------------------------
Hi. I'm having the same problem:
30/Mar/2011 17:15:10 org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
SEVERE: Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'
I'm using pdfbox-app-1.5.0.jar
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Joscha Feth (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054303#comment-13054303 ]
Joscha Feth commented on PDFBOX-940:
------------------------------------
font.PDFont: Error: Could not parse predefined CMAP file for 'Adobe-UCS-0'
still appearing in 1.5.0
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011654#comment-13011654 ]
Andreas Lehmkühler commented on PDFBOX-940:
-------------------------------------------
I solved the issue with Gabriel's PDF in revision 1085755.
@Lars, krishna
Do you still have the described problem using the current trunk version? If the problem still persists, can you provide us with a sample pdf?
[jira] Updated: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could not
parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "krishna (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
krishna updated PDFBOX-940:
---------------------------
Attachment: pdf fonts.JPG
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Lars Torunski (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055163#comment-13055163 ]
Lars Torunski commented on PDFBOX-940:
--------------------------------------
Can we close this issue for PDFXC-Indentity0-0 and Adobe-WinCharSetFFFF-0?
And create a new one for "UCS" with Adobe-Identity-UCS, Adobe-WinCharSetFFFF-UCS2, Adobe-UCS-0, Adobe-UCS-UCS2 etc.?
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Che-wei Kuo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016284#comment-13016284 ]
Che-wei Kuo commented on PDFBOX-940:
------------------------------------
Dear all,
@Andreas Lehmkühler
Thank you for dealing with the original issue.
However, there are still some errors after I built it from revision 1088324.
The error message is now:
"org.apache.pdfbox.pdmodel.font.PDCIDFont determineEncoding
Could not parse predefined CMAP file for 'Adobe-UCS-UCS2' "
I'm not sure if it was the same issue.
Thanks.
Best Regards
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Antoni Mylka (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126212#comment-13126212 ]
Antoni Mylka commented on PDFBOX-940:
-------------------------------------
I stumbled upon the same problem, on a confidential file. In the process I think I found an issue: PDFBOX-1137.
I'm not a PDF expert, but in that file, I have the following PDF objects:
24 0 obj
<</Type/Font/Subtype/Type0/BaseFont/TT491A9C96tCID/Encoding 18 0 R/DescendantFonts[22 0 R]>>
endobj
22 0 obj
<</Subtype/CIDFontType2/CIDSystemInfo 23 0 R/BaseFont/XJXBKC+TT491A9C96tCID/Type/Font/Name/R22/FontDescriptor 21 0 R/DW 1000
/W[691[259]
724[677
626
626]
737[677]]/CIDToGIDMap/Identity
>>
endobj
18 0 obj
<</Type/CMap/Name/R18/WMode 0/CMapName/WinCharSetFFFF-H/CIDSystemInfo<<
/Registry(Adobe)
/Ordering(WinCharSetFFFF)
/Supplement 0
>>
/Filter/FlateDecode/Length 19 0 R>>stream
endstream
endobj
So there is an embedded CMAP for WinCharSetFFFF-H, a parent font which refers to the embedded CMAP as its encoding, and a child font with no encoding. Applying the PDFBOX-1137 patch allowed the CMAP to be parsed.
Then, in the PDType0Font constructor, I added an if statement just after the descendant font is constructed, making it "inherit" the cmap from the parent font. This fixed the NPEs during text extraction, which happened because the cmap was missing:
    descendentFont = PDFontFactory.createFont( descendantFontDictionary );
    if (descendentFont.cmap == null) {
        descendentFont.cmap = this.cmap;
    }
I don't even know if this makes sense. Is the descendant font supposed to "inherit" the encoding from the parent font? This "fixed" the visible errors, but the output I get is still garbled. It's supposed to be a text in traditional Chinese. Can anyone with more PDF knowledge take a look at this?
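For readers who want to see the fallback in isolation, here is a minimal, self-contained sketch of the idea described above. It does not use PDFBox: SketchCMap, SketchFont and SketchType0Font are hypothetical stand-ins invented for illustration, and the sketch only shows the "fill the gap if the descendant has no cmap" logic. It says nothing about whether this inheritance is correct PDF behaviour.

```java
// Hypothetical stand-ins for illustration only; these are NOT PDFBox classes.
final class SketchCMap {
    private final String name;

    SketchCMap(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class SketchFont {
    // May be null when the font dictionary carries no encoding of its own.
    protected SketchCMap cmap;

    SketchFont(SketchCMap cmap) {
        this.cmap = cmap;
    }

    SketchCMap getCMap() {
        return cmap;
    }
}

final class SketchType0Font extends SketchFont {
    private final SketchFont descendantFont;

    SketchType0Font(SketchCMap parentCMap, SketchFont descendantFont) {
        super(parentCMap);
        this.descendantFont = descendantFont;
        // The fallback from the comment above: only fill a missing cmap,
        // never overwrite one the descendant already has.
        if (descendantFont.cmap == null) {
            descendantFont.cmap = this.cmap;
        }
    }

    SketchFont getDescendantFont() {
        return descendantFont;
    }
}

class CMapFallbackSketch {
    public static void main(String[] args) {
        // Child font without an encoding, parent with an embedded CMap.
        SketchFont child = new SketchFont(null);
        SketchType0Font parent =
                new SketchType0Font(new SketchCMap("WinCharSetFFFF-H"), child);
        System.out.println(parent.getDescendantFont().getCMap().getName());
    }
}
```

The null check keeps the change conservative: a descendant font that already resolved its own cmap is left untouched, so only the previously failing case changes behaviour.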
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Tim Böhler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500245#comment-13500245 ]
Tim Böhler commented on PDFBOX-940:
-----------------------------------
I confirm that this is still not fixed in 1.7.1:
2012-11-19 14:56:14,036 ERROR [scheduler_Worker-10] [pdfbox.pdmodel.font.PDCIDFont] determineEncoding Error: Could not parse predefined CMAP file for 'Adobe--UCS2'
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Alex Wajda (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509848#comment-13509848 ]
Alex Wajda commented on PDFBOX-940:
-----------------------------------
We use 1.6.0 and the issue is there:
ERROR org.apache.pdfbox.pdmodel.font.PDCIDFont - Error: Could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'
[jira] [Commented] (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Herm (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412691#comment-13412691 ]
Herm commented on PDFBOX-940:
------------------------------
Error still there in 1.6.0. Issue still open and unresolved. When is a fix planned for this issue?
[jira] Commented: (PDFBOX-940) [pdmodel.font.PDFont] Error: Could
not parse predefined CMAP file for 'PDFXC-Indentity0-0'
Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002025#comment-13002025 ]
Andreas Lehmkühler commented on PDFBOX-940:
-------------------------------------------
Please update to the newest version of PDFBox and try again. We added some text extraction improvements, including some fixes for the handling of CMaps. Attach a sample PDF if it still doesn't work.