Posted to dev@pdfbox.apache.org by "Sergey Makarov (JIRA)" <ji...@apache.org> on 2019/05/15 15:37:00 UTC

[jira] [Updated] (PDFBOX-4549) No Unicode mapping

     [ https://issues.apache.org/jira/browse/PDFBOX-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Makarov updated PDFBOX-4549:
-----------------------------------
    Description: 
Hello, when I try to extract text from the attached PDF, I get empty output and many warnings. The font is attached as well.
Acrobat Reader opens the file successfully, and I can select and copy text there.

My code:
{code:java}
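import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

private static void parseOne(String path) throws IOException {
    File file = new File(path);
    PDFTextStripper tStripper = new PDFTextStripper();
    // Mixed memory/temp-file buffering; temp files go to the given directory.
    MemoryUsageSetting memUsageSetting = MemoryUsageSetting.setupMixed(0, 500000000).setTempDir(new File("/home/user/pdfBoxTest/newFiles/"));
    // try-with-resources closes the document even if getText() throws.
    try (PDDocument document = PDDocument.load(file, memUsageSetting)) {
        if (!document.isEncrypted()) {
            String pdfFileInText = tStripper.getText(document);
            System.out.print(pdfFileInText);
        }
    }
}{code}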
Warnings logged:
{code:java}
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+83 (83) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+116 (116) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+97 (97) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+114 (114) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+87 (87) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
WARNING: No Unicode mapping for CID+115 (115) in font HPDFAA+XOThames
May 15, 2019 6:30:01 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNING: Invalid ToUnicode CMap in font HPDFAB+DejaVuSansMono,Book
{code}
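One observation about the warnings above, offered as a guess rather than a diagnosis: every CID reported falls in the printable ASCII range. A minimal sketch, assuming each CID can be read directly as a character code (the class name CidLogDecoder and its decode method are hypothetical helpers, not part of PDFBox), pulls the numbers out of the log lines and prints them as characters:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CidLogDecoder {
    // Matches the "CID+<n>" part of the "No Unicode mapping" warning lines.
    private static final Pattern CID =
            Pattern.compile("No Unicode mapping for CID\\+(\\d+)");

    /**
     * Reads every CID found in the given log lines as a Unicode code point.
     * Treating a CID as a character code is an assumption made for this sketch.
     */
    public static String decode(List<String> logLines) {
        StringBuilder sb = new StringBuilder();
        for (String line : logLines) {
            Matcher m = CID.matcher(line);
            if (m.find()) {
                sb.appendCodePoint(Integer.parseInt(m.group(1)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The six CID warnings from the log above, in order.
        List<String> warnings = Arrays.asList(
            "WARNING: No Unicode mapping for CID+83 (83) in font HPDFAA+XOThames",
            "WARNING: No Unicode mapping for CID+116 (116) in font HPDFAA+XOThames",
            "WARNING: No Unicode mapping for CID+97 (97) in font HPDFAA+XOThames",
            "WARNING: No Unicode mapping for CID+114 (114) in font HPDFAA+XOThames",
            "WARNING: No Unicode mapping for CID+87 (87) in font HPDFAA+XOThames",
            "WARNING: No Unicode mapping for CID+115 (115) in font HPDFAA+XOThames");
        System.out.println(decode(warnings)); // prints "StarWs"
    }
}
```

For the six warnings in the log this yields StarWs, which is at least consistent with the attachment name our_star_wars.pdf. It does not show how the font actually encodes its glyphs, and it says nothing about why the ToUnicode CMap is rejected.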
 


> No Unicode mapping
> ------------------
>
>                 Key: PDFBOX-4549
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4549
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Sergey Makarov
>            Priority: Major
>         Attachments: XO_Thames.zip, our_star_wars.pdf
>
>


