Posted to dev@pdfbox.apache.org by "Webrtc Go (JIRA)" <ji...@apache.org> on 2014/05/19 10:10:37 UTC
[jira] [Updated] (PDFBOX-2083) Some characters overlap other characters, font changed

     [ https://issues.apache.org/jira/browse/PDFBOX-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Webrtc Go updated PDFBOX-2083:
------------------------------

    Attachment: vgsdmuuhd5ak03orqudq10.jpg
                technical-guide.pdf

The JPEG file is from page 11 of the PDF file.

> Some characters overlap other characters, font changed
> ------------------------------------------------------
>
>                 Key: PDFBOX-2083
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2083
>             Project: PDFBox
>          Issue Type: Bug
>         Environment: Windows 8
>            Reporter: Webrtc Go
>         Attachments: technical-guide.pdf, vgsdmuuhd5ak03orqudq10.jpg
>
>
> Hi, please forgive my English.
> I tried to convert a PDF file to images, using PDFBox 1.8.4 within tika-app-1.5.jar.
> The JPEG files I got were not what I expected.
> The content of the images was different from the PDF file.
> Some characters were in different places, and some characters overlapped others.
> The console also printed many warnings like these:
> '13:49:07,094 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New Italic> to the default font 
> 13:49:07,094 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New Italic> to the default font 
> 13:49:07,095 WARN [PDSimpleFont:107] Changing font on <y> from <Courier New Italic> to the default font 
> 13:49:07,095 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New Italic> to the default font 
> ...'
> Could you tell me how to solve this problem, so that the images match the PDF?
> Thanks a lot.
> I attached the PDF file and one of the images.
> And here is my code:
> PDDocument doc = PDDocument.load(input + ".pdf");
> List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
> for (int i = 0; i < pages.size(); i++) {
>     PDPage page = pages.get(i);
>     // Render the page with the default image type and resolution.
>     BufferedImage image = page.convertToImage();
>     Iterator<ImageWriter> iter = ImageIO.getImageWritersBySuffix("JPG");
>     ImageWriter writer = iter.next();
>     File outFile = new File(input + i + ".jpg");
>     FileOutputStream out = new FileOutputStream(outFile);
>     ImageOutputStream outImage = ImageIO.createImageOutputStream(out);
>     writer.setOutput(outImage);
>     writer.write(new IIOImage(image, null, null));
>     writer.dispose();
>     // Close the image stream first so buffered data reaches the file.
>     outImage.close();
>     out.close();
> }
> doc.close();
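
The loop in the quoted code writes each rendered page through an ImageWriter. A minimal, self-contained sketch of that write step, with every stream closed even if encoding fails, might look like the following. It uses only the standard javax.imageio API. The class name JpegWriteSketch and the helper writeJpeg are hypothetical, and a blank BufferedImage stands in for the result of page.convertToImage(). It only covers stream handling and is not expected to change the font substitution warnings.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

public class JpegWriteSketch {

    // Hypothetical helper: writes one image as a JPEG file and releases
    // the stream and the writer even if encoding throws.
    static void writeJpeg(BufferedImage image, File outFile) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersBySuffix("jpg");
        if (!writers.hasNext()) {
            throw new IOException("No JPEG ImageWriter is registered");
        }
        ImageWriter writer = writers.next();
        // try-with-resources closes the ImageOutputStream, which flushes
        // any buffered data to the file.
        try (ImageOutputStream outImage = ImageIO.createImageOutputStream(outFile)) {
            writer.setOutput(outImage);
            writer.write(new IIOImage(image, null, null));
        } finally {
            writer.dispose();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the image returned by page.convertToImage().
        BufferedImage image = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        File outFile = File.createTempFile("page", ".jpg");
        writeJpeg(image, outFile);
        System.out.println("Wrote " + outFile.length() + " bytes to " + outFile);
    }
}
```

In the original loop, writeJpeg(image, new File(input + i + ".jpg")) would replace the lines from the ImageWriter lookup through out.close().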



--
This message was sent by Atlassian JIRA
(v6.2#6252)