Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/05/19 17:49:38 UTC
[jira] [Closed] (PDFBOX-2083) Some characters overlap other characters, font changed

     [ https://issues.apache.org/jira/browse/PDFBOX-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-2083.
-----------------------------------

    Resolution: Duplicate

As I suspected in my answer to you on the user list, it works fine in version 2.0. You can get a snapshot build here:
https://pdfbox.apache.org/downloads.html#scm
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/

How to render with 2.0 (the API is different from 1.8):
{code}
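// loadNonSeq() takes a File; null means no scratch file is supplied
PDDocument doc = PDDocument.loadNonSeq(new File(input + ".pdf"), null);
List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
// one renderer for the whole document, reused for every page
PDFRenderer renderer = new PDFRenderer(doc);
for (int i = 0; i < pages.size(); i++)
{
  // render page i at the requested resolution and save it as a JPEG
  BufferedImage bim = renderer.renderImageWithDPI(i, dpi, ImageType.RGB);
  ImageIOUtil.writeImage(bim, input + i + ".jpg", dpi);
}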
doc.close();
{code}
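
Independent of the PDFBox version, the hand-written ImageIO part of the reported code can be tightened a bit. The following is only a rough sketch using plain javax.imageio: the class name JpegWriterSketch, the helper writeJpeg and the quality value are made up for illustration and are not part of PDFBox. Unlike the ImageIOUtil call above, it takes no dpi argument.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

public class JpegWriterSketch
{
    // Hypothetical helper (not PDFBox API): writes an RGB image as a JPEG file.
    public static void writeJpeg(BufferedImage image, File outFile, float quality)
            throws IOException
    {
        Iterator<ImageWriter> iter = ImageIO.getImageWritersByFormatName("jpeg");
        if (!iter.hasNext())
        {
            throw new IOException("No JPEG writer available");
        }
        ImageWriter writer = iter.next();
        try
        {
            // ask for an explicit compression quality between 0.0 and 1.0
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);

            // try-with-resources closes both streams even if writing fails
            try (OutputStream fos = new FileOutputStream(outFile);
                 ImageOutputStream out = ImageIO.createImageOutputStream(fos))
            {
                writer.setOutput(out);
                writer.write(null, new IIOImage(image, null, null), param);
            }
        }
        finally
        {
            writer.dispose();
        }
    }

    public static void main(String[] args) throws IOException
    {
        // stand-in for a rendered page; a real caller would pass the page image
        BufferedImage image = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        File outFile = File.createTempFile("page", ".jpg");
        writeJpeg(image, outFile, 0.9f);
        System.out.println("Wrote " + outFile);
    }
}
```

In the reported loop, the body after the page has been rendered would shrink to a single call such as writeJpeg(image, new File(input + i + ".jpg"), 0.9f). This only affects how the file is written; it does not change the rendering itself.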

I'm closing this as a duplicate, but feel free to comment and/or reopen if you find a flaw in the rendering. You're also welcome to ask questions on the user list if something is unclear about the 2.0 API. Btw, note that I am using loadNonSeq() instead of load(). Good luck!

> Some characters overlap other characters, font changed
> ------------------------------------------------------
>
>                 Key: PDFBOX-2083
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2083
>             Project: PDFBox
>          Issue Type: Bug
>         Environment: Windows 8
>            Reporter: Webrtc Go
>         Attachments: technical-guide.pdf, vgsdmuuhd5ak03orqudq10.jpg
>
>
> Hi, please forgive my English first.
> I tried to convert a pdf file to images, using pdfbox 1.8.4 within tika-app-1.5.jar.
> The jpeg files I got were not ideal.
> The content in the images was different from the pdf file.
> Some characters were in different places, and some characters overlapped others.
> There were many lines of console information which read:
> '13:49:07,094 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New Italic> to the default font 
> 13:49:07,094 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New Italic> to the default font 
> 13:49:07,095 WARN [PDSimpleFont:107] Changing font on <y> from <Courier New Italic> to the default font 
> 13:49:07,095 WARN [PDSimpleFont:107] Changing font on <l> from <Courier New Italic> to the default font 
> ...'
> Could you give me some guidance on how to solve this problem and get correct images?
> Thanks a lot.
> I attached the pdf file and one of the images.
> And here is my code:
> PDDocument doc = PDDocument.load(input + ".pdf");
> List<PDPage> pages = doc.getDocumentCatalog().getAllPages(); 
> for (int i = 0; i < pages.size(); i++) { 
>     PDPage page = pages.get(i); 
>     BufferedImage image = page.convertToImage(); 
>     Iterator<ImageWriter> iter = ImageIO.getImageWritersBySuffix("JPG"); 
>     ImageWriter writer = iter.next(); 
>     File outFile = new File(input + i + ".jpg");
>     FileOutputStream out = new FileOutputStream(outFile); 
>     ImageOutputStream outImage = ImageIO.createImageOutputStream(out); 
>     writer.setOutput(outImage); 
>     writer.write(new IIOImage(image, null, null)); 
>     writer.dispose(); 
>     out.close(); 
> } 
> doc.close();



--
This message was sent by Atlassian JIRA
(v6.2#6252)