Posted to users@pdfbox.apache.org by Tilen Bobek <ti...@gmail.com> on 2009/03/17 19:30:36 UTC

Converting PDF to image

Hello everyone!

I tried to convert each page from a PDF document to BufferedImage and store
each image to disk.

Steps:

- I downloaded PDFBox from svn, built it with ant, and created a jar from the
classes (with Resources added to the jar)
- I'm using the jar in NetBeans
- I tried to call the convertToImage() method on a PDPage instance and got an
exception that a class could not be found, so I downloaded FontBox-1.0.1.jar and
added it to the NetBeans project
- the following code snippet still throws an exception saying that it can't
find a method in the CMapParser class from the FontBox-1.0.1.jar library

Code snippet from an example I wrote (important: the path stored in filePath
points to an existing PDF):


        try {

            // load PDF document
            PDDocument document = PDDocument.load(new File(filePath));

            // get all pages
            List<PDPage> pages = document.getDocumentCatalog().getAllPages();

            // for each page
            for (int i = 0; i < pages.size(); i++) {
                // single page
                PDPage singlePage = pages.get(i);

                // to BufferedImage
                // <-- THE EXCEPTION SHOWN BELOW IS THROWN FROM THIS CALL
                BufferedImage buffImage = singlePage.convertToImage();

                // write image to disk
                ImageIO.write(buffImage, "image/png",
                        new File("C:\\Users\\Funky\\Desktop\\page" + i + ".png"));
            }

        } catch (IOException ex) {
            ex.printStackTrace();
        }

The exception I get:

Exception in thread "main" java.lang.NoSuchMethodError: org.fontbox.cmap.CMapParser.parse(Ljava/lang/String;Ljava/io/InputStream;)Lorg/fontbox/cmap/CMap;
        at org.apache.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:513)
        at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:367)
        at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:325)
        at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:66)
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:491)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:214)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:173)
        at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:88)
        at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:669)
        at PDFBox.Example.<init>(Example.java:45)
        at PDFBox.Example.main(Example.java:63)

I looked in the PDFont.java source file and saw that the method
CMapParser.parse(...) takes two String arguments.
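
For reference, the descriptor in the error message reads as
parse(String, InputStream) returning org.fontbox.cmap.CMap. A
NoSuchMethodError of this kind usually means the calling code was compiled
against a different version of the library than the one found at runtime. A
minimal sketch for checking which jar a class is loaded from and which
overloads it really has is shown here. ClasspathProbe and its method names are
made up for this illustration and are not part of PDFBox or FontBox; only
standard reflection calls are used.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of PDFBox or FontBox.
class ClasspathProbe {

    // Loads the class without running its static initializers.
    private static Class<?> find(String className) throws ClassNotFoundException {
        return Class.forName(className, false, ClasspathProbe.class.getClassLoader());
    }

    // Returns the jar or directory the class was loaded from, or a marker
    // string when the class is missing or the JVM reports no code source
    // (as is the case for core JDK classes).
    static String locationOf(String className) {
        try {
            CodeSource source = find(className).getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(bootstrap/unknown)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(class not found)";
        }
    }

    // Returns the full signatures of all methods declared under the given
    // name, or an empty list when the class cannot be found.
    static List<String> overloadsOf(String className, String methodName) {
        List<String> signatures = new ArrayList<>();
        try {
            for (Method method : find(className).getDeclaredMethods()) {
                if (method.getName().equals(methodName)) {
                    signatures.add(method.toString());
                }
            }
        } catch (ClassNotFoundException e) {
            // Class is not on the classpath: report no overloads.
        }
        return signatures;
    }

    public static void main(String[] args) {
        // Defaults match the class and method named in the stack trace.
        String className = args.length > 0 ? args[0] : "org.fontbox.cmap.CMapParser";
        String methodName = args.length > 1 ? args[1] : "parse";
        System.out.println("Loaded from: " + locationOf(className));
        for (String signature : overloadsOf(className, methodName)) {
            System.out.println("  " + signature);
        }
    }
}
```

Run with the same classpath as the failing program. If no listed overload
takes (String, InputStream), the FontBox jar on the classpath probably does
not match the one the PDFBox classes were built against.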

What can I do to make the PDPage.convertToImage() function work properly?

Thanks for any help!

Regards.

Tilen

Re: Converting PDF to image

Posted by W <wi...@gmail.com>.
By the way, I have a problem when converting math formulas from PDF to image.
Is there any workaround for this?


Regards,
Wildan

On Thu, Mar 19, 2009 at 1:30 AM, Hilel New <hi...@gmail.com> wrote:
> try "png" instead of "image/png" as the second parameter to ImageIO.write()
> by the way, you don't have to download fontbox, it is in a directory named
> external in your pdfbox distribution.


-- 
---
OpenThink Labs
www.tobethink.com

Aligning IT and Education

>> 021-99325243
Y! : hawking_123
Linkedln : http://www.linkedin.com/in/wildanmaulana

Re: Converting PDF to image

Posted by Hilel New <hi...@gmail.com>.
Try "png" instead of "image/png" as the second parameter to ImageIO.write();
that parameter is an informal format name, not a MIME type.
By the way, you don't have to download FontBox; it is in a directory named
external in your PDFBox distribution.
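
To illustrate the format name point, here is a small sketch that uses only the
standard javax.imageio API, so it runs without PDFBox. FormatNameCheck and
tryWrite are names invented for this example. ImageIO.write() returns false,
without throwing, when no writer is registered for the given format name, so
the mistake is easy to miss unless the return value is checked.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration only.
class FormatNameCheck {

    // Writes a 1x1 test image to a temporary file using the given format
    // name and returns what ImageIO.write() reported.
    static boolean tryWrite(String formatName) {
        BufferedImage image = new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB);
        try {
            File target = File.createTempFile("page", ".img");
            target.deleteOnExit();
            // false means that no ImageWriter matched the format name
            return ImageIO.write(image, formatName, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("image/png -> " + tryWrite("image/png"));
        System.out.println("png       -> " + tryWrite("png"));
    }
}
```

On a standard JDK the first call should report false and the second true. In
the original loop it may be worth checking the value returned by
ImageIO.write() and logging when it is false.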

