Posted to users@pdfbox.apache.org by Gilad Denneboom <gi...@gmail.com> on 2013/07/23 14:02:34 UTC

Re: Issue with PDF - Image conversion

I'm now encountering the same issue myself, ironically... Any ideas on
possible ways to solve this issue when the fonts are not fully embedded?
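
One way to confirm whether a given PDF is affected is to look at its font names: per the PDF specification, a subset font's name carries a tag of exactly six uppercase letters followed by a plus sign (for example, ABCDEF+Arial). Below is a minimal sketch of that check in plain Java. The class and method names are made up for illustration and are not part of PDFBox, and how the font names are read from the document is deliberately left out.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a PDFBox class.
public class SubsetFontCheck {
    // Subset tag: exactly six uppercase letters, then '+', then the real font name.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+.+");

    /** Returns true if the base font name looks like a subset, e.g. "ABCDEF+Arial". */
    public static boolean isSubsetFontName(String baseFontName) {
        return baseFontName != null && SUBSET_TAG.matcher(baseFontName).matches();
    }

    /** Strips the seven-character subset tag if present; otherwise returns the name unchanged. */
    public static String stripSubsetTag(String baseFontName) {
        return isSubsetFontName(baseFontName) ? baseFontName.substring(7) : baseFontName;
    }

    public static void main(String[] args) {
        // Sample names only; real names would come from the PDF's font dictionaries.
        String[] samples = {"ABCDEF+Arial", "Helvetica"};
        for (String name : samples) {
            System.out.println(name + " -> subset=" + isSubsetFontName(name)
                    + ", name=" + stripSubsetTag(name));
        }
    }
}
```

A name that passes this check only says the font was subsetted when the file was generated; it does not by itself prove that the rendering problem comes from that font.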


On Tue, Jun 18, 2013 at 5:03 PM, Gilad Denneboom
<gi...@gmail.com> wrote:

> This is not related to PDFBox... It's about how you're generating the
> files (in InDesign, from the document properties).
>
>
> On Tue, Jun 18, 2013 at 4:50 PM, Robin Thomas Panicker <ro...@qburst.com> wrote:
>
>> Thanks Gilad, can you please provide me some more insight on that... maybe
>> a code snippet or some reference or pointer or something?
>>
>> Regards,
>> Robin
>>
>>
>>
>> On Tue, Jun 18, 2013 at 6:10 PM, Gilad Denneboom
>> <gi...@gmail.com> wrote:
>>
>> > Seems like it might be a fonts issue... Try embedding the full font
>> > instead of just the subset when generating the file.
>> >
>> >
>> > On Tue, Jun 18, 2013 at 2:30 PM, Robin Thomas Panicker
>> > <robin@qburst.com> wrote:
>> >
>> > > Sorry about that Gilad.
>> > > I have uploaded the files here:
>> > > https://www.dropbox.com/sh/ujrgmh47zku0zm9/h8z_4SR3Aw
>> > >
>> > > Hope this helps,
>> > >
>> > > Thanks,
>> > > Robin
>> > >
>> > >
>> > >
>> > > On Tue, Jun 18, 2013 at 5:41 PM, Gilad Denneboom
>> > > <gi...@gmail.com> wrote:
>> > >
>> > > > I'm not seeing any attachments... It's possible the mailing list
>> > > > doesn't allow them. You can upload them to some file-sharing site
>> > > > and post the links here.
>> > > >
>> > > >
>> > > > On Tue, Jun 18, 2013 at 7:38 AM, Robin Thomas Panicker
>> > > > <robin@qburst.com> wrote:
>> > > >
>> > > > > Thanks a lot Gilad and Andreas,
>> > > > > I was out of town last week and hence could not reply.
>> > > > >
>> > > > > I have attached the sample PDF and the image generated (only for
>> > > > > the first page).
>> > > > >
>> > > > > If you compare the original PDF and the converted image, the words
>> > > > > "The pressures" and "The solution" are not rendered correctly in
>> > > > > the converted image. The rest of the image looks fine.
>> > > > >
>> > > > > I have also attached some very crude Java code that does the
>> > > > > standalone task of converting this PDF into an image.
>> > > > >
>> > > > > Can you please let me know what could possibly be causing the
>> > > > > image issue?
>> > > > >
>> > > > > Thanks,
>> > > > > Robin
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, Jun 11, 2013 at 5:37 PM, Andreas Lehmkuehler
>> > > > > <andreas@lehmi.de> wrote:
>> > > > >
>> > > > >> Hi,
>> > > > >>
>> > > > >> Am 10.06.2013 11:15, schrieb Robin Thomas Panicker:
>> > > > >>
>> > > > >>> Thanks a lot Gilad, for responding. I was not sure what more
>> > > > >>> information to provide. Now that you have asked for specific
>> > > > >>> details, let me provide you with more information.
>> > > > >>>
>> > > > >>> I am using the code below to convert a PDF to an image (trying
>> > > > >>> to save the first page of the PDF as an image file):
>> > > > >>>
>> > > > >>>   String pdfFile = "d:/hs/4.pdf";
>> > > > >>>   document = PDDocument.load( pdfFile );
>> > > > >>>
>> > > > >>>   List pages = document.getDocumentCatalog().getAllPages();
>> > > > >>>   PDPage page = ( PDPage ) pages.get( 0 );
>> > > > >>>   int width = ( int ) page.getArtBox().getWidth();
>> > > > >>>   int height = ( int ) page.getArtBox().getHeight();
>> > > > >>>   BufferedImage image = page.convertToImage( imageType, resolution );
>> > > > >>>
>> > > > >>>
>> > > > >>> On the machine (prod server) where the conversion DOES NOT work, I
>> > > > >>> have Ubuntu 12.04 and OpenOffice 3.0, while on the machine
>> > > > >>> (development machine) where the conversion works, I have
>> > > > >>> Ubuntu 10.10 and OpenOffice 3.0.
>> > > > >>>
>> > > > >>> Both machines run the same code, and the PDFBox version on both
>> > > > >>> is 1.8.1.
>> > > > >>>
>> > > > >>> The issue I face is that the image conversion simply doesn't work
>> > > > >>> correctly (I can see parts of the image / text garbled or
>> > > > >>> missing). There is no error or warning in the log output.
>> > > > >>>
>> > > > >>> Please let me know if I can provide any more information to help
>> > > > >>> in understanding the problem.
>> > > > >>>
>> > > > >> Without a sample pdf this is just a guess:
>> > > > >>
>> > > > >> The fact that you are using OpenOffice 3.0 leads to the assumption
>> > > > >> that the PDF in question contains fonts as embedded subsets. Those
>> > > > >> are not fully supported by PDFBox. There are different issues with
>> > > > >> those kinds of fonts.
>> > > > >> As you are using different platforms (Ubuntu 10.10 vs 12.04) you
>> > > > >> are most likely using different versions of the JDK (1.6 vs 1.7).
>> > > > >> There are some 1.7-specific issues with embedded font subsets.
>> > > > >>
>> > > > >>
>> > > > >>> Thanks,
>> > > > >>> Robin
>> > > > >>>
>> > > > >>>
>> > > > >>>
>> > > > >>> On Mon, Jun 10, 2013 at 2:25 PM, Gilad Denneboom
>> > > > >>> <gi...@gmail.com> wrote:
>> > > > >>>
>> > > > >>>> A lot of information is missing there... How are you converting
>> > > > >>>> the PDF files, exactly? What type of problems do you encounter?
>> > > > >>>> Which version of PDFBox do you use? And what does it have to do
>> > > > >>>> with your Office suite?
>> > > > >>>>
>> > > > >>>> Without more information it's impossible to help you with your
>> > > > >>>> problem.
>> > > > >>>>
>> > > > >>>>
>> > > > >>>> On Mon, Jun 10, 2013 at 8:22 AM, Robin Thomas Panicker
>> > > > >>>> <robin@qburst.com> wrote:
>> > > > >>>>
>> > > > >>>>> Hi,
>> > > > >>>>> I am using PDFBox to convert PDF documents into images. However,
>> > > > >>>>> on some machines I am facing an issue: the conversion does not
>> > > > >>>>> happen correctly. I can see missing text / images etc.
>> > > > >>>>>
>> > > > >>>>> Please note that this happens only on a few machines. I use
>> > > > >>>>> Ubuntu and OpenOffice. I have tried a variety of combinations of
>> > > > >>>>> different versions of Ubuntu and OpenOffice (and even
>> > > > >>>>> LibreOffice).
>> > > > >>>>>
>> > > > >>>>> However, I am unable to find out why it does not work on some
>> > > > >>>>> machines.
>> > > > >>>>>
>> > > > >>>>> Can anyone please help?
>> > > > >>>>>
>> > > > >>>>> Thanks,
>> > > > >>>>> Robin
>> > > > >>>>>
>> > > > >>>>
>> > > > >> BR
>> > > > >> Andreas Lehmkühler
>> > > > >>
>> > > > >>
>> > > > >
>> > > > >
>> > > > > --
>> > > > >
>> > > > > Robin Panicker,
>> > > > > Q*Burst*
>> > > > > www.qburst.com
>> > > > > Skype: Robin.at.qburst
>> > > > >
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > >
>> > > Robin Panicker,
>> > > Q*Burst*
>> > > www.qburst.com
>> > > Skype: Robin.at.qburst
>> > >
>> >
>>
>>
>>
>> --
>>
>> Robin Panicker,
>> Q*Burst*
>> www.qburst.com
>> Skype: Robin.at.qburst
>>
>
>

Re: Issue with PDF - Image conversion

Posted by Gilad Denneboom <gi...@gmail.com>.
Thanks for the update!


On Tue, Jul 23, 2013 at 5:54 PM, Andreas Lehmkuehler <an...@lehmi.de> wrote:

> Hi,
>
> Am 23.07.2013 14:02, schrieb Gilad Denneboom:
>
>> I'm now encountering the same issue myself, ironically... Any ideas on
>> possible ways to solve this issue when the fonts are not fully embedded?
>>
> That's on my TODO list and I already have a working local version (for TTF
> [1] and CFF [2] fonts). There are still some minor issues and I have to
> clean up the code, but I guess I'll commit it in a couple of days.
>
> That said, both improvements will be part of PDFBox 2.0. No, we don't have
> any schedule for the new major version, and I'm afraid I won't backport it
> to 1.8.x as there are a lot of API changes.
>
> BR
> Andreas Lehmkühler
>
> [1] https://issues.apache.org/jira/browse/PDFBOX-490
> [2] https://issues.apache.org/jira/browse/PDFBOX-1608
>
>

Re: Issue with PDF - Image conversion

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

Am 23.07.2013 14:02, schrieb Gilad Denneboom:
> I'm now encountering the same issue myself, ironically... Any ideas on
> possible ways to solve this issue when the fonts are not fully embedded?
That's on my TODO list and I already have a working local version (for TTF [1]
and CFF [2] fonts). There are still some minor issues and I have to clean up
the code, but I guess I'll commit it in a couple of days.

That said, both improvements will be part of PDFBox 2.0. No, we don't have
any schedule for the new major version, and I'm afraid I won't backport it
to 1.8.x as there are a lot of API changes.

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-490
[2] https://issues.apache.org/jira/browse/PDFBOX-1608
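
Since the rendering differed between two machines running the same code and the same PDFBox version, and a JDK difference (1.6 vs 1.7) was suggested as a likely cause, it may help to record what each JVM reports about itself before comparing output. The following is a small sketch using only standard system properties. The class name is hypothetical and the list of properties is just one reasonable choice, not something prescribed by PDFBox.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper; uses only standard JVM system properties.
public class JvmReport {
    /** Collects JVM and OS properties that are likely to differ between machines. */
    public static Map<String, String> collect() {
        String[] keys = {
            "java.version", "java.vendor", "java.home",
            "os.name", "os.version", "java.awt.headless"
        };
        // Explicit type arguments keep this compilable on JDK 1.6 as well.
        Map<String, String> report = new LinkedHashMap<String, String>();
        for (String key : keys) {
            // Properties that are not set are reported as "(not set)".
            report.put(key, System.getProperty(key, "(not set)"));
        }
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Running it on both machines and diffing the output shows at a glance whether the JDK versions actually differ, which is the assumption made earlier in the thread.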