Posted to users@pdfbox.apache.org by Candace Bain <ca...@gmail.com> on 2014/02/24 20:26:35 UTC

Text extract/replace using PDFBox

I'm using PDFBox to programmatically create a PDF file by finding and
replacing text in a template PDF file.  The template was created by someone
in a different department.  This worked correctly with the template for a
previous version of our software, but the replacement fails for some new
text we've added for the next version.

The problem seems to be that the text in the previous version of the file
uses the Imago font with ANSI encoding, whereas the text added to the newer
version of the file uses the Imago font with Identity-H encoding.
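Why the encoding matters for find-and-replace can be sketched with plain byte arrays. With a single-byte encoding such as WinAnsiEncoding, plain ASCII letters are stored with their ASCII codes, so a byte search for the placeholder succeeds. With Identity-H, each glyph is stored as a 2-byte code tied to the embedded font, so the same search finds nothing. This is a minimal illustration only, not PDFBox code: the class name, the placeholder "NAME", and the glyph-ID table are all hypothetical (real glyph IDs depend on the embedded font subset).

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class EncodingSearchDemo {

    // Naive byte search: returns the index of needle in haystack, or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    // Encodes text as 2-byte big-endian codes using a HYPOTHETICAL glyph-ID
    // table, standing in for the font-specific codes used with Identity-H.
    static byte[] toTwoByteCodes(String text, Map<Character, Integer> glyphIds) {
        byte[] out = new byte[text.length() * 2];
        for (int i = 0; i < text.length(); i++) {
            int gid = glyphIds.get(text.charAt(i));
            out[2 * i] = (byte) (gid >> 8);
            out[2 * i + 1] = (byte) gid;
        }
        return out;
    }

    public static void main(String[] args) {
        String placeholder = "NAME"; // hypothetical placeholder text
        byte[] needle = placeholder.getBytes(StandardCharsets.US_ASCII);

        // Single-byte case: for plain ASCII letters, WinAnsiEncoding codes equal ASCII codes.
        byte[] singleByte = placeholder.getBytes(StandardCharsets.US_ASCII);

        // Identity-H case: 2-byte codes from a made-up glyph-ID table.
        Map<Character, Integer> hypotheticalGids = Map.of('N', 17, 'A', 3, 'M', 16, 'E', 7);
        byte[] twoByte = toTwoByteCodes(placeholder, hypotheticalGids);

        System.out.println("single-byte match at: " + indexOf(singleByte, needle)); // 0
        System.out.println("Identity-H match at:  " + indexOf(twoByte, needle));    // -1
    }
}
```

The point is only that a byte-level search for ASCII text cannot see text stored as font-specific 2-byte codes; mapping those codes back to characters requires the font's own data.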

This file was created with Adobe InDesign, and I am not familiar enough
with the product to know how to ensure that the fonts in the exported PDF
file use only ANSI encoding.  Is that possible?  Alternatively, is it
possible to process the template with another application so that it uses
a font with ANSI encoding?

I've attached the template file that is causing the problem in case it is
useful.

Best regards,

Candace

Re: Text extract/replace using PDFBox

Posted by Candace Bain <ca...@gmail.com>.
OK, our designer figured this out.  Apparently, if the letter f is followed
by the letter i, the string is not stored as ASCII text.  This is an
artifact of typesetting (the "fi" ligature):

http://en.wikipedia.org/wiki/Typographic_ligature

When I change the PDF template so that the strings I need to replace do not
contain "fi", they are saved as ASCII text and I'm able to replace them
programmatically.
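The ligature effect can be seen at the character level in plain Java. In Unicode the "fi" ligature is the single character U+FB01, which is not ASCII, so a search for the two letters "f" and "i" will not match it. NFKC normalization (java.text.Normalizer) decomposes it back into "fi". This is a minimal sketch, assuming you are comparing text that already contains the ligature character; it does not change how the PDF itself is encoded, which is why editing the template (as described above) was the actual fix. The class and method names are hypothetical.

```java
import java.text.Normalizer;

public class LigatureCheck {

    // Returns true if every char in s is 7-bit ASCII.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }

    // NFKC normalization replaces compatibility characters such as
    // U+FB01 LATIN SMALL LIGATURE FI with the plain letters "fi".
    static String expandLigatures(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String typeset = "Con\uFB01rmation"; // "Confirmation" written with the fi ligature
        System.out.println(typeset.length());     // 11: the ligature is one character
        System.out.println(isAscii(typeset));     // false
        System.out.println(typeset.contains("fi")); // false

        String plain = expandLigatures(typeset);
        System.out.println(plain);                // Confirmation
        System.out.println(isAscii(plain));       // true
    }
}
```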

That's the first issue I've ever looked at that had to do with printing
presses.

Thanks for the PDFBox library, it's very useful!

Best regards,

Candace