Posted to users@pdfbox.apache.org by Kevin Ternes <KT...@thegeneral.com> on 2016/04/28 19:39:25 UTC

Editing Text again

So I have a bunch of source PDFs that I use PDFBox 2.0.0 to fill out and sometimes edit.
Specifically, for certain business cases I remove or update the text "(signed by Named Insured)".
I edit using a method similar to the one in this Stack Overflow question: https://stackoverflow.com/questions/35420609/pdfbox-2-0-rc3-find-and-replace-text

However, if a PDF has been edited in _Acrobat_ and the text changed to, for example, "(Signed by Named Insured.)" (capital S, period added), the method can no longer find the target text, even if I make the corresponding changes in my method call.

Using PDFDebugger, I see that this:
    0.699 0.676 0.639 0.747 k
    /TT1 8 Tf
    0.539 -10.877 Td
    (\(signed by Named Insured\)) Tj
    0.698 0.675 0.639 0.74 k
    /TT1 9.96 Tf
    -0.87 -27.115 Td

Has been changed to this:
    0.699 0.676 0.639 0.747 k
    /TT1 8 Tf
    0.539 -10.877 Td
    (\() Tj
    /C2_2 8 Tf
    (\0006) Tj
    /TT1 8 Tf
    1 0 0 1 113.02 381.017 Tm
    (igned by Named Insured) Tj
    /C2_2 8 Tf
    87.164 0 Td
    (\000\021) Tj
    /TT1 8 Tf
    (\)) Tj
    0.698 0.675 0.639 0.74 k
    /TT1 9.96 Tf
    -96.573 -27.115 Td
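And it is obvious why the method will no longer work: the text is now split across five Tj operators, and the "S" and the "." are shown with a second font (/C2_2) using two-byte codes, so the phrase never appears as a single string.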
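
To make the mismatch concrete, here is a minimal sketch in plain Java (no PDFBox) that pulls the literal strings out of the two snippets above and decodes their escapes. The class and method names are made up for illustration, and the parser is deliberately simplified: it treats every "(...)" as a Tj operand and does not handle line continuations or hex strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper, not part of PDFBox: lists the literal strings in a content-stream snippet. */
class TjOperands {

    // The two snippets from the post, reduced to the lines that matter.
    static final String BEFORE =
            "/TT1 8 Tf\n0.539 -10.877 Td\n(\\(signed by Named Insured\\)) Tj\n";
    static final String AFTER =
            "(\\() Tj\n/C2_2 8 Tf\n(\\0006) Tj\n/TT1 8 Tf\n"
          + "1 0 0 1 113.02 381.017 Tm\n(igned by Named Insured) Tj\n"
          + "/C2_2 8 Tf\n87.164 0 Td\n(\\000\\021) Tj\n/TT1 8 Tf\n(\\)) Tj\n";

    /** Returns every "(...)" literal string in stream order, escapes decoded to raw byte values. */
    static List<String> literalStrings(String content) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            if (content.charAt(i) != '(') { i++; continue; }
            StringBuilder sb = new StringBuilder();
            int depth = 1;
            i++; // skip the opening parenthesis
            while (i < content.length() && depth > 0) {
                char c = content.charAt(i++);
                if (c == '\\' && i < content.length()) {
                    char e = content.charAt(i++);
                    if (e >= '0' && e <= '7') {
                        // Octal escape: one to three digits form a single byte.
                        int v = e - '0';
                        int digits = 1;
                        while (digits < 3 && i < content.length()
                                && content.charAt(i) >= '0' && content.charAt(i) <= '7') {
                            v = v * 8 + (content.charAt(i++) - '0');
                            digits++;
                        }
                        sb.append((char) (v & 0xFF));
                    } else {
                        // Named escapes; anything else (e.g. \( \) \\) stands for itself.
                        int k = "nrtbf".indexOf(e);
                        sb.append(k >= 0 ? "\n\r\t\b\f".charAt(k) : e);
                    }
                } else if (c == '(') {
                    depth++;
                    sb.append(c);
                } else if (c == ')') {
                    depth--;
                    if (depth > 0) sb.append(c);
                } else {
                    sb.append(c);
                }
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Print each operand of the edited stream as hex bytes.
        for (String s : literalStrings(AFTER)) {
            StringBuilder hex = new StringBuilder();
            for (char c : s.toCharArray()) hex.append(String.format("%02X ", (int) c));
            System.out.println(hex.toString().trim());
        }
    }
}
```

For the original stream this yields one operand, "(signed by Named Insured)". For the edited stream it yields five, and the two /C2_2 operands are byte pairs (00 36 and 00 11) rather than the ASCII codes for "S" and ".", so a search for the visible text cannot match any single operand or their plain concatenation.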

Has anyone any suggestions on how to programmatically deal with this?
Or is there a setting in Acrobat that I can use to tell it to stop doing this crap?!


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Editing Text again

Posted by Tilman Hausherr <TH...@t-online.de>.
On 28.04.2016 at 19:39, Kevin Ternes wrote:
> So I have a bunch of source PDFs that I use PDFBox 2.0.0 to fill out and sometimes edit.
> Specifically, for certain business cases I remove or update the text "(signed by Named Insured)".
> I edit using a method similar to the one in this Stack Overflow question: https://stackoverflow.com/questions/35420609/pdfbox-2-0-rc3-find-and-replace-text

See https://pdfbox.apache.org/2.0/migration.html "Why was the 
ReplaceText example removed?"

What you could do instead is to draw a blank rectangle over the old text and
put your new text on top. However, the old text would still be in the page
content, so text extraction would still return it.
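
As a rough sketch of what such an overlay amounts to at the content-stream level, the plain-Java snippet below builds the operators for a white filled rectangle followed by the new text. Everything in it is an assumption for illustration: the class and method names, the font resource name, and the box size derived from the font size. With PDFBox one would normally write these through PDPageContentStream opened in append mode (addRect, fill, beginText, setFont, showText) rather than by hand, and the font used must actually contain the glyphs for the new text.

```java
import java.util.Locale;

/** Illustrative only: builds the operators for "white box, then new text on top". */
class OverlayOps {

    /** Escapes backslashes and parentheses so the text is a valid PDF literal string. */
    static String literal(String text) {
        return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")";
    }

    /**
     * Covers one line of text whose baseline starts at (x, baselineY) and redraws it.
     * The box extends a quarter of the font size below the baseline (a rough guess
     * at the descent) and is 1.25 font sizes tall.
     */
    static String coverAndRewrite(float x, float baselineY, float width,
                                  String fontName, float fontSize, String text) {
        float boxY = baselineY - 0.25f * fontSize;
        float boxH = 1.25f * fontSize;
        return String.format(Locale.ROOT,
                "q\n"                                  // save graphics state
              + "1 g\n"                                // white fill colour (DeviceGray)
              + "%.3f %.3f %.3f %.3f re\nf\n"          // rectangle path, then fill
              + "Q\n"                                  // restore graphics state
              + "BT\n"
              + "0 g\n"                                // black text
              + "/%s %.2f Tf\n"                        // font resource name and size
              + "1 0 0 1 %.3f %.3f Tm\n"               // absolute text position
              + "%s Tj\n"
              + "ET\n",
                x, boxY, width, boxH, fontName, fontSize, x, baselineY, literal(text));
    }

    public static void main(String[] args) {
        // "F1" is a hypothetical font resource name; it must exist in the page resources.
        System.out.print(coverAndRewrite(100f, 380f, 90f, "F1", 8f,
                "(Signed by Named Insured.)"));
    }
}
```

The q/Q pair keeps the white fill colour from leaking into whatever is drawn afterwards, and the absolute Tm avoids depending on the text position left behind by the existing stream.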

Tilman



