Posted to users@pdfbox.apache.org by Nathan Artz <na...@gmail.com> on 2017/06/19 23:47:59 UTC

Text Bounding Boxes and Reflow

Hi -

My ultimate goal is to properly replace text and 'reflow' it. I know this
isn't handled out of the box, but there are a few subproblems I am trying to
solve first that would help me toward this goal. Also, my text reflowing does
not need to be perfect! It just needs to 'kind of' work. I am doing my best to
replicate what Acrobat does when you click 'Edit' text, specifically:

A. In Acrobat, when a user clicks 'Edit' PDF, it seems that contiguous
areas of text are put together into a single editable text box, and text
reflows within this text box. Does PDFBox (or another library you would
suggest) have heuristics or other ideas on how to create these
paragraph-like bounding boxes?

B. Acrobat, for instance, will reflow and refit text placed inside a Form
Text Field. Does PDFBox support form text fields (which presumably would
fit/reflow the text inside them), and can it 'flatten' them?

I'm really looking to explore what higher-level libraries exist that could
help toward this goal. I'm trying to avoid having to, say, read in all the
quads of the text runs and come up with heuristics myself that determine
what is and isn't "contiguous" text.

Thanks!

Nate

Re: Text Bounding Boxes and Reflow

Posted by Tilman Hausherr <TH...@t-online.de>.
PDFTextStripper has heuristics for paragraphs. This issue
https://issues.apache.org/jira/browse/PDFBOX-3804
has test files and a parameter to change.
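
To make the idea of a paragraph heuristic concrete, here is a minimal
sketch in plain Java. It is not PDFTextStripper's code. It assumes you
already have one bounding box per text line, sorted top to bottom, with y
growing downward, and it starts a new paragraph whenever the vertical step
between two lines exceeds a multiple of the line height. The names
ParagraphBoxes, Box, and group are hypothetical. PDFTextStripper exposes
setters such as setDropThreshold and setIndentThreshold for its own
heuristics; whether either is the parameter meant in the issue above is
not stated here.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: merge per-line bounding boxes into paragraph boxes by vertical gap. */
final class ParagraphBoxes {

    /** Axis-aligned box; assumption: y grows downward and (x, y) is the top-left corner. */
    static final class Box {
        final double x, y, width, height;

        Box(double x, double y, double width, double height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        /** Smallest box that covers both this box and the other one. */
        Box union(Box other) {
            double left = Math.min(x, other.x);
            double top = Math.min(y, other.y);
            double right = Math.max(x + width, other.x + other.width);
            double bottom = Math.max(y + height, other.y + other.height);
            return new Box(left, top, right - left, bottom - top);
        }
    }

    /**
     * Groups lines (sorted top to bottom) into paragraph boxes. A new paragraph
     * starts when the distance between the tops of two consecutive lines is
     * larger than dropThreshold times the taller of the two line heights.
     */
    static List<Box> group(List<Box> lines, double dropThreshold) {
        List<Box> paragraphs = new ArrayList<>();
        Box current = null;   // box of the paragraph being built
        Box previous = null;  // last line seen
        for (Box line : lines) {
            boolean startsNew = previous == null
                    || (line.y - previous.y) > dropThreshold * Math.max(line.height, previous.height);
            if (startsNew) {
                if (current != null) {
                    paragraphs.add(current);
                }
                current = line;
            } else {
                current = current.union(line);
            }
            previous = line;
        }
        if (current != null) {
            paragraphs.add(current);
        }
        return paragraphs;
    }
}
```

The line boxes themselves would have to come from somewhere else, for
example from the TextPosition objects a PDFTextStripper subclass receives.
A single vertical-gap rule like this ignores columns and indentation, so it
is only a starting point.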

Yes, it does support form fields and flattening, but that is a different
problem from the first one. Start with AppearanceGeneratorHelper.java and
search from there...
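
As an illustration of what "reflowing text inside a box" involves, the
following is a minimal greedy word-wrap sketch in plain Java. It is not
what AppearanceGeneratorHelper does internally; the class Reflow, the
method wrap, and the measure callback are hypothetical names. With PDFBox,
the width of a string could be obtained from something like
PDFont.getStringWidth(text) / 1000 * fontSize, which is assumed here to be
supplied by the caller. In PDFBox itself, setting the field value and then
calling PDAcroForm.flatten() is the route to try first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Sketch: greedy line breaking of text into a box of fixed width. */
final class Reflow {

    /**
     * Splits text into lines no wider than boxWidth, as reported by measure.
     * A single word wider than the box is placed on its own line and overflows.
     */
    static List<String> wrap(String text, double boxWidth, ToDoubleFunction<String> measure) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            String candidate = line.length() == 0 ? word : line + " " + word;
            if (line.length() > 0 && measure.applyAsDouble(candidate) > boxWidth) {
                // The word does not fit: close the current line, start a new one.
                lines.add(line.toString());
                line = new StringBuilder(word);
            } else {
                line = new StringBuilder(candidate);
            }
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

Each returned line would then be drawn at its own baseline, stepping down
by the leading, until the box height is used up. Greedy breaking is the
simplest choice; it does not hyphenate or balance line lengths.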

Tilman

