Posted to users@pdfbox.apache.org by Tim Allison <ta...@apache.org> on 2019/06/04 14:51:03 UTC

Extract actual /Table /TD /TR markup info?

All,
  I have some PDFs with actual /Table, /TR, /TD structure markup.

  How much effort would it be to extend PDFTextStripper to add, e.g.,
startTable(), endTable(), startTD(), endTD(), etc.?
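The kind of callback API asked about above could be sketched as a depth-first walk over a structure tree that fires start/end events around table elements. This is only a sketch under stated assumptions: the callback names come from the question, but `Node`, `TableHandler`, `walk`, `toMarkup` and `sample` are hypothetical stand-ins, not PDFBox API. In PDFBox the real tree would presumably be reached via `PDDocumentCatalog.getStructureTreeRoot()`, and a real `PDFTextStripper` extension would still have to interleave these events with the text it emits.

```java
import java.util.ArrayList;
import java.util.List;

public class TableEventsSketch {

    /** Hypothetical stand-in for a structure element: a type ("Table", "TR", "TD", "P", ...) plus children. */
    static final class Node {
        final String type;
        final String text; // leaf text, or null for container elements
        final List<Node> kids = new ArrayList<>();

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }

        Node add(Node kid) {
            kids.add(kid);
            return this;
        }
    }

    /** Hypothetical callbacks in the spirit of the proposed startTable()/endTable()/startTD()/endTD(). */
    static class TableHandler {
        void startTable() {}
        void endTable() {}
        void startTR() {}
        void endTR() {}
        void startTD() {}
        void endTD() {}
        void text(String s) {}
    }

    /** Depth-first walk; header cells (TH) are reported through the TD callbacks in this sketch. */
    static void walk(Node node, TableHandler h) {
        switch (node.type) {
            case "Table": h.startTable(); break;
            case "TR": h.startTR(); break;
            case "TD":
            case "TH": h.startTD(); break;
            default: break;
        }
        if (node.text != null) {
            h.text(node.text);
        }
        for (Node kid : node.kids) {
            walk(kid, h);
        }
        switch (node.type) {
            case "Table": h.endTable(); break;
            case "TR": h.endTR(); break;
            case "TD":
            case "TH": h.endTD(); break;
            default: break;
        }
    }

    /** Renders the events as HTML-like markup so the nesting is visible. */
    static String toMarkup(Node root) {
        StringBuilder out = new StringBuilder();
        walk(root, new TableHandler() {
            @Override void startTable() { out.append("<table>"); }
            @Override void endTable() { out.append("</table>"); }
            @Override void startTR() { out.append("<tr>"); }
            @Override void endTR() { out.append("</tr>"); }
            @Override void startTD() { out.append("<td>"); }
            @Override void endTD() { out.append("</td>"); }
            @Override void text(String s) { out.append(s); }
        });
        return out.toString();
    }

    static Node sample() {
        return new Node("Table", null)
            .add(new Node("TR", null).add(new Node("TH", "Name")).add(new Node("TH", "Qty")))
            .add(new Node("TR", null).add(new Node("TD", "apples")).add(new Node("TD", "3")));
    }

    public static void main(String[] args) {
        // prints <table><tr><td>Name</td><td>Qty</td></tr><tr><td>apples</td><td>3</td></tr></table>
        System.out.println(toMarkup(sample()));
    }
}
```

Using overridable no-op callbacks mirrors how PDFTextStripper is usually customised (subclass and override), so callers only implement the events they care about.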

  If I do have time to work on this (uncertain at this point), would
there be interest in putting this into PDFBox? Or am I missing somewhere
it already exists?

       Cheers,

                       Tim

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Extract actual /Table /TD /TR markup info?

Posted by Tilman Hausherr <TH...@t-online.de>.
There is certainly a need for it. Related questions are regularly asked
on Stack Overflow.

I don't know how much effort it would take. I did some work on the
structure tree, but only in an abstract way, so I never really
understood what the elements mean.
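For what the table elements mean: in the PDF standard structure types, a Table contains TR rows (optionally grouped in THead/TBody/TFoot), each TR contains TH/TD cells, and the actual text is linked from the tree through marked-content IDs (MCIDs) in the page content stream. The sketch below turns such a tree into rows of cell strings. `Elem`, `toGrid` and `leaf` are hypothetical; in PDFBox the kids would presumably come from `PDStructureElement.getKids()` and the per-MCID text from something like `PDFMarkedContentExtractor`. RowSpan/ColSpan attributes are ignored here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TableGridSketch {

    /** Hypothetical structure node: an element with a type and kids, or a leaf holding an MCID. */
    static final class Elem {
        final String type;  // e.g. "Table", "TBody", "TR", "TD", "P"; null for MCID leaves
        final Integer mcid; // marked-content ID for leaves, null otherwise
        final List<Elem> kids = new ArrayList<>();

        private Elem(String type, Integer mcid) {
            this.type = type;
            this.mcid = mcid;
        }

        static Elem of(String type, Elem... children) {
            Elem e = new Elem(type, null);
            for (Elem c : children) {
                e.kids.add(c);
            }
            return e;
        }

        static Elem leaf(int id) {
            return new Elem(null, id);
        }
    }

    /** Rows of cell strings; THead/TBody/TFoot are treated as transparent groupings of TR. */
    static List<List<String>> toGrid(Elem table, Map<Integer, String> textByMcid) {
        List<List<String>> rows = new ArrayList<>();
        collectRows(table, textByMcid, rows);
        return rows;
    }

    private static void collectRows(Elem e, Map<Integer, String> text, List<List<String>> rows) {
        for (Elem kid : e.kids) {
            if ("TR".equals(kid.type)) {
                List<String> cells = new ArrayList<>();
                for (Elem cell : kid.kids) {
                    if ("TD".equals(cell.type) || "TH".equals(cell.type)) {
                        StringBuilder sb = new StringBuilder();
                        appendText(cell, text, sb);
                        cells.add(sb.toString().trim());
                    }
                }
                rows.add(cells);
            } else if (kid.type != null) {
                collectRows(kid, text, rows); // descend into THead, TBody, TFoot
            }
        }
    }

    /** Concatenates the text of every MCID under e in depth-first order (cells may wrap P or Span). */
    private static void appendText(Elem e, Map<Integer, String> text, StringBuilder sb) {
        if (e.mcid != null) {
            sb.append(text.getOrDefault(e.mcid, "")).append(' ');
            return;
        }
        for (Elem kid : e.kids) {
            appendText(kid, text, sb);
        }
    }

    public static void main(String[] args) {
        Elem table = Elem.of("Table",
            Elem.of("THead", Elem.of("TR", Elem.of("TH", Elem.leaf(0)), Elem.of("TH", Elem.leaf(1)))),
            Elem.of("TBody", Elem.of("TR",
                Elem.of("TD", Elem.of("P", Elem.leaf(2), Elem.leaf(3))),
                Elem.of("TD", Elem.leaf(4)))));
        Map<Integer, String> text = Map.of(0, "Name", 1, "Qty", 2, "Red", 3, "apples", 4, "3");
        System.out.println(toGrid(table, text)); // [[Name, Qty], [Red apples, 3]]
    }
}
```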

Tilman


