Posted to users@pdfbox.apache.org by David Hoffer <dh...@gmail.com> on 2009/05/25 07:08:54 UTC

Re: PDFTextStripper option to control table text extraction?

> I am converting PDF to text using PDFTextStripper. A few of my tables
> have columns with more rows than others, and I need each value in the
> output to keep its position within its column. Currently, wherever a
> table cell is empty the stripper emits nothing for it, so the remaining
> text shifts and the converted data no longer lines up with the original
> table. Is there a formatting option that will retain the original table
> layout?
>
> My current options are:
>
> stripper.setSortByPosition(true);
> stripper.setLineSeparator("\n");
> stripper.setPageSeparator("\n");
>
> -Dave
>
>
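
One way to keep column alignment, independent of any stripper option, is to
lay the text out yourself from its page coordinates. The sketch below is not
PDFBox API: ColumnLayout, Fragment, charWidth and rowTolerance are
hypothetical names made up for illustration. It assumes you have already
collected each piece of text with its x/y position (for example from the
TextPosition objects seen by a PDFTextStripper subclass), that y increases
down the page, and that a fixed average character width is a good enough
approximation for the font in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ColumnLayout {

    /** Hypothetical holder for one extracted piece of text and its page position. */
    public static final class Fragment {
        final float x;
        final float y;
        final String text;

        public Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /**
     * Renders fragments onto a character grid so that text starting at the
     * same x position starts at the same character column on every line.
     *
     * charWidth    assumed average glyph width, in page units
     * rowTolerance assumed row height; y values that round to the same
     *              multiple of it are treated as one row
     */
    public static String render(List<Fragment> fragments, float charWidth, float rowTolerance) {
        // Bucket fragments into rows; TreeMap keeps rows ordered top to bottom.
        TreeMap<Integer, List<Fragment>> rows = new TreeMap<>();
        for (Fragment f : fragments) {
            int rowKey = Math.round(f.y / rowTolerance);
            rows.computeIfAbsent(rowKey, k -> new ArrayList<>()).add(f);
        }

        StringBuilder out = new StringBuilder();
        for (List<Fragment> row : rows.values()) {
            // Left to right within a row.
            row.sort((a, b) -> Float.compare(a.x, b.x));

            StringBuilder line = new StringBuilder();
            for (Fragment f : row) {
                int col = Math.round(f.x / charWidth);
                // If the previous text overran this column, keep one space between them.
                if (line.length() > 0 && col <= line.length()) {
                    col = line.length() + 1;
                }
                // Pad with spaces; this is what preserves empty cells.
                while (line.length() < col) {
                    line.append(' ');
                }
                line.append(f.text);
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }
}
```

With a row that has values at x=0 and x=60 and a following row that only has
a value at x=60, the second line is padded with spaces so the lone value
still starts in the same character column as the one above it. The rounding
used for row bucketing is crude: fragments that sit near a bucket boundary
can be split across two lines, so rowTolerance needs tuning per document.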