Posted to dev@pdfbox.apache.org by Edson Alves Pereira <lo...@gmail.com> on 2012/01/05 14:43:37 UTC

Adding blank lines support to PDFTextStripper

Hello folks, I want to convert some PDFs to text files, but PDFBox doesn't
preserve blank lines: all lines are written without any spacing between them.
I'd like to add this feature to help with my task, and I'd appreciate it if
you could give me some pointers on where to start.

I know I must start with PDFTextStripper.writePage(), but I'm not sure which
variable I should use for the calculation.

Regards,
Edson

Re: Adding blank lines support to PDFTextStripper

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Jan 5, 2012 at 2:43 PM, Edson Alves Pereira <lo...@gmail.com> wrote:
> I know I must start with PDFTextStripper.writePage(), but I'm not sure which
> variable I should use for the calculation.

Check the handleLineSeparation() method in PDFTextStripper. You should
be able to calculate the distance between consecutive lines a bit like
is already being done in the isParagraphSeparation() method called by
handleLineSeparation().
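
As a rough illustration of that distance calculation, here is a minimal
standalone sketch. It does not call PDFBox at all: the class name
BlankLineEstimator, the methods blankLinesBetween() and join(), and the idea
of passing in a baseline Y coordinate per line plus a typical line height are
all assumptions made for the example, not part of the PDFTextStripper API.
Inside handleLineSeparation() the same values would have to come from
whatever position data the stripper has for the previous and current line.

```java
// Hypothetical helper, not part of PDFBox: estimates how many blank lines
// lie between two consecutive text lines from their vertical distance.
class BlankLineEstimator {

    // previousY / currentY: baseline Y positions of two consecutive lines.
    // lineHeight: assumed typical distance between adjacent baselines.
    static int blankLinesBetween(float previousY, float currentY, float lineHeight) {
        if (lineHeight <= 0f) {
            return 0; // no usable line height, so do not guess
        }
        float gap = Math.abs(currentY - previousY);
        // A gap of one line height means the lines are adjacent (0 blanks),
        // two line heights means one blank line in between, and so on.
        int lineSteps = Math.round(gap / lineHeight);
        return Math.max(0, lineSteps - 1);
    }

    // Joins lines with '\n', adding extra '\n' characters for estimated blanks.
    static String join(String[] lines, float[] ys, float lineHeight) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
                int blanks = blankLinesBetween(ys[i - 1], ys[i], lineHeight);
                for (int b = 0; b < blanks; b++) {
                    out.append('\n');
                }
            }
            out.append(lines[i]);
        }
        return out.toString();
    }
}
```

Rounding the ratio, rather than truncating it, keeps small differences in
leading from dropping or adding a blank line. A real patch would probably
need to derive the line height from the font size of the surrounding text
instead of taking it as a fixed parameter.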

BR,

Jukka Zitting