Posted to users@pdfbox.apache.org by Martin Obreshkov <ma...@gmail.com> on 2009/05/29 16:48:48 UTC

extract text problem

Hi, I want to extract text from a PDF file (a book) and then index the book's
content. When I extract the text, there are no newlines, tabs, etc. How
can I extract text from a PDF and keep the original formatting (mainly
newlines and tabs)?

-- 
When I raise my flashing sword, and my hand takes hold on judgment, I will
take vengeance upon mine enemies, and I will repay those who haze me. Oh,
Lord, raise me to Thy right hand and count me among Thy saints.

Re: extract text problem

Posted by Andreas Lehmkühler <an...@lehmi.de>.

Hi Martin,

Which version of PDFBox are you using? Have you tried the -sort option
of the ExtractText command-line tool?
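
To illustrate why sorting can help here: PDF content streams do not have to store text in reading order, so an extractor that emits fragments as it encounters them may lose line structure. The sketch below is not PDFBox code. It uses a hypothetical Fragment type and a hypothetical layout() helper to show the general idea of ordering text by position and breaking lines when the vertical position changes. It assumes y grows downward from the top of the page. In the PDFBox API itself, the programmatic counterpart of the -sort option is, to my knowledge, PDFTextStripper.setSortByPosition(true).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortByPositionSketch {

    /** A run of text plus the page coordinates where it starts (hypothetical type, not a PDFBox class). */
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups fragments into lines by vertical position, orders each line
     * left to right, and joins the lines with newline characters.
     * Fragments whose y differs from the first fragment of the current
     * line by at most lineTolerance are treated as the same line.
     */
    static String layout(List<Fragment> fragments, double lineTolerance) {
        // Sort a copy top to bottom so the caller's list is left untouched.
        List<Fragment> byY = new ArrayList<>(fragments);
        byY.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        StringBuilder out = new StringBuilder();
        List<Fragment> line = new ArrayList<>();
        for (Fragment f : byY) {
            // A large enough vertical jump starts a new output line.
            if (!line.isEmpty() && f.y - line.get(0).y > lineTolerance) {
                appendLine(out, line);
                line.clear();
            }
            line.add(f);
        }
        if (!line.isEmpty()) {
            appendLine(out, line);
        }
        return out.toString();
    }

    /** Writes one line: fragments ordered left to right, separated by single spaces. */
    private static void appendLine(StringBuilder out, List<Fragment> line) {
        line.sort(Comparator.comparingDouble((Fragment f) -> f.x));
        for (int i = 0; i < line.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(line.get(i).text);
        }
        out.append('\n');
    }

    public static void main(String[] args) {
        // Fragments deliberately listed out of reading order.
        List<Fragment> fragments = Arrays.asList(
                new Fragment("second line", 10, 30),
                new Fragment("line", 40, 10),
                new Fragment("First", 10, 10));
        System.out.print(layout(fragments, 2.0));
    }
}
```

A real extractor also has to deal with columns, rotated text, and varying font sizes, so the fixed tolerance used here is only a simplification.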

Andreas Lehmkühler

Martin Obreshkov wrote:
> Hi, I want to extract text from a PDF file (a book) and then index the book's
> content. When I extract the text, there are no newlines, tabs, etc. How
> can I extract text from a PDF and keep the original formatting (mainly
> newlines and tabs)?
>