Posted to users@pdfbox.apache.org by Martin Obreshkov <ma...@gmail.com> on 2009/05/29 16:48:48 UTC
extract text problem
Hi, I want to extract text from a PDF file (a book) and then index the book's
content. When I extract the text, there are no new lines, tabs, etc. How
can I extract text from a PDF and keep the original formatting (mainly new
lines and tabs)?
--
When I raise my flashing sword, and my hand takes hold on judgment, I will
take vengeance upon mine enemies, and I will repay those who haze me. Oh,
Lord, raise me to Thy right hand and count me among Thy saints.
Re: extract text problem
Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi Martin,
What version of PDFBox are you using? Have you tried the -sort option
of the ExtractText command-line tool?
Andreas Lehmkühler
Martin Obreshkov wrote:
> Hi, I want to extract text from a PDF file (a book) and then index the book's
> content. When I extract the text, there are no new lines, tabs, etc. How
> can I extract text from a PDF and keep the original formatting (mainly new
> lines and tabs)?
>
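
A note on the sort option mentioned above: text in a PDF is not necessarily
stored in reading order, so an extractor that writes fragments in file order
can produce jumbled or run-together lines. Sorting by position reorders the
fragments by where they sit on the page before the text is written. When using
the API rather than the command-line tool, the corresponding switch should be
PDFTextStripper.setSortByPosition(true). The sketch below is NOT PDFBox code.
It is a simplified, self-contained illustration of the idea, and the class
PositionSortSketch, the Fragment type, the layout method and the lineTolerance
parameter are all hypothetical names made up for this example. It also assumes
that y grows downward on the page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; this is not how PDFBox is implemented.
class PositionSortSketch {

    // One piece of text plus the page coordinates where it starts.
    // Assumption: y grows downward, so a smaller y is higher on the page.
    static final class Fragment {
        final float x;
        final float y;
        final String text;

        Fragment(float x, float y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Groups fragments into lines from top to bottom, orders each line from
    // left to right, and joins the lines with '\n'. A fragment starts a new
    // line when its y is more than lineTolerance below the line's first fragment.
    static String layout(List<Fragment> fragments, float lineTolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<String> lines = new ArrayList<>();
        List<Fragment> current = new ArrayList<>();
        float currentY = 0f;
        for (Fragment f : sorted) {
            if (!current.isEmpty() && f.y - currentY > lineTolerance) {
                lines.add(joinLeftToRight(current));
                current = new ArrayList<>();
            }
            if (current.isEmpty()) {
                currentY = f.y;
            }
            current.add(f);
        }
        if (!current.isEmpty()) {
            lines.add(joinLeftToRight(current));
        }
        return String.join("\n", lines);
    }

    // Sorts the fragments of one line by x and joins them with single spaces.
    static String joinLeftToRight(List<Fragment> line) {
        line.sort(Comparator.comparingDouble((Fragment f) -> f.x));
        List<String> words = new ArrayList<>();
        for (Fragment f : line) {
            words.add(f.text);
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // Fragments in "file order", which differs from reading order here.
        List<Fragment> fragments = new ArrayList<>();
        fragments.add(new Fragment(10f, 120f, "Second"));
        fragments.add(new Fragment(60f, 100f, "line"));
        fragments.add(new Fragment(10f, 100f, "First"));
        fragments.add(new Fragment(60f, 120f, "line"));
        System.out.println(layout(fragments, 2f));
    }
}
```

Sorting of this kind can restore line breaks and reading order, but it will
not bring back tab characters: a PDF stores positioned glyphs rather than tabs,
so any indentation would have to be rebuilt from the x coordinates.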