Posted to users@pdfbox.apache.org by Trasca Virgil <vi...@yahoo.com> on 2011/02/11 14:33:57 UTC
Word/character order is not preserved during text extraction
Hi,
Has anybody had this issue before? As the attached screenshot shows, the
original text in the document is
<0>652.5</0>, while the extracted text is 652.5<0> </0>. I am using PDFBox 1.4.0.
I get this behavior with both the ExtractText application and the
PDFTextStripper class.
What could be causing this? Is there any solution or workaround?
Thanks,
Virgil