Posted to dev@pdfbox.apache.org by "Ahmed Eltayeb (JIRA)" <ji...@apache.org> on 2017/03/16 13:57:41 UTC
[jira] [Created] (PDFBOX-3719) pdfbox reads spaces as tabs
Ahmed Eltayeb created PDFBOX-3719:
-------------------------------------
Summary: pdfbox reads spaces as tabs
Key: PDFBOX-3719
URL: https://issues.apache.org/jira/browse/PDFBOX-3719
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.8.13
Reporter: Ahmed Eltayeb
Attachments: DummyDoc.docx, DummyDoc.pdf
I converted the attached Word document "DummyDoc.docx" to the PDF "DummyDoc.pdf".
Then I used PDFBox 1.8.13 to extract the text:
java -jar pdfbox-app-1.8.13.jar ExtractText "DummyDoc.pdf" txt.txt
The generated output is:
Dummy document for tag extraction
Section 1
\\DummyTagOne_01
This is text body one
\\DummyTagOne_02
This is text body two
Section 2
\\DummyTagTwo_01
This is text body three
\\DummyTagTwo_02
This is text body four
\\DummyTagTwo_03
This is text body five
As you can see, the spaces between words come out as tabs: "This is text body one" is extracted with tabs between the words instead of spaces, and so on.
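Since tabs and spaces look alike in most viewers, a small helper that makes them visible may help confirm the problem in txt.txt. The following is only a sketch and not part of PDFBox: the class name WhitespaceProbe and the <TAB>/<SP> markers are made up for illustration, and it assumes the extracted file uses an ASCII-compatible encoding (not UTF-16).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical diagnostic helper, not part of PDFBox.
public class WhitespaceProbe {

    // Replace each tab and each space with a visible marker.
    static String visualize(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                sb.append("<TAB>");
            } else if (c == ' ') {
                sb.append("<SP>");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Count the tab characters in a string.
    static int countTabs(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\t') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WhitespaceProbe <extracted-text-file>");
            return;
        }
        // ISO-8859-1 decodes any byte sequence without error, which is
        // enough to spot ASCII tabs and spaces (assumption: not UTF-16).
        List<String> lines =
                Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        int total = 0;
        for (String line : lines) {
            System.out.println(visualize(line));
            total += countTabs(line);
        }
        System.out.println("tabs found: " + total);
    }
}
```

Running it as "java WhitespaceProbe txt.txt" prints every line with its whitespace marked and a total tab count, so a line such as "This is text body one" shows <TAB> wherever the extraction produced a tab.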
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)