Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2019/03/17 10:37:00 UTC
[jira] [Resolved] (PDFBOX-4480) Problem extracting text in newline characters and spaces between words
[ https://issues.apache.org/jira/browse/PDFBOX-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr resolved PDFBOX-4480.
-------------------------------------
Resolution: Fixed
Assignee: Tilman Hausherr
Fix Version/s: 2.0.15, 3.0.0 PDFBox
> Problem extracting text in newline characters and spaces between words
> ----------------------------------------------------------------------
>
> Key: PDFBOX-4480
> URL: https://issues.apache.org/jira/browse/PDFBOX-4480
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.13
> Environment: macOS
> Reporter: ANIL SANGHANI
> Assignee: Tilman Hausherr
> Priority: Major
> Labels: textextraction
> Fix For: 2.0.15, 3.0.0 PDFBox
>
> Attachments: Document.txt, Narasimhan S.pdf, PDFBOX-4480-huge-CapHeight.pdf.txt
>
>
>
> I have a PDF file. When I try to extract its text, some newline characters between lines are ignored, so the last word of one line and the first word of the next line appear as a single word with no space between them.
> For example, in the attached PDF:
> "main Bsk" is extracted as "mainBsk"
> "narasimhan1989@gmail.com Bangalore" is extracted as "narasimhan1989@gmail.comBangalore"
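
The symptom described above (two words that should be separated by a space or line break coming out as one token) can be checked mechanically once the text has been extracted. The following is only a minimal sketch under stated assumptions: FusedWordCheck and findFused are hypothetical names, not part of PDFBox, and the input is assumed to be the already-extracted text as a plain string. It uses only the Java standard library and makes no claim about how the fix in 2.0.15 works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a PDFBox class): reports which expected phrases
// show up in the extracted text as a single fused token.
final class FusedWordCheck {

    // A phrase counts as fused when its words, joined without whitespace,
    // appear as one whitespace-delimited token in the extracted text.
    static List<String> findFused(String extractedText, List<String> expectedPhrases) {
        Set<String> tokens = new HashSet<>(Arrays.asList(extractedText.trim().split("\\s+")));
        List<String> fused = new ArrayList<>();
        for (String phrase : expectedPhrases) {
            String joined = phrase.replaceAll("\\s+", "");
            if (tokens.contains(joined)) {
                fused.add(phrase);
            }
        }
        return fused;
    }

    public static void main(String[] args) {
        // Sample strings taken from the report above.
        String extracted = "mainBsk\nnarasimhan1989@gmail.comBangalore\n";
        List<String> expected = Arrays.asList("main Bsk", "narasimhan1989@gmail.com Bangalore");
        System.out.println(findFused(extracted, expected));
    }
}
```

Because the comparison is done per token, text in which the two words are separated by either a space or a newline is not flagged; only the fused form is.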