Posted to dev@pdfbox.apache.org by "Brian Carrier (JIRA)" <ji...@apache.org> on 2009/02/23 23:58:02 UTC
[jira] Resolved: (PDFBOX-43) spaces in extracted text
[ https://issues.apache.org/jira/browse/PDFBOX-43?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Carrier resolved PDFBOX-43.
---------------------------------
Resolution: Incomplete
I found the original bug report here:
http://sourceforge.net/tracker/index.php?func=detail&aid=1153181&group_id=78314&atid=552832
It does not have any of the files mentioned, so we can't reproduce.
> spaces in extracted text
> ------------------------
>
> Key: PDFBOX-43
> URL: https://issues.apache.org/jira/browse/PDFBOX-43
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1153181
> Originally submitted by benlitchfield on 2005-02-27 17:45.
> See "Wenjie broken text.pdf". There are spaces
> between words.
> Ben
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES
> user_id=601708
> This issue is fixed for non-standard Type3 fonts,
> which "Wenjie broken text.pdf" uses.
> The extra spaces in ocalc.pdf are a different problem that is
> still being looked into.
> Ben
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES
> user_id=601708
> FYI, this problem is seen with PDFs that use Type3 fonts. A
> solution is in the works.
> Ben
> [comment on SourceForge]
> Originally sent by benlitchfield.
> Logged In: YES
> user_id=601708
> In ocalc.pdf there are some spacing issues as well.
> For example:
> "deviat e from the existing textb ooks, I would certain ly make
> major changes by emphasizing several"
> [comment on SourceForge]
> Originally sent by fuwenjie.
> Logged In: YES
> user_id=1219597
> I found that the font size is sometimes not assigned correctly.
> In that case, every font size is reported as 1.0. Under that
> circumstance it also happens, though seldom, that the width is
> not correct either. In those cases, the width is often less
> than 1.0, which is obviously impossible.
> A word in the original text may break into several parts, and
> the return value of GetY() for each part may not be right, causing
> the characters to overlap with others.
> The incorrect Y value and width may prevent us from re-forming
> the word according to the Y value and width of each part.
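
The last comment describes re-forming words from the Y value and width of each extracted part. The sketch below illustrates that general idea only; it is not PDFBox code. The Fragment class, the mergeIntoWords method, and both tolerance values are hypothetical names and numbers chosen for illustration. It shows how a width that is reported too small could make two adjacent parts look far apart, which would split one word in two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: none of these types or methods are PDFBox API.
class FragmentMerger {

    /** Hypothetical stand-in for one extracted piece of text. */
    static final class Fragment {
        final String text;
        final float x;        // left edge
        final float y;        // baseline position
        final float width;    // reported width
        final float fontSize; // reported font size

        Fragment(String text, float x, float y, float width, float fontSize) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.fontSize = fontSize;
        }

        // Assumption taken from the comment above: a font size of exactly 1.0
        // combined with a width below 1.0 suggests the metrics are unreliable.
        boolean hasSuspectMetrics() {
            return fontSize == 1.0f && width < 1.0f;
        }
    }

    /**
     * Joins consecutive fragments into one word when they share a baseline
     * (within yTolerance) and the horizontal gap between them is at most
     * gapTolerance. Fragments are assumed to arrive in reading order.
     */
    static List<String> mergeIntoWords(List<Fragment> fragments,
                                       float yTolerance, float gapTolerance) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Fragment prev = null;
        for (Fragment f : fragments) {
            if (prev != null) {
                boolean sameLine = Math.abs(f.y - prev.y) <= yTolerance;
                float gap = f.x - (prev.x + prev.width);
                if (!sameLine || gap > gapTolerance) {
                    // Start a new word: the parts do not line up.
                    words.add(current.toString());
                    current.setLength(0);
                }
            }
            current.append(f.text);
            prev = f;
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        // Plausible metrics: "deviat" ends where "e" begins.
        List<Fragment> good = Arrays.asList(
                new Fragment("deviat", 0f, 100f, 30f, 10f),
                new Fragment("e", 30f, 100f, 5f, 10f));
        // Same positions, but the first width is reported as 0.5.
        List<Fragment> bad = Arrays.asList(
                new Fragment("deviat", 0f, 100f, 0.5f, 1.0f),
                new Fragment("e", 30f, 100f, 5f, 10f));
        System.out.println(mergeIntoWords(good, 0.5f, 1.0f));
        System.out.println(mergeIntoWords(bad, 0.5f, 1.0f));
    }
}
```

With plausible metrics the two parts join into a single word. With the undersized width, the computed gap exceeds the tolerance and the parts stay separate, which resembles the "deviat e" output quoted earlier. Whether this is the actual cause in ocalc.pdf cannot be confirmed without the file.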
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.