Posted to dev@pdfbox.apache.org by "Geert Coelmont (JIRA)" <ji...@apache.org> on 2008/07/25 16:23:31 UTC

[jira] Commented: (PDFBOX-80) Does not convert spacing. gourps words

    [ https://issues.apache.org/jira/browse/PDFBOX-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616904#action_12616904 ] 

Geert Coelmont commented on PDFBOX-80:
--------------------------------------

I have a similar issue with some PDFs. The current version of PDFBox glues some words together even though there is white space between them in the rendered page. I can mail the whole PDF, but I'd rather not upload it here as it contains sensitive information.
Below is the relevant portion of the PDF content stream:

1.00 g BT /Fo1 10.00 Tf 487.65 552.59 Td 0.000 Tc (OIN108061059) Tj ET 0 g
BT /Fo1 10.00 Tf 300.93 552.59 Td 0.000 Tc (NL 0073.17.554.B01) Tj ET

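PDFBox extracts this without a space between the two strings OIN108061059 and NL 0073..., even though they are drawn at separate x positions (487.65 and 300.93) on the same baseline (y = 552.59).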
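
To make the expected behaviour concrete, here is a minimal sketch of the kind of gap check I would expect between two positioned strings. This is not PDFBox's actual algorithm; the class and method names are hypothetical, and the glyph width (0.5 em) and the word-break threshold (0.3 em) are assumptions, since the real metrics of /Fo1 are not part of the snippet above.

```java
import java.util.List;

// Hypothetical illustration only; none of these names come from PDFBox.
final class SpacingSketch {

    // One string shown by Tj, with the position from Td and the size from Tf.
    static final class Chunk {
        final double x;
        final double y;
        final double fontSize;
        final String text;

        Chunk(double x, double y, double fontSize, String text) {
            this.x = x;
            this.y = y;
            this.fontSize = fontSize;
            this.text = text;
        }

        // ASSUMPTION: every glyph advances 0.5 em; the real widths of /Fo1 are unknown.
        double estimatedEndX() {
            return x + text.length() * fontSize * 0.5;
        }
    }

    // Joins chunks in stream order, inserting a space wherever the next chunk
    // does not start close to where the previous one ended.
    static String joinChunks(List<Chunk> chunks) {
        StringBuilder out = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : chunks) {
            if (prev != null) {
                boolean sameBaseline = Math.abs(c.y - prev.y) < 0.5 * c.fontSize;
                double gap = Math.abs(c.x - prev.estimatedEndX());
                // ASSUMPTION: a jump of more than 0.3 em counts as a word break.
                if (!sameBaseline || gap > 0.3 * c.fontSize) {
                    out.append(' ');
                }
            }
            out.append(c.text);
            prev = c;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Positions and font size copied from the content stream above.
        List<Chunk> line = List.of(
                new Chunk(487.65, 552.59, 10.0, "OIN108061059"),
                new Chunk(300.93, 552.59, 10.0, "NL 0073.17.554.B01"));
        System.out.println(joinChunks(line)); // prints "OIN108061059 NL 0073.17.554.B01"
    }
}
```

With the two strings from my PDF, the second one starts far to the left of where the first one ends, so any check along these lines would separate them with a space.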


> Does not convert spacing.  gourps words
> ---------------------------------------
>
>                 Key: PDFBOX-80
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-80
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1258807
> Originally submitted by gohan222 on 2005-08-13 22:47.
> The PDFTextStripper misses some spacing in between
> words.  It crunches sentences together on occasions. 
> After running extract look for string "demandsofGoogle's".
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1258807&file_id=145590
> p125-ghemawat.zip (application/zip), 252359 bytes
> Removes spacing
