Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/07/25 23:24:39 UTC

[jira] [Comment Edited] (PDFBOX-2232) Is there difference between character \n and character space(32) in pdf stream

    [ https://issues.apache.org/jira/browse/PDFBOX-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074944#comment-14074944 ] 

John Hewson edited comment on PDFBOX-2232 at 7/25/14 9:24 PM:
--------------------------------------------------------------

PDFBox's extracted text is not quite the same as what you get by copying to the clipboard from Acrobat. For a like-for-like comparison, use Acrobat's "Save As Other..." and select plain text.
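
When comparing the two plain-text outputs, it can help to make the whitespace visible so that a trailing space (32) and a \n are easy to tell apart. The following is a minimal sketch using only the Java standard library; it does not call PDFBox or Acrobat, and the class name, method names, and sample strings are hypothetical and chosen for illustration only.

```java
public final class WhitespaceCompare {
    private WhitespaceCompare() {}

    /** Replaces every whitespace or space character with a visible token such as <U+0020>. */
    public static String showWhitespace(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // isSpaceChar also catches the non-breaking space (U+00A0),
            // which isWhitespace deliberately excludes.
            if (Character.isWhitespace(c) || Character.isSpaceChar(c)) {
                out.append(String.format("<U+%04X>", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Returns the index of the first differing character, or -1 if the strings are equal. */
    public static int firstDifference(String a, String b) {
        int shared = Math.min(a.length(), b.length());
        for (int i = 0; i < shared; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : shared;
    }

    public static void main(String[] args) {
        // Hypothetical sample strings standing in for the two extraction results.
        String fromPdfBox = "cell one cell two ";
        String fromAcrobat = "cell one\ncell two\n";

        System.out.println(showWhitespace(fromPdfBox));
        System.out.println(showWhitespace(fromAcrobat));
        System.out.println("first difference at index "
                + firstDifference(fromPdfBox, fromAcrobat));
    }
}
```

In practice the two strings would be read from the file written by PDFTextStripper and from the file saved by Acrobat, rather than hard-coded as above.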


was (Author: jahewson):
Extracted text is not quite the same as copying to the clipboard from Acrobat. Using Acrobat's "Save As Other..." and selecting plain text is what you want to compare.

> Is there difference between character \n and character space(32) in pdf stream
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2232
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2232
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Reporter: huangchangan
>
> When extracting text from PDF files with PDFTextStripper, I get a space (32) at the end of each paragraph or table cell, while in the text copied from Adobe Reader the end character is \n. I wonder whether PDFBox converts the character \n to a space (32). I checked the function processEncodedText in PDFStreamEngine and found no useful information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)