Posted to dev@pdfbox.apache.org by "Juraj Lonc (JIRA)" <ji...@apache.org> on 2013/03/21 16:13:15 UTC

[jira] [Commented] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine

    [ https://issues.apache.org/jira/browse/PDFBOX-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609022#comment-13609022 ] 

Juraj Lonc commented on PDFBOX-1545:
------------------------------------

This iteration is not supposed to give you whole words.
It gives you the tokens exactly as they are stored in the PDF, and every single letter could be stored separately.
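
To illustrate why matching fragment by fragment can fail, here is a minimal sketch in plain Java. It is a simplified model and not PDFBox code: the class TjFragments and its methods are hypothetical, and the operand array of a text-showing operator is assumed to be a List holding String fragments mixed with numeric kerning offsets. The search text is only found once the fragments are joined.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model, NOT the PDFBox API: the operands of a text-showing
// operator are represented as a List<Object> whose elements are String
// fragments and Number kerning offsets (an assumption for illustration).
final class TjFragments {

    // Concatenate only the string fragments, skipping the kerning numbers.
    static String joinFragments(List<Object> operands) {
        StringBuilder sb = new StringBuilder();
        for (Object o : operands) {
            if (o instanceof String) {
                sb.append((String) o);
            }
        }
        return sb.toString();
    }

    // True only if one single fragment contains the search text,
    // i.e. what a fragment-by-fragment replace would be able to see.
    static boolean anyFragmentContains(List<Object> operands, String needle) {
        for (Object o : operands) {
            if (o instanceof String && ((String) o).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "Hello" stored as three fragments with kerning offsets in between.
        List<Object> operands = Arrays.<Object>asList("H", -20, "el", 15.5, "lo");

        // No single fragment contains the whole word.
        System.out.println(anyFragmentContains(operands, "Hello"));

        // The joined text does contain it.
        System.out.println(joinFragments(operands).contains("Hello"));
    }
}
```

A replacement working on the joined text would still have to decide how to write the result back into fragments, which this sketch does not attempt.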
                
> ReplaceString fails to replace text, however RemoveText or TextExtraction works fine
> ------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1545
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1545
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.7.1
>         Environment: ubuntu 32bit, Java 6
>            Reporter: MartinV
>              Labels: patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings in this PDF:
> https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing
> (anyone with link can view and download it...)
> As I found while iterating over the "Tj" and "TJ" operations:
>  COSString previous = (COSString)tokens.get( j-1 );
>  String string = previous.getString();
> Those strings are either empty or of length 2 (whitespace only), whereas I would expect to get separate groups of words from my PDF.
> I tried this on version 1.7.1 and then downloaded the latest code from SVN (today), and both versions showed the same behaviour. Is my PDF special in any way, or which objects should I explore next? I tried another two PDFs downloaded from Google Drive and both had the same issue (maybe Google formats PDFs in a special way?).
> I am surprised that RemoveText works fine on this PDF and that text extraction also gives me a good result, so there must be a way. Thank you.
> PS: I don't mind fixing the bug on my own, but I do not have any significant knowledge of the internal PDF structure. Hints welcome.
