Posted to dev@pdfbox.apache.org by "MartinV (JIRA)" <ji...@apache.org> on 2013/03/21 04:23:15 UTC

[jira] [Created] (PDFBOX-1545) ReplaceString fails to replace text, however RemoveText or TextExtraction works fine

MartinV created PDFBOX-1545:
-------------------------------

             Summary: ReplaceString fails to replace text, however RemoveText or TextExtraction works fine
                 Key: PDFBOX-1545
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1545
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 1.7.1
         Environment: ubuntu 32bit, Java 6
            Reporter: MartinV
            Priority: Minor


org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings in this PDF:

https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing
(anyone with the link can view and download it)

This is what I found while iterating over the "Tj" and "TJ" operators:
 COSString previous = (COSString)tokens.get( j-1 );
 String string = previous.getString();
Those strings are either empty or have a length of 2 (whitespace only), so there is nothing to replace.
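
To tell whether such an operand is really blank or just holds non-printable byte codes, it may help to dump the raw bytes before decoding them. The following is only a minimal, self-contained sketch of that idea: it does not use the PDFBox API, and the class TjOperandDump, the Op stand-in type and the sample token list are all hypothetical. Whether the operands in this PDF hold glyph codes rather than Latin-1 characters (for example because of a multi-byte or custom font encoding) is an assumption that the report does not confirm.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TjOperandDump {

    /** Hypothetical stand-in for a parsed operator token such as Tj or TJ. */
    public static final class Op {
        final String name;

        public Op(String name) {
            this.name = name;
        }
    }

    /** Formats raw operand bytes as hex so non-printable codes stay visible. */
    public static String hex(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /**
     * Walks a token list and, for every Tj operator, reports the operand that
     * precedes it: the raw bytes as hex, the decoded length, and whether the
     * ISO-8859-1 decoding consists only of characters at or below U+0020.
     */
    public static List<String> dumpTjOperands(List<Object> tokens) {
        List<String> report = new ArrayList<String>();
        for (int j = 1; j < tokens.size(); j++) {
            Object next = tokens.get(j);
            Object previous = tokens.get(j - 1);
            if (next instanceof Op && "Tj".equals(((Op) next).name)
                    && previous instanceof byte[]) {
                byte[] raw = (byte[]) previous;
                String decoded = new String(raw, StandardCharsets.ISO_8859_1);
                report.add(hex(raw) + " -> length " + decoded.length()
                        + ", blank=" + decoded.trim().isEmpty());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Made-up sample data: one readable operand and one two-byte operand
        // that decodes to control characters only.
        List<Object> tokens = Arrays.<Object>asList(
                "Hello".getBytes(StandardCharsets.ISO_8859_1), new Op("Tj"),
                new byte[] {0x00, 0x03}, new Op("Tj"));
        for (String line : dumpTjOperands(tokens)) {
            System.out.println(line);
        }
    }
}
```

In the real example the same kind of dump could be applied to the bytes of each COSString operand. If the hex output shows non-zero codes where the decoded string looks blank, the text is probably stored in a form that a plain string comparison cannot match.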

I tried this on version 1.7.1 and then downloaded the latest code from SVN (today); both versions showed the same behaviour. Is my PDF special in any way, or which objects should I explore next? I tried two other PDFs downloaded from Google Drive and both had the same issue (maybe Google formats PDFs in a special way?).

I am surprised that RemoveText works fine on this PDF and that text extraction also gives good results, so there must be a way. Thank you.

PS: I don't mind fixing the bug myself, but I do not have any significant knowledge of the internal PDF structure. Hints welcome.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira