Posted to dev@pdfbox.apache.org by "Gustavo Hexsel (JIRA)" <ji...@apache.org> on 2009/02/10 18:44:59 UTC

[jira] Created: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Methods are marked as deprecated but they're effectively dead
-------------------------------------------------------------

                 Key: PDFBOX-422
                 URL: https://issues.apache.org/jira/browse/PDFBOX-422
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 0.8.0-incubator
            Reporter: Gustavo Hexsel


There are several methods on PDFTextStripper and PDFStreamEngine that are marked @deprecated, but they are not really used by the existing infrastructure anymore.

This would be OK if such methods weren't callbacks.  Because they are, overrides in pre-existing code still compile but are never invoked, so that code breaks silently and the compiler cannot tell you that the methods are no longer to be used.

Simply removing the methods would have been a much better solution in this case. 

Example of said methods:

org.apache.pdfbox.util.PDFTextStripper#processLineSeparator
org.apache.pdfbox.util.PDFTextStripper#processWordSeparator
org.apache.pdfbox.util.PDFTextStripper#writeCharacters
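
To illustrate the failure mode, here is a minimal, self-contained sketch. BaseStripper, LegacyClient and MigratedClient are hypothetical stand-ins, not the real PDFBox classes; only the method names processLineSeparator and writeLineSeparator come from this thread. The sketch assumes the old name survives as a deprecated wrapper while the engine dispatches to the new name only.

```java
// Hypothetical stand-ins for illustration; these are NOT the real PDFBox classes.
class BaseStripper {
    private final StringBuilder out = new StringBuilder();

    /** New hook: the only one the engine calls. */
    protected void writeLineSeparator() {
        out.append('\n');
    }

    /** Old name kept as a wrapper; the engine itself never calls it. */
    @Deprecated
    protected void processLineSeparator() {
        writeLineSeparator();
    }

    /** Simulates the engine: emits each line followed by a separator. */
    public String strip(String... lines) {
        out.setLength(0);
        for (String line : lines) {
            out.append(line);
            writeLineSeparator(); // dispatches to the new name only
        }
        return out.toString();
    }
}

/** Client written against the old API: still compiles, but the hook is dead. */
class LegacyClient extends BaseStripper {
    int separatorsSeen = 0;

    @Override // still legal, because the deprecated wrapper exists
    @SuppressWarnings("deprecation")
    protected void processLineSeparator() {
        separatorsSeen++;
    }
}

/** Client updated to the new name: the hook is invoked again. */
class MigratedClient extends BaseStripper {
    int separatorsSeen = 0;

    @Override
    protected void writeLineSeparator() {
        separatorsSeen++;
        super.writeLineSeparator();
    }
}

class DeadCallbackDemo {
    public static void main(String[] args) {
        LegacyClient legacy = new LegacyClient();
        legacy.strip("a", "b");
        System.out.println("legacy hook calls:   " + legacy.separatorsSeen);   // 0

        MigratedClient migrated = new MigratedClient();
        migrated.strip("a", "b");
        System.out.println("migrated hook calls: " + migrated.separatorsSeen); // 2
    }
}
```

If processLineSeparator were deleted from BaseStripper, the @Override in LegacyClient would become a compile error, which is the behaviour this issue asks for.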


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by "Gustavo Hexsel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672809#action_12672809 ] 

quartz edited comment on PDFBOX-422 at 2/11/09 2:27 PM:
----------------------------------------------------------------

Thanks for the prompt response.

Yes, I saw the methods, they just don't carry the text position anymore (also, blocks get merged).

This is fine, the class is doing what it is supposed to (according to its name).  We had a use-case (specifically document redaction) that needed to bring back the text and the associated positions of each char, which we were doing by using the startup of the text block and each individual character width.

But that's a different matter.  Having the compiler warn me that the method does not @Override a parent method anymore is on the right track.



Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by Brian Carrier <ca...@digital-evidence.org>.
[removing this from jira]

Do you have a suggestion for how PDFBox could most ideally solve your  
situation?  Could you get the needed info by making a class that  
extends PDFTextStripper and overrides processTextPosition()?  Then  
you could see all of the TextPositions and where they are located?
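
A rough sketch of that idea, for illustration only. Glyph and SketchStripper below are hypothetical stand-ins for TextPosition and PDFTextStripper, with made-up fields and signatures; only the name processTextPosition() is taken from PDFBox. The point is the shape of the subclass: record each positioned character as it passes through, then let the base class carry on with its normal text handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a positioned character; NOT the real TextPosition.
final class Glyph {
    final String text;
    final float x;
    final float y;
    final float width;

    Glyph(String text, float x, float y, float width) {
        this.text = text;
        this.x = x;
        this.y = y;
        this.width = width;
    }
}

// Hypothetical stand-in for the stripper; NOT the real PDFTextStripper.
class SketchStripper {
    private final StringBuilder text = new StringBuilder();

    /** Callback invoked once per positioned character. */
    protected void processTextPosition(Glyph glyph) {
        text.append(glyph.text);
    }

    /** Simulates extraction by feeding every glyph to the callback. */
    public String getText(List<Glyph> page) {
        text.setLength(0);
        for (Glyph glyph : page) {
            processTextPosition(glyph);
        }
        return text.toString();
    }
}

/** Records the geometry of every character before delegating to the base class. */
class PositionRecordingStripper extends SketchStripper {
    private final List<Glyph> seen = new ArrayList<>();

    @Override
    protected void processTextPosition(Glyph glyph) {
        seen.add(glyph);                  // keep text and position together
        super.processTextPosition(glyph); // normal text extraction continues
    }

    List<Glyph> positions() {
        return seen;
    }
}

class PositionDemo {
    public static void main(String[] args) {
        PositionRecordingStripper stripper = new PositionRecordingStripper();
        String text = stripper.getText(Arrays.asList(
                new Glyph("H", 0f, 700f, 10f),
                new Glyph("i", 10f, 700f, 4f)));
        System.out.println(text);
        for (Glyph g : stripper.positions()) {
            System.out.println(g.text + " @ " + g.x + "," + g.y + " w=" + g.width);
        }
    }
}
```

For a redaction use-case, the recorded list would give each character together with its position and width, without relying on the removed callbacks.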

On Feb 11, 2009, at 5:27 PM, Gustavo Hexsel (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672809#action_12672809 ]
>
> Gustavo Hexsel commented on PDFBOX-422:
> ---------------------------------------
>
> Thanks for the prompt response.
>
> Yes, I saw the methods, they just don't carry the text position  
> anymore (also, blocks get merged).
>
> This is fine, the class is doing what it is supposed to (according to  
> its name).  We had a use-case (specifically document redaction)  
> that needed to bring back the text and the associated positions of  
> each char, which we were doing by using the startup of the text  
> block and each individual character width.
>


[jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by "Gustavo Hexsel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672809#action_12672809 ] 

Gustavo Hexsel commented on PDFBOX-422:
---------------------------------------

Thanks for the prompt response.

Yes, I saw the methods, they just don't carry the text position anymore (also, blocks get merged).

This is fine, the class is doing what it is supposed to (according to its name).  We had a use-case (specifically document redaction) that needed to bring back the text and the associated positions of each char, which we were doing by using the startup of the text block and each individual character width.




[jira] Resolved: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by "Brian Carrier (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Carrier resolved PDFBOX-422.
----------------------------------

    Resolution: Fixed

Wrapper methods removed.

Sending        trunk/src/main/java/org/apache/pdfbox/util/PDFStreamEngine.java
Sending        trunk/src/main/java/org/apache/pdfbox/util/PDFText2HTML.java
Sending        trunk/src/main/java/org/apache/pdfbox/util/PDFTextStripper.java
Sending        trunk/src/main/java/org/apache/pdfbox/util/PDFTextStripperByArea.java
Transmitting file data ....
Committed revision 743530.

Note that wrapper methods existed for the following base methods:
PDFStreamEngine.showCharacter() -> PDFStreamEngine.processTextPosition()
PDFStreamEngine.showString() -> PDFStreamEngine.processEncodedText()
PDFTextStripper.flushText() -> PDFTextStripper.writePage()
PDFTextStripper.processLineSeparator() -> PDFTextStripper.writeLineSeparator()
PDFTextStripper.processWordSeparator() -> PDFTextStripper.writeWordSeparator()






[jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by "Brian Carrier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672419#action_12672419 ] 

Brian Carrier commented on PDFBOX-422:
--------------------------------------

The functions were renamed to make them more consistent with other functions and with their actions.  The original names were kept to maintain backwards compatibility.  Can you give an example of how this breaks existing code?



Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
On 11.02.2009 16:07:02 Jukka Zitting wrote:
> Hi,
> 
> On Wed, Feb 11, 2009 at 4:02 PM, Brian Carrier
> <ca...@digital-evidence.org> wrote:
> > Because the next release will be the first incubator release, I'm inclined
> > to go with option 1 so that we start with a clean API [...]
> 
> +1 as people will in any case need to update any client code.
> 
> We should add a note about this specific case in the release notes, as
> the required client code changes are more complex than a simple
> package rename.

+1 to that. The package change is a chance to do things in a better,
cleaner way.


Jeremias Maerki


Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by Brian Carrier <ca...@digital-evidence.org>.
On Feb 11, 2009, at 11:29 AM, Kenneth D Weinert wrote:
>
>> processLineSeperator() -> writeLineSeperator()
>> processWordSeperator() -> writeWordSeperator()
>
> Can we also spell Separator correctly as long as we're making the  
> change?

Actually, they are correct in the code.  The typos were only in my
e-mail.

brian

Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by Kenneth D Weinert <ke...@quarter-flash.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brian Carrier wrote:

> Option 1: We remove the deprecated wrapper functions so that people who
> use @Override will get compiler errors.

+1 for reasons stated before

> processLineSeperator() -> writeLineSeperator()
> processWordSeperator() -> writeWordSeperator()

Can we also spell Separator correctly as long as we're making the change?


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmS/OEACgkQH0OpnUzq8fAIdACeJgJJfu1rZUX20XzEDiwbDQjx
fPYAoIx5dCVbv9FYHO9GHykcHwPurDPX
=msIc
-----END PGP SIGNATURE-----

Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Feb 11, 2009 at 4:02 PM, Brian Carrier
<ca...@digital-evidence.org> wrote:
> Because the next release will be the first incubator release, I'm inclined
> to go with option 1 so that we start with a clean API [...]

+1 as people will in any case need to update any client code.

We should add a note about this specific case in the release notes, as
the required client code changes are more complex than a simple
package rename.

BR,

Jukka Zitting

Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by Brian Carrier <ca...@digital-evidence.org>.
There seems to be no way to solve this issue and make everyone  
happy.  Any thoughts from this group?

Background: As part of the page rotation and ordering of Right to  
Left language fixes, I cleaned up the code and renamed some functions  
to make them more consistent with what they actually do and to bring  
them in line with other naming conventions in the code. I created  
deprecated versions of the old function names and they simply called  
the new functions.

There seem to be three options to move forward:

Option 1: We remove the deprecated wrapper functions so that people  
who use @Override will get compiler errors.
Option 2: We keep the wrapper functions in there.
Option 3: We return to the old names.

Because the next release will be the first incubator release, I'm  
inclined to go with option 1 so that we start with a clean API, but I  
wanted to know if there were thoughts from the rest of this group.

For reference, here are the function name changes:
flushText() -> writePage()
processLineSeperator() -> writeLineSeperator()
processWordSeperator() -> writeWordSeperator()
showCharacter() -> processTextPosition()

brian



On Feb 10, 2009, at 5:11 PM, Gustavo Hexsel (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672426#action_12672426 ]
>
> Gustavo Hexsel commented on PDFBOX-422:
> ---------------------------------------
>
> These methods used to be called from the flushPage() method, so we  
> used them as callbacks since we need the geometry as well as the  
> text in our code.
>
> The new code for PDFTextStripper is more true to its name, it  
> really deals with text and text only.  The problem is that the  
> methods are still there but they don't get called anymore.  So, our  
> code compiled but all the text was null (since our extras weren't  
> valid anymore).
>
> It would have been much more useful simply to remove the methods  
> since at least the compiler would have flagged our code as not  
> being a callback anymore.
>
>
>
> We might fork the old PDFTextStripper into a TextGeometryStripper  
> or the like, if I can get management to approve it (probably not,  
> my contract is up tomorrow and I'm going on vacation :)
>
> I'll post a patch if we do that.
>


[jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead

Posted by "Gustavo Hexsel (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672426#action_12672426 ] 

Gustavo Hexsel commented on PDFBOX-422:
---------------------------------------

These methods used to be called from the flushPage() method, so we used them as callbacks since we need the geometry as well as the text in our code.

The new code for PDFTextStripper is more true to its name, it really deals with text and text only.  The problem is that the methods are still there but they don't get called anymore.  So, our code compiled but all the text was null (since our extras weren't valid anymore).  

It would have been much more useful simply to remove the methods since at least the compiler would have flagged our code as not being a callback anymore.



We might fork the old PDFTextStripper into a TextGeometryStripper or the like, if I can get management to approve it (probably not, my contract is up tomorrow and I'm going on vacation :)

I'll post a patch if we do that.
