Posted to dev@pdfbox.apache.org by "Gustavo Hexsel (JIRA)" <ji...@apache.org> on 2009/02/10 18:44:59 UTC
[jira] Created: (PDFBOX-422) Methods are marked as deprecated but
they're effectively dead
Methods are marked as deprecated but they're effectively dead
-------------------------------------------------------------
Key: PDFBOX-422
URL: https://issues.apache.org/jira/browse/PDFBOX-422
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 0.8.0-incubator
Reporter: Gustavo Hexsel
There are several methods on PDFTextStripper and PDFStreamEngine that are marked @deprecated, but they are not really used by the existing infrastructure anymore.
This would be ok if such methods weren't callbacks. In this case, it breaks pre-existing code, and prevents the compiler from letting you know the methods are not to be used anymore.
Simply removing the methods would have been a much better solution in this case.
Example of said methods:
org.apache.pdfbox.util.PDFTextStripper#processLineSeparator
org.apache.pdfbox.util.PDFTextStripper#processWordSeparator
org.apache.pdfbox.util.PDFTextStripper#writeCharacters
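A minimal sketch of the failure mode described above. The classes Stripper and LegacyClient are hypothetical stand-ins, not the real PDFBox types; the only assumption is that the engine now calls the renamed hook, while the deprecated one is kept but never invoked.

```java
import java.util.ArrayList;
import java.util.List;

public class DeadCallbackDemo {

    /** Hypothetical stand-in for a library base class; not the PDFBox API. */
    static class Stripper {
        /** New hook: the only separator method the engine still calls. */
        protected void writeLineSeparator(List<String> out) {
            out.add("\n");
        }

        /** Old hook: kept for compatibility, but run() never calls it. */
        @Deprecated
        protected void processLineSeparator(List<String> out) {
            writeLineSeparator(out);
        }

        /** Engine loop: emits each line followed by the new hook. */
        public List<String> run(String... lines) {
            List<String> out = new ArrayList<>();
            for (String line : lines) {
                out.add(line);
                writeLineSeparator(out);
            }
            return out;
        }
    }

    /** Client code written against the old API. It still compiles cleanly. */
    static class LegacyClient extends Stripper {
        int callbackCount = 0;

        @Override
        protected void processLineSeparator(List<String> out) {
            callbackCount++;   // never reached: run() does not call this hook
            out.add("<br>");
        }
    }

    public static void main(String[] args) {
        LegacyClient client = new LegacyClient();
        List<String> out = client.run("a", "b");
        System.out.println("callbacks seen: " + client.callbackCount); // 0
        System.out.println("contains <br>: " + out.contains("<br>"));  // false
    }
}
```

Because the deprecated method still exists in the base class, @Override is satisfied and nothing flags the subclass. If the old method were deleted instead, the same @Override would become a compile error.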
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (PDFBOX-422) Methods are marked as
deprecated but they're effectively dead
Posted by "Gustavo Hexsel (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672809#action_12672809 ]
quartz edited comment on PDFBOX-422 at 2/11/09 2:27 PM:
----------------------------------------------------------------
Thanks for the prompt response.
Yes, I saw the methods; they just don't carry the text position anymore (also, blocks get merged).
This is fine; the class is doing what it is supposed to do (according to its name). We had a use case (specifically document redaction) that needed to recover the text and the associated position of each character, which we did by using the start of the text block and each individual character width.
But that's a different matter. Having the compiler warn me that the method does not @Override a parent method anymore is on the right track.
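The position arithmetic mentioned above (block start plus individual character widths) can be sketched roughly as follows. This is an illustration only: the helper name leftEdges is hypothetical, and it assumes horizontal left-to-right text with no extra character or word spacing.

```java
import java.util.Arrays;

public class CharOffsets {

    /**
     * Returns the left edge of each character, given the x coordinate where
     * the text block starts and the advance width of each character.
     */
    static float[] leftEdges(float blockStartX, float[] widths) {
        float[] edges = new float[widths.length];
        float x = blockStartX;
        for (int i = 0; i < widths.length; i++) {
            edges[i] = x;      // this character starts where the previous one ended
            x += widths[i];    // advance by this character's width
        }
        return edges;
    }

    public static void main(String[] args) {
        float[] edges = leftEdges(100f, new float[] {5f, 3f, 6f});
        System.out.println(Arrays.toString(edges)); // [100.0, 105.0, 108.0]
    }
}
```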
Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead
Posted by Brian Carrier <ca...@digital-evidence.org>.
[removing this from jira]
Do you have a suggestion for how PDFBox could most ideally solve your
situation? Could you get the needed info by making a class that
extends PDFTextStripper and overrides processTextPosition()? Then
you could see all of the TextPositions and where they are located?
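A rough sketch of the subclass-and-override approach suggested above. Glyph and Engine are hypothetical stand-ins for TextPosition and the stripper, so the example runs without PDFBox; the real method signature and accessor names should be checked against the PDFBox version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionCollectorDemo {

    /** Hypothetical stand-in for a positioned character; not PDFBox's TextPosition. */
    static final class Glyph {
        final String text;
        final float x;
        final float y;
        final float width;

        Glyph(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }
    }

    /** Hypothetical stand-in for the engine that calls the hook once per character. */
    static class Engine {
        protected void processTextPosition(Glyph glyph) {
            // The base class would do its normal text handling here.
        }

        public void process(List<Glyph> glyphs) {
            for (Glyph glyph : glyphs) {
                processTextPosition(glyph);
            }
        }
    }

    /** Subclass that records every character and its position as it passes through. */
    static class PositionCollector extends Engine {
        final List<Glyph> seen = new ArrayList<>();

        @Override
        protected void processTextPosition(Glyph glyph) {
            seen.add(glyph);                  // keep the geometry for later use
            super.processTextPosition(glyph); // preserve the default behaviour
        }
    }

    public static void main(String[] args) {
        PositionCollector collector = new PositionCollector();
        collector.process(Arrays.asList(
                new Glyph("H", 72f, 700f, 7f),
                new Glyph("i", 79f, 700f, 3f)));
        for (Glyph g : collector.seen) {
            System.out.println(g.text + " at (" + g.x + ", " + g.y + ") width " + g.width);
        }
    }
}
```

Calling the superclass method keeps normal text extraction working, so the positions are gathered as a side effect rather than as a replacement.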
On Feb 11, 2009, at 5:27 PM, Gustavo Hexsel (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/PDFBOX-422?
> page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=12672809#action_12672809 ]
>
> Gustavo Hexsel commented on PDFBOX-422:
> ---------------------------------------
>
> Thanks for the prompt response.
>
> Yes, I saw the methods, they just don't carry the text position
> anymore (also, blocks get merged).
>
> This is fine; the class is doing what it is supposed to do (according
> to its name). We had a use case (specifically document redaction)
> that needed to recover the text and the associated position of each
> character, which we did by using the start of the text block and
> each individual character width.
[jira] Commented: (PDFBOX-422) Methods are marked as deprecated but
they're effectively dead
Posted by "Gustavo Hexsel (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672809#action_12672809 ]
Gustavo Hexsel commented on PDFBOX-422:
---------------------------------------
Thanks for the prompt response.
Yes, I saw the methods; they just don't carry the text position anymore (also, blocks get merged).
This is fine; the class is doing what it is supposed to do (according to its name). We had a use case (specifically document redaction) that needed to recover the text and the associated position of each character, which we did by using the start of the text block and each individual character width.
[jira] Resolved: (PDFBOX-422) Methods are marked as deprecated but
they're effectively dead
Posted by "Brian Carrier (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brian Carrier resolved PDFBOX-422.
----------------------------------
Resolution: Fixed
Wrapper methods removed.
Sending trunk/src/main/java/org/apache/pdfbox/util/PDFStreamEngine.java
Sending trunk/src/main/java/org/apache/pdfbox/util/PDFText2HTML.java
Sending trunk/src/main/java/org/apache/pdfbox/util/PDFTextStripper.java
Sending trunk/src/main/java/org/apache/pdfbox/util/PDFTextStripperByArea.java
Transmitting file data ....
Committed revision 743530.
Note that wrapper methods existed for the following base methods:
PDFStreamEngine.showCharacter() -> PDFStreamEngine.processTextPosition()
PDFStreamEngine.showString() -> PDFStreamEngine.processEncodedText()
PDFTextStripper.flushText() -> PDFTextStripper.writePage()
PDFTextStripper.processLineSeparator() -> PDFTextStripper.writeLineSeparator()
PDFTextStripper.processWordSeparator() -> PDFTextStripper.writeWordSeparator()
[jira] Commented: (PDFBOX-422) Methods are marked as deprecated but
they're effectively dead
Posted by "Brian Carrier (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672419#action_12672419 ]
Brian Carrier commented on PDFBOX-422:
--------------------------------------
The functions were renamed to make them more consistent with other functions and with their actions. The original names were kept to maintain backwards compatibility. Can you give an example of how this breaks existing code?
Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead
Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
On 11.02.2009 16:07:02 Jukka Zitting wrote:
> Hi,
>
> On Wed, Feb 11, 2009 at 4:02 PM, Brian Carrier
> <ca...@digital-evidence.org> wrote:
> > Because the next release will be the first incubator release, I'm inclined
> > to go with option 1 so that we start with a clean API [...]
>
> +1 as people will in any case need to update any client code.
>
> We should add a note about this specific case in the release notes, as
> the required client code changes are more complex than a simple
> package rename.
+1 to that. The package change is a chance to do things in a better,
cleaner way.
Jeremias Maerki
Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead
Posted by Brian Carrier <ca...@digital-evidence.org>.
On Feb 11, 2009, at 11:29 AM, Kenneth D Weinert wrote:
>
>> processLineSeperator() -> writeLineSeperator()
>> processWordSeperator() -> writeWordSeperator()
>
> Can we also spell Separator correctly as long as we're making the
> change?
Actually, they are correct in the code. The typos were only in my
e-mail.
brian
Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated
but they're effectively dead
Posted by Kenneth D Weinert <ke...@quarter-flash.com>.
Brian Carrier wrote:
> Option 1: We remove the deprecated wrapper functions so that people who
> use @Override will get compiler errors.
+1 for reasons stated before
> processLineSeperator() -> writeLineSeperator()
> processWordSeperator() -> writeWordSeperator()
Can we also spell Separator correctly as long as we're making the change?
Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated
but they're effectively dead
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Wed, Feb 11, 2009 at 4:02 PM, Brian Carrier
<ca...@digital-evidence.org> wrote:
> Because the next release will be the first incubator release, I'm inclined
> to go with option 1 so that we start with a clean API [...]
+1 as people will in any case need to update any client code.
We should add a note about this specific case in the release notes, as
the required client code changes are more complex than a simple
package rename.
BR,
Jukka Zitting
Re: [jira] Commented: (PDFBOX-422) Methods are marked as deprecated but they're effectively dead
Posted by Brian Carrier <ca...@digital-evidence.org>.
There seems to be no way to solve this issue and make everyone
happy. Any thoughts from this group?
Background: As part of the page rotation and right-to-left language
ordering fixes, I cleaned up the code and renamed some functions
to make them more consistent with what they actually do and to bring
them in line with other naming conventions in the code. I created
deprecated versions of the old function names, and they simply called
the new functions.
There seem to be three options to move forward:
Option 1: We remove the deprecated wrapper functions so that people
who use @Override will get compiler errors.
Option 2: We keep the wrapper functions in there.
Option 3: We return to the old names.
Because the next release will be the first incubator release, I'm
inclined to go with option 1 so that we start with a clean API, but I
wanted to know if there were thoughts from the rest of this group.
For reference, here are the function name changes:
flushText() -> writePage()
processLineSeperator() -> writeLineSeperator()
processWordSeperator() -> writeWordSeperator()
showCharacter() -> processTextPosition()
brian
On Feb 10, 2009, at 5:11 PM, Gustavo Hexsel (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/PDFBOX-422?
> page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=12672426#action_12672426 ]
>
> Gustavo Hexsel commented on PDFBOX-422:
> ---------------------------------------
>
> These methods used to be called from the flushPage() method, so we
> used them as callbacks since we need the geometry as well as the
> text in our code.
>
> The new code for PDFTextStripper is more true to its name, it
> really deals with text and text only. The problem is that the
> methods are still there but they don't get called anymore. So, our
> code compiled but all the text was null (since our extras weren't
> valid anymore).
>
> It would have been much more useful simply to remove the methods
> since at least the compiler would have flagged our code as not
> being a callback anymore.
>
>
>
> We might fork the old PDFTextStripper into a TextGeometryStripper
> or the like, if I can get management to approve it (probably not,
> my contract is up tomorrow and I'm going on vacation :)
>
> I'll post a patch if we do that.
[jira] Commented: (PDFBOX-422) Methods are marked as deprecated but
they're effectively dead
Posted by "Gustavo Hexsel (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PDFBOX-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672426#action_12672426 ]
Gustavo Hexsel commented on PDFBOX-422:
---------------------------------------
These methods used to be called from the flushPage() method, so we used them as callbacks since we need the geometry as well as the text in our code.
The new code for PDFTextStripper is more true to its name: it really deals with text and text only. The problem is that the methods are still there but they don't get called anymore. So our code compiled, but all the text was null (since our extras weren't valid anymore).
It would have been much more useful simply to remove the methods since at least the compiler would have flagged our code as not being a callback anymore.
We might fork the old PDFTextStripper into a TextGeometryStripper or the like, if I can get management to approve it (probably not, my contract is up tomorrow and I'm going on vacation :)
I'll post a patch if we do that.