You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pdfbox.apache.org by "Christian Appl (Jira)" <ji...@apache.org> on 2020/09/02 11:40:00 UTC

[jira] [Commented] (PDFBOX-4723) Add equals() and hashCode() to PDAnnotation and COS objects

    [ https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189176#comment-17189176 ] 

Christian Appl commented on PDFBOX-4723:
----------------------------------------

I quote:
"there is a difference between identity and equality. e.g. two number objects with a content of 12 might be equal but these are still different objects."

This statement is absolutely correct - while it's opposite is also true:
When an object changed and it's content is now 13 instead of 12, it remains the same identical object.

The native implementation of the Object#hashCode() Method states the following contract:

{color:#8c8c8c}The general contract of \{@code hashCode} is:
{color}{color:#8c8c8c}* {color}{color:#8c8c8c}<ul>{color}{color:#8c8c8c}
{color}{color:#8c8c8c}* {color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}Whenever it is invoked on the same object more than once during
{color}{color:#8c8c8c}* an execution of a Java application, the \{@code hashCode} method
{color}{color:#8c8c8c}* must consistently return the same integer, provided no information
{color}{color:#8c8c8c}* used in \{@code equals} comparisons on the object is modified.
{color}{color:#8c8c8c}* This integer need not remain consistent from one execution of an
{color}{color:#8c8c8c}* application to another execution of the same application.
{color}{color:#8c8c8c}* {color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}If two objects are equal according to the \{@code equals(Object)}
{color}{color:#8c8c8c}* method, then calling the \{@code hashCode} method on each of
{color}{color:#8c8c8c}* the two objects must produce the same integer result.
{color}{color:#8c8c8c}* {color}{color:#8c8c8c}<li>{color}{color:#8c8c8c}It is {color}{color:#8c8c8c}<em>{color}{color:#8c8c8c}not{color}{color:#8c8c8c}</em>{color}{color:#8c8c8c} required that if two objects are unequal
{color}{color:#8c8c8c}* according to the {@link {color}{color:#000000}java.lang.Object{color}{color:#8c8c8c}#equals({color}{color:#000000}java.lang.Object{color}{color:#8c8c8c})}
{color}{color:#8c8c8c}* method, then calling the \{@code hashCode} method on each of the
{color}{color:#8c8c8c}* two objects must produce distinct integer results. However, the
{color}{color:#8c8c8c}* programmer should be aware that producing distinct integer results
{color}{color:#8c8c8c}* for unequal objects may improve the performance of hash tables.
{color}{color:#8c8c8c}* {color}{color:#8c8c8c}</ul>{color}


Following Revision 
7c4abe695fbc73e9ce71ed37218aa608db4180ed 30.01.2020 "port equals and hashCode from trunk to 2.0; add SmallMap for COSDictionary; fix PDPageTree"
 the class COSStream overrides the methods equals and hashCode.

If an object contained in a HashTable/Map (as seen in COSWriter) is searched using:

key = {color:#871094}objectKeys{color}.get(actual);
then, said HashTable is using the following method, to find the value for said object:

{color:#0033b3}public synchronized {color}{color:#007e8a}V {color}{color:#00627a}get{color}({color:#000000}Object {color}key) {
 {color:#000000}Entry{color}<?,?> {color:#000000}tab{color}[] = {color:#871094}table{color};
 {color:#0033b3}int {color}{color:#000000}hash {color}= key.hashCode();
 {color:#0033b3}int {color}{color:#000000}index {color}= ({color:#000000}hash {color}& {color:#1750eb}0x7FFFFFFF{color}) % {color:#000000}tab{color}.{color:#871094}length{color};
 {color:#0033b3}for {color}({color:#000000}Entry{color}<?,?> e = {color:#000000}tab{color}[{color:#000000}index{color}] ; e != {color:#0033b3}null {color}; e = e.{color:#871094}next{color}) {
 {color:#0033b3}if {color}((e.{color:#871094}hash {color}== {color:#000000}hash{color}) && e.{color:#871094}key{color}.equals(key)) {
 {color:#0033b3}return {color}({color:#007e8a}V{color})e.{color:#871094}value{color};
 }
 }
 {color:#0033b3}return null{color};
}

COSStream is implementing the hashCode method as:

{color:#8c8c8c}/**
{color}{color:#8c8c8c} * \{@inheritDoc}
{color}{color:#8c8c8c} */
{color}{color:#9e880d}@Override
{color}{color:#0033b3}public int {color}{color:#00627a}hashCode{color}() {
 {color:#000000}Object{color}[] {color:#000000}members {color}= {{color:#871094}items{color}, {color:#871094}randomAccess{color}, {color:#871094}scratchFile{color}, {color:#871094}isWriting{color}};
 {color:#0033b3}return {color}{color:#000000}Arrays{color}.hashCode({color:#000000}members{color});
}

Where "randomAccess" is a non final field that may change at any time. A constant and consistent hash value can not be reached using this implementation, whenever the hashvalue of one of the fields changes (or one of those values changes), then also the hash value of the given COSStream, will change.
Resulting in a situation, where the identical object may be recognized as equal according to:

{color:#0033b3}public boolean {color}{color:#00627a}equals{color}({color:#000000}Object {color}o) {
 {color:#0033b3}if {color}(o == {color:#0033b3}this{color})
 {
 {color:#0033b3}return true{color};
 }
[...]

But it's Hashvalue will still differ.

This can lead to a paradoxical situation, where a HashTable clearly contains an object, that is identical to the searched object (==), but the HashTable will still claim, that such an object is neither contained, nor can it's value be found in the HashTable, when the hash value of said object has changed in the meantime - for some reason.

According to:
"{color:#8c8c8c}Whenever it is invoked on the same object more than once during an execution of a Java application, the \{@code hashCode} method must consistently return the same integer, provided no information
used in \{@code equals} comparisons on the object is modified.{color}"

Conclusion:
Either the implementation of "equals()" is erroneous, as the information determining the hash value has changed and such objects may not be considered as equal. (Even when poiniting to the same exact identical instance.) Which would lead to the conclusion, that even though this == obj is true, the objects must and can not be equal, as the object's state (content) has changed!

Or the implementation of "hashCode()" is erroneous, because information are considered for the hashcode calculation, that are not tested for and are not checked in the implementation of equals.

Or the implementation of "hashCode()" is erroneous and the hashcode of said instance should be constant no matter what. (Which is implied by using "o == this" in equals).

As (according to contract) the hash value of an object should be consistent when no information changed, that would be used to determine the equality of this and another object.
However you want to define the role and purpose of equals, this behaviour is inconsistent and violates the contract given above.

Please reevaluate and rethink overriding equals and hashCode! It is easy to provoke situations where this implementation leads to paradox results for java.util classes such as Collections and Maps.
The intended changes are highly problematic. (as this behaviour of COSStream is - should it still be contained in the current revision.)

Theoretically:
A new ContentStream is created (let us call this instance at this exact point in time A)
Now "createOutputStream" is called for this instance, resulting in:

{color:#000000}IOUtils{color}.closeQuietly({color:#871094}randomAccess{color});
{color:#871094}randomAccess {color}= {color:#871094}scratchFile{color}.createBuffer(); (the instance at this point in time is called B)

This results in a change to the Hashcode of that object.  The hashcodes of object A and object B are not equal.
If the "equals" method was adapted accordingly to also consider the content information, that is checked for in "hashCode", then this would mean that A does not equal B even though A is identical to B. (identical instance.)

Is this the intended behaviour?

Please revert this!

> Add equals() and hashCode() to PDAnnotation and COS objects
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4723
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4723
>             Project: PDFBox
>          Issue Type: Sub-task
>          Components: PDModel
>    Affects Versions: 2.0.18
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: bird_burst.heic.pdf, screenshot-1.png
>
>
> In order to proper support removeAll/retainAll for COSArrayList we need to detect if entries are in fact duplicates of others. This currently fails as even though one might add the same instance of an annotation object multiple times to setAnnotations getting the annotations will have individual instances. See the discussion at PDFBOX-4669.
> In order to proper support removal we need to be able to detect equality where an object is equal if the underlying COSDictionary has the same entries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: dev-help@pdfbox.apache.org