Posted to dev@pdfbox.apache.org by "Christian Appl (Jira)" <ji...@apache.org> on 2020/07/27 12:59:00 UTC

[jira] [Commented] (PDFBOX-4723) Add equals() and hashCode() to PDAnnotation and COS objects

    [ https://issues.apache.org/jira/browse/PDFBOX-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165687#comment-17165687 ] 

Christian Appl commented on PDFBOX-4723:
----------------------------------------

My apologies for returning to this issue again, but I'm still neither happy with nor convinced by the changes intended here. As of revision c3b92be9b7cac83112cfe5b25b31fc7ef7b3600b (committed 3rd January 2020), the current state of branch 2.0, the class COSStream still contains a modified equals() method.
 As ArrayList uses the equals() method to check whether an object is already contained in the collection, the following simple example:
{code:java}
// Imports needed by the test below (JUnit 4, PDFBox 2.0).
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSObject;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.junit.Test;

@Test
public void testHeic() throws Exception {
   File birdHeicPDF = new File("...", "bird_burst.heic.pdf");
   PDDocument document = PDDocument.load(birdHeicPDF);
   try {
      List<COSBase> contentStreams = new ArrayList<COSBase>();
      // Get content stream from first page.
      PDPage firstPage = document.getPage(0);
      COSBase firstPageContents =
         ((COSObject)firstPage.getCOSObject().getItem(COSName.CONTENTS)).getObject();
      // Add this content stream to the storage.
      contentStreams.add(firstPageContents);

      // Iterate pages.
      for (int i = 0; i < document.getNumberOfPages(); i++) {
         PDPage currentPage = document.getPage(i);
         COSBase pageContents =
            ((COSObject)currentPage.getCOSObject().getItem(COSName.CONTENTS)).getObject();
         // Is the first page's content stream equal to the current page's content stream?
         System.out.println("Are contents equal for page " + i + "? => " +
            firstPageContents.equals(pageContents));
         // Is the content stream already contained in the ArrayList?
         System.out.println("Is content contained in collected content streams? => " + 
            contentStreams.contains(pageContents));
      }
   } finally {
      document.close();
   }
}
{code}
Results in this output:
*Are contents equal for page 0? => true*
*Is content contained in collected content streams? => true*
*Are contents equal for page 1? => true*
*Is content contained in collected content streams? => true*
*Are contents equal for page 2? => true*
*Is content contained in collected content streams? => true*
*Are contents equal for page 3? => true*
*Is content contained in collected content streams? => true*

This output was produced with the attached PDF "bird_burst.heic.pdf". (The PDF contains frames from one of the example HEIC images found at: https://nokiatech.github.io/heif/examples.html)


Is this behaviour really as expected? Should the content streams of different pages, held in different COSObjects, really be treated as identical items? They aren't. From my point of view, this result is erroneous.

Are you absolutely sure that you are not using classes from the standard library that don't follow your definition of the equals() method?
How am I supposed to implement a simple check such as the one above? Are you using specialized collections for such comparisons?
Should COSBase instances be comparable in this way?
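
For illustration, one way such a check could stay reference-based, regardless of how equals() is overridden, is an identity-based collection from the standard library. This is only a minimal sketch under stated assumptions: ContentEqualStream is a hypothetical stand-in for a class with a content-based equals() (it is not PDFBox's COSStream), and newIdentitySet is a helper name invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class IdentityContainsSketch {

    /** Hypothetical stand-in for a class whose equals() compares content, not identity. */
    public static final class ContentEqualStream {
        private final String content;

        public ContentEqualStream(String content) {
            this.content = content;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof ContentEqualStream
                    && ((ContentEqualStream) other).content.equals(content);
        }

        @Override
        public int hashCode() {
            return content.hashCode();
        }
    }

    /** Hypothetical helper: creates a set that compares elements with == instead of equals(). */
    public static <T> Set<T> newIdentitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
    }

    public static void main(String[] args) {
        // Two distinct instances that compare as equal via equals().
        ContentEqualStream first = new ContentEqualStream("q Q");
        ContentEqualStream second = new ContentEqualStream("q Q");

        // ArrayList.contains() delegates to equals().
        List<ContentEqualStream> list = new ArrayList<ContentEqualStream>();
        list.add(first);

        // The identity set only matches the very same instance.
        Set<ContentEqualStream> seen = newIdentitySet();
        seen.add(first);

        System.out.println("ArrayList.contains(second)    => " + list.contains(second));
        System.out.println("identity set contains(second) => " + seen.contains(second));
    }
}
```

In this sketch the ArrayList lookup reports the second instance as contained, while the identity set does not. Whether such a workaround is acceptable for users of COSBase is exactly the question raised above.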

> Add equals() and hashCode() to PDAnnotation and COS objects
> -----------------------------------------------------------
>
>                 Key: PDFBOX-4723
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4723
>             Project: PDFBox
>          Issue Type: Sub-task
>          Components: PDModel
>    Affects Versions: 2.0.18
>            Reporter: Maruan Sahyoun
>            Assignee: Maruan Sahyoun
>            Priority: Major
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: bird_burst.heic.pdf, screenshot-1.png
>
>
> In order to properly support removeAll/retainAll for COSArrayList we need to detect whether entries are in fact duplicates of others. This currently fails because, even though one might add the same instance of an annotation object multiple times via setAnnotations, getting the annotations returns individual instances. See the discussion at PDFBOX-4669.
> In order to properly support removal we need to be able to detect equality, where an object is equal if the underlying COSDictionary has the same entries.


