You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/08/06 01:27:25 UTC

[GitHub] [accumulo] ctubbsii opened a new pull request, #2854: Update TabletFile methods

ctubbsii opened a new pull request, #2854:
URL: https://github.com/apache/accumulo/pull/2854

   * Remove unused validate method in StoredTabletFile
   * Add compareTo/equals/hashCode to StoredTabletFile
   * Add javadoc to TabletFile compareTo/equals/hashCode to explain need
     for commutativity
   * Add secondary comparison to compareTo for StoredTabletFile cases, to
     ensure consistent ordering
   * Simplify equals and short-circuit if same object in TabletFile


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on pull request #2854: Update TabletFile methods

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on PR #2854:
URL: https://github.com/apache/accumulo/pull/2854#issuecomment-1212299115

   Okay, it looks like there are several issues, and this PR isn't going to cut it:
   
   1. compareTo and equals and hashCode currently rely on the implementations in the base class. This makes it unreliable for using the StoredTabletFile class in data structures in the context of requiring separate entries for each metadata entry when it is found stored non-normalized.
   2. Overriding any of these methods can break existing code that relies on the current behavior. But, the current behavior might be broken, too (because of bullet point number 1 above).
   3. There are a few cases where we place TabletFile in the same data structure as the StoredTabletFile, requiring them to have cohesive implementations for these methods.
   4. ScanServerRefTabletFile complicates things further, because it overrides equals and hashCode, forcing incompatibility with the base class implementation, if these happen to get stored in the same data structure.
   
   I had considered removing the compareTo method, and defining an explicit comparator. That would address the places where we are trying to compare using the natural ordering of the base class and the subclass. But, it doesn't address these other issues.
   
   Creating an interface at the base might help make things less confusing... I'm not sure. Either way, it looks like we're going to have to dig through every use of these classes in any kind of data structure, to evaluate for correct reliance on the equals, hashCode, and compareTo implementations.
   
   Closing this PR, because it's not the right fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on pull request #2854: Update TabletFile methods

Posted by GitBox <gi...@apache.org>.
EdColeman commented on PR #2854:
URL: https://github.com/apache/accumulo/pull/2854#issuecomment-1211065698

   Currently (without these changes) StoredTabletFile does not have equals, hashCode or CompareTo - they just by default use the parent file (TabletFile).  Currently, that means that any metadata entry in StoredTabletFile is ignored when equals, hashCode or compareTo is called.  Any StoredTableFIle will compare with other StoredTableFiles, or a TabletFile without considering the metadata string as long as the normalized path compares / is equal.
   
   These changes just make that explicit for equals() and hashCode() - but the change also modifies compareTo() so that the metadata string in StoredTabletFile is part of the comparison.  With what @milleruntime suggested is intent of the StoredTabletFile class - this change aligns the compareTo() with his statement that objects with different metadata string should not compare as equal (comapreTo(..) will not equal 0 ).  However, two objects with different metadata entries, but the same normalized path will be equal and have the same hash code. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] milleruntime commented on pull request #2854: Update TabletFile methods

Posted by GitBox <gi...@apache.org>.
milleruntime commented on PR #2854:
URL: https://github.com/apache/accumulo/pull/2854#issuecomment-1209584883

   The original intent of `StoredTabletFile` was to have a more restrictive type of a `TabletFile`. A `StoredTabletFile` should have an exact string of what is in the metadata to prevent junk from never being deleted, while a `TabletFile` may not. To me, that implies that they should not be equal if refer to different metadata entries. Is it possible to have a `TabletFile` equal to a `StoredTabletFile` with these changes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii closed pull request #2854: Update TabletFile methods

Posted by GitBox <gi...@apache.org>.
ctubbsii closed pull request #2854: Update TabletFile methods
URL: https://github.com/apache/accumulo/pull/2854


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on pull request #2854: Update TabletFile methods

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on PR #2854:
URL: https://github.com/apache/accumulo/pull/2854#issuecomment-1212300178

   > Is it possible to have a `TabletFile` equal to a `StoredTabletFile` with these changes?
   
   It was possible before these changes. These changes didn't change that. In fact, we got breakages when we tried to make that impossible. So, there is currently code relying on these being equal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on a diff in pull request #2854: Update TabletFile methods

Posted by GitBox <gi...@apache.org>.
EdColeman commented on code in PR #2854:
URL: https://github.com/apache/accumulo/pull/2854#discussion_r940621599


##########
core/src/main/java/org/apache/accumulo/core/metadata/TabletFile.java:
##########
@@ -128,24 +128,42 @@ public Path getPath() {
     return metaPath;
   }
 
+  /**
+   * The ordering of this class depends on the normal String ordering of the normalized paths first,
+   * then any original path, if one exists as a StoredTabletFile, second. To ensure subclasses don't
+   * break things, and to preserve commutativity, subclasses should defer to this implementation.
+   */

Review Comment:
   The addition of using StoredTabletFile original path in the comparison breaks the implied contract that
   
   `(x.compareTo(y) == 0 ) ==> (x.equals(y))`
   
    Collections that use hashCode and equals (HashSet / Map) will behave differently than collections that use compareTo (TreeSet / Map)  This is similar to the way `BigDecimal` handles `compareTo` and `equals` But is is unusual.
   
   This seems that it could introduce issues. If somehow there were two metadata entries that had different "original" paths but resolved to the same normalized path and if they were put into a HashSet and a TreeSet the collections will have different contents.
   
   `
   For example:
   
   If you have two StoredTabletFiles:
   
       StoredTabletFile file1 = new StoredTabletFile(
               "hdfs://nn.somewhere.com:86753/accumulo/tables/" + id + "/" + dir + "/" + filename);
   
       StoredTabletFile file2 = new StoredTabletFile(
               "hdfs://nn.somewhere.com:86753/./accumulo/tables/" + id + "/" + dir + "/" + filename);
   
   And stored them in a TreeSet and a HashSet you end up with different collections (both show the "original name(s)" in the collection:
   
   TreeSet - 2 entries
      hdfs://nn.somewhere.com:86753/./accumulo/tables/2a/t-0003/C0004.rf
      hdfs://nn.somewhere.com:86753/accumulo/tables/2a/t-0003/C0004.rf
   
   HashSet - 1 entry
      hdfs://nn.somewhere.com:86753/accumulo/tables/2a/t-0003/C0004.rf
   
   `



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on a diff in pull request #2854: Update TabletFile methods

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on code in PR #2854:
URL: https://github.com/apache/accumulo/pull/2854#discussion_r943728287


##########
core/src/main/java/org/apache/accumulo/core/metadata/TabletFile.java:
##########
@@ -128,24 +128,42 @@ public Path getPath() {
     return metaPath;
   }
 
+  /**
+   * The ordering of this class depends on the normal String ordering of the normalized paths first,
+   * then any original path, if one exists as a StoredTabletFile, second. To ensure subclasses don't
+   * break things, and to preserve commutativity, subclasses should defer to this implementation.
+   */

Review Comment:
   > Collections that use hashCode and equals (HashSet / Map) will behave differently than collections that use compareTo (TreeSet / Map) This is similar to the way `BigDecimal` handles `compareTo` and `equals` But is is unusual.
   
   Yeah, that's definitely a problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on pull request #2854: Update TabletFile methods

Posted by GitBox <gi...@apache.org>.
EdColeman commented on PR #2854:
URL: https://github.com/apache/accumulo/pull/2854#issuecomment-1212184493

   Had a discussion with @milleruntime about this - while these change make the existing code work - it may be better to make other changes in the code so that classes that inherit from `TabletFile` do not need to rely on this behavior in these changes.  While we have looked at `StoredTabletFile` `ScanServerRefTabletFile` also extends `TabletFile` and may have similar issues.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on a diff in pull request #2854: Update TabletFile methods

Posted by GitBox <gi...@apache.org>.
EdColeman commented on code in PR #2854:
URL: https://github.com/apache/accumulo/pull/2854#discussion_r940621599


##########
core/src/main/java/org/apache/accumulo/core/metadata/TabletFile.java:
##########
@@ -128,24 +128,42 @@ public Path getPath() {
     return metaPath;
   }
 
+  /**
+   * The ordering of this class depends on the normal String ordering of the normalized paths first,
+   * then any original path, if one exists as a StoredTabletFile, second. To ensure subclasses don't
+   * break things, and to preserve commutativity, subclasses should defer to this implementation.
+   */

Review Comment:
   The addition of using StoredTabletFile original path in the comparison breaks the implied contract that
   
   `(x.compareTo(y) == 0 ) ==> (x.equals(y))`
   
    Collections that use hashCode and equals (HashSet / Map) will behave differently than collections that use compareTo (TreeSet / Map)  This is similar to the way `BigDecimal` handles `compareTo` and `equals` But is is unusual.
   
   This seems that it could introduce issues. If somehow there were two metadata entries that had different "original" paths but resolved to the same normalized path and if they were put into a HashSet and a TreeSet the collections will have different contents.
   
   For example:
   
   If you have two StoredTabletFiles:
   
   ```
       StoredTabletFile file1 = new StoredTabletFile(
               "hdfs://nn.somewhere.com:86753/accumulo/tables/" + id + "/" + dir + "/" + filename);
   
       StoredTabletFile file2 = new StoredTabletFile(
               "hdfs://nn.somewhere.com:86753/./accumulo/tables/" + id + "/" + dir + "/" + filename);
   
   And stored them in a TreeSet and a HashSet you end up with different collections (both show the "original name(s)" in the collection:
   
   TreeSet - 2 entries
      hdfs://nn.somewhere.com:86753/./accumulo/tables/2a/t-0003/C0004.rf
      hdfs://nn.somewhere.com:86753/accumulo/tables/2a/t-0003/C0004.rf
   
   HashSet - 1 entry
      hdfs://nn.somewhere.com:86753/accumulo/tables/2a/t-0003/C0004.rf
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org