You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/06/18 20:52:13 UTC

[GitHub] [iceberg] nastra opened a new pull request #2712: Fix undefined equality behavior when searching in a CharSequence collection

nastra opened a new pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712


   This should fix the ErrorProne issue https://errorprone.info/bugpattern/CollectionUndefinedEquality


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2712: Use CharSequenceSet instead of Set to avoid undefined equality check behavior

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712#issuecomment-865340811


   The changes look good to me, but I'm curious why some of the suppressions are still needed after changing to use `CharSequenceSet` instead of `Set<CharSequence>`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2712: Use CharSequenceSet instead of Set to avoid undefined equality check behavior

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712#issuecomment-866151789


   Thanks, @nastra!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2712: Use CharSequenceSet instead of Set to avoid undefined equality check behavior

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712#discussion_r655700520



##########
File path: core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
##########
@@ -337,6 +338,7 @@ private boolean canContainDeletedFiles(ManifestFile manifest) {
     return canContainExpressionDeletes || canContainDroppedPartitions || canContainDroppedFiles || canContainDropBySeq;
   }
 
+  @SuppressWarnings("CollectionUndefinedEquality")

Review comment:
       Are the suppressions still needed now that a different set class is used?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2712: Fix undefined equality behavior when searching in a CharSequence collection

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712#discussion_r654568912



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -332,7 +332,8 @@ protected void validateDataFilesExist(TableMetadata base, Long startingSnapshotI
 
     ManifestGroup matchingDeletesGroup = new ManifestGroup(ops.io(), manifests, ImmutableList.of())
         .filterManifestEntries(entry -> entry.status() != ManifestEntry.Status.ADDED &&
-            newSnapshots.contains(entry.snapshotId()) && requiredDataFiles.contains(entry.file().path()))
+            newSnapshots.contains(entry.snapshotId()) &&
+            requiredDataFiles.stream().anyMatch(entry.file().path().toString()::contains))

Review comment:
       This was correct because the set passed in is actually a `CharSequenceSet`: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/BaseRowDelta.java#L29
   
   To fix the error-prone warning and ensure that future uses are also `CharSequenceSet`, I think that a better fix is to change `Set<CharSequence>` to `CharSequenceSet` in the method arguments.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a change in pull request #2712: Use CharSequenceSet instead of Set to avoid undefined equality check behavior

Posted by GitBox <gi...@apache.org>.
nastra commented on a change in pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712#discussion_r655199579



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -332,7 +332,8 @@ protected void validateDataFilesExist(TableMetadata base, Long startingSnapshotI
 
     ManifestGroup matchingDeletesGroup = new ManifestGroup(ops.io(), manifests, ImmutableList.of())
         .filterManifestEntries(entry -> entry.status() != ManifestEntry.Status.ADDED &&
-            newSnapshots.contains(entry.snapshotId()) && requiredDataFiles.contains(entry.file().path()))
+            newSnapshots.contains(entry.snapshotId()) &&
+            requiredDataFiles.stream().anyMatch(entry.file().path().toString()::contains))

Review comment:
       I think we might want to adjust all the places where `Set<CharSequence>` is being used just to make sure. I went ahead and adjusted all production code.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a change in pull request #2712: Use CharSequenceSet instead of Set to avoid undefined equality check behavior

Posted by GitBox <gi...@apache.org>.
nastra commented on a change in pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712#discussion_r655199579



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -332,7 +332,8 @@ protected void validateDataFilesExist(TableMetadata base, Long startingSnapshotI
 
     ManifestGroup matchingDeletesGroup = new ManifestGroup(ops.io(), manifests, ImmutableList.of())
         .filterManifestEntries(entry -> entry.status() != ManifestEntry.Status.ADDED &&
-            newSnapshots.contains(entry.snapshotId()) && requiredDataFiles.contains(entry.file().path()))
+            newSnapshots.contains(entry.snapshotId()) &&
+            requiredDataFiles.stream().anyMatch(entry.file().path().toString()::contains))

Review comment:
       I think we might want to adjust all the places where `Set<CharSequence>` is being used just to make sure. I went ahead and adjusted all production code.

##########
File path: core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
##########
@@ -337,6 +338,7 @@ private boolean canContainDeletedFiles(ManifestFile manifest) {
     return canContainExpressionDeletes || canContainDroppedPartitions || canContainDroppedFiles || canContainDropBySeq;
   }
 
+  @SuppressWarnings("CollectionUndefinedEquality")

Review comment:
       unfortunately yes. ErrorProne still thinks the issue around `CollectionUndefinedEquality`, but those are false positives

##########
File path: core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
##########
@@ -337,6 +338,7 @@ private boolean canContainDeletedFiles(ManifestFile manifest) {
     return canContainExpressionDeletes || canContainDroppedPartitions || canContainDroppedFiles || canContainDropBySeq;
   }
 
+  @SuppressWarnings("CollectionUndefinedEquality")

Review comment:
       unfortunately yes. ErrorProne still thinks the issue around `CollectionUndefinedEquality` exists, but those are false positives




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #2712: Fix undefined equality behavior when searching in a CharSequence collection

Posted by GitBox <gi...@apache.org>.
nastra commented on pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712#issuecomment-864121837


   /cc @rdblue 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #2712: Use CharSequenceSet instead of Set to avoid undefined equality check behavior

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a change in pull request #2712: Use CharSequenceSet instead of Set to avoid undefined equality check behavior

Posted by GitBox <gi...@apache.org>.
nastra commented on a change in pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712#discussion_r655924500



##########
File path: core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
##########
@@ -337,6 +338,7 @@ private boolean canContainDeletedFiles(ManifestFile manifest) {
     return canContainExpressionDeletes || canContainDroppedPartitions || canContainDroppedFiles || canContainDropBySeq;
   }
 
+  @SuppressWarnings("CollectionUndefinedEquality")

Review comment:
       unfortunately yes. ErrorProne still thinks the issue around `CollectionUndefinedEquality`, but those are false positives

##########
File path: core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
##########
@@ -337,6 +338,7 @@ private boolean canContainDeletedFiles(ManifestFile manifest) {
     return canContainExpressionDeletes || canContainDroppedPartitions || canContainDroppedFiles || canContainDropBySeq;
   }
 
+  @SuppressWarnings("CollectionUndefinedEquality")

Review comment:
       unfortunately yes. ErrorProne still thinks the issue around `CollectionUndefinedEquality` exists, but those are false positives




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2712: Use CharSequenceSet instead of Set to avoid undefined equality check behavior

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712#issuecomment-865340811


   The changes look good to me, but I'm curious why some of the suppressions are still needed after changing to use `CharSequenceSet` instead of `Set<CharSequence>`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2712: Use CharSequenceSet instead of Set to avoid undefined equality check behavior

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2712:
URL: https://github.com/apache/iceberg/pull/2712#discussion_r655700520



##########
File path: core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
##########
@@ -337,6 +338,7 @@ private boolean canContainDeletedFiles(ManifestFile manifest) {
     return canContainExpressionDeletes || canContainDroppedPartitions || canContainDroppedFiles || canContainDropBySeq;
   }
 
+  @SuppressWarnings("CollectionUndefinedEquality")

Review comment:
       Are the suppressions still needed now that a different set class is used?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org