You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/01 15:13:52 UTC

[GitHub] [iceberg] aokolnychyi opened a new pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

aokolnychyi opened a new pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640


   This PR adds support for case sensitivity in `MergingSnapshotProducer`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#issuecomment-983740285


   cc @jackye1995 @rdblue @RussellSpitzer @flyrain @karuppayya @szehon-ho @kbendick 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r764081785



##########
File path: core/src/test/java/org/apache/iceberg/TestDeleteFiles.java
##########
@@ -268,6 +268,40 @@ public void testCannotDeleteFileWhereNotAllRowsMatchPartitionFilter() {
             .commit());
   }
 
+  @Test
+  public void testDeleteCaseSensitivity() {
+    table.newFastAppend()
+        .appendFile(DATA_FILE_BUCKET_0_IDS_0_2)
+        .commit();
+
+    Expression rowFilter = Expressions.lessThan("iD", 5);
+
+    AssertHelpers.assertThrows("Should use case sensitive binding by default",

Review comment:
       Iceberg historically uses case sensitivity resolution by default in all other places.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#issuecomment-988110672


   Thanks, @aokolnychyi!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r761496244



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -125,6 +127,16 @@ public ThisT set(String property, String value) {
     return self();
   }
 
+  protected void caseSensitiveBinding(boolean newCaseSensitive) {

Review comment:
       I think the argument name matters less than the method name. Why does this use `caseSensitiveBinding`? Why not add `caseSensitive` and use that as the implementation? We already have a type parameter for the class (`ThisT`) and a `self()` method so we can implement a chainable call here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r760415519



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -125,6 +127,16 @@ public ThisT set(String property, String value) {
     return self();
   }
 
+  protected void caseSensitiveBinding(boolean newCaseSensitive) {

Review comment:
       It is a little bit inconsistent in the code. Some places call it `newCaseSensitive` while others `isCaseSensistive`. I went with the current naming to follow the `newXXX` pattern we use for non-boolean vars. That being said, I don't have a preference at all. Either option works for me.
   
   Yeah, the method name is to avoid conflicts. I considered calling it `caseSensitiveInternal` but not sure it is any better.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r761496896



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -125,6 +127,16 @@ public ThisT set(String property, String value) {
     return self();
   }
 
+  protected void caseSensitiveBinding(boolean newCaseSensitive) {
+    this.caseSensitive = newCaseSensitive;
+    filterManager.caseSensitive(newCaseSensitive);
+    deleteFilterManager.caseSensitive(newCaseSensitive);
+  }
+
+  protected boolean caseSensitive() {

Review comment:
       I think this should be `isCaseSensitive`. I think we mostly use the `is` prefix for returning booleans like this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r760320426



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -125,6 +127,16 @@ public ThisT set(String property, String value) {
     return self();
   }
 
+  protected void caseSensitiveBinding(boolean newCaseSensitive) {

Review comment:
       Other method argument is 'isCaseSensitive', should we follow?

##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -125,6 +127,16 @@ public ThisT set(String property, String value) {
     return self();
   }
 
+  protected void caseSensitiveBinding(boolean newCaseSensitive) {

Review comment:
       Also wondering, method clashing is why this method can not be just caseSensitive(isCaseSensitive)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r761497193



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -271,7 +282,7 @@ protected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotI
     Set<Long> newSnapshots = history.second();
 
     ManifestGroup conflictGroup = new ManifestGroup(ops.io(), manifests, ImmutableList.of())
-        .caseSensitive(caseSensitive)
+        .caseSensitive(caseSensitive())

Review comment:
       Nit: why access an instance field through the getter method?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r761536408



##########
File path: core/src/test/java/org/apache/iceberg/TestDeleteFiles.java
##########
@@ -268,6 +268,40 @@ public void testCannotDeleteFileWhereNotAllRowsMatchPartitionFilter() {
             .commit());
   }
 
+  @Test
+  public void testDeleteCaseSensitivity() {
+    table.newFastAppend()
+        .appendFile(DATA_FILE_BUCKET_0_IDS_0_2)
+        .commit();
+
+    Expression rowFilter = Expressions.lessThan("iD", 5);
+
+    AssertHelpers.assertThrows("Should use case sensitive binding by default",

Review comment:
       Case sensitive is more strict, so I think that makes sense as the default.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
flyrain commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r760711665



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -125,6 +127,16 @@ public ThisT set(String property, String value) {
     return self();
   }
 
+  protected void caseSensitiveBinding(boolean newCaseSensitive) {

Review comment:
       Maybe we could be consistent with the name. I saw three names(caseSensitive, isCaseSensitive, newCaseSensitive) used for the parameters. Kind of prefer one simple name for all of them, like `caseSensitive`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
flyrain commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r761318266



##########
File path: core/src/test/java/org/apache/iceberg/TestDeleteFiles.java
##########
@@ -268,6 +268,40 @@ public void testCannotDeleteFileWhereNotAllRowsMatchPartitionFilter() {
             .commit());
   }
 
+  @Test
+  public void testDeleteCaseSensitivity() {
+    table.newFastAppend()
+        .appendFile(DATA_FILE_BUCKET_0_IDS_0_2)
+        .commit();
+
+    Expression rowFilter = Expressions.lessThan("iD", 5);
+
+    AssertHelpers.assertThrows("Should use case sensitive binding by default",

Review comment:
       Just curious, does it make sense to make non case sensitive the default?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r764081785



##########
File path: core/src/test/java/org/apache/iceberg/TestDeleteFiles.java
##########
@@ -268,6 +268,40 @@ public void testCannotDeleteFileWhereNotAllRowsMatchPartitionFilter() {
             .commit());
   }
 
+  @Test
+  public void testDeleteCaseSensitivity() {
+    table.newFastAppend()
+        .appendFile(DATA_FILE_BUCKET_0_IDS_0_2)
+        .commit();
+
+    Expression rowFilter = Expressions.lessThan("iD", 5);
+
+    AssertHelpers.assertThrows("Should use case sensitive binding by default",

Review comment:
       Iceberg historically uses case sensitive resolution by default in all other places.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3640: Core: Support case sensitivity in MergingSnapshotProducer

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3640:
URL: https://github.com/apache/iceberg/pull/3640#discussion_r764082136



##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -125,6 +127,16 @@ public ThisT set(String property, String value) {
     return self();
   }
 
+  protected void caseSensitiveBinding(boolean newCaseSensitive) {

Review comment:
       Updated the var and method names.

##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -125,6 +127,16 @@ public ThisT set(String property, String value) {
     return self();
   }
 
+  protected void caseSensitiveBinding(boolean newCaseSensitive) {
+    this.caseSensitive = newCaseSensitive;
+    filterManager.caseSensitive(newCaseSensitive);
+    deleteFilterManager.caseSensitive(newCaseSensitive);
+  }
+
+  protected boolean caseSensitive() {

Review comment:
       Fixed.

##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -271,7 +282,7 @@ protected void validateAddedDataFiles(TableMetadata base, Long startingSnapshotI
     Set<Long> newSnapshots = history.second();
 
     ManifestGroup conflictGroup = new ManifestGroup(ops.io(), manifests, ImmutableList.of())
-        .caseSensitive(caseSensitive)
+        .caseSensitive(caseSensitive())

Review comment:
       Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org