You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "stevenzwu (via GitHub)" <gi...@apache.org> on 2023/05/01 20:39:18 UTC

[GitHub] [iceberg] stevenzwu opened a new pull request, #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

stevenzwu opened a new pull request, #7493:
URL: https://github.com/apache/iceberg/pull/7493

   Flink RowData implementation classes (like GenericRowData) all implement proper overrides for those methods. We also intend to use RowDataProjection as map key for MapDataStatistics from sink shuffling work. Hence this change is also required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1183184498


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -166,7 +169,8 @@ public int getArity() {
 
   @Override
   public RowKind getRowKind() {
-    return rowData.getRowKind();
+    // rowData can be null for nested struct

Review Comment:
   when the nested struct is null, there is a RowDataProjection instanced created that wraps the null RowData. Let me think if there is other way to fix this. e.g. maybe we should return a null RowDataProjection in this case.
   
   <img width="1288" alt="image" src="https://user-images.githubusercontent.com/1545663/235820171-075787cf-bd50-416c-b284-7c49652f7579.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#issuecomment-1546297817

   @pvary @hililiwei @yegangy0718 can you take another look? I made small adjustment in the last commit after rebased with PR #7517 .
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] yegangy0718 commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "yegangy0718 (via GitHub)" <gi...@apache.org>.
yegangy0718 commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1181957826


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -254,4 +258,78 @@ public MapData getMap(int pos) {
   public RowData getRow(int pos, int numFields) {
     return (RowData) getValue(pos);
   }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+
+    if (!(o instanceof RowDataProjection)) {
+      return false;
+    }
+
+    RowDataProjection that = (RowDataProjection) o;
+    return deepEquals(that);

Review Comment:
   do we need to check `o == null`?
   Inside `deepEquals`,  `other.getRowKind()` may fail if `other == null`



##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -254,4 +258,78 @@ public MapData getMap(int pos) {
   public RowData getRow(int pos, int numFields) {
     return (RowData) getValue(pos);
   }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+
+    if (!(o instanceof RowDataProjection)) {
+      return false;
+    }
+
+    RowDataProjection that = (RowDataProjection) o;
+    return deepEquals(that);
+  }
+
+  @Override
+  public int hashCode() {
+    int result = Objects.hashCode(getRowKind());
+    for (int pos = 0; pos < getArity(); pos++) {
+      if (!isNullAt(pos)) {
+        // Arrays.deepHashCode handles array object properly
+        result = 31 * result + Arrays.deepHashCode(new Object[] {getValue(pos)});

Review Comment:
   curious, when calculating hashcode, do we need to take Map/List as a special case? Will `Arrays.deepHashCode` be able to handle that?  
   Or it's because `This projection will not project the nested children types of repeated types like lists and
      maps`, so we can ignore them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1183184498


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -166,7 +169,8 @@ public int getArity() {
 
   @Override
   public RowKind getRowKind() {
-    return rowData.getRowKind();
+    // rowData can be null for nested struct

Review Comment:
   when the location struct is null, there is a RowDataProjection instanced created that wraps the null RowData. Let me think if there is other way to fix this. e.g. maybe we should return a null RowDataProjection in this case.
   
   <img width="1288" alt="image" src="https://user-images.githubusercontent.com/1545663/235820171-075787cf-bd50-416c-b284-7c49652f7579.png">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu merged pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu merged PR #7493:
URL: https://github.com/apache/iceberg/pull/7493


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1183235983


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -166,7 +169,8 @@ public int getArity() {
 
   @Override
   public RowKind getRowKind() {
-    return rowData.getRowKind();
+    // rowData can be null for nested struct

Review Comment:
   https://github.com/apache/iceberg/issues/7507



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1192547402


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -254,4 +258,78 @@ public MapData getMap(int pos) {
   public RowData getRow(int pos, int numFields) {
     return (RowData) getValue(pos);
   }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+
+    if (!(o instanceof RowDataProjection)) {
+      return false;
+    }
+
+    RowDataProjection that = (RowDataProjection) o;
+    return deepEquals(that);
+  }
+
+  @Override
+  public int hashCode() {
+    int result = Objects.hashCode(getRowKind());
+    for (int pos = 0; pos < getArity(); pos++) {
+      if (!isNullAt(pos)) {
+        // Arrays.deepHashCode handles array object properly
+        result = 31 * result + Arrays.deepHashCode(new Object[] {getValue(pos)});

Review Comment:
   @yegangy0718 Java List and Map objects implements `hashCode` properly. here we are just leveraging `java.util.Arrays` to handle the hashCode for array type field.
   
   @hililiwei what do you mean by preferring the first one?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "chenjunjiedada (via GitHub)" <gi...@apache.org>.
chenjunjiedada commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1183162278


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -166,7 +169,8 @@ public int getArity() {
 
   @Override
   public RowKind getRowKind() {
-    return rowData.getRowKind();
+    // rowData can be null for nested struct

Review Comment:
   Could you elaborate a bit on this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#issuecomment-1530218076

   @hililiwei @chenjunjiedada @yegangy0718 can you help review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hililiwei commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "hililiwei (via GitHub)" <gi...@apache.org>.
hililiwei commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1188078049


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -254,4 +258,78 @@ public MapData getMap(int pos) {
   public RowData getRow(int pos, int numFields) {
     return (RowData) getValue(pos);
   }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+
+    if (!(o instanceof RowDataProjection)) {
+      return false;
+    }
+
+    RowDataProjection that = (RowDataProjection) o;
+    return deepEquals(that);
+  }
+
+  @Override
+  public int hashCode() {
+    int result = Objects.hashCode(getRowKind());
+    for (int pos = 0; pos < getArity(); pos++) {
+      if (!isNullAt(pos)) {
+        // Arrays.deepHashCode handles array object properly
+        result = 31 * result + Arrays.deepHashCode(new Object[] {getValue(pos)});

Review Comment:
   Not so sure, but I prefer the first one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1181953434


##########
flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/DataGenerators.java:
##########
@@ -81,7 +81,7 @@ public static class Primitives implements DataGenerator {
 
     private final Schema icebergSchema =
         new Schema(
-            Types.NestedField.required(1, "partition_field", Types.StringType.get()),
+            Types.NestedField.required(1, "row_id", Types.StringType.get()),

Review Comment:
   renaming the field id to also match what the `RowDataProjection` usage (like `idOnly` projected schema)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1183235983


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -166,7 +169,8 @@ public int getArity() {
 
   @Override
   public RowKind getRowKind() {
-    return rowData.getRowKind();
+    // rowData can be null for nested struct

Review Comment:
   Created a new issue to discuss how to handle null nested struct. https://github.com/apache/iceberg/issues/7507



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] yegangy0718 commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "yegangy0718 (via GitHub)" <gi...@apache.org>.
yegangy0718 commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1182788479


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -254,4 +258,78 @@ public MapData getMap(int pos) {
   public RowData getRow(int pos, int numFields) {
     return (RowData) getValue(pos);
   }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+
+    if (!(o instanceof RowDataProjection)) {
+      return false;
+    }
+
+    RowDataProjection that = (RowDataProjection) o;
+    return deepEquals(that);

Review Comment:
   NVM. When `o == null`, `(o instanceof RowDataProjection)` = false, at line 268, it will return false directly. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "chenjunjiedada (via GitHub)" <gi...@apache.org>.
chenjunjiedada commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1183167475


##########
flink/v1.17/flink/src/test/java/org/apache/iceberg/flink/DataGenerators.java:
##########
@@ -38,11 +38,11 @@
 import org.apache.avro.SchemaBuilder;
 import org.apache.avro.generic.GenericData;
 import org.apache.avro.util.Utf8;
+import org.apache.commons.lang3.NotImplementedException;

Review Comment:
   Nit: we use `UnsupportedOperationException` in other places, should we keep consistency? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7493: Flink: add toString, equals, hashCode overrides for RowDataProjection.

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7493:
URL: https://github.com/apache/iceberg/pull/7493#discussion_r1183232770


##########
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java:
##########
@@ -166,7 +169,8 @@ public int getArity() {
 
   @Override
   public RowKind getRowKind() {
-    return rowData.getRowKind();
+    // rowData can be null for nested struct

Review Comment:
   Dug out this issue: https://github.com/apache/iceberg/issues/2738
   
   If the nested struct is null, right now both StructProjection and RowDataProjection returns a Projection object wrapping a null struct. I actually think it is probably better to just return a null Projection object in this case. 
   
   Let me create a separate issue to follow up on this and see if the community agrees with the change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org