You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/08/07 20:32:10 UTC

[GitHub] [iceberg] rdblue opened a new pull request #1307: Fix struct comparison and add struct and partition set implementations

rdblue opened a new pull request #1307:
URL: https://github.com/apache/iceberg/pull/1307


   `StructLikeWrapper` is used to provide `equals` and `hashCode` methods that are consistent across all `StructLike` implementations. But, the implementation did not handle different implementations of `CharSequence` correctly. This fixes the implementations by adding a `Comparator` implementation for structs, and adding classes to implement Java's `hashCode`. These implementations depend on the Iceberg type of an object.
   
   The type of a struct is not always passed to `StructLikeWrapper`, so this includes a temporary work-around to use the existing implementation for code paths that do not currently have a struct's type.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1307: Fix struct comparison and add struct and partition set implementations

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1307:
URL: https://github.com/apache/iceberg/pull/1307


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1307: Fix struct comparison and add struct and partition set implementations

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1307:
URL: https://github.com/apache/iceberg/pull/1307#discussion_r468201047



##########
File path: core/src/main/java/org/apache/iceberg/util/StructLikeWrapper.java
##########
@@ -80,12 +108,20 @@ public boolean equals(Object other) {
 
   @Override
   public int hashCode() {
-    int result = 97;
-    int len = struct.size();
-    result = 41 * result + len;
-    for (int i = 0; i < len; i += 1) {
-      result = 41 * result + Objects.hashCode(struct.get(i, Object.class));
+    if (hashCode == null) {
+      if (structHash != null) {
+        this.hashCode = structHash.hash(struct);
+      } else {
+        int result = 97;

Review comment:
       Yes. This is a work-around for when there is no type, so that the current code paths don't fail. The follow-up commit removes this after adding `specId` to `DataFile`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1307: Fix struct comparison and add struct and partition set implementations

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1307:
URL: https://github.com/apache/iceberg/pull/1307#discussion_r468212930



##########
File path: api/src/main/java/org/apache/iceberg/types/JavaHashes.java
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.types;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.function.IntFunction;
+import org.apache.iceberg.StructLike;
+
+public class JavaHashes {
+  private JavaHashes() {
+  }
+
+  public static int hashCode(CharSequence str) {
+    int result = 177;
+    for (int i = 0; i < str.length(); i += 1) {
+      char ch = str.charAt(i);
+      result = 31 * result + (int) ch;
+    }
+    return result;
+  }
+
+  static JavaHash<CharSequence> strings() {
+    return CharSequenceHash.INSTANCE;
+  }
+
+  static JavaHash<StructLike> struct(Types.StructType struct) {
+    return new StructLikeHash(struct);
+  }
+
+  static JavaHash<List<?>> list(Types.ListType list) {
+    return new ListHash(list);
+  }
+
+  private static class CharSequenceHash implements JavaHash<CharSequence> {
+    private static final CharSequenceHash INSTANCE = new CharSequenceHash();
+
+    private CharSequenceHash() {
+    }
+
+    @Override
+    public int hash(CharSequence str) {
+      if (str == null) {
+        return 0;
+      }
+
+      return JavaHashes.hashCode(str);
+    }
+  }
+
+  private static class StructLikeHash implements JavaHash<StructLike> {
+    private final JavaHash<Object>[] hashes;
+
+    private StructLikeHash(Types.StructType struct) {
+      this.hashes = struct.fields().stream()
+          .map(field ->
+            "unknown-partition".equals(field.doc()) ?

Review comment:
       I had an idea to fix this by making the hash function for strings check the value class. That's much cleaner. The method already has a null check so using `value instanceof CharSequence` isn't much more expensive. And, we should get a more consistent hash code by hashing `value.toString` than delegating to `Object#hashCode`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1307: Fix struct comparison and add struct and partition set implementations

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1307:
URL: https://github.com/apache/iceberg/pull/1307#issuecomment-671624846


   Thanks for reviewing, @aokolnychyi! Tests are passing now, so I'll merge.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1307: Fix struct comparison and add struct and partition set implementations

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1307:
URL: https://github.com/apache/iceberg/pull/1307#discussion_r468202694



##########
File path: api/src/main/java/org/apache/iceberg/types/JavaHashes.java
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.types;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.function.IntFunction;
+import org.apache.iceberg.StructLike;
+
+public class JavaHashes {
+  private JavaHashes() {
+  }
+
+  public static int hashCode(CharSequence str) {
+    int result = 177;
+    for (int i = 0; i < str.length(); i += 1) {
+      char ch = str.charAt(i);
+      result = 31 * result + (int) ch;
+    }
+    return result;
+  }
+
+  static JavaHash<CharSequence> strings() {
+    return CharSequenceHash.INSTANCE;
+  }
+
+  static JavaHash<StructLike> struct(Types.StructType struct) {
+    return new StructLikeHash(struct);
+  }
+
+  static JavaHash<List<?>> list(Types.ListType list) {
+    return new ListHash(list);
+  }
+
+  private static class CharSequenceHash implements JavaHash<CharSequence> {
+    private static final CharSequenceHash INSTANCE = new CharSequenceHash();
+
+    private CharSequenceHash() {
+    }
+
+    @Override
+    public int hash(CharSequence str) {
+      if (str == null) {
+        return 0;
+      }
+
+      return JavaHashes.hashCode(str);
+    }
+  }
+
+  private static class StructLikeHash implements JavaHash<StructLike> {
+    private final JavaHash<Object>[] hashes;
+
+    private StructLikeHash(Types.StructType struct) {
+      this.hashes = struct.fields().stream()
+          .map(field ->
+            "unknown-partition".equals(field.doc()) ?

Review comment:
       This is an odd case. Without this, all tests are passing except for the forward-compatibility test that validates Spark can read tables with unknown transforms. When we see an unknown transform, we don't know what type of results it produces so we can't create a hash function or comparator ahead of time like we try to do here.
   
   The actual problem is that when we create a hash function, it expects a `CharSequence` becuase our assumed result type is the most generic: string. That causes hashing to fail with a `ClassCastException`. The work-around, which is ugly, is to use the partition field doc to signal when this happens so that the hash function can use `Objects::hash` that can hash any object using its `hashCode` method, instead of trying to use our `CharSequence` hash function.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #1307: Fix struct comparison and add struct and partition set implementations

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #1307:
URL: https://github.com/apache/iceberg/pull/1307#issuecomment-671459137


   Let me take a look right now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #1307: Fix struct comparison and add struct and partition set implementations

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #1307:
URL: https://github.com/apache/iceberg/pull/1307#discussion_r468051224



##########
File path: api/src/main/java/org/apache/iceberg/types/JavaHashes.java
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.types;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.function.IntFunction;
+import org.apache.iceberg.StructLike;
+
+public class JavaHashes {
+  private JavaHashes() {
+  }
+
+  public static int hashCode(CharSequence str) {
+    int result = 177;
+    for (int i = 0; i < str.length(); i += 1) {
+      char ch = str.charAt(i);
+      result = 31 * result + (int) ch;
+    }
+    return result;
+  }
+
+  static JavaHash<CharSequence> strings() {
+    return CharSequenceHash.INSTANCE;
+  }
+
+  static JavaHash<StructLike> struct(Types.StructType struct) {
+    return new StructLikeHash(struct);
+  }
+
+  static JavaHash<List<?>> list(Types.ListType list) {
+    return new ListHash(list);
+  }
+
+  private static class CharSequenceHash implements JavaHash<CharSequence> {
+    private static final CharSequenceHash INSTANCE = new CharSequenceHash();
+
+    private CharSequenceHash() {
+    }
+
+    @Override
+    public int hash(CharSequence str) {
+      if (str == null) {
+        return 0;
+      }
+
+      return JavaHashes.hashCode(str);
+    }
+  }
+
+  private static class StructLikeHash implements JavaHash<StructLike> {
+    private final JavaHash<Object>[] hashes;
+
+    private StructLikeHash(Types.StructType struct) {
+      this.hashes = struct.fields().stream()
+          .map(field ->
+            "unknown-partition".equals(field.doc()) ?

Review comment:
       Will it be safe to use `Objects::hashCode` in that case or shall we fail?

##########
File path: api/src/main/java/org/apache/iceberg/types/JavaHashes.java
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.types;
+
+import java.util.List;
+import java.util.Objects;
+import java.util.function.IntFunction;
+import org.apache.iceberg.StructLike;
+
+public class JavaHashes {
+  private JavaHashes() {
+  }
+
+  public static int hashCode(CharSequence str) {
+    int result = 177;
+    for (int i = 0; i < str.length(); i += 1) {
+      char ch = str.charAt(i);
+      result = 31 * result + (int) ch;
+    }
+    return result;
+  }
+
+  static JavaHash<CharSequence> strings() {
+    return CharSequenceHash.INSTANCE;
+  }
+
+  static JavaHash<StructLike> struct(Types.StructType struct) {
+    return new StructLikeHash(struct);
+  }
+
+  static JavaHash<List<?>> list(Types.ListType list) {
+    return new ListHash(list);
+  }
+
+  private static class CharSequenceHash implements JavaHash<CharSequence> {
+    private static final CharSequenceHash INSTANCE = new CharSequenceHash();
+
+    private CharSequenceHash() {
+    }
+
+    @Override
+    public int hash(CharSequence str) {
+      if (str == null) {
+        return 0;
+      }
+
+      return JavaHashes.hashCode(str);
+    }
+  }
+
+  private static class StructLikeHash implements JavaHash<StructLike> {
+    private final JavaHash<Object>[] hashes;
+
+    private StructLikeHash(Types.StructType struct) {
+      this.hashes = struct.fields().stream()
+          .map(field ->
+            "unknown-partition".equals(field.doc()) ?

Review comment:
       Is `unknown-partition` some sort of a constant?

##########
File path: core/src/main/java/org/apache/iceberg/util/StructLikeWrapper.java
##########
@@ -80,12 +108,20 @@ public boolean equals(Object other) {
 
   @Override
   public int hashCode() {
-    int result = 97;
-    int len = struct.size();
-    result = 41 * result + len;
-    for (int i = 0; i < len; i += 1) {
-      result = 41 * result + Objects.hashCode(struct.get(i, Object.class));
+    if (hashCode == null) {
+      if (structHash != null) {
+        this.hashCode = structHash.hash(struct);
+      } else {
+        int result = 97;

Review comment:
       Is this the work-around when we don't have a type? In which case does this happen?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1307: Fix struct comparison and add struct and partition set implementations

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1307:
URL: https://github.com/apache/iceberg/pull/1307#discussion_r467256116



##########
File path: core/src/main/java/org/apache/iceberg/PartitionsTable.java
##########
@@ -95,15 +95,21 @@ private DataTask task(TableScan scan) {
     }
   }
 
-  static class PartitionSet {
+  static class PartitionMap {

Review comment:
       This was renamed to avoid a conflict with the new `PartitionSet` class, and updated to use `StructLikeWrapper` with correct `equals`/`hashCode` implementations.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org