Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/22 17:40:18 UTC

[GitHub] [iceberg] findepi commented on a change in pull request #2849: Fix bucketing of strings with non-BMP characters

findepi commented on a change in pull request #2849:
URL: https://github.com/apache/iceberg/pull/2849#discussion_r675023438



##########
File path: api/src/test/java/org/apache/iceberg/transforms/TestBucketing.java
##########
@@ -215,6 +215,24 @@ public void testString() {
         hashBytes(asBytes), bucketFunc.hash(string));
   }
 
+  @Test
+  public void testStringWithSurrogatePair() {
+    String string = "string with a surrogate pair: 💰";
+    Assert.assertNotEquals("string has no surrogate pairs", string.length(), string.codePoints().count());
+    byte[] asBytes = string.getBytes(StandardCharsets.UTF_8);
+
+    Bucket<CharSequence> bucketFunc = Bucket.get(Types.StringType.get(), 100);
+
+    Assert.assertEquals("String hash should match hash of UTF-8 bytes",
+        hashBytes(asBytes), bucketFunc.hash(string));
+
+    Assert.assertNotEquals("It looks like Guava has been updated and now contains a fix for " +
+                    "https://github.com/google/guava/issues/5648. Please resolve the TODO in BucketString.hash " +
+                    "and remove this assertion",
+            hashBytes(asBytes),
+            MURMUR3.hashString(string, StandardCharsets.UTF_8).asInt());

Review comment:
       In the end, my IDE is not against me on this :)

       Fixed.
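For readers following the hashing question: the test asserts that bucketing a string hashes its UTF-8 bytes, so the emoji (one code point, two UTF-16 chars, four UTF-8 bytes) must be hashed as a single 4-byte sequence. Below is a minimal sketch of that contract without Guava. It assumes the bucket definition from the Iceberg table spec (32-bit Murmur3 x86 with seed 0 over the UTF-8 bytes, then `(hash & Integer.MAX_VALUE) % N`). The class and method names (`Murmur3BucketSketch`, `murmur3x86_32`, `hashString`, `bucket`) are hypothetical, and this is not Iceberg's `BucketString` code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not Iceberg's BucketString or Guava's Hashing.
final class Murmur3BucketSketch {
  private static final int C1 = 0xcc9e2d51;
  private static final int C2 = 0x1b873593;

  private Murmur3BucketSketch() {}

  // MurmurHash3 x86 32-bit over raw bytes.
  static int murmur3x86_32(byte[] data, int seed) {
    int h = seed;
    int len = data.length;
    int blockEnd = len & ~3; // largest multiple of 4 that is <= len

    // Body: mix in each 4-byte block, read little-endian.
    for (int i = 0; i < blockEnd; i += 4) {
      int k = (data[i] & 0xff)
          | (data[i + 1] & 0xff) << 8
          | (data[i + 2] & 0xff) << 16
          | (data[i + 3] & 0xff) << 24;
      h ^= mixK(k);
      h = Integer.rotateLeft(h, 13);
      h = h * 5 + 0xe6546b64;
    }

    // Tail: up to 3 leftover bytes (cases intentionally fall through).
    int k = 0;
    switch (len & 3) {
      case 3:
        k ^= (data[blockEnd + 2] & 0xff) << 16;
      case 2:
        k ^= (data[blockEnd + 1] & 0xff) << 8;
      case 1:
        k ^= data[blockEnd] & 0xff;
        h ^= mixK(k);
        break;
      default:
        break;
    }

    // Finalization: mix in the length, then avalanche the bits.
    h ^= len;
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  private static int mixK(int k) {
    k *= C1;
    k = Integer.rotateLeft(k, 15);
    k *= C2;
    return k;
  }

  // Encode to UTF-8 first, so a surrogate pair is hashed as one 4-byte code point.
  static int hashString(CharSequence value) {
    return murmur3x86_32(value.toString().getBytes(StandardCharsets.UTF_8), 0);
  }

  // Assumed spec-style bucket: clear the sign bit, then take the remainder.
  static int bucket(CharSequence value, int numBuckets) {
    return (hashString(value) & Integer.MAX_VALUE) % numBuckets;
  }
}
```

With the test's string written as `"string with a surrogate pair: \uD83D\uDCB0"`, `length()` is 32, `codePoints().count()` is 31, and the UTF-8 encoding is 34 bytes; `Murmur3BucketSketch.bucket(s, 100)` then returns a value in `[0, 100)`. Encoding to bytes first, rather than calling a string-hashing shortcut, sidesteps the kind of per-character encoding discrepancy the diff's Guava assertion is watching for.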




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org