You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "ConeyLiu (via GitHub)" <gi...@apache.org> on 2023/05/24 04:00:33 UTC

[GitHub] [iceberg] ConeyLiu opened a new pull request, #7693: Spark 3.3: Fixes bucket on binary column

ConeyLiu opened a new pull request, #7693:
URL: https://github.com/apache/iceberg/pull/7693

   Closes #7682 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#issuecomment-1565137649

   Thanks all. 
   
   > @ConeyLiu @jzhuge, shall we cherry-pick this into Spark 3.2? Do we have a test for this in Spark 3.4 (we are using the function catalog there)?
   
   Let met check it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#discussion_r1204958851


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRequiredDistributionAndOrdering.java:
##########
@@ -268,6 +268,23 @@ public void testDefaultSortOnLongTruncatedColumn() {
     assertEquals("Rows must match", expected, sql("SELECT * FROM %s ORDER BY c1", tableName));
   }
 
+  @Test
+  public void testDefaultSortOnBinaryColumn() {

Review Comment:
   @jzhuge I added back this UT, not sure if is this what you expected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#issuecomment-1561216112

   LGTM, thanks for fixing this @ConeyLiu 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#issuecomment-1564796060

   @ConeyLiu @jzhuge, shall we cherry-pick this into 3.2 and add a test for Spark 3.4 (we are using the function catalog there)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#discussion_r1205049115


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRequiredDistributionAndOrdering.java:
##########
@@ -268,6 +268,23 @@ public void testDefaultSortOnLongTruncatedColumn() {
     assertEquals("Rows must match", expected, sql("SELECT * FROM %s ORDER BY c1", tableName));
   }
 
+  @Test
+  public void testDefaultSortOnBinaryColumn() {

Review Comment:
   Hi @jzhuge, I checked the `TestRequiredDistributionAndOrdering ` again, it seems the new UT would be more reasonable to put here. You could see the around UTs all are about partition and ordering tests. I renamed the UT name from `testDefaultSortOnBinaryColumn ` to `testDefaultSortOnBinaryBucketedColumn` to keep the name consistent with others. However, I could move it if you stand by your point.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi merged PR #7693:
URL: https://github.com/apache/iceberg/pull/7693


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#issuecomment-1564789908

   Let me check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#discussion_r1203493416


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRequiredDistributionAndOrdering.java:
##########
@@ -298,4 +298,21 @@ public void testRangeDistributionWithQuotedColumnNames() throws NoSuchTableExcep
         ImmutableList.of(row(7L)),
         sql("SELECT count(*) FROM %s", tableName));
   }
+
+  @Test
+  public void testDefaultSortOnBinaryColumn() {
+    sql(
+        "CREATE TABLE %s (c1 INT, c2 Binary) "
+            + "USING iceberg "
+            + "PARTITIONED BY (truncate(2, c2))",

Review Comment:
   Sorry for the mistake.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on a diff in pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "jzhuge (via GitHub)" <gi...@apache.org>.
jzhuge commented on code in PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#discussion_r1205979106


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRequiredDistributionAndOrdering.java:
##########
@@ -268,6 +268,23 @@ public void testDefaultSortOnLongTruncatedColumn() {
     assertEquals("Rows must match", expected, sql("SELECT * FROM %s ORDER BY c1", tableName));
   }
 
+  @Test
+  public void testDefaultSortOnBinaryColumn() {

Review Comment:
   No, it is ok, I don't feel strongly about the move.
   
   Good to go!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on a diff in pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "jzhuge (via GitHub)" <gi...@apache.org>.
jzhuge commented on code in PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#discussion_r1204997353


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRequiredDistributionAndOrdering.java:
##########
@@ -268,6 +268,23 @@ public void testDefaultSortOnLongTruncatedColumn() {
     assertEquals("Rows must match", expected, sql("SELECT * FROM %s ORDER BY c1", tableName));
   }
 
+  @Test
+  public void testDefaultSortOnBinaryColumn() {

Review Comment:
   Yes, it is the unit test. However, what do you think if we move it from TestRequiredDistributionAndOrdering.java to TestIcebergExpressions.java? I feel the later test source file is a more place to add the unit test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on a diff in pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "jzhuge (via GitHub)" <gi...@apache.org>.
jzhuge commented on code in PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#discussion_r1203451293


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRequiredDistributionAndOrdering.java:
##########
@@ -298,4 +298,21 @@ public void testRangeDistributionWithQuotedColumnNames() throws NoSuchTableExcep
         ImmutableList.of(row(7L)),
         sql("SELECT count(*) FROM %s", tableName));
   }
+
+  @Test
+  public void testDefaultSortOnBinaryColumn() {

Review Comment:
   Is there another place to add a unit test? This test file does not seem directly related.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#discussion_r1203493872


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRequiredDistributionAndOrdering.java:
##########
@@ -298,4 +298,21 @@ public void testRangeDistributionWithQuotedColumnNames() throws NoSuchTableExcep
         ImmutableList.of(row(7L)),
         sql("SELECT count(*) FROM %s", tableName));
   }
+
+  @Test
+  public void testDefaultSortOnBinaryColumn() {

Review Comment:
   Updated to the dedicated test class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on a diff in pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "jzhuge (via GitHub)" <gi...@apache.org>.
jzhuge commented on code in PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#discussion_r1203446738


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRequiredDistributionAndOrdering.java:
##########
@@ -298,4 +298,21 @@ public void testRangeDistributionWithQuotedColumnNames() throws NoSuchTableExcep
         ImmutableList.of(row(7L)),
         sql("SELECT count(*) FROM %s", tableName));
   }
+
+  @Test
+  public void testDefaultSortOnBinaryColumn() {
+    sql(
+        "CREATE TABLE %s (c1 INT, c2 Binary) "
+            + "USING iceberg "
+            + "PARTITIONED BY (truncate(2, c2))",

Review Comment:
   Should this be `bucket` instead of `truncate`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on a diff in pull request #7693: Spark 3.3: Fixes bucket on binary column

Posted by "jzhuge (via GitHub)" <gi...@apache.org>.
jzhuge commented on code in PR #7693:
URL: https://github.com/apache/iceberg/pull/7693#discussion_r1204608289


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestIcebergExpressions.java:
##########
@@ -71,4 +72,34 @@ public void testTruncateExpressions() {
         ImmutableList.of(row(100, 10000L, new BigDecimal("10.50"), "10", "12")),
         sql("SELECT int_c, long_c, dec_c, str_c, CAST(binary_c AS STRING) FROM v"));
   }
+
+  @Test
+  public void testBucketExpressions() {

Review Comment:
   Could you please add another unit test described in issue #7682? Basically for a table with binary bucket partition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org