You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/28 15:57:56 UTC

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

ajantha-bhat opened a new pull request, #6296:
URL: https://github.com/apache/iceberg/pull/6296

   For `RewriteDataFilesProcedure`, when the table has been created with a default sort order, there should be no need for the user to again specify the same sort order for the sort strategy during re-write. Currently, it fails with `Cannot set strategy sort order: unsorted` error for this scenario.
   
   This particular scenario works with rewrite spark action now also and used to work with `RewriteDataFilesProcedure` too. 
   But while introducing `Zorder` with re-write procedure, a strict check has been introduced which caused this particular scenario to fail.  This would have not happened if we had a test case to catch it. Fixed it now and added the test case. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#discussion_r1034326681


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java:
##########
@@ -184,6 +184,9 @@ private RewriteDataFiles checkAndApplyStrategy(
                 .toArray(String[]::new);
         return action.zOrder(columnNames);
       } else {
+        if (sortOrderFields.isEmpty()) {
+          return action.sort();

Review Comment:
   fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#discussion_r1042526654


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -356,6 +356,16 @@ public void testRewriteDataFilesWithInvalidInputs() {
                     + "sort_order => 'c1 ASC NULLS FIRST')",
                 catalogName, tableIdent));
 
+    // Test for sort strategy without any (default/user defined) sort_order
+    AssertHelpers.assertThrows(
+        "Should reject calls with error message",
+        IllegalArgumentException.class,
+        "Can't use SORT when there is no sort order",

Review Comment:
   I think the existing full error message is enough. Because I just captured a piece from that full message you might though we need to modify the message. 
   Let me know if you still want to change it, if not please merge this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#discussion_r1042526654


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -356,6 +356,16 @@ public void testRewriteDataFilesWithInvalidInputs() {
                     + "sort_order => 'c1 ASC NULLS FIRST')",
                 catalogName, tableIdent));
 
+    // Test for sort strategy without any (default/user defined) sort_order
+    AssertHelpers.assertThrows(
+        "Should reject calls with error message",
+        IllegalArgumentException.class,
+        "Can't use SORT when there is no sort order",

Review Comment:
   I think the existing full error message is enough. Because I just captured a piece from that full message you might thought we need to modify the message. 
   Let me know if you still want to change it, if not please merge this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#issuecomment-1357737149

   Thanks @ajantha-bhat !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#discussion_r1033738622


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java:
##########
@@ -184,6 +184,9 @@ private RewriteDataFiles checkAndApplyStrategy(
                 .toArray(String[]::new);
         return action.zOrder(columnNames);
       } else {
+        if (sortOrderFields.isEmpty()) {
+          return action.sort();

Review Comment:
   this will set the sort order to null (unintialized), which in turn uses the table sort order here. 
   
   https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/actions/SortStrategy.java#L76



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#discussion_r1033776082


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -356,6 +356,16 @@ public void testRewriteDataFilesWithInvalidInputs() {
                     + "sort_order => 'c1 ASC NULLS FIRST')",
                 catalogName, tableIdent));
 
+    // Test for sort strategy without any (default/user defined) sort_order
+    AssertHelpers.assertThrows(
+        "Should reject calls with error message",
+        IllegalArgumentException.class,
+        "Can't use SORT when there is no sort order",

Review Comment:
   This error message is present since the beginning of the sort strategy support (maybe added by you :) ).
   https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/actions/SortStrategy.java#L86
   
   Do you strongly feel the need to change it now? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#discussion_r1033736149


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -549,6 +559,34 @@ public void testZOrderTableWithSpecialChars() {
     Assert.assertEquals("Table cache must be empty", 0, SparkTableCache.get().size());
   }
 
+  @Test
+  public void testDefaultSortOrder() {

Review Comment:
   This particular testcase fails without the fix in this PR. 
   
   I have also tested the same scenario in the 0.13.x now. It can pass. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer merged pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
RussellSpitzer merged PR #6296:
URL: https://github.com/apache/iceberg/pull/6296


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#discussion_r1033767880


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -356,6 +356,16 @@ public void testRewriteDataFilesWithInvalidInputs() {
                     + "sort_order => 'c1 ASC NULLS FIRST')",
                 catalogName, tableIdent));
 
+    // Test for sort strategy without any (default/user defined) sort_order
+    AssertHelpers.assertThrows(
+        "Should reject calls with error message",
+        IllegalArgumentException.class,
+        "Can't use SORT when there is no sort order",

Review Comment:
   "when the table has no defined sort order and the user has not set a specific sort order"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#discussion_r1033764881


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java:
##########
@@ -184,6 +184,9 @@ private RewriteDataFiles checkAndApplyStrategy(
                 .toArray(String[]::new);
         return action.zOrder(columnNames);
       } else {
+        if (sortOrderFields.isEmpty()) {
+          return action.sort();

Review Comment:
   we cam avoid another level of indentation if we use an else if
   
   ```java
     else if (sortOrdserFields.nonEmpty) {
          return action.sort(buildSortOrder(sortOrderFields, schema));
     } else {
          return action.sort()
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#issuecomment-1329366120

   This issue has been reported by two of the users in iceberg-slack. 
   Here is the full background: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1661239255706969
   
   cc: @RussellSpitzer, @Fokko  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#issuecomment-1346988722

   ping @RussellSpitzer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#issuecomment-1330119227

   > Minor suggestion on the branching structure, but this looks good to me
   Fixed it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#discussion_r1033778874


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -356,6 +356,16 @@ public void testRewriteDataFilesWithInvalidInputs() {
                     + "sort_order => 'c1 ASC NULLS FIRST')",
                 catalogName, tableIdent));
 
+    // Test for sort strategy without any (default/user defined) sort_order
+    AssertHelpers.assertThrows(
+        "Should reject calls with error message",
+        IllegalArgumentException.class,
+        "Can't use SORT when there is no sort order",

Review Comment:
   this is the current full error message `Can't use %s when there is no sort order, either define table %s's sort order or set sort order in the action`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6296: Spark-3.3: Use table sort order with sort strategy when user has not specified

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #6296:
URL: https://github.com/apache/iceberg/pull/6296#issuecomment-1331533750

   @RussellSpitzer: Can you please take a look at it again and help in merging the PR? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org