You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/11 17:27:43 UTC

[GitHub] [iceberg] aokolnychyi opened a new pull request #3720: Spark: Refactor building write distribution and ordering

aokolnychyi opened a new pull request #3720:
URL: https://github.com/apache/iceberg/pull/3720


   This PR has the following contributions:
   
   - When I was working on distributions for row-level commands, I realized that it is easier to decide which distribution mode to use  inside `SparkWriteConf`. Right now, we only parse the config in `SparkWriteConf` but then have extra checks in the util while building the actual distribution. That means we have bits of the same logic in multiple places.
   - We no longer need `OrderField`. We can simply use the corresponding Spark class for sort expressions. I did not mind having a custom implementation but it makes equality checks harder.
   - Tests for write distribution and ordering.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3720: Spark: Refactor building write distribution and ordering

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3720:
URL: https://github.com/apache/iceberg/pull/3720#discussion_r767336275



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
##########
@@ -148,19 +152,23 @@ public String rewrittenFileSetId() {
   }
 
   public DistributionMode distributionMode() {
-    String defaultValue;
-    if (table.sortOrder().isSorted()) {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_RANGE;
-    } else {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_DEFAULT;

Review comment:
       Sounds fine to me.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3720: Spark: Refactor building write distribution and ordering

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3720:
URL: https://github.com/apache/iceberg/pull/3720#discussion_r767331294



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
##########
@@ -148,19 +152,23 @@ public String rewrittenFileSetId() {
   }
 
   public DistributionMode distributionMode() {
-    String defaultValue;
-    if (table.sortOrder().isSorted()) {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_RANGE;
-    } else {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_DEFAULT;

Review comment:
       The problem is that our default value isn't static and cannot be represented by a literal. It is just a constant and we still respect the table property.
   
   I'd probably just deprecate `WRITE_DISTRIBUTION_MODE_DEFAULT`. Do you agree, @rdblue?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3720: Spark: Refactor building write distribution and ordering

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3720:
URL: https://github.com/apache/iceberg/pull/3720#discussion_r767913112



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
##########
@@ -148,19 +152,23 @@ public String rewrittenFileSetId() {
   }
 
   public DistributionMode distributionMode() {
-    String defaultValue;
-    if (table.sortOrder().isSorted()) {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_RANGE;
-    } else {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_DEFAULT;

Review comment:
       I think we should follow up and deprecate it. We can do that in a PR that updates Flink to set its own default like here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #3720: Spark: Refactor building write distribution and ordering

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged pull request #3720:
URL: https://github.com/apache/iceberg/pull/3720


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3720: Spark: Refactor building write distribution and ordering

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3720:
URL: https://github.com/apache/iceberg/pull/3720#discussion_r767331294



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
##########
@@ -148,19 +152,23 @@ public String rewrittenFileSetId() {
   }
 
   public DistributionMode distributionMode() {
-    String defaultValue;
-    if (table.sortOrder().isSorted()) {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_RANGE;
-    } else {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_DEFAULT;

Review comment:
       The problem is that our default value isn't static and cannot be represented by a literal. We continue to respect the table property but it is just the constant does not make much sense now.
   
   I'd probably just deprecate `WRITE_DISTRIBUTION_MODE_DEFAULT`. Do you agree, @rdblue?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3720: Spark: Refactor building write distribution and ordering

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3720:
URL: https://github.com/apache/iceberg/pull/3720#discussion_r767499675



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
##########
@@ -148,19 +152,23 @@ public String rewrittenFileSetId() {
   }
 
   public DistributionMode distributionMode() {
-    String defaultValue;
-    if (table.sortOrder().isSorted()) {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_RANGE;
-    } else {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_DEFAULT;

Review comment:
       It turns out this property is used by Flink, which makes more sense as Flink does not support `range`. I'll probably keep things as is then.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3720: Spark: Refactor building write distribution and ordering

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3720:
URL: https://github.com/apache/iceberg/pull/3720#discussion_r767310772



##########
File path: spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkWriteConf.java
##########
@@ -148,19 +152,23 @@ public String rewrittenFileSetId() {
   }
 
   public DistributionMode distributionMode() {
-    String defaultValue;
-    if (table.sortOrder().isSorted()) {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_RANGE;
-    } else {
-      defaultValue = TableProperties.WRITE_DISTRIBUTION_MODE_DEFAULT;

Review comment:
       Do we need to continue to respect this? I think the idea was for this to control what to do for unsorted tables, but now this hard-codes NONE. Should we deprecate the property and remove it in 0.14.0?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org