You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/04 20:00:21 UTC

[GitHub] [iceberg] SinghAsDev opened a new pull request, #4692: WIP: Remove redundant sorts from copy on write deletes

SinghAsDev opened a new pull request, #4692:
URL: https://github.com/apache/iceberg/pull/4692

   Remove sorts from delete from queries reading just one partition and already sorted data


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

aokolnychyi commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1126833576

   Thanks for the PR, @SinghAsDev! I can take a look next week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

aokolnychyi commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130549098

   > I'm a little worried about the instance in which every file was written with the correct sort order BUT were written by independent writes. In this case I have dozens of files which overlap, but they all have the same sort order. In this case I'm not sure it makes sense to ignore the sort request, in this case it wouldn't be redundant and we would be better off if we apply the distribution.
   
   I guess the answer would be "it depends". In case of @SinghAsDev, the files seem to be properly compacted and sorted so that only a small number of files overlap. Anyway, the point I was trying to make is there are lot of assumptions that must be met that makes this use case pretty narrow. On top of that, we can't do that at all as Iceberg doesn't know whether a broadcast join will be used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] puchengy commented on a diff in pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

puchengy commented on code in PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#discussion_r868531139


##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -581,8 +581,13 @@ private static class WriterFactory implements DataWriterFactory, StreamingDataWr
     private final StructType dsSchema;
     private final boolean partitionedFanoutEnabled;
 
-    protected WriterFactory(Broadcast<Table> tableBroadcast, FileFormat format, long targetFileSize,
-                            Schema writeSchema, StructType dsSchema, boolean partitionedFanoutEnabled) {
+    protected WriterFactory(

Review Comment:
   Maybe undo this as it is not related to this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] SinghAsDev commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

SinghAsDev commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130571173

   > That poses a problem as Iceberg does not know which join implementation Spark is going to pick so we can't say whether the sort is really redundant.
   
   Yea, I definitely think this should be configurable optimization at best. But, I won't be surprised if many GDPR use-cases benefit from it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rshanmugam1 commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by "rshanmugam1 (via GitHub)" <gi...@apache.org>.

rshanmugam1 commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1604626671

   facing  similar use-case to this.
   
   the input data is sorted in a range using a custom partitioner. when another writer reads the data, performs a simple transformation, and writes it back, the sort order is not preserved. This issue arises because the number of splits does not match the number of input files, which disrupts the range sort. Since the data size is substantial, performing a shuffle operation is costly. tried these options but did not help.
   
   spark.read()
       .option("file-open-cost", Long.MAX_VALUE) -- //  creates 1 split per row group..need 1 split per file
   
   @aokolnychyi 
   "read.split.open-file-cost to Long.MaxValue to force one file per split" -- this makes one file per row group not file.. 
   am i missing something here or another way to achieve this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] SinghAsDev commented on a diff in pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

SinghAsDev commented on code in PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#discussion_r868588696


##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -581,8 +581,13 @@ private static class WriterFactory implements DataWriterFactory, StreamingDataWr
     private final StructType dsSchema;
     private final boolean partitionedFanoutEnabled;
 
-    protected WriterFactory(Broadcast<Table> tableBroadcast, FileFormat format, long targetFileSize,
-                            Schema writeSchema, StructType dsSchema, boolean partitionedFanoutEnabled) {
+    protected WriterFactory(

Review Comment:
   Thanks, dropped.



##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWriteBuilder.java:
##########
@@ -128,7 +133,8 @@ public WriteBuilder overwrite(Filter[] filters) {
   @Override
   public Write build() {
     // Validate
-    Preconditions.checkArgument(handleTimestampWithoutZone || !SparkUtil.hasTimestampWithoutZone(table.schema()),
+    Preconditions.checkArgument(

Review Comment:
   Thanks, dropped.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

aokolnychyi commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130501423

   Thoughts, @RussellSpitzer @szehon-ho @rdblue @flyrain?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] puchengy commented on a diff in pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

puchengy commented on code in PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#discussion_r868531934


##########
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWriteBuilder.java:
##########
@@ -128,7 +133,8 @@ public WriteBuilder overwrite(Filter[] filters) {
   @Override
   public Write build() {
     // Validate
-    Preconditions.checkArgument(handleTimestampWithoutZone || !SparkUtil.hasTimestampWithoutZone(table.schema()),
+    Preconditions.checkArgument(

Review Comment:
   Maybe undo this as it is not related to this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] SinghAsDev commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

SinghAsDev commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1132469759

   
   > > However, we want to remove sorts automatically without users having to figure out if all files are already sorted or not.
   > 
   > Yeah, I agree but I am afraid Iceberg can't do this on its own as we don't know whether there will be a shuffle. We can't remove the local sort if there is a subquery that is rewritten as a sort-merge join, can we?
   
   Yea, we can't tell for sure, but wondering if you see value in allowing users who know what they are doing to turn this optimization on. We are experimenting with this at Pinterest already and see significant improvements.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] SinghAsDev commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

SinghAsDev commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1156557364

   
   > I'd be open for that but I feel like we need to think a little bit more about how to expose this.
   
   Yea, happy to discuss more along this.
   
   > @SinghAsDev, do you happen to have some performance numbers to share? I'd image a local sort be fairly cheap unless there is a shuffle spill or substantial GC time.
   
   We saw 10-20% cost increase with local sort. The cost also goes up with number of keys the sort is on.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

RussellSpitzer commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130513986

   I'm a little worried about the instance in which every file was written with the correct sort order BUT were written by independent writes. In this case I have dozens of files which overlap, but they all have the same sort order. In this case I'm not sure it makes sense to ignore the sort request, in this case it wouldn't be redundant and we would be better off if we apply the distribution.
   
   For disabling the sort I think disabling the distribution mode is good enough? Although maybe file as task is a nice bonus there? I'm a little worried since that parameter is a bit of a confusing internal implementation detail for an external user to wrap their head around.
   
   I am strongly in support though of passing through "sort order" to files and want that PR if we can do it soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] SinghAsDev commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

SinghAsDev commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130567256

   > The use case we are talking about is copy-on-write DELETE **_executed using a broadcast join_** where we read files from the current spec, only one file per split, files are already reasonably compacted and sorted as needed. Right now, we can avoid the shuffle by setting the distribution mode to `none` but we can't disable a potentially redundant local sort. Is my understanding correct?
   
   That is correct. However, we want to remove sorts automatically without users having to figure out if all files are already sorted or not. So, imaging a dataset that goes through deletion daily. The first deletion should do a global sort while re-writing, however future deletes should not have to perform any sort (not even local sort).
   
   > If we had a way to pass options to DELETE commands, we could simply support the same using these steps:
   > 
   > * Set `read.split.open-file-cost` to `Long.MaxValue` to force one file per split
   > * Set `write.delete.distribution-mode` to `none`
   > * Set write option `use-table-distribution-and-ordering` to `false`
   > 
   > I will be adding OPTIONS to row-level commands in Spark too but it won't be there until Spark 3.4.
   
   For this to work users would have to first figure out if all files are sorted or not and then do things differently accordingly. If that is something user want to do they can do that by explicitly adding `order by` only in first delete. However, there will still be a redundant local sort in subsequent deletes.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

RussellSpitzer commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130551254

   I want to say i'm not against heuristic optimizations, but I do think we need to make them optional, explicitly enabled/disabled, and they should do the right thing at least 90% of the time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

aokolnychyi commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1135317507

   > Yea, we can't tell for sure, but wondering if you see value in allowing users who know what they are doing to turn this optimization on.
   
   I'd be open for that but I feel like we need to think a little bit more about how to expose this.
   
   @SinghAsDev, do you happen to have some performance numbers to share? I'd image a local sort be fairly cheap unless there is a shuffle spill.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

aokolnychyi commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130495934

I spent some time thinking about this. Let me summarize how I understand the proposal.

The use case we are talking about is copy-on-write DELETE **_executed using a broadcast join_** where we read files from the current spec, only one file per split, files are already reasonably compacted and sorted as needed. Right now, we can avoid the shuffle by setting the distribution mode to `none` but we can't disable a potentially redundant local sort. Is my understanding correct?

We can't target UPDATE and MERGE as those may change the ordering/partition of records even if we read properly sorted/partitioned files. We can't target tables with small files as having a split per file will be costly. The first two restrictions seem reasonable but I am afraid we can't target DELETE that triggers a shuffle as it will distribute records across tasks by the delete condition, not necessarily by the partition/sort key. That poses a problem as Iceberg does not know which join implementation Spark is going to pick so we can't say whether the sort is really redundant. The proper solution would be to report ordering of scan tasks to Spark, request required distribution/ordering on write and let Spark decide if the sort is redundant. There was a Spark proposal for exposing tasks ordering but it is not available yet.

If we had a way to pass options to MERGE commands, we could simply support the same using these steps:
- Set `read.split.open-file-cost` to `Long.MaxValue` to force one file per split
- Set `write.delete.distribution-mode` to `none`
- Set write option `use-table-distribution-and-ordering` to `false`

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

aokolnychyi commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130499776

   BTW, I think propagating the sort order ID to written files is something we should do. We have to be careful to do that only when we actually requested a sort order. @SinghAsDev, would you be interested in submitting a PR for that? It is independent of what we are going to do here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] SinghAsDev commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

SinghAsDev commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130568404

   > BTW, I think propagating the sort order ID to written files is something we should do. We have to be careful to do that only when we actually requested a sort order. @SinghAsDev, would you be interested in submitting a PR for that? It is independent of what we are going to do here.
   
   Will do.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

aokolnychyi commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1130595244

   > However, we want to remove sorts automatically without users having to figure out if all files are already sorted or not.
   
   Yeah, I agree but I am afraid Iceberg can't do this on its own as we don't know whether there will be a shuffle. We can't remove the local sort if there is a subquery that is rewritten as a sort-merge join, can we?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] SinghAsDev commented on pull request #4692: WIP: Remove redundant sorts from copy on write deletes

Posted by GitBox <gi...@apache.org>.

SinghAsDev commented on PR #4692:
URL: https://github.com/apache/iceberg/pull/4692#issuecomment-1119818873

   Hi @aokolnychyi @RussellSpitzer @rdblue wanted with check with you all if there are any high level concerns on this approach?
   
   I will add more tests, and will either make `file-as-split` generic for any scans, or rename it to make it obvious that it is limited to copy-on-write scans.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org