Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/18 22:59:37 UTC

[GitHub] [iceberg] rbalamohan commented on a diff in pull request #6432: Consider moving to ParallelIterable in Deletes::toPositionIndex

rbalamohan commented on code in PR #6432:
URL: https://github.com/apache/iceberg/pull/6432#discussion_r1051680991


##########
core/src/main/java/org/apache/iceberg/deletes/Deletes.java:
##########
@@ -144,7 +146,18 @@ public static <T extends StructLike> PositionDeleteIndex toPositionIndex(
             deletes ->
                 CloseableIterable.transform(
                     locationFilter.filter(deletes), row -> (Long) POSITION_ACCESSOR.get(row)));
-    return toPositionIndex(CloseableIterable.concat(positions));
+    return toPositionIndex(positions);
+  }
+
+  public static PositionDeleteIndex toPositionIndex(List<CloseableIterable<Long>> positions) {

Review Comment:
   Thanks @rdblue. Yes, this happens when more than one positional delete file qualifies for the same data file. For example, assume a trickle-feed job ingests data into a partition. Because of late-arriving data, another job updates some of the data in that partition and creates positional delete (POS) files. Update jobs with different criteria may match the same data file and create additional POS files. As a result, scanning one data file may require reading multiple POS files (e.g. 4 POS files), which slows the scan down. ParallelIterable helps in this case.
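
   To illustrate the idea, here is a minimal, hedged sketch that uses only the JDK. It does not use Iceberg's `ParallelIterable` or `PositionDeleteIndex`; the class `ParallelPositionIndexSketch`, the method `buildIndex`, and the use of a `TreeSet` as the index are all hypothetical stand-ins. Each inner list plays the role of the positions read from one POS file, and every file is drained by its own task before the results are merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in, not Iceberg code: each inner list represents the
// deleted row positions read from one positional delete (POS) file.
final class ParallelPositionIndexSketch {

  // Reads every "delete file" in its own task, then merges the positions
  // into a single sorted set on the calling thread.
  static Set<Long> buildIndex(List<List<Long>> positionsPerDeleteFile, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Set<Long>>> futures = new ArrayList<>();
      for (List<Long> positions : positionsPerDeleteFile) {
        // In a real scan this task would open and read one POS file.
        Callable<Set<Long>> task = () -> new TreeSet<>(positions);
        futures.add(pool.submit(task));
      }

      Set<Long> index = new TreeSet<>();
      for (Future<Set<Long>> future : futures) {
        index.addAll(future.get()); // blocks until that file has been read
      }
      return index;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while reading delete files", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Failed to read a delete file", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // Three POS files that qualify for the same data file; position 3 appears twice.
    List<List<Long>> files = Arrays.asList(
        Arrays.asList(3L, 1L),
        Arrays.asList(7L, 3L),
        Arrays.asList(9L, 5L));
    System.out.println(buildIndex(files, 4)); // prints [1, 3, 5, 7, 9]
  }
}
```

   With a single POS file this adds only thread-pool overhead, so any benefit would be expected mainly when several POS files qualify for one data file, as in the scenario above.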



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

