You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/10 23:21:34 UTC

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #4683: Read deleted rows with metadata column IS_DELETED

szehon-ho commented on code in PR #4683:
URL: https://github.com/apache/iceberg/pull/4683#discussion_r869717071


##########
api/src/main/java/org/apache/iceberg/Schema.java:
##########
@@ -381,6 +381,22 @@ public String idToAlias(Integer fieldId) {
     return null;
   }
 
+  /**
+   * Returns the index of the given field id.
+   *
+   * @param fieldId a column id in this schema
+   * @return the index of the field in the schema, or -1 if one wasn't found
+   */
+  public int idToPosition(int fieldId) {
+    for (int i = 0; i < struct.fields().size(); i++) {

Review Comment:
   Will this work? ```struct.fields().indexOf(fieldId)```



##########
core/src/main/java/org/apache/iceberg/deletes/Deletes.java:
##########
@@ -63,13 +65,23 @@ public static <T> CloseableIterable<T> filter(CloseableIterable<T> rows, Functio
     return equalityFilter.filter(rows);
   }
 
-  public static <T> CloseableIterable<T> filter(CloseableIterable<T> rows, Function<T, Long> rowToPosition,
-                                                PositionDeleteIndex deleteSet) {
-    if (deleteSet.isEmpty()) {
-      return rows;
-    }
+  public static <T> CloseableIterable<T> markDeleted(CloseableIterable<T> rows, Predicate<T> isDeleted,
+                                                     Consumer<T> deleteMarker) {
+    return CloseableIterable.transform(rows, row -> {
+      if (isDeleted.test(row)) {
+        deleteMarker.accept(row);
+      }
+      return row;
+    });
+  }
 
-    PositionSetDeleteFilter<T> filter = new PositionSetDeleteFilter<>(rowToPosition, deleteSet);
+  public static <T> CloseableIterable<T> filter(CloseableIterable<T> rows, Predicate<T> shouldKeep) {

Review Comment:
   Not sure I see the point of this API, it seems equivalent to CloseableIterable.filter() ?



##########
core/src/main/java/org/apache/iceberg/deletes/Deletes.java:
##########
@@ -227,6 +237,39 @@ public void close() {
     }
   }
 
+  private static class PositionStreamDeleteMarker<T> extends PositionStreamDeleteFilter<T> {
+    private final Consumer<T> markDeleted;
+
+    private PositionStreamDeleteMarker(CloseableIterable<T> rows, Function<T, Long> extractPos,
+                                       CloseableIterable<Long> deletePositions, Consumer<T> markDeleted) {
+      super(rows, extractPos, deletePositions);
+      this.markDeleted = markDeleted;
+    }
+
+    @Override
+    protected PositionFilterIterator createPosDeleteIterator(CloseableIterator<T> items,
+                                                             CloseableIterator<Long> deletePosIterator) {
+      return new PositionDeleteMarkerIterator(items, deletePosIterator);
+    }
+
+    private class PositionDeleteMarkerIterator extends PositionFilterIterator {

Review Comment:
   I probably need to read fully , but I also feel on the same line that we can have the caller use functional composition, instead of trying to have one master iterator that does everything?  ie,
   
   ``` filter.filter(rows).transform(deleteMarker)```
   
   It would be cleaner and avoid having this complex iterator.



##########
core/src/main/java/org/apache/iceberg/deletes/Deletes.java:
##########
@@ -63,13 +65,23 @@ public static <T> CloseableIterable<T> filter(CloseableIterable<T> rows, Functio
     return equalityFilter.filter(rows);
   }
 
-  public static <T> CloseableIterable<T> filter(CloseableIterable<T> rows, Function<T, Long> rowToPosition,
-                                                PositionDeleteIndex deleteSet) {
-    if (deleteSet.isEmpty()) {
-      return rows;
-    }
+  public static <T> CloseableIterable<T> markDeleted(CloseableIterable<T> rows, Predicate<T> isDeleted,

Review Comment:
   I think this API does a lot of things, can it be like:
   ``` CloseableIterable<T> markDeleted(CloseableIterable<T> rows, Consumer<T> deleteMarker) ```
   
    just markDeleted and then have the user call:
   ``` Deletes.markDeleted(filtered, marker)```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org