You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/29 07:01:04 UTC

[GitHub] [iceberg] jackye1995 opened a new pull request #2651: Core: add checkShouldKeep for DeleteFilter

jackye1995 opened a new pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651


   @rdblue @openinx @chenjunjiedada 
   
   This is a bit related to #2372, but instead of adding a delete marker for each row, this PR directly adds a `CloseableIterable<Boolean> checkShouldKeep(CloseableIterable<T> records)` method that returns an iterable of booleans to indicate if each record should be kept or not.
   
   This is primarily to accommodate the usage of `DeleteFilter` in Trino, where records are supplied in batch through `Page`s, unlike the Flink and Spark cases where there is an explicit row level representation like `InternalRow` and `RowData`.
   
   For the Trino case, page is a more column oriented data structure consists of one `Block` per column, and each `Block` is an array of raw data. So a row in a page is a `page.getPosition(index)`, which is basically a bunch of raw data scattered in all the blocks. (this is probably an overly simplified explanation but should be enough context here)
   
   So in the delete case, instead of using the `CloseableIterable<T> filter(CloseableIterable<T> records)` method and get all the rows that are remaining, I find it much more efficient to know for each row, if it is deleted or not. With that information, I can simply reuse the same `Page` with only the positions that are not deleted, instead of filtering each row and reformulating a page with filtered rows. There is a method `page.getPositions(positionArray, ...)` which serves exactly this purpose.
   
   The work for delete marker seems to be more oriented to generating the "deleted or not" information after performing the filtering and presenting this information to the end data frame requester. But this change would be used for a lower level query engine data reader.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r651317476



##########
File path: core/src/main/java/org/apache/iceberg/deletes/Deletes.java
##########
@@ -246,4 +265,79 @@ protected boolean shouldKeep(T posDelete) {
       return CHARSEQ_COMPARATOR.compare(dataLocation, (CharSequence) FILENAME_ACCESSOR.get(posDelete)) == 0;
     }
   }
+
+  private static class PositionStreamDeleteChecker<T> extends CloseableGroup implements CloseableIterable<Optional<T>> {
+    private final CloseableIterable<T> rows;
+    private final Function<T, Long> extractPos;
+    private final CloseableIterable<Long> deletePositions;
+
+    private PositionStreamDeleteChecker(CloseableIterable<T> rows, Function<T, Long> extractPos,
+                                        CloseableIterable<Long> deletePositions) {
+      this.rows = rows;
+      this.extractPos = extractPos;
+      this.deletePositions = deletePositions;
+    }
+
+    @Override
+    public CloseableIterator<Optional<T>> iterator() {
+      CloseableIterator<Long> deletePosIterator = deletePositions.iterator();
+
+      CloseableIterator<Optional<T>> iter;
+      if (deletePosIterator.hasNext()) {
+        iter = new PositionCheckIterator(rows.iterator(), deletePosIterator);
+      } else {
+        iter = CloseableIterator.transform(rows.iterator(), Optional::ofNullable);
+        try {
+          deletePosIterator.close();
+        } catch (IOException e) {
+          throw new UncheckedIOException("Failed to close delete positions iterator", e);
+        }
+      }
+
+      addCloseable(iter);
+
+      return iter;
+    }
+
+    private class PositionCheckIterator extends CheckIterator<T> {

Review comment:
       I think to keep this clean, we should rewrite `PositionFilterIterator` so that it is based on this class. The `check` method here duplicates `shouldKeep` in the filter and I don't think it is a good idea to have that code duplicated.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#issuecomment-851284546


   CC: @szlta
   
   This could be useful for Hive vectorized reads as well, when we start to consider deletes as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r642351720



##########
File path: api/src/main/java/org/apache/iceberg/io/CheckIterator.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.Optional;
+
+/**
+ * An Iterator of {@link Optional} that checks a predicate for each item based on another Iterator.
+ * If predicate is true, the optional item has content, otherwise the optional is empty.
+ *
+ * @param <T> the type of objects of the dependent Iterator
+ */
+public abstract class CheckIterator<T> implements CloseableIterator<Optional<T>> {
+  private final Iterator<T> items;
+  private boolean closed;
+
+  protected CheckIterator(Iterator<T> items) {
+    this.items = items;
+    this.closed = false;
+  }
+
+  protected abstract boolean check(T item);
+
+  @Override
+  public boolean hasNext() {
+    boolean itemsHasNext = items.hasNext();
+    if (!itemsHasNext) {
+      close();
+    }
+
+    return !closed && itemsHasNext;
+  }
+
+  @Override
+  public Optional<T> next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException();
+    }
+
+    T item = items.next();
+    return check(item) ? Optional.ofNullable(item) : Optional.empty();
+  }
+
+  @Override
+  public void close() {
+    if (!closed) {
+      try {
+        ((Closeable) items).close();

Review comment:
       Why not use CloseableIterable in ctor?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#issuecomment-861040175


   @jackye1995, my primary concern with this is that you're reading the data to produce whether or not a row should be present. I think that would mean basically reading all of the data twice in Trino: once into pages and once for the `CloseableIterable<Boolean>` used to mark whether a row in a page should be kept. That doesn't seem worth it to me, although you could argue that if your projection is narrow it would be okay. You may also already be re-using the pages to produce the underlying `StructLike` iterable. If that's the case, then you can ignore from here on...
   
   I think a better approach would be to adapt the pages in Trino to produce a view of a row that implements `StructLike`, at least in a limited capacity (i.e. for top-level or struct columns only, no array/map nesting). That's what happens in Spark vectorization. There is a class that wraps a batch and provides access to a row using its ordinal. If you had that, you'd be able to read the data into pages and still use Iceberg's delete filters to test each row.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r651317361



##########
File path: api/src/main/java/org/apache/iceberg/io/CheckIterator.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.Optional;
+
+/**
+ * An Iterator of {@link Optional} that checks a predicate for each item based on another Iterator.
+ * If predicate is true, the optional item has content, otherwise the optional is empty.
+ *
+ * @param <T> the type of objects of the dependent Iterator
+ */
+public abstract class CheckIterator<T> implements CloseableIterator<Optional<T>> {
+  private final Iterator<T> items;
+  private boolean closed;
+
+  protected CheckIterator(Iterator<T> items) {
+    this.items = items;
+    this.closed = false;
+  }
+
+  protected abstract boolean check(T item);

Review comment:
       Actually, the `shouldKeep` method name in the filter classes works well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r651316782



##########
File path: core/src/main/java/org/apache/iceberg/deletes/Deletes.java
##########
@@ -246,4 +265,79 @@ protected boolean shouldKeep(T posDelete) {
       return CHARSEQ_COMPARATOR.compare(dataLocation, (CharSequence) FILENAME_ACCESSOR.get(posDelete)) == 0;
     }
   }
+
+  private static class PositionStreamDeleteChecker<T> extends CloseableGroup implements CloseableIterable<Optional<T>> {
+    private final CloseableIterable<T> rows;
+    private final Function<T, Long> extractPos;
+    private final CloseableIterable<Long> deletePositions;
+
+    private PositionStreamDeleteChecker(CloseableIterable<T> rows, Function<T, Long> extractPos,
+                                        CloseableIterable<Long> deletePositions) {
+      this.rows = rows;
+      this.extractPos = extractPos;
+      this.deletePositions = deletePositions;
+    }
+
+    @Override
+    public CloseableIterator<Optional<T>> iterator() {
+      CloseableIterator<Long> deletePosIterator = deletePositions.iterator();
+
+      CloseableIterator<Optional<T>> iter;
+      if (deletePosIterator.hasNext()) {

Review comment:
       +1 for checking whether we need to use the positions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r651316675



##########
File path: core/src/main/java/org/apache/iceberg/deletes/Deletes.java
##########
@@ -246,4 +265,79 @@ protected boolean shouldKeep(T posDelete) {
       return CHARSEQ_COMPARATOR.compare(dataLocation, (CharSequence) FILENAME_ACCESSOR.get(posDelete)) == 0;
     }
   }
+
+  private static class PositionStreamDeleteChecker<T> extends CloseableGroup implements CloseableIterable<Optional<T>> {
+    private final CloseableIterable<T> rows;
+    private final Function<T, Long> extractPos;
+    private final CloseableIterable<Long> deletePositions;
+
+    private PositionStreamDeleteChecker(CloseableIterable<T> rows, Function<T, Long> extractPos,
+                                        CloseableIterable<Long> deletePositions) {
+      this.rows = rows;
+      this.extractPos = extractPos;
+      this.deletePositions = deletePositions;
+    }
+
+    @Override
+    public CloseableIterator<Optional<T>> iterator() {
+      CloseableIterator<Long> deletePosIterator = deletePositions.iterator();
+
+      CloseableIterator<Optional<T>> iter;
+      if (deletePosIterator.hasNext()) {
+        iter = new PositionCheckIterator(rows.iterator(), deletePosIterator);
+      } else {
+        iter = CloseableIterator.transform(rows.iterator(), Optional::ofNullable);
+        try {
+          deletePosIterator.close();
+        } catch (IOException e) {
+          throw new UncheckedIOException("Failed to close delete positions iterator", e);
+        }
+      }
+
+      addCloseable(iter);

Review comment:
       This should be done before trying to close `deletePosIterator` in the `else` block, or else failing to close the delete positions iterator could leak the open rows iterator.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r645249356



##########
File path: api/src/main/java/org/apache/iceberg/io/CheckIterator.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.Optional;
+
+/**
+ * An Iterator of {@link Optional} that checks a predicate for each item based on another Iterator.
+ * If predicate is true, the optional item has content, otherwise the optional is empty.
+ *
+ * @param <T> the type of objects of the dependent Iterator
+ */
+public abstract class CheckIterator<T> implements CloseableIterator<Optional<T>> {
+  private final Iterator<T> items;
+  private boolean closed;
+
+  protected CheckIterator(Iterator<T> items) {
+    this.items = items;
+    this.closed = false;
+  }
+
+  protected abstract boolean check(T item);
+
+  @Override
+  public boolean hasNext() {
+    boolean itemsHasNext = items.hasNext();
+    if (!itemsHasNext) {
+      close();
+    }
+
+    return !closed && itemsHasNext;
+  }
+
+  @Override
+  public Optional<T> next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException();
+    }
+
+    T item = items.next();
+    return check(item) ? Optional.ofNullable(item) : Optional.empty();
+  }
+
+  @Override
+  public void close() {
+    if (!closed) {
+      try {
+        ((Closeable) items).close();

Review comment:
       Yeah I am actually not sure why, I was just following the pattern in `FilterIterator`. But we are assuming `items` is a `CloseableIterable` and directly casting it like that. I see this was written by @rdblue , was there any reason we did that instead of directly setting the iterable as `CloseableIterable`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r645249356



##########
File path: api/src/main/java/org/apache/iceberg/io/CheckIterator.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.Optional;
+
+/**
+ * An Iterator of {@link Optional} that checks a predicate for each item based on another Iterator.
+ * If predicate is true, the optional item has content, otherwise the optional is empty.
+ *
+ * @param <T> the type of objects of the dependent Iterator
+ */
+public abstract class CheckIterator<T> implements CloseableIterator<Optional<T>> {
+  private final Iterator<T> items;
+  private boolean closed;
+
+  protected CheckIterator(Iterator<T> items) {
+    this.items = items;
+    this.closed = false;
+  }
+
+  protected abstract boolean check(T item);
+
+  @Override
+  public boolean hasNext() {
+    boolean itemsHasNext = items.hasNext();
+    if (!itemsHasNext) {
+      close();
+    }
+
+    return !closed && itemsHasNext;
+  }
+
+  @Override
+  public Optional<T> next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException();
+    }
+
+    T item = items.next();
+    return check(item) ? Optional.ofNullable(item) : Optional.empty();
+  }
+
+  @Override
+  public void close() {
+    if (!closed) {
+      try {
+        ((Closeable) items).close();

Review comment:
       Yeah I am actually not sure why, I was just following the pattern in `FilterIterator`. But we are assuming `items` is a `CloseableIterable` and directly casting it like that. I see this was written by @rdblue , was there any reason we did that instead of directly setting the iterable as `CloseableIterable`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 closed pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
jackye1995 closed pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 closed pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
jackye1995 closed pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r651309534



##########
File path: api/src/main/java/org/apache/iceberg/io/CheckIterator.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.Optional;
+
+/**
+ * An Iterator of {@link Optional} that checks a predicate for each item based on another Iterator.
+ * If predicate is true, the optional item has content, otherwise the optional is empty.
+ *
+ * @param <T> the type of objects of the dependent Iterator
+ */
+public abstract class CheckIterator<T> implements CloseableIterator<Optional<T>> {
+  private final Iterator<T> items;
+  private boolean closed;
+
+  protected CheckIterator(Iterator<T> items) {
+    this.items = items;
+    this.closed = false;
+  }
+
+  protected abstract boolean check(T item);

Review comment:
       I don't think that "check" is specific enough. You can "check" a condition and then take either action depending on the result. It doesn't imply a relationship between the condition and some behavior. I would probably come up with a better name for this implementation and the method here. Probably `accept`. Is there a reason not to pass a `Predicate` to perform the check instead of embedding it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r651314328



##########
File path: api/src/main/java/org/apache/iceberg/io/CheckIterator.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.Optional;
+
+/**
+ * An Iterator of {@link Optional} that checks a predicate for each item based on another Iterator.
+ * If predicate is true, the optional item has content, otherwise the optional is empty.
+ *
+ * @param <T> the type of objects of the dependent Iterator
+ */
+public abstract class CheckIterator<T> implements CloseableIterator<Optional<T>> {
+  private final Iterator<T> items;
+  private boolean closed;
+
+  protected CheckIterator(Iterator<T> items) {
+    this.items = items;
+    this.closed = false;
+  }
+
+  protected abstract boolean check(T item);
+
+  @Override
+  public boolean hasNext() {
+    boolean itemsHasNext = items.hasNext();
+    if (!itemsHasNext) {
+      close();
+    }
+
+    return !closed && itemsHasNext;
+  }
+
+  @Override
+  public Optional<T> next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException();
+    }
+
+    T item = items.next();
+    return check(item) ? Optional.ofNullable(item) : Optional.empty();
+  }
+
+  @Override
+  public void close() {
+    if (!closed) {
+      try {
+        ((Closeable) items).close();

Review comment:
       I think that it may also be that we didn't want to assume that the incoming class was `Closeable`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r651314069



##########
File path: api/src/main/java/org/apache/iceberg/io/CheckIterator.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.Optional;
+
+/**
+ * An Iterator of {@link Optional} that checks a predicate for each item based on another Iterator.
+ * If predicate is true, the optional item has content, otherwise the optional is empty.
+ *
+ * @param <T> the type of objects of the dependent Iterator
+ */
+public abstract class CheckIterator<T> implements CloseableIterator<Optional<T>> {

Review comment:
       Do we need this class? Or could it be a method in `CloseableIterable`?
   
   ```java
     static <E> CloseableIterable<Optional<E>> asOptions(CloseableIterable<E> iterable, Predicate<E> pred) {
       return CloseableIterable.transform(iterable,
           item -> pred.test(item) ? Optional.ofNullable(item) : Optional.empty());
     }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r651307954



##########
File path: api/src/main/java/org/apache/iceberg/io/CheckIterator.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.Optional;
+
+/**
+ * An Iterator of {@link Optional} that checks a predicate for each item based on another Iterator.
+ * If predicate is true, the optional item has content, otherwise the optional is empty.
+ *
+ * @param <T> the type of objects of the dependent Iterator
+ */
+public abstract class CheckIterator<T> implements CloseableIterator<Optional<T>> {
+  private final Iterator<T> items;
+  private boolean closed;
+
+  protected CheckIterator(Iterator<T> items) {
+    this.items = items;
+    this.closed = false;
+  }
+
+  protected abstract boolean check(T item);
+
+  @Override
+  public boolean hasNext() {
+    boolean itemsHasNext = items.hasNext();
+    if (!itemsHasNext) {
+      close();
+    }
+
+    return !closed && itemsHasNext;
+  }
+
+  @Override
+  public Optional<T> next() {
+    if (!hasNext()) {
+      throw new NoSuchElementException();
+    }
+
+    T item = items.next();
+    return check(item) ? Optional.ofNullable(item) : Optional.empty();
+  }
+
+  @Override
+  public void close() {
+    if (!closed) {
+      try {
+        ((Closeable) items).close();

Review comment:
       I think that is' probably that `FilterIterator` was written before we added `CloseableIterator`. Originally, we only had `CloseableIterable` and not `CloseableIterator`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2651: Core: add checkShouldKeep for DeleteFilter

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2651:
URL: https://github.com/apache/iceberg/pull/2651#discussion_r651315291



##########
File path: api/src/main/java/org/apache/iceberg/io/CheckIterator.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.Optional;
+
+/**
+ * An Iterator of {@link Optional} that checks a predicate for each item based on another Iterator.
+ * If predicate is true, the optional item has content, otherwise the optional is empty.
+ *
+ * @param <T> the type of objects of the dependent Iterator
+ */
+public abstract class CheckIterator<T> implements CloseableIterator<Optional<T>> {

Review comment:
       I see that this is `CloseableIterator` rather than `CloseableIterable`, but there is also a `transform` method for `CloseableIterator` that might work, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org