You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/19 13:24:12 UTC

[GitHub] [iceberg] chenjunjiedada opened a new pull request #2841: API: add an action API for rewrite deletes

chenjunjiedada opened a new pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841


   This defines an action API for rewriting deletes as we mentioned in [here](https://github.com/apache/iceberg/pull/2372#issuecomment-845514501). With this API, users could define their own rewrite strategy to handle deletes according to different features of storage.
   
   This reuses parts of the idea from @RussellSpitzer 's rewrite framework. It is a simple version so that it could be implemented with the previous rewrite delete PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672512582



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       I think we also need the methods similar to `RewriteStrategy` such as `name`, `validOptions`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-958099000


   This looks good to me. Let's merge it and follow up with PRs that consume these interfaces. We can make adjustments as needed later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r735975577



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A row filter for finding the equality deletes to convert.
+   * <p>
+   * The row filter will be converted to a partition filter with an inclusive projection, so that candidate deletes are
+   * selected if any row match the expression. The matching delete files will be converted to position delete files.

Review comment:
       This isn't quite correct. An inclusive projection matches if a file _can_ contain rows that match. But it doesn't necessarily. I would change this to "Any file that may contain rows matching this filter will be used by the action."




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r730468262



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files according to a convert strategy.

Review comment:
       I don't think that there is a conversion strategy here, right? We can probably remove that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672760398



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       I didn't add `validOptions` because we haven't finalized what options are valid and it can be added late. But I'm fine to add them now. 
   
   For `name`, It looks like the class name already includes the strategy name.  Do we still need it?
    

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       The argument of `selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` comes from `planFileGroups` and the filter logic. I think we could build similar logic by storing the iterable of `FileScanTask` as a local variable in the concrete `RewrtiteDeleteStrategy` class, and grouping them in `rewriteDeletes()` API. As for filter logic, I think we can add it later and apply it in `selectDeletes()` API.
   
   For strategies, I think it depends on different scenarios. In our cases, which mostly use HDFS as storage, we care more about the small files. they impact the planning, reading performance and also bring overhead to the name node. So the strategy we want badly is to merge deletes. This could be summarized to  1) merge all position deletes. 2) read all equality deletes under the partition and convert them into one position delete.  I think the first one is also bin-packing, the second one is convert-and-bin-packing. 

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       The argument of `selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` comes from `planFileGroups` and the filter logic. I think we could build similar logic by storing the iterable of `FileScanTask` as a local variable in the concrete `RewrtiteDeleteStrategy` class, and grouping them in `rewriteDeletes()` API. As for filter logic, I think we can add it later and apply it in `selectDeletes()` API.
   
   For strategies, I think it depends on different scenarios. In our cases, which mostly use HDFS as storage, we care more about the small files. they impact the planning, reading performance and also bring overhead to the name node. So the strategy we want badly is to merge deletes. This could be summarized to  1) merge all position deletes. 2) read all equality deletes under the partition and convert them into one position delete.  I think the first one is also bin-packing, the second one is convert-and-bin-packing.  In the future, we might also want to sort the merged position delete by file name and break it into pieces so that executors can avoid reading useless delete contents.

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       The argument of `selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` comes from `planFileGroups` and the filter logic. I think we could build similar logic by storing the iterable of `FileScanTask` as a local variable in the concrete `RewriteDeleteStrategy` class, and grouping them in `rewriteDeletes()` API. As for filter logic, we could add it later and apply it in `selectDeletes()` API.
   
   As for strategies, it depends on different scenarios. In our cases, which mostly use HDFS as storage, we care more about the small files. they could impact the planning, reading performance and also bring overhead to the name node. So the strategy we want badly is to merge deletes. So we are planning two strategies at first:  1) merge all position deletes. 2) read all equality deletes under the partition and convert them into one position delete.  I think the first one is kind of bin-packing, the second one is convert-and-bin-packing.  In the future, we might also want to sort the merged position delete by file name and break it into pieces so that executors could read only the delete contents that match the data files.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-892753150


   @RussellSpitzer @jackye1995 @yyanyy @openinx , I think we are on the same page now, can we merge this to continue rewrite actions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-882551386


   @jackye1995 @RussellSpitzer Could you please take a look? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r709284031



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Set<DeleteFile> rewriteDeletes(Set<DeleteFile> deleteFilesToRewrite);

Review comment:
       Good question!  Besides converting to position deletes we could split equality delete into small ones so that each file scan can carry small equality deletes as Jack mentioned before. But as you pointed out, I think it would be hard to handle the sequence number since the committed snapshot should be immutable.  Can we do the trick?
   
   I'm not sure how much benefit that splitting can provide, but I believe small files bring worse problems.  How about we make `rewriteDeletes` more specific? Like `removeEqDeletes` and `rewritePosDeletes`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710617030



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);

Review comment:
       Also, will position -> data be something we take here or in `RewriteDataFiles`? I can see it being part of the data compaction. Like if we detect that certain data files have too many matching delete files, we want to rewrite them.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710616377



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);

Review comment:
       I think there is a third use case two: simply compacting position deletes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-940834664


   @rdblue @jackye1995 Any other comment on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-958099734


   Thanks @chenjunjiedada for the hard work. Thanks everyone for reviewing!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r730121720



##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertDeleteStrategy.java
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface ConvertDeleteStrategy {
+
+  /**
+   * Returns the name of this convert deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this convert strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this convert strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to convert.
+   *
+   * @return iterable of original delete file to be converted.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to convert the deletes.
+   *
+   * @param deleteFilesToConvert a group of files to be converted together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Iterable<DeleteFile> convertDeleteFiles(Iterable<DeleteFile> deleteFilesToConvert);
+
+  /**
+   * Groups delete files into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param dataFiles iterable of data files that contain the DeleteFile to be converted
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<FileScanTask>> planDeleteFileGroups(Iterable<FileScanTask> dataFiles);

Review comment:
       what about returning `Iterable<CombinedScanTask>`?

##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertDeleteStrategy.java
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface ConvertDeleteStrategy {

Review comment:
       might be helpful to add some javadoc. I assume this is for converting equality deletes to position deletes from the design doc?

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this rewrite strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Iterable<DeleteFile> rewriteDeleteFiles(Iterable<DeleteFile> deleteFilesToRewrite);
+
+  /**
+   * Groups into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param deleteFiles iterable of DeleteFile to be rewritten
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<DeleteFile>> planDeleteFileGroups(Iterable<DeleteFile> deleteFiles);

Review comment:
       wondering why here it is `DeleteFile` while the `ConvertDeleteStrategy` plans with `FileScanTask`. 
   
   also wondering if both interfaces in this PR can use `RewriteFileGroup`? @RussellSpitzer 

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this rewrite strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Iterable<DeleteFile> rewriteDeleteFiles(Iterable<DeleteFile> deleteFilesToRewrite);
+
+  /**
+   * Groups into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param deleteFiles iterable of DeleteFile to be rewritten
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<DeleteFile>> planDeleteFileGroups(Iterable<DeleteFile> deleteFiles);

Review comment:
       wondering why here it is `DeleteFile` while the `ConvertDeleteStrategy` plans with `FileScanTask`. 
   
   also wondering if both interfaces in this PR can use `RewriteFileGroup` as return type? @RussellSpitzer 

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles convertEqualityDeletes();
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the count of the deletes that been converted.
+     */
+    int convertedDeleteFilesCount();

Review comment:
       wondering if both interfaces can share the same `Result` interface. Can both call this `removedDeleteFilesCount` which also match the other method of `addedDeleteFilesCount`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710616377



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);

Review comment:
       I think there is a third use case: simply compacting position deletes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r730468809



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files according to a convert strategy.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertEqualityDeleteFiles filter(Expression expression);

Review comment:
       Should we have a partition filter and a row/data filter, or just a row filter? Since equality delete files are stored by partition, I think that we will actually convert a filter to a partition filter and then rewrite any equality delete file in that partition. So it may make sense to allow both. If, for example, you have a table that is bucketed and you rewrite the deletes in 1/10th of the buckets every day. Then you'd want to be able to specify `id_bucket IN (...)` rather than supplying a data filter.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r720862624



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting delete files according to a rewrite strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface RewriteDeleteFiles extends SnapshotUpdate<RewriteDeleteFiles, RewriteDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles convertEqualityDeletes();

Review comment:
       According to the proposal, this would be a different action because it requires different analysis. Can you remove this? I think that the only strategy should be to rewrite position deletes. I'm not sure if "combine" is the right term there, but it's okay for now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r721420666



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting delete files according to a rewrite strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface RewriteDeleteFiles extends SnapshotUpdate<RewriteDeleteFiles, RewriteDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles convertEqualityDeletes();

Review comment:
       I was waiting for the proposal to be finalized. Now it looks like almost there.  I just updated them.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r709264246



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);

Review comment:
       This was discussed in https://github.com/apache/iceberg/pull/2841#discussion_r672569391. It is in order to align with `RewriteStrategy`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-918248147


   ping @rdblue @openinx @jackye1995 again... This might be lost in PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-882551386


   @jackye1995 @RussellSpitzer Could you please take a look? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r673688777



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       Thanks for the explanation. The case you described are also what I am interested in. I think for `RewriteStrategy` we first had a design doc so when there was the PR everything was in place. I am fine if we are taking incremental progress for delete strategies to gradually evolve the interface and add configuration options.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] yyanyy commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
yyanyy commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r678668956



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * rewrite the equality deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes rewriteEqDeletes();

Review comment:
       Curious on the reason for having `rewriteEqDeletes` and `rewritePosDeletes` here, looks like in `RewriteDataFiles` we define methods based on the exact strategy to use, and for rewriting delete files there could be strategies like eqDelete->posDelete, combine posDelete into one, and even take both delete and data files and return data files (I think the last one should be a separate interface). Here it seems like the user can decide whether to rewrite certain type of delete files, but I would think strategy might be the thing that the user want to control more? Sorry I've been away from the community conversations for a while so might lack a lot of context if this was previously discussed. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r735975978



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A row filter for finding the equality deletes to convert.
+   * <p>
+   * The row filter will be converted to a partition filter with an inclusive projection, so that candidate deletes are
+   * selected if any row match the expression. The matching delete files will be converted to position delete files.
+   *
+   * @param expression An iceberg expression used to find deletes.
+   * @return this for method chaining
+   */
+  ConvertEqualityDeleteFiles rowFilter(Expression expression);

Review comment:
       Since we have just one filter, there is no need to add "row" in the method name. Let's change it back to `filter` to match `RewriteDataFiles`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r711171216



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);

Review comment:
       @jackye1995,  In order to make API more generic, maybe we can go back to use empty args and let the strategy implementation maintain the map like the implementation [here]( https://github.com/apache/iceberg/pull/2364/files#diff-b31571d37b1152b422d32af62978f2954359254ac11a7f4338e3116b1b262c4cR108). So that users could choose the deletes in their own way, for example, read from manifest directly.  And since we are rewriting deletes, it looks more natural to return `Iterable<DeleteFile>`.
   
   @rdblue @jackye1995 WDYT?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732047153



##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteStrategy.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+/**
+ * A strategy for the action to convert equality delete to position deletes.
+ */
+public interface ConvertEqualityDeleteStrategy {
+
+  /**
+   * Returns the name of this convert deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this convert strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this convert strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to convert.
+   *
+   * @return iterable of original delete file to be converted.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to convert the deletes.
+   *
+   * @param deleteFilesToConvert a group of files to be converted together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Iterable<DeleteFile> convertDeleteFiles(Iterable<DeleteFile> deleteFilesToConvert);
+
+  /**
+   * Groups delete files into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param dataFiles iterable of data files that contain the DeleteFile to be converted
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<FileScanTask>> planDeleteFileGroups(Iterable<FileScanTask> dataFiles);

Review comment:
       Nit: The order of methods doesn't match `RewriteStrategy` for data files.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732039643



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files.
+ * <p>
+ * Generally used for optimizing the sizing and layout of position delete files within a table.

Review comment:
       Nit: should be "size" not "sizing"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732799099



##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteStrategy.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+/**
+ * A strategy for the action to convert equality delete to position deletes.
+ */
+public interface ConvertEqualityDeleteStrategy {
+
+  /**
+   * Returns the name of this convert deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this convert strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this convert strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to convert.
+   *
+   * @return iterable of original delete file to be converted.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();

Review comment:
       In my POC implementation, there is a local variable that holds the iterable. Accepting the `Itrerable<FileScanTask>` also sounds reasonable to me. Let up add it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r676788872



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       I don't have a design doc right now, but most of the ideas come from discussions in https://github.com/apache/iceberg/pull/2216 and https://github.com/apache/iceberg/pull/2364. I'm working on rebasing #2216 to apply #2372 (API definition) and #2841 (delete row reader). That would be a full implementation and hopefully could give you the full picture of how these APIs are used.  I will post in the following days.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-893047804


   cc @rdblue @aokolnychyi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710609731



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {

Review comment:
       nit: shall we add some Javadoc similarly to `RewriteDataFiles`?
   
   ```
   /**
    * An action for rewriting delete files according to a rewrite strategy.
    * Generally used for optimizing the sizing and layout of delete files within a table.
    */
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r711176461



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Set<DeleteFile> rewriteDeletes(Set<DeleteFile> deleteFilesToRewrite);

Review comment:
       Since we can choose a specific rewrite strategy in `RewriteDeleteFiles`, and we don't have another strategy to handle equality deletes, I think we could keep this one as it is. Does that sound reasonable to you? @rdblue @jackye1995 .
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672512582



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       I think we also need the methods similar to `RewriteStrategy` such as `name`, `validOptions` and `options`

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       I think we also need the methods similar to `RewriteStrategy` such as `name`, `validOptions`

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       could you explain a bit about how these 2 interfaces will be used? For `RewriteStrategy` the method `Iterable<FileScanTask> selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` was quite straightforward, but here the method has no arguments so I am not very sure how would it be used to actually select deletes.
   
   I think it goes to the fundamental question of what are some potential rewrite delete strategies? For rewrite data files we know there are well-known operations like bin-packing. For deletes, what are some strategies we plan to implement?

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       `name` is mostly used for logging purpose. I agree it is already in the class name, but for consistency with `RewriteStrategy` I think we can still have it here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-887980139


   @openinx @jackye1995 I rebased the #2364, now you can take a quick look at how the APIs are used. 
   
   cc @rdblue @aokolnychyi @RussellSpitzer @yyanyy 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672769715



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       `name` is mostly used for logging purpose. I agree it is already in the class name, but for consistency with `RewriteStrategy` I think we can still have it here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710727474



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {

Review comment:
       That sounds reasonable to me. Will do.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710727253



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the delete files to rewrite.
+     */
+    Set<DeleteFile> deleteFilesToReplace();

Review comment:
       it is committed in the action, see the action implementation here: https://github.com/apache/iceberg/pull/2364/files#diff-0f115e8104a36761f3afa09592e7beb70dcfeca2e5b088a05ed76a5778bf6178R84. 
   
   The `option()` is used to extend features like that. We can implement partially committing feature in the specific strategy according to the options passed in. I have considered this before. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732796568



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A row filter for finding the equality deletes to convert.

Review comment:
       will add doc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r736081077



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A row filter for finding the equality deletes to convert.
+   * <p>
+   * The row filter will be converted to a partition filter with an inclusive projection, so that candidate deletes are
+   * selected if any row match the expression. The matching delete files will be converted to position delete files.

Review comment:
       An inclusive metrics evaluator evaluates the file in that way. But when the expression was projected to partition filter, since the partition value of the file is determined, the matching logic should be determined. Please correct me if I understand wrongly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-892753150






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672787141



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       The argument of `selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` comes from `planFileGroups` and the filter logic. I think we could build similar logic by storing the iterable of `FileScanTask` as a local variable in the concrete `RewriteDeleteStrategy` class, and grouping them in `rewriteDeletes()` API. As for filter logic, we could add it later and apply it in `selectDeletes()` API.
   
   As for strategies, it depends on different scenarios. In our cases, which mostly use HDFS as storage, we care more about the small files. they could impact the planning, reading performance and also bring overhead to the name node. So the strategy we want badly is to merge deletes. So we are planning two strategies at first:  1) merge all position deletes. 2) read all equality deletes under the partition and convert them into one position delete.  I think the first one is kind of bin-packing, the second one is convert-and-bin-packing.  In the future, we might also want to sort the merged position delete by file name and break it into pieces so that executors could read only the delete contents that match the data files.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672760398



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       I didn't add `validOptions` because we haven't finalized what options are valid and it can be added late. But I'm fine to add them now. 
   
   For `name`, It looks like the class name already includes the strategy name.  Do we still need it?
    




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710613541



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the delete files to rewrite.
+     */
+    Set<DeleteFile> deleteFilesToReplace();

Review comment:
       Based on the name of these methods, I assume the commit will happen outside of the action? I think this is different compared to what @RussellSpitzer has in the new data compaction. 
   
   I believe the commit logic we use during the data compaction would also apply here as we can compact individual partitions concurrently. I'd love to see the same optimizations we added for data file rewrites. They make a big difference for large tables.
   
   @RussellSpitzer, thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r720862257



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this rewrite strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Set<DeleteFile> rewriteDeleteFiles(Set<DeleteFile> deleteFilesToRewrite);
+
+  /**
+   * Groups file scans into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param deleteFiles iterable of DeleteFile to be rewritten
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Set<DeleteFile>> planDeleteFileGroups(Iterable<DeleteFile> deleteFiles);

Review comment:
       Why use `Set` here? The [proposal](https://docs.google.com/document/d/1-EyKSfwd_W9iI5jrzAvomVw3w1mb_kayVNT7f2I-SUg/edit#heading=h.fr22igmd6af0) uses `List`. If there's confusion maybe we should use `Iterable` instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r707514074



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);

Review comment:
       Why accept `FileScanTask` here? Will the delete action only find delete files based on a scan?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r736076234



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A row filter for finding the equality deletes to convert.
+   * <p>
+   * The row filter will be converted to a partition filter with an inclusive projection, so that candidate deletes are
+   * selected if any row match the expression. The matching delete files will be converted to position delete files.
+   *
+   * @param expression An iceberg expression used to find deletes.
+   * @return this for method chaining
+   */
+  ConvertEqualityDeleteFiles rowFilter(Expression expression);

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r734148541



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files according to a convert strategy.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertEqualityDeleteFiles filter(Expression expression);

Review comment:
       @rdblue , Anton mentioned that we could always convert the data filter to a partition filter, just want to confirm that do we have any specific reason to keep the partition filter?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] yyanyy commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
yyanyy commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-888635463


   +1 to Jack's comment on making the delete strategy API mimic the one for `RewriteStrategy` since I believe the intention there was so that the rewrites can happen in batches/people can decide if they want to rewrite a certain partition/subset of files. If we directly take no argument for select files, we are assuming users will always want to rewrite the entire table. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-884719032


   @RussellSpitzer @jackye1995 Is this ready to merge now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r720862327



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this rewrite strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Set<DeleteFile> rewriteDeleteFiles(Set<DeleteFile> deleteFilesToRewrite);

Review comment:
       Same here. I'd prefer to use `Iterable` instead of specific classes if it doesn't matter.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r707514925



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Set<DeleteFile> rewriteDeletes(Set<DeleteFile> deleteFilesToRewrite);

Review comment:
       How does this strategy handle sequence numbers? Are all of the new delete files committed to the table at the next sequence number? If so, wouldn't that mean that this may only return position delete files?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-882551386


   @jackye1995 @RussellSpitzer Could you please take a look? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r709716318



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Set<DeleteFile> rewriteDeletes(Set<DeleteFile> deleteFilesToRewrite);

Review comment:
       When I was talking about that use case I was also in experimental phase, but so far what I see is that in practice the 2 most commonly used ones are still (1) equality -> position, (2) position -> data. Splitting equality deletes is most of the time not worth the computation effort because it's already doing a planning and it's more cost efficient to just rewrite it to position deletes. 
   
   Going back to the conversation we are having for the interface for `selectDeletesToRewrite`, it feels like we should have 2 different interfaces for rewriting equality and position deletes so that their interfaces can be a bit more specialized for their own use cases?

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);

Review comment:
       I think as long as we follow a similar pattern it's fine, but it's more important to make the interface suitable for deletes. Here we can consider 2 cases:
   1. rewrite equality -> position: we actually need the reverse of the current information, which is a list of data files associated with the delete file.
   2. rewrite position -> data: the current interface using FileScanTask seems suitable because we need to generate something similar for rewrite anyway.
   
   I see a few different ways we can go with this:
   1. use this current interface, the "scan task" is just a holder of delete and data files organized in the form ready for a scan. equality -> position can then reorganize information.
   2. invent a new interface suitable for both use cases, something like a `Iterable<RewriteDeleteTask>`
   3. instead of having a single rewrite strategy for both, having one for each, so that we can have something customized for equality delete.

##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       +1 for not having a custom impl here because if you think some strategy is valuable, it's likely also valuable for other users. Or maybe could you provide more context about what customization you would like to do in addition?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672787141



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       The argument of `selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` comes from `planFileGroups` and the filter logic. I think we could build similar logic by storing the iterable of `FileScanTask` as a local variable in the concrete `RewrtiteDeleteStrategy` class, and grouping them in `rewriteDeletes()` API. As for filter logic, I think we can add it later and apply it in `selectDeletes()` API.
   
   For strategies, I think it depends on different scenarios. In our cases, which mostly use HDFS as storage, we care more about the small files. they impact the planning, reading performance and also bring overhead to the name node. So the strategy we want badly is to merge deletes. This could be summarized to  1) merge all position deletes. 2) read all equality deletes under the partition and convert them into one position delete.  I think the first one is also bin-packing, the second one is convert-and-bin-packing. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710605835



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       +1 here too. We debated this while designing the new data compaction API.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672760398



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       I didn't add `validOptions` because we haven't finalized what options are valid and it can be added late. But I'm fine to add them now. 
   
   For `name`, It looks like the class name already includes the strategy name.  Do we still need it?
    

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       The argument of `selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` comes from `planFileGroups` and the filter logic. I think we could build similar logic by storing the iterable of `FileScanTask` as a local variable in the concrete `RewrtiteDeleteStrategy` class, and grouping them in `rewriteDeletes()` API. As for filter logic, I think we can add it later and apply it in `selectDeletes()` API.
   
   For strategies, I think it depends on different scenarios. In our cases, which mostly use HDFS as storage, we care more about the small files. they impact the planning, reading performance and also bring overhead to the name node. So the strategy we want badly is to merge deletes. This could be summarized to  1) merge all position deletes. 2) read all equality deletes under the partition and convert them into one position delete.  I think the first one is also bin-packing, the second one is convert-and-bin-packing. 

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       The argument of `selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` comes from `planFileGroups` and the filter logic. I think we could build similar logic by storing the iterable of `FileScanTask` as a local variable in the concrete `RewrtiteDeleteStrategy` class, and grouping them in `rewriteDeletes()` API. As for filter logic, I think we can add it later and apply it in `selectDeletes()` API.
   
   For strategies, I think it depends on different scenarios. In our cases, which mostly use HDFS as storage, we care more about the small files. they impact the planning, reading performance and also bring overhead to the name node. So the strategy we want badly is to merge deletes. This could be summarized to  1) merge all position deletes. 2) read all equality deletes under the partition and convert them into one position delete.  I think the first one is also bin-packing, the second one is convert-and-bin-packing.  In the future, we might also want to sort the merged position delete by file name and break it into pieces so that executors can avoid reading useless delete contents.

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       The argument of `selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` comes from `planFileGroups` and the filter logic. I think we could build similar logic by storing the iterable of `FileScanTask` as a local variable in the concrete `RewriteDeleteStrategy` class, and grouping them in `rewriteDeletes()` API. As for filter logic, we could add it later and apply it in `selectDeletes()` API.
   
   As for strategies, it depends on different scenarios. In our cases, which mostly use HDFS as storage, we care more about the small files. they could impact the planning, reading performance and also bring overhead to the name node. So the strategy we want badly is to merge deletes. So we are planning two strategies at first:  1) merge all position deletes. 2) read all equality deletes under the partition and convert them into one position delete.  I think the first one is kind of bin-packing, the second one is convert-and-bin-packing.  In the future, we might also want to sort the merged position delete by file name and break it into pieces so that executors could read only the delete contents that match the data files.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r730232003



##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertDeleteStrategy.java
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface ConvertDeleteStrategy {
+
+  /**
+   * Returns the name of this convert deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this convert strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this convert strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to convert.
+   *
+   * @return iterable of original delete file to be converted.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to convert the deletes.
+   *
+   * @param deleteFilesToConvert a group of files to be converted together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Iterable<DeleteFile> convertDeleteFiles(Iterable<DeleteFile> deleteFilesToConvert);
+
+  /**
+   * Groups delete files into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param dataFiles iterable of data files that contain the DeleteFile to be converted
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<FileScanTask>> planDeleteFileGroups(Iterable<FileScanTask> dataFiles);

Review comment:
       There is one minor difference between rewrite position delete and convert equality delete. Which is the convert action needs planning and rewrite does not. That's why here we use `FileScan` and use `DeleteFile` in the rewrite action. This has been changed back and forth. 

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this rewrite strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Iterable<DeleteFile> rewriteDeleteFiles(Iterable<DeleteFile> deleteFilesToRewrite);
+
+  /**
+   * Groups into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param deleteFiles iterable of DeleteFile to be rewritten
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<DeleteFile>> planDeleteFileGroups(Iterable<DeleteFile> deleteFiles);

Review comment:
       There is one difference between the rewrite position delete and the convert, it is about the planning. 

##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertDeleteStrategy.java
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface ConvertDeleteStrategy {

Review comment:
       will do.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {

Review comment:
       Make sense to me, the below function should be used in the original API definition which has both rewrite and converts stuff

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.

Review comment:
       Hm, I think the first one should be enough, let me delete them.

##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files according to a rewrite strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface RewriteDeleteFiles extends SnapshotUpdate<RewriteDeleteFiles, RewriteDeleteFiles.Result> {
+
+  /**
+   * Combine the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles combinePositionDeletes();

Review comment:
       I call this combineXXX because I think bin-pack is a kind of combined action. But this indeed may be vague if we have something like reshuffle in the future. Let me make this more specific.

##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files according to a rewrite strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface RewriteDeleteFiles extends SnapshotUpdate<RewriteDeleteFiles, RewriteDeleteFiles.Result> {
+
+  /**
+   * Combine the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles combinePositionDeletes();
+
+  /**
+   * A filter for choosing deletes to rewrite.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {

Review comment:
       Thanks. We could always improve the API later.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles convertEqualityDeletes();
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the count of the deletes that been converted.
+     */
+    int convertedDeleteFilesCount();

Review comment:
       so we prefer to have the result separately, right?

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles convertEqualityDeletes();
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the count of the deletes that been converted.
+     */
+    int convertedDeleteFilesCount();
+
+    /**
+     * Returns the count of the added position delete files.
+     */
+    int addedDeleteFilesCount();

Review comment:
       Make sense to me. Will do.

##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files according to a rewrite strategy.

Review comment:
       Instead of make API and doc generic, how about make them more specific? I mean change RewriteDeleteFiles to RewritePositionDeleteFiles.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r734858630



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files according to a convert strategy.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertEqualityDeleteFiles filter(Expression expression);

Review comment:
       Thanks, done!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732042919



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files.
+ * <p>
+ * Generally used for optimizing the sizing and layout of position delete files within a table.
+ */
+public interface RewritePositionDeleteFiles
+    extends SnapshotUpdate<RewritePositionDeleteFiles, RewritePositionDeleteFiles.Result> {
+
+  /**
+   * bin-pack the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewritePositionDeleteFiles binPackPositionDeletes();
+
+  /**
+   * A row filter for finding deletes to rewrite.
+   *
+   * @param expression An iceberg expression used to find deletes.
+   * @return this for method chaining
+   */
+  RewritePositionDeleteFiles rowFilter(Expression expression);

Review comment:
       I think that we plan to convert this to a partition filter using an inclusive projection like the equality delete rewrite, correct? @jackye1995, is there a use case for matching files and then using stats to filter out position delete files that don't match? I can't think of one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r679775054



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * rewrite the equality deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes rewriteEqDeletes();

Review comment:
       `rewriteEqDeletes` and `rewritePosDeletes` is a top-level choice for users to choose which kind of delete he or she wants to rewrite.  Here's an example: `actionsProvider().rewriteDeletes(table).rewriteEqDeletes().execute();`, users could clearly know what kind of deletes are handled. But you remind me that, we might also don't need these two as well, use one `strategy` API to select its implementation is enough.
   
   The strategy options are open and we could pass options from `options` interface from the specified strategy implementation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672512582



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       I think we also need the methods similar to `RewriteStrategy` such as `name`, `validOptions` and `options`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710606611



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       I think another common strategy would be to just compact delete files. For example, take 100 position deletes and write them as one (within a partition). Another use case is compacting a large number of small global deletes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710611471



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {

Review comment:
       Will it make sense to expose a filter? I guess it will be common to apply this optimization on specific partitions. We have this option for the data rewrite action.
   
   Something like this?
   
   ```
     RewriteDeleteFiles filter(Expression expression);
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672569391



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       could you explain a bit about how these 2 interfaces will be used? For `RewriteStrategy` the method `Iterable<FileScanTask> selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` was quite straightforward, but here the method has no arguments so I am not very sure how would it be used to actually select deletes.
   
   I think it goes to the fundamental question of what are some potential rewrite delete strategies? For rewrite data files we know there are well-known operations like bin-packing. For deletes, what are some strategies we plan to implement?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] yyanyy commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
yyanyy commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r682238645



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       Is it possible for us to pass in `RewriteDeleteStrategy` here rather than a free formed string? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732799203



##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteStrategy.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+/**
+ * A strategy for the action to convert equality delete to position deletes.
+ */
+public interface ConvertEqualityDeleteStrategy {
+
+  /**
+   * Returns the name of this convert deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this convert strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this convert strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to convert.
+   *
+   * @return iterable of original delete file to be converted.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to convert the deletes.
+   *
+   * @param deleteFilesToConvert a group of files to be converted together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Iterable<DeleteFile> convertDeleteFiles(Iterable<DeleteFile> deleteFilesToConvert);
+
+  /**
+   * Groups delete files into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param dataFiles iterable of data files that contain the DeleteFile to be converted
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<FileScanTask>> planDeleteFileGroups(Iterable<FileScanTask> dataFiles);

Review comment:
       will update.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-958099000






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r682242287



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       `RewriteDeleteStrategy` is defined in the core package, while this is in the api package. This is like the pattern for the `Catalog`, as the usage of loading catalog:
   
   ```java
   Catalog catalog = CatalogUtil.loadCatalog(HiveCatalog.class.getName(), "hive", ImmutableMap.of(), hadoopConf);
   ```
   
   we could hava a similar usage like:
   ```java
   actionsProvider()
     .rewriteDeletes(table)
     .strategy(ConvertEqDeletesStrategy.class.getName())
     .execute();
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r730155316



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles convertEqualityDeletes();
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the count of the deletes that been converted.
+     */
+    int convertedDeleteFilesCount();

Review comment:
       Not a strong opinion but have kept each action result separate so far. I think it probably makes sense to me here too.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {

Review comment:
       Are there any other conversions we plan to support other than equality -> position? According to the doc, it seems to be the only conversion we consider. If so, would it make sense to call it `ConvertEqualityDeleteFiles`? Just asking at this point.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {

Review comment:
       Are there any other conversions we plan to support other than equality -> position? According to the doc, it seems to be the only conversion we consider.
   
   If so, would it make sense to call it `ConvertEqualityDeleteFiles`? Just asking at this point.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {

Review comment:
       Are there any other conversions we plan to support other than equality -> position? According to the doc, it seems to be the only conversion we consider.
   
   If so, would it make sense to call it `ConvertEqualityDeleteFiles`? Just asking.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {

Review comment:
       Are there any other conversions we plan to support other than equality -> position? According to the doc, it seems to be the only conversion we consider.
   
   If so, would it make sense to call it `ConvertEqualityDeleteFiles`? Just asking. Then we can drop the method below.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.

Review comment:
       I am not sure the second sentence is accurate. It applies more to the rewrite action.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles convertEqualityDeletes();
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the count of the deletes that been converted.
+     */
+    int convertedDeleteFilesCount();

Review comment:
       Not a strong opinion but have kept each action result separate so far.
   I think it probably makes sense to me here too.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles convertEqualityDeletes();
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the count of the deletes that been converted.
+     */
+    int convertedDeleteFilesCount();
+
+    /**
+     * Returns the count of the added position delete files.
+     */
+    int addedDeleteFilesCount();

Review comment:
       If we think this action will be about converting equality to position, then these methods can be a bit more specific just like their Javadoc.
   
   For example, we can call them `convertedEqualityDeleteFilesCount` and  `addedPositionDeleteFilesCount`.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files according to a convert strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface ConvertDeleteFiles extends SnapshotUpdate<ConvertDeleteFiles, ConvertDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles convertEqualityDeletes();
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the count of the deletes that been converted.
+     */
+    int convertedDeleteFilesCount();
+
+    /**
+     * Returns the count of the added position delete files.
+     */
+    int addedDeleteFilesCount();

Review comment:
       If we think this action will be always about converting equality to position, then these methods can be a bit more specific just like their Javadoc.
   
   For example, we can call them `convertedEqualityDeleteFilesCount` and  `addedPositionDeleteFilesCount`.

##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files according to a rewrite strategy.

Review comment:
       nit: In the doc, we are still debating whether a rewrite of equality deletes makes sense. I agree it is useful to have a generic name for this action and I like `RewriteDeleteFiles`. Should we ensure the Javadoc is generic too? What about dropping `position` in the class doc?

##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files according to a rewrite strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface RewriteDeleteFiles extends SnapshotUpdate<RewriteDeleteFiles, RewriteDeleteFiles.Result> {
+
+  /**
+   * Combine the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles combinePositionDeletes();

Review comment:
       Question. There is consensus that bin-packing position deletes is probably the most useful rewrite. However, we may add a rewrite that would also reshuffle deletes, not only bin-pack them. We don't know whether that will be needed and when we will implement it, but what about calling this `binPackPositionDeletes` similar to `binPack` in data compaction? 

##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files according to a rewrite strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface RewriteDeleteFiles extends SnapshotUpdate<RewriteDeleteFiles, RewriteDeleteFiles.Result> {
+
+  /**
+   * Combine the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles combinePositionDeletes();

Review comment:
       Question. There is consensus that bin-packing position deletes is probably the most useful rewrite. However, we may add a rewrite that would also reshuffle deletes, not only bin-pack them. We don't know whether that will be needed and when we will implement it but what about calling this `binPackPositionDeletes` similar to `binPack` in data compaction? That way, we can add another strategy later as `combineXXX` seems vague.

##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files according to a rewrite strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface RewriteDeleteFiles extends SnapshotUpdate<RewriteDeleteFiles, RewriteDeleteFiles.Result> {
+
+  /**
+   * Combine the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles combinePositionDeletes();
+
+  /**
+   * A filter for choosing deletes to rewrite.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {

Review comment:
       I assume we will compact each partition separately so we may want to provide some file group stats in the future, just like we do in data compaction. Too early to tell whether that will be useful, though. Just something to keep in mind in the future.

##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files according to a rewrite strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface RewriteDeleteFiles extends SnapshotUpdate<RewriteDeleteFiles, RewriteDeleteFiles.Result> {
+
+  /**
+   * Combine the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles combinePositionDeletes();
+
+  /**
+   * A filter for choosing deletes to rewrite.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {

Review comment:
       I assume we will compact each partition separately so we may want to provide some file group stats in the future, just like we do in data compaction. Too early to tell whether that will be useful, though. Just something to keep in mind.

##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertDeleteStrategy.java
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface ConvertDeleteStrategy {

Review comment:
       @RussellSpitzer, we may want to rename `RewriteStrategy` into something data-specific since we are about to add more rewrite strategies.

##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertDeleteStrategy.java
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface ConvertDeleteStrategy {
+
+  /**
+   * Returns the name of this convert deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this convert strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this convert strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to convert.
+   *
+   * @return iterable of original delete file to be converted.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to convert the deletes.
+   *
+   * @param deleteFilesToConvert a group of files to be converted together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Iterable<DeleteFile> convertDeleteFiles(Iterable<DeleteFile> deleteFilesToConvert);
+
+  /**
+   * Groups delete files into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param dataFiles iterable of data files that contain the DeleteFile to be converted
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<FileScanTask>> planDeleteFileGroups(Iterable<FileScanTask> dataFiles);

Review comment:
       Hm, I am not sure I get why we are accepting `dataFiles` here. It seems to be borrowed from the data compaction and does not match the design doc. I agree we should split delete files into group that will be processed independently but I am not sure what data files we will pass here?

##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertDeleteStrategy.java
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface ConvertDeleteStrategy {
+
+  /**
+   * Returns the name of this convert deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this convert strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this convert strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to convert.
+   *
+   * @return iterable of original delete file to be converted.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to convert the deletes.
+   *
+   * @param deleteFilesToConvert a group of files to be converted together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Iterable<DeleteFile> convertDeleteFiles(Iterable<DeleteFile> deleteFilesToConvert);
+
+  /**
+   * Groups delete files into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param dataFiles iterable of data files that contain the DeleteFile to be converted
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<FileScanTask>> planDeleteFileGroups(Iterable<FileScanTask> dataFiles);

Review comment:
       Hm, I am not sure I get why we are accepting `dataFiles` here. It seems to be borrowed from the data compaction and does not match the design doc. I agree we should split delete files into group that will be processed independently but I am not sure what data files we will pass here.

##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertDeleteStrategy.java
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface ConvertDeleteStrategy {
+
+  /**
+   * Returns the name of this convert deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this convert strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this convert strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to convert.
+   *
+   * @return iterable of original delete file to be converted.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to convert the deletes.
+   *
+   * @param deleteFilesToConvert a group of files to be converted together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Iterable<DeleteFile> convertDeleteFiles(Iterable<DeleteFile> deleteFilesToConvert);
+
+  /**
+   * Groups delete files into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param dataFiles iterable of data files that contain the DeleteFile to be converted
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Iterable<FileScanTask>> planDeleteFileGroups(Iterable<FileScanTask> dataFiles);

Review comment:
       In case of data compaction, it is an iterable of data file lists to be processed independently.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732042010



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A row filter for finding the equality deletes to convert.

Review comment:
       I think that this should mention that this filter will be converted to a partition filter with an inclusive projection and that matching delete files will be rewritten. It should be clear that it isn't possible to match individual deletes, only whole files.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732796367



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files.
+ * <p>
+ * Generally used for optimizing the sizing and layout of position delete files within a table.
+ */
+public interface RewritePositionDeleteFiles
+    extends SnapshotUpdate<RewritePositionDeleteFiles, RewritePositionDeleteFiles.Result> {
+
+  /**
+   * bin-pack the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewritePositionDeleteFiles binPackPositionDeletes();

Review comment:
       Make sense to remove this, `ConvertEqualityDeleteFiles` already remove the `convertEqualityDelete()`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r682242287



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       `RewriteDeleteStrategy` is defined in the core package, while this is in the api package. This is like the pattern for the `Catalog`, we use the API like this:
   
   ```java
   Catalog catalog = CatalogUtil.loadCatalog(HiveCatalog.class.getName(), "hive", ImmutableMap.of(), hadoopConf);
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710603703



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Set<DeleteFile> rewriteDeletes(Set<DeleteFile> deleteFilesToRewrite);

Review comment:
       Thanks for the verification! @jackye1995. Now, I'm inclined to use more specific ones. @rdblue WDYT?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710611024



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {

Review comment:
       nit: `RewriteDeletes` -> `RewriteDeleteFiles` to match `RewriteDataFiles` for consistency?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710718294



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);

Review comment:
       I agree. This is the major compaction that we mentioned before.  And I think this may be already available through `RewriteDataFiles`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r676545239



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       Hi @chenjunjiedada ,  thanks for tracking the v2's compaction for long time, It's really challenging.   I think we reviewers were distracted by other internal or external things , which blocked the v2 compaction progress for a long time.  While the fact is: the current v2 compaction feature is the biggest blocker for v2 right now, I'd like to push this feature forward, so that we don't block v2 forever. (And thanks for your bandwidth @jackye1995,  you always come out to help when the things get stuck).
   
   About this PR, since it's a API definition,  I'd like to take a look the whole delete rewrite design or preview PR if you have one because in that case I could evaluate whether the API abstraction is perfectly matching the real implementation.   Of course,  we could introduce a basic API, and then iterate the interface design when we get more implementation. It's looks better for me to reach a global agreement before we starting to provide a specific design,  that can help us reduce context sync on the core paths.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-887999783


   Thank you! I reviewed the PR. I will paste my major comment also here:
   
   After reading `ConvertEqDeletesStrategy.rewriteDeletes`, I think we can make the `RewriteDeleteStrategy` interface closer to `RewriteStrategy` interface. What we have there is basically the equivalent of `planFileGroups` plus `rewriteFiles` in `RewriteStrategy`. So I would propose we have the following methods in `RewriteDeleteStrategy` to be more aligned:
   
   ```
   Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);
   
   Iterable<List<FileScanTask>> planDeleteGroups(Iterable<DeleteFile> deleteFiles);
   
   Set<DeleteFile> rewriteDeletes(List<DeleteFile> deleteFilesToRewrite);
   ```
   
   And we can get the partition `StructLike` directly from the list of scan tasks instead of passing it through the task pair in `EqualityDeleteRewriter`. In this way, we can also enable partial progress for commits.
   
   Any thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710620033



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       +1. That is the next action, I have verified internally that it could bring some benefit for query.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710613541



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the delete files to rewrite.
+     */
+    Set<DeleteFile> deleteFilesToReplace();

Review comment:
       Based on the name of these methods, I assume the commit will happen outside of the action? I think this is different compared to what @RussellSpitzer has in the new data compaction action. 
   
   I believe the commit logic we use during the data compaction would also apply here as we can compact individual partitions concurrently. I'd love to see the same optimizations we added for data file rewrites. They make a big difference for large tables.
   
   @RussellSpitzer, thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672787141



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       The argument of `selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` comes from `planFileGroups` and the filter logic. I think we could build similar logic by storing the iterable of `FileScanTask` as a local variable in the concrete `RewrtiteDeleteStrategy` class, and grouping them in `rewriteDeletes()` API. As for filter logic, I think we can add it later and apply it in `selectDeletes()` API.
   
   For strategies, I think it depends on different scenarios. In our cases, which mostly use HDFS as storage, we care more about the small files. they impact the planning, reading performance and also bring overhead to the name node. So the strategy we want badly is to merge deletes. This could be summarized to  1) merge all position deletes. 2) read all equality deletes under the partition and convert them into one position delete.  I think the first one is also bin-packing, the second one is convert-and-bin-packing.  In the future, we might also want to sort the merged position delete by file name and break it into pieces so that executors can avoid reading useless delete contents.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r730468176



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files according to a convert strategy.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A filter for choosing the equality deletes to convert.

Review comment:
       I think that "find" would be a better verb than "choose". This doesn't really allow choice, it limits the parts of the table that are considered.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r730728126



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files according to a convert strategy.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertEqualityDeleteFiles filter(Expression expression);

Review comment:
       Make sense to me. Even a partition filter is enough, but let me keep both.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r734697284



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files according to a convert strategy.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A filter for choosing the equality deletes to convert.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  ConvertEqualityDeleteFiles filter(Expression expression);

Review comment:
       Yeah, I guess you could use `in(bucket("id"), 1, 2, 3)`. That would be fine. We can remove the partition filter.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732039921



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files.
+ * <p>
+ * Generally used for optimizing the sizing and layout of position delete files within a table.
+ */
+public interface RewritePositionDeleteFiles
+    extends SnapshotUpdate<RewritePositionDeleteFiles, RewritePositionDeleteFiles.Result> {
+
+  /**
+   * bin-pack the position deletes.

Review comment:
       Should be capitalized.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732040564



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files.
+ * <p>
+ * Generally used for optimizing the sizing and layout of position delete files within a table.
+ */
+public interface RewritePositionDeleteFiles
+    extends SnapshotUpdate<RewritePositionDeleteFiles, RewritePositionDeleteFiles.Result> {
+
+  /**
+   * bin-pack the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewritePositionDeleteFiles binPackPositionDeletes();

Review comment:
       Why add `PositionDeletes` to the end of the method name and not just `binPack()`? Also, if we aren't exposing multiple strategies in the near term, should we simply remove this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732046404



##########
File path: core/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteStrategy.java
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+/**
+ * A strategy for the action to convert equality delete to position deletes.
+ */
+public interface ConvertEqualityDeleteStrategy {
+
+  /**
+   * Returns the name of this convert deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this convert strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this convert strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to convert.
+   *
+   * @return iterable of original delete file to be converted.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();

Review comment:
       In the data file rewrite strategy, the equivalent method accepts an `Iterable<FileScanTask>` Should this be passed an `Iterable<DeleteFile>` so that this is selecting from the already loaded files rather than responsible for finding delete files itself?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r721419052



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Returns a set of options which this rewrite strategy can use. This is an allowed-list and any options not
+   * specified here will be rejected at runtime.
+   */
+  Set<String> validOptions();
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);
+
+  /**
+   * Select the delete files to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeleteFiles();
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Set<DeleteFile> rewriteDeleteFiles(Set<DeleteFile> deleteFilesToRewrite);
+
+  /**
+   * Groups file scans into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param deleteFiles iterable of DeleteFile to be rewritten
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Set<DeleteFile>> planDeleteFileGroups(Iterable<DeleteFile> deleteFiles);

Review comment:
       Updated to `Iterable`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r676791288



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       great, let me read through those, thank you!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r714249485



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);

Review comment:
       @aokolnychyi thanks for the feedback!
   
   > I think there is a third use case: simply compacting position deletes.
   
   Yes agree, that maps to the Hive minor compaction as well.
   
   > will position -> data be something we take here or in RewriteDataFiles? I can see it being part of the data compaction. Like if we detect that certain data files have too many matching delete files, we want to rewrite them.
   
   I remember I had a discussion around this with Russell when he was implementing bin packing. When we rewrite data files, yes position delete information is merged to that, but delete files are not dropped, because the delete file might still reference other files that are not included during bin-packing. We discussed the possibility for also adding that feature in the data compaction action, the conclusion was to have another action specifically for this use case to (1) not make data compaction too complicated, (2) have separated actions for different purposes, (3) have better SLA control for different actions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r676771098



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       +1 for a general design. And please let me know if there is anything that can be parallelized for implementation once we reached an agreement on this. I am currently mostly spending time trying to push the work on Trino side, but should have some bandwidth to take some implementation work for this if necessary. As openinx said, this is the biggest blocker for v2.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710597076



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       I think we all agree that eq->pos is the one that can share with others. Let me update this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r682242287



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       `RewriteDeleteStrategy` is defined in the core package, while this is in the api package. This is like the pattern for the `Catalog`, as the usage of loading catalog:
   
   ```java
   Catalog catalog = CatalogUtil.loadCatalog(HiveCatalog.class.getName(), "hive", ImmutableMap.of(), hadoopConf);
   ```
   
   we could have a similar usage like:
   ```java
   actionsProvider()
     .rewriteDeletes(table)
     .strategy(ConvertEqDeletesStrategy.class.getName())
     .execute();
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r710606611



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       I think another common strategy would be to just compact delete files. For example, take 100 position deletes and write them as one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r707517762



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeletes.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+
+public interface RewriteDeletes extends SnapshotUpdate<RewriteDeletes, RewriteDeletes.Result> {
+
+  /**
+   * Set the implementation class name for rewrite strategy.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeletes strategy(String strategyImpl);

Review comment:
       For data file rewrites, we didn't allow custom extension because we don't think that there's a lot of value in it just yet and we want to keep things simple if there isn't a use for dynamically loaded classes. I think we should probably take the same approach here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r716651849



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting delete files according to a rewrite strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface RewriteDeleteFiles extends SnapshotUpdate<RewriteDeleteFiles, RewriteDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles convertEqualityDeletes();
+
+  /**
+   * Combine the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles combinePositionDeletes();
+
+  /**
+   * A filter for choosing deletes to rewrite.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the delete files to rewrite.
+     */
+    Set<DeleteFile> deleteFilesToReplace();

Review comment:
       Yes, This may confuse people. Anton already mentioned the commit happen out of action because of this. Let me change to return count.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r716278093



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewriteDeleteFiles.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting delete files according to a rewrite strategy.
+ * Generally used for optimizing the sizing and layout of delete files within a table.
+ */
+public interface RewriteDeleteFiles extends SnapshotUpdate<RewriteDeleteFiles, RewriteDeleteFiles.Result> {
+
+  /**
+   * Convert the equality deletes to the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles convertEqualityDeletes();
+
+  /**
+   * Combine the position deletes.
+   *
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles combinePositionDeletes();
+
+  /**
+   * A filter for choosing deletes to rewrite.
+   *
+   * @param expression An iceberg expression used to choose deletes.
+   * @return this for method chaining
+   */
+  RewriteDeleteFiles filter(Expression expression);
+
+  /**
+   * The action result that contains a summary of the execution.
+   */
+  interface Result {
+    /**
+     * Returns the delete files to rewrite.
+     */
+    Set<DeleteFile> deleteFilesToReplace();

Review comment:
       I don't think that you want to return actual data files here. `RewriteDataFiles` just returns counts.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-885965089


   @rdblue @aokolnychyi @openinx You may want to take a look as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r672512582



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       I think we also need the methods similar to `RewriteStrategy` such as `name`, `validOptions` and `options`

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       I think we also need the methods similar to `RewriteStrategy` such as `name`, `validOptions`

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletes();

Review comment:
       could you explain a bit about how these 2 interfaces will be used? For `RewriteStrategy` the method `Iterable<FileScanTask> selectFilesToRewrite(Iterable<FileScanTask> dataFiles)` was quite straightforward, but here the method has no arguments so I am not very sure how would it be used to actually select deletes.
   
   I think it goes to the fundamental question of what are some potential rewrite delete strategies? For rewrite data files we know there are well-known operations like bin-packing. For deletes, what are some strategies we plan to implement?

##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+

Review comment:
       `name` is mostly used for logging purpose. I agree it is already in the class name, but for consistency with `RewriteStrategy` I think we can still have it here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-921353065


   I'll do a pass today too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#issuecomment-944689663






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r732039386



##########
File path: api/src/main/java/org/apache/iceberg/actions/RewritePositionDeleteFiles.java
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for rewriting the position delete files.

Review comment:
       Nit: no need for "the"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] chenjunjiedada commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
chenjunjiedada commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r730726986



##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files according to a convert strategy.
+ */
+public interface ConvertEqualityDeleteFiles
+    extends SnapshotUpdate<ConvertEqualityDeleteFiles, ConvertEqualityDeleteFiles.Result> {
+
+  /**
+   * A filter for choosing the equality deletes to convert.

Review comment:
       OK, will update.

##########
File path: api/src/main/java/org/apache/iceberg/actions/ConvertEqualityDeleteFiles.java
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import org.apache.iceberg.expressions.Expression;
+
+/**
+ * An action for converting the equality delete files to position delete files according to a convert strategy.

Review comment:
       Will do.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2841: API: add an action API for rewrite deletes

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2841:
URL: https://github.com/apache/iceberg/pull/2841#discussion_r707513160



##########
File path: core/src/main/java/org/apache/iceberg/actions/RewriteDeleteStrategy.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.actions;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileScanTask;
+import org.apache.iceberg.Table;
+
+public interface RewriteDeleteStrategy {
+
+  /**
+   * Returns the name of this rewrite deletes strategy
+   */
+  String name();
+
+  /**
+   * Returns the table being modified by this rewrite strategy
+   */
+  Table table();
+
+  /**
+   * Select the deletes to rewrite.
+   *
+   * @param dataFiles iterable of FileScanTasks for data files in a given partition
+   * @return iterable of original delete file to be replaced.
+   */
+  Iterable<DeleteFile> selectDeletesToRewrite(Iterable<FileScanTask> dataFiles);
+
+  /**
+   * Define how to rewrite the deletes.
+   *
+   * @param deleteFilesToRewrite a group of files to be rewritten together
+   * @return iterable of delete files used to replace the original delete files.
+   */
+  Set<DeleteFile> rewriteDeletes(Set<DeleteFile> deleteFilesToRewrite);
+
+  /**
+   * Groups file scans into lists which will be processed in a single executable unit. Each group will end up being
+   * committed as an independent set of changes. This creates the jobs which will eventually be run as by the underlying
+   * Action.
+   *
+   * @param deleteFiles iterable of DeleteFile to be rewritten
+   * @return iterable of lists of FileScanTasks which will be processed together
+   */
+  Iterable<Set<DeleteFile>> planDeleteGroups(Iterable<DeleteFile> deleteFiles);
+
+  /**
+   * Sets options to be used with this strategy
+   */
+  RewriteDeleteStrategy options(Map<String, String> options);

Review comment:
       Nit: it would be nice to use the same method order as `RewriteStrategy` for consistency.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org