You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/24 16:04:22 UTC

[GitHub] [iceberg] aokolnychyi opened a new pull request #3176: Core: Add position and equality delta writer interfaces

aokolnychyi opened a new pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176


   This PR adds position and equality delta writer interfaces and contains a subset of changes in PR #2945.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715882169



##########
File path: core/src/main/java/org/apache/iceberg/io/PositionDeltaWriter.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and position deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface PositionDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a position in the provided spec/partition without persisting the old row.
+   *
+   * @param path a data file path
+   * @param pos  a position
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  default void delete(CharSequence path, long pos, PartitionSpec spec, StructLike partition) {
+    delete(path, pos, null, spec, partition);
+  }
+
+  /**
+   * Deletes a position in the provided spec/partition and persists the old row.

Review comment:
       Same question as last javadoc




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715736643



##########
File path: core/src/main/java/org/apache/iceberg/io/EqualityDeltaWriter.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and equality deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface EqualityDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a row from the provided spec/partition.
+   * <p>
+   * This method assumes the delete record has the same schema as the rows that will be inserted.
+   *
+   * @param row a delete record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void delete(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a key from the provided spec/partition.
+   * <p>
+   * This method assumes the delete key schema matches the equality field IDs.
+   *
+   * @param key a delete key
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void deleteKey(T key, PartitionSpec spec, StructLike partition);

Review comment:
       @rdblue, this is `directDelete` you mentioned. Also added docs about the schema expectations.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715882008



##########
File path: core/src/main/java/org/apache/iceberg/io/PositionDeltaWriter.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and position deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface PositionDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a position in the provided spec/partition without persisting the old row.

Review comment:
       I'm not sure I understand what this one means either, the old row is the original row matching the position of this delete file? Why would I be persisting it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715884598



##########
File path: core/src/main/java/org/apache/iceberg/io/EqualityDeltaWriter.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and equality deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface EqualityDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a row from the provided spec/partition.
+   * <p>
+   * This method assumes the delete record has the same schema as the rows that will be inserted.
+   *
+   * @param row a delete record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void delete(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a key from the provided spec/partition.
+   * <p>
+   * This method assumes the delete key schema matches the equality field IDs.

Review comment:
       I'll try to rephrase it then :) 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715885892



##########
File path: core/src/main/java/org/apache/iceberg/io/PositionDeltaWriter.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and position deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface PositionDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a position in the provided spec/partition without persisting the old row.

Review comment:
       ah so "Delete a position and record the deleted row in the delete file" vs "Delete a position"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715889348



##########
File path: core/src/main/java/org/apache/iceberg/io/PositionDeltaWriter.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and position deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface PositionDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a position in the provided spec/partition without persisting the old row.

Review comment:
       Yeah, that's a good way to put it. I'll update.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715884386



##########
File path: core/src/main/java/org/apache/iceberg/io/PositionDeltaWriter.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and position deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface PositionDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a position in the provided spec/partition without persisting the old row.

Review comment:
       The spec allows us to persist the deleted row in positional delete files. This may be helpful to reconstruct CDC records or to persist the sort key for min/max filtering.
   
   That being said, I don't plan to persist it from Spark.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715738785



##########
File path: core/src/main/java/org/apache/iceberg/io/PositionDeltaWriter.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and position deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface PositionDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a position in the provided spec/partition without persisting the old row.
+   *
+   * @param path a data file path
+   * @param pos  a position
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  default void delete(CharSequence path, long pos, PartitionSpec spec, StructLike partition) {
+    delete(path, pos, null, spec, partition);
+  }
+
+  /**
+   * Deletes a position in the provided spec/partition and persists the old row.
+   *
+   * @param path a data file path
+   * @param pos a position
+   * @param row a deleted row
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void delete(CharSequence path, long pos, T row, PartitionSpec spec, StructLike partition);

Review comment:
       @rdblue, I kept the optional `row` before `spec` and `partition`. In most new APIs, spec and partition are the last arguments so even though `row` is an optional argument, having spec and partition as last seems more consistent. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#issuecomment-926946744


   Updated the Javadoc and also added tests for delete and insert only cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#issuecomment-926751249


   cc @openinx @RussellSpitzer @szehon-ho @rdblue @kbendick @karuppayya @flyrain @pvary @jackye1995 @yyanyy


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#issuecomment-927388380


   Looks great. Thanks for getting these in, @aokolnychyi!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715737202



##########
File path: core/src/main/java/org/apache/iceberg/io/EqualityDeltaWriter.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and equality deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface EqualityDeltaWriter<T> extends Closeable {

Review comment:
       This one will be implemented by the CDC writer that I will submit in a separate PR. It is large.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715929833



##########
File path: core/src/main/java/org/apache/iceberg/io/EqualityDeltaWriter.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and equality deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface EqualityDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a row from the provided spec/partition.
+   * <p>
+   * This method assumes the delete record has the same schema as the rows that will be inserted.
+   *
+   * @param row a delete record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void delete(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a key from the provided spec/partition.
+   * <p>
+   * This method assumes the delete key schema matches the equality field IDs.

Review comment:
       Changed.

##########
File path: core/src/main/java/org/apache/iceberg/io/PositionDeltaWriter.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and position deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface PositionDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a position in the provided spec/partition without persisting the old row.

Review comment:
       Updated.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715850867



##########
File path: core/src/main/java/org/apache/iceberg/io/EqualityDeltaWriter.java
##########
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and equality deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface EqualityDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a row from the provided spec/partition.
+   * <p>
+   * This method assumes the delete record has the same schema as the rows that will be inserted.
+   *
+   * @param row a delete record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void delete(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a key from the provided spec/partition.
+   * <p>
+   * This method assumes the delete key schema matches the equality field IDs.

Review comment:
       I don't know what this means :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#issuecomment-928026251


   Thanks for reviewing, @rdblue @RussellSpitzer!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715736241



##########
File path: core/src/main/java/org/apache/iceberg/io/BasePositionDeltaWriter.java
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.IOException;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public class BasePositionDeltaWriter<T> implements PositionDeltaWriter<T> {
+
+  private final PartitioningWriter<T, DataWriteResult> dataWriter;
+  private final PartitioningWriter<PositionDelete<T>, DeleteWriteResult> deleteWriter;
+  private final PositionDelete<T> positionDelete;
+
+  private boolean closed;
+
+  public BasePositionDeltaWriter(PartitioningWriter<T, DataWriteResult> dataWriter,
+                                 PartitioningWriter<PositionDelete<T>, DeleteWriteResult> deleteWriter) {
+    Preconditions.checkArgument(dataWriter != null, "Data writer cannot be null");
+    Preconditions.checkArgument(deleteWriter != null, "Delete writer cannot be null");
+
+    this.dataWriter = dataWriter;
+    this.deleteWriter = deleteWriter;
+    this.positionDelete = PositionDelete.create();
+  }
+
+  @Override
+  public void insert(T row, PartitionSpec spec, StructLike partition) {
+    dataWriter.write(row, spec, partition);
+  }
+
+  @Override
+  public void delete(CharSequence path, long pos, T row, PartitionSpec spec, StructLike partition) {
+    positionDelete.set(path, pos, row);
+    deleteWriter.write(positionDelete, spec, partition);
+  }
+
+  @Override
+  public WriteResult result() {

Review comment:
       I am using the existing `WriteResult` that @openinx created. It has a builder and already takes care converting values to arrays for serialization.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715735532



##########
File path: core/src/main/java/org/apache/iceberg/io/BasePositionDeltaWriter.java
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.IOException;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+import org.apache.iceberg.deletes.PositionDelete;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public class BasePositionDeltaWriter<T> implements PositionDeltaWriter<T> {

Review comment:
       This is the writer to use for Spark merge-on-read.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3176: Core: Add position and equality delta writer interfaces

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3176:
URL: https://github.com/apache/iceberg/pull/3176#discussion_r715885892



##########
File path: core/src/main/java/org/apache/iceberg/io/PositionDeltaWriter.java
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.io;
+
+import java.io.Closeable;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.StructLike;
+
+/**
+ * A writer capable of writing data and position deletes that may belong to different specs and partitions.
+ *
+ * @param <T> the row type
+ */
+public interface PositionDeltaWriter<T> extends Closeable {
+
+  /**
+   * Inserts a row to the provided spec/partition.
+   *
+   * @param row a data record
+   * @param spec a partition spec
+   * @param partition a partition or null if the spec is unpartitioned
+   */
+  void insert(T row, PartitionSpec spec, StructLike partition);
+
+  /**
+   * Deletes a position in the provided spec/partition without persisting the old row.

Review comment:
       ah so "Delete a position and record the deleted row in the delete file"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org