You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/28 16:58:42 UTC

[GitHub] [iceberg] gustavoatt opened a new pull request, #5148: StreamingDelete becomes public class

gustavoatt opened a new pull request, #5148:
URL: https://github.com/apache/iceberg/pull/5148

   Make `StreamingDelete` a public class so that it can be used on packages outside of `org.apache.iceberg`.
   
   This is useful for whenever users want to subclass the default implementation for monitoring or to add custom snapshot properties.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gustavoatt commented on pull request #5148: StreamingDelete becomes public class

Posted by GitBox <gi...@apache.org>.
gustavoatt commented on PR #5148:
URL: https://github.com/apache/iceberg/pull/5148#issuecomment-1170204423

   Thank you Ryan!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gustavoatt commented on pull request #5148: StreamingDelete becomes public class

Posted by GitBox <gi...@apache.org>.
gustavoatt commented on PR #5148:
URL: https://github.com/apache/iceberg/pull/5148#issuecomment-1170092679

   An alternative approach that can work for us, could be having `MergingSnapshotProducer` being a public class. We basically need thin wrappers on top of default implementations of APIs similar to `BaseOverwriteFiles`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gustavoatt commented on pull request #5148: StreamingDelete becomes public class

Posted by GitBox <gi...@apache.org>.
gustavoatt commented on PR #5148:
URL: https://github.com/apache/iceberg/pull/5148#issuecomment-1169999432

   @rdblue let me provide more context. What we are doing at Airbnb, is that we are trying to fully integrate Airflow partition sensing with Airflow tables. One problem we have encountered when doing this is keeping track of empty partitions added explicitly via the command:
   
   ```sql
   INSERT OVERWRITE $TBL
   PARTITION (parKey=partVal,...)
   SELECT *
   FROM $EMPTY_SELECTION
   ```
   
   Currently there is no way for us to know whether such an insert took place for sensing purposes. We have decided to keep track of these by adding custom properties on `Snapshot::summary` via a subclass of `BaseOvewriteFiles` that captures the `overwriteByRowFilter` passed to find out whether we were inserting a single partition.
   
   We want to do the same for `DeleteFiles` so that we know when an empty partition "gets deleted", if that makes sense. Unfortunately the default implementation of `DeleteFiles` is protected which means we cannot subclass it outside of the iceberg package.
   
   Note that we are hoping to keep track of empty partitions without having to do any changes to the way we write to Spark tables, so injecting our custom `OverwriteFiles` and `DeleteFiles` via our custom catalog is the most transparent way we have found to keep track of these expressions without having to change user jobs code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5148: StreamingDelete becomes public class

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5148:
URL: https://github.com/apache/iceberg/pull/5148#issuecomment-1170200019

   @gustavoatt, I think this sounds reasonable. I didn't realize that other operations, like `BaseOverwriteFiles` were already public. Since this is in core and the other is public, I don't see a reason to keep this restricted. I merged this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #5148: StreamingDelete becomes public class

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5148:
URL: https://github.com/apache/iceberg/pull/5148


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5148: StreamingDelete becomes public class

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5148:
URL: https://github.com/apache/iceberg/pull/5148#issuecomment-1169398192

   @gustavoatt, to add custom snapshot properties, you should be able to call `set` on the public interface. As for monitoring, should we add that into the class rather than making this public?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org