Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/01 12:19:52 UTC

[GitHub] [iceberg] moon-fall opened a new pull request #3012: Flink: use deleteKey method when write delete data to only write the parimary key to the eqDeleteFile

moon-fall opened a new pull request #3012:
URL: https://github.com/apache/iceberg/pull/3012


   This PR adds support for writing only the primary key to the equality-delete file (eqDeleteFile).
   I think in some cases it is not necessary to write the entire row into the equality-delete file; we only need to write the values of the equality fields, which improves write speed and saves space.
   
   One way to implement this is to use the deleteKey method when writing deleted row data. I added the configuration onlyWriteEqualityFieldColumns in FlinkSink to control this behavior.
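
   The trade-off described above can be illustrated with a minimal, self-contained sketch. It does not use the Iceberg or Flink APIs: the class `EqualityDeleteSketch`, the method `deleteRecord`, and the list-based row are hypothetical stand-ins, and only the flag name `onlyWriteEqualityFieldColumns` comes from this PR. The sketch assumes the default stays "write the entire row" and that the flag switches to writing only the equality field values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Iceberg or Flink API.
public class EqualityDeleteSketch {

  /**
   * Returns the record that would be written to the equality-delete file.
   * When the flag is false the entire row is kept (the existing default);
   * when it is true only the values at the equality field positions are kept.
   */
  static List<Object> deleteRecord(
      List<Object> row, int[] equalityFieldPositions, boolean onlyWriteEqualityFieldColumns) {
    if (!onlyWriteEqualityFieldColumns) {
      return row;
    }
    List<Object> key = new ArrayList<>(equalityFieldPositions.length);
    for (int pos : equalityFieldPositions) {
      key.add(row.get(pos));
    }
    return key;
  }

  public static void main(String[] args) {
    // Assumed example row: (id, name, email), with id as the only equality field.
    List<Object> row = Arrays.asList(42L, "alice", "alice@example.com");
    int[] keyPositions = {0};

    System.out.println("full row delete: " + deleteRecord(row, keyPositions, false));
    System.out.println("key-only delete: " + deleteRecord(row, keyPositions, true));
  }
}
```

   With the flag off, a reader of the delete file can still recover every column of the deleted row; with the flag on, only the key survives, which is where the space saving comes from.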


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] moon-fall commented on pull request #3012: Flink: use deleteKey method when write delete data to only write the parimary key to the eqDeleteFile

Posted by GitBox <gi...@apache.org>.
moon-fall commented on pull request #3012:
URL: https://github.com/apache/iceberg/pull/3012#issuecomment-907634671


   @rdblue I have updated the PR: I added the configuration onlyWriteEqualityFieldColumns in FlinkSink along with some tests. Please have a look.




[GitHub] [iceberg] coolderli commented on pull request #3012: Flink: use deleteKey method when write delete data to only write the parimary key to the eqDeleteFile

Posted by GitBox <gi...@apache.org>.
coolderli commented on pull request #3012:
URL: https://github.com/apache/iceberg/pull/3012#issuecomment-904437489


   I think the entire row is required. As mentioned in https://github.com/apache/iceberg/issues/360#issuecomment-653532308, when reading a format v2 table with Flink streaming, we can read the equality-delete file first. In our implementation, we read the equality-delete file as a data file, so we can get the entire row from it, which is useful.




[GitHub] [iceberg] moon-fall commented on pull request #3012: Flink: use deleteKey method when write delete data to only write the parimary key to the eqDeleteFile

Posted by GitBox <gi...@apache.org>.
moon-fall commented on pull request #3012:
URL: https://github.com/apache/iceberg/pull/3012#issuecomment-905150619


   > @moon-fall, can you please add a description for this PR?
   > 
   > Also, it looks like this changes the intended behavior. We purposely wrote the entire row to delete files. It's reasonable to write just the equality delete columns, but merging this would break people that want the entire row. Can you find a good way to configure this and make it optional rather than changing the default behavior?
   
   OK, I have edited the description, and I will modify the code to add the configuration later.
   
   




[GitHub] [iceberg] moon-fall closed pull request #3012: Flink: use deleteKey method when write delete data to only write the parimary key to the eqDeleteFile

Posted by GitBox <gi...@apache.org>.
moon-fall closed pull request #3012:
URL: https://github.com/apache/iceberg/pull/3012


   




[GitHub] [iceberg] moon-fall commented on a change in pull request #3012: Flink: use deleteKey method when write delete data to only write the parimary key to the eqDeleteFile

Posted by GitBox <gi...@apache.org>.
moon-fall commented on a change in pull request #3012:
URL: https://github.com/apache/iceberg/pull/3012#discussion_r695351726



##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/BaseDeltaTaskWriter.java
##########
@@ -63,6 +70,15 @@ RowDataWrapper wrapper() {
     return wrapper;
   }
 
+  RowData projectDeleteData(RowData data) {
+    wrapper.wrap(data);

Review comment:
       This wraps the raw data. I use RowDataWrapper to select the primary key and generate the projected RowData.
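
       The wrap-then-project pattern described in this comment can be sketched without any Iceberg or Flink dependency. Everything below is hypothetical: `KeyProjectionSketch`, `KeyProjection`, and the `Object[]` row are illustrative stand-ins and do not reproduce `RowDataWrapper` or its actual methods. The sketch assumes a reusable view that is pointed at a row with `wrap` and then exposes only the key positions.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Iceberg RowDataWrapper API.
public class KeyProjectionSketch {

  /** Reusable view that exposes only the selected positions of the wrapped row. */
  static final class KeyProjection {
    private final int[] keyPositions;
    private Object[] wrapped;

    KeyProjection(int... keyPositions) {
      this.keyPositions = keyPositions.clone();
    }

    /** Points this view at a new row and returns the same instance for reuse. */
    KeyProjection wrap(Object[] row) {
      this.wrapped = row;
      return this;
    }

    int size() {
      return keyPositions.length;
    }

    /** Reads the value at the given position of the projected key. */
    Object get(int pos) {
      return wrapped[keyPositions[pos]];
    }

    /** Materializes the projected key as a standalone array. */
    Object[] copy() {
      Object[] key = new Object[keyPositions.length];
      for (int i = 0; i < key.length; i++) {
        key[i] = get(i);
      }
      return key;
    }
  }

  public static void main(String[] args) {
    // Assumed example: positions 0 and 2 form the key.
    KeyProjection projection = new KeyProjection(0, 2);
    Object[] key = projection.wrap(new Object[] {1L, "a", "x"}).copy();
    System.out.println(Arrays.toString(key));
  }
}
```

       In this sketch the view always reflects the most recently wrapped row, so a single instance shared between two code paths has to be re-wrapped before each read, or the key has to be copied out first.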








[GitHub] [iceberg] rdblue commented on pull request #3012: Flink: use deleteKey method when write delete data to only write the parimary key to the eqDeleteFile

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3012:
URL: https://github.com/apache/iceberg/pull/3012#issuecomment-903903645


   @moon-fall, can you please add a description for this PR?
   
   Also, it looks like this changes the intended behavior. We purposely wrote the entire row to delete files. It's reasonable to write just the equality delete columns, but merging this would break people that want the entire row. Can you find a good way to configure this and make it optional rather than changing the default behavior?




[GitHub] [iceberg] rdblue commented on pull request #3012: Flink: use deleteKey method when write delete data to only write the parimary key to the eqDeleteFile

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3012:
URL: https://github.com/apache/iceberg/pull/3012#issuecomment-904747064


   @coolderli, the format supports writing just the delete columns. Whether that is a good idea depends on the use case. I think we can support this if people have a use case for it, but it shouldn't be the default.




[GitHub] [iceberg] stevenzwu commented on a change in pull request #3012: Flink: use deleteKey method when write delete data to only write the parimary key to the eqDeleteFile

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on a change in pull request #3012:
URL: https://github.com/apache/iceberg/pull/3012#discussion_r694985311



##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/BaseDeltaTaskWriter.java
##########
@@ -63,6 +70,15 @@ RowDataWrapper wrapper() {
     return wrapper;
   }
 
+  RowData projectDeleteData(RowData data) {
+    wrapper.wrap(data);

Review comment:
       should this be `deleteWrapper`?



