Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/28 12:24:22 UTC

[GitHub] [iceberg] openinx opened a new pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

openinx opened a new pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996


   Many users export the results of Flink aggregations into an Apache Iceberg table, for example:
   
   ```sql
   SELECT count(click_num) FROM click_events GROUP BY DATE(click_timestamp);
   ```
   
   This streaming query counts the clicks since the beginning of the current day (00:00:00). Every emitted event is an UPSERT event that overwrites the previously accumulated click_num.
   
   In this case, we need to transform every INSERT/UPDATE_AFTER event into an UPSERT, which means a DELETE of the key followed by an INSERT of the new row.
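
To illustrate the DELETE + INSERT semantics, here is a minimal in-memory sketch. It does not use the Iceberg or Flink APIs; the `Table` class and its methods are hypothetical stand-ins for a table that does not enforce key uniqueness. It shows why plain appends accumulate one row per emitted count, while an upsert keeps a single row per key.

```java
import java.util.ArrayList;
import java.util.List;

public class UpsertSketch {
  // Hypothetical in-memory "table": rows are {key, value} pairs with no uniqueness constraint.
  static final class Table {
    final List<String[]> rows = new ArrayList<>();

    void insert(String key, String value) {
      rows.add(new String[] {key, value});
    }

    // Stand-in for an equality delete: drop every row whose key matches.
    void deleteByKey(String key) {
      rows.removeIf(row -> row[0].equals(key));
    }

    // UPSERT = DELETE the key, then INSERT the new row.
    void upsert(String key, String value) {
      deleteByKey(key);
      insert(key, value);
    }
  }

  public static void main(String[] args) {
    Table appendOnly = new Table();
    Table upserted = new Table();

    // Three successive emissions of the running count for the same day.
    String[] counts = {"1", "2", "3"};
    for (String count : counts) {
      appendOnly.insert("2020-12-28", count);
      upserted.upsert("2020-12-28", count);
    }

    System.out.println(appendOnly.rows.size()); // 3 rows: every emission is kept
    System.out.println(upserted.rows.size());   // 1 row: only the latest count remains
  }
}
```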
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#discussion_r549462204



##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -172,6 +174,20 @@ public Builder writeParallelism(int newWriteParallelism) {
       return this;
     }
 
+    /**
+     * All INSERT/UPDATE_AFTER events from input stream will be transformed to UPSERT events, which means it will
+     * DELETE the old records and then INSERT the new records. In partitioned table, the partition fields should be
+     * a subset of equality fields, otherwise the old row that located in partition-A could not be deleted by the
+     * new row that located in partition-B.

Review comment:
       Does anything validate this constraint?
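
One possible shape for such a check, as a minimal sketch only. `checkPartitionFields` is a hypothetical helper, not part of Iceberg or `FlinkSink`, and it assumes both inputs are plain source column names (partition transforms are ignored here).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UpsertConstraintSketch {
  // Hypothetical helper: fails fast when a partition source column is
  // missing from the equality fields.
  static void checkPartitionFields(List<String> partitionFields, List<String> equalityFields) {
    Set<String> equality = new HashSet<>(equalityFields);
    for (String field : partitionFields) {
      if (!equality.contains(field)) {
        throw new IllegalStateException(
            "Partition field '" + field + "' must be included in equality fields " + equalityFields);
      }
    }
  }

  public static void main(String[] args) {
    // Passes: the partition column is part of the key.
    checkPartitionFields(Arrays.asList("dt"), Arrays.asList("dt", "id"));

    try {
      // Fails: rows with the same id could land in different dt partitions,
      // so a delete written to one partition would not remove the old row in another.
      checkPartitionFields(Arrays.asList("dt"), Arrays.asList("id"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```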





[GitHub] [iceberg] rdblue commented on pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#issuecomment-751839146


   Mostly looks good, but I don't think that upsert should be supported for `UPDATE_AFTER`. Interested to hear your rationale for that case.



[GitHub] [iceberg] openinx commented on pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#issuecomment-887440763


   Since @Reo-LEI has picked this up, I will close this PR now. Let's continue the review in [that PR](https://github.com/apache/iceberg/pull/2863).



[GitHub] [iceberg] rdblue commented on pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#issuecomment-882737252


   I think this is just waiting on someone to pick it up again. UPSERT should be unblocked now that row identifier fields have been added.



[GitHub] [iceberg] Reo-LEI commented on pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
Reo-LEI commented on pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#issuecomment-886222692


   I'm picking this up in #2863 @rdblue
   



[GitHub] [iceberg] himanshpal commented on pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
himanshpal commented on pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#issuecomment-852783056


   @rdblue @openinx - Is there any update on this? Currently we are seeing duplicate rows while writing/compacting CDC events to the table.



[GitHub] [iceberg] haormj commented on pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
haormj commented on pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#issuecomment-881117051


   @openinx @rdblue Is there any update on this?



[GitHub] [iceberg] openinx closed pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
openinx closed pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996


   



[GitHub] [iceberg] wg1026688210 commented on pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
wg1026688210 commented on pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#issuecomment-872675546


   Can this PR be merged? In one of our scenarios, the TiDB binlog has no before_update data. We hope Flink can help us handle it.



[GitHub] [iceberg] wg1026688210 edited a comment on pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
wg1026688210 edited a comment on pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#issuecomment-872675546


   Can this PR be merged? In one of our scenarios, the TiDB binlog has no before_update data preceding the after_update. We hope Flink can help us handle it. @rdblue @openinx



[GitHub] [iceberg] wg1026688210 edited a comment on pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
wg1026688210 edited a comment on pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#issuecomment-872675546


   Can this PR be merged? In one of our scenarios, the TiDB binlog has no before_update data. We hope Flink can help us handle it.



[GitHub] [iceberg] rdblue commented on a change in pull request #1996: Flink: Transform INSERT as one DELETE following one INSERT if configure to use UPSERT

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1996:
URL: https://github.com/apache/iceberg/pull/1996#discussion_r549462998



##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/BaseDeltaTaskWriter.java
##########
@@ -70,6 +73,9 @@ public void write(RowData row) throws IOException {
     switch (row.getRowKind()) {
       case INSERT:
       case UPDATE_AFTER:
+        if (upsert) {

Review comment:
       It seems like this should only happen for the `INSERT` case because `UPDATE_AFTER` implies that there was an `UPDATE_BEFORE` that will perform the delete. This would delete the same row twice in that case, causing more equality deletes to be written for the row.
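
A rough sketch of the difference, counting how many deletes each policy would issue for one changelog sequence. The `Kind` enum is a local stand-in that mirrors Flink's `RowKind` values, and `countDeletes` is a hypothetical helper, not the `BaseDeltaTaskWriter` logic. It assumes upsert mode is on and that `UPDATE_BEFORE` and `DELETE` each write one delete.

```java
import java.util.Arrays;
import java.util.List;

public class UpsertDeleteCountSketch {
  // Local stand-in mirroring Flink's RowKind values.
  enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

  // Counts the deletes issued in upsert mode for the given events.
  // upsertOnUpdateAfter toggles whether UPDATE_AFTER also deletes before inserting.
  static int countDeletes(List<Kind> events, boolean upsertOnUpdateAfter) {
    int deletes = 0;
    for (Kind kind : events) {
      switch (kind) {
        case INSERT:
          deletes++; // upsert: delete the key before inserting
          break;
        case UPDATE_AFTER:
          if (upsertOnUpdateAfter) {
            deletes++; // second delete for a row UPDATE_BEFORE already removed
          }
          break;
        case UPDATE_BEFORE:
        case DELETE:
          deletes++; // these events always delete the old row
          break;
      }
    }
    return deletes;
  }

  public static void main(String[] args) {
    List<Kind> events = Arrays.asList(Kind.INSERT, Kind.UPDATE_BEFORE, Kind.UPDATE_AFTER);
    System.out.println(countDeletes(events, false)); // 2: INSERT and UPDATE_BEFORE
    System.out.println(countDeletes(events, true));  // 3: UPDATE_AFTER deletes again
  }
}
```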



