Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/28 16:47:57 UTC

[GitHub] [iceberg] stevenzwu commented on pull request #6253: Flink: Write watermark to the snapshot summary

stevenzwu commented on PR #6253:
URL: https://github.com/apache/iceberg/pull/6253#issuecomment-1329418892

   There was a previous attempt at writing the watermark (data completeness) to the Iceberg table: https://github.com/apache/iceberg/pull/2109/files.
   
   Let me summarize the open questions and points. It would be great to gather some input. @rdblue @pvary @dixingxing0 @zhangjun0x01 @liubo1022126 
   
   1. Should we write the Flink watermark, or use the min value of some timestamp column (typically the one associated with a time-partitioned table)? Writing the Flink watermark is easier, but the timestamp column value might be more meaningful because it is what the table is partitioned on.
   2. Should we write the metadata to the snapshot summary or to table properties? Writing to the snapshot summary is a minimal change, as shown in this PR and in PR #2109. Writing to a table property is a somewhat bigger change (e.g. it requires a transaction), but it makes the info easier for consumers to extract.
   3. We need to be able to support multiple Flink jobs (e.g. from multiple regions) writing to the same Iceberg table.
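
To make point 3 concrete, here is a rough sketch of how a consumer might derive a table-level watermark when several jobs each record their own watermark in the snapshot summary. It is only an illustration under assumptions: the summary keys `flink.job-id` and `flink.watermark`, the class `WatermarkSketch`, and the method `tableWatermark` are hypothetical names, and snapshot summaries are modeled as plain string maps rather than read through the Iceberg API. It also assumes each writer keeps a stable job identifier.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

class WatermarkSketch {
    // Hypothetical summary keys, used here for illustration only.
    static final String JOB_ID_KEY = "flink.job-id";
    static final String WATERMARK_KEY = "flink.watermark";

    /**
     * Derives a table-level watermark from snapshot summaries.
     * Each summary is a plain map standing in for one snapshot's summary.
     * Returns empty if no snapshot carries both keys.
     */
    static OptionalLong tableWatermark(List<Map<String, String>> summaries) {
        // Highest watermark seen per job; a job's watermark only moves forward.
        Map<String, Long> latestPerJob = new HashMap<>();
        for (Map<String, String> summary : summaries) {
            String jobId = summary.get(JOB_ID_KEY);
            String watermark = summary.get(WATERMARK_KEY);
            if (jobId == null || watermark == null) {
                continue; // snapshot not written by a watermark-aware job
            }
            latestPerJob.merge(jobId, Long.parseLong(watermark), Math::max);
        }
        // The table is complete only up to the slowest job's watermark.
        return latestPerJob.values().stream().mapToLong(Long::longValue).min();
    }
}
```

Taking the minimum across jobs is the conservative choice: data is considered complete only up to the point that every writer has passed. A consumer would still need a way to know which jobs are expected to write, which this sketch does not address.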

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
