Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/18 22:34:55 UTC

[GitHub] [iceberg] stevenzwu opened a new issue #1626: Implements the Flink source based on the new FLIP-27 interface

stevenzwu opened a new issue #1626:
URL: https://github.com/apache/iceberg/issues/1626


   The Flink 1.11 release introduced a new source interface ([FLIP-27](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface)) that is a better fit for Iceberg.
   
   It has two logical components:
   
   - an enumerator that runs on the jobmanager (driver)
   - parallel readers that run on the taskmanagers (workers)
   
   It is a unified source interface for both streaming and batch mode. Because splits are assigned dynamically, it can avoid the straggler/outlier problem caused by the simple static round-robin split assignment of the old interface.
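
To make the straggler point concrete, here is a minimal, self-contained simulation. It is only a sketch under stated assumptions: it does not use any Flink or Iceberg API, the class and method names (`SplitAssignmentDemo`, `staticRoundRobinMakespan`, `dynamicMakespan`) are hypothetical, and each split is reduced to a single made-up "cost" number. It compares pinning splits to readers up front with handing the next split to whichever reader becomes idle first.

```java
import java.util.PriorityQueue;

// Hypothetical simulation only; none of these names come from Flink or Iceberg.
class SplitAssignmentDemo {

  // Static round-robin: split i is pinned to reader (i % readers) before any work starts.
  // Returns the time at which the slowest reader finishes.
  static long staticRoundRobinMakespan(long[] splitCosts, int readers) {
    long[] busy = new long[readers];
    for (int i = 0; i < splitCosts.length; i++) {
      busy[i % readers] += splitCosts[i];
    }
    long max = 0;
    for (long b : busy) {
      max = Math.max(max, b);
    }
    return max;
  }

  // Dynamic assignment: the next pending split goes to the reader that becomes idle first,
  // which mimics readers requesting work from a central enumerator.
  static long dynamicMakespan(long[] splitCosts, int readers) {
    PriorityQueue<Long> freeAt = new PriorityQueue<>();
    for (int r = 0; r < readers; r++) {
      freeAt.add(0L);
    }
    long makespan = 0;
    for (long cost : splitCosts) {
      long start = freeAt.poll(); // earliest time any reader is idle
      long end = start + cost;
      freeAt.add(end);
      makespan = Math.max(makespan, end);
    }
    return makespan;
  }

  public static void main(String[] args) {
    // Made-up costs: every other split is expensive, so round-robin overloads reader 0.
    long[] costs = {10, 1, 10, 1, 10, 1};
    System.out.println("static  = " + staticRoundRobinMakespan(costs, 2));
    System.out.println("dynamic = " + dynamicMakespan(costs, 2));
  }
}
```

With these made-up costs, round-robin gives all three expensive splits to the same reader, while the pull-based variant spreads them out as readers free up.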
   
   Initial scope can be the bounded/batch mode.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
openinx commented on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-743089839


   Thanks for the great work, @stevenzwu! I will take a look in the next few days.




[GitHub] [iceberg] stevenzwu edited a comment on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
stevenzwu edited a comment on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-715622716


   @openinx @kbendick you can find the PoC code [here](https://github.com/stevenzwu/iceberg/pull/1/files). Right now, all `TestIcebergSource` tests pass. I still need to spend more time getting `TestIcebergSourceReaderDeletes` to work. We will prepare a design doc so that the community can discuss important design and interface questions.
   
   @kbendick can you elaborate on what the pain point with chaining AsyncWait tasks is and why the FLIP-27 source interface helps? Just interested to learn.




[GitHub] [iceberg] kbendick commented on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-714738924


   Thanks @stevenzwu. I'm looking forward to this as well. The new source interface allows OSS to once again chain AsyncWait tasks, which has been a pain point, so thank you!




[GitHub] [iceberg] stevenzwu commented on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-715622716


   @openinx @kbendick you can find the PoC code [here](https://github.com/stevenzwu/iceberg/pull/1/files). Right now, all `TestIcebergSource` tests pass. I still need to spend more time getting `TestIcebergSourceReaderDeletes` to work.
   
   @kbendick can you elaborate on what the pain point with chaining AsyncWait tasks is and why the FLIP-27 source interface helps? Just interested to learn.




[GitHub] [iceberg] openinx commented on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
openinx commented on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-711696891


   Looking forward to the draft PR or PoC patch. Thanks for the work, @stevenzwu.




[GitHub] [iceberg] stevenzwu edited a comment on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
stevenzwu edited a comment on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-715622716


   @openinx @kbendick you can find the PoC code [here](https://github.com/stevenzwu/iceberg/pull/1/files). Right now, all `TestIcebergSource` tests pass, but `TestIcebergSourceReaderDeletes` is flaky. I need to figure out why.
   
   We will prepare a design doc so that the community can discuss important design and interface questions.
   
   @kbendick can you elaborate on what the pain point with chaining AsyncWait tasks is and why the FLIP-27 source interface helps? Just interested to learn.




[GitHub] [iceberg] stevenzwu edited a comment on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
stevenzwu edited a comment on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-715622716


   @openinx @kbendick you can find the PoC code [here](https://github.com/stevenzwu/iceberg/pull/1/files). Both `TestIcebergSource` and `TestIcebergSourceReaderDeletes` pass now.
   
   We will prepare a design doc so that the community can discuss important design and interface questions. The main goal is to support a pluggable split assigner: a simple assigner (no ordering or locality guarantee), an assigner that provides some ordering guarantee (for backfill), an assigner that is locality-aware, etc.
   
   @kbendick can you elaborate on what the pain point with chaining AsyncWait tasks is and why the FLIP-27 source interface helps? Just interested to learn.




[GitHub] [iceberg] stevenzwu edited a comment on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
stevenzwu edited a comment on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-715622716


   @openinx @kbendick you can find the PoC code [here](https://github.com/stevenzwu/iceberg/pull/1/files). Both `TestIcebergSource` and `TestIcebergSourceReaderDeletes` pass now.
   
   We will prepare a design doc so that the community can discuss important design and interface questions. The main goal is to support a pluggable split assigner, which is not reflected in the current PoC code. Some possible assignment strategies: a simple assigner that provides no ordering or locality guarantee, an assigner that provides some ordering guarantee (for backfill), an assigner that is locality-aware, etc.
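
As a rough illustration of what "pluggable" could mean here, the sketch below defines a small assigner interface with two interchangeable strategies. Everything in it is an assumption made for illustration: `SplitAssignerSketch`, `Split`, `SplitAssigner`, `SimpleAssigner`, `OrderedAssigner`, and the `sequence` field are hypothetical names that come from neither the PoC code nor the design doc, and a locality-aware strategy is left out.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;
import java.util.Queue;

// Hypothetical sketch; these types are not part of Flink, Iceberg, or the PoC.
class SplitAssignerSketch {

  // Stand-in for a split: a file name plus a made-up ordering key.
  static final class Split {
    final String file;
    final long sequence;

    Split(String file, long sequence) {
      this.file = file;
      this.sequence = sequence;
    }
  }

  // The pluggable piece: the enumerator would call getNext() when a reader asks for work.
  interface SplitAssigner {
    void addSplits(List<Split> splits);

    Optional<Split> getNext();
  }

  // Simple strategy: hands out splits in arrival order, no ordering or locality guarantee.
  static final class SimpleAssigner implements SplitAssigner {
    private final Queue<Split> pending = new ArrayDeque<>();

    public void addSplits(List<Split> splits) {
      pending.addAll(splits);
    }

    public Optional<Split> getNext() {
      return Optional.ofNullable(pending.poll());
    }
  }

  // Ordered strategy: always hands out the pending split with the lowest sequence,
  // for example to process older data first during a backfill.
  static final class OrderedAssigner implements SplitAssigner {
    private final PriorityQueue<Split> pending =
        new PriorityQueue<>(Comparator.comparingLong((Split s) -> s.sequence));

    public void addSplits(List<Split> splits) {
      pending.addAll(splits);
    }

    public Optional<Split> getNext() {
      return Optional.ofNullable(pending.poll());
    }
  }

  public static void main(String[] args) {
    List<Split> splits = Arrays.asList(
        new Split("c.parquet", 3), new Split("a.parquet", 1), new Split("b.parquet", 2));

    SplitAssigner simple = new SimpleAssigner();
    simple.addSplits(splits);
    SplitAssigner ordered = new OrderedAssigner();
    ordered.addSplits(splits);

    System.out.println("simple first:  " + simple.getNext().get().file);
    System.out.println("ordered first: " + ordered.getNext().get().file);
  }
}
```

The calling code depends only on the interface, so swapping strategies does not require touching the enumerator or the readers.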
   
   @kbendick can you elaborate on what the pain point with chaining AsyncWait tasks is and why the FLIP-27 source interface helps? Just interested to learn.




[GitHub] [iceberg] tweise commented on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
tweise commented on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-770148349


   https://github.com/apache/iceberg/pull/2105
   




[GitHub] [iceberg] stevenzwu edited a comment on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
stevenzwu edited a comment on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-715622716


   @openinx @kbendick you can find the PoC code [here](https://github.com/stevenzwu/iceberg/pull/1/files). Both `TestIcebergSource` and `TestIcebergSourceReaderDeletes` pass now.
   
   We will prepare a design doc so that the community can discuss important design and interface questions.
   
   @kbendick can you elaborate on what the pain point with chaining AsyncWait tasks is and why the FLIP-27 source interface helps? Just interested to learn.




[GitHub] [iceberg] stevenzwu edited a comment on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
stevenzwu edited a comment on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-715622716


   @openinx @kbendick you can find the PoC code [here](https://github.com/stevenzwu/iceberg/pull/1/files). Right now, all `TestIcebergSource` tests pass. I still need to spend more time getting `TestIcebergSourceReaderDeletes` to work.
   
   We will prepare a design doc so that the community can discuss important design and interface questions.
   
   @kbendick can you elaborate on what the pain point with chaining AsyncWait tasks is and why the FLIP-27 source interface helps? Just interested to learn.




[GitHub] [iceberg] stevenzwu commented on issue #1626: Implements the Flink source based on the new FLIP-27 interface

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on issue #1626:
URL: https://github.com/apache/iceberg/issues/1626#issuecomment-740234141


   My colleague (@sundargates) and I worked on this design doc: https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit#heading=h.z2xhib39g0nm. We would love to get feedback from the community. cc @openinx @JingsongLi @kbendick
   
   Regarding the implementation, we are still waiting for the 1.11.3 release, as we need some of the API changes back-ported from the 1.12 branch. Even with 1.11.3, we still need to copy the BulkFormat-related code from the 1.12 file source.
   
   

