Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/06/17 07:36:28 UTC

[GitHub] [flink-ml] yunfengzhou-hub opened a new pull request, #112: [FLINK-27096] Flush buffer at epoch watermark

yunfengzhou-hub opened a new pull request, #112:
URL: https://github.com/apache/flink-ml/pull/112

   This PR reduces the latency of Flink ML iterations by forcing a buffer flush at each iteration epoch watermark.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-ml] lindong28 commented on pull request #112: [FLINK-27096] Flush buffer at epoch watermark

Posted by GitBox <gi...@apache.org>.
lindong28 commented on PR #112:
URL: https://github.com/apache/flink-ml/pull/112#issuecomment-1159323425

   @yunfengzhou-hub Thanks for the PR! LGTM. Could you explain the performance impact of this PR on the iteration benchmark, with and without a trivial UDF payload?



[GitHub] [flink-ml] yunfengzhou-hub commented on pull request #112: [FLINK-27096] Flush buffer at epoch watermark

Posted by GitBox <gi...@apache.org>.
yunfengzhou-hub commented on PR #112:
URL: https://github.com/apache/flink-ml/pull/112#issuecomment-1159880670

   Hi @lindong28, thanks for reviewing this PR.
   
   Before this PR, Flink ML stages that use iterations, such as KMeans, spend at least 100 ms on each iteration, even when the UDF in the iteration body does almost no work. This is because the iteration epoch watermark can only be forwarded to the next round after the buffer timeout expires.
   
   With the flushing behavior introduced by this PR, this 100 ms overhead is eliminated: Flink ML stages can progress to the next epoch as soon as they receive the epoch watermark, instead of waiting for the buffer timeout.
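The effect can be illustrated with a toy, single-threaded latency model. This is a sketch under stated assumptions, not Flink ML's implementation: `EpochFlushModel`, `watermarkDeliveryMs`, and `totalMs` are hypothetical names, time is a simulated clock in milliseconds, and the periodic buffer-timeout flush is modeled as ticks at multiples of the timeout (in a real job the timeout is configured with `StreamExecutionEnvironment#setBufferTimeout`). Each iteration runs a trivial UDF and then emits an epoch watermark; the next iteration starts only once that watermark has been delivered.

```java
/**
 * Hypothetical toy model (not Flink ML code) comparing watermark delivery
 * with and without an eager flush at each epoch watermark.
 */
class EpochFlushModel {

    /** Simulated time (ms) at which a watermark written at nowMs reaches downstream. */
    static long watermarkDeliveryMs(long nowMs, long bufferTimeoutMs, boolean flushOnEpochWatermark) {
        if (flushOnEpochWatermark) {
            // Eager flush: the buffer holding the watermark is sent immediately.
            return nowMs;
        }
        // No eager flush: the watermark waits for the next periodic timeout flush,
        // modeled here as a tick at the next multiple of the buffer timeout.
        return (nowMs / bufferTimeoutMs + 1) * bufferTimeoutMs;
    }

    /** Total simulated time (ms) needed to run the given number of iterations. */
    static long totalMs(int iterations, long udfMs, long bufferTimeoutMs, boolean flushOnEpochWatermark) {
        long now = 0;
        for (int epoch = 0; epoch < iterations; epoch++) {
            now += udfMs; // trivial per-iteration UDF work
            // The next epoch cannot start until this epoch's watermark is delivered.
            now = watermarkDeliveryMs(now, bufferTimeoutMs, flushOnEpochWatermark);
        }
        return now;
    }

    public static void main(String[] args) {
        int iterations = 10;
        long udfMs = 1;
        long bufferTimeoutMs = 100;
        long withoutFlush = totalMs(iterations, udfMs, bufferTimeoutMs, false);
        long withFlush = totalMs(iterations, udfMs, bufferTimeoutMs, true);
        System.out.println("Without flush at epoch watermark: "
                + (withoutFlush / iterations) + " ms per iteration");
        System.out.println("With flush at epoch watermark: "
                + (withFlush / iterations) + " ms per iteration");
    }
}
```

With a 1 ms UDF and a 100 ms timeout, the model gives 100 ms per iteration without the eager flush and 1 ms per iteration with it. It only reproduces the "at least 100 ms" lower bound; it makes no attempt to explain the exact per-iteration times of a real job.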



[GitHub] [flink-ml] lindong28 commented on pull request #112: [FLINK-27096] Flush buffer at epoch watermark

Posted by GitBox <gi...@apache.org>.
lindong28 commented on PR #112:
URL: https://github.com/apache/flink-ml/pull/112#issuecomment-1159959274

   Thanks for the information. LGTM.



[GitHub] [flink-ml] lindong28 commented on pull request #112: [FLINK-27096] Flush buffer at epoch watermark

Posted by GitBox <gi...@apache.org>.
lindong28 commented on PR #112:
URL: https://github.com/apache/flink-ml/pull/112#issuecomment-1159956579

   For the record, @yunfengzhou-hub has done benchmarks to verify the performance impact of this PR. Here are the results:
   
   - Benchmark setup: an iterative DataStream job with an almost-empty iteration body, a buffer timeout of 100 ms, and 500 iterations.
   - Before this PR, Flink's average time per iteration was 182 ms.
   - After this PR, Flink's average time per iteration is 6.4 ms.
   - Spark's average time per iteration is 11.7 ms.
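To put these averages in perspective, here is a back-of-the-envelope calculation that is not part of the reported benchmark: since total time equals average time per iteration times the iteration count, 500 iterations correspond to about 91 s before this PR, 3.2 s after it, and 5.85 s for Spark. That is a speedup of roughly 28x from this PR and roughly 1.8x relative to Spark. The class and method names below are illustrative only.

```java
/** Illustrative arithmetic on the reported per-iteration averages. */
class BenchmarkTotals {
    static final int ITERATIONS = 500;

    /** Total runtime in seconds: average ms per iteration times the iteration count. */
    static double totalSeconds(double avgMsPerIteration) {
        return avgMsPerIteration * ITERATIONS / 1000.0;
    }

    /** How many times faster the candidate is than the baseline. */
    static double speedup(double baselineMs, double candidateMs) {
        return baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        double flinkBefore = 182.0; // ms per iteration, before this PR
        double flinkAfter = 6.4;    // ms per iteration, after this PR
        double spark = 11.7;        // ms per iteration, Spark
        System.out.printf("Flink before: %.2f s total%n", totalSeconds(flinkBefore));
        System.out.printf("Flink after:  %.2f s total%n", totalSeconds(flinkAfter));
        System.out.printf("Spark:        %.2f s total%n", totalSeconds(spark));
        System.out.printf("Speedup from this PR: %.2fx%n", speedup(flinkBefore, flinkAfter));
        System.out.printf("Flink (after) vs. Spark: %.2fx%n", speedup(spark, flinkAfter));
    }
}
```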
   



[GitHub] [flink-ml] lindong28 merged pull request #112: [FLINK-27096] Flush buffer at epoch watermark

Posted by GitBox <gi...@apache.org>.
lindong28 merged PR #112:
URL: https://github.com/apache/flink-ml/pull/112

