Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/06/21 09:58:19 UTC

[GitHub] [flink-ml] yunfengzhou-hub opened a new pull request, #113: [FLINK-27096] Optimize OneHotEncoder performance

yunfengzhou-hub opened a new pull request, #113:
URL: https://github.com/apache/flink-ml/pull/113

   This PR optimizes the performance of the OneHotEncoder algorithm with the following changes:
   
   - Restructures the DAG of OneHotEncoder so that the very first stream operator pre-processes the input data with aggregation operations, which reduces the data transmission overhead.
   - Avoids an unnecessary `String.format()` call when passing the error message to `Preconditions.checkArgument`.
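   The pre-aggregation idea in the first bullet can be sketched in plain Java, outside Flink's DataStream API. This sketch assumes that fitting OneHotEncoder only needs the maximum category index of each input column; the names `PreAggregationSketch`, `localMax` and `mergeMax` are hypothetical and not taken from the PR. Each partition first reduces its rows to one small `int[]`, so only those summaries, rather than every input row, would need to be sent to the merging step.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of local pre-aggregation for a OneHotEncoder-style fit.
 * Assumption: the fit only needs the maximum category index per column.
 */
public class PreAggregationSketch {

    /** Reduces one partition's rows to the per-column maximum index (runs before any shuffle). */
    static int[] localMax(List<double[]> partition, int numCols) {
        int[] max = new int[numCols];
        Arrays.fill(max, -1);
        for (double[] row : partition) {
            for (int c = 0; c < numCols; c++) {
                double v = row[c];
                // Assumed input contract: category values are non-negative integers.
                if (v < 0 || v != Math.floor(v)) {
                    throw new IllegalArgumentException("Value " + v + " is not a non-negative integer");
                }
                max[c] = Math.max(max[c], (int) v);
            }
        }
        return max;
    }

    /** Merges the small per-partition summaries; only these cross partition boundaries. */
    static int[] mergeMax(List<int[]> partials, int numCols) {
        int[] global = new int[numCols];
        Arrays.fill(global, -1);
        for (int[] p : partials) {
            for (int c = 0; c < numCols; c++) {
                global[c] = Math.max(global[c], p[c]);
            }
        }
        return global;
    }

    public static void main(String[] args) {
        List<double[]> p0 = List.of(new double[] {0, 2}, new double[] {1, 0});
        List<double[]> p1 = List.of(new double[] {3, 1}, new double[] {2, 2});
        int[] global = mergeMax(List.of(localMax(p0, 2), localMax(p1, 2)), 2);
        // The number of categories per column is max index + 1.
        System.out.println(Arrays.toString(global));
    }
}
```

   Running `main` prints `[3, 2]`, i.e. 4 categories in column 0 and 3 in column 1. In an actual Flink job the local step would run inside the first operator of each subtask and the merge step after the partial results are gathered; that wiring is omitted here.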
   
   Together, these optimizations reduce the net runtime of the OneHotEncoder benchmark jobs to about 1/6 of the original.
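   The `String.format()` change listed above can be illustrated with a small, self-contained sketch. `checkArgumentEager` and `checkArgumentLazy` are hypothetical helpers, not Flink APIs; as far as we know, Flink's `org.apache.flink.util.Preconditions.checkArgument` offers a similar overload that takes a message template plus arguments and only builds the message when the check fails.

```java
/**
 * Hypothetical sketch contrasting eager and lazy error-message formatting in a per-row check.
 */
public class LazyMessageSketch {

    /** Counts how many times an error message was actually built. */
    static int formatCalls = 0;

    static String format(String template, Object... args) {
        formatCalls++;
        return String.format(template, args);
    }

    /** Eager style: the caller has already built the message, even if the check passes. */
    static void checkArgumentEager(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Lazy style: the message is formatted only when the check fails. */
    static void checkArgumentLazy(boolean condition, String template, Object... args) {
        if (!condition) {
            throw new IllegalArgumentException(format(template, args));
        }
    }

    public static void main(String[] args) {
        double[] values = {0, 1, 2, 3};
        for (double v : values) {
            // Eager: format(...) runs for every row although no check fails.
            checkArgumentEager(v >= 0, format("Value %s is negative", v));
        }
        int eager = formatCalls;
        formatCalls = 0;
        for (double v : values) {
            // Lazy: nothing is formatted because every check passes.
            checkArgumentLazy(v >= 0, "Value %s is negative", v);
        }
        System.out.println(eager + " " + formatCalls);
    }
}
```

   Running `main` prints `4 0`: the eager style formats a message for every row, the lazy style formats none. The varargs array and the boxing of `v` still happen in the lazy call, so it is cheaper rather than free.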
   
   This PR also does the following:
   - Adds an example benchmark JSON file for OneHotEncoder.
   - Supports generating distinct double values in `DoubleGenerator`.
     - This change has only a slight impact on the performance of `DoubleGenerator`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-ml] lindong28 commented on pull request #113: [FLINK-27096] Optimize OneHotEncoder performance

Posted by GitBox <gi...@apache.org>.
lindong28 commented on PR #113:
URL: https://github.com/apache/flink-ml/pull/113#issuecomment-1161667912

   Thanks for the PR! LGTM.


[GitHub] [flink-ml] lindong28 merged pull request #113: [FLINK-27096] Optimize OneHotEncoder performance

Posted by GitBox <gi...@apache.org>.
lindong28 merged PR #113:
URL: https://github.com/apache/flink-ml/pull/113

