Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/01 10:08:21 UTC

[GitHub] [spark] zhengruifeng commented on issue #25256: [SPARK-28514][ML] Remove the redundant transformImpl method in RF & GBT

zhengruifeng commented on issue #25256: [SPARK-28514][ML] Remove the redundant transformImpl method in RF & GBT
URL: https://github.com/apache/spark/pull/25256#issuecomment-517219577
 
 
   @BryanCutler  @srowen  I am neutral on model broadcasting. I notice that there are three approaches that broadcastable/small models use to perform transformation:
   1. serialize the model directly in the closure (most cases);
   2. broadcast the model in the `transform` method on every call (like `Word2Vec`/`GBTRegressor`);
   3. broadcast the model only if it has not been broadcast yet, so that the broadcast model can be reused across calls (like `CountVectorizer`).
   If model broadcasting is better, can we apply it to all algorithms?
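
To make the comparison concrete, here is a minimal sketch of the three approaches. It is an illustration only, not the code in Spark ML: `ToyModel`, `BroadcastStrategies`, `CachedBroadcast` and the column names `x`/`prediction` are hypothetical, and only `SparkContext.broadcast`, `Broadcast.value` and `udf` are actual Spark APIs.

```scala
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical toy model (not a Spark ML class): predicts weight * x.
case class ToyModel(weight: Double) {
  def predict(x: Double): Double = weight * x
}

object BroadcastStrategies {

  // Approach 1: the model is captured by the UDF closure and is
  // serialized together with it when tasks are shipped to executors.
  def viaClosure(model: ToyModel, df: DataFrame): DataFrame = {
    val predict = udf((x: Double) => model.predict(x))
    df.withColumn("prediction", predict(col("x")))
  }

  // Approach 2: a new broadcast variable is created on every call.
  def viaBroadcastPerCall(model: ToyModel, df: DataFrame): DataFrame = {
    val bc = df.sparkSession.sparkContext.broadcast(model)
    val predict = udf((x: Double) => bc.value.predict(x))
    df.withColumn("prediction", predict(col("x")))
  }

  // Approach 3: the broadcast variable is created lazily on the first
  // call and reused by later calls.
  class CachedBroadcast(model: ToyModel) {
    private var bc: Option[Broadcast[ToyModel]] = None

    def transform(df: DataFrame): DataFrame = {
      if (bc.isEmpty) {
        bc = Some(df.sparkSession.sparkContext.broadcast(model))
      }
      // Copy to a local val so the closure captures only the broadcast
      // handle, not the enclosing (non-serializable) object.
      val local = bc.get
      val predict = udf((x: Double) => local.value.predict(x))
      df.withColumn("prediction", predict(col("x")))
    }
  }
}
```

All three variants should yield the same predictions; they differ only in how the model reaches the executors, which is why the choice is mainly a performance question.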
   
   As for this PR, if broadcasting improves performance, I am OK with leaving `GBTRegressor` & `RandomForestRegressor` as they are.
   However, the `transformImpl` methods in `GBTClassifier` & `RandomForestClassifier` are never used, so I would prefer to remove them.
