Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/07/19 22:20:00 UTC

[jira] [Commented] (SPARK-21476) RandomForest classification model not using broadcast in transform

    [ https://issues.apache.org/jira/browse/SPARK-21476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093892#comment-16093892 ] 

Sean Owen commented on SPARK-21476:
-----------------------------------

I'm not sure what you're suggesting: that something should, or shouldn't, be broadcast? RandomForestClassificationModel.predictRaw does not broadcast anything, but it also performs no distributed operation. If nothing is broadcast, what are you saying is serialized that shouldn't be?
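The distinction being discussed can be made concrete with a toy simulation. This is plain Python with no Spark involved, and the names `Executor`, `run_with_closure`, and `run_with_broadcast` are hypothetical. It assumes Spark's general behavior: an object captured in a task closure is deserialized by every task that runs, while a broadcast value is deserialized at most once per executor and then reused from a cache. It only counts deserializations. It does not model network transfer or measure real latency.

```python
import pickle

# Stand-in for a large fitted model (hypothetical data, sized only to make
# the pickled payload non-trivial).
MODEL = [[float(i * j) for j in range(50)] for i in range(200)]


class Executor:
    """Toy executor holding a cache of already-deserialized broadcast values."""

    def __init__(self):
        self.broadcast_cache = {}
        self.deserializations = 0

    def load(self, payload):
        # Every call pays the full unpickling cost.
        self.deserializations += 1
        return pickle.loads(payload)

    def broadcast_value(self, key, payload):
        # Deserialize a broadcast payload once, then reuse the cached object.
        if key not in self.broadcast_cache:
            self.broadcast_cache[key] = self.load(payload)
        return self.broadcast_cache[key]


def run_with_closure(n_tasks, n_executors):
    """Model captured in the task closure: every task deserializes it."""
    executors = [Executor() for _ in range(n_executors)]
    payload = pickle.dumps(MODEL)
    for t in range(n_tasks):
        executors[t % n_executors].load(payload)
    return sum(ex.deserializations for ex in executors)


def run_with_broadcast(n_tasks, n_executors):
    """Model broadcast: each executor deserializes it at most once."""
    executors = [Executor() for _ in range(n_executors)]
    payload = pickle.dumps(MODEL)
    for t in range(n_tasks):
        executors[t % n_executors].broadcast_value("model", payload)
    return sum(ex.deserializations for ex in executors)


print(run_with_closure(n_tasks=400, n_executors=4))    # 400
print(run_with_broadcast(n_tasks=400, n_executors=4))  # 4
```

The count scales with the number of tasks in the first case and with the number of executors in the second. Whether that difference accounts for the latency in this report is exactly the open question in the comment above.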

> RandomForest classification model not using broadcast in transform
> ------------------------------------------------------------------
>
>                 Key: SPARK-21476
>                 URL: https://issues.apache.org/jira/browse/SPARK-21476
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Saurabh Agrawal
>
> I notice significant task deserialization latency while running prediction with pipelines that use RandomForestClassificationModel. While digging into the source, I found that the transform method of RandomForestClassificationModel is inherited from its parent, ProbabilisticClassificationModel, and that the only concrete definition RandomForestClassificationModel provides that transform actually uses is predictRaw. Broadcasting is not used in predictRaw.
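The call structure described in the report can be sketched in a few lines. This is a simplified Python analogue, not Spark's Scala source. The class names mirror the Spark ones, but the method bodies, the tree representation, and the `raw2prediction` helper are hypothetical. The voting is also simplified to one hard vote per tree, whereas Spark's implementation aggregates per-tree class statistics. The point is only the shape: the parent owns `transform`, the subclass supplies `predict_raw`, and everything `predict_raw` needs (here the whole `trees` list) is reached through `self`.

```python
from typing import Callable, List, Sequence

# Hypothetical tree representation: maps a feature vector to a class index.
Tree = Callable[[Sequence[float]], int]


class ProbabilisticClassificationModel:
    """Parent: defines transform once, in terms of an abstract predict_raw."""

    num_classes: int

    def predict_raw(self, features: Sequence[float]) -> List[float]:
        raise NotImplementedError

    def raw2prediction(self, raw: List[float]) -> int:
        # Pick the class with the largest raw score (first one on ties).
        return max(range(len(raw)), key=lambda k: raw[k])

    def transform(self, rows: List[Sequence[float]]) -> List[int]:
        # The per-row function refers to self, so in a distributed setting
        # the whole model would have to travel with it.
        return [self.raw2prediction(self.predict_raw(r)) for r in rows]


class RandomForestClassificationModel(ProbabilisticClassificationModel):
    """Subclass: only predict_raw is specific to random forests."""

    def __init__(self, trees: List[Tree], num_classes: int):
        self.trees = trees
        self.num_classes = num_classes

    def predict_raw(self, features: Sequence[float]) -> List[float]:
        # Simplified: one hard vote per tree for the class it predicts.
        votes = [0.0] * self.num_classes
        for tree in self.trees:
            votes[tree(features)] += 1.0
        return votes


forest = RandomForestClassificationModel(
    trees=[
        lambda x: 0 if x[0] < 0.5 else 1,
        lambda x: 0 if x[1] < 0.5 else 1,
        lambda x: 1,
    ],
    num_classes=2,
)
print(forest.transform([[0.1, 0.2], [0.9, 0.8]]))  # [0, 1]
```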



