Posted to issues@spark.apache.org by "黄龙龙 (JIRA)" <ji...@apache.org> on 2018/01/25 08:08:00 UTC

[jira] [Commented] (SPARK-23211) SparkR MLlib randomForest parameter problem

    [ https://issues.apache.org/jira/browse/SPARK-23211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338894#comment-16338894 ] 

黄龙龙 commented on SPARK-23211:
-----------------------------

I would like to know how the newData parameter of spark.randomForest {SparkR} is meant to be used, along with more details about it. Thanks.

[SparkR Document|http://spark.apache.org/docs/latest/api/R/index.html]

> SparkR MLlib randomForest parameter problem
> -------------------------------------------
>
>                 Key: SPARK-23211
>                 URL: https://issues.apache.org/jira/browse/SPARK-23211
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.1.0
>         Environment: {code:R}
> sdf_list <- randomSplit(train_data, rep(7, 3), 10086) 
> model <- spark.randomForest(
>   sdf_list[[1]],  
>   forward_count ~ .,   
>   type          = "regression",   
>   path          = paste0("./predict/model/randomForest_", x),   
>   overwrite     = TRUE,  
>   newData       = sdf_list[[2]])
> {code}
> train_data is a SparkDataFrame
> The documentation for the newData parameter says "a SparkDataFrame for testing."
> The documentation for the path parameter says "The directory where the model is saved."
> Neither parameter works as expected when passed to spark.randomForest().
> Why?
>            Reporter: 黄龙龙
>            Priority: Major
>              Labels: documentation, usability
>
> spark.randomForest() and randomSplit() problem
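
A possible reading of the quoted call, offered as a sketch rather than a confirmed diagnosis: the SparkR help page for spark.randomForest appears to document several functions together, so newData would be an argument of predict(), and path and overwrite would be arguments of write.ml(), not of spark.randomForest() itself. The example below assumes a working local Spark installation; the toy train_data columns (other than forward_count) and the model_path location are hypothetical stand-ins for the reporter's data and path.

```r
library(SparkR)
sparkR.session()

# Hypothetical toy data standing in for the reporter's train_data.
set.seed(1)
local_df <- data.frame(
  forward_count = rnorm(300, mean = 10),
  x1            = runif(300),
  x2            = runif(300)
)
train_data <- createDataFrame(local_df)

# Three equally weighted splits, as in the report.
sdf_list <- randomSplit(train_data, rep(7, 3), 10086)

# Fit: only the data, the formula, and tuning parameters go here.
model <- spark.randomForest(
  sdf_list[[1]],
  forward_count ~ .,
  type = "regression"
)

# Score: newData is passed to predict(), which returns a SparkDataFrame.
predictions <- predict(model, newData = sdf_list[[2]])

# Persist: path and overwrite are passed to write.ml().
# model_path is a hypothetical location chosen for this sketch.
model_path <- file.path(tempdir(), "randomForest_model")
write.ml(model, path = model_path, overwrite = TRUE)

# Reload the saved model later with read.ml().
restored <- read.ml(model_path)
```

If this reading is right, the reported behavior would be a documentation and usability matter (arguments of three functions listed on one page) rather than a defect in the model fitting itself.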


