Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/04/23 14:04:38 UTC

[jira] [Resolved] (SPARK-7091) Too slow when use GradientBoostedTrees to classify train data set.

     [ https://issues.apache.org/jira/browse/SPARK-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-7091.
------------------------------
    Resolution: Invalid

Please ask questions at user@spark.apache.org

> Too slow when use GradientBoostedTrees to classify train data set.
> ------------------------------------------------------------------
>
>                 Key: SPARK-7091
>                 URL: https://issues.apache.org/jira/browse/SPARK-7091
>             Project: Spark
>          Issue Type: Question
>          Components: MLlib
>    Affects Versions: 1.3.1
>            Reporter: lee.xiaobo.2006
>
> One stage takes far too long. The training data set has shape 1M x 40K. Can anyone help?
> collectAsMap at DecisionTree.scala:642 | 2015/04/23 18:12:37 | 38 min | 2/2 (1 skipped) | 228/228 (4 skipped)
> The call stack is:
> org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:641)
> org.apache.spark.mllib.tree.DecisionTree$.findBestSplits(DecisionTree.scala:613)
> org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:234)
> org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:60)
> org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:194)
> org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:67)
> org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:135)
> org.apache.spark.mllib.api.python.PythonMLLibAPI.trainGradientBoostedTreesModel(PythonMLLibAPI.scala:644)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> py4j.Gateway.invoke(Gateway.java:259)
> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> py4j.commands.CallCommand.execute(CallCommand.java:79)
> py4j.GatewayConnection.run(GatewayConnection.java:207)
> java.lang.Thread.run(Thread.java:724)


