Posted to issues@spark.apache.org by "lee.xiaobo.2006 (JIRA)" <ji...@apache.org> on 2015/04/23 13:59:39 UTC

[jira] [Created] (SPARK-7091) Too slow when use GradientBoostedTrees to classify train data set.

lee.xiaobo.2006 created SPARK-7091:
--------------------------------------

             Summary: Too slow when use GradientBoostedTrees to classify train data set.
                 Key: SPARK-7091
                 URL: https://issues.apache.org/jira/browse/SPARK-7091
             Project: Spark
          Issue Type: Question
          Components: MLlib
    Affects Versions: 1.3.1
            Reporter: lee.xiaobo.2006


One stage takes far too long. The training data set has shape 1M*40K. Can anyone help me?

         Description: collectAsMap at DecisionTree.scala:642
           Submitted: 2015/04/23 18:12:37
            Duration: 38 min
              Stages: 2/2 (1 skipped)
               Tasks: 228/228 (4 skipped)

The call stack is:
org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:641)
org.apache.spark.mllib.tree.DecisionTree$.findBestSplits(DecisionTree.scala:613)
org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:234)
org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:60)
org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:194)
org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:67)
org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:135)
org.apache.spark.mllib.api.python.PythonMLLibAPI.trainGradientBoostedTreesModel(PythonMLLibAPI.scala:644)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
py4j.Gateway.invoke(Gateway.java:259)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.GatewayConnection.run(GatewayConnection.java:207)
java.lang.Thread.run(Thread.java:724)
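
For a rough sense of why the findBestSplits step in this stack may be expensive on such wide data, here is a back-of-the-envelope sketch. It is only an estimate under assumptions that the report does not state: that 40K is the number of features, that all features are continuous, that the bin count is 32, and that each bin holds three sufficient statistics (count, sum, sum of squares) stored as 8-byte doubles. The function name and its parameters are hypothetical, not part of any Spark API.

```python
def histogram_bytes(num_features, max_bins=32, stats_per_bin=3,
                    nodes_in_group=1, bytes_per_value=8):
    """Estimate the size of the per-bin statistics built for one group of tree nodes.

    All defaults are assumptions for illustration, not values taken from the report.
    """
    values = num_features * max_bins * stats_per_bin * nodes_in_group
    return values * bytes_per_value


if __name__ == "__main__":
    # Assumed: 40K features, one node in the group.
    one_node = histogram_bytes(40000)
    print("one node:   %.2f MB" % (one_node / 1e6))

    # Assumed: four nodes processed in the same pass.
    four_nodes = histogram_bytes(40000, nodes_in_group=4)
    print("four nodes: %.2f MB" % (four_nodes / 1e6))
```

If these assumptions are close to the real configuration, every pass over the data builds and merges tens of megabytes of statistics per tree node, and the pass repeats for each level of each boosted tree. Reducing the number of features, bins, or boosting iterations would shrink this estimate proportionally.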



