Posted to issues@spark.apache.org by "lee.xiaobo.2006 (JIRA)" <ji...@apache.org> on 2015/04/23 13:59:39 UTC
[jira] [Created] (SPARK-7091) Too slow when use GradientBoostedTrees to classify train data set.
lee.xiaobo.2006 created SPARK-7091:
--------------------------------------
Summary: Too slow when use GradientBoostedTrees to classify train data set.
Key: SPARK-7091
URL: https://issues.apache.org/jira/browse/SPARK-7091
Project: Spark
Issue Type: Question
Components: MLlib
Affects Versions: 1.3.1
Reporter: lee.xiaobo.2006
One stage takes far too long to run. The training data set has shape 1M*40K. Can anyone help me?
Description: collectAsMap at DecisionTree.scala:642
Submitted: 2015/04/23 18:12:37
Duration: 38 min
Stages (succeeded/total): 2/2 (1 skipped)
Tasks (succeeded/total): 228/228 (4 skipped)
The call stack is:
org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:641)
org.apache.spark.mllib.tree.DecisionTree$.findBestSplits(DecisionTree.scala:613)
org.apache.spark.mllib.tree.RandomForest.run(RandomForest.scala:234)
org.apache.spark.mllib.tree.DecisionTree.run(DecisionTree.scala:60)
org.apache.spark.mllib.tree.GradientBoostedTrees$.org$apache$spark$mllib$tree$GradientBoostedTrees$$boost(GradientBoostedTrees.scala:194)
org.apache.spark.mllib.tree.GradientBoostedTrees.run(GradientBoostedTrees.scala:67)
org.apache.spark.mllib.tree.GradientBoostedTrees$.train(GradientBoostedTrees.scala:135)
org.apache.spark.mllib.api.python.PythonMLLibAPI.trainGradientBoostedTreesModel(PythonMLLibAPI.scala:644)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
py4j.Gateway.invoke(Gateway.java:259)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.GatewayConnection.run(GatewayConnection.java:207)
java.lang.Thread.run(Thread.java:724)
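
For a rough sense of scale, the sketch below estimates how much split-statistics data might be aggregated per tree node in the findBestSplits step that appears in the stack above. It is only a back-of-the-envelope calculation under stated assumptions, not a measurement: it assumes the 40K in the reported 1M*40K shape is the feature count, that maxBins is left at 32, and that each bin holds 3 double-precision values (count, sum, sum of squares). The helper name split_stats_bytes is hypothetical.

```python
def split_stats_bytes(num_features, max_bins, stats_per_bin, bytes_per_value=8):
    """Approximate size in bytes of a dense split-statistics array for ONE tree node."""
    return num_features * max_bins * stats_per_bin * bytes_per_value


NUM_FEATURES = 40000   # assumption: the "40K" in the reported 1M*40K shape
MAX_BINS = 32          # assumption: maxBins left at 32
STATS_PER_BIN = 3      # assumption: count, sum, sum of squares per bin

per_node = split_stats_bytes(NUM_FEATURES, MAX_BINS, STATS_PER_BIN)
print("%.1f MB of statistics per node" % (per_node / 1e6))
```

If these assumptions roughly hold, every node being split carries tens of megabytes of statistics through the aggregation, so a lower maxBins value or fewer features would shrink that figure proportionally.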