Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/10/08 05:44:15 UTC
[jira] [Resolved] (SPARK-24604) upgrade to spark 2.3.0 makes MPC model training slower
[ https://issues.apache.org/jira/browse/SPARK-24604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-24604.
----------------------------------
Resolution: Incomplete
> upgrade to spark 2.3.0 makes MPC model training slower
> ------------------------------------------------------
>
> Key: SPARK-24604
> URL: https://issues.apache.org/jira/browse/SPARK-24604
> Project: Spark
> Issue Type: Bug
> Components: ML, Spark Core
> Affects Versions: 2.3.0
> Reporter: Enrique Molina
> Priority: Major
> Labels: bulk-closed
>
> *Not sure whether this is a bug or not.*
> We have an MPC (MultilayerPerceptronClassifier) model that we trained with Spark 2.1.0, and training used to take around 3.5h.
> After upgrading to Spark 2.3.0, training on the same data now takes *14.5h*.
> After checking the release notes for the upgrade, I didn't find anything that should affect it, except that MPC now outputs probabilities (not sure whether that alone could make training take so much longer).
> The Spark UI shows that the jobs are still evenly distributed among the workers, but *treeAggregate* now takes *60 seconds* where it used to take about 11 seconds.
> The training data is stored in Parquet, and we repartition it into 20 partitions when reading it.
> Could this be a bug?
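A minimal PySpark sketch for timing the training step described in the report, for example to run the same job under Spark 2.1.0 and 2.3.0 and compare. It assumes "MPC" refers to Spark ML's `MultilayerPerceptronClassifier` and that the Parquet data has the estimator's default `features` (vector) and `label` columns. The function name `time_mlp_training` and its `parquet_path` and `layers` arguments are hypothetical placeholders, not taken from the report.

```python
import time


def time_mlp_training(parquet_path, layers, num_partitions=20, max_iter=100):
    """Train a MultilayerPerceptronClassifier and return (model, seconds).

    Hypothetical helper. Assumes the Parquet data has a vector column
    "features" and a numeric column "label" (the estimator's defaults).
    """
    # Imported lazily so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import MultilayerPerceptronClassifier

    spark = SparkSession.builder.appName("mpc-training-timing").getOrCreate()

    # Mirror the report: read Parquet, then repartition into 20 partitions.
    train = spark.read.parquet(parquet_path).repartition(num_partitions).cache()
    train.count()  # materialize the cache so reading is not timed as training

    # Fixed seed to reduce run-to-run variation between timing runs.
    mpc = MultilayerPerceptronClassifier(layers=layers, maxIter=max_iter, seed=1234)

    start = time.perf_counter()
    model = mpc.fit(train)  # treeAggregate stages run inside this call
    elapsed = time.perf_counter() - start
    return model, elapsed
```

Calling it with the same Parquet path and the same `layers` list under each Spark version gives wall-clock fit times. These can then be read alongside the treeAggregate stage durations shown in the Spark UI.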