Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/10/08 05:44:15 UTC

[jira] [Resolved] (SPARK-24604) upgrade to spark 2.3.0 makes MPC model training slower

     [ https://issues.apache.org/jira/browse/SPARK-24604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-24604.
----------------------------------
    Resolution: Incomplete

> upgrade to spark 2.3.0 makes MPC model training slower
> ------------------------------------------------------
>
>                 Key: SPARK-24604
>                 URL: https://issues.apache.org/jira/browse/SPARK-24604
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Enrique Molina
>            Priority: Major
>              Labels: bulk-closed
>
> *Not sure whether this is a bug.*
> We have an MPC model on Spark 2.1.0 whose training used to take around 3.5h.
> After upgrading to Spark 2.3.0, training on the same data now takes *14.5h*.
> I checked the upgrade notes and did not find anything that should affect it, except that MPC now gives you probabilities (not sure whether this can make training take that much longer...).
> The interface shows that the jobs are still evenly distributed among the workers, but *treeAggregate* now takes *60 seconds* when it used to take about 11 seconds.
> The training data is in Parquet, and when reading it we repartition to 20 partitions.
> Can this be a bug?
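
As a rough consistency check on the figures quoted above, the following sketch compares the end-to-end slowdown with the per-call treeAggregate slowdown. It assumes (this is not stated in the report) that the number of treeAggregate calls is the same on both versions and that all other work takes the same time as before. Under those assumptions it estimates what share of the old 3.5h run would have had to be spent in treeAggregate for the two measurements to agree. The variable names are illustrative only.

```python
# Figures quoted in the report.
old_total_h, new_total_h = 3.5, 14.5   # end-to-end training time, in hours
old_agg_s, new_agg_s = 11.0, 60.0      # duration of one treeAggregate, in seconds

total_slowdown = new_total_h / old_total_h
agg_slowdown = new_agg_s / old_agg_s

# Assumption (not from the report): same number of treeAggregate calls on both
# versions, and the rest of the run is unchanged. Then, with f the fraction of
# the old run spent in treeAggregate:
#   f * agg_slowdown + (1 - f) = total_slowdown
agg_fraction = (total_slowdown - 1) / (agg_slowdown - 1)

print(f"overall slowdown:       {total_slowdown:.2f}x")
print(f"treeAggregate slowdown: {agg_slowdown:.2f}x")
print(f"implied share of old run spent in treeAggregate: {agg_fraction:.0%}")
```

If the implied share is plausible for this workload, the slower treeAggregate calls alone could account for the regression. If it is not, some other stage must have slowed down as well. Both timings are approximate ("around", "about"), so the result is only indicative.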


