Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/09/16 09:31:45 UTC

[jira] [Commented] (SPARK-10629) Gradient boosted trees: mapPartitions input size increasing

    [ https://issues.apache.org/jira/browse/SPARK-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14747088#comment-14747088 ] 

Sean Owen commented on SPARK-10629:
-----------------------------------

That sounds like the same issue as SPARK-10433; it's not clear that whatever it is only manifests as a constantly increasing size. In any event, you should of course try 1.5+ to see whether that fix resolves this as well. Given this information, I'd personally close this until you are certain it still happens with the latest code, in which case it is something else.

> Gradient boosted trees: mapPartitions input size increasing 
> ------------------------------------------------------------
>
>                 Key: SPARK-10629
>                 URL: https://issues.apache.org/jira/browse/SPARK-10629
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.4.1
>            Reporter: Wenmin Wu
>
> First of all, I think my problem is quite different from https://issues.apache.org/jira/browse/SPARK-10433, which reports that the input size increases at each iteration.
> My problem is that the mapPartitions input size increases within a single iteration. My training samples have 2958359 features in total. Within one iteration, collectAsMap was called 3 times. Here is a summary of the stages involved:
> | Stage Id | Description                             | Duration | Input    | Shuffle Read | Shuffle Write |
> |:--------:|:---------------------------------------:|:--------:|:--------:|:------------:|:-------------:|
> | 4        | mapPartitions at DecisionTree.scala:613 | 1.6 h    | 710.2 MB |              | 2.8 GB        |
> | 5        | collectAsMap at DecisionTree.scala:642  | 1.8 min  |          | 2.8 GB       |               |
> | 6        | mapPartitions at DecisionTree.scala:613 | 1.2 h    | 27.0 GB  |              | 5.6 GB        |
> | 7        | collectAsMap at DecisionTree.scala:642  | 2.0 min  |          | 5.6 GB       |               |
> | 8        | mapPartitions at DecisionTree.scala:613 | 1.2 h    | 26.5 GB  |              | 11.1 GB       |
> | 9        | collectAsMap at DecisionTree.scala:642  | 2.0 min  |          | 8.3 GB       |               |
> The mapPartitions stages take far too long! It's very strange. Could there be a bug?
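To make the growth in the reported table concrete, here is a small Python sketch that does nothing but arithmetic on the figures quoted above (copied by hand; durations in hours, sizes in GB). It sums the time spent in the three mapPartitions stages and computes how much the Shuffle Write column grows from one mapPartitions stage to the next. The variable and function names are illustrative only; nothing here touches Spark itself.

```python
# Figures copied by hand from the SPARK-10629 stage table.
# Only the three mapPartitions stages (DecisionTree.scala:613) are included.
map_stages = [4, 6, 8]                 # stage ids
map_hours = [1.6, 1.2, 1.2]            # Duration column, in hours
shuffle_write_gb = [2.8, 5.6, 11.1]    # Shuffle Write column, in GB


def growth_ratios(values):
    """Return the ratio of each value to the one before it, rounded to 2 places."""
    return [round(b / a, 2) for a, b in zip(values, values[1:])]


# Total wall-clock time spent in mapPartitions during the iteration.
total_hours = round(sum(map_hours), 1)
# Stage-to-stage growth of the shuffle output.
ratios = growth_ratios(shuffle_write_gb)

print(f"mapPartitions time across stages {map_stages}: {total_hours} h")
print(f"shuffle write growth between consecutive stages: {ratios}")
```

With these numbers the script reports 4.0 h of mapPartitions time within the single iteration, and the Shuffle Write column growing by ratios of 2.0 and 1.98 between consecutive mapPartitions stages, i.e. roughly doubling each time.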



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
