Posted to issues@spark.apache.org by "Hong Shen (JIRA)" <ji...@apache.org> on 2015/01/27 03:49:34 UTC

[jira] [Updated] (SPARK-5421) SparkSql throws OOM at shuffle

     [ https://issues.apache.org/jira/browse/SPARK-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Shen updated SPARK-5421:
-----------------------------
    Description: 
ExternalAppendOnlyMap is only used when the shuffle dependency's aggregator is defined, but Spark SQL's ShuffledRDD does not define an aggregator, so Spark SQL never spills to disk during a shuffle and can very easily throw an OOM there.
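To make the gating concrete, here is a minimal standalone sketch of that behavior. It is not Spark's actual code: the Dep class, the read method, and their signatures are simplified stand-ins, but they mirror the point above, namely that the spillable ExternalAppendOnlyMap path is only taken when the aggregator is defined, and Spark SQL's ShuffledRDD always falls into the non-aggregating, in-memory branch.

import scala.collection.mutable

object ShuffleSpillSketch {

  // Stand-in for ShuffleDependency; only the aggregator field matters here.
  final case class Dep[K, V](aggregator: Option[(V, V) => V])

  // Hypothetical reduce-side read path, modeled on the aggregator.isDefined check.
  def read[K, V](dep: Dep[K, V], fetched: Iterator[(K, V)]): Iterator[(K, V)] =
    dep.aggregator match {
      case Some(mergeValue) =>
        // Aggregating path: in Spark this is an ExternalAppendOnlyMap, which
        // can spill to disk under memory pressure. A plain in-memory map
        // stands in for it in this sketch.
        val combined = mutable.HashMap.empty[K, V]
        fetched.foreach { case (k, v) =>
          combined.update(k, combined.get(k).map(mergeValue(_, v)).getOrElse(v))
        }
        combined.iterator
      case None =>
        // Spark SQL's ShuffledRDD takes this branch: the fetched blocks are
        // processed entirely in memory with no spill path, so a large reduce
        // partition can exhaust the executor heap.
        fetched
    }

  def main(args: Array[String]): Unit = {
    val rows = List(("a", 1), ("b", 2), ("a", 3))
    println(read(Dep[String, Int](Some((a: Int, b: Int) => a + b)), rows.iterator).toList) // combined per key
    println(read(Dep[String, Int](None), rows.iterator).toList)                            // kept in memory as-is
  }
}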
Here is the log from one of the executors; first the stderr:
15/01/27 07:02:19 INFO spark.MapOutputTrackerWorker: Don't have map outputs for shuffle 1, fetching them
15/01/27 07:02:19 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker actor = Actor[akka.tcp://sparkDriver@10.196.128.140:40952/user/MapOutputTracker#1435377484]
15/01/27 07:02:19 INFO spark.MapOutputTrackerWorker: Got the output locations
15/01/27 07:02:19 INFO storage.ShuffleBlockFetcherIterator: Getting 143 non-empty blocks out of 143 blocks
15/01/27 07:02:19 INFO storage.ShuffleBlockFetcherIterator: Started 4 remote fetches in 72 ms
15/01/27 07:47:29 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM

and here is the stdout (the ~4 GB heap is essentially full; repeated Full GCs reclaim almost nothing before -XX:OnOutOfMemoryError kills the executor):
2015-01-27T07:44:43.487+0800: [Full GC 3961343K->3959868K(3961344K), 29.8959290 secs]
2015-01-27T07:45:13.460+0800: [Full GC 3961343K->3959992K(3961344K), 27.9218150 secs]
2015-01-27T07:45:41.407+0800: [GC 3960347K(3961344K), 3.0457450 secs]
2015-01-27T07:45:52.950+0800: [Full GC 3961343K->3960113K(3961344K), 29.3894670 secs]
2015-01-27T07:46:22.393+0800: [Full GC 3961118K->3960240K(3961344K), 28.9879600 secs]
2015-01-27T07:46:51.393+0800: [Full GC 3960240K->3960213K(3961344K), 34.1530900 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p"
#   Executing /bin/sh -c "kill 9050"...
2015-01-27T07:47:25.921+0800: [GC 3960214K(3961344K), 3.3959300 secs]


  was: ExternalAppendOnlyMap is only used when the shuffle dependency's aggregator is defined, but Spark SQL's ShuffledRDD does not define an aggregator, so Spark SQL never spills to disk during a shuffle and can very easily throw an OOM there.


> SparkSql throws OOM at shuffle
> ------------------------------
>
>                 Key: SPARK-5421
>                 URL: https://issues.apache.org/jira/browse/SPARK-5421
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Hong Shen
>


