Posted to user@spark.apache.org by Kevin Jung <it...@samsung.com> on 2014/12/30 04:08:43 UTC

Shuffle write increases in spark 1.2

Hi all,
The shuffle write size shown in the Spark web UI is much different when I
execute the same Spark job on the same input data (100GB) in both Spark 1.1
and Spark 1.2.
At the same sortBy stage, the shuffle write size is 39.7GB in Spark 1.1 but
91.0GB in Spark 1.2.
I set the spark.shuffle.manager option to hash because its default value
changed, but Spark 1.2 still writes larger files than Spark 1.1.
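For reference, the shape of the job and the setting involved look roughly
like this (a minimal sketch with placeholder paths and names, not the exact
code):

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch (illustrative app name and paths). "hash" was the default
// shuffle manager in Spark 1.1; Spark 1.2 changed the default to "sort".
val conf = new SparkConf()
  .setAppName("ShuffleWriteComparison")
  .set("spark.shuffle.manager", "hash")
val sc = new SparkContext(conf)

val input = sc.textFile("hdfs:///path/to/100GB-input")      // placeholder path
val sorted = input.map(line => (line.split('\t')(0), line))
                  .sortBy(_._1)                             // the sortBy stage in question
sorted.saveAsTextFile("hdfs:///path/to/output")             // placeholder path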
Can anyone tell me why this happened?

Thanks
Kevin



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tp20894.html


Re: Shuffle write increases in spark 1.2

Posted by Josh Rosen <ro...@gmail.com>.
If you have a small reproduction for this issue, can you open a ticket at https://issues.apache.org/jira/browse/SPARK?



On December 29, 2014 at 7:10:02 PM, Kevin Jung (itsjb.jung@samsung.com) wrote:

Hi all,
The shuffle write size shown in the Spark web UI is much different when I
execute the same Spark job on the same input data (100GB) in both Spark 1.1
and Spark 1.2.
At the same sortBy stage, the shuffle write size is 39.7GB in Spark 1.1 but
91.0GB in Spark 1.2.
I set the spark.shuffle.manager option to hash because its default value
changed, but Spark 1.2 still writes larger files than Spark 1.1.
Can anyone tell me why this happened?

Thanks
Kevin





Re: Shuffle write increases in spark 1.2

Posted by Xuefeng Wu <be...@gmail.com>.
It looks like it is because of a different snappy version; if you disable compression or switch to lz4, the sizes are no different.
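The settings to try are roughly these (a sketch using the standard 1.x
configuration keys):

import org.apache.spark.SparkConf

// Sketch: rule the codec in or out. "snappy" is the default in Spark 1.x.
val conf = new SparkConf()
  .set("spark.io.compression.codec", "lz4")
// or disable shuffle compression entirely:
// conf.set("spark.shuffle.compress", "false")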

Yours respectfully, Xuefeng Wu (吴雪峰)

> On 10 February 2015, at 6:13 PM, chris <ch...@performance-media.de> wrote:
> 
> Hello,
> 
> I'm experiencing the same thing with my job; here's what I tested:
> 
> * Spark 1.2.0 with Sort-based Shuffle
> * Spark 1.2.0 with Hash-based Shuffle
> * Spark 1.2.1 with Sort-based Shuffle
> 
> All three combinations show the same behaviour, which contrasts with
> Spark 1.1.0. In Spark 1.1.0, my job runs for about an hour; in Spark
> 1.2.x it runs for almost four hours.
> 
> I also opened an issue on this, see
> https://issues.apache.org/jira/browse/SPARK-5715



Re: Shuffle write increases in spark 1.2

Posted by Aaron Davidson <il...@gmail.com>.
I think Xuefeng Wu's suggestion is likely correct. This difference is more
likely explained by the compression library changing versions than by sort-
vs. hash-based shuffle (which should not affect output size significantly).
Others have reported that switching to lz4 fixed their issue.

We should document this if this is the case. I wonder if we're asking
Snappy to be super-low-overhead and as a result the new version does a
better job of it (less overhead, less compression).
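If someone wants to test that theory, a side-by-side run along these lines
would show it (a sketch; I'm assuming the 1.x key names here, including
spark.io.compression.snappy.block.size):

import org.apache.spark.SparkConf

// Sketch: run the identical job under each configuration and compare the
// "Shuffle Write" column in the web UI.
val snappyConf = new SparkConf()
  .set("spark.io.compression.codec", "snappy")
  .set("spark.io.compression.snappy.block.size", "32768")  // assumed 1.x default
val lz4Conf = new SparkConf()
  .set("spark.io.compression.codec", "lz4")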

On Sat, Feb 14, 2015 at 9:32 AM, Peng Cheng <pc...@uow.edu.au> wrote:

> I double-checked the 1.2 feature list and found out that the new sort-based
> shuffle manager has nothing to do with HashPartitioner :-< Sorry for the
> misinformation.
>
> On the other hand, this may explain the increase in shuffle spill as a side
> effect of the new shuffle manager; let me revert spark.shuffle.manager to
> hash and see if it makes things better (or worse, as the benchmark in
> https://issues.apache.org/jira/browse/SPARK-3280 indicates)

Re: Shuffle write increases in spark 1.2

Posted by Peng Cheng <pc...@uow.edu.au>.
I double-checked the 1.2 feature list and found out that the new sort-based
shuffle manager has nothing to do with HashPartitioner :-< Sorry for the
misinformation.

On the other hand, this may explain the increase in shuffle spill as a side
effect of the new shuffle manager; let me revert spark.shuffle.manager to
hash and see if it makes things better (or worse, as the benchmark in
https://issues.apache.org/jira/browse/SPARK-3280 indicates)
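
For the record, the toggle in question is just (a sketch; everything else
stays the same):

import org.apache.spark.SparkConf

// Sketch: "sort" is the 1.2 default; "hash" restores the 1.1 behaviour.
val conf = new SparkConf().set("spark.shuffle.manager", "hash")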



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tp20894p21657.html


Re: Shuffle write increases in spark 1.2

Posted by Peng Cheng <pc...@uow.edu.au>.
Same problem here: shuffle write increased from 10G to over 64G. Since I'm
running on Amazon EC2, this always causes the temporary folder to consume all
the disk space. Still looking for a solution.
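
As a stopgap, the spill directories can at least be pointed at the large
instance-store drives (a sketch; the mount points below are the usual EC2
ones and may differ on your instances):

import org.apache.spark.SparkConf

// Sketch: spread shuffle/spill files across the ephemeral volumes instead
// of the small root disk (illustrative EC2 paths).
val conf = new SparkConf()
  .set("spark.local.dir", "/mnt/spark,/mnt2/spark")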

BTW, the 64G shuffle write is encountered when shuffling a pair RDD with
HashPartitioner, so it's not related to Spark 1.2.0's new features.
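
Concretely, the shuffle that blows up has this shape (an illustrative
sketch, not the actual data):

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair RDD functions in Spark 1.x

// Sketch: a pair RDD shuffled with HashPartitioner, i.e. a shuffle path
// that predates (and should be unaffected by) 1.2's new sort-based manager.
val sc = new SparkContext(new SparkConf().setAppName("hash-partition-demo"))
val pairs = sc.parallelize(1 to 1000000).map(i => (i % 1024, i))
val partitioned = pairs.partitionBy(new HashPartitioner(200))
partitioned.count()  // materializes the shuffle write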

Yours Peng



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tp20894p21656.html


Re: Shuffle write increases in spark 1.2

Posted by chris <ch...@performance-media.de>.
Hello,

as the original message from Kevin Jung never got accepted to the mailing
list, I quote it here completely:


Kevin Jung wrote
> Hi all,
> The shuffle write size shown in the Spark web UI is much different when I
> execute the same Spark job on the same input data (100GB) in both Spark 1.1
> and Spark 1.2.
> At the same sortBy stage, the shuffle write size is 39.7GB in Spark 1.1 but
> 91.0GB in Spark 1.2.
> I set the spark.shuffle.manager option to hash because its default value
> changed, but Spark 1.2 writes larger files than Spark 1.1.
> Can anyone tell me why this happens?
> 
> Thanks
> Kevin

I'm experiencing the same thing with my job; here's what I tested:

* Spark 1.2.0 with Sort-based Shuffle
* Spark 1.2.0 with Hash-based Shuffle
* Spark 1.2.1 with Sort-based Shuffle

All three combinations show the same behaviour, which contrasts with Spark
1.1.0.

In Spark 1.1.0, my job runs for about an hour; in Spark 1.2.x it runs for
almost four hours. The configuration is identical otherwise - I only added
org.apache.spark.scheduler.CompressedMapStatus to the Kryo registrator for
Spark 1.2.0 to cope with https://issues.apache.org/jira/browse/SPARK-5102.
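
For completeness, the workaround looks roughly like this (a sketch; the
class name is illustrative, and Class.forName is used because
CompressedMapStatus is not public):

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Sketch of the SPARK-5102 workaround: register CompressedMapStatus with
// Kryo by name, since the class is private to Spark.
class ShuffleStatusRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(Class.forName("org.apache.spark.scheduler.CompressedMapStatus"))
  }
}
// enabled via spark.kryo.registrator=ShuffleStatusRegistrator (illustrative name)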


As a consequence (I think, but causality might be different) I see lots and
lots of disk spills.
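
The spilling itself is governed by the usual 1.x settings, in case anyone
wants to check whether memory pressure alone explains it (a sketch; the 1.x
defaults are spark.shuffle.memoryFraction=0.2 and spark.shuffle.spill=true):

import org.apache.spark.SparkConf

// Sketch: give the shuffle more memory before ExternalAppendOnlyMap spills.
val conf = new SparkConf()
  .set("spark.shuffle.memoryFraction", "0.4")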

I cannot provide a small test case, but maybe the log entries for a single
worker thread can help someone investigate this. (See below.)


I also opened an issue on this, see
https://issues.apache.org/jira/browse/SPARK-5715

Any help will be greatly appreciated, because otherwise I'm stuck with Spark
1.1.0, as quadrupling runtime is not an option.

Sincerely,

Chris



2015-02-09T14:06:06.328+01:00  INFO  org.apache.spark.executor.Executor  Running task 9.0 in stage 18.0 (TID 300)  Executor task launch worker-18
2015-02-09T14:06:06.351+01:00  INFO  org.apache.spark.CacheManager  Partition rdd_35_9 not found, computing it  Executor task launch worker-18
2015-02-09T14:06:06.351+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Getting 10 non-empty blocks out of 10 blocks  Executor task launch worker-18
2015-02-09T14:06:06.351+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Started 0 remote fetches in 0 ms  Executor task launch worker-18
2015-02-09T14:06:07.396+01:00  INFO  org.apache.spark.storage.MemoryStore  ensureFreeSpace(2582904) called with curMem=300174944, maxMe...  Executor task launch worker-18
2015-02-09T14:06:07.397+01:00  INFO  org.apache.spark.storage.MemoryStore  Block rdd_35_9 stored as bytes in memory (estimated size 2.5...  Executor task launch worker-18
2015-02-09T14:06:07.398+01:00  INFO  org.apache.spark.storage.BlockManagerMaster  Updated info of block rdd_35_9  Executor task launch worker-18
2015-02-09T14:06:07.399+01:00  INFO  org.apache.spark.CacheManager  Partition rdd_38_9 not found, computing it  Executor task launch worker-18
2015-02-09T14:06:07.399+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Getting 10 non-empty blocks out of 10 blocks  Executor task launch worker-18
2015-02-09T14:06:07.400+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Started 0 remote fetches in 0 ms  Executor task launch worker-18
2015-02-09T14:06:07.567+01:00  INFO  org.apache.spark.storage.MemoryStore  ensureFreeSpace(944848) called with curMem=302757848, maxMem...  Executor task launch worker-18
2015-02-09T14:06:07.568+01:00  INFO  org.apache.spark.storage.MemoryStore  Block rdd_38_9 stored as values in memory (estimated size 92...  Executor task launch worker-18
2015-02-09T14:06:07.569+01:00  INFO  org.apache.spark.storage.BlockManagerMaster  Updated info of block rdd_38_9  Executor task launch worker-18
2015-02-09T14:06:07.573+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Getting 34 non-empty blocks out of 50 blocks  Executor task launch worker-18
2015-02-09T14:06:07.573+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Started 0 remote fetches in 1 ms  Executor task launch worker-18
2015-02-09T14:06:38.931+01:00  INFO  org.apache.spark.CacheManager  Partition rdd_41_9 not found, computing it  Executor task launch worker-18
2015-02-09T14:06:38.931+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Getting 3 non-empty blocks out of 10 blocks  Executor task launch worker-18
2015-02-09T14:06:38.931+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Started 0 remote fetches in 0 ms  Executor task launch worker-18
2015-02-09T14:06:38.945+01:00  INFO  org.apache.spark.storage.MemoryStore  ensureFreeSpace(0) called with curMem=307529127, maxMem=9261...  Executor task launch worker-18
2015-02-09T14:06:38.945+01:00  INFO  org.apache.spark.storage.MemoryStore  Block rdd_41_9 stored as bytes in memory (estimated size 0.0...  Executor task launch worker-18
2015-02-09T14:06:38.946+01:00  INFO  org.apache.spark.storage.BlockManagerMaster  Updated info of block rdd_41_9  Executor task launch worker-18
2015-02-09T14:06:38.946+01:00  WARN  org.apache.spark.storage.BlockManager  Block rdd_41_9 replicated to only 0 peer(s) instead of 1 pee...  Executor task launch worker-18
2015-02-09T14:06:39.088+01:00  INFO  org.apache.spark.storage.BlockManager  Found block rdd_3_9 locally  Executor task launch worker-18
2015-02-09T14:06:41.389+01:00  INFO  org.apache.spark.CacheManager  Partition rdd_7_9 not found, computing it  Executor task launch worker-18
2015-02-09T14:06:41.389+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Getting 1 non-empty blocks out of 1 blocks  Executor task launch worker-18
2015-02-09T14:06:41.389+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Started 0 remote fetches in 0 ms  Executor task launch worker-18
2015-02-09T14:06:41.402+01:00  INFO  org.apache.spark.storage.MemoryStore  ensureFreeSpace(38144) called with curMem=307529151, maxMem=...  Executor task launch worker-18
2015-02-09T14:06:41.402+01:00  INFO  org.apache.spark.storage.MemoryStore  Block rdd_7_9 stored as values in memory (estimated size 37....  Executor task launch worker-18
2015-02-09T14:06:41.404+01:00  INFO  org.apache.spark.storage.BlockManagerMaster  Updated info of block rdd_7_9  Executor task launch worker-18
2015-02-09T14:07:00.019+01:00  INFO  org.apache.spark.CacheManager  Partition rdd_73_9 not found, computing it  Executor task launch worker-18
2015-02-09T14:07:00.019+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Getting 1 non-empty blocks out of 1 blocks  Executor task launch worker-18
2015-02-09T14:07:00.019+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Started 0 remote fetches in 0 ms  Executor task launch worker-18
2015-02-09T14:07:00.185+01:00  INFO  org.apache.spark.storage.MemoryStore  ensureFreeSpace(826250) called with curMem=307567295, maxMem...  Executor task launch worker-18
2015-02-09T14:07:00.185+01:00  INFO  org.apache.spark.storage.MemoryStore  Block rdd_73_9 stored as values in memory (estimated size 80...  Executor task launch worker-18
2015-02-09T14:07:00.186+01:00  INFO  org.apache.spark.storage.BlockManagerMaster  Updated info of block rdd_73_9  Executor task launch worker-18
2015-02-09T14:07:00.190+01:00  INFO  org.apache.spark.storage.BlockManager  Found block rdd_35_9 locally  Executor task launch worker-18
2015-02-09T14:07:00.190+01:00  INFO  org.apache.spark.storage.BlockManager  Found block rdd_38_9 locally  Executor task launch worker-18
2015-02-09T14:07:00.194+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Getting 34 non-empty blocks out of 50 blocks  Executor task launch worker-18
2015-02-09T14:07:00.194+01:00  INFO  org.apache.spark.storage.ShuffleBlockFetcherIterator  Started 0 remote fetches in 1 ms  Executor task launch worker-18
2015-02-09T14:07:17.967+01:00  INFO  org.apache.spark.util.collection.ExternalAppendOnlyMap  Thread 117 spilling in-memory map of 670.2 MB to disk (1 tim...  Executor task launch worker-18
2015-02-09T14:07:46.716+01:00  INFO  org.apache.spark.storage.BlockManager  Found block rdd_41_9 locally  Executor task launch worker-18
2015-02-09T14:07:47.603+01:00  INFO  org.apache.spark.storage.BlockManager  Found block rdd_3_9 locally  Executor task launch worker-18
2015-02-09T14:07:47.897+01:00  INFO  org.apache.spark.util.collection.ExternalAppendOnlyMap  Thread 117 spilling in-memory map of 5.0 MB to disk (1 time ...  Executor task launch worker-18
2015-02-09T14:07:48.270+01:00  INFO  org.apache.spark.util.collection.ExternalAppendOnlyMap  Thread 117 spilling in-memory map of 5.0 MB to disk (2 times...  Executor task launch worker-18
2015-02-09T14:07:48.727+01:00  INFO  org.apache.spark.util.collection.ExternalAppendOnlyMap  Thread 117 spilling in-memory map of 5.0 MB to disk (3 times...  Executor task launch worker-18
2015-02-09T14:07:49.021+01:00  INFO  org.apache.spark.util.collection.ExternalAppendOnlyMap  Thread 117 spilling in-memory map of 5.0 MB to disk (4 times...  Executor task launch worker-18



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tp20894p21572.html

