Posted to user@spark.apache.org by Dave Jaffe <dj...@vmware.com> on 2016/11/07 22:07:25 UTC

Anomalous Spark RDD persistence behavior

I’ve been studying Spark RDD persistence with spark-perf (https://github.com/databricks/spark-perf), especially when the dataset size starts to exceed available memory.

I’m running Spark 1.6.0 on YARN with CDH 5.7. I have 10 NodeManager nodes, each with 16 vcores and 32 GB of container memory. So I’m running 39 executors, each with 4 cores and 8 GB (6 GB spark.executor.memory and 2 GB spark.yarn.executor.memoryOverhead). I am using the default values for spark.memory.fraction and spark.memory.storageFraction, so I end up with 3.1 GB per executor available for caching RDDs, or a total of about 121 GB across the cluster.
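
For reference, expressed as a SparkConf the setup looks roughly like this (a sketch of my settings, not the spark-perf launcher code; the app name is arbitrary):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of the configuration described above.
    // With the unified memory manager in Spark 1.6, roughly
    //   (executor heap - ~300 MB reserved) * spark.memory.fraction
    // is shared by storage and execution, and spark.memory.storageFraction is
    // the portion of that pool protected from eviction by execution.
    val conf = new SparkConf()
      .setAppName("rdd-persistence-test")
      .set("spark.executor.instances", "39")
      .set("spark.executor.cores", "4")
      .set("spark.executor.memory", "6g")
      .set("spark.yarn.executor.memoryOverhead", "2048")   // in MB
    val sc = new SparkContext(conf)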

I’m running a single Random Forest test, with 500 features and up to 40 million examples, at 1 partition per core, or 156 partitions in total. The code (at https://github.com/databricks/spark-perf/blob/master/mllib-tests/v1p5/src/main/scala/mllib/perf/MLAlgorithmTests.scala#L653) caches the input RDD immediately after creation. At 30M examples this fits entirely in memory, with all 156 partitions cached and a total of 113.4 GB in memory, or 4 blocks of about 745 MB each per executor. So far so good.
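
In simplified form, the pattern is just this (a self-contained sketch, not the actual benchmark code; RandomRDDs stands in for spark-perf’s own data generator, and sc is the SparkContext from the configuration above):

    import org.apache.spark.mllib.random.RandomRDDs
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Stand-in for the benchmark's data generator: up to 40M examples,
    // 500 features, 156 partitions. Labels here are simply derived from the
    // first feature so the sketch stays self-contained.
    val numExamples = 40000000L
    val numFeatures = 500
    val numPartitions = 156

    val examples: RDD[LabeledPoint] =
      RandomRDDs.normalVectorRDD(sc, numExamples, numFeatures, numPartitions, seed = 42L)
        .map(v => LabeledPoint(if (v(0) > 0) 1.0 else 0.0, v))

    examples.cache()   // what the benchmark does immediately after creating the RDD
    examples.count()   // materialize the cached blocks before training starts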

At 40M examples, I expected about 3 partitions per executor to fit in memory, so roughly 75% of the dataset cached. Instead, I found that only 3 partitions across the entire cluster were cached, or about 2%, for a total of 2.9 GB in memory. Three of the executors each held one cached block of 992 MB, with 2.1 GB free (enough for 2 more blocks). The other 36 held no blocks at all, with 3.1 GB free (enough for 3 blocks). Why the dramatic falloff?
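
For what it’s worth, the numbers above match what I can pull from the driver with SparkContext.getRDDStorageInfo, roughly like this:

    // Rough sketch: the same information the Storage tab shows, printed from the driver.
    sc.getRDDStorageInfo.foreach { info =>
      println(s"RDD ${info.id} '${info.name}': " +
        s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
        s"${info.memSize / (1024 * 1024)} MB in memory, " +
        s"${info.diskSize / (1024 * 1024)} MB on disk")
    }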

Thinking this might improve if I changed the persistence to MEMORY_AND_DISK, I tried that next. Unfortunately, the executor memory was then exceeded (“Container killed by YARN for exceeding memory limits. 8.9 GB of 8 GB physical memory used”) and the run ground to a halt. Why does persisting to disk take more memory than caching to memory?
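
Concretely, the only code change for the second run was the storage level on that same input RDD (using the examples name from my sketch above):

    import org.apache.spark.storage.StorageLevel

    // First run used the cache() default:
    //   examples.persist(StorageLevel.MEMORY_ONLY)
    // For the second run, the only change was to spill blocks that don't fit
    // in memory to local disk instead of dropping them:
    examples.persist(StorageLevel.MEMORY_AND_DISK)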

Is this behavior expected as dataset size exceeds available memory?

Thanks in advance,

Dave Jaffe
Big Data Performance
VMware
djaffe@vmware.com



Re: Anomalous Spark RDD persistence behavior

Posted by Dave Jaffe <dj...@vmware.com>.
No, I am not serializing in either case, with memory or with disk.
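
To make the distinction concrete: cache() gives me MEMORY_ONLY and the second run used MEMORY_AND_DISK, both deserialized. A serialized cache, which I am not using, would look like this (examples again being the input RDD from the sketch in my first mail):

    import org.apache.spark.storage.StorageLevel

    // Serialized caching (NOT in use here): stores each partition as a compact
    // byte array, trading extra CPU on access for a smaller memory footprint.
    examples.persist(StorageLevel.MEMORY_AND_DISK_SER)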

Dave Jaffe
VMware
djaffe@vmware.com

From: Shreya Agarwal <sh...@microsoft.com>
Date: Monday, November 7, 2016 at 3:29 PM
To: Dave Jaffe <dj...@vmware.com>, "user@spark.apache.org" <us...@spark.apache.org>
Subject: RE: Anomalous Spark RDD persistence behavior

I don’t think this is correct, unless you are serializing when caching to memory but not when persisting to disk. Can you check?

Also, I have seen the following behavior: if I have a 100 GB in-memory cache and use 60 GB of it to persist something with MEMORY_AND_DISK, and then try to persist another RDD with MEMORY_AND_DISK that is much larger than the remaining 40 GB (let’s say 1 TB), my executors start getting killed at some point. During this period the memory usage climbs above 100 GB, and after some extra usage it fails. It seems like Spark is trying to cache the new RDD in memory and move the old one out to disk, but it cannot move the old one out fast enough and crashes with OOM. Is anyone else seeing that?
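
Roughly the shape of what I am doing, as a simplified sketch (rddA and rddB stand in for my actual datasets, and the sizes are the ones mentioned above):

    import org.apache.spark.storage.StorageLevel

    // rddA: ~60 GB, fits comfortably in the ~100 GB storage pool.
    rddA.persist(StorageLevel.MEMORY_AND_DISK)
    rddA.count()

    // rddB: much larger than the ~40 GB that remains (say 1 TB). I would expect
    // Spark to push rddA's blocks to disk and spill rddB as needed, but instead
    // memory climbs past the pool size and executors start getting killed.
    rddB.persist(StorageLevel.MEMORY_AND_DISK)
    rddB.count()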

From: Dave Jaffe [mailto:djaffe@vmware.com]
Sent: Monday, November 7, 2016 2:07 PM
To: user@spark.apache.org
Subject: Anomalous Spark RDD persistence behavior

I’ve been studying Spark RDD persistence with spark-perf (https://github.com/databricks/spark-perf), especially when the dataset size starts to exceed available memory.

I’m running Spark 1.6.0 on YARN with CDH 5.7. I have 10 NodeManager nodes, each with 16 vcores and 32 GB of container memory. So I’m running 39 executors, each with 4 cores and 8 GB (6 GB spark.executor.memory and 2 GB spark.yarn.executor.memoryOverhead). I am using the default values for spark.memory.fraction and spark.memory.storageFraction, so I end up with 3.1 GB per executor available for caching RDDs, or a total of about 121 GB across the cluster.

I’m running a single Random Forest test, with 500 features and up to 40 million examples, at 1 partition per core, or 156 partitions in total. The code (at https://github.com/databricks/spark-perf/blob/master/mllib-tests/v1p5/src/main/scala/mllib/perf/MLAlgorithmTests.scala#L653) caches the input RDD immediately after creation. At 30M examples this fits entirely in memory, with all 156 partitions cached and a total of 113.4 GB in memory, or 4 blocks of about 745 MB each per executor. So far so good.

At 40M examples, I expected about 3 partitions per executor to fit in memory, so roughly 75% of the dataset cached. Instead, I found that only 3 partitions across the entire cluster were cached, or about 2%, for a total of 2.9 GB in memory. Three of the executors each held one cached block of 992 MB, with 2.1 GB free (enough for 2 more blocks). The other 36 held no blocks at all, with 3.1 GB free (enough for 3 blocks). Why the dramatic falloff?

Thinking this might improve if I changed the persistence to MEMORY_AND_DISK, I tried that next. Unfortunately, the executor memory was then exceeded (“Container killed by YARN for exceeding memory limits. 8.9 GB of 8 GB physical memory used”) and the run ground to a halt. Why does persisting to disk take more memory than caching to memory?

Is this behavior expected as dataset size exceeds available memory?

Thanks in advance,

Dave Jaffe
Big Data Performance
VMware
djaffe@vmware.com



RE: Anomalous Spark RDD persistence behavior

Posted by Shreya Agarwal <sh...@microsoft.com>.
I don’t think this is correct, unless you are serializing when caching to memory but not when persisting to disk. Can you check?

Also, I have seen the following behavior: if I have a 100 GB in-memory cache and use 60 GB of it to persist something with MEMORY_AND_DISK, and then try to persist another RDD with MEMORY_AND_DISK that is much larger than the remaining 40 GB (let’s say 1 TB), my executors start getting killed at some point. During this period the memory usage climbs above 100 GB, and after some extra usage it fails. It seems like Spark is trying to cache the new RDD in memory and move the old one out to disk, but it cannot move the old one out fast enough and crashes with OOM. Is anyone else seeing that?

From: Dave Jaffe [mailto:djaffe@vmware.com]
Sent: Monday, November 7, 2016 2:07 PM
To: user@spark.apache.org
Subject: Anomalous Spark RDD persistence behavior

I’ve been studying Spark RDD persistence with spark-perf (https://github.com/databricks/spark-perf), especially when the dataset size starts to exceed available memory.

I’m running Spark 1.6.0 on YARN with CDH 5.7. I have 10 NodeManager nodes, each with 16 vcores and 32 GB of container memory. So I’m running 39 executors, each with 4 cores and 8 GB (6 GB spark.executor.memory and 2 GB spark.yarn.executor.memoryOverhead). I am using the default values for spark.memory.fraction and spark.memory.storageFraction, so I end up with 3.1 GB per executor available for caching RDDs, or a total of about 121 GB across the cluster.

I’m running a single Random Forest test, with 500 features and up to 40 million examples, at 1 partition per core, or 156 partitions in total. The code (at https://github.com/databricks/spark-perf/blob/master/mllib-tests/v1p5/src/main/scala/mllib/perf/MLAlgorithmTests.scala#L653) caches the input RDD immediately after creation. At 30M examples this fits entirely in memory, with all 156 partitions cached and a total of 113.4 GB in memory, or 4 blocks of about 745 MB each per executor. So far so good.

At 40M examples, I expected about 3 partitions per executor to fit in memory, so roughly 75% of the dataset cached. Instead, I found that only 3 partitions across the entire cluster were cached, or about 2%, for a total of 2.9 GB in memory. Three of the executors each held one cached block of 992 MB, with 2.1 GB free (enough for 2 more blocks). The other 36 held no blocks at all, with 3.1 GB free (enough for 3 blocks). Why the dramatic falloff?

Thinking this might improve if I changed the persistence to MEMORY_AND_DISK, I tried that next. Unfortunately, the executor memory was then exceeded (“Container killed by YARN for exceeding memory limits. 8.9 GB of 8 GB physical memory used”) and the run ground to a halt. Why does persisting to disk take more memory than caching to memory?

Is this behavior expected as dataset size exceeds available memory?

Thanks in advance,

Dave Jaffe
Big Data Performance
VMware
djaffe@vmware.com