Posted to user@spark.apache.org by Ascot Moss <as...@gmail.com> on 2016/07/22 22:52:44 UTC

ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Hi

Please help!

When running the random forest training phase in cluster mode, I got a GC
overhead limit exceeded error.

I used two parameters when submitting the job to the cluster:

--driver-memory 64g \

--executor-memory 8g \

My current settings:

(spark-defaults.conf)

spark.executor.memory           8g

(spark-env.sh)

export SPARK_WORKER_MEMORY=8g

export HADOOP_HEAPSIZE=8000
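
For reference, here is a sketch of how these flags and settings might be combined at submit time; the master URL, class name, and jar name below are placeholders rather than values from this job:

spark-submit \
  --master spark://master-host:7077 \
  --driver-memory 64g \
  --executor-memory 8g \
  --class com.example.TrainRandomForest \
  my-app.jar

Note that --executor-memory 8g on the command line sets the same value as spark.executor.memory 8g in spark-defaults.conf, so the two are redundant here.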


Any idea how to resolve it?

Regards






###  (the error log) ###

16/07/23 04:34:04 WARN TaskSetManager: Lost task 2.0 in stage 6.1 (TID 30,
n1794): java.lang.OutOfMemoryError: GC overhead limit exceeded

        at
scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)

        at
scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:136)

        at
org.apache.spark.util.collection.CompactBuffer.growToSize(CompactBuffer.scala:144)

        at
org.apache.spark.util.collection.CompactBuffer.$plus$plus$eq(CompactBuffer.scala:90)

        at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$1$$anonfun$10.apply(PairRDDFunctions.scala:505)

        at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$1$$anonfun$10.apply(PairRDDFunctions.scala:505)

        at
org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.mergeIfKeyExists(ExternalAppendOnlyMap.scala:318)

        at
org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:365)

        at
org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:265)

        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

        at scala.collection.Iterator$class.foreach(Iterator.scala:727)

        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

        at
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

        at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

        at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)

        at scala.collection.AbstractIterator.to(Iterator.scala:1157)

        at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)

        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

        at
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)

        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

        at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)

        at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)

        at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)

        at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)

        at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

        at org.apache.spark.scheduler.Task.run(Task.scala:89)

        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Posted by Ted Yu <yu...@gmail.com>.
Have you seen the following?
http://stackoverflow.com/questions/27553547/xloggc-not-creating-log-file-if-path-doesnt-exist-for-the-first-time
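
In line with that link, one thing to check is whether the directory in the -Xloggc path actually exists on every executor host; a sketch using an absolute path (the /tmp location is only an example):

--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Xloggc:/tmp/spark_executor_gc.log"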

On Sat, Jul 23, 2016 at 5:18 PM, Ascot Moss <as...@gmail.com> wrote:

> I tried to add -Xloggc:./jvm_gc.log
>
> --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -Xloggc:./jvm_gc.log -XX:+PrintGCDateStamps"
>
> however, I could not find ./jvm_gc.log
>
> How to resolve the OOM and gc log issue?
>
> Regards
>

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Posted by Ascot Moss <as...@gmail.com>.
I tried to add -Xloggc:./jvm_gc.log

--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -Xloggc:./jvm_gc.log -XX:+PrintGCDateStamps"

however, I could not find ./jvm_gc.log

How can I resolve the OOM and the GC log issue?

Regards
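
With the relative path above, any log that is created should end up in each executor's working directory on the worker nodes rather than on the submitting machine; a sketch of where to look under a standalone worker's default work directory (the app and executor IDs are placeholders):

ls $SPARK_HOME/work/<app-id>/<executor-id>/jvm_gc.log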


Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Posted by Ascot Moss <as...@gmail.com>.
My JDK is Java 1.8 u40

On Sun, Jul 24, 2016 at 3:45 AM, Ted Yu <yu...@gmail.com> wrote:

> Since you specified +PrintGCDetails, you should be able to get some more
> detail from the GC log.
>
> Also, which JDK version are you using ?
>
> Please use Java 8 where G1GC is more reliable.
>

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Posted by Ted Yu <yu...@gmail.com>.
Since you specified +PrintGCDetails, you should be able to get some more
detail from the GC log.

Also, which JDK version are you using?

Please use Java 8 where G1GC is more reliable.
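
Since elsewhere in the thread the driver was also suspected, it may be worth capturing a GC log for the driver as well as the executors; a sketch (the /tmp paths are illustrative, and the driver option is most straightforward when the driver itself runs inside the cluster):

--conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/spark_driver_gc.log" \
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/spark_executor_gc.log"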

On Sat, Jul 23, 2016 at 10:38 AM, Ascot Moss <as...@gmail.com> wrote:

> Hi,
>
> I added the following parameter:
>
> --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
> -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
> -XX:InitiatingHeapOccupancyPercent=70 -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps"
>
> Still got Java heap space error.
>
> Any idea to resolve?  (my spark is 1.6.1)
>
>

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Posted by Ascot Moss <as...@gmail.com>.
Hi,

I added the following parameter:

--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC
-XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
-XX:InitiatingHeapOccupancyPercent=70 -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps"

I still got a Java heap space error.

Any idea how to resolve it?  (My Spark version is 1.6.1.)


16/07/23 23:31:50 WARN TaskSetManager: Lost task 1.0 in stage 6.0 (TID 22,
n1791): java.lang.OutOfMemoryError: Java heap space
        at scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:138)

        at
scala.reflect.ManifestFactory$$anon$12.newArray(Manifest.scala:136)

        at
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:248)

        at
org.apache.spark.util.collection.CompactBuffer.toArray(CompactBuffer.scala:30)

        at
org.apache.spark.mllib.tree.DecisionTree$.org$apache$spark$mllib$tree$DecisionTree$$findSplits$1(DecisionTree.scala:1009)
        at
org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)

        at
org.apache.spark.mllib.tree.DecisionTree$$anonfun$29.apply(DecisionTree.scala:1042)

        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

        at
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

        at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

        at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)

        at scala.collection.AbstractIterator.to(Iterator.scala:1157)

        at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)

        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

        at
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)

        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)

        at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)

        at
org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)

        at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)

        at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)

        at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)

        at org.apache.spark.scheduler.Task.run(Task.scala:89)

        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)

        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

Regards
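
The DecisionTree.findSplits frames above build candidate splits from sampled feature values, so memory pressure at that step is often sensitive to maxBins and to the number of continuous features. A Scala sketch of lowering maxBins with the MLlib 1.6 RandomForest API; the training data variable and the other hyperparameter values are placeholders, not values taken from this job:

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.rdd.RDD

// trainingData is a placeholder for however the labeled points are prepared.
def trainForest(trainingData: RDD[LabeledPoint]): RandomForestModel = {
  RandomForest.trainClassifier(
    trainingData,
    2,                  // numClasses (placeholder: binary classification)
    Map[Int, Int](),    // categoricalFeaturesInfo (placeholder: all features continuous)
    100,                // numTrees (placeholder)
    "auto",             // featureSubsetStrategy
    "gini",             // impurity
    10,                 // maxDepth (placeholder)
    32,                 // maxBins: fewer bins means fewer candidate splits held per feature
    12345)              // seed
}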




Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Posted by RK Aduri <rk...@collectivei.com>.
I can see a large number of collect operations happening on the driver, and eventually the driver runs out of memory (I am not sure whether you have persisted any RDD or DataFrame). You may want to avoid doing so many collects, or avoid persisting data you do not need in memory.

To begin with, you may want to re-run the job with the following config: --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" -- this will give you an idea of how you are getting the OOM.
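
As an illustration of that point, a small Scala sketch contrasting collecting a whole RDD on the driver with keeping it distributed; bigRdd is a placeholder name, not something taken from this job:

import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// bigRdd stands in for whatever RDD the job is collecting.
def inspectWithoutCollecting(bigRdd: RDD[Double]): Unit = {
  // val everything = bigRdd.collect()   // pulls the whole dataset onto the driver heap
  val preview = bigRdd.take(100)         // bounded amount of data on the driver
  preview.foreach(println)
  // If the RDD is reused, a storage level that can spill to disk avoids pinning
  // everything on the executor heaps:
  bigRdd.persist(StorageLevel.MEMORY_AND_DISK)
}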




-- 
Collective[i] dramatically improves sales and marketing performance using 
technology, applications and a revolutionary network designed to provide 
next generation analytics and decision-support directly to business users. 
Our goal is to maximize human potential and minimize mistakes. In most 
cases, the results are astounding. We cannot, however, stop emails from 
sometimes being sent to the wrong person. If you are not the intended 
recipient, please notify us by replying to this email's sender and deleting 
it (and any attachments) permanently from your system. If you are, please 
respect the confidentiality of this communication's contents.