Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/04/19 11:13:00 UTC

[jira] [Commented] (SPARK-27511) Spark Streaming Driver Memory

    [ https://issues.apache.org/jira/browse/SPARK-27511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821847#comment-16821847 ] 

Hyukjin Kwon commented on SPARK-27511:
--------------------------------------

Let's ask questions on the mailing lists rather than filing an issue here. You could get a better answer there than here.

> Spark Streaming Driver Memory
> -----------------------------
>
>                 Key: SPARK-27511
>                 URL: https://issues.apache.org/jira/browse/SPARK-27511
>             Project: Spark
>          Issue Type: Question
>          Components: DStreams
>    Affects Versions: 2.4.0
>            Reporter: Badri Krishnan
>            Priority: Major
>
> Hello Apache Spark Community.
> We are currently facing an issue with one of our Spark Streaming jobs, which consumes data from an IBM MQ queue and runs on an AWS EMR cluster using DStreams and checkpointing (a minimal sketch of this pattern follows after the quoted issue).
> Our Spark Streaming job failed with several containers exiting with error code 143. Checking the container logs, for example, one of the killed containers' stdout logs [1] shows the error below (exit code from container container_1553356041292_0001_15_000004 is 143):
> 2019-03-28 19:32:26,569 ERROR [dispatcher-event-loop-3] org.apache.spark.streaming.receiver.ReceiverSupervisorImpl:Error stopping receiver 2 org.apache.spark.SparkException: Exception thrown in awaitResult:
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
> ....
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Failed to connect to ip-**-***-*.***.***.com/**.**.***.**:*****
> at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
> at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
> at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ... 3 more
> These containers exited with code 143 because they were not able to reach the Application Master (the driver process).
> Amazon mentioned that the Application Master is consuming more memory and recommended that we double it. As the AM runs on the driver, we were asked to increase spark.driver.memory from 1.4G to 3G. The question that remained unanswered is whether increasing the memory will solve the problem or merely delay the failure. As this is an always-running streaming application, do we need to check whether memory usage builds up over a period of time, and are there any properties that need to be set specific to how the AM (Application Master) works for a streaming application? Any inputs on how to track the AM memory usage (a configuration and monitoring sketch follows after the quoted issue)? Any insights will be helpful.
>  
>  
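
For readers unfamiliar with the setup described in the issue, the usual DStreams checkpoint-recovery pattern looks roughly like the sketch below. This is a minimal illustration, not the reporter's actual job: the checkpoint directory, batch interval, and socket source are placeholders, and the custom IBM MQ receiver the job really uses is not shown.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedStreamingJob {
  // Placeholder checkpoint location; on EMR this would typically be an HDFS or S3 path.
  val checkpointDir = "hdfs:///checkpoints/mq-stream"

  // Builds a fresh StreamingContext; only called when no checkpoint exists yet.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("mq-streaming-job")
    val ssc = new StreamingContext(conf, Seconds(30))
    ssc.checkpoint(checkpointDir)

    // A socket source stands in for the custom IBM MQ receiver to keep the sketch self-contained.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover the context (and its driver-side metadata) from the checkpoint if one exists.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}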

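On the driver-memory question itself: if the job runs in YARN cluster mode, the driver runs inside the Application Master container, so the container size is governed by spark.driver.memory plus spark.driver.memoryOverhead, which are normally passed at spark-submit time rather than set in code. One rough, JVM-generic way to see whether driver heap usage builds up over time is to log the driver JVM's own heap numbers periodically, as in the sketch below; this is an illustration, not a Spark API, and the class name and one-minute interval are arbitrary. The Spark UI and YARN's container metrics are also worth watching.

import java.lang.management.ManagementFactory
import java.util.concurrent.{Executors, TimeUnit}

// Illustrative helper: started from the driver (for example at the top of main),
// it prints the driver JVM's heap usage once a minute so any growth over time
// shows up in the driver logs.
object DriverHeapLogger {
  def logHeap(): Unit = {
    val heap = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage
    val usedMb = heap.getUsed / (1024 * 1024)
    val maxMb = heap.getMax / (1024 * 1024)
    println(s"driver heap: used=${usedMb}MB max=${maxMb}MB")
  }

  def start(): Unit = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = logHeap()
    }, 0, 60, TimeUnit.SECONDS)
  }
}

For the configuration change itself, the usual form is something along the lines of spark-submit --deploy-mode cluster --conf spark.driver.memory=3g --conf spark.driver.memoryOverhead=512m ..., with the values chosen to fit the instance types in the cluster.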


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org