Posted to user@spark.apache.org by "Charles O. Bajomo" <ch...@pretechconsulting.co.uk> on 2017/03/06 02:37:29 UTC
[Spark Streaming] Streaming job failing consistently after 1h
Hello all,
I have a strange behaviour I can't understand. I have a streaming job using a custom Java receiver that pulls data from a JMS queue; I process the data and then write it to HDFS as Parquet and Avro files. For some reason my job keeps failing after 1 hour and 30 minutes. When it fails I get an error saying the "container is running beyond physical memory limits. Current Usage 4.5GB of 4.5GB physical memory used. 6.4GB of 9.4GB virtual memory used. ". To be honest I don't understand the error. What are the memory limits shown in the error referring to? I allocated 10 executors with 6 cores each and 4G of executor and driver memory. I set the overhead memory to 2.8G, so the values don't add up.
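For reference, the memory settings I described translate to roughly the following submit command (a sketch rather than my exact command; the class and jar names here are made up):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 10 \
      --executor-cores 6 \
      --executor-memory 4g \
      --driver-memory 4g \
      --conf spark.yarn.executor.memoryOverhead=2867 \
      --class com.example.JmsStreamingJob \
      jms-streaming-job.jar

(spark.yarn.executor.memoryOverhead takes a value in megabytes, so 2867 is roughly 2.8G.) As I understand it, the YARN container limit should be the executor memory plus this overhead, which is why the 4.5GB figure in the error confuses me.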
Does anyone have any idea what the error is referring to? I have increased the memory and it didn't help; it appears that just bought me more time.
Thanks.
Re: [Spark Streaming] Streaming job failing consistently after 1h
Posted by Manish Malhotra <ma...@gmail.com>.
I'm also facing the same problem.
I have implemented a Java-based custom receiver, which consumes from a
messaging system, say JMS.
Once a message is received, I call store(object) ... I'm storing a Spark Row object.
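To make it concrete, the receiver is shaped roughly like this (a minimal
sketch, assuming a hypothetical pollJms() helper in place of the real JMS
consume logic):

    import org.apache.spark.sql.Row;
    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.receiver.Receiver;

    public class JmsReceiver extends Receiver<Row> {

        public JmsReceiver() {
            // The storage level decides where received blocks live;
            // MEMORY_AND_DISK_SER_2 lets Spark spill blocks to disk rather
            // than holding everything in receiver memory.
            super(StorageLevel.MEMORY_AND_DISK_SER_2());
        }

        @Override
        public void onStart() {
            // Consume on a separate thread so onStart() returns quickly.
            new Thread(this::receive).start();
        }

        @Override
        public void onStop() {
            // Closing the JMS connection would go here.
        }

        private void receive() {
            while (!isStopped()) {
                Row row = pollJms();  // placeholder for the real JMS consume call
                if (row != null) {
                    store(row);       // hands the Row to Spark's block manager
                }
            }
        }

        private Row pollJms() {
            return null;  // hypothetical helper, not the real code
        }
    }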
It runs for around 8 hrs and then goes OOM, and the OOM is happening on the
receiver nodes.
I also tried to run multiple receivers to distribute the load, but faced
the same issue.
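For clarity, by multiple receivers I mean something along these lines (a
sketch; JmsReceiver is the receiver sketched above, the receiver count is
illustrative, and each receiver instance pins one core on an executor):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.Row;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class MultiReceiverSketch {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("multi-receiver-sketch");
            JavaStreamingContext jssc =
                    new JavaStreamingContext(conf, Durations.seconds(30));

            // One stream per receiver instance to spread the consume load.
            int numReceivers = 4;  // illustrative count
            List<JavaDStream<Row>> streams = new ArrayList<>();
            for (int i = 0; i < numReceivers; i++) {
                streams.add(jssc.receiverStream(new JmsReceiver()));
            }

            // Union the per-receiver streams into one DStream for processing.
            JavaDStream<Row> unioned =
                    jssc.union(streams.get(0), streams.subList(1, streams.size()));
            unioned.count().print();

            jssc.start();
            jssc.awaitTermination();
        }
    }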
We must be doing something fundamentally wrong with whatever tells the
custom receiver/Spark to release the memory,
but I'm not able to crack it, at least not so far.
Any help is appreciated, Spark group!
Regards,
Manish
On Sun, Mar 5, 2017 at 6:37 PM, Charles O. Bajomo <
charles.bajomo@pretechconsulting.co.uk> wrote:
> Hello all,
>
> I have a strange behaviour I can't understand. I have a streaming job
> using a custom Java receiver that pulls data from a JMS queue; I process
> the data and then write it to HDFS as Parquet and Avro files. For some
> reason my job keeps failing after 1 hour and 30 minutes. When it fails I
> get an error saying the "container is running beyond physical memory
> limits. Current Usage 4.5GB of 4.5GB physical memory used. 6.4GB of 9.4GB
> virtual memory used. ". To be honest I don't understand the error. What
> are the memory limits shown in the error referring to? I allocated 10
> executors with 6 cores each and 4G of executor and driver memory. I set
> the overhead memory to 2.8G, so the values don't add up.
>
> Does anyone have any idea what the error is referring to? I have
> increased the memory and it didn't help; it appears that just bought me
> more time.
>
> Thanks.
>