Posted to user@spark.apache.org by "Charles O. Bajomo" <ch...@pretechconsulting.co.uk> on 2017/03/06 02:37:29 UTC
[Spark Streaming] Streaming job failing consistently after 1h
Hello all,
I have a strange behaviour I can't understand. I have a streaming job using a custom Java receiver that pulls data from a JMS queue; I process the data and then write it to HDFS as Parquet and Avro files. For some reason my job keeps failing after 1 hour and 30 minutes. When it fails I get an error saying the "container is running beyond physical memory limits. Current Usage 4.5GB of 4.5GB physical memory used. 6.4GB of 9.4GB virtual memory used. ". To be honest I don't understand the error. What are the memory limits shown in the error referring to? I allocated 10 executors with 6 cores each and 4G of executor and driver memory. I set the overhead memory to 2.8G, so the values don't add up.
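For reference, the memory settings I described translate to roughly the following submit command (a sketch rather than my exact command; the class and jar names here are made up):

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 10 \
      --executor-cores 6 \
      --executor-memory 4g \
      --driver-memory 4g \
      --conf spark.yarn.executor.memoryOverhead=2867 \
      --class com.example.JmsStreamingJob \
      jms-streaming-job.jar

(spark.yarn.executor.memoryOverhead takes a value in megabytes, so 2867 is roughly 2.8G.) As I understand it, the YARN container limit should be the executor memory plus this overhead, which is why the 4.5GB figure in the error confuses me.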
Does anyone have any idea what the error is referring to? I have increased the memory and it didn't help; it appears that just bought me more time.
Thanks.
Re: [Spark Streaming] Streaming job failing consistently after 1h
Posted by Manish Malhotra <ma...@gmail.com>.
I'm also facing the same problem.
I have implemented a Java-based custom receiver, which consumes from a
messaging system, say JMS.
Once a message is received, I call store(object) ... I'm storing a Spark Row object.
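To make it concrete, the receiver is shaped roughly like this (a minimal
sketch, assuming a hypothetical pollJms() helper in place of the real JMS
consume logic):

    import org.apache.spark.sql.Row;
    import org.apache.spark.storage.StorageLevel;
    import org.apache.spark.streaming.receiver.Receiver;

    public class JmsReceiver extends Receiver<Row> {

        public JmsReceiver() {
            // The storage level decides where received blocks live;
            // MEMORY_AND_DISK_SER_2 lets Spark spill blocks to disk rather
            // than holding everything in receiver memory.
            super(StorageLevel.MEMORY_AND_DISK_SER_2());
        }

        @Override
        public void onStart() {
            // Consume on a separate thread so onStart() returns quickly.
            new Thread(this::receive).start();
        }

        @Override
        public void onStop() {
            // Closing the JMS connection would go here.
        }

        private void receive() {
            while (!isStopped()) {
                Row row = pollJms();  // placeholder for the real JMS consume call
                if (row != null) {
                    store(row);       // hands the Row to Spark's block manager
                }
            }
        }

        private Row pollJms() {
            return null;  // hypothetical helper, not the real code
        }
    }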
It runs for around 8 hrs and then goes OOM, and the OOM is happening on the
receiver nodes.
I also tried to run multiple receivers to distribute the load, but faced
the same issue.
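For clarity, by multiple receivers I mean something along these lines (a
sketch; JmsReceiver is the receiver sketched above, the receiver count is
illustrative, and each receiver instance pins one core on an executor):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.Row;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class MultiReceiverSketch {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("multi-receiver-sketch");
            JavaStreamingContext jssc =
                    new JavaStreamingContext(conf, Durations.seconds(30));

            // One stream per receiver instance to spread the consume load.
            int numReceivers = 4;  // illustrative count
            List<JavaDStream<Row>> streams = new ArrayList<>();
            for (int i = 0; i < numReceivers; i++) {
                streams.add(jssc.receiverStream(new JmsReceiver()));
            }

            // Union the per-receiver streams into one DStream for processing.
            JavaDStream<Row> unioned =
                    jssc.union(streams.get(0), streams.subList(1, streams.size()));
            unioned.count().print();

            jssc.start();
            jssc.awaitTermination();
        }
    }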
We must be doing something fundamentally wrong with whatever tells the
custom receiver/Spark to release the memory,
but I'm not able to crack it, at least not so far.
Any help is appreciated, Spark group!
Regards,
Manish
On Sun, Mar 5, 2017 at 6:37 PM, Charles O. Bajomo <
charles.bajomo@pretechconsulting.co.uk> wrote:
> Hello all,
>
> I have a strange behaviour I can't understand. I have a streaming job
> using a custom Java receiver that pulls data from a JMS queue; I process
> the data and then write it to HDFS as Parquet and Avro files. For some
> reason my job keeps failing after 1 hour and 30 minutes. When it fails I
> get an error saying the "container is running beyond physical memory
> limits. Current Usage 4.5GB of 4.5GB physical memory used. 6.4GB of 9.4GB
> virtual memory used. ". To be honest I don't understand the error. What
> are the memory limits shown in the error referring to? I allocated 10
> executors with 6 cores each and 4G of executor and driver memory. I set
> the overhead memory to 2.8G, so the values don't add up.
>
> Does anyone have any idea what the error is referring to? I have
> increased the memory and it didn't help; it appears that just bought me
> more time.
>
> Thanks.
>