Posted to user@spark.apache.org by map reduced <k3...@gmail.com> on 2016/10/24 19:58:53 UTC

Spark Streaming Kafka job stuck in 'processing' stage

Hi,

I have a streaming job that reads from Kafka (1-minute batches) and, after
some processing, POSTs the results to an HTTP endpoint. Every few hours it
gets stuck in the 'processing' stage and starts queueing batches thereafter:

[screenshot: Streaming UI showing one batch stuck in 'processing' and subsequent batches queued]
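
For context, the job is roughly this shape (a simplified sketch; the broker,
topic, and endpoint names below are made up, and I'm using the Kafka 0.8
direct-stream API here):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHttp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHttp")
    val ssc = new StreamingContext(conf, Minutes(1)) // 1-minute batches

    // Hypothetical broker and topic names.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // The POST (including the DNS lookup for the endpoint's hostname)
        // runs inside an "Executor task launch worker" thread.
        records.foreach { case (_, value) => post(value) }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }

  // Hypothetical helper: a plain HttpURLConnection POST to the endpoint.
  def post(body: String): Unit = {
    import java.net.{HttpURLConnection, URL}
    val conn = new URL("http://endpoint.example.com/ingest")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.getOutputStream.write(body.getBytes("UTF-8"))
    conn.getResponseCode // force the request to complete
    conn.disconnect()
  }
}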

After examining the running executors (on the app UI's 'Executors' page), I
found that only 1 out of 6 executors was showing 2 'Active Tasks'.
[screenshot: Executors page showing a single executor with 2 active tasks]

Taking a thread dump on that executor showed 2 threads in the "Executor task
launch worker" thread pool (source
<https://github.com/ueshin/apache-spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L85>).
Both threads were stuck at the same point:

[screenshot: thread dump showing both worker threads blocked in the same native stack trace]
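
(For anyone chasing something similar: the dump can be taken from the 'Thread
Dump' link on the executor's row of the Executors page, or with jstack against
the executor PID on the worker host.)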

This looks like a JDK bug
<http://bugs.java.com/view_bug.do?bug_id=7012768> that should have been fixed
in JDK 7; I verified that I am running '1.8.0_101 (Oracle Corporation)'. I
tried adding the following JVM options (as suggested here
<https://wiki.zimbra.com/wiki/Configuring_for_IPv4>), but it didn't fix the
issue:

-Djava.net.preferIPv4Stack=true -Dnetworkaddress.cache.ttl=60
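
(For completeness: since the hang is on the executors, these options have to
reach the executor JVMs, not just the driver. I believe the usual way with
spark-submit is something like the following, with the rest of the submit
command unchanged:)

spark-submit \
  --conf "spark.executor.extraJavaOptions=-Djava.net.preferIPv4Stack=true -Dnetworkaddress.cache.ttl=60" \
  --conf "spark.driver.extraJavaOptions=-Djava.net.preferIPv4Stack=true -Dnetworkaddress.cache.ttl=60" \
  ...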

Does anyone have any ideas on an approach to debug/fix this?
Thanks,
KP

Re: Spark Streaming Kafka job stuck in 'processing' stage

Posted by map reduced <k3...@gmail.com>.
Found the reason (hopefully); I answered it on my SO question:
https://stackoverflow.com/questions/40225135/spark-streaming-kafka-job-stuck-in-processing-stage

It turned out to be a kernel-level bug
<https://bugzilla.redhat.com/show_bug.cgi?id=1209433> that is resolved in
Linux kernel version 4.0.6; the hosts my workers run on are RHEL with kernel
version 3.5.6. Hopefully it won't be an issue after deploying on newer CentOS
machines with kernel version 4.5.

How I figured it out: every time the job got stuck, it was at
'checkLookupTable' or 'lookupAllHostAddr', both of which are native (JNI)
calls into the underlying OS's name resolution.
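
If anyone wants to check a host for this, here is a minimal standalone sketch
(the 5-second cutoff is arbitrary): it runs the same InetAddress.getAllByName()
lookup on a background thread and reports if it hangs, since the native call
itself cannot be interrupted.

import java.net.InetAddress
import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

object DnsProbe {
  def main(args: Array[String]): Unit = {
    val host = if (args.nonEmpty) args(0) else "example.com" // placeholder default
    val pool = Executors.newSingleThreadExecutor()
    val lookup = pool.submit(new Callable[Array[InetAddress]] {
      override def call(): Array[InetAddress] = InetAddress.getAllByName(host)
    })
    try {
      // getAllByName bottoms out in the native lookupAllHostAddr (JNI) call,
      // which cannot be interrupted; the timeout only lets us observe the hang.
      println(lookup.get(5, TimeUnit.SECONDS).mkString(", "))
    } catch {
      case _: TimeoutException =>
        println(s"Lookup of $host took > 5s; likely blocked in native lookupAllHostAddr")
    } finally {
      pool.shutdownNow()
    }
  }
}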
