Posted to user@spark.apache.org by Alchemist <al...@gmail.com> on 2020/04/18 19:22:11 UTC

Spark stuck at removing broadcast variable

I am running a simple Spark Structured Streaming application that pulls data from a Kafka topic with nearly 1000 partitions. I am running this app on a 6-node EMR cluster with 4 cores and 16 GB RAM. I observed that Spark tries to pull data from all 1024 Kafka partitions, and after running successfully for a few iterations it gets stuck with the following exception:

20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 101
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 66
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 77
20/04/18 00:51:41 INFO ContextCleaner: Cleaned accumulator 78

20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on  in memory (size: 4.5 KB, free: 2.7 GB)
20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on ip- in memory (size: 4.5 KB, free: 2.7 GB)
20/04/18 00:51:41 INFO BlockManagerInfo: Removed broadcast_2_piece0 on ip- in memory (size: 4.5 KB, free: 2.7 GB)

After that, Spark shows the application as RUNNING, but it is NOT processing any data.

Re: Spark stuck at removing broadcast variable

Posted by Waleed Fateem <wa...@gmail.com>.
This might be obvious, but just checking anyway: did you confirm whether
all of the messages have already been consumed by Spark? If that's the
case, then I wouldn't expect much to happen unless new data comes into your
Kafka topic.
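
A rough way to make that check concrete, sketched in Python. The offset layout (topic, then partition id as a string, then offset) mirrors the JSON that the Kafka source reports in a streaming query's progress; the function name remaining_lag, the "events" topic, and the numbers are made up for illustration, and how you obtain the latest offsets from the brokers is left open here.

```python
from typing import Dict

# topic -> partition id (as a string) -> offset
Offsets = Dict[str, Dict[str, int]]


def remaining_lag(consumed: Offsets, latest: Offsets) -> Dict[str, int]:
    """Return, per topic, how many messages have not been consumed yet.

    Assumption: a partition missing from `consumed` is treated as offset 0,
    which overstates the lag if old messages were already expired by Kafka.
    """
    lag = {}
    for topic, partitions in latest.items():
        done = consumed.get(topic, {})
        lag[topic] = sum(
            max(end - done.get(partition, 0), 0)
            for partition, end in partitions.items()
        )
    return lag


# Hypothetical offsets, for illustration only.
consumed = {"events": {"0": 120, "1": 75}}
latest = {"events": {"0": 120, "1": 90, "2": 10}}
print(remaining_lag(consumed, latest))  # {'events': 25}
```

A lag of zero on every topic would suggest the query is simply idle and waiting for new data rather than stuck.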

If you're a hundred percent sure that there's still plenty more data to be
consumed and Spark isn't processing it, then I would suggest generating
Java thread dumps (use the JDK's jstack command) from your driver's process.
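
As a sketch of what that could look like, the Python below runs jstack against a process id and tallies the thread states found in the dump. It assumes a JDK with jstack on the PATH and that you already know the driver's pid; the helper names and the abbreviated sample dump are hand-written for illustration, not taken from the application in this thread.

```python
import re
import subprocess
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: WAITING (parking)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)", re.MULTILINE)


def capture_thread_dump(pid: int) -> str:
    """Run `jstack <pid>` and return its text output (needs a JDK on PATH)."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def count_thread_states(dump: str) -> Counter:
    """Count how many threads in a jstack dump are in each state."""
    return Counter(STATE_RE.findall(dump))


# Hand-written, abbreviated sample; a real dump also contains stack frames.
SAMPLE_DUMP = '''"main" #1 prio=5 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)

"worker" #2 daemon prio=5 runnable
   java.lang.Thread.State: RUNNABLE
'''

print(count_thread_states(SAMPLE_DUMP))
```

Taking a few dumps some seconds apart and comparing them tends to be more informative than a single one: a thread that sits in the same stack frame in every dump is a good candidate for whatever the driver is waiting on.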

On Sat, Apr 18, 2020 at 2:43 PM Sean Owen <sr...@gmail.com> wrote:

> I don't think that means it's stuck on removing something; it was
> removed. Not sure what it is waiting on - more data perhaps?

Re: Spark stuck at removing broadcast variable

Posted by Sean Owen <sr...@gmail.com>.
I don't think that means it's stuck on removing something; it was
removed. Not sure what it is waiting on - more data perhaps?

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org