Posted to user@spark.apache.org by Bharath Mukkati <sp...@gmail.com> on 2015/10/09 03:11:34 UTC

[Spark 1.5] Kinesis receivers not starting

Hi Spark Users,

I am testing my application on Spark 1.5 and kinesis-asl-1.5. The streaming
application starts, but I see a ton of stages scheduled for ReceiverTracker
(submitJob at ReceiverTracker.scala:557 <http://xxx>).
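
For context, a minimal sketch of the kind of setup involved: with the Spark
1.5 kinesis-asl API, one receiver is typically created per shard and the
resulting DStreams are unioned. All names here (app name, stream name,
endpoint, region) are illustrative placeholders, not taken from the actual
application:

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val conf = new SparkConf().setAppName("KinesisTest")
val ssc = new StreamingContext(conf, Seconds(10))

val numShards = 100
// One Kinesis receiver per shard, then union them into a single DStream.
val streams = (0 until numShards).map { _ =>
  KinesisUtils.createStream(
    ssc,
    "KinesisTest",                          // Kinesis application (DynamoDB table) name
    "my-stream",                            // hypothetical stream name
    "kinesis.us-east-1.amazonaws.com",      // hypothetical endpoint
    "us-east-1",                            // hypothetical region
    InitialPositionInStream.LATEST,
    Seconds(10),                            // checkpoint interval
    StorageLevel.MEMORY_AND_DISK_2)
}
val unioned = ssc.union(streams)
```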

In the driver logs I see this sequence repeat:
15/10/09 00:10:54 INFO ReceiverTracker: Starting 100 receivers
15/10/09 00:10:54 INFO ReceiverTracker: ReceiverTracker started

15/10/09 00:10:54 INFO ReceiverTracker: Receiver 0 started
15/10/09 00:10:54 DEBUG ClosureCleaner: +++ Cleaning closure <function1>
(org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9)
+++
15/10/09 00:10:54 DEBUG ClosureCleaner:  + declared fields: 3
15/10/09 00:10:54 DEBUG ClosureCleaner:      public static final long
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.serialVersionUID
15/10/09 00:10:54 DEBUG ClosureCleaner:      private final scala.Option
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.checkpointDirOption$1
15/10/09 00:10:54 DEBUG ClosureCleaner:      private final
org.apache.spark.util.SerializableConfiguration
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.serializableHadoopConf$1
15/10/09 00:10:54 DEBUG ClosureCleaner:  + declared methods: 2
15/10/09 00:10:54 DEBUG ClosureCleaner:      public final java.lang.Object
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(java.lang.Object)
15/10/09 00:10:54 DEBUG ClosureCleaner:      public final void
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(scala.collection.Iterator)
15/10/09 00:10:54 DEBUG ClosureCleaner:  + inner classes: 0
15/10/09 00:10:54 DEBUG ClosureCleaner:  + outer classes: 0
15/10/09 00:10:54 DEBUG ClosureCleaner:  + outer objects: 0
15/10/09 00:10:54 DEBUG ClosureCleaner:  + populating accessed fields
because this is the starting closure
15/10/09 00:10:54 DEBUG ClosureCleaner:  + fields accessed by starting
closure: 0
15/10/09 00:10:54 DEBUG ClosureCleaner:  + there are no enclosing objects!
15/10/09 00:10:54 DEBUG ClosureCleaner:  +++ closure <function1>
(org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9)
is now cleaned +++

...
(and so on for 100 receivers)

And then I start seeing ...
15/10/09 00:11:02 INFO ReceiverTracker: Restarting Receiver 36
.. and so on for the other receivers

After which I see the "Receiver started" logs:
15/10/09 00:11:02 INFO ReceiverTracker: Receiver 20 started
..
Again the Restarting Receiver logs appear

After a while the driver hangs: no new logs appear, although the app seems
to be running. The streaming console shows scheduled stages and jobs.

There are no ERROR logs in the driver. However, I see the following
exceptions in the DEBUG logs:

akka.remote.ShutDownAssociation: Shut down address:
akka.tcp://driverPropsFetcher@ip-<xxx>:57886
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The
remote system terminated the association because it is shutting down.
] from Actor[akka://sparkDriver/deadLetters]
15/10/09 00:10:37 DEBUG
AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1: [actor] received
message AssociationError [akka.tcp://sparkDriver@<xxx>:39053] <-
[akka.tcp://driverPropsFetcher@<xxx>:57886]: Error [Shut down address:
akka.tcp://driverPropsFetcher@<xxx>:57886] [
akka.remote.ShutDownAssociation: Shut down address:
akka.tcp://driverPropsFetcher@<xxx>:57886
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The
remote system terminated the association because it is shutting down.
] from Actor[akka://sparkDriver/deadLetters]

In one of the executor logs I see the following sequence:

application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:37 WARN receiver.ReceiverSupervisorImpl: Skip stopping receiver
because it has not yet stared
application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:37 INFO receiver.BlockGenerator: Stopping BlockGenerator
application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:37 INFO receiver.BlockGenerator: Waiting for block pushing thread to
terminate
application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:37 INFO receiver.BlockGenerator: Pushing out the last 0 blocks
application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:37 INFO receiver.BlockGenerator: Stopped block pushing thread
application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:37 INFO receiver.BlockGenerator: Stopped BlockGenerator
application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:37 INFO receiver.ReceiverSupervisorImpl: Waiting for receiver to be
stopped
application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:37 INFO receiver.ReceiverSupervisorImpl: Stopped receiver without
error
application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:38 INFO receiver.BlockGenerator: Started BlockGenerator
application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:38 INFO receiver.BlockGenerator: Started block pushing thread
application_1444344955519_0001/container_1444344955519_0001_01_000005/stderr:15/10/09
00:45:38 INFO receiver.ReceiverSupervisorImpl: Stopping receiver with
message: Registered unsuccessfully because Driver refused to start receiver
46:


There is no data in the Kinesis stream the app reads from. The stream has
100 shards, and the app starts 100 receivers.
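
For reference, the Spark Streaming guide requires the application to have
more cores than receivers, since each receiver permanently occupies one
executor core; with 100 receivers the app needs at least 101 cores in
total or receivers cannot all be scheduled. A hedged sketch of an
allocation that satisfies this (the executor counts, class name, and jar
name are illustrative, not from the actual application):

```shell
# Illustrative: 26 executors x 4 cores = 104 cores, leaving
# 4 cores for batch processing after 100 receivers are placed.
spark-submit \
  --master yarn \
  --num-executors 26 \
  --executor-cores 4 \
  --class com.example.KinesisTest \
  my-streaming-app.jar
```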

Has anyone else seen this behavior? Any ideas on how to debug the problem
and find the root cause would be very helpful.

Thanks,

Bharath

Re: [Spark 1.5] Kinesis receivers not starting

Posted by Tathagata Das <td...@databricks.com>.
How many executors and cores do you acquire?

td

On Thu, Oct 8, 2015 at 6:11 PM, Bharath Mukkati <sp...@gmail.com>
wrote:

> [quoted text of the original message trimmed]