Posted to user@spark.apache.org by Bharath Ravi Kumar <re...@gmail.com> on 2014/05/15 13:38:47 UTC

Standalone client failing with docker deployed cluster

Hi,

I'm running the Spark server with a single worker on a laptop using the
docker images. The Spark shell examples run fine with this setup. However,
when a standalone Java client tries to run wordcount on a local file (1 MB
in size), the execution fails with the following error on the stdout of the
worker:

14/05/15 10:31:21 INFO Slf4jLogger: Slf4jLogger started
14/05/15 10:31:21 INFO Remoting: Starting remoting
14/05/15 10:31:22 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkExecutor@worker1:55924]
14/05/15 10:31:22 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sparkExecutor@worker1:55924]
14/05/15 10:31:22 INFO CoarseGrainedExecutorBackend: Connecting to driver:
akka.tcp://spark@R9FX97h.local:56720/user/CoarseGrainedScheduler
14/05/15 10:31:22 INFO WorkerWatcher: Connecting to worker
akka.tcp://sparkWorker@worker1:50040/user/Worker
14/05/15 10:31:22 WARN Remoting: Tried to associate with unreachable remote
address [akka.tcp://spark@R9FX97h.local:56720]. Address is now gated for
60000 ms, all messages to this address will be delivered to dead letters.
14/05/15 10:31:22 ERROR CoarseGrainedExecutorBackend: Driver Disassociated
[akka.tcp://sparkExecutor@worker1:55924] ->
[akka.tcp://spark@R9FX97h.local:56720]
disassociated! Shutting down.

I noticed the following messages on the worker console when I attached
through docker:

14/05/15 11:24:33 INFO Worker: Asked to launch executor
app-20140515112408-0005/7 for billingLogProcessor
14/05/15 11:24:33 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@worker1:50040] ->
[akka.tcp://sparkExecutor@worker1:42437]:
Error [Association failed with [akka.tcp://sparkExecutor@worker1:42437]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@worker1:42437]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: worker1/172.17.0.4:42437
]
14/05/15 11:24:33 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@worker1:50040] ->
[akka.tcp://sparkExecutor@worker1:42437]:
Error [Association failed with [akka.tcp://sparkExecutor@worker1:42437]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@worker1:42437]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: worker1/172.17.0.4:42437
]
14/05/15 11:24:33 INFO ExecutorRunner: Launch command:
"/usr/lib/jvm/java-7-openjdk-amd64/bin/java" "-cp"
":/opt/spark-0.9.0/conf:/opt/spark-0.9.0/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop1.0.4.jar"
"-Xms512M" "-Xmx512M"
"org.apache.spark.executor.CoarseGrainedExecutorBackend"
"akka.tcp://spark@R9FX97h.local:46986/user/CoarseGrainedScheduler" "7"
"worker1" "1" "akka.tcp://sparkWorker@worker1:50040/user/Worker"
"app-20140515112408-0005"
14/05/15 11:24:35 INFO Worker: Executor app-20140515112408-0005/7 finished
with state FAILED message Command exited with code 1 exitStatus 1
14/05/15 11:24:35 INFO LocalActorRef: Message
[akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
Actor[akka://sparkWorker/deadLetters] to
Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40172.17.0.4%3A33648-135#310170905]
was not delivered. [34] dead letters encountered. This logging can be
turned off or adjusted with configuration settings 'akka.log-dead-letters'
and 'akka.log-dead-letters-during-shutdown'.
14/05/15 11:24:35 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@worker1:50040] ->
[akka.tcp://sparkExecutor@worker1:56594]:
Error [Association failed with [akka.tcp://sparkExecutor@worker1:56594]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@worker1:56594]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: worker1/172.17.0.4:56594
]
14/05/15 11:24:35 ERROR EndpointWriter: AssociationError
[akka.tcp://sparkWorker@worker1:50040] ->
[akka.tcp://sparkExecutor@worker1:56594]:
Error [Association failed with [akka.tcp://sparkExecutor@worker1:56594]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkExecutor@worker1:56594]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: worker1/172.17.0.4:56594
]

The significant code snippet from the standalone Java client is as
follows:

JavaSparkContext ctx = new JavaSparkContext(masterAddr, "log_processor",
    sparkHome, jarFileLoc);
JavaRDD<String> rawLog = ctx.textFile("/tmp/some.log");
List<Tuple2<String, Long>> topRecords =
    rawLog.map(fieldSplitter).map(fieldExtractor).top(5, tupleComparator);
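
For reference, here is a minimal, self-contained sketch of what that client
looks like end to end, assuming the Spark 0.9.x Java API seen in the logs
above. The helpers (fieldSplitter, fieldExtractor, the comparator), the class
name, the master URL, the jar path and the spark.driver.host setting are
hypothetical stand-ins of mine, not taken from the original client:

import java.io.Serializable;
import java.util.Comparator;
import java.util.List;

import scala.Tuple2;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class LogProcessor {

    // Hypothetical stand-in for the original fieldSplitter: split a log line into fields.
    static Function<String, String[]> fieldSplitter = new Function<String, String[]>() {
        public String[] call(String line) {
            return line.split("\\s+");
        }
    };

    // Hypothetical stand-in for the original fieldExtractor: keep the first field
    // and a numeric second field (assumed log layout).
    static Function<String[], Tuple2<String, Long>> fieldExtractor =
        new Function<String[], Tuple2<String, Long>>() {
            public Tuple2<String, Long> call(String[] fields) {
                return new Tuple2<String, Long>(fields[0], Long.parseLong(fields[1]));
            }
        };

    // Hypothetical stand-in for the original tupleComparator: order by the numeric field.
    // Must be Serializable since it is shipped to the executors.
    static class TupleComparator implements Comparator<Tuple2<String, Long>>, Serializable {
        public int compare(Tuple2<String, Long> a, Tuple2<String, Long> b) {
            return a._2().compareTo(b._2());
        }
    }

    public static void main(String[] args) {
        // Assumption (not in the original client): advertise an address the worker
        // containers can route to, so executors can dial back to the driver.
        System.setProperty("spark.driver.host", "172.17.42.1"); // hypothetical IP

        JavaSparkContext ctx = new JavaSparkContext(
            "spark://172.17.0.2:7077",    // masterAddr (hypothetical container IP)
            "log_processor",
            "/opt/spark-0.9.0",           // sparkHome inside the image
            "target/log-processor.jar");  // jarFileLoc (hypothetical path)

        JavaRDD<String> rawLog = ctx.textFile("/tmp/some.log");
        List<Tuple2<String, Long>> topRecords = rawLog
            .map(fieldSplitter)
            .map(fieldExtractor)
            .top(5, new TupleComparator());

        System.out.println(topRecords);
        ctx.stop();
    }
}

The spark.driver.host line is only a guess prompted by the "Tried to
associate with unreachable remote address [akka.tcp://spark@R9FX97h.local:56720]"
warning above: the executors inside the containers may not be able to resolve
or route to the laptop's hostname, so advertising an address reachable from
the Docker network is one thing worth checking.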


In contrast, running the sample code provided on GitHub (amplab docker page)
through the Spark shell went through fine, with the following stdout messages:

14/05/15 10:39:41 INFO Slf4jLogger: Slf4jLogger started
14/05/15 10:39:42 INFO Remoting: Starting remoting
14/05/15 10:39:42 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkExecutor@worker1:33203]
14/05/15 10:39:42 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sparkExecutor@worker1:33203]
14/05/15 10:39:42 INFO CoarseGrainedExecutorBackend: Connecting to driver:
akka.tcp://spark@shell18046:45505/user/CoarseGrainedScheduler
14/05/15 10:39:42 INFO WorkerWatcher: Connecting to worker
akka.tcp://sparkWorker@worker1:50040/user/Worker
14/05/15 10:39:42 INFO WorkerWatcher: Successfully connected to
akka.tcp://sparkWorker@worker1:50040/user/Worker
14/05/15 10:39:42 INFO CoarseGrainedExecutorBackend: Successfully
registered with driver
...

The corresponding output seen on the worker was:

14/05/15 11:31:31 INFO Worker: Asked to launch executor
app-20140515113131-0006/0 for Spark shell
14/05/15 11:31:31 INFO ExecutorRunner: Launch command:
"/usr/lib/jvm/java-7-openjdk-amd64/bin/java" "-cp"
":/opt/spark-0.9.0/conf:/opt/spark-0.9.0/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop1.0.4.jar"
"-Xms800M" "-Xmx800M"
"org.apache.spark.executor.CoarseGrainedExecutorBackend"
"akka.tcp://spark@shell16722:52142/user/CoarseGrainedScheduler" "0"
"worker1" "1" "akka.tcp://sparkWorker@worker1:50040/user/Worker"
"app-20140515113131-0006"

Any pointers towards what might be wrong with the standalone client?
Apologies for the lengthy log messages. Thanks in advance.

-Bharath

Re: Standalone client failing with docker deployed cluster

Posted by Bharath Ravi Kumar <re...@gmail.com>.
(Trying to bubble up the issue again...)

Any insights (based on the enclosed logs) into why the standalone client
invocation might fail while jobs issued through the Spark shell
succeeded?

Thanks,
Bharath

