Posted to user@spark.apache.org by Nan Zhu <zh...@gmail.com> on 2014/01/15 02:53:20 UTC

master attempted to re-register the worker and then took all workers as unregistered

Hi, all  

I’m trying to deploy Spark in standalone mode. Everything starts up as usual:
the web UI is accessible, and the master’s log shows that all workers registered.

14/01/15 01:37:30 INFO Slf4jEventHandler: Slf4jEventHandler started  
14/01/15 01:37:31 INFO ActorSystemImpl: RemoteServerStarted@akka://sparkMaster@172.31.36.93:7077
14/01/15 01:37:31 INFO Master: Starting Spark master at spark://172.31.36.93:7077
14/01/15 01:37:31 INFO MasterWebUI: Started Master web UI at http://ip-172-31-36-93.us-west-2.compute.internal:8080
14/01/15 01:37:31 INFO Master: I have been elected leader! New state: ALIVE
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-34-61.us-west-2.compute.internal:37914
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-40-28.us-west-2.compute.internal:43055
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-45-211.us-west-2.compute.internal:55355
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-45-211.us-west-2.compute.internal:55355 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-41-251.us-west-2.compute.internal:47709
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-41-251.us-west-2.compute.internal:47709 with 2 cores, 6.3 GB RAM
14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-43-78.us-west-2.compute.internal:36257
14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-43-78.us-west-2.compute.internal:36257 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO ActorSystemImpl: RemoteClientStarted@akka://spark@ip-172-31-37-160.us-west-2.compute.internal:43086




However, when I launched an application, the master first “attempted to re-register the worker” and then warned that all heartbeats were from “unregistered” workers. Can anyone tell me what happened here?

14/01/15 01:38:44 INFO Master: Registering app ALS  
14/01/15 01:38:44 INFO Master: Registered app ALS with ID app-20140115013844-0000
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/0 on worker worker-20140115013734-ip-172-31-43-78.us-west-2.compute.internal-36257
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/1 on worker worker-20140115013734-ip-172-31-40-28.us-west-2.compute.internal-43055
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/2 on worker worker-20140115013734-ip-172-31-34-61.us-west-2.compute.internal-37914
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/3 on worker worker-20140115013734-ip-172-31-45-211.us-west-2.compute.internal-55355
14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/4 on worker worker-20140115013734-ip-172-31-41-251.us-west-2.compute.internal-47709
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-40-28.us-west-2.compute.internal:43055
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-34-61.us-west-2.compute.internal:37914
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-41-251.us-west-2.compute.internal:47709 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-41-251.us-west-2.compute.internal:47709
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-45-211.us-west-2.compute.internal:55355 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-45-211.us-west-2.compute.internal:55355
14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-43-78.us-west-2.compute.internal:36257 with 2 cores, 6.3 GB RAM
14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-43-78.us-west-2.compute.internal:36257
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-34-61.us-west-2.compute.internal-37914
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-45-211.us-west-2.compute.internal-55355
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-40-28.us-west-2.compute.internal-43055
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-43-78.us-west-2.compute.internal-36257
14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-41-251.us-west-2.compute.internal-47709
14/01/15 01:38:50 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-45-211.us-west-2.compute.internal-55355
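The ID mismatch visible in the log above (workers registered at 01:37:34 as worker-20140115013734-*, while the rejected heartbeats carry worker-20140115013844-* IDs) suggests the workers tried to re-register under fresh timestamped IDs, were refused because their address was already taken, and then heartbeated with IDs the master never accepted. A minimal sketch of that failure mode, with hypothetical class and field names rather than Spark's actual code:

```python
class Master:
    """Toy model of the standalone master's worker bookkeeping (names hypothetical)."""

    def __init__(self):
        self.id_to_worker = {}    # worker ID -> record; heartbeats are looked up here
        self.addr_to_worker = {}  # actor address -> record; re-registration is checked here
        self.log = []

    def register_worker(self, worker_id, address):
        if address in self.addr_to_worker:
            # A second registration from a known address is refused.
            self.log.append(f"Attempted to re-register worker at same address: {address}")
            return False
        record = {"id": worker_id, "address": address}
        self.id_to_worker[worker_id] = record
        self.addr_to_worker[address] = record
        return True

    def heartbeat(self, worker_id):
        if worker_id not in self.id_to_worker:
            self.log.append(f"Got heartbeat from unregistered worker {worker_id}")


master = Master()
master.register_worker("worker-20140115013734-hostA", "akka://sparkWorker@hostA")  # accepted
# The worker re-registers under a fresh timestamped ID; the master refuses it
# because the address is already taken ...
master.register_worker("worker-20140115013844-hostA", "akka://sparkWorker@hostA")
# ... but the worker now heartbeats with the NEW ID, which the master never stored.
master.heartbeat("worker-20140115013844-hostA")
print(master.log[-1])  # Got heartbeat from unregistered worker worker-20140115013844-hostA
```

Once the two sides disagree like this, every heartbeat is rejected until the worker is restarted, matching the repeated WARN lines above.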




Thank you very much!

Best,

--  
Nan Zhu


Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Siyuan he <hs...@gmail.com>.
Hi Cheney,
Which mode are you running, YARN or standalone?
I got the same exception when I ran Spark on YARN.


On Tue, May 6, 2014 at 10:06 PM, Cheney Sun <su...@gmail.com> wrote:

> Hi Nan,
>
> In the worker's log, I see the following exception thrown when trying to
> launch an executor. (SPARK_HOME is deliberately mis-specified, so there is
> no such file "/usr/local/spark1/bin/compute-classpath.sh".)
> After the exception was thrown several times, the worker was asked to
> kill the executor. After the kill, the worker tried to register again
> with the master, but the master rejected the registration with the WARN
> message "Got heartbeat from unregistered worker
> worker-20140504140005-host-spark-online001".
>
> Looks like the issue wasn't fixed in 0.9.1. Do you know of any pull request
> addressing this issue? Thanks.
>
> java.io.IOException: Cannot run program
> "/usr/local/spark1/bin/compute-classpath.sh" (in directory "."): error=2,
> No such file or directory
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>         at
> org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:600)
>         at
> org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:58)
>         at
> org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37)
>         at
> org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:104)
>         at
> org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:119)
>         at
> org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:59)
> Caused by: java.io.IOException: error=2, No such file or directory
>         at java.lang.UNIXProcess.forkAndExec(Native Method)
>         at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
>         at java.lang.ProcessImpl.start(ProcessImpl.java:130)
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021)
>         ... 6 more
> ......
> 14/05/04 21:35:45 INFO Worker: Asked to kill executor
> app-20140504213545-0034/18
> 14/05/04 21:35:45 INFO Worker: Executor app-20140504213545-0034/18
> finished with state FAILED message class java.io.IOException: Cannot run
> program "/usr/local/spark1/bin/compute-classpath.sh" (in directory "."):
> error=2, No such file or directory
> 14/05/04 21:35:45 ERROR OneForOneStrategy: key not found:
> app-20140504213545-0034/18
> java.util.NoSuchElementException: key not found: app-20140504213545-0034/18
>         at scala.collection.MapLike$class.default(MapLike.scala:228)
>         at scala.collection.AbstractMap.default(Map.scala:58)
>         at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
>         at
> org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:232)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/05/04 21:35:45 INFO Worker: Starting Spark worker
> host-spark-online001:7078 with 10 cores, 28.0 GB RAM
> 14/05/04 21:35:45 INFO Worker: Spark home: /usr/local/spark-0.9.1-cdh4.2.0
> 14/05/04 21:35:45 INFO WorkerWebUI: Started Worker web UI at
> http://host-spark-online001:8081
> 14/05/04 21:35:45 INFO Worker: Connecting to master
> spark://host-spark-online001:7077...
> 14/05/04 21:35:45 INFO Worker: Successfully registered with master
> spark://host-spark-online001:7077
>
>
>
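The "ERROR OneForOneStrategy: key not found" in the quoted trace is a secondary crash: Worker.scala:232 apparently looks up the executor with a direct map apply, which throws NoSuchElementException when the entry was already removed after the executor FAILED. The pattern, sketched in Python with hypothetical names (Spark's actual fix may differ):

```python
# Pattern behind the "key not found: app-20140504213545-0034/18" crash:
# a direct map lookup throws when the executor entry was already removed.
executors = {}  # fullId -> executor record; empty because the entry was removed
full_id = "app-20140504213545-0034/18"

# Direct lookup (the analogue of Scala's executors(fullId)): raises KeyError,
# Python's counterpart to NoSuchElementException, killing the handler.
crashed = False
try:
    executors[full_id]
except KeyError:
    crashed = True

# Defensive lookup (the analogue of executors.get(fullId)): handle the
# already-removed case instead of crashing the actor.
executor = executors.get(full_id)
if executor is None:
    print(f"Unknown executor {full_id}; ignoring")
```

In an Akka actor, the uncaught exception is what triggers the OneForOneStrategy restart seen immediately after the ERROR line.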

Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Cheney Sun <su...@gmail.com>.
Yes, 0.9.1.


On Tue, Jul 8, 2014 at 10:26 PM, Nan Zhu <zh...@gmail.com> wrote:

>  Hi, Cheney,
>
> Thanks for the information
>
> which version are you using, 0.9.1?
>
> Best,
>
> --
> Nan Zhu
>

Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Nan Zhu <zh...@gmail.com>.
Hi, Cheney,  

Thanks for the information  

which version are you using, 0.9.1?  

Best,  

--  
Nan Zhu




Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Cheney Sun <su...@gmail.com>.
Hi Nan,

The problem is still there, just as I described before. It's said that the
issue has already been addressed in some JIRA and resolved in a newer
version, but I haven't gotten a chance to try it. If you have any findings,
please let me know.

Thanks,
Cheney



Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Nan Zhu <zh...@gmail.com>.
Hey, Cheney,

Does the problem still exist?

Sorry for the delay; I'm starting to look at this issue.

Best,

--
Nan Zhu



Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Nan Zhu <zh...@gmail.com>.
This is a bit different from what I encountered before.

I suspect this is a new bug; I will look into it further.

--  
Nan Zhu




Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Cheney Sun <su...@gmail.com>.
Hi Nan,

In the worker's log, I see the following exception thrown when it tries to
launch an executor. (SPARK_HOME is wrongly specified on purpose, so there is
no such file "/usr/local/spark1/bin/compute-classpath.sh".)
After the exception was thrown several times, the worker was asked to kill
the executor. Following the kill, the worker tried to register with the
master again, but the master rejected the registration with the WARN message
"Got heartbeat from unregistered worker
worker-20140504140005-host-spark-online001"

It looks like the issue wasn't fixed in 0.9.1. Do you know of any pull
request addressing it? Thanks.
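Since the failure starts with a missing launcher script, a quick existence check on each worker machine catches this misconfiguration before launching anything. A minimal sketch (the path is the one from the exception below; the class and method names are illustrative, not part of Spark):

```java
import java.io.File;

public class ClasspathScriptCheck {
    // True when the launcher script the worker needs exists and is executable.
    static boolean scriptOk(String path) {
        File f = new File(path);
        return f.isFile() && f.canExecute();
    }

    public static void main(String[] args) {
        // Path taken from the exception in this thread; adjust to your SPARK_HOME.
        String path = "/usr/local/spark1/bin/compute-classpath.sh";
        if (scriptOk(path)) {
            System.out.println("compute-classpath.sh is present and executable");
        } else {
            System.out.println("check SPARK_HOME on this worker: " + path
                    + " is missing or not executable");
        }
    }
}
```

Running this on every worker host (e.g. via your cluster's ssh loop) would have flagged the bad SPARK_HOME before the executors ever tried to start.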

java.io.IOException: Cannot run program
"/usr/local/spark1/bin/compute-classpath.sh" (in directory "."): error=2,
No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:600)
        at
org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:58)
        at
org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37)
        at
org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:104)
        at
org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:119)
        at
org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:59)
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
        at java.lang.ProcessImpl.start(ProcessImpl.java:130)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021)
        ... 6 more
......
14/05/04 21:35:45 INFO Worker: Asked to kill executor
app-20140504213545-0034/18
14/05/04 21:35:45 INFO Worker: Executor app-20140504213545-0034/18 finished
with state FAILED message class java.io.IOException: Cannot run program
"/usr/local/spark1/bin/compute-classpath.sh" (in directory "."): error=2,
No such file or directory
14/05/04 21:35:45 ERROR OneForOneStrategy: key not found:
app-20140504213545-0034/18
java.util.NoSuchElementException: key not found: app-20140504213545-0034/18
        at scala.collection.MapLike$class.default(MapLike.scala:228)
        at scala.collection.AbstractMap.default(Map.scala:58)
        at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
        at
org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:232)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/05/04 21:35:45 INFO Worker: Starting Spark worker
host-spark-online001:7078 with 10 cores, 28.0 GB RAM
14/05/04 21:35:45 INFO Worker: Spark home: /usr/local/spark-0.9.1-cdh4.2.0
14/05/04 21:35:45 INFO WorkerWebUI: Started Worker web UI at
http://host-spark-online001:8081
14/05/04 21:35:45 INFO Worker: Connecting to master
spark://host-spark-online001:7077...
14/05/04 21:35:45 INFO Worker: Successfully registered with master
spark://host-spark-online001:7077
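The "key not found" crash above happens because Scala's `HashMap.apply` throws `NoSuchElementException` when the key is absent: the kill path already removed the executor from the worker's map, so a later state-change message looks up an id that is gone, the unhandled exception reaches `OneForOneStrategy`, and the actor is restarted (hence the fresh "Starting Spark worker" lines). A small Java sketch of that failure mode and the defensive membership check; `executors` and `strictLookup` are illustrative stand-ins, not Spark's actual code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public class ExecutorMapDemo {
    // Stand-in for the worker's executor registry.
    static final Map<String, String> executors = new HashMap<>();

    // Mimics Scala's HashMap.apply, which throws when the key is absent --
    // the call that produced the "key not found" error in the trace above.
    static String strictLookup(String id) {
        String state = executors.get(id);
        if (state == null) throw new NoSuchElementException("key not found: " + id);
        return state;
    }

    public static void main(String[] args) {
        String id = "app-20140504213545-0034/18";
        executors.put(id, "RUNNING");
        executors.remove(id); // the "Asked to kill executor" path removed it first

        boolean crashed;
        try { strictLookup(id); crashed = false; }
        catch (NoSuchElementException e) { crashed = true; }
        System.out.println("strict lookup crashed: " + crashed);

        // Defensive alternative: check membership before acting on the message,
        // so a late executor-state update does not kill the whole actor.
        if (executors.containsKey(id)) System.out.println("state: " + executors.get(id));
        else System.out.println("unknown executor " + id + ", ignoring");
    }
}
```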

Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Nan Zhu <zh...@gmail.com>.
Ah, I thought this was fixed in 0.9.1?

Was the exception thrown on the worker side?

Best, 

-- 
Nan Zhu


On Sunday, May 4, 2014 at 10:15 PM, Cheney Sun wrote:

> Hi Nan, 
> 
> Have you found a way to fix the issue? Now I run into the same problem with
> version 0.9.1.
> 
> Thanks,
> Cheney
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/master-attempted-to-re-register-the-worker-and-then-took-all-workers-as-unregistered-tp553p5341.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> 



Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Cheney Sun <su...@gmail.com>.
Hi Nan, 

Have you found a way to fix the issue? Now I run into the same problem with
version 0.9.1.

Thanks,
Cheney



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/master-attempted-to-re-register-the-worker-and-then-took-all-workers-as-unregistered-tp553p5341.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Nan Zhu <zh...@gmail.com>.
I found the reason for the weird behaviour.

The executor throws an exception at startup due to a bug in the application
code (I forgot to set an environment variable used by the application code on
every machine).

The master then seems to remove the worker from its list (?), but the worker
keeps sending heartbeats and gets no reply; eventually all workers are
marked dead…

Obviously it should not work this way: buggy application code should not
take down all the workers.

I'm checking the source code to find the cause.
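The behaviour is consistent with the master keeping an id-to-worker map, dropping the worker from it on failure, and then only warning on heartbeats from ids it no longer tracks. A simplified sketch of that bookkeeping, assuming such a registry; `idToWorker` and `onHeartbeat` are illustrative names, not the actual Master code:

```java
import java.util.HashMap;
import java.util.Map;

public class HeartbeatDemo {
    // Illustrative stand-in for the master's worker registry.
    static final Map<String, String> idToWorker = new HashMap<>();

    // Sketch of the heartbeat path: known workers are refreshed; unknown ids
    // only produce the WARN seen in the master log -- nothing re-registers them.
    static String onHeartbeat(String workerId) {
        if (idToWorker.containsKey(workerId)) {
            return "heartbeat accepted: " + workerId;
        }
        return "WARN Got heartbeat from unregistered worker " + workerId;
    }

    public static void main(String[] args) {
        String id = "worker-20140115013844-ip-172-31-34-61";
        idToWorker.put(id, "ip-172-31-34-61:37914");
        System.out.println(onHeartbeat(id));

        // If the master drops the worker after the executor failure, the worker
        // process keeps heartbeating but every beat now hits the WARN branch,
        // so the worker stays "unregistered" until it reconnects from scratch.
        idToWorker.remove(id);
        System.out.println(onHeartbeat(id));
    }
}
```

Under this model, a worker removed by the master can never recover via heartbeats alone, which matches the logs: repeated "Got heartbeat from unregistered worker" with no recovery.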

Best,

--  
Nan Zhu


On Tuesday, January 14, 2014 at 8:53 PM, Nan Zhu wrote:

> Hi, all  
>  
> I’m trying to deploy spark in standalone mode, everything goes as usual,  
>  
> the webUI is accessible, the master node wrote some logs saying all workers are registered
>  
> 14/01/15 01:37:30 INFO Slf4jEventHandler: Slf4jEventHandler started  
> 14/01/15 01:37:31 INFO ActorSystemImpl: RemoteServerStarted@akka://sparkMaster@172.31.36.93:7077
> 14/01/15 01:37:31 INFO Master: Starting Spark master at spark://172.31.36.93:7077
> 14/01/15 01:37:31 INFO MasterWebUI: Started Master web UI at http://ip-172-31-36-93.us-west-2.compute.internal:8080
> 14/01/15 01:37:31 INFO Master: I have been elected leader! New state: ALIVE
> 14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-34-61.us-west-2.compute.internal:37914
> 14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-40-28.us-west-2.compute.internal:43055
> 14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
> 14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-45-211.us-west-2.compute.internal:55355
> 14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
> 14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-45-211.us-west-2.compute.internal:55355 with 2 cores, 6.3 GB RAM
> 14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-41-251.us-west-2.compute.internal:47709
> 14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-41-251.us-west-2.compute.internal:47709 with 2 cores, 6.3 GB RAM
> 14/01/15 01:37:34 INFO ActorSystemImpl: RemoteClientStarted@akka://sparkWorker@ip-172-31-43-78.us-west-2.compute.internal:36257
> 14/01/15 01:37:34 INFO Master: Registering worker ip-172-31-43-78.us-west-2.compute.internal:36257 with 2 cores, 6.3 GB RAM
> 14/01/15 01:38:44 INFO ActorSystemImpl: RemoteClientStarted@akka://spark@ip-172-31-37-160.us-west-2.compute.internal:43086
>  
>  
>  
>  
> However, when I launched an application, the master first “attempted to re-register the worker” and then said that all heartbeats were from “unregistered” workers. Can anyone tell me what happened here?
>  
> 14/01/15 01:38:44 INFO Master: Registering app ALS  
> 14/01/15 01:38:44 INFO Master: Registered app ALS with ID app-20140115013844-0000
> 14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/0 on worker worker-20140115013734-ip-172-31-43-78.us-west-2.compute.internal-36257
> 14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/1 on worker worker-20140115013734-ip-172-31-40-28.us-west-2.compute.internal-43055
> 14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/2 on worker worker-20140115013734-ip-172-31-34-61.us-west-2.compute.internal-37914
> 14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/3 on worker worker-20140115013734-ip-172-31-45-211.us-west-2.compute.internal-55355
> 14/01/15 01:38:44 INFO Master: Launching executor app-20140115013844-0000/4 on worker worker-20140115013734-ip-172-31-41-251.us-west-2.compute.internal-47709
> 14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-40-28.us-west-2.compute.internal:43055 with 2 cores, 6.3 GB RAM
> 14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-40-28.us-west-2.compute.internal:43055
> 14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-34-61.us-west-2.compute.internal:37914 with 2 cores, 6.3 GB RAM
> 14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-34-61.us-west-2.compute.internal:37914
> 14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-41-251.us-west-2.compute.internal:47709 with 2 cores, 6.3 GB RAM
> 14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-41-251.us-west-2.compute.internal:47709
> 14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-45-211.us-west-2.compute.internal:55355 with 2 cores, 6.3 GB RAM
> 14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-45-211.us-west-2.compute.internal:55355
> 14/01/15 01:38:44 INFO Master: Registering worker ip-172-31-43-78.us-west-2.compute.internal:36257 with 2 cores, 6.3 GB RAM
> 14/01/15 01:38:44 INFO Master: Attempted to re-register worker at same address: akka://sparkWorker@ip-172-31-43-78.us-west-2.compute.internal:36257
> 14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-34-61.us-west-2.compute.internal-37914
> 14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-45-211.us-west-2.compute.internal-55355
> 14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-40-28.us-west-2.compute.internal-43055
> 14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-43-78.us-west-2.compute.internal-36257
> 14/01/15 01:38:44 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-41-251.us-west-2.compute.internal-47709
> 14/01/15 01:38:50 WARN Master: Got heartbeat from unregistered worker worker-20140115013844-ip-172-31-45-211.us-west-2.compute.internal-55355
>  
>  
>  
>  
> Thank you very much!
>  
> Best,
>  
> --  
> Nan Zhu
>