Posted to user@spark.apache.org by Nan Zhu <zh...@gmail.com> on 2014/07/08 01:16:30 UTC

Re: master attempted to re-register the worker and then took all workers as unregistered

Hey, Cheney,  

Is the problem still occurring?

Sorry for the delay; I’m starting to look into this issue.

Best,  

--  
Nan Zhu


On Tuesday, May 6, 2014 at 10:06 PM, Cheney Sun wrote:

> Hi Nan,
>  
> In the worker's log, I see the following exception thrown when it tries to launch an executor. (SPARK_HOME is wrongly specified on purpose, so there is no such file "/usr/local/spark1/bin/compute-classpath.sh".)  
> After the exception was thrown several times, the worker was asked to kill the executor. Following the kill, the worker tried to register again with the master, but the master rejected the registration with the WARN message "Got heartbeat from unregistered worker worker-20140504140005-host-spark-online001".
>  
> Looks like the issue wasn't fixed in 0.9.1. Do you know of any pull request addressing this issue? Thanks.
>  
> java.io.IOException: Cannot run program "/usr/local/spark1/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory  
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>         at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:600)
>         at org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:58)
>         at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37)
>         at org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:104)
>         at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:119)
>         at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:59)
> Caused by: java.io.IOException: error=2, No such file or directory
>         at java.lang.UNIXProcess.forkAndExec(Native Method)
>         at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
>         at java.lang.ProcessImpl.start(ProcessImpl.java:130)
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1021)
>         ... 6 more
> ......
> 14/05/04 21:35:45 INFO Worker: Asked to kill executor app-20140504213545-0034/18
> 14/05/04 21:35:45 INFO Worker: Executor app-20140504213545-0034/18 finished with state FAILED message class java.io.IOException: Cannot run program "/usr/local/spark1/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory
> 14/05/04 21:35:45 ERROR OneForOneStrategy: key not found: app-20140504213545-0034/18
> java.util.NoSuchElementException: key not found: app-20140504213545-0034/18
>         at scala.collection.MapLike$class.default(MapLike.scala:228)
>         at scala.collection.AbstractMap.default(Map.scala:58)
>         at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
>         at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:232)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/05/04 21:35:45 INFO Worker: Starting Spark worker host-spark-online001:7078 with 10 cores, 28.0 GB RAM
> 14/05/04 21:35:45 INFO Worker: Spark home: /usr/local/spark-0.9.1-cdh4.2.0
> 14/05/04 21:35:45 INFO WorkerWebUI: Started Worker web UI at http://host-spark-online001:8081
> 14/05/04 21:35:45 INFO Worker: Connecting to master spark://host-spark-online001:7077...
> 14/05/04 21:35:45 INFO Worker: Successfully registered with master spark://host-spark-online001:7077
>  
>  
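The log above suggests a plausible chain of events (an editor's reading of the trace, not a confirmed diagnosis): the executor launch fails, the worker is then asked to kill an executor it has already dropped from its internal executor map, a bare map lookup in the Worker's receive handler throws NoSuchElementException, Akka's OneForOneStrategy restarts the Worker actor, and the restarted worker re-registers while the master still holds stale state, hence the "unregistered worker" heartbeat warnings. A minimal Python model of the failing lookup pattern (the real code is Scala in Worker.scala; the names and the defensive variant here are illustrative):

```python
# Toy model of the worker's executor bookkeeping. In the Scala code this is
# a mutable.HashMap whose apply() throws NoSuchElementException on a missing
# key -- modeled here by a dict raising KeyError.
executors = {}  # fullId -> runner state; the failed executor was already removed

full_id = "app-20140504213545-0034/18"

# Failing pattern: a bare lookup on a missing key raises, which (inside an
# actor's receive) escalates to the supervisor and restarts the whole Worker.
actor_crashed = False
try:
    runner = executors[full_id]
except KeyError:
    actor_crashed = True

# Defensive pattern: look up with a default and ignore unknown executors
# instead of crashing.
runner = executors.get(full_id)
if runner is None:
    print(f"Asked to kill unknown executor {full_id}; ignoring")
```

Whether the actual fix took this shape is not confirmed here; the point is that the NoSuchElementException at Worker.scala:232 followed immediately by a fresh "Starting Spark worker" line is consistent with an actor crash-and-restart rather than a deliberate re-registration.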


Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Cheney Sun <su...@gmail.com>.
Yes, 0.9.1.


On Tue, Jul 8, 2014 at 10:26 PM, Nan Zhu <zh...@gmail.com> wrote:

>  Hi, Cheney,
>
> Thanks for the information
>
> Which version are you using, 0.9.1?
>
> Best,
>
> --
> Nan Zhu
>

Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Nan Zhu <zh...@gmail.com>.
Hi, Cheney,  

Thanks for the information  

Which version are you using, 0.9.1?  

Best,  

--  
Nan Zhu


On Tuesday, July 8, 2014 at 10:09 AM, Cheney Sun wrote:

> Hi Nan,  
>  
> The problem is still there, just as I described before. I've heard that the issue was already addressed in a JIRA ticket and resolved in a newer version, but I haven't had a chance to try it. If you have any findings, please let me know.  
>  
> Thanks,
> Cheney
>  
>  
>  


Re: master attempted to re-register the worker and then took all workers as unregistered

Posted by Cheney Sun <su...@gmail.com>.
Hi Nan,

The problem is still there, just as I described before. I've heard that the
issue was already addressed in a JIRA ticket and resolved in a newer
version, but I haven't had a chance to try it. If you have any findings,
please let me know.

Thanks,
Cheney


On Tue, Jul 8, 2014 at 7:16 AM, Nan Zhu <zh...@gmail.com> wrote:

>  Hey, Cheney,
>
> Is the problem still occurring?
>
> Sorry for the delay; I’m starting to look into this issue.
>
> Best,
>
> --
> Nan Zhu
>