Posted to dev@mesos.apache.org by Brenden Matthews <br...@airbedandbreakfast.com> on 2013/05/07 20:50:12 UTC

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

You may want to try Airbnb's dist of Mesos:

https://github.com/airbnb/mesos/tree/testing

A good number of these Mesos bugs have been fixed there but aren't yet
merged upstream.
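
For example, something like the following should get you that branch (a
sketch based only on the URL above):

  git clone https://github.com/airbnb/mesos.git
  cd mesos
  git checkout testing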


On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> The log on each slave for the lost task is: No executor found with ID:
> executor_Task_Tracker_XXX.
>
>
>
>
> Wang Yu
>
> From: 王瑜
> Date: 2013-05-07 11:13
> To: mesos-dev
> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
> Hi all,
>
> I have tried adding the file extension when uploading the executor, as
> well as in the conf file, but it still does not work.
>
> And I have seen
> /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> but it is an empty directory.
>
> Are there any other logs I can read to find out why the TASK_LOST
> happened? I really need your help, thanks very much!
>
>
>
>
> Wang Yu
>
> From: Vinod Kone
> Date: 2013-04-26 01:31
> To: mesos-dev@incubator.apache.org
> Cc: wangyu
> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
> Also, you could look at the executor logs (default:
> /tmp/mesos/slaves/....../executors/../runs/latest/) to see why the
>  TASK_LOST happened.
>
>
>
> On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> benjamin.mahler@gmail.com> wrote:
>
> Can you maintain the file extension? That is how Mesos knows to extract it:
> hadoop fs -copyFromLocal
> /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> /user/mesos/mesos-executor.tar.gz
>
> Also make sure your mapred-site.xml has the extension as well.
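>
> For example, the matching property would then look something like this (a
> sketch; the HDFS path mirrors the command above):
>   <property>
>     <name>mapred.mesos.executor</name>
>     <value>hdfs://master/user/mesos/mesos-executor.tar.gz</value>
>   </property>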
>
>
>
> On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>
> > Hi, Ben,
> >
> > I have tried as you said, but it still does not work.
> > I uploaded the mesos-executor using: hadoop fs -copyFromLocal
> > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > /user/mesos/mesos-executor
> > Did I do the right thing? Thanks very much!
> >
> > The log in jobtracker is:
> > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > Task_Tracker_82 on http://slave1:31000
> > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map and reduce
> > slots needed.
> > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update of
> > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker Status
> >       Pending Map Tasks: 2
> >    Pending Reduce Tasks: 1
> >          Idle Map Slots: 0
> >       Idle Reduce Slots: 0
> >      Inactive Map Slots: 6 (launched but no heartbeat yet)
> >   Inactive Reduce Slots: 6 (launched but no heartbeat yet)
> >        Needed Map Slots: 2
> >     Needed Reduce Slots: 1
> > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > Task_Tracker_83 on http://slave1:31000
> > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map and reduce
> > slots needed.
> > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update of
> > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker Status
> >       Pending Map Tasks: 2
> >    Pending Reduce Tasks: 1
> >          Idle Map Slots: 0
> >       Idle Reduce Slots: 0
> >      Inactive Map Slots: 6 (launched but no heartbeat yet)
> >   Inactive Reduce Slots: 6 (launched but no heartbeat yet)
> >        Needed Map Slots: 2
> >     Needed Reduce Slots: 1
> >
> >
> >
> >
> >
> > Wang Yu
> >
> > From: Benjamin Mahler
> > Date: 2013-04-24 07:49
> > To: mesos-dev@incubator.apache.org; wangyu
> > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> > You need to instead upload the hadoop.tar.gz generated by the tutorial.
> > Then point the conf file to the hdfs directory (you had the right idea,
> > just uploaded the wrong file). :)
> >
> > Can you try that and report back?
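> >
> > For example, the upload would look roughly like this (a sketch; the
> > bundle path assumes the Hadoop build directory from the tutorial):
> >   hadoop fs -copyFromLocal build/hadoop.tar.gz /user/mesos/hadoop.tar.gz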
> >
> >
> > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> > > Guodong,
> > >
> > > I am still having problems; I think there is some problem with my
> > > executor setting.
> > >
> > > In mapred-site.xml, I set the following ("master" is the hostname of
> > > the mesos master):
> > >   <property>
> > >     <name>mapred.mesos.executor</name>
> > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > >   </property>
> > >
> > > And I uploaded the mesos-executor to /user/mesos/mesos-executor
> > >
> > > The head of the file is as follows:
> > >
> > > #! /bin/sh
> > >
> > > # mesos-executor - temporary wrapper script for .libs/mesos-executor
> > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > #
> > > # The mesos-executor program cannot be directly executed until all the
> > > libtool
> > > # libraries that it depends on are installed.
> > > #
> > > # This wrapper script should never be moved out of the build directory.
> > > # If it is, it will not operate correctly.
> > >
> > > # Sed substitution that helps us do robust quoting.  It backslashifies
> > > # metacharacters that are still active within double-quoted strings.
> > > Xsed='/bin/sed -e 1s/^X//'
> > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > >
> > > # Be Bourne compatible
> > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
> > >   emulate sh
> > >   NULLCMD=:
> > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
> > >   # is contrary to our usage.  Disable this feature.
> > >   alias -g '${1+"$@"}'='"$@"'
> > >   setopt NO_GLOB_SUBST
> > > else
> > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac
> > > fi
> > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > DUALCASE=1; export DUALCASE # for MKS sh
> > >
> > > # The HP-UX ksh and POSIX shell print the target directory to stdout
> > > # if CDPATH is set.
> > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > >
> > > relink_command="(cd /home/mesos/build/src; { test -z
> > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || { LIBRARY_PATH=;
> export
> > > LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" || unset
> > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; }; }; { test
> -z
> > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || {
> > GCC_EXEC_PREFIX=;
> > > export GCC_EXEC_PREFIX; }; }; { test -z \"\${LD_RUN_PATH+set}\" ||
> unset
> > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
> > >
> >
> LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/;
> > > export LD_LIBRARY_PATH;
> > >
> >
> PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin;
> > > export PATH; g++ -g -g2 -O2 -o \$progdir/\$file
> > > launcher/mesos_executor-executor.o  ./.libs/libmesos.so
> > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread -lcurl -lssl
> > > -lcrypto -lz -lrt -pthread -Wl,-rpath -Wl,/home/mesos/build/src/.libs
> > > -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > ...
> > >
> > >
> > > Did I upload the right file, and set it up correctly in the conf file?
> > > Thanks very much!
> > >
> > >
> > >
> > > Wang Yu
> > >
> > > From: 王国栋
> > > Date: 2013-04-23 13:32
> > > To: wangyu
> > > CC: mesos-dev
> > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> Unknown/exited
> > > TaskTracker: http://slave5:50060
> > > Hmm, it seems that mapred.mesos.master is set correctly.
> > >
> > > If you run Hadoop in local mode, the following setting is fine:
> > >   <property>
> > >     <name>mapred.mesos.master</name>
> > >     <value>local</value>
> > >   </property>
> > >
> > > If you want to start the cluster, set mapred.mesos.master to
> > > mesos-master-hostname:mesos-master-port.
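> > >
> > > For example (a sketch; substitute your real master host and port, 5050
> > > being the Mesos default seen elsewhere in this thread):
> > >   <property>
> > >     <name>mapred.mesos.master</name>
> > >     <value>master:5050</value>
> > >   </property>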
> > >
> > > Make sure DNS resolution for mesos-master-hostname returns the right
> > > IP.
> > >
> > > BTW: when you start the jobtracker, you can check the Mesos web UI to
> > > see whether the Hadoop framework is registered.
> > >
> > > Thanks.
> > >
> > > Guodong
> > >
> > >
> > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > >
> > > > Hi, Guodong,
> > > >
> > > > I started hadoop as you said, and then I saw this error:
> > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from scheduler
> > > driver: Cannot parse
> > > > '@0.0.0.0:0'
> > > >
> > > > What does this mean? Where should I change the MesosScheduler code to
> > > > fix this? Thanks very much! I am so sorry for interrupting you once
> > > > again...
> > > >
> > > > The whole log is as follows:
> > > >
> > > >  [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > /************************************************************
> > > > STARTUP_MSG: Starting JobTracker
> > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > STARTUP_MSG:   args = []
> > > > STARTUP_MSG:   version = 0.20.205.0
> > > >
> > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat Apr 13
> 11:19:33
> > > CST 2013
> > > > ************************************************************/
> > > >
> > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded properties from
> > > hadoop-metrics2.properties
> > > >
> > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > MetricsSystem,sub=Stats registered.
> > > >
> > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled snapshot
> > period
> > > at 10 second(s).
> > > >
> > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker metrics
> > system
> > > started
> > > >
> > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > QueueMetrics,q=default registered.
> > > >
> > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> ugi
> > > registered.
> > > >
> > > > 13/04/23 13:21:04 INFO
> delegation.AbstractDelegationTokenSecretManager:
> > > Updating the current master key for generating delegation tokens
> > > >
> > > > 13/04/23 13:21:04 INFO
> delegation.AbstractDelegationTokenSecretManager:
> > > Starting expired delegation token remover thread,
> > > tokenRemoverScanInterval=60 min(s)
> > > >
> > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler configured with
> > > (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks,
> > > limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > >
> > > > 13/04/23 13:21:04 INFO
> delegation.AbstractDelegationTokenSecretManager:
> > > Updating the current master key for generating delegation tokens
> > > >
> > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing hosts
> > > (include/exclude) list
> > > >
> > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting jobtracker with
> > owner
> > > as root
> > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > >
> > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > RpcDetailedActivityForPort9001 registered.
> > > >
> > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > RpcActivityForPort9001 registered.
> > > >
> > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to
> > > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> > > org.mortbay.log.Slf4jLog
> > > >
> > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global filtersafety
> > > (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > >
> > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by
> > > webServer.getConnectors()[0].getLocalPort() before open() is -1.
> Opening
> > > the listener on 50030
> > > >
> > > > 13/04/23 13:21:05 INFO http.HttpServer: listener.getLocalPort()
> > returned
> > > 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
> > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to port 50030
> > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > 13/04/23 13:21:05 INFO mortbay.log: Started
> > > > SelectChannelConnector@0.0.0.0:50030
> > > >
> > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source
> jvm
> > > registered.
> > > >
> > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source
> > > JobTrackerMetrics registered.
> > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001
> > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver: 50030
> > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system
> > > directory
> > > >
> > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being
> > > initialized in embedded mode
> > > >
> > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job history
> > > server at: localhost:50030
> > > >
> > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web
> > > address: localhost:50030
> > > >
> > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed job
> > > store is inactive
> > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting MesosScheduler
> > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts
> information
> > > >
> > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from scheduler
> > > driver: Cannot parse '@
> > > > 0.0.0.0:0'
> > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the includes
> file
> > to
> > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the excludes
> file
> > to
> > > >
> > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts
> > > (include/exclude) list
> > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 nodes
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: starting
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001:
> > starting
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001:
> > starting
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on 9001:
> > starting
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on 9001:
> > starting
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on 9001:
> > starting
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on 9001:
> > starting
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on 9001:
> > starting
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on 9001:
> > starting
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on 9001:
> > starting
> > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on 9001:
> > starting
> > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on 9001:
> > starting
> > > >
> > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to load
> > > native-hadoop library for your platform... using builtin-java classes
> > where
> > > applicable
> > > >
> > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: job_201304231321_0001:
> > > nMaps=0 nReduces=0 max=-1
> > > >
> > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job
> > > job_201304231321_0001
> > > >
> > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job job_201304231321_0001
> > > added successfully for user 'root' to queue 'default'
> > > >
> > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root
>  IP=192.168.0.2
> > >  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001    RESULT=SUCCESS
> > > >
> > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing
> > > job_201304231321_0001
> > > >
> > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing
> > > job_201304231321_0001
> > > >
> > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken generated and
> > > stored with users keys in
> > > /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > >
> > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size for job
> > > job_201304231321_0001 = 0. Number of splits = 0
> > > >
> > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job
> job_201304231321_0001
> > > initialized successfully with 0 map tasks and 0 reduce tasks.
> > > >
> > > > ------------------------------
> > > > Wang Yu
> > > >
> > > > From: 王国栋 <wa...@gmail.com>
> > > > Date: 2013-04-23 11:34
> > > > To: mesos-dev <me...@incubator.apache.org>; wangyu <wangyu@nfs.iscas.ac.cn>
> > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > Unknown/exited TaskTracker: http://slave5:50060
> > > >  Hi Yu,
> > > >
> > > > Mesos will just launch a TaskTracker on each slave node as long as
> > > > there is enough resource for the TaskTracker. So you have to run the
> > > > NameNode, JobTracker, and DataNode on your own.
> > > >
> > > > Basically, starting Hadoop on Mesos works like this (see the sketch
> > > > after these steps):
> > > > 1. Start the DFS with hadoop/bin/start-dfs.sh (you should configure
> > > > core-site.xml and hdfs-site.xml). The DFS is no different from the
> > > > normal one.
> > > > 2. Start the JobTracker with hadoop/bin/hadoop jobtracker (you should
> > > > configure mapred-site.xml; this JobTracker should contain the patch
> > > > for Mesos).
> > > >
> > > > Then you can use the Mesos web UI and the JobTracker web UI to check
> > > > the status of the JobTracker.
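> > > >
> > > > In shell form, the two steps above are roughly (a sketch; run from the
> > > > Mesos-patched Hadoop build directory used in this thread):
> > > >   # 1. start HDFS (NameNode and DataNodes)
> > > >   ./bin/start-dfs.sh
> > > >   # 2. start the Mesos-patched JobTracker in the foreground
> > > >   ./bin/hadoop jobtracker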
> > > >
> > > >  Guodong
> > > >
> > > >
> > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > >
> > > >> Oh, yes, I started my hadoop using "start-all.sh". Now I know what my
> > > >> problem is. Thanks very much!
> > > >>
> > > >> PS: Besides the TaskTracker, are there any other roles (like JobTracker,
> > > >> DataNode) I should stop first?
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Wang Yu
> > > >>
> > > >> From: Benjamin Mahler
> > > >> Date: 2013-04-23 10:56
> > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > >> TaskTracker: http://slave5:50060
> > > >> The scheduler we wrote for Hadoop will start its own TaskTrackers,
> > > >> meaning you do not have to start any TaskTrackers yourself.
> > > >>
> > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers
> > > running
> > > >> in your cluster?
> > > >>
> > > >> Looking at your jps output, is there already a TaskTracker running?
> > > >> [root@master logs]# jps
> > > >> 13896 RunJar
> > > >> 14123 Jps
> > > >> 12718 NameNode
> > > >> 12900 DataNode
> > > >> 13374 TaskTracker  <--- How was this started?
> > > >> 13218 JobTracker
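> > > >>
> > > >> If it was started by start-all.sh, one way to stop it by hand is
> > > >> roughly (a sketch, assuming the stock Hadoop daemon scripts):
> > > >>   bin/hadoop-daemon.sh stop tasktracker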
> > > >>
> > > >>
> > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > >>
> > > >> > Hi, Ben and Guodong,
> > > >> >
> > > >> > What do you mean by "managing your own TaskTrackers"? How should I
> > > >> > know whether I am managing my own TaskTrackers? Sorry, I am not very
> > > >> > familiar with Mesos.
> > > >> > Does it mean I do not need to configure hdfs-site.xml and
> > > >> > core-site.xml in hadoop? I do not want to run my own TaskTracker; I
> > > >> > just want to set up hadoop on mesos and run my MR tasks.
> > > >> >
> > > >> > Thanks very much for your patient reply... Maybe I have a long way
> > > >> > to go...
> > > >> >
> > > >> >
> > > >> >
> > > >> > The log messages you see:
> > > >> > 2013-04-18 16:47:19,645 INFO
> > org.apache.hadoop.mapred.MesosScheduler:
> > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > >> >
> > > >> > are printed when Mesos does not know about the TaskTracker. We
> > > >> > currently don't support running your own TaskTrackers, as the
> > > >> > MesosScheduler will launch them on your behalf when needed.
> > > >> >
> > > >> > Are you managing your own TaskTrackers? The purpose of using Hadoop
> > > >> > with Mesos is that you no longer have to do that. We will detect that
> > > >> > jobs have pending map / reduce tasks and launch TaskTrackers
> > > >> > accordingly.
> > > >> >
> > > >> > Guodong may be able to help further getting set up!
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > Wang Yu
> > > >> >
> > > >> > From: 王国栋
> > > >> > Date: 2013-04-18 17:10
> > > >> > To: mesos-dev; wangyu
> > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler:
> Unknown/exited
> > > >> > TaskTracker: http://slave5:50060
> > > >> > You can check the slave log and the mesos-executor log, which is
> > > >> > normally located in a dir like
> > > >> > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > >> > That log is the TaskTracker log.
> > > >> >
> > > >> > I hope it will help.
> > > >> >
> > > >> > Guodong
> > > >> >
> > > >> >
> > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
> wrote:
> > > >> >
> > > >> > > Hi All,
> > > >> > >
> > > >> > > I have deployed mesos on three nodes: master, slave1, and slave5,
> > > >> > > and it works well.
> > > >> > > Then I set up hadoop on top of it, using master as the namenode,
> > > >> > > and master, slave1, and slave5 as datanodes. When I run 'jps', it
> > > >> > > looks like it works well.
> > > >> > >  [root@master logs]# jps
> > > >> > > 13896 RunJar
> > > >> > > 14123 Jps
> > > >> > > 12718 NameNode
> > > >> > > 12900 DataNode
> > > >> > > 13374 TaskTracker
> > > >> > > 13218 JobTracker
> > > >> > >
> > > >> > > Then I ran a test benchmark, and it could not keep working...
> > > >> > >  [root@master
> > > >> > >  hadoop-0.20.205.0]# bin/hadoop jar
> hadoop-examples-0.20.205.0.jar
> > > >> > randomwriter -Dtest.randomwrite.bytes_per_map=6710886
> > > >> > -Dtest.randomwriter.maps_per_host=10 rand
> > > >> > > Running 30 maps.
> > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running job:
> > > >> > job_201304181646_0001
> > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0% reduce 0%
> > > >> > > It stopped here.
> > > >> > >
> > > >> > > Then I read the log file: hadoop-root-jobtracker-master.log, it
> > > shows:
> > > >> > > 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:47:04,609 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:47:07,618 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:47:10,625 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:47:13,632 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave5
> > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_slave5:127.0.0.1/127.0.0.1:60621 to host slave5
> > > >> > > 2013-04-18 16:47:13,687 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > >> > > 2013-04-18 16:47:16,638 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:47:16,697 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > >> > > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:47:19,707 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > >> > > 2013-04-18 16:47:22,651 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:47:22,715 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > >> > > 2013-04-18 16:47:25,658 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > > 2013-04-18 16:47:25,725 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > >> > > 2013-04-18 16:47:28,665 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > >> > >
> > > >> > > Can anybody help me? Thanks very much!
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by Vinod Kone <vi...@gmail.com>.
On Sun, May 12, 2013 at 7:42 PM, Vinod Kone <vi...@twitter.com> wrote:

>   <property>
>>     <name>mapred.mesos.executor</name>
>> #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
>>     <value>hdfs://master/user/mesos/mesos-executor</value>
>>   </property>
>>
>
> The mapred.mesos.executor property looks incorrect. The value should be
> where you have uploaded the "hadoop.tar.gz" bundle generated by the
> tutorial (TUTORIAL.sh or make hadoop). You can find the generated
> "hadoop.tar.gz" bundle in the hadoop build directory. Upload the bundle to
> an hdfs location and set the above property to that location.
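>
> For example, with the bundle uploaded, the property would look roughly
> like this (a sketch; the exact HDFS path is whatever you chose):
>   <property>
>     <name>mapred.mesos.executor</name>
>     <value>hdfs://master/user/mesos/hadoop.tar.gz</value>
>   </property>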
>
> vinod
>
>
>
>> >
>> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
>> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
>> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
>> > total 0
>> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
>> > .  ..
>> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
>> > 2. I added "--isolation=cgroups" for the slaves, but it still does not
>> > work. Tasks are always lost. But there are no errors any more, and I
>> > still do not know what happened to the executor... The logs on one slave
>> > are as follows. Please help me, thanks very much!
>> >
>> > mesos-slave.INFO
>> > Log file created at: 2013/05/13 09:12:54
>> > Running on machine: slave1
>> > Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>> > I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
>> > I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by
>> > root
>> > I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
>> > I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@
>> > 192.168.0.3:36668
>> > I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24;
>> > mem=63356; ports=[31000-32000]; disk=29143
>> > I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as
>> > cgroups hierarchy root
>> > I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at
>> > master@192.168.0.2:5050
>> > I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file
>> > '/home/mesos/build/logs/mesos-slave.INFO'
>> > I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master
>> > detected at master@192.168.0.2:5050
>> > I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering
>> isolator
>> > I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
>> > I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given
>> > slave ID 201305130913-33597632-5050-3893-0
>> > I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling
>> > '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
>> > I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling
>> > '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
>> > I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling
>> > '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
>> > I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling
>> > '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
>> > I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling
>> > '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
>> > I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling
>> > '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
>> > I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling
>> > '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
>> > I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%.
>> Max
>> > allowed age: 5.11days
>> > I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%.
>> Max
>> > allowed age: 5.11days
>> > I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%.
>> Max
>> > allowed age: 5.11days
>> > I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task
>> > Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
>> > I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
>> > I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching
>> > executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in
>> >
>> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495
>> > with resources cpus=1; mem=1280 for framework
>> > 201305130913-33597632-5050-3893-0000 in cgroup
>> >
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>> > I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup
>> > controls for executor executor_Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
>> > I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated
>> 'cpu.shares'
>> > to 1024 for executor executor_Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
>> > 'memory.limit_in_bytes' to 1342177280 for executor
>> executor_Task_Tracker_0
>> > of framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
>> > 'memory.limit_in_bytes' to 1342177280 for executor
>> executor_Task_Tracker_0
>> > of framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening
>> > for OOM events for executor executor_Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor
>> at =
>> > 24552
>> > I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task
>> > Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
>> > I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
>> > I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching
>> > executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in
>> >
>> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>> > with resources cpus=1; mem=1280 for framework
>> > 201305130913-33597632-5050-3893-0000 in cgroup
>> >
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>> > I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup
>> > controls for executor executor_Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
>> > I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated
>> 'cpu.shares'
>> > to 1024 for executor executor_Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated
>> > 'memory.limit_in_bytes' to 1342177280 for executor
>> executor_Task_Tracker_1
>> > of framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening
>> > for OOM events for executor executor_Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor
>> at =
>> > 24628
>> > I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor
>> > executor_Task_Tracker_0 of framework
>> 201305130913-33597632-5050-3893-0000
>> > terminated with status 256
>> > I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor
>> > executor_Task_Tracker_0 of framework
>> 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is
>> > triggered for executor executor_Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000 with uuid
>> > 6522748a-9d43-41b7-8f88-cd537a502495
>> > I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM
>> > notifier for executor executor_Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000 with uuid
>> > 6522748a-9d43-41b7-8f88-cd537a502495
>> > I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>> > I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>> > after 1 attempts
>> > I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>> > I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>> > I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully
>> > destroyed cgroup
>> >
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>> > I0513 09:16:34.477439 24190 slave.cpp:1479] Executor
>> > 'executor_Task_Tracker_0' of framework
>> 201305130913-33597632-5050-3893-0000
>> > has exited with status 1
>> > I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update
>> > TASK_LOST from task Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update
>> > TASK_LOST from task Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000 to the status update manager
>> > I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update
>> > resources for an unknown/killed executor
>> > I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received
>> status
>> > update TASK_LOST from task Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating
>> > StatusUpdate stream for task Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling
>> UPDATE
>> > for status update TASK_LOST from task Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding
>> > status update TASK_LOST from task Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000 to the master at
>> > master@192.168.0.2:5050
>> > I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status
>> > update for task Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received
>> status
>> > update acknowledgement for task Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK
>> > for status update TASK_LOST from task Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:34.487547 24185 status_update_manager.cpp:434] Cleaning up
>> > status update stream for task Task_Tracker_0 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:34.487788 24207 slave.cpp:1016] Status update manager
>> > successfully handled status update acknowledgement for task
>> Task_Tracker_0
>> > of framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:34.488142 24202 gc.cpp:56] Scheduling
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
>> > for removal
>> > I0513 09:16:35.063462 24199 slave.cpp:587] Got assigned task
>> > Task_Tracker_2 for framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:35.066090 24199 paths.hpp:302] Created executor directory
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
>> > I0513 09:16:35.066673 24188 slave.cpp:436] Successfully attached file
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
>> > I0513 09:16:35.066985 24205 cgroups_isolator.cpp:525] Launching
>> > executor_Task_Tracker_2 (cd hadoop && ./bin/mesos-executor) in
>> >
>> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27
>> > with resources cpus=1; mem=1280 for framework
>> > 201305130913-33597632-5050-3893-0000 in cgroup
>> >
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
>> > I0513 09:16:35.068594 24205 cgroups_isolator.cpp:670] Changing cgroup
>> > controls for executor executor_Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
>> > I0513 09:16:35.069341 24205 cgroups_isolator.cpp:841] Updated
>> 'cpu.shares'
>> > to 1024 for executor executor_Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:35.070061 24205 cgroups_isolator.cpp:979] Updated
>> > 'memory.limit_in_bytes' to 1342177280 for executor
>> executor_Task_Tracker_2
>> > of framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:35.070828 24205 cgroups_isolator.cpp:1005] Started listening
>> > for OOM events for executor executor_Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:35.071966 24205 cgroups_isolator.cpp:555] Forked executor
>> at =
>> > 24704
>> > I0513 09:16:40.464987 24197 cgroups_isolator.cpp:806] Executor
>> > executor_Task_Tracker_1 of framework
>> 201305130913-33597632-5050-3893-0000
>> > terminated with status 256
>> > I0513 09:16:40.465175 24197 cgroups_isolator.cpp:635] Killing executor
>> > executor_Task_Tracker_1 of framework
>> 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:40.467118 24197 cgroups_isolator.cpp:1025] OOM notifier is
>> > triggered for executor executor_Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000 with uuid
>> > 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>> > I0513 09:16:40.467269 24197 cgroups_isolator.cpp:1030] Discarded OOM
>> > notifier for executor executor_Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000 with uuid
>> > 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>> > I0513 09:16:40.468596 24198 cgroups.cpp:1175] Trying to freeze cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>> > I0513 09:16:40.468945 24198 cgroups.cpp:1214] Successfully froze cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>> > after 1 attempts
>> > I0513 09:16:40.471577 24200 cgroups.cpp:1190] Trying to thaw cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>> > I0513 09:16:40.471850 24200 cgroups.cpp:1298] Successfully thawed
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>> > I0513 09:16:40.480960 24185 cgroups_isolator.cpp:1144] Successfully
>> > destroyed cgroup
>> >
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>> > I0513 09:16:40.481230 24196 slave.cpp:1479] Executor
>> > 'executor_Task_Tracker_1' of framework
>> 201305130913-33597632-5050-3893-0000
>> > has exited with status 1
>> > I0513 09:16:40.483572 24196 slave.cpp:1232] Handling status update
>> > TASK_LOST from task Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:40.483801 24196 slave.cpp:1280] Forwarding status update
>> > TASK_LOST from task Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000 to the status update manager
>> > I0513 09:16:40.483846 24193 cgroups_isolator.cpp:666] Asked to update
>> > resources for an unknown/killed executor
>> > I0513 09:16:40.484094 24205 status_update_manager.cpp:254] Received
>> status
>> > update TASK_LOST from task Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:40.484267 24205 status_update_manager.cpp:403] Creating
>> > StatusUpdate stream for task Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:40.484412 24205 status_update_manager.hpp:314] Handling
>> UPDATE
>> > for status update TASK_LOST from task Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:40.484558 24205 status_update_manager.cpp:289] Forwarding
>> > status update TASK_LOST from task Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000 to the master at
>> > master@192.168.0.2:5050
>> > I0513 09:16:40.487229 24202 slave.cpp:979] Got acknowledgement of status
>> > update for task Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:40.487457 24196 status_update_manager.cpp:314] Received
>> status
>> > update acknowledgement for task Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:40.487607 24196 status_update_manager.hpp:314] Handling ACK
>> > for status update TASK_LOST from task Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:40.487741 24196 status_update_manager.cpp:434] Cleaning up
>> > status update stream for task Task_Tracker_1 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:40.487949 24207 slave.cpp:1016] Status update manager
>> > successfully handled status update acknowledgement for task
>> Task_Tracker_1
>> > of framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:40.488278 24193 gc.cpp:56] Scheduling
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
>> > for removal
>> > I0513 09:16:41.072098 24194 slave.cpp:587] Got assigned task
>> > Task_Tracker_3 for framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:41.074632 24194 paths.hpp:302] Created executor directory
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
>> > I0513 09:16:41.075546 24198 slave.cpp:436] Successfully attached file
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
>> > I0513 09:16:41.076081 24194 cgroups_isolator.cpp:525] Launching
>> > executor_Task_Tracker_3 (cd hadoop && ./bin/mesos-executor) in
>> >
>> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642
>> > with resources cpus=1; mem=1280 for framework
>> > 201305130913-33597632-5050-3893-0000 in cgroup
>> >
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
>> > I0513 09:16:41.077606 24194 cgroups_isolator.cpp:670] Changing cgroup
>> > controls for executor executor_Task_Tracker_3 of framework
>> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
>> > I0513 09:16:41.078402 24194 cgroups_isolator.cpp:841] Updated
>> 'cpu.shares'
>> > to 1024 for executor executor_Task_Tracker_3 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:41.079186 24194 cgroups_isolator.cpp:979] Updated
>> > 'memory.limit_in_bytes' to 1342177280 for executor
>> executor_Task_Tracker_3
>> > of framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:41.080008 24194 cgroups_isolator.cpp:1005] Started listening
>> > for OOM events for executor executor_Task_Tracker_3 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:41.081447 24194 cgroups_isolator.cpp:555] Forked executor
>> at =
>> > 24780
>> > I0513 09:16:44.482589 24200 status_update_manager.cpp:379] Checking for
>> > unacknowledged status updates
>> > I0513 09:16:46.473145 24199 cgroups_isolator.cpp:806] Executor
>> > executor_Task_Tracker_2 of framework
>> 201305130913-33597632-5050-3893-0000
>> > terminated with status 256
>> > I0513 09:16:46.473307 24199 cgroups_isolator.cpp:635] Killing executor
>> > executor_Task_Tracker_2 of framework
>> 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:46.475491 24199 cgroups_isolator.cpp:1025] OOM notifier is
>> > triggered for executor executor_Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000 with uuid
>> > f4729d73-5000-4c40-9c0e-1e77ad414f27
>> > I0513 09:16:46.475649 24199 cgroups_isolator.cpp:1030] Discarded OOM
>> > notifier for executor executor_Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000 with uuid
>> > f4729d73-5000-4c40-9c0e-1e77ad414f27
>> > I0513 09:16:46.476820 24192 cgroups.cpp:1175] Trying to freeze cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
>> > I0513 09:16:46.477181 24192 cgroups.cpp:1214] Successfully froze cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
>> > after 1 attempts
>> > I0513 09:16:46.479907 24201 cgroups.cpp:1190] Trying to thaw cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
>> > I0513 09:16:46.480229 24201 cgroups.cpp:1298] Successfully thawed
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
>> > I0513 09:16:46.493069 24200 cgroups_isolator.cpp:1144] Successfully
>> > destroyed cgroup
>> >
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
>> > I0513 09:16:46.493391 24184 slave.cpp:1479] Executor
>> > 'executor_Task_Tracker_2' of framework
>> 201305130913-33597632-5050-3893-0000
>> > has exited with status 1
>> > I0513 09:16:46.495689 24184 slave.cpp:1232] Handling status update
>> > TASK_LOST from task Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:46.495933 24184 slave.cpp:1280] Forwarding status update
>> > TASK_LOST from task Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000 to the status update manager
>> > I0513 09:16:46.495980 24189 cgroups_isolator.cpp:666] Asked to update
>> > resources for an unknown/killed executor
>> > I0513 09:16:46.496305 24193 status_update_manager.cpp:254] Received
>> status
>> > update TASK_LOST from task Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:46.496553 24193 status_update_manager.cpp:403] Creating
>> > StatusUpdate stream for task Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:46.496707 24193 status_update_manager.hpp:314] Handling
>> UPDATE
>> > for status update TASK_LOST from task Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:46.496868 24193 status_update_manager.cpp:289] Forwarding
>> > status update TASK_LOST from task Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000 to the master at
>> > master@192.168.0.2:5050
>> > I0513 09:16:46.499631 24201 slave.cpp:979] Got acknowledgement of status
>> > update for task Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:46.499961 24193 status_update_manager.cpp:314] Received
>> status
>> > update acknowledgement for task Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:46.500128 24193 status_update_manager.hpp:314] Handling ACK
>> > for status update TASK_LOST from task Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:46.500257 24193 status_update_manager.cpp:434] Cleaning up
>> > status update stream for task Task_Tracker_2 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:46.500452 24192 slave.cpp:1016] Status update manager
>> > successfully handled status update acknowledgement for task
>> Task_Tracker_2
>> > of framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:46.500743 24204 gc.cpp:56] Scheduling
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
>> > for removal
>> > I0513 09:16:47.079013 24193 slave.cpp:587] Got assigned task
>> > Task_Tracker_4 for framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:47.081650 24193 paths.hpp:302] Created executor directory
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
>> > I0513 09:16:47.082447 24198 slave.cpp:436] Successfully attached file
>> >
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
>> > I0513 09:16:47.082861 24194 cgroups_isolator.cpp:525] Launching
>> > executor_Task_Tracker_4 (cd hadoop && ./bin/mesos-executor) in
>> >
>> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
>> > with resources cpus=1; mem=1280 for framework
>> > 201305130913-33597632-5050-3893-0000 in cgroup
>> >
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_4_tag_8a4dd631-1ec0-4946-a1bc-0644a7238e3c
>> > I0513 09:16:47.084478 24194 cgroups_isolator.cpp:670] Changing cgroup
>> > controls for executor executor_Task_Tracker_4 of framework
>> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
>> > I0513 09:16:47.085273 24194 cgroups_isolator.cpp:841] Updated
>> 'cpu.shares'
>> > to 1024 for executor executor_Task_Tracker_4 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:47.086045 24194 cgroups_isolator.cpp:979] Updated
>> > 'memory.limit_in_bytes' to 1342177280 for executor
>> executor_Task_Tracker_4
>> > of framework 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:47.086853 24194 cgroups_isolator.cpp:1005] Started listening
>> > for OOM events for executor executor_Task_Tracker_4 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:47.088227 24194 cgroups_isolator.cpp:555] Forked executor
>> at =
>> > 24856
>> > I0513 09:16:50.485791 24194 status_update_manager.cpp:379] Checking for
>> > unacknowledged status updates
>> > I0513 09:16:52.480471 24185 cgroups_isolator.cpp:806] Executor
>> > executor_Task_Tracker_3 of framework
>> 201305130913-33597632-5050-3893-0000
>> > terminated with status 256
>> > I0513 09:16:52.480622 24185 cgroups_isolator.cpp:635] Killing executor
>> > executor_Task_Tracker_3 of framework
>> 201305130913-33597632-5050-3893-0000
>> > I0513 09:16:52.482652 24185 cgroups_isolator.cpp:1025] OOM notifier is
>> > triggered for executor executor_Task_Tracker_3 of framework
>> > 201305130913-33597632-5050-3893-0000 with uuid
>> > 22f6e84b-d07f-430a-a322-6f804b3cd642
>> > I0513 09:16:52.482805 24185 cgroups_isolator.cpp:1030] Discarded OOM
>> > notifier for executor executor_Task_Tracker_3 of framework
>> > 201305130913-33597632-5050-3893-0000 with uuid
>> > 22f6e84b-d07f-430a-a322-6f804b3cd642
>> > I0513 09:16:52.484110 24195 cgroups.cpp:1175] Trying to freeze cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
>> > I0513 09:16:52.484447 24195 cgroups.cpp:1214] Successfully froze cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
>> > after 1 attempts
>> > I0513 09:16:52.487893 24184 cgroups.cpp:1190] Trying to thaw cgroup
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
>> > I0513 09:16:52.488129 24184 cgroups.cpp:1298] Successfully thawed
>> >
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
>> > I0513 09:16:52.496047 24207 cgroups_isolator.cpp:1144] Successfully
>> > destroyed cgroup
>> >
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
>> > I0513 09:16:52.496247 24203 slave.cpp:1479] Executor
>> > 'executor_Task_Tracker_3' of framework
>> 201305130913-33597632-5050-3893-0000
>> > has exited with status 1
>> > I0513 09:16:52.498538 24203 slave.cpp:1232] Handling status update
>> > TASK_LOST from task Task_Tracker_3 of framework
>> > 201305130913-33597632-5050-3893-0000
>> > ......
>> >
>> >
>> >
>> >
>> > Wang Yu
>> >
>> > From: Benjamin Mahler
>> > Date: 2013-05-11 02:32
>> > To: wangyu
>> > Cc: Benjamin Mahler; mesos-dev
>> > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
>> > TaskTracker: http://slave5:50060
>> > 1. If you look at a slave log, you can see that the process isolator
>> > launched the task and then notified the slave that it was lost. Can you
>> > look inside one of the executor directories, there should be an stderr
>> file
>> > there. E.g.:
>> >
>> > I0510 09:44:33.801655  7412 paths.hpp:302] Created executor directory
>> >
>> >
>> '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1/frameworks/201305100938-33597632-5050-19520-0000/executors/executor_Task_Tracker_5/runs/2981a5c2-84e5-4868-9507-8aecb32ee163'
>> >
>> > Look for these in the logs and read the stderr present inside. Can you
>> > report back with the contents?
>> >
>> > 2. Are you running on Linux? You may want to consider using
>> > --isolation=cgroups when starting your slaves. This uses linux control
>> > groups to do process / cpu / memory isolation between executors running
>> on
>> > the slave.
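>> >
>> > For example, a sketch assuming the master location used in this thread:
>> >
>> >   mesos-slave --master=master:5050 --isolation=cgroups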
>> >
>> > Thanks!
>> >
>> >
>> > On Thu, May 9, 2013 at 7:07 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>> >
>> > > Hi Ben,
>> > >
>> > > Logs for the mesos master and slaves are attached; thanks for helping me
>> with
>> > > this problem. I very much appreciate your patient reply.
>> > >
>> > > Three servers: "master", "slave1", "slave5"
>> > > Mesos master: "master"
>> > > Mesos slaves: "master", "slave1", "slave5"
>> > >
>> > > ------------------------------
>> > > Wang Yu
>> > >
>> > >  *From:* Benjamin Mahler <be...@gmail.com>
>> > > *Date:* 2013-05-10 07:22
>> > > *To:* wangyu <wa...@nfs.iscas.ac.cn>
>> > > *Cc:* mesos-dev <me...@incubator.apache.org>; Benjamin Mahler<
>> > benjamin.mahler@gmail.com>
>> > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
>> > > TaskTracker: http://slave5:50060
>> > >  Ah I see them now, looks like you uploaded the NameNode logs? Can you
>> > > upload the mesos-master and mesos-slave logs instead? What will be
>> > > interesting here is what happened on the slave that is trying to run
>> the
>> > > TaskTracker.
>> > >
>> > >
>> > > On Wed, May 8, 2013 at 8:32 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>> > >
>> > >
>> > > > I have uploaded them in the previous email; I will send them again.
>> PS:
>> > Will
>> > > > the email list reject the attachments?
>> > > >
>> > > > Can you see them?
>> > > >
>> > > > ------------------------------
>> > > > Wang Yu
>> > > >
>> > > >  *From:* Benjamin Mahler <be...@gmail.com>
>> > > > *Date:* 2013-05-09 10:00
>> > > > *To:* mesos-dev@incubator.apache.org; wangyu <
>> wangyu@nfs.iscas.ac.cn>
>> > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler:
>> Unknown/exited
>> > > > TaskTracker: http://slave5:50060
>> > > >  Did you forget to attach them?
>> > > >
>> > > >
>> > > > On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>> > > >
>> > > > > OK.
>> > > > > Logs are attached. I used Ctrl+C to stop the jobtracker when the
>> TASK_LOST
>> > > > > happened.
>> > > > >
>> > > > > Thanks very much for your help!
>> > > > >
>> > > > > ------------------------------
>> > > > > Wang Yu
>> > > > >
>> > > > >  *From:* Benjamin Mahler <be...@gmail.com>
>> > > > > *Date:* 2013-05-09 01:23
>> > > > > *To:* mesos-dev@incubator.apache.org
>> > > > > *Cc:* wangyu <wa...@nfs.iscas.ac.cn>
>> > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler:
>> Unknown/exited
>> > > > > TaskTracker: http://slave5:50060
>> > > > >
>> > > >
>> > >
>> > > > > Hey Brenden, are there any bugs in particular here that you're
>> > referring to?
>> > > > >
>> > > > > Wang, can you provide the logs for the JobTracker, the slave, and
>> the
>> > > > > master?
>> > > > >
>> > > > >
>> > > > > On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <
>> > > > > brenden.matthews@airbedandbreakfast.com> wrote:
>> > > > >
>> > > > > > You may want to try Airbnb's dist of Mesos:
>> > > > > >
>> > > > > > https://github.com/airbnb/mesos/tree/testing
>> > > > > >
>> > >
>> > > > > > A good number of these Mesos bugs have been fixed but aren't yet
>> > merged
>> > > > > > into upstream.
>> > > > > >
>> > > > > >
>> > > > > > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
>> wrote:
>> > > > > >
>> > > >
>> > >
>> > > > > > > The log on each slave of the lost task is : No executor found
>> > with ID:
>> > > > > > > executor_Task_Tracker_XXX.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > Wang Yu
>> > > > > > >
>> > > > > > > From: 王瑜
>> > > > > > > Date: 2013-05-07 11:13
>> > > > > > > To: mesos-dev
>> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
>> > Unknown/exited
>> > > > > > > TaskTracker: http://slave5:50060
>> > > > > > > Hi all,
>> > > > > > >
>> > > >
>> > >
>> > > > > > > I have tried adding file extension when upload executor as
>> well
>> > as the
>> > > > > > > conf file, but it still can not work.
>> > > > > > >
>> > > > > > > And I have seen
>> > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > >
>> >
>> /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
>> > > > > > > but it is a null directory.
>> > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > Is there any other logs I can read to know why the TASK_LOST
>> > happened? I
>> > > > > > > really need your help, thanks very much!
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > Wang Yu
>> > > > > > >
>> > > > > > > From: Vinod Kone
>> > > > > > > Date: 2013-04-26 01:31
>> > > > > > > To: mesos-dev@incubator.apache.org
>> > > > > > > Cc: wangyu
>> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
>> > Unknown/exited
>> > > > > > > TaskTracker: http://slave5:50060
>> > > > > > > Also, you could look at the executor logs (default:
>> > > > > > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why
>> > the
>> > > > > > >  TASK_LOST happened.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
>> > > > > > > benjamin.mahler@gmail.com> wrote:
>> > > > > > >
>> > > >
>> > >
>> > > > > > > Can you maintain the file extension? That is how mesos knows
>> to
>> > extract
>> > > > > > it:
>> > > > > > > hadoop fs -copyFromLocal
>> > > > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
>> > > > > > > /user/mesos/mesos-executor.tar.gz
>> > > > > > >
>> > > > > > > Also make sure your mapred-site.xml has the extension as well.
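>> > > > > > >
>> > > > > > > For example, the matching property would look something like this
>> > > > > > > (a sketch, reusing the path from the command above):
>> > > > > > >
>> > > > > > >   <property>
>> > > > > > >     <name>mapred.mesos.executor</name>
>> > > > > > >     <value>hdfs://master/user/mesos/mesos-executor.tar.gz</value>
>> > > > > > >   </property>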
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wangyu@nfs.iscas.ac.cn
>> > > > wrote:
>> > > > > > >
>> > > > > > > > Hi, Ben,
>> > > > > > > >
>> > > > > > > > I have tried as you said, but It still can not work.
>> > > > > > > > I have upload mesos-executor using: hadoop fs -copyFromLocal
>> > > > > > > >
>> /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
>> > > > > > > > /user/mesos/mesos-executor
>> > > > > > > > Did I do the right thing? Thanks very much!
>> > > > > > > >
>> > > > > > > > The log in jobtracker is:
>> > > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
>> > > > > > > > Task_Tracker_82 on http://slave1:31000
>> > > > >
>> > > >
>> > >
>> > > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map
>> > and reduce
>> > > > > > > > slots needed.
>> > > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update
>> of
>> > > > > > > > Task_Tracker_82 to TASK_LOST with message Executor
>> terminated
>> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker
>> Status
>> > > > > > > >       Pending Map Tasks: 2
>> > > > > > > >    Pending Reduce Tasks: 1
>> > > > > > > >          Idle Map Slots: 0
>> > > > > > > >       Idle Reduce Slots: 0
>> > > > > > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
>> > > > > > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
>> > > > > > > >        Needed Map Slots: 2
>> > > > > > > >     Needed Reduce Slots: 1
>> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
>> > > > > > > > Task_Tracker_83 on http://slave1:31000
>> > > > >
>> > > >
>> > >
>> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map
>> > and reduce
>> > > > > > > > slots needed.
>> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update
>> of
>> > > > > > > > Task_Tracker_83 to TASK_LOST with message Executor
>> terminated
>> > > > > > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker
>> Status
>> > > > > > > >       Pending Map Tasks: 2
>> > > > > > > >    Pending Reduce Tasks: 1
>> > > > > > > >          Idle Map Slots: 0
>> > > > > > > >       Idle Reduce Slots: 0
>> > > > > > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
>> > > > > > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
>> > > > > > > >        Needed Map Slots: 2
>> > > > > > > >     Needed Reduce Slots: 1
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Wang Yu
>> > > > > > > >
>> > > > > > > > From: Benjamin Mahler
>> > > > > > > > Date: 2013-04-24 07:49
>> > > > > > > > To: mesos-dev@incubator.apache.org; wangyu
>> > >
>> > > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
>> > Unknown/exited
>> > > > > > > > TaskTracker: http://slave5:50060
>> > > > >
>> > > >
>> > >
>> > > > > > > > You need to instead upload the hadoop.tar.gz generated by
>> the
>> > tutorial.
>> > > > >
>> > > >
>> > >
>> > > > > > > > Then point the conf file to the hdfs directory (you had the
>> > right idea,
>> > > > > > > > just uploaded the wrong file). :)
>> > > > > > > >
>> > > > > > > > Can you try that and report back?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <
>> wangyu@nfs.iscas.ac.cn
>> > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Guodong,
>> > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > I am still having problems; I think there is some
>> > problem with
>> > > > > > > my
>> > > > > > > > > executor setting.
>> > > > > > > > >
>> > > > > > > > > In mapred-site.xml, I set:("master" is the hostname of
>> > > > > > > > > mesos-master-hostname)
>> > > > > > > > >   <property>
>> > > > > > > > >     <name>mapred.mesos.executor</name>
>> > > > > > > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
>> > > > > > > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
>> > > > > > > > >   </property>
>> > > > > > > > >
>> > > > > > > > > And I upload mesos-executor in /user/mesos/mesos-executor
>> > > > > > > > >
>> > > > > > > > > The head content is as follows:
>> > > > > > > > >
>> > > > > > > > > #! /bin/sh
>> > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > # mesos-executor - temporary wrapper script for
>> > .libs/mesos-executor
>> > > > > > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
>> > > > > > > > > #
>> > > >
>> > >
>> > > > > > > > > # The mesos-executor program cannot be directly executed
>> > until all
>> > > > > > the
>> > > > > > > > > libtool
>> > > > > > > > > # libraries that it depends on are installed.
>> > > > > > > > > #
>> > > > > > > > > # This wrapper script should never be moved out of the
>> build
>> > > > > > directory.
>> > > > > > > > > # If it is, it will not operate correctly.
>> > > > > > > > >
>> > > > > > > > > # Sed substitution that helps us do robust quoting.  It
>> > > > > > backslashifies
>> > > > >
>> > > >
>> > >
>> > > > > > > > > # metacharacters that are still active within
>> double-quoted
>> > strings.
>> > > > > > > > > Xsed='/bin/sed -e 1s/^X//'
>> > > > > > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
>> > > > > > > > >
>> > > > > > > > > # Be Bourne compatible
>> > > > >
>> > > >
>> > >
>> > > > > > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null
>> > 2>&1; then
>> > > > > > > > >   emulate sh
>> > > > > > > > >   NULLCMD=:
>> > > > > > > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"},
>> > which
>> > > > > > > > >   # is contrary to our usage.  Disable this feature.
>> > > > > > > > >   alias -g '${1+"$@"}'='"$@"'
>> > > > > > > > >   setopt NO_GLOB_SUBST
>> > > > > > > > > else
>> > > > > > > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;;
>> esac
>> > > > > > > > > fi
>> > > > > > > > > BIN_SH=xpg4; export BIN_SH # for Tru64
>> > > > > > > > > DUALCASE=1; export DUALCASE # for MKS sh
>> > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > # The HP-UX ksh and POSIX shell print the target directory
>> > to stdout
>> > > > > > > > > # if CDPATH is set.
>> > > > > > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
>> > > > > > > > >
>> > > > > > > > > relink_command="(cd /home/mesos/build/src; { test -z
>> > > >
>> > >
>> > > > > > > > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || {
>> > LIBRARY_PATH=;
>> > > > > > > export
>> > >
>> > > > > > > > > LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" ||
>> > unset
>> > > > >
>> > > >
>> > >
>> > > > > > > > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH;
>> };
>> > }; { test
>> > > > > > > -z
>> > > > > > > > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || {
>> > > > > > > > GCC_EXEC_PREFIX=;
>> > > >
>> > >
>> > > > > > > > > export GCC_EXEC_PREFIX; }; }; { test -z
>> > \"\${LD_RUN_PATH+set}\" ||
>> > > > > > > unset
>> > > > > > > > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > >
>> >
>> LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/;
>> > > > > > > > > export LD_LIBRARY_PATH;
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > >
>> >
>> PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin;
>> > > > > > > > > export PATH; g++ -g -g2 -O2 -o \$progdir/\$file
>> > > > > > > > > launcher/mesos_executor-executor.o  ./.libs/libmesos.so
>> > > > >
>> > > >
>> > >
>> > > > > > > > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread
>> > -lcurl -lssl
>> > > > >
>> > > >
>> > >
>> > > > > > > > > -lcrypto -lz -lrt -pthread -Wl,-rpath
>> > -Wl,/home/mesos/build/src/.libs
>> > > > > > > > > -Wl,-rpath -Wl,/home/mesos/build/lib)"
>> > > > > > > > > ...
>> > > > > > > > >
>> > > > > > > > >
>> > >
>> > > > > > > > > Did I upload the right file? and set up it in conf file
>> > correct?
>> > > > > > Thanks
>> > > > > > > > > very much!
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Wang Yu
>> > > > > > > > >
>> > > > > > > > > From: 王国栋
>> > > > > > > > > Date: 2013-04-23 13:32
>> > > > > > > > > To: wangyu
>> > > > > > > > > CC: mesos-dev
>> > > > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > Unknown/exited
>> > > > > > > > > TaskTracker: http://slave5:50060
>> > > > > > > > > Hmm. it seems that the mapred.mesos.master is set
>> correctly.
>> > > > > > > > >
>> > >
>> > > > > > > > > if you run hadoop in local mode, the following setting
>> > is ok
>> > > > > > > > >   <property>
>> > > > > > > > >     <name>mapred.mesos.master</name>
>> > > > > > > > >     <value>local</value>
>> > > > > > > > >   </property>
>> > > > > > > > >
>> > >
>> > > > > > > > > if you want to start the cluster, set mapred.mesos.master
>> as
>> > the
>> > > > > > > > > mesos-master-hostname:mesos-master-port.
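>> > > > > > > > > e.g., a sketch assuming the master location from this thread:
>> > > > > > > > >   <property>
>> > > > > > > > >     <name>mapred.mesos.master</name>
>> > > > > > > > >     <value>master:5050</value>
>> > > > > > > > >   </property>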
>> > > > > > > > >
>> > >
>> > > > > > > > > Make sure the DNS lookup result for mesos-master-hostname
>> is
>> > the
>> > > > > > right
>> > > > > > > > IP.
>> > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > BTW: when starting the jobtracker, you can check the mesos
>> > webUI and
>> > > > > > > > check
>> > > > > > > > > whether the hadoop framework is registered.
>> > > > > > > > >
>> > > > > > > > > Thanks.
>> > > > > > > > >
>> > > > > > > > > Guodong
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <
>> wangyu@nfs.iscas.ac.cn
>> > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi, Guodong,
>> > > > > > > > > >
>> > > > > > > > > > I started hadoop as you said, then I saw this error:
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error
>> from
>> > scheduler
>> > > > > > > > > driver: Cannot parse
>> > > > > > > > > > '@0.0.0.0:0'
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > What does this mean? Where should I change the MesosScheduler
>> > code to fix
>> > > > > > > > this?
>> > >
>> > > > > > > > > > Thanks very much! I am so sorry for interrupting you once
>> > again...
>> > > > > > > > > >
>> > > > > > > > > > The whole log is as follows:
>> > > > > > > > > >
>> > > > > > > > > >  [root@master hadoop-0.20.205.0]# hadoop jobtracker
>> > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
>> > > > > > > > > >
>> > /************************************************************
>> > > > > > > > > > STARTUP_MSG: Starting JobTracker
>> > > > > > > > > > STARTUP_MSG:   host = master/192.168.0.2
>> > > > > > > > > > STARTUP_MSG:   args = []
>> > > > > > > > > > STARTUP_MSG:   version = 0.20.205.0
>> > > > > > > > > >
>> > > > > > > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat
>> Apr
>> > 13
>> > > > > > > 11:19:33
>> > > > > > > > > CST 2013
>> > > > > > > > > >
>> > ************************************************************/
>> > > > > > > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded
>> > properties from
>> > > > > > > > > hadoop-metrics2.properties
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
>> > for source
>> > > > > > > > > MetricsSystem,sub=Stats registered.
>> > > > > > > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled
>> > snapshot
>> > > > > > > > period
>> > > > > > > > > at 10 second(s).
>> > > > > > > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl:
>> JobTracker
>> > metrics
>> > > > > > > > system
>> > > > > > > > > started
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
>> > for source
>> > > > > > > > > QueueMetrics,q=default registered.
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
>> > for source
>> > > > > > > ugi
>> > > > > > > > > registered.
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO
>> > > > > > > delegation.AbstractDelegationTokenSecretManager:
>> > >
>> > > > > > > > > Updating the current master key for generating delegation
>> > tokens
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO
>> > > > > > > delegation.AbstractDelegationTokenSecretManager:
>> > > > > > > > > Starting expired delegation token remover thread,
>> > > > > > > > > tokenRemoverScanInterval=60 min(s)
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler
>> > configured with
>> > > > > > > > > (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
>> > > > > > > limitMaxMemForMapTasks,
>> > > > > > > > > limitMaxMemForReduceTasks) (-1, -1, -1, -1)
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO
>> > > > > > > delegation.AbstractDelegationTokenSecretManager:
>> > >
>> > > > > > > > > Updating the current master key for generating delegation
>> > tokens
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing
>> > hosts
>> > > > > > > > > (include/exclude) list
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting
>> > jobtracker with
>> > > > > > > > owner
>> > > > > > > > > as root
>> > > > > > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
>> > for source
>> > > > > > > > > RpcDetailedActivityForPort9001 registered.
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
>> > for source
>> > > > > > > > > RpcActivityForPort9001 registered.
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to
>> > > > > > > > > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> > > > > > > > > org.mortbay.log.Slf4jLog
>> > > > > > > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global
>> > filtersafety
>> > > > > > > > >
>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by
>> > >
>> > > > > > > > > webServer.getConnectors()[0].getLocalPort() before open()
>> is
>> > -1.
>> > > > > > > Opening
>> > > > > > > > > the listener on 50030
>> > > > > > > > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer:
>> > listener.getLocalPort()
>> > > > > > > > returned
>> > >
>> > > > > > > > > 50030 webServer.getConnectors()[0].getLocalPort() returned
>> > 50030
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to
>> > port 50030
>> > > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
>> > > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started
>> > > > > > > > > > SelectChannelConnector@0.0.0.0:50030
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean
>> > for source
>> > > > > > > jvm
>> > > > > > > > > registered.
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean
>> > for source
>> > > > > > > > > JobTrackerMetrics registered.
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up
>> > at: 9001
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker
>> > webserver:
>> > > > > > 50030
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up
>> the
>> > system
>> > > > > > > > > directory
>> > > > > > > > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server
>> > being
>> > > > > > > > > initialized in embedded mode
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started
>> > job history
>> > > > > > > > > server at: localhost:50030
>> > > > > > > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History
>> > Server web
>> > > > > > > > > address: localhost:50030
>> > > > > > > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore:
>> > Completed
>> > > > > > job
>> > > > > > > > > store is inactive
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting
>> > > > > > MesosScheduler
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing
>> hosts
>> > > > > > > information
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error
>> from
>> > scheduler
>> > > > > > > > > driver: Cannot parse '@
>> > > > > > > > > > 0.0.0.0:0'
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the
>> > includes
>> > > > > > > file
>> > > > > > > > to
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the
>> > excludes
>> > > > > > > file
>> > > > > > > > to
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing
>> > hosts
>> > > > > > > > > (include/exclude) list
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker:
>> Decommissioning
>> > 0 nodes
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder:
>> > starting
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener
>> on
>> > 9001:
>> > > > > > > > starting
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0
>> on
>> > 9001:
>> > > > > > > > starting
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1
>> on
>> > 9001:
>> > > > > > > > starting
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3
>> on
>> > 9001:
>> > > > > > > > starting
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2
>> on
>> > 9001:
>> > > > > > > > starting
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5
>> on
>> > 9001:
>> > > > > > > > starting
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4
>> on
>> > 9001:
>> > > > > > > > starting
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6
>> on
>> > 9001:
>> > > > > > > > starting
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7
>> on
>> > 9001:
>> > > > > > > > starting
>> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting
>> RUNNING
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8
>> on
>> > 9001:
>> > > > > > > > starting
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9
>> on
>> > 9001:
>> > > > > > > > starting
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to
>> > load
>> > > > >
>> > > >
>> > >
>> > > > > > > > > native-hadoop library for your platform... using
>> > builtin-java classes
>> > > > > > > > where
>> > > > > > > > > applicable
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress:
>> > job_201304231321_0001:
>> > > > > > > > > nMaps=0 nReduces=0 max=-1
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job
>> > > > > > > > > job_201304231321_0001
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job
>> > job_201304231321_0001
>> > > > > > > > > added successfully for user 'root' to queue 'default'
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root
>> > > > > > >  IP=192.168.0.2
>> > > > > > > > >  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001
>> > > > > >  RESULT=SUCCESS
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing
>> > > > > > > > > job_201304231321_0001
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress:
>> Initializing
>> > > > > > > > > job_201304231321_0001
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken
>> > generated and
>> > > > > > > > > stored with users keys in
>> > >
>> > > > > > > > >
>> > /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
>> > > > > > > > > >
>> > >
>> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size
>> > for job
>> > > > > > > > > job_201304231321_0001 = 0. Number of splits = 0
>> > > > > > > > > >
>> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job
>> > > > > > > job_201304231321_0001
>> > > > > > > > > initialized successfully with 0 map tasks and 0 reduce
>> tasks.
>> > > > > > > > > >
>> > > > > > > > > > ------------------------------
>> > > > > > > > > > Wang Yu
>> > > > > > > > > >
>> > > > > > > > > >  *From:* 王国栋 <wa...@gmail.com>
>> > > > > > > > > > *Date:* 2013-04-23 11:34
>> > > > > > > > > > *To:* mesos-dev <me...@incubator.apache.org>;
>> wangyu<
>> > > > > > > > > wangyu@nfs.iscas.ac.cn>
>> > > > > > > > > > *Subject:* Re: Re:
>> org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > > Unknown/exited TaskTracker: http://slave5:50060
>> > > > > > > > > >  Hi Yu,
>> > > > > > > > > >
>> > > >
>> > >
>> > > > > > > > > > Mesos will just launch tasktracker on each slave node as
>> > long as
>> > > > > > the
>> > > > >
>> > > >
>> > >
>> > > > > > > > > > required resource is enough for the tasktracker. So you
>> > have to run
>> > > > > > > > > > NameNode, Jobtracker and DataNode on your own.
>> > > > > > > > > >
>> > > > > > > > > > Basically, starting hadoop on mesos works like this (see
>> > > > > > > > > > the sketch after these steps):
>> > > > > > > > > > 1. start the dfs. use hadoop/bin/start-dfs.sh. (you
>> should
>> > > > > > configure
>> > >
>> > > > > > > > > > core-sites.xml and hdfs-site.xml). dfs is no different
>> > from the
>> > > > > > > normal
>> > > > > > > > > one.
>> > > >
>> > >
>> > > > > > > > > > 2. start jobtracker, use hadoop/bin/hadoop jobtracker
>> (you
>> > should
>> > >
>> > > > > > > > > > configure mapred-site.xml, this jobtracker should
>> contains
>> > the
>> > > > > > patch
>> > > > > > > > for
>> > > > > > > > > > mesos)
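>> > > > > > > > > >
>> > > > > > > > > > The sketch (assuming hadoop/ is the patched build from this
>> > > > > > > > > > thread):
>> > > > > > > > > >   $ hadoop/bin/start-dfs.sh
>> > > > > > > > > >   $ hadoop/bin/hadoop jobtracker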
>> > > > > > > > > >
>> > > >
>> > >
>> > > > > > > > > > Then, you can use mesos web UI and jobtracker web UI to
>> > check the
>> > > > > > > > status
>> > > > > > > > > > of Jobtracker.
>> > > > > > > > > >
>> > > > > > > > > >  Guodong
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <
>> > wangyu@nfs.iscas.ac.cn
>> > > >
>> > > > > > wrote:
>> > > > > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > >> Oh, yes, I start my hadoop using "start-all.sh". I know
>> > what my
>> > > > > > > > > >> problem is. Thanks very much!
>> > > > > > > > > >>
>> > > > >
>> > > >
>> > >
>> > > > > > > > > >> ps: Besides the TaskTracker, are there any other roles (like
>> > JobTracker,
>> > > > > > > > > >> DataNode) I should stop first?
>> > > > > > > > > >>
>> > > > > > > > > >>
>> > > > > > > > > >>
>> > > > > > > > > >>
>> > > > > > > > > >> Wang Yu
>> > > > > > > > > >>
>> > > > > > > > > >> From: Benjamin Mahler
>> > > > > > > > > >> Date: 2013-04-23 10:56
>> > > > > > > > > >> To: mesos-dev@incubator.apache.org; wangyu
>> > > > > > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > Unknown/exited
>> > > > > > > > > >> TaskTracker: http://slave5:50060
>> > > > > > > > > >>  The scheduler we wrote for Hadoop will start its own
>> > > > > > TaskTrackers,
>> > > > > > > > > >> meaning
>> > > > > > > > > >> you do not have to start any TaskTrackers yourself
>> > > > > > > > > >>
>> > > > >
>> > > >
>> > >
>> > > > > > > > > >> Are you starting your own TaskTrackers? Are there any
>> > TaskTrackers
>> > > > > > > > > running
>> > > > > > > > > >> in your cluster?
>> > > > > > > > > >>
>> > > > > > > > > >> Looking at your jps output, is there already a
>> TaskTracker
>> > > > > > running?
>> > > > > > > > > >> [root@master logs]# jps
>> > > > > > > > > >> 13896 RunJar
>> > > > > > > > > >> 14123 Jps
>> > > > > > > > > >> 12718 NameNode
>> > > > > > > > > >> 12900 DataNode
>> > > > > > > > > >> 13374 TaskTracker  <--- How was this started?
>> > > > > > > > > >> 13218 JobTracker
>> > > > > > > > > >>
>> > > > > > > > > >>
>> > > > > > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <
>> > wangyu@nfs.iscas.ac.cn
>> > > >
>> > > > > > wrote:
>> > > > > > > > > >>
>> > > > > > > > > >> > Hi, Ben and Guodong,
>> > > > > > > > > >> >
>> > > > >
>> > > >
>> > >
>> > > > > > > > > >> > What do you mean "managing your own TaskTrackers"?
>> How
>> > should I
>> > > > > > > know
>> > >
>> > > > > > > > > >> > whether I have managed my own TaskTrackers? Sorry, I
>> am
>> > not
>> > > > > > > very familiar
>> > > > > > > > > >> with
>> > > > > > > > > >> > mesos.
>> > > > > > > > > >> > Does it mean I do not need to configure hdfs-site.xml
>> and
>> > > > > > > core-site.xml
>> > > > > > > > > in
>> > > > >
>> > > >
>> > >
>> > > > > > > > > >> > hadoop? I do not want to run my own TaskTracker; I
>> just
>> > want to
>> > > > > > > set
>> > > > > > > > up
>> > > > > > > > > >> > hadoop on mesos, and run my MR tasks.
>> > > > > > > > > >> >
>> > > >
>> > >
>> > > > > > > > > >> > Thanks very much for your patient reply...Maybe I
>> have
>> > a long
>> > > > > > way
>> > > > > > > to
>> > > > > > > > > >> go...
>> > > > > > > > > >> >
>> > > > > > > > > >> >
>> > > > > > > > > >> >
>> > > > > > > > > >> > The log messages you see:
>> > > > > > > > > >> > 2013-04-18 16:47:19,645 INFO
>> > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker: http://master:50060.
>> > > > > > > > > >> >
>> > > >
>> > >
>> > > > > > > > > >> > Are printed when mesos does not know about the
>> > TaskTracker. We
>> > > > > > > > > currently
>> > > > > > > > > >> > don't support running your own TaskTrackers, as the
>> > > > > > MesosScheduler
>> > > > > > > > > will
>> > > > > > > > > >> > launch them on your behalf when needed.
>> > > > > > > > > >> >
>> > >
>> > > > > > > > > >> > Are you managing your own TaskTrackers? The purpose
>> of
>> > using
>> > > > > > > Hadoop
>> > > > > > > > > with
>> > > > >
>> > > >
>> > >
>> > > > > > > > > >> > mesos is that you no longer have to do that. We will
>> > detect that
>> > > > > > > > jobs
>> > > > > > > > > >> have
>> > > > >
>> > > >
>> > >
>> > > > > > > > > >> > pending map / reduce tasks and launch TaskTrackers
>> > accordingly.
>> > > > > > > > > >> >
>> > > > > > > > > >> > Guodong may be able to help further getting set up!
>> > > > > > > > > >> >
>> > > > > > > > > >> >
>> > > > > > > > > >> >
>> > > > > > > > > >> >
>> > > > > > > > > >> > Wang Yu
>> > > > > > > > > >> >
>> > > > > > > > > >> > From: 王国栋
>> > > > > > > > > >> > Date: 2013-04-18 17:10
>> > > > > > > > > >> > To: mesos-dev; wangyu
>> > > > > > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > Unknown/exited
>> > > > > > > > > >> > TaskTracker: http://slave5:50060
>> > > > >
>> > > >
>> > >
>> > > > > > > > > >> > You can check the slave log and the mesos-executor
>> log,
>> > which is
>> > > > > > > > > >> normally
>> > > > > > > > > >> > located in the dir like
>> > > > > > > > > >> >
>> > > > > > > > > >> >
>> > > > > > > > > >>
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > >
>> > > >
>> > >
>> > > > > >
>> >
>> "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
>> > > > > > > > > >> > The log is tasktracker log.
>> > > > > > > > > >> >
>> > > > > > > > > >> > I hope it will help.
>> > > > > > > > > >> >
>> > > > > > > > > >> > Guodong
>> > > > > > > > > >> >
>> > > > > > > > > >> >
>> > > > > > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <
>> > > wangyu@nfs.iscas.ac.cn
>> > > > >
>> > > > > > > wrote:
>> > > > > > > > > >> >
>> > > > > > > > > >> > > Hi All,
>> > > > > > > > > >> > >
>> > > >
>> > >
>> > > > > > > > > >> > > I have deployed mesos on three nodes: master,
>> slave1,
>> > slave5.
>> > > > > > and
>> > > > > > > > it
>> > > > > > > > > >> works
>> > > > > > > > > >> > > well.
>> > >
>> > > > > > > > > >> > >  Then I set hadoop over it, using master as
>> namenode,
>> > and
>> > > > > > > master,
>> > > > > > > > > >> slave1,
>> > > >
>> > >
>> > > > > > > > > >> > > slave5 as datanode. When I use 'jps', it looks like it
>> > works well.
>> > > > > > > > > >> > >  [root@master logs]# jps
>> > > > > > > > > >> > > 13896 RunJar
>> > > > > > > > > >> > > 14123 Jps
>> > > > > > > > > >> > > 12718 NameNode
>> > > > > > > > > >> > > 12900 DataNode
>> > > > > > > > > >> > > 13374 TaskTracker
>> > > > > > > > > >> > > 13218 JobTracker
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > Then I ran the test benchmark, but it could not keep
>> working...
>> > > > > > > > > >> > >  [root@master
>> > > > > > > > > >> > >  hadoop-0.20.205.0]# bin/hadoop jar
>> > > > > > > hadoop-examples-0.20.205.0.jar
>> > > > > > > > > >> > randomwriter -Dtest.randomwrite.bytes_per_map=6710886
>> > > > > > > > > >> > -Dtest.randomwriter.maps_per_host=10 rand
>> > > > > > > > > >> > > Running 30 maps.
>> > > > > > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
>> > > > > > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running
>> job:
>> > > > > > > > > >> > job_201304181646_0001
>> > >
>> > > > > > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0%
>> > reduce 0%
>> > > > > > > > > >> > > It stopped here.
>> > > > > > > > > >> > >
>> > > >
>> > >
>> > > > > > > > > >> > > Then I read the log file:
>> > hadoop-root-jobtracker-master.log,
>> > > > > > it
>> > > > > > > > > shows:
>> > > > > > > > > >> > >  2013-04-18 16
>> > > > >
>> > > >
>> > >
>> > > > > > > > > >> > > :46:51,724 INFO
>> org.apache.hadoop.mapred.JobTracker:
>> > Starting
>> > > > > > > > > RUNNING
>> > > > > > > > > >> > > 2013-04-18 16
>> > > > > > > > > >> > > :46:51,726 INFO org.apache.hadoop.ipc.Server: IPC
>> > Server
>> > > > > > > handler 5
>> > > > > > > > > on
>> > > > > > > > > >> > 9001: starting
>> > > > > > > > > >> > > 2013-04-18 16
>> > > > > > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC
>> > Server
>> > > > > > > handler 6
>> > > > > > > > > on
>> > > > > > > > > >> > 9001: starting
>> > > > > > > > > >> > > 2013-04-18 16
>> > > > > > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC
>> > Server
>> > > > > > > handler 9
>> > > > > > > > > on
>> > > > > > > > > >> > 9001: starting
>> > > > > > > > > >> > > 2013-04-18 16
>> > > > > > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC
>> > Server
>> > > > > > > handler 7
>> > > > > > > > > on
>> > > > > > > > > >> > 9001: starting
>> > > > > > > > > >> > > 2013-04-18 16
>> > > > > > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC
>> > Server
>> > > > > > > handler 8
>> > > > > > > > > on
>> > > > > > > > > >> > 9001: starting
>> > > > > > > > > >> > > 2013-04-18 16
>> > > > >
>> > > >
>> > >
>> > > > > > > > > >> > > :46:52,557 INFO
>> > org.apache.hadoop.net.NetworkTopology: Adding
>> > > > > > a
>> > > > > > > > new
>> > > > > > > > > >> > node: /default-rack/master
>> > > > > > > > > >> > > 2013-04-18 16
>> > > >
>> > >
>> > > > > > > > > >> > > :46:52,560 INFO
>> org.apache.hadoop.mapred.JobTracker:
>> > Adding
>> > > > > > > > tracker
>> > > > > > > > > >> > tracker_master:localhost/
>> > > > > > > > > >> > > 127.0.0.1:44997 to host master
>> > > > > > > > > >> > > 2013-04-18 16
>> > > > > > > > > >> > > :46:52,568 INFO
>> > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> Unknown/exited
>> > > > > > > > > >> > TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > > 2013-04-18 16
>> > > > > > > > > >> > > :46:55,581 INFO
>> > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> Unknown/exited
>> > > > > > > > > >> > TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > > 2013-04-18 16
>> > > > > > > > > >> > > :46:58,590 INFO
>> > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> Unknown/exited
>> > > > > > > > > >> > TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > > 2013-04-18 16
>> > > > > > > > > >> > > :47:01,600 INFO
>> > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> Unknown/exited
>> > > > > > > > > >> > TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:04,609 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:07,618 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:10,625 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:13,632 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:13,686 INFO
>> > > > > > > > org.apache.hadoop.net.NetworkTopology:
>> > > > > > > > > >> > Adding a new node: /default-rack/slave5
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:13,686 INFO
>> > > > > > > org.apache.hadoop.mapred.JobTracker:
>> > > > > > > > > >> Adding
>> > > > > > > > > >> > tracker tracker_slave5:
>> > > > > > > > > >> > > 127.0.0.1/127.0.0.1:60621 to host slave5
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:13,687 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://slave5:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:16,638 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:16,697 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://slave5:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:19,645 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:19,707 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://slave5:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:22,651 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:22,715 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://slave5:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:25,658 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:25,725 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://slave5:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > 2013-04-18 16:47:28,665 INFO
>> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
>> > > > > > > > > >> > Unknown/exited TaskTracker:
>> > > > > > > > > >> > > http://master:50060.
>> > > > > > > > > >> > >
>> > > > > > > > > >> > > Does anybody can help me? Thanks very much!
>> > > > > > > > > >> > >
>> > > > > > > > > >> >
>> > > > > > > > > >>
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > >
>> > >
>> >
>>
>
>

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by Benjamin Mahler <be...@gmail.com>.
Now that you've uploaded the executor, can you send us the master / slave
logs? When looking at a slave, can you look at an executor run directory to
see what's in stderr?

For example, in the slave you'll see a log line like the following:

I0513 09:16:47.082861 24194 cgroups_isolator.cpp:525]
Launching executor_Task_Tracker_4 (cd hadoop && ./bin/
mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-
5050-3893-0/frameworks/201305130913-33597632-5050-
3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-
1ec0-4946-a1bc-0644a7238e3c with resources cpus=1; mem=1280 for framework
201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-
33597632-5050-3893-0000_executor_executor_Task_Tracker_4_tag_8a4dd631-1ec0-
4946-a1bc-0644a7238e3c

Based on the above, you'll want to check out what's inside the directory:
$ cd /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/
201305130913-33597632-5050-3893-0000/executors/executor_
Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
$ ls
$ cat stderr
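
As a shortcut (a sketch, assuming the default /tmp/mesos work directory),
something like this dumps the stderr of every executor run at once:

$ find /tmp/mesos/slaves -name stderr | xargs tail -n +1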

Thanks!


On Sun, May 12, 2013 at 8:45 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> Yes, I also updated mapred-site.xml. But it still does not work.
>
> I am using git version, just download it using git clone git://
> git.apache.org/mesos.git
>
> $ cd mesos
> $ ./bootstrap
> $ ./configure
> $ make
> $ cd hadoop
> $ make hadoop-0.20.205.0
>
> Then deploy it on the real cluster.
>
> I really do not know where the problem is; please help me with it.
>
>
>
>
> Wang Yu
>
> From: Vinod Kone
> Date: 2013-05-13 11:30
> To: mesos-dev@incubator.apache.org
> Cc: mesos-dev
> Subject: Re: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
> Hmm. You definitely need the right extension but not the "hadoop" name. I'm
> assuming you also updated the file name in mapred-site.xml?
>
> Also I'm surprised that the slave logs do not show info about downloading
> the executor. What version of mesos are you running?
>
> @vinodkone
> Sent from my mobile
>
> On May 12, 2013, at 7:59 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>
> > I have uploaded the right file using:
> > [root@master hadoop-0.20.205.0]# hadoop fs -mkdir mesos
> > [root@master hadoop-0.20.205.0]# hadoop fs -copyFromLocal
> /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> /user/mesos/mesos-executor
> >
> > I have tried adding the file extension--"/user/mesos/mesos-executor" -> "
> /user/mesos/mesos-executor.tar.gz", but it still does not work. Must I use
> hadoop.tar.gz as the file name?
> >
> >
> >
> >
> > Wang Yu
> >
> > From: Vinod Kone
> > Date: 2013-05-13 10:42
> > To: mesos-dev@incubator.apache.org; wangyu
> > Cc: Benjamin Mahler
> > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
> >>
> >>  <property>
> >>    <name>mapred.mesos.executor</name>
> >> #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> >>    <value>hdfs://master/user/mesos/mesos-executor</value>
> >>  </property>
> >
> > the mapred.mesos.executor property looks incorrect. the value should be
> > where you have uploaded the "hadoop.tar.gz" bundle generated by the
> > (TUTORIAL.sh or make hadoop). you can find the generated "hadoop.tar.gz"
> > bundle in the hadoop build directory. upload the bundle to a hdfs
> location
> > and set the above property to that location.
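> >
> > for example, a sketch (the exact hdfs path is up to you):
> >
> >   hadoop fs -copyFromLocal
> >   /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> >   /user/mesos/hadoop.tar.gz
> >
> >   <property>
> >     <name>mapred.mesos.executor</name>
> >     <value>hdfs://master/user/mesos/hadoop.tar.gz</value>
> >   </property>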
> >
> > vinod
> >
> >
> >
> >>
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> >>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
> >>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
> >>> total 0
> >>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
> >>> .  ..
> >>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
> >>> 2. I added "--isolation=cgroups" for slaves, but it still not work.
> Tasks
> >>> are always lost. But there is no error any more, I still do not know
> what
> >>> happened to the executor...Logs on one slave is as follows. Please help
> >> me,
> >>> thanks very much!
> >>>
> >>> mesos-slave.INFO
> >>> Log file created at: 2013/05/13 09:12:54
> >>> Running on machine: slave1
> >>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> >>> I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
> >>> I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by
> >>> root
> >>> I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
> >>> I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@
> >>> 192.168.0.3:36668
> >>> I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24;
> >>> mem=63356; ports=[31000-32000]; disk=29143
> >>> I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as
> >>> cgroups hierarchy root
> >>> I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at
> >>> master@192.168.0.2:5050
> >>> I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file
> >>> '/home/mesos/build/logs/mesos-slave.INFO'
> >>> I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master
> >>> detected at master@192.168.0.2:5050
> >>> I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering
> isolator
> >>> I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
> >>> I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master;
> given
> >>> slave ID 201305130913-33597632-5050-3893-0
> >>> I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling
> >>> '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
> >>> I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling
> >>> '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
> >>> I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling
> >>> '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
> >>> I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling
> >>> '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
> >>> I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling
> >>> '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
> >>> I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling
> >>> '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
> >>> I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling
> >>> '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
> >>> I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%.
> >> Max
> >>> allowed age: 5.11days
> >>> I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%.
> >> Max
> >>> allowed age: 5.11days
> >>> I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%.
> >> Max
> >>> allowed age: 5.11days
> >>> I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task
> >>> Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory
> >>
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> >>> I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file
> >>
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> >>> I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching
> >>> executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in
> >>
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495
> >>> with resources cpus=1; mem=1280 for framework
> >>> 201305130913-33597632-5050-3893-0000 in cgroup
> >>
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup
> >>> controls for executor executor_Task_Tracker_0 of framework
> >>> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> >>> I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated
> >> 'cpu.shares'
> >>> to 1024 for executor executor_Task_Tracker_0 of framework
> >>> 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
> >>> 'memory.limit_in_bytes' to 1342177280 for executor
> >> executor_Task_Tracker_0
> >>> of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
> >>> 'memory.limit_in_bytes' to 1342177280 for executor
> >> executor_Task_Tracker_0
> >>> of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started
> listening
> >>> for OOM events for executor executor_Task_Tracker_0 of framework
> >>> 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor
> at
> >> =
> >>> 24552
> >>> I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task
> >>> Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory
> >>
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> >>> I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file
> >>
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> >>> I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching
> >>> executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in
> >>
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> >>> with resources cpus=1; mem=1280 for framework
> >>> 201305130913-33597632-5050-3893-0000 in cgroup
> >>
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> >>> I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup
> >>> controls for executor executor_Task_Tracker_1 of framework
> >>> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> >>> I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated
> >> 'cpu.shares'
> >>> to 1024 for executor executor_Task_Tracker_1 of framework
> >>> 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated
> >>> 'memory.limit_in_bytes' to 1342177280 for executor
> >> executor_Task_Tracker_1
> >>> of framework 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started
> listening
> >>> for OOM events for executor executor_Task_Tracker_1 of framework
> >>> 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at 24628
> >>> I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor
> >>> executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> >>> terminated with status 256
> >>> I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor
> >>> executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is
> >>> triggered for executor executor_Task_Tracker_0 of framework
> >>> 201305130913-33597632-5050-3893-0000 with uuid
> >>> 6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM
> >>> notifier for executor executor_Task_Tracker_0 of framework
> >>> 201305130913-33597632-5050-3893-0000 with uuid
> >>> 6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup
> >>
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup
> >>
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> after 1 attempts
> >>> I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup
> >>
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed
> >>
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully
> >>> destroyed cgroup
> >>
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> >>> I0513 09:16:34.477439 24190 slave.cpp:1479] Executor
> >>> 'executor_Task_Tracker_0' of framework
> >> 201305130913-33597632-5050-3893-0000
> >>> has exited with status 1
> >>> I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update
> >>> TASK_LOST from task Task_Tracker_0 of framework
> >>> 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update
> >>> TASK_LOST from task Task_Tracker_0 of framework
> >>> 201305130913-33597632-5050-3893-0000 to the status update manager
> >>> I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update
> >>> resources for an unknown/killed executor
> >>> I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received
> >> status
> >>> update TASK_LOST from task Task_Tracker_0 of framework
> >>> 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating
> >>> StatusUpdate stream for task Task_Tracker_0 of framework
> >>> 201305130913-33597632-5050-3893-0000
> >>> I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling
> >> UPDATE
> >>> for status update TASK_
>

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
Yes, I also updated mapred-site.xml, but it still does not work.

I am using the git version, downloaded with: git clone git://git.apache.org/mesos.git

$ cd mesos
$ ./bootstrap
$ ./configure
$ make
$ cd hadoop
$ make hadoop-0.20.205.0

Then I deployed it on the real cluster.
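
(Before deploying, it may be worth sanity-checking the bundle the build produced; a minimal sketch, using the build path mentioned elsewhere in this thread, so adjust it to your tree:)

$ ls -l /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
$ tar tzf /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz | grep mesos-executor
# The slave launches the executor with "cd hadoop && ./bin/mesos-executor"
# (see the logs quoted below), so the listing should include a
# bin/mesos-executor entry under a top-level hadoop/ directory.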

I really do not know where the problem is; please help me with it.




Wang Yu

From: Vinod Kone
Sent: 2013-05-13 11:30
To: mesos-dev@incubator.apache.org
Cc: mesos-dev
Subject: Re: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
Hmm. You definitely need the right extension, but not the "hadoop" name. I'm assuming you also updated the file name in mapred-site.xml?
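
For example (a sketch using paths already shown in this thread; the name itself is arbitrary, but the extension has to survive the upload):

$ hadoop fs -copyFromLocal /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz /user/mesos/mesos-executor.tar.gz
$ hadoop fs -ls /user/mesos/
# mapred.mesos.executor should then point at the matching URI, e.g.
# hdfs://master/user/mesos/mesos-executor.tar.gz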

Also, I'm surprised that the slave logs do not show info about downloading the executor. What version of Mesos are you running?
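
One way to check (a sketch; the exact wording of the fetch messages varies between Mesos versions, so grep loosely):

$ grep -i fetch /home/mesos/build/logs/mesos-slave.INFO
$ grep -i hadoop.tar.gz /home/mesos/build/logs/mesos-slave.INFO
# No matches at all would mean the slave never even attempted to download
# the executor bundle, which points back at the mapred.mesos.executor value.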

@vinodkone
Sent from my mobile 

On May 12, 2013, at 7:59 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> I have uploaded the right file using:
> [root@master hadoop-0.20.205.0]# hadoop fs -mkdir mesos
> [root@master hadoop-0.20.205.0]# hadoop fs -copyFromLocal /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz /user/mesos/mesos-executor 
> 
> I have tried adding a file extension ("/user/mesos/mesos-executor" -> "/user/mesos/mesos-executor.tar.gz"), but it still does not work. Must the file be named hadoop.tar.gz?
> 
> 
> 
> 
> Wang Yu
> 
> From: Vinod Kone
> Sent: 2013-05-13 10:42
> To: mesos-dev@incubator.apache.org; wangyu
> Cc: Benjamin Mahler
> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
>> 
>>  <property>
>>    <name>mapred.mesos.executor</name>
>> #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
>>    <value>hdfs://master/user/mesos/mesos-executor</value>
>>  </property>
> 
> the mapred.mesos.executor property looks incorrect. the value should be
> where you have uploaded the "hadoop.tar.gz" bundle generated by the build
> (TUTORIAL.sh or make hadoop). you can find the generated "hadoop.tar.gz"
> bundle in the hadoop build directory. upload the bundle to an hdfs location
> and set the above property to that location.
> 
> vinod
> 
> 
> 
>> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
>>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
>>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
> >>> total 0
>>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
>>> .  ..
>>> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
>>> 2. I added "--isolation=cgroups" for slaves, but it still does not work.
>>> Tasks are always lost. There is no error any more, but I still do not know
>>> what happened to the executor... Logs on one slave are as follows. Please
>>> help me, thanks very much!
>>> 
>>> mesos-slave.INFO
>>> Log file created at: 2013/05/13 09:12:54
>>> Running on machine: slave1
>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>> I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
>>> I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by
>>> root
>>> I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
>>> I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@
>>> 192.168.0.3:36668
>>> I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24;
>>> mem=63356; ports=[31000-32000]; disk=29143
>>> I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as
>>> cgroups hierarchy root
>>> I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at
>>> master@192.168.0.2:5050
>>> I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file
>>> '/home/mesos/build/logs/mesos-slave.INFO'
>>> I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master
>>> detected at master@192.168.0.2:5050
>>> I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
>>> I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
>>> I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given
>>> slave ID 201305130913-33597632-5050-3893-0
>>> I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling
>>> '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
>>> I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling
>>> '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
>>> I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling
>>> '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
>>> I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling
>>> '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
>>> I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling
>>> '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
>>> I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling
>>> '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
>>> I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling
>>> '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
>>> I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%.
>> Max
>>> allowed age: 5.11days
>>> I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%.
>> Max
>>> allowed age: 5.11days
>>> I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%.
>> Max
>>> allowed age: 5.11days
>>> I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task
>>> Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
>>> I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
>>> I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching
>>> executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in
>> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495
>>> with resources cpus=1; mem=1280 for framework
>>> 201305130913-33597632-5050-3893-0000 in cgroup
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>>> I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup
>>> controls for executor executor_Task_Tracker_0 of framework
>>> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
>>> I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated
>> 'cpu.shares'
>>> to 1024 for executor executor_Task_Tracker_0 of framework
>>> 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
>>> 'memory.limit_in_bytes' to 1342177280 for executor
>> executor_Task_Tracker_0
>>> of framework 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
>>> 'memory.limit_in_bytes' to 1342177280 for executor
>> executor_Task_Tracker_0
>>> of framework 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening
>>> for OOM events for executor executor_Task_Tracker_0 of framework
>>> 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at 24552
>>> I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task
>>> Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
>>> I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file
>> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
>>> I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching
>>> executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in
>> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>>> with resources cpus=1; mem=1280 for framework
>>> 201305130913-33597632-5050-3893-0000 in cgroup
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
>>> I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup
>>> controls for executor executor_Task_Tracker_1 of framework
>>> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
>>> I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated
>> 'cpu.shares'
>>> to 1024 for executor executor_Task_Tracker_1 of framework
>>> 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated
>>> 'memory.limit_in_bytes' to 1342177280 for executor
>> executor_Task_Tracker_1
>>> of framework 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening
>>> for OOM events for executor executor_Task_Tracker_1 of framework
>>> 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at 24628
>>> I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor
>>> executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
>>> terminated with status 256
>>> I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor
>>> executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is
>>> triggered for executor executor_Task_Tracker_0 of framework
>>> 201305130913-33597632-5050-3893-0000 with uuid
>>> 6522748a-9d43-41b7-8f88-cd537a502495
>>> I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM
>>> notifier for executor executor_Task_Tracker_0 of framework
>>> 201305130913-33597632-5050-3893-0000 with uuid
>>> 6522748a-9d43-41b7-8f88-cd537a502495
>>> I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>>> I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>>> after 1 attempts
>>> I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>>> I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed
>> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>>> I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully
>>> destroyed cgroup
>> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
>>> I0513 09:16:34.477439 24190 slave.cpp:1479] Executor
>>> 'executor_Task_Tracker_0' of framework
>> 201305130913-33597632-5050-3893-0000
>>> has exited with status 1
>>> I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update
>>> TASK_LOST from task Task_Tracker_0 of framework
>>> 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update
>>> TASK_LOST from task Task_Tracker_0 of framework
>>> 201305130913-33597632-5050-3893-0000 to the status update manager
>>> I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update
>>> resources for an unknown/killed executor
>>> I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received
>> status
>>> update TASK_LOST from task Task_Tracker_0 of framework
>>> 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating
>>> StatusUpdate stream for task Task_Tracker_0 of framework
>>> 201305130913-33597632-5050-3893-0000
>>> I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling
>> UPDATE
>>> for status update TASK_

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
I have uploaded the right file using:
[root@master hadoop-0.20.205.0]# hadoop fs -mkdir mesos
[root@master hadoop-0.20.205.0]# hadoop fs -copyFromLocal /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz /user/mesos/mesos-executor 

I have tried adding a file extension ("/user/mesos/mesos-executor" -> "/user/mesos/mesos-executor.tar.gz"), but it still does not work. Must the file be named hadoop.tar.gz?




Wang Yu

From: Vinod Kone
Sent: 2013-05-13 10:42
To: mesos-dev@incubator.apache.org; wangyu
Cc: Benjamin Mahler
Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
>
>   <property>
>     <name>mapred.mesos.executor</name>
> #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
>     <value>hdfs://master/user/mesos/mesos-executor</value>
>   </property>
>

the mapred.mesos.executor property looks incorrect. the value should be
where you have uploaded the "hadoop.tar.gz" bundle generated by the build
(TUTORIAL.sh or make hadoop). you can find the generated "hadoop.tar.gz"
bundle in the hadoop build directory. upload the bundle to an hdfs location
and set the above property to that location.
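
a quick way to confirm both ends line up (a sketch; the mapred-site.xml path below is an assumption, adjust it to your install):

$ hadoop fs -ls hdfs://master/user/mesos/
$ grep -A 2 mapred.mesos.executor conf/mapred-site.xml
# the <value> printed by grep should match one of the hdfs entries exactly,
# including the .tar.gz extension.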

vinod



> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
> > total 0
> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
> > .  ..
> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
> > 2. I added "--isolation=cgroups" for slaves, but it still does not work.
> > Tasks are always lost. There is no error any more, but I still do not know
> > what happened to the executor... Logs on one slave are as follows. Please
> > help me, thanks very much!
> >
> > mesos-slave.INFO
> > Log file created at: 2013/05/13 09:12:54
> > Running on machine: slave1
> > Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> > I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
> > I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by
> > root
> > I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
> > I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@
> > 192.168.0.3:36668
> > I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24;
> > mem=63356; ports=[31000-32000]; disk=29143
> > I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as
> > cgroups hierarchy root
> > I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at
> > master@192.168.0.2:5050
> > I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file
> > '/home/mesos/build/logs/mesos-slave.INFO'
> > I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master
> > detected at master@192.168.0.2:5050
> > I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
> > I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
> > I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given
> > slave ID 201305130913-33597632-5050-3893-0
> > I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
> > I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
> > I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
> > I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
> > I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
> > I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
> > I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
> > I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%.
> Max
> > allowed age: 5.11days
> > I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%.
> Max
> > allowed age: 5.11days
> > I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%.
> Max
> > allowed age: 5.11days
> > I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task
> > Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> > I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> > I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching
> > executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in
> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495
> > with resources cpus=1; mem=1280 for framework
> > 201305130913-33597632-5050-3893-0000 in cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup
> > controls for executor executor_Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> > I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated
> 'cpu.shares'
> > to 1024 for executor executor_Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_0
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_0
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening
> > for OOM events for executor executor_Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at 24552
> > I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task
> > Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> > I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> > I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching
> > executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in
> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > with resources cpus=1; mem=1280 for framework
> > 201305130913-33597632-5050-3893-0000 in cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup
> > controls for executor executor_Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> > I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated
> 'cpu.shares'
> > to 1024 for executor executor_Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_1
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening
> > for OOM events for executor executor_Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at 24628
> > I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor
> > executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> > terminated with status 256
> > I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor
> > executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is
> > triggered for executor executor_Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM
> > notifier for executor executor_Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > after 1 attempts
> > I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully
> > destroyed cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.477439 24190 slave.cpp:1479] Executor
> > 'executor_Task_Tracker_0' of framework
> 201305130913-33597632-5050-3893-0000
> > has exited with status 1
> > I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update
> > TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update
> > TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000 to the status update manager
> > I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update
> > resources for an unknown/killed executor
> > I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received
> status
> > update TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating
> > StatusUpdate stream for task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling
> UPDATE
> > for status update TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding
> > status update TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000 to the master at
> > master@192.168.0.2:5050
> > I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status
> > update for task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received
> status
> > update acknowledgement for task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK
> > for status update TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.487547 24185 status_update_manager.cpp:434] Cleaning up
> > status update stream for task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.487788 24207 slave.cpp:1016] Status update manager
> > successfully handled status update acknowledgement for task
> Task_Tracker_0
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.488142 24202 gc.cpp:56] Scheduling
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> > for removal
> > I0513 09:16:35.063462 24199 slave.cpp:587] Got assigned task
> > Task_Tracker_2 for framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:35.066090 24199 paths.hpp:302] Created executor directory
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> > I0513 09:16:35.066673 24188 slave.cpp:436] Successfully attached file
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> > I0513 09:16:35.066985 24205 cgroups_isolator.cpp:525] Launching
> > executor_Task_Tracker_2 (cd hadoop && ./bin/mesos-executor) in
> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27
> > with resources cpus=1; mem=1280 for framework
> > 201305130913-33597632-5050-3893-0000 in cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:35.068594 24205 cgroups_isolator.cpp:670] Changing cgroup
> > controls for executor executor_Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> > I0513 09:16:35.069341 24205 cgroups_isolator.cpp:841] Updated
> 'cpu.shares'
> > to 1024 for executor executor_Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:35.070061 24205 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_2
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:35.070828 24205 cgroups_isolator.cpp:1005] Started listening
> > for OOM events for executor executor_Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:35.071966 24205 cgroups_isolator.cpp:555] Forked executor at 24704
> > I0513 09:16:40.464987 24197 cgroups_isolator.cpp:806] Executor
> > executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> > terminated with status 256
> > I0513 09:16:40.465175 24197 cgroups_isolator.cpp:635] Killing executor
> > executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.467118 24197 cgroups_isolator.cpp:1025] OOM notifier is
> > triggered for executor executor_Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.467269 24197 cgroups_isolator.cpp:1030] Discarded OOM
> > notifier for executor executor_Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.468596 24198 cgroups.cpp:1175] Trying to freeze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.468945 24198 cgroups.cpp:1214] Successfully froze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > after 1 attempts
> > I0513 09:16:40.471577 24200 cgroups.cpp:1190] Trying to thaw cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.471850 24200 cgroups.cpp:1298] Successfully thawed
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.480960 24185 cgroups_isolator.cpp:1144] Successfully
> > destroyed cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.481230 24196 slave.cpp:1479] Executor
> > 'executor_Task_Tracker_1' of framework
> 201305130913-33597632-5050-3893-0000
> > has exited with status 1
> > I0513 09:16:40.483572 24196 slave.cpp:1232] Handling status update
> > TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.483801 24196 slave.cpp:1280] Forwarding status update
> > TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000 to the status update manager
> > I0513 09:16:40.483846 24193 cgroups_isolator.cpp:666] Asked to update
> > resources for an unknown/killed executor
> > I0513 09:16:40.484094 24205 status_update_manager.cpp:254] Received
> status
> > update TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.484267 24205 status_update_manager.cpp:403] Creating
> > StatusUpdate stream for task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.484412 24205 status_update_manager.hpp:314] Handling
> UPDATE
> > for status update TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.484558 24205 status_update_manager.cpp:289] Forwarding
> > status update TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000 to the master at
> > master@192.168.0.2:5050
> > I0513 09:16:40.487229 24202 slave.cpp:979] Got acknowledgement of status
> > update for task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.487457 24196 status_update_manager.cpp:314] Received
> status
> > update acknowledgement for task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.487607 24196 status_update_manager.hpp:314] Handling ACK
> > for status update TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.487741 24196 status_update_manager.cpp:434] Cleaning up
> > status update stream for task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.487949 24207 slave.cpp:1016] Status update manager
> > successfully handled status update acknowledgement for task
> Task_Tracker_1
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.488278 24193 gc.cpp:56] Scheduling
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> > for removal
> > I0513 09:16:41.072098 24194 slave.cpp:587] Got assigned task
> > Task_Tracker_3 for framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:41.074632 24194 paths.hpp:302] Created executor directory
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
> > I0513 09:16:41.075546 24198 slave.cpp:436] Successfully attached file
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
> > I0513 09:16:41.076081 24194 cgroups_isolator.cpp:525] Launching
> > executor_Task_Tracker_3 (cd hadoop && ./bin/mesos-executor) in
> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642
> > with resources cpus=1; mem=1280 for framework
> > 201305130913-33597632-5050-3893-0000 in cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:41.077606 24194 cgroups_isolator.cpp:670] Changing cgroup
> > controls for executor executor_Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> > I0513 09:16:41.078402 24194 cgroups_isolator.cpp:841] Updated
> 'cpu.shares'
> > to 1024 for executor executor_Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:41.079186 24194 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_3
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:41.080008 24194 cgroups_isolator.cpp:1005] Started listening
> > for OOM events for executor executor_Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:41.081447 24194 cgroups_isolator.cpp:555] Forked executor at 24780
> > I0513 09:16:44.482589 24200 status_update_manager.cpp:379] Checking for
> > unacknowledged status updates
> > I0513 09:16:46.473145 24199 cgroups_isolator.cpp:806] Executor
> > executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
> > terminated with status 256
> > I0513 09:16:46.473307 24199 cgroups_isolator.cpp:635] Killing executor
> > executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.475491 24199 cgroups_isolator.cpp:1025] OOM notifier is
> > triggered for executor executor_Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.475649 24199 cgroups_isolator.cpp:1030] Discarded OOM
> > notifier for executor executor_Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.476820 24192 cgroups.cpp:1175] Trying to freeze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.477181 24192 cgroups.cpp:1214] Successfully froze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > after 1 attempts
> > I0513 09:16:46.479907 24201 cgroups.cpp:1190] Trying to thaw cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.480229 24201 cgroups.cpp:1298] Successfully thawed
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.493069 24200 cgroups_isolator.cpp:1144] Successfully
> > destroyed cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.493391 24184 slave.cpp:1479] Executor
> > 'executor_Task_Tracker_2' of framework
> 201305130913-33597632-5050-3893-0000
> > has exited with status 1
> > I0513 09:16:46.495689 24184 slave.cpp:1232] Handling status update
> > TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.495933 24184 slave.cpp:1280] Forwarding status update
> > TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000 to the status update manager
> > I0513 09:16:46.495980 24189 cgroups_isolator.cpp:666] Asked to update
> > resources for an unknown/killed executor
> > I0513 09:16:46.496305 24193 status_update_manager.cpp:254] Received
> status
> > update TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.496553 24193 status_update_manager.cpp:403] Creating
> > StatusUpdate stream for task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.496707 24193 status_update_manager.hpp:314] Handling
> UPDATE
> > for status update TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.496868 24193 status_update_manager.cpp:289] Forwarding
> > status update TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000 to the master at
> > master@192.168.0.2:5050
> > I0513 09:16:46.499631 24201 slave.cpp:979] Got acknowledgement of status
> > update for task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.499961 24193 status_update_manager.cpp:314] Received
> status
> > update acknowledgement for task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.500128 24193 status_update_manager.hpp:314] Handling ACK
> > for status update TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.500257 24193 status_update_manager.cpp:434] Cleaning up
> > status update stream for task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.500452 24192 slave.cpp:1016] Status update manager
> > successfully handled status update acknowledgement for task
> Task_Tracker_2
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.500743 24204 gc.cpp:56] Scheduling
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> > for removal
> > I0513 09:16:47.079013 24193 slave.cpp:587] Got assigned task
> > Task_Tracker_4 for framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:47.081650 24193 paths.hpp:302] Created executor directory
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
> > I0513 09:16:47.082447 24198 slave.cpp:436] Successfully attached file
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
> > I0513 09:16:47.082861 24194 cgroups_isolator.cpp:525] Launching
> > executor_Task_Tracker_4 (cd hadoop && ./bin/mesos-executor) in
> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> > with resources cpus=1; mem=1280 for framework
> > 201305130913-33597632-5050-3893-0000 in cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_4_tag_8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> > I0513 09:16:47.084478 24194 cgroups_isolator.cpp:670] Changing cgroup
> > controls for executor executor_Task_Tracker_4 of framework
> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> > I0513 09:16:47.085273 24194 cgroups_isolator.cpp:841] Updated
> 'cpu.shares'
> > to 1024 for executor executor_Task_Tracker_4 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:47.086045 24194 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_4
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:47.086853 24194 cgroups_isolator.cpp:1005] Started listening
> > for OOM events for executor executor_Task_Tracker_4 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:47.088227 24194 cgroups_isolator.cpp:555] Forked executor at
> =
> > 24856
> > I0513 09:16:50.485791 24194 status_update_manager.cpp:379] Checking for
> > unacknowledged status updates
> > I0513 09:16:52.480471 24185 cgroups_isolator.cpp:806] Executor
> > executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
> > terminated with status 256
> > I0513 09:16:52.480622 24185 cgroups_isolator.cpp:635] Killing executor
> > executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:52.482652 24185 cgroups_isolator.cpp:1025] OOM notifier is
> > triggered for executor executor_Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.482805 24185 cgroups_isolator.cpp:1030] Discarded OOM
> > notifier for executor executor_Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.484110 24195 cgroups.cpp:1175] Trying to freeze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.484447 24195 cgroups.cpp:1214] Successfully froze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > after 1 attempts
> > I0513 09:16:52.487893 24184 cgroups.cpp:1190] Trying to thaw cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.488129 24184 cgroups.cpp:1298] Successfully thawed
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.496047 24207 cgroups_isolator.cpp:1144] Successfully
> > destroyed cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.496247 24203 slave.cpp:1479] Executor
> > 'executor_Task_Tracker_3' of framework
> 201305130913-33597632-5050-3893-0000
> > has exited with status 1
> > I0513 09:16:52.498538 24203 slave.cpp:1232] Handling status update
> > TASK_LOST from task Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000
> > ......
> >
> >
> >
> >
> > Wang Yu
> >
> > From: Benjamin Mahler
> > Sent: 2013-05-11 02:32
> > To: wangyu
> > Cc: Benjamin Mahler; mesos-dev
> > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> > 1. If you look at a slave log, you can see that the process isolator
> > launched the task and then notified the slave that it was lost. Can you
> > look inside one of the executor directories? There should be an stderr
> > file there. E.g.:
> >
> > I0510 09:44:33.801655  7412 paths.hpp:302] Created executor directory
> >
> >
> '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1/frameworks/201305100938-33597632-5050-19520-0000/executors/executor_Task_Tracker_5/runs/2981a5c2-84e5-4868-9507-8aecb32ee163'
> >
> > Look for these in the logs and read the stderr present inside. Can you
> > report back with the contents?
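> >
> > For instance, something like this on the slave (a minimal sketch; the IDs
> > in the path are illustrative and should be taken from the "Created
> > executor directory" line in your own slave log):
> >
> > cat /tmp/mesos/slaves/<slave-id>/frameworks/<framework-id>/executors/executor_Task_Tracker_5/runs/latest/stderr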
> >
> > 2. Are you running on Linux? You may want to consider using
> > --isolation=cgroups when starting your slaves. This uses Linux control
> > groups to do process / cpu / memory isolation between executors running
> on
> > the slave.
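> >
> > For example, when starting a slave (a sketch; the --master value is an
> > assumption based on your logs, where the master runs at
> > master@192.168.0.2:5050):
> >
> > mesos-slave --master=master:5050 --isolation=cgroups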
> >
> > Thanks!
> >
> >
> > On Thu, May 9, 2013 at 7:07 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> > > **
> > > Hi Ben,
> > >
> > > Logs for the mesos master and slaves are attached; thanks for helping me
> > > with this problem. I really appreciate your patient reply.
> > >
> > > Three servers: "master", "slave1", "slave5"
> > > Mesos master: "master"
> > > Mesos slaves: "master", "slave1", "slave5"
> > >
> > > ------------------------------
> > > Wang Yu
> > >
> > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > *Sent:* 2013-05-10 07:22
> > > *To:* wangyu <wa...@nfs.iscas.ac.cn>
> > > *Cc:* mesos-dev <me...@incubator.apache.org>; Benjamin Mahler<
> > benjamin.mahler@gmail.com>
> > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > >  Ah I see them now, looks like you uploaded the NameNode logs? Can you
> > > upload the mesos-master and mesos-slave logs instead? What will be
> > > interesting here is what happened on the slave that is trying to run
> the
> > > TaskTracker.
> > >
> > >
> > > On Wed, May 8, 2013 at 8:32 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > >
> > > > **
> > >
> > > > I uploaded them in the previous email; I will send them again. PS: Does
> > > > the mailing list reject attachments?
> > > >
> > > > Can you see them?
> > > >
> > > > ------------------------------
> > > > Wang Yu
> > > >
> > > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > > *Sent:* 2013-05-09 10:00
> > > > *To:* mesos-dev@incubator.apache.org; wangyu <
> wangyu@nfs.iscas.ac.cn>
> > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > TaskTracker: http://slave5:50060
> > > >  Did you forget to attach them?
> > > >
> > > >
> > > > On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > >
> > > > > **
> > > > > OK.
> > > > > Logs are attached. I used Ctrl+C to stop the jobtracker when the
> > > > > TASK_LOST happened.
> > > > >
> > > > > Thanks very much for your help!
> > > > >
> > > > > ------------------------------
> > > > > Wang Yu
> > > > >
> > > > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > > > *Sent:* 2013-05-09 01:23
> > > > > *To:* mesos-dev@incubator.apache.org
> > > > > *Cc:* wangyu <wa...@nfs.iscas.ac.cn>
> > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> Unknown/exited
> > > > > TaskTracker: http://slave5:50060
> > > > >
> > > >
> > >
> > > > > Hey Brenden, are there any bugs in particular here that you're
> > referring to?
> > > > >
> > > > > Wang, can you provide the logs for the JobTracker, the slave, and
> the
> > > > > master?
> > > > >
> > > > >
> > > > > On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <
> > > > > brenden.matthews@airbedandbreakfast.com> wrote:
> > > > >
> > > > > > You may want to try Airbnb's dist of Mesos:
> > > > > >
> > > > > > https://github.com/airbnb/mesos/tree/testing
> > > > > >
> > >
> > > > > > A good number of these Mesos bugs have been fixed but aren't yet
> > merged
> > > > > > into upstream.
> > > > > >
> > > > > >
> > > > > > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
> wrote:
> > > > > >
> > > >
> > >
> > > > > > > The log on each slave of the lost task is : No executor found
> > with ID:
> > > > > > > executor_Task_Tracker_XXX.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Wang Yu
> > > > > > >
> > > > > > > From: 王瑜
> > > > > > > Sent: 2013-05-07 11:13
> > > > > > > To: mesos-dev
> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > Unknown/exited
> > > > > > > TaskTracker: http://slave5:50060
> > > > > > > Hi all,
> > > > > > >
> > > >
> > >
> > > > > > > I have tried adding file extension when upload executor as well
> > as the
> > > > > > > conf file, but it still can not work.
> > > > > > >
> > > > > > > And I have seen
> > > > > > >
> > > > >
> > > >
> > >
> > > > > >
> >
> /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > > > > > > but it is a null directory.
> > > > > > >
> > > > >
> > > >
> > >
> > > > > > > Is there any other logs I can read to know why the TASK_LOST
> > happened? I
> > > > > > > really need your help, thanks very much!
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Wang Yu
> > > > > > >
> > > > > > > From: Vinod Kone
> > > > > > > Sent: 2013-04-26 01:31
> > > > > > > To: mesos-dev@incubator.apache.org
> > > > > > > Cc: wangyu
> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > Unknown/exited
> > > > > > > TaskTracker: http://slave5:50060
> > > > > > > Also, you could look at the executor logs (default:
> > > > > > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why
> > the
> > > > > > >  TASK_LOST happened.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> > > > > > > benjamin.mahler@gmail.com> wrote:
> > > > > > >
> > > >
> > >
> > > > > > > Can you maintain the file extension? That is how mesos knows to
> > extract
> > > > > > it:
> > > > > > > hadoop fs -copyFromLocal
> > > > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > > /user/mesos/mesos-executor.tar.gz
> > > > > > >
> > > > > > > Also make sure your mapred-site.xml has the extension as well.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wangyu@nfs.iscas.ac.cn
> > > > wrote:
> > > > > > >
> > > > > > > > Hi, Ben,
> > > > > > > >
> > > > > > > > I have tried as you said, but It still can not work.
> > > > > > > > I have upload mesos-executor using: hadoop fs -copyFromLocal
> > > > > > > >
> /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > > > /user/mesos/mesos-executor
> > > > > > > > Did I do the right thing? Thanks very much!
> > > > > > > >
> > > > > > > > The log in jobtracker is:
> > > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > > > > > > > Task_Tracker_82 on http://slave1:31000
> > > > >
> > > >
> > >
> > > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map
> > and reduce
> > > > > > > > slots needed.
> > > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update
> of
> > > > > > > > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker
> Status
> > > > > > > >       Pending Map Tasks: 2
> > > > > > > >    Pending Reduce Tasks: 1
> > > > > > > >          Idle Map Slots: 0
> > > > > > > >       Idle Reduce Slots: 0
> > > > > > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > > > > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > > > > > >        Needed Map Slots: 2
> > > > > > > >     Needed Reduce Slots: 1
> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > > > > > > > Task_Tracker_83 on http://slave1:31000
> > > > >
> > > >
> > >
> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map
> > and reduce
> > > > > > > > slots needed.
> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update
> of
> > > > > > > > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > > > > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker
> Status
> > > > > > > >       Pending Map Tasks: 2
> > > > > > > >    Pending Reduce Tasks: 1
> > > > > > > >          Idle Map Slots: 0
> > > > > > > >       Idle Reduce Slots: 0
> > > > > > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > > > > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > > > > > >        Needed Map Slots: 2
> > > > > > > >     Needed Reduce Slots: 1
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Wang Yu
> > > > > > > >
> > > > > > > > From: Benjamin Mahler
> > > > > > > > Sent: 2013-04-24 07:49
> > > > > > > > To: mesos-dev@incubator.apache.org; wangyu
> > >
> > > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > Unknown/exited
> > > > > > > > TaskTracker: http://slave5:50060
> > > > >
> > > >
> > >
> > > > > > > > You need to instead upload the hadoop.tar.gz generated by the
> > tutorial.
> > > > >
> > > >
> > >
> > > > > > > > Then point the conf file to the hdfs directory (you had the
> > right idea,
> > > > > > > > just uploaded the wrong file). :)
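> > > > > > > >
> > > > > > > > For example (a sketch; the build path and hdfs destination are
> > > > > > > > illustrative):
> > > > > > > >
> > > > > > > > hadoop fs -copyFromLocal \
> > > > > > > >     /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz \
> > > > > > > >     /user/mesos/hadoop.tar.gz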
> > > > > > > >
> > > > > > > > Can you try that and report back?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wangyu@nfs.iscas.ac.cn
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Guodong,
> > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > There are still problems on my side; I think there is some
> > > > > > > > > problem with my executor setting.
> > > > > > > > >
> > > > > > > > > In mapred-site.xml, I set the following ("master" is the
> > > > > > > > > hostname of the mesos master):
> > > > > > > > >   <property>
> > > > > > > > >     <name>mapred.mesos.executor</name>
> > > > > > > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > > > > > > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > > > > > > > >   </property>
> > > > > > > > >
> > > > > > > > > And I upload mesos-executor in /user/mesos/mesos-executor
> > > > > > > > >
> > > > > > > > > The head content is as follows:
> > > > > > > > >
> > > > > > > > > #! /bin/sh
> > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > # mesos-executor - temporary wrapper script for
> > .libs/mesos-executor
> > > > > > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > > > > > > > #
> > > >
> > >
> > > > > > > > > # The mesos-executor program cannot be directly executed
> > until all
> > > > > > the
> > > > > > > > > libtool
> > > > > > > > > # libraries that it depends on are installed.
> > > > > > > > > #
> > > > > > > > > # This wrapper script should never be moved out of the
> build
> > > > > > directory.
> > > > > > > > > # If it is, it will not operate correctly.
> > > > > > > > >
> > > > > > > > > # Sed substitution that helps us do robust quoting.  It
> > > > > > backslashifies
> > > > >
> > > >
> > >
> > > > > > > > > # metacharacters that are still active within double-quoted
> > strings.
> > > > > > > > > Xsed='/bin/sed -e 1s/^X//'
> > > > > > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > > > > > > > >
> > > > > > > > > # Be Bourne compatible
> > > > >
> > > >
> > >
> > > > > > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null
> > 2>&1; then
> > > > > > > > >   emulate sh
> > > > > > > > >   NULLCMD=:
> > > > > > > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"},
> > which
> > > > > > > > >   # is contrary to our usage.  Disable this feature.
> > > > > > > > >   alias -g '${1+"$@"}'='"$@"'
> > > > > > > > >   setopt NO_GLOB_SUBST
> > > > > > > > > else
> > > > > > > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;;
> esac
> > > > > > > > > fi
> > > > > > > > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > > > > > > > DUALCASE=1; export DUALCASE # for MKS sh
> > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > # The HP-UX ksh and POSIX shell print the target directory
> > to stdout
> > > > > > > > > # if CDPATH is set.
> > > > > > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > > > > > > > >
> > > > > > > > > relink_command="(cd /home/mesos/build/src; { test -z
> > > >
> > >
> > > > > > > > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || {
> > LIBRARY_PATH=;
> > > > > > > export
> > >
> > > > > > > > > LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" ||
> > unset
> > > > >
> > > >
> > >
> > > > > > > > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; };
> > }; { test
> > > > > > > -z
> > > > > > > > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || {
> > > > > > > > GCC_EXEC_PREFIX=;
> > > >
> > >
> > > > > > > > > export GCC_EXEC_PREFIX; }; }; { test -z
> > \"\${LD_RUN_PATH+set}\" ||
> > > > > > > unset
> > > > > > > > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> > > > > >
> >
> LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/;
> > > > > > > > > export LD_LIBRARY_PATH;
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> > > > > >
> >
> PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin;
> > > > > > > > > export PATH; g++ -g -g2 -O2 -o \$progdir/\$file
> > > > > > > > > launcher/mesos_executor-executor.o  ./.libs/libmesos.so
> > > > >
> > > >
> > >
> > > > > > > > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread
> > -lcurl -lssl
> > > > >
> > > >
> > >
> > > > > > > > > -lcrypto -lz -lrt -pthread -Wl,-rpath
> > -Wl,/home/mesos/build/src/.libs
> > > > > > > > > -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > >
> > >
> > > > > > > > > Did I upload the right file, and set it up correctly in the
> > > > > > > > > conf file? Thanks very much!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Wang Yu
> > > > > > > > >
> > > > > > > > > From: 王国栋
> > > > > > > > > Date: 2013-04-23 13:32
> > > > > > > > > To: wangyu
> > > > > > > > > CC: mesos-dev
> > > > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > Unknown/exited
> > > > > > > > > TaskTracker: http://slave5:50060
> > > > > > > > > Hmm. it seems that the mapred.mesos.master is set
> correctly.
> > > > > > > > >
> > >
> > > > > > > > > if you run hadoop in local mode, the following setting is ok:
> > > > > > > > >   <property>
> > > > > > > > >     <name>mapred.mesos.master</name>
> > > > > > > > >     <value>local</value>
> > > > > > > > >   </property>
> > > > > > > > >
> > >
> > > > > > > > > if you want to start the cluster, set mapred.mesos.master to
> > > > > > > > > mesos-master-hostname:mesos-master-port, as in the sketch below.
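> > > > > > > > >
> > > > > > > > > For example (a sketch; host and port are illustrative, assuming
> > > > > > > > > the mesos master listens on master:5050):
> > > > > > > > >   <property>
> > > > > > > > >     <name>mapred.mesos.master</name>
> > > > > > > > >     <value>master:5050</value>
> > > > > > > > >   </property>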
> > > > > > > > >
> > >
> > > > > > > > > Make sure the dns resolution of mesos-master-hostname gives the
> > > > > > > > > right ip.
> > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > BTW: when starting the jobtracker, you can check the mesos web
> > > > > > > > > UI to see if the hadoop framework is registered.
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > > Guodong
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <
> wangyu@nfs.iscas.ac.cn
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > **
> > > > > > > > > > Hi, Guodong,
> > > > > > > > > >
> > > > > > > > > > I started hadoop as you said, and then I saw this error:
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from
> > scheduler
> > > > > > > > > driver: Cannot parse
> > > > > > > > > > '@0.0.0.0:0'
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > What does this mean? Where should I change the MesosScheduler
> > > > > > > > > > code to fix this?
> > > > > > > > > >
> > > > > > > > > > Thanks very much! I am so sorry to interrupt you once again...
> > > > > > > > > >
> > > > > > > > > > The whole log is as follows:
> > > > > > > > > >
> > > > > > > > > >  [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > > > > > > >
> > /************************************************************
> > > > > > > > > > STARTUP_MSG: Starting JobTracker
> > > > > > > > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > > > > > > > STARTUP_MSG:   args = []
> > > > > > > > > > STARTUP_MSG:   version = 0.20.205.0
> > > > > > > > > >
> > > > > > > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat
> Apr
> > 13
> > > > > > > 11:19:33
> > > > > > > > > CST 2013
> > > > > > > > > >
> > ************************************************************/
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded
> > properties from
> > > > > > > > > hadoop-metrics2.properties
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > > > MetricsSystem,sub=Stats registered.
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled
> > snapshot
> > > > > > > > period
> > > > > > > > > at 10 second(s).
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker
> > metrics
> > > > > > > > system
> > > > > > > > > started
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > > > QueueMetrics,q=default registered.
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > ugi
> > > > > > > > > registered.
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:04 INFO
> > > > > > > delegation.AbstractDelegationTokenSecretManager:
> > >
> > > > > > > > > Updating the current master key for generating delegation
> > tokens
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:04 INFO
> > > > > > > delegation.AbstractDelegationTokenSecretManager:
> > > > > > > > > Starting expired delegation token remover thread,
> > > > > > > > > tokenRemoverScanInterval=60 min(s)
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler
> > configured with
> > > > > > > > > (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> > > > > > > limitMaxMemForMapTasks,
> > > > > > > > > limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:04 INFO
> > > > > > > delegation.AbstractDelegationTokenSecretManager:
> > >
> > > > > > > > > Updating the current master key for generating delegation
> > tokens
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing
> > hosts
> > > > > > > > > (include/exclude) list
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting
> > jobtracker with
> > > > > > > > owner
> > > > > > > > > as root
> > > > > > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > > > RpcDetailedActivityForPort9001 registered.
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > > > RpcActivityForPort9001 registered.
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to
> > > > > > > > > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> > > > > > > > > org.mortbay.log.Slf4jLog
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global
> > filtersafety
> > > > > > > > >
> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by
> > >
> > > > > > > > > webServer.getConnectors()[0].getLocalPort() before open()
> is
> > -1.
> > > > > > > Opening
> > > > > > > > > the listener on 50030
> > > > > > > > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer:
> > listener.getLocalPort()
> > > > > > > > returned
> > >
> > > > > > > > > 50030 webServer.getConnectors()[0].getLocalPort() returned
> > 50030
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to
> > port 50030
> > > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started
> > > > > > > > > > SelectChannelConnector@0.0.0.0:50030
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > jvm
> > > > > > > > > registered.
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > > > JobTrackerMetrics registered.
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up
> > at: 9001
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker
> > webserver:
> > > > > > 50030
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the
> > system
> > > > > > > > > directory
> > > > > > > > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server
> > being
> > > > > > > > > initialized in embedded mode
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started
> > job history
> > > > > > > > > server at: localhost:50030
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History
> > Server web
> > > > > > > > > address: localhost:50030
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore:
> > Completed
> > > > > > job
> > > > > > > > > store is inactive
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting
> > > > > > MesosScheduler
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing
> hosts
> > > > > > > information
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from
> > scheduler
> > > > > > > > > driver: Cannot parse '@
> > > > > > > > > > 0.0.0.0:0'
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the
> > includes
> > > > > > > file
> > > > > > > > to
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the
> > excludes
> > > > > > > file
> > > > > > > > to
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing
> > hosts
> > > > > > > > > (include/exclude) list
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning
> > 0 nodes
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder:
> > starting
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7
> on
> > 9001:
> > > > > > > > starting
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting
> RUNNING
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9
> on
> > 9001:
> > > > > > > > starting
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to
> > load
> > > > >
> > > >
> > >
> > > > > > > > > native-hadoop library for your platform... using
> > builtin-java classes
> > > > > > > > where
> > > > > > > > > applicable
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress:
> > job_201304231321_0001:
> > > > > > > > > nMaps=0 nReduces=0 max=-1
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job
> > > > > > > > > job_201304231321_0001
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job
> > job_201304231321_0001
> > > > > > > > > added successfully for user 'root' to queue 'default'
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root
> > > > > > >  IP=192.168.0.2
> > > > > > > > >  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001
> > > > > >  RESULT=SUCCESS
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing
> > > > > > > > > job_201304231321_0001
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing
> > > > > > > > > job_201304231321_0001
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken
> > generated and
> > > > > > > > > stored with users keys in
> > >
> > > > > > > > >
> > /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > > > > > >
> > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size
> > for job
> > > > > > > > > job_201304231321_0001 = 0. Number of splits = 0
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job
> > > > > > > job_201304231321_0001
> > > > > > > > > initialized successfully with 0 map tasks and 0 reduce
> tasks.
> > > > > > > > > >
> > > > > > > > > > ------------------------------
> > > > > > > > > > Wang Yu
> > > > > > > > > >
> > > > > > > > > >  *From:* 王国栋 <wa...@gmail.com>
> > > > > > > > > > *Date:* 2013-04-23 11:34
> > > > > > > > > > *To:* mesos-dev <me...@incubator.apache.org>;
> wangyu<
> > > > > > > > > wangyu@nfs.iscas.ac.cn>
> > > > > > > > > > *Subject:* Re: Re:
> org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > > Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > > > >  Hi Yu,
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > Mesos will just launch a tasktracker on each slave node as
> > > > > > > > > > long as the required resources are enough for the tasktracker.
> > > > > > > > > > So you have to run the NameNode, Jobtracker and DataNode on
> > > > > > > > > > your own.
> > > > > > > > > >
> > > > > > > > > > Basically, starting hadoop on mesos is like this.
> > > > > > > > > > 1. start the dfs, using hadoop/bin/start-dfs.sh (you should
> > > > > > > > > > configure core-site.xml and hdfs-site.xml). dfs is no
> > > > > > > > > > different from the normal one.
> > > >
> > >
> > > > > > > > > > 2. start the jobtracker, using hadoop/bin/hadoop jobtracker
> > > > > > > > > > (you should configure mapred-site.xml, and this jobtracker
> > > > > > > > > > should contain the patch for mesos); see the sketch below.
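> > > > > > > > > >
> > > > > > > > > > A minimal sketch of those two steps (assuming you run them
> > > > > > > > > > from the patched hadoop build directory):
> > > > > > > > > >
> > > > > > > > > > # 1. start the dfs (namenode and datanodes):
> > > > > > > > > > ./bin/start-dfs.sh
> > > > > > > > > > # 2. start the mesos-patched jobtracker:
> > > > > > > > > > ./bin/hadoop jobtracker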
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > Then, you can use mesos web UI and jobtracker web UI to
> > check the
> > > > > > > > status
> > > > > > > > > > of Jobtracker.
> > > > > > > > > >
> > > > > > > > > >  Guodong
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <
> > wangyu@nfs.iscas.ac.cn
> > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > >> Oh, yes, I started my hadoop using "start-all.sh". I know
> > > > > > > > > >> what my problem is. Thanks very much!
> > > > > > > > > >>
> > > > >
> > > >
> > >
> > > > > > > > > >> ps: Besides the TaskTracker, are there any other roles (like
> > > > > > > > > >> JobTracker, DataNode) that I should stop first?
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> Wang Yu
> > > > > > > > > >>
> > > > > > > > > >> From: Benjamin Mahler
> > > > > > > > > >> Sent: 2013-04-23 10:56
> > > > > > > > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > > > > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > Unknown/exited
> > > > > > > > > >> TaskTracker: http://slave5:50060
> > > > > > > > > >>  The scheduler we wrote for Hadoop will start its own
> > > > > > TaskTrackers,
> > > > > > > > > >> meaning
> > > > > > > > > >> you do not have to start any TaskTrackers yourself.
> > > > > > > > > >>
> > > > >
> > > >
> > >
> > > > > > > > > >> Are you starting your own TaskTrackers? Are there any
> > TaskTrackers
> > > > > > > > > running
> > > > > > > > > >> in your cluster?
> > > > > > > > > >>
> > > > > > > > > >> Looking at your jps output, is there already a
> TaskTracker
> > > > > > running?
> > > > > > > > > >> [root@master logs]# jps
> > > > > > > > > >> 13896 RunJar
> > > > > > > > > >> 14123 Jps
> > > > > > > > > >> 12718 NameNode
> > > > > > > > > >> 12900 DataNode
> > > > > > > > > >> 13374 TaskTracker  <--- How was this started?
> > > > > > > > > >> 13218 JobTracker
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <
> > wangyu@nfs.iscas.ac.cn
> > > >
> > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi, Ben and Guodong,
> > > > > > > > > >> >
> > > > >
> > > >
> > >
> > > > > > > > > >> > What do you mean by "managing your own TaskTrackers"? How
> > > > > > > > > >> > should I know whether I have managed my own TaskTrackers?
> > > > > > > > > >> > Sorry, I am not very familiar with mesos.
> > > > > > > > > >> > Does it mean I do not need to configure hdfs-site.xml and
> > > > > > > > > >> > core-site.xml in hadoop? I do not want to run my own
> > > > > > > > > >> > TaskTracker; I just want to set up hadoop on mesos and run
> > > > > > > > > >> > my MR tasks.
> > > > > > > > > >> >
> > > >
> > >
> > > > > > > > > >> > Thanks very much for your patient reply...Maybe I have
> > a long
> > > > > > way
> > > > > > > to
> > > > > > > > > >> go...
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > The log messages you see:
> > > > > > > > > >> > 2013-04-18 16:47:19,645 INFO
> > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> >
> > > >
> > >
> > > > > > > > > >> > are printed when mesos does not know about the
> > TaskTracker. We
> > > > > > > > > currently
> > > > > > > > > >> > don't support running your own TaskTrackers, as the
> > > > > > MesosScheduler
> > > > > > > > > will
> > > > > > > > > >> > launch them on your behalf when needed.
> > > > > > > > > >> >
> > >
> > > > > > > > > >> > Are you managing your own TaskTrackers? The purpose of
> > using
> > > > > > > Hadoop
> > > > > > > > > with
> > > > >
> > > >
> > >
> > > > > > > > > >> > mesos is that you no longer have to do that. We will
> > detect that
> > > > > > > > jobs
> > > > > > > > > >> have
> > > > >
> > > >
> > >
> > > > > > > > > >> > pending map / reduce tasks and launch TaskTrackers
> > accordingly.
> > > > > > > > > >> >
> > > > > > > > > >> > Guodong may be able to help further getting set up!
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > Wang Yu
> > > > > > > > > >> >
> > > > > > > > > >> > From: 王国栋
> > > > > > > > > >> > Date: 2013-04-18 17:10
> > > > > > > > > >> > To: mesos-dev; wangyu
> > > > > > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > Unknown/exited
> > > > > > > > > >> > TaskTracker: http://slave5:50060
> > > > >
> > > >
> > >
> > > > > > > > > >> > You can check the slave log and the mesos-executor log,
> > > > > > > > > >> > which is normally located in a dir like
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> > > > > >
> >
> "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > > > > > > >> > The log is tasktracker log.
> > > > > > > > > >> >
> > > > > > > > > >> > I hope it will help.
> > > > > > > > > >> >
> > > > > > > > > >> > Guodong
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <
> > > wangyu@nfs.iscas.ac.cn
> > > > >
> > > > > > > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > **
> > > > > > > > > >> > > Hi All,
> > > > > > > > > >> > >
> > > >
> > >
> > > > > > > > > >> > > I have deployed mesos on three nodes: master, slave1, and
> > > > > > > > > >> > > slave5, and it works well.
> > >
> > > > > > > > > >> > >  Then I set up hadoop over it, using master as the
> > > > > > > > > >> > > namenode, and master, slave1, and slave5 as datanodes.
> > > > > > > > > >> > > When I run 'jps', it looks like it works well.
> > > > > > > > > >> > >  [root@master logs]# jps
> > > > > > > > > >> > > 13896 RunJar
> > > > > > > > > >> > > 14123 Jps
> > > > > > > > > >> > > 12718 NameNode
> > > > > > > > > >> > > 12900 DataNode
> > > > > > > > > >> > > 13374 TaskTracker
> > > > > > > > > >> > > 13218 JobTracker
> > > > > > > > > >> > >
> > > > > > > > > >> > > Then I ran the test benchmark, and it stopped making
> > > > > > > > > >> > > progress...
> > > > > > > > > >> > >  [root@master
> > > > > > > > > >> > >  hadoop-0.20.205.0]# bin/hadoop jar
> > > > > > > hadoop-examples-0.20.205.0.jar
> > > > > > > > > >> > randomwriter -Dtest.randomwrite.bytes_per_map=6710886
> > > > > > > > > >> > -Dtest.randomwriter.maps_per_host=10 rand
> > > > > > > > > >> > > Running 30 maps.
> > > > > > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > > > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running
> job:
> > > > > > > > > >> > job_201304181646_0001
> > >
> > > > > > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0%
> > reduce 0%
> > > > > > > > > >> > > It stopped here.
> > > > > > > > > >> > >
> > > >
> > >
> > > > > > > > > >> > > Then I read the log file
> > > > > > > > > >> > > hadoop-root-jobtracker-master.log, which shows:
> > > > > > > > > >> > >  2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > > > > > > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > > > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > > > > > > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > > > > > > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:04,609 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://master:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:07,618 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://master:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:10,625 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://master:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:13,632 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://master:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:13,686 INFO
> > > > > > > > org.apache.hadoop.net.NetworkTopology:
> > > > > > > > > >> > Adding a new node: /default-rack/slave5
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:13,686 INFO
> > > > > > > org.apache.hadoop.mapred.JobTracker:
> > > > > > > > > >> Adding
> > > > > > > > > >> > tracker tracker_slave5:
> > > > > > > > > >> > > 127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:13,687 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://slave5:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:16,638 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://master:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:16,697 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://slave5:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:19,645 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://master:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:19,707 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://slave5:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:22,651 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://master:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:22,715 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://slave5:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:25,658 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://master:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:25,725 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://slave5:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > 2013-04-18 16:47:28,665 INFO
> > > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > > > >> > > http://master:50060.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Does anybody can help me? Thanks very much!
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
>

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by Vinod Kone <vi...@twitter.com>.
>
>   <property>
>     <name>mapred.mesos.executor</name>
> #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
>     <value>hdfs://master/user/mesos/mesos-executor</value>
>   </property>
>

the mapred.mesos.executor property looks incorrect. the value should be
the location where you have uploaded the "hadoop.tar.gz" bundle generated by
the tutorial (TUTORIAL.sh or make hadoop). you can find the generated
"hadoop.tar.gz" bundle in the hadoop build directory. upload the bundle to an
hdfs location and set the above property to that location, as sketched below.

vinod



> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
> > total 0
> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
> > .  ..
> > [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
> > 2. I added "--isolation=cgroups" for the slaves, but it still does not
> > work. Tasks are always lost. But there is no error any more, and I still
> > do not know what happened to the executor... Logs from one slave are as
> > follows. Please help me, thanks very much!
> >
> > mesos-slave.INFO
> > Log file created at: 2013/05/13 09:12:54
> > Running on machine: slave1
> > Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> > I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
> > I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by
> > root
> > I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
> > I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@
> > 192.168.0.3:36668
> > I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24;
> > mem=63356; ports=[31000-32000]; disk=29143
> > I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as
> > cgroups hierarchy root
> > I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at
> > master@192.168.0.2:5050
> > I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file
> > '/home/mesos/build/logs/mesos-slave.INFO'
> > I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master
> > detected at master@192.168.0.2:5050
> > I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
> > I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
> > I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given
> > slave ID 201305130913-33597632-5050-3893-0
> > I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
> > I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
> > I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
> > I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
> > I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
> > I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
> > I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling
> > '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
> > I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%.
> Max
> > allowed age: 5.11days
> > I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%.
> Max
> > allowed age: 5.11days
> > I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%.
> Max
> > allowed age: 5.11days
> > I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task
> > Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> > I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> > I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching
> > executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in
> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495
> > with resources cpus=1; mem=1280 for framework
> > 201305130913-33597632-5050-3893-0000 in cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup
> > controls for executor executor_Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> > I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated
> 'cpu.shares'
> > to 1024 for executor executor_Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_0
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening
> > for OOM events for executor executor_Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at
> =
> > 24552
> > I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task
> > Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> > I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> > I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching
> > executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in
> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > with resources cpus=1; mem=1280 for framework
> > 201305130913-33597632-5050-3893-0000 in cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup
> > controls for executor executor_Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> > I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated
> 'cpu.shares'
> > to 1024 for executor executor_Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_1
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening
> > for OOM events for executor executor_Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at
> =
> > 24628
> > I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor
> > executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> > terminated with status 256
> > I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor
> > executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is
> > triggered for executor executor_Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM
> > notifier for executor executor_Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > after 1 attempts
> > I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully
> > destroyed cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> > I0513 09:16:34.477439 24190 slave.cpp:1479] Executor
> > 'executor_Task_Tracker_0' of framework
> 201305130913-33597632-5050-3893-0000
> > has exited with status 1
> > I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update
> > TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update
> > TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000 to the status update manager
> > I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update
> > resources for an unknown/killed executor
> > I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received
> status
> > update TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating
> > StatusUpdate stream for task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling
> UPDATE
> > for status update TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding
> > status update TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000 to the master at
> > master@192.168.0.2:5050
> > I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status
> > update for task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received
> status
> > update acknowledgement for task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK
> > for status update TASK_LOST from task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.487547 24185 status_update_manager.cpp:434] Cleaning up
> > status update stream for task Task_Tracker_0 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.487788 24207 slave.cpp:1016] Status update manager
> > successfully handled status update acknowledgement for task
> Task_Tracker_0
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:34.488142 24202 gc.cpp:56] Scheduling
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> > for removal
> > I0513 09:16:35.063462 24199 slave.cpp:587] Got assigned task
> > Task_Tracker_2 for framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:35.066090 24199 paths.hpp:302] Created executor directory
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> > I0513 09:16:35.066673 24188 slave.cpp:436] Successfully attached file
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> > I0513 09:16:35.066985 24205 cgroups_isolator.cpp:525] Launching
> > executor_Task_Tracker_2 (cd hadoop && ./bin/mesos-executor) in
> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27
> > with resources cpus=1; mem=1280 for framework
> > 201305130913-33597632-5050-3893-0000 in cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:35.068594 24205 cgroups_isolator.cpp:670] Changing cgroup
> > controls for executor executor_Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> > I0513 09:16:35.069341 24205 cgroups_isolator.cpp:841] Updated
> 'cpu.shares'
> > to 1024 for executor executor_Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:35.070061 24205 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_2
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:35.070828 24205 cgroups_isolator.cpp:1005] Started listening
> > for OOM events for executor executor_Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:35.071966 24205 cgroups_isolator.cpp:555] Forked executor at 24704
> > I0513 09:16:40.464987 24197 cgroups_isolator.cpp:806] Executor
> > executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> > terminated with status 256
> > I0513 09:16:40.465175 24197 cgroups_isolator.cpp:635] Killing executor
> > executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.467118 24197 cgroups_isolator.cpp:1025] OOM notifier is
> > triggered for executor executor_Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.467269 24197 cgroups_isolator.cpp:1030] Discarded OOM
> > notifier for executor executor_Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.468596 24198 cgroups.cpp:1175] Trying to freeze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.468945 24198 cgroups.cpp:1214] Successfully froze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > after 1 attempts
> > I0513 09:16:40.471577 24200 cgroups.cpp:1190] Trying to thaw cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.471850 24200 cgroups.cpp:1298] Successfully thawed
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.480960 24185 cgroups_isolator.cpp:1144] Successfully
> > destroyed cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> > I0513 09:16:40.481230 24196 slave.cpp:1479] Executor
> > 'executor_Task_Tracker_1' of framework
> 201305130913-33597632-5050-3893-0000
> > has exited with status 1
> > I0513 09:16:40.483572 24196 slave.cpp:1232] Handling status update
> > TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.483801 24196 slave.cpp:1280] Forwarding status update
> > TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000 to the status update manager
> > I0513 09:16:40.483846 24193 cgroups_isolator.cpp:666] Asked to update
> > resources for an unknown/killed executor
> > I0513 09:16:40.484094 24205 status_update_manager.cpp:254] Received
> status
> > update TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.484267 24205 status_update_manager.cpp:403] Creating
> > StatusUpdate stream for task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.484412 24205 status_update_manager.hpp:314] Handling
> UPDATE
> > for status update TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.484558 24205 status_update_manager.cpp:289] Forwarding
> > status update TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000 to the master at
> > master@192.168.0.2:5050
> > I0513 09:16:40.487229 24202 slave.cpp:979] Got acknowledgement of status
> > update for task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.487457 24196 status_update_manager.cpp:314] Received
> status
> > update acknowledgement for task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.487607 24196 status_update_manager.hpp:314] Handling ACK
> > for status update TASK_LOST from task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.487741 24196 status_update_manager.cpp:434] Cleaning up
> > status update stream for task Task_Tracker_1 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.487949 24207 slave.cpp:1016] Status update manager
> > successfully handled status update acknowledgement for task
> Task_Tracker_1
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:40.488278 24193 gc.cpp:56] Scheduling
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> > for removal
> > I0513 09:16:41.072098 24194 slave.cpp:587] Got assigned task
> > Task_Tracker_3 for framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:41.074632 24194 paths.hpp:302] Created executor directory
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
> > I0513 09:16:41.075546 24198 slave.cpp:436] Successfully attached file
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
> > I0513 09:16:41.076081 24194 cgroups_isolator.cpp:525] Launching
> > executor_Task_Tracker_3 (cd hadoop && ./bin/mesos-executor) in
> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642
> > with resources cpus=1; mem=1280 for framework
> > 201305130913-33597632-5050-3893-0000 in cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:41.077606 24194 cgroups_isolator.cpp:670] Changing cgroup
> > controls for executor executor_Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> > I0513 09:16:41.078402 24194 cgroups_isolator.cpp:841] Updated
> 'cpu.shares'
> > to 1024 for executor executor_Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:41.079186 24194 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_3
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:41.080008 24194 cgroups_isolator.cpp:1005] Started listening
> > for OOM events for executor executor_Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:41.081447 24194 cgroups_isolator.cpp:555] Forked executor at 24780
> > I0513 09:16:44.482589 24200 status_update_manager.cpp:379] Checking for
> > unacknowledged status updates
> > I0513 09:16:46.473145 24199 cgroups_isolator.cpp:806] Executor
> > executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
> > terminated with status 256
> > I0513 09:16:46.473307 24199 cgroups_isolator.cpp:635] Killing executor
> > executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.475491 24199 cgroups_isolator.cpp:1025] OOM notifier is
> > triggered for executor executor_Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.475649 24199 cgroups_isolator.cpp:1030] Discarded OOM
> > notifier for executor executor_Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.476820 24192 cgroups.cpp:1175] Trying to freeze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.477181 24192 cgroups.cpp:1214] Successfully froze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > after 1 attempts
> > I0513 09:16:46.479907 24201 cgroups.cpp:1190] Trying to thaw cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.480229 24201 cgroups.cpp:1298] Successfully thawed
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.493069 24200 cgroups_isolator.cpp:1144] Successfully
> > destroyed cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> > I0513 09:16:46.493391 24184 slave.cpp:1479] Executor
> > 'executor_Task_Tracker_2' of framework
> 201305130913-33597632-5050-3893-0000
> > has exited with status 1
> > I0513 09:16:46.495689 24184 slave.cpp:1232] Handling status update
> > TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.495933 24184 slave.cpp:1280] Forwarding status update
> > TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000 to the status update manager
> > I0513 09:16:46.495980 24189 cgroups_isolator.cpp:666] Asked to update
> > resources for an unknown/killed executor
> > I0513 09:16:46.496305 24193 status_update_manager.cpp:254] Received
> status
> > update TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.496553 24193 status_update_manager.cpp:403] Creating
> > StatusUpdate stream for task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.496707 24193 status_update_manager.hpp:314] Handling
> UPDATE
> > for status update TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.496868 24193 status_update_manager.cpp:289] Forwarding
> > status update TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000 to the master at
> > master@192.168.0.2:5050
> > I0513 09:16:46.499631 24201 slave.cpp:979] Got acknowledgement of status
> > update for task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.499961 24193 status_update_manager.cpp:314] Received
> status
> > update acknowledgement for task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.500128 24193 status_update_manager.hpp:314] Handling ACK
> > for status update TASK_LOST from task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.500257 24193 status_update_manager.cpp:434] Cleaning up
> > status update stream for task Task_Tracker_2 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.500452 24192 slave.cpp:1016] Status update manager
> > successfully handled status update acknowledgement for task
> Task_Tracker_2
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:46.500743 24204 gc.cpp:56] Scheduling
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> > for removal
> > I0513 09:16:47.079013 24193 slave.cpp:587] Got assigned task
> > Task_Tracker_4 for framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:47.081650 24193 paths.hpp:302] Created executor directory
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
> > I0513 09:16:47.082447 24198 slave.cpp:436] Successfully attached file
> >
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
> > I0513 09:16:47.082861 24194 cgroups_isolator.cpp:525] Launching
> > executor_Task_Tracker_4 (cd hadoop && ./bin/mesos-executor) in
> >
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> > with resources cpus=1; mem=1280 for framework
> > 201305130913-33597632-5050-3893-0000 in cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_4_tag_8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> > I0513 09:16:47.084478 24194 cgroups_isolator.cpp:670] Changing cgroup
> > controls for executor executor_Task_Tracker_4 of framework
> > 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> > I0513 09:16:47.085273 24194 cgroups_isolator.cpp:841] Updated
> 'cpu.shares'
> > to 1024 for executor executor_Task_Tracker_4 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:47.086045 24194 cgroups_isolator.cpp:979] Updated
> > 'memory.limit_in_bytes' to 1342177280 for executor
> executor_Task_Tracker_4
> > of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:47.086853 24194 cgroups_isolator.cpp:1005] Started listening
> > for OOM events for executor executor_Task_Tracker_4 of framework
> > 201305130913-33597632-5050-3893-0000
> > I0513 09:16:47.088227 24194 cgroups_isolator.cpp:555] Forked executor at 24856
> > I0513 09:16:50.485791 24194 status_update_manager.cpp:379] Checking for
> > unacknowledged status updates
> > I0513 09:16:52.480471 24185 cgroups_isolator.cpp:806] Executor
> > executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
> > terminated with status 256
> > I0513 09:16:52.480622 24185 cgroups_isolator.cpp:635] Killing executor
> > executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
> > I0513 09:16:52.482652 24185 cgroups_isolator.cpp:1025] OOM notifier is
> > triggered for executor executor_Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.482805 24185 cgroups_isolator.cpp:1030] Discarded OOM
> > notifier for executor executor_Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000 with uuid
> > 22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.484110 24195 cgroups.cpp:1175] Trying to freeze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.484447 24195 cgroups.cpp:1214] Successfully froze cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > after 1 attempts
> > I0513 09:16:52.487893 24184 cgroups.cpp:1190] Trying to thaw cgroup
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.488129 24184 cgroups.cpp:1298] Successfully thawed
> >
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.496047 24207 cgroups_isolator.cpp:1144] Successfully
> > destroyed cgroup
> >
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> > I0513 09:16:52.496247 24203 slave.cpp:1479] Executor
> > 'executor_Task_Tracker_3' of framework
> 201305130913-33597632-5050-3893-0000
> > has exited with status 1
> > I0513 09:16:52.498538 24203 slave.cpp:1232] Handling status update
> > TASK_LOST from task Task_Tracker_3 of framework
> > 201305130913-33597632-5050-3893-0000
> > ......
> >
> >
> >
> >
> > Wang Yu
> >
> > From: Benjamin Mahler
> > Sent: 2013-05-11 02:32
> > To: wangyu
> > Cc: Benjamin Mahler; mesos-dev
> > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> > 1. If you look at a slave log, you can see that the process isolator
> > launched the task and then notified the slave that it was lost. Can you
> > look inside one of the executor directories? There should be a stderr
> file
> > there. E.g.:
> >
> > I0510 09:44:33.801655  7412 paths.hpp:302] Created executor directory
> >
> >
> '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1/frameworks/201305100938-33597632-5050-19520-0000/executors/executor_Task_Tracker_5/runs/2981a5c2-84e5-4868-9507-8aecb32ee163'
> >
> > Look for these in the logs and read the stderr present inside. Can you
> > report back with the contents?
> >
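> > A quick way to pull all of those out in one pass (just a sketch; the glob
> > assumes the default /tmp/mesos work directory from your logs):
> >
> >   for f in /tmp/mesos/slaves/*/frameworks/*/executors/*/runs/latest/stderr; do
> >     echo "== $f"; cat "$f"
> >   done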
> > 2. Are you running on Linux? You may want to consider using
> > --isolation=cgroups when starting your slaves. This uses Linux control
> > groups to do process / CPU / memory isolation between executors running
> on
> > the slave.
> >
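> > As a sketch, with the master address taken from the logs above (this
> > assumes the cgroup hierarchy is already mounted on the slave):
> >
> >   mesos-slave --master=master:5050 --isolation=cgroups
> >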
> > Thanks!
> >
> >
> > On Thu, May 9, 2013 at 7:07 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> > > Hi Ben,
> > >
> > > Logs for mesos master and slaves are attached, thanks for helping me
> with
> > > this problem. I really appreciate your patient reply.
> > >
> > > Three servers: "master", "slave1", "slave5"
> > > Mesos master: "master"
> > > Mesos slaves: "master", "slave1", "slave5"
> > >
> > > ------------------------------
> > > Wang Yu
> > >
> > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > *Sent:* 2013-05-10 07:22
> > > *To:* wangyu <wa...@nfs.iscas.ac.cn>
> > > *Cc:* mesos-dev <me...@incubator.apache.org>; Benjamin Mahler<
> > benjamin.mahler@gmail.com>
> > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > >  Ah I see them now, looks like you uploaded the NameNode logs? Can you
> > > upload the mesos-master and mesos-slave logs instead? What will be
> > > interesting here is what happened on the slave that is trying to run
> the
> > > TaskTracker.
> > >
> > >
> > > On Wed, May 8, 2013 at 8:32 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > >
> > >
> > > > I uploaded them in the previous email; I will send them again. PS:
> > Will
> > > > the mailing list reject the attachments?
> > > >
> > > > Can you see them?
> > > >
> > > > ------------------------------
> > > > Wang Yu
> > > >
> > > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > > *Sent:* 2013-05-09 10:00
> > > > *To:* mesos-dev@incubator.apache.org; wangyu <
> wangyu@nfs.iscas.ac.cn>
> > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > TaskTracker: http://slave5:50060
> > > >  Did you forget to attach them?
> > > >
> > > >
> > > > On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > >
> > > > > OK.
> > > > > Logs are attached. I used Ctrl+C to stop the jobtracker when the
> task_lost
> > > > > happened.
> > > > >
> > > > > Thanks very much for your help!
> > > > >
> > > > > ------------------------------
> > > > > Wang Yu
> > > > >
> > > > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > > > *Sent:* 2013-05-09 01:23
> > > > > *To:* mesos-dev@incubator.apache.org
> > > > > *Cc:* wangyu <wa...@nfs.iscas.ac.cn>
> > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> Unknown/exited
> > > > > TaskTracker: http://slave5:50060
> > > > >
> > > >
> > >
> > > > > Hey Brenden, are there any bugs in particular here that you're
> > referring to?
> > > > >
> > > > > Wang, can you provide the logs for the JobTracker, the slave, and
> the
> > > > > master?
> > > > >
> > > > >
> > > > > On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <
> > > > > brenden.matthews@airbedandbreakfast.com> wrote:
> > > > >
> > > > > > You may want to try Airbnb's dist of Mesos:
> > > > > >
> > > > > > https://github.com/airbnb/mesos/tree/testing
> > > > > >
> > >
> > > > > > A good number of these Mesos bugs have been fixed but aren't yet
> > merged
> > > > > > into upstream.
> > > > > >
> > > > > >
> > > > > > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
> wrote:
> > > > > >
> > > >
> > >
> > > > > > > The log on each slave of the lost task is : No executor found
> > with ID:
> > > > > > > executor_Task_Tracker_XXX.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Wang Yu
> > > > > > >
> > > > > > > From: 王瑜
> > > > > > > Sent: 2013-05-07 11:13
> > > > > > > To: mesos-dev
> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > Unknown/exited
> > > > > > > TaskTracker: http://slave5:50060
> > > > > > > Hi all,
> > > > > > >
> > > >
> > >
> > > > > > > I have tried adding the file extension when uploading the executor as
> > well as in the
> > > > > > > conf file, but it still does not work.
> > > > > > >
> > > > > > > And I have seen
> > > > > > >
> > > > >
> > > >
> > >
> > > > > >
> >
> /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > > > > > > but it is an empty directory.
> > > > > > >
> > > > >
> > > >
> > >
> > > > > > > Are there any other logs I can read to know why the TASK_LOST
> > happened? I
> > > > > > > really need your help, thanks very much!
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Wang Yu
> > > > > > >
> > > > > > > From: Vinod Kone
> > > > > > > Sent: 2013-04-26 01:31
> > > > > > > To: mesos-dev@incubator.apache.org
> > > > > > > Cc: wangyu
> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > Unknown/exited
> > > > > > > TaskTracker: http://slave5:50060
> > > > > > > Also, you could look at the executor logs (default:
> > > > > > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why
> > the
> > > > > > >  TASK_LOST happened.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> > > > > > > benjamin.mahler@gmail.com> wrote:
> > > > > > >
> > > >
> > >
> > > > > > > Can you maintain the file extension? That is how mesos knows to
> > extract
> > > > > > it:
> > > > > > > hadoop fs -copyFromLocal
> > > > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > > /user/mesos/mesos-executor.tar.gz
> > > > > > >
> > > > > > > Also make sure your mapred-site.xml has the extension as well.
> > > > > > >
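> > > > > > > Both pieces together, as a sketch (the HDFS URI mirrors the one from
> > > > > > > your earlier mail, with the extension added):
> > > > > > >
> > > > > > >   hadoop fs -copyFromLocal build/hadoop.tar.gz /user/mesos/mesos-executor.tar.gz
> > > > > > >   # and in mapred-site.xml, keep the same name and extension:
> > > > > > >   #   <value>hdfs://master/user/mesos/mesos-executor.tar.gz</value>
> > > > > > >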
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wangyu@nfs.iscas.ac.cn
> > > > wrote:
> > > > > > >
> > > > > > > > Hi, Ben,
> > > > > > > >
> > > > > > > > I have tried as you said, but it still does not work.
> > > > > > > > I uploaded the mesos-executor using: hadoop fs -copyFromLocal
> > > > > > > >
> /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > > > /user/mesos/mesos-executor
> > > > > > > > Did I do the right thing? Thanks very much!
> > > > > > > >
> > > > > > > > The log in jobtracker is:
> > > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > > > > > > > Task_Tracker_82 on http://slave1:31000
> > > > >
> > > >
> > >
> > > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map
> > and reduce
> > > > > > > > slots needed.
> > > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update
> of
> > > > > > > > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker
> Status
> > > > > > > >       Pending Map Tasks: 2
> > > > > > > >    Pending Reduce Tasks: 1
> > > > > > > >          Idle Map Slots: 0
> > > > > > > >       Idle Reduce Slots: 0
> > > > > > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > > > > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > > > > > >        Needed Map Slots: 2
> > > > > > > >     Needed Reduce Slots: 1
> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > > > > > > > Task_Tracker_83 on http://slave1:31000
> > > > >
> > > >
> > >
> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map
> > and reduce
> > > > > > > > slots needed.
> > > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update
> of
> > > > > > > > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > > > > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker
> Status
> > > > > > > >       Pending Map Tasks: 2
> > > > > > > >    Pending Reduce Tasks: 1
> > > > > > > >          Idle Map Slots: 0
> > > > > > > >       Idle Reduce Slots: 0
> > > > > > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > > > > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > > > > > >        Needed Map Slots: 2
> > > > > > > >     Needed Reduce Slots: 1
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Wang Yu
> > > > > > > >
> > > > > > > > From: Benjamin Mahler
> > > > > > > > Sent: 2013-04-24 07:49
> > > > > > > > To: mesos-dev@incubator.apache.org; wangyu
> > >
> > > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > Unknown/exited
> > > > > > > > TaskTracker: http://slave5:50060
> > > > >
> > > >
> > >
> > > > > > > > You need to instead upload the hadoop.tar.gz generated by the
> > tutorial.
> > > > >
> > > >
> > >
> > > > > > > > Then point the conf file to the hdfs directory (you had the
> > right idea,
> > > > > > > > just uploaded the wrong file). :)
> > > > > > > >
> > > > > > > > Can you try that and report back?
> > > > > > > >
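> > > > > > > > A quick way to double-check what actually landed in HDFS (the path is
> > > > > > > > the one from your conf; adjust if yours differs):
> > > > > > > >
> > > > > > > >   hadoop fs -ls /user/mesos/
> > > > > > > >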
> > > > > > > >
> > > > > > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wangyu@nfs.iscas.ac.cn
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Guodong,
> > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > I still have problems; I think there is some
> > problem with
> > > > > > > my
> > > > > > > > > executor setting.
> > > > > > > > >
> > > > > > > > > In mapred-site.xml, I set ("master" is the hostname of the
> > > > > > > > > mesos master):
> > > > > > > > >   <property>
> > > > > > > > >     <name>mapred.mesos.executor</name>
> > > > > > > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > > > > > > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > > > > > > > >   </property>
> > > > > > > > >
> > > > > > > > > And I uploaded the mesos-executor to /user/mesos/mesos-executor
> > > > > > > > >
> > > > > > > > > The head of the file is as follows:
> > > > > > > > >
> > > > > > > > > #! /bin/sh
> > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > # mesos-executor - temporary wrapper script for
> > .libs/mesos-executor
> > > > > > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > > > > > > > #
> > > >
> > >
> > > > > > > > > # The mesos-executor program cannot be directly executed
> > until all
> > > > > > the
> > > > > > > > > libtool
> > > > > > > > > # libraries that it depends on are installed.
> > > > > > > > > #
> > > > > > > > > # This wrapper script should never be moved out of the
> build
> > > > > > directory.
> > > > > > > > > # If it is, it will not operate correctly.
> > > > > > > > >
> > > > > > > > > # Sed substitution that helps us do robust quoting.  It
> > > > > > backslashifies
> > > > >
> > > >
> > >
> > > > > > > > > # metacharacters that are still active within double-quoted
> > strings.
> > > > > > > > > Xsed='/bin/sed -e 1s/^X//'
> > > > > > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > > > > > > > >
> > > > > > > > > # Be Bourne compatible
> > > > >
> > > >
> > >
> > > > > > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null
> > 2>&1; then
> > > > > > > > >   emulate sh
> > > > > > > > >   NULLCMD=:
> > > > > > > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"},
> > which
> > > > > > > > >   # is contrary to our usage.  Disable this feature.
> > > > > > > > >   alias -g '${1+"$@"}'='"$@"'
> > > > > > > > >   setopt NO_GLOB_SUBST
> > > > > > > > > else
> > > > > > > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;;
> esac
> > > > > > > > > fi
> > > > > > > > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > > > > > > > DUALCASE=1; export DUALCASE # for MKS sh
> > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > # The HP-UX ksh and POSIX shell print the target directory
> > to stdout
> > > > > > > > > # if CDPATH is set.
> > > > > > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > > > > > > > >
> > > > > > > > > relink_command="(cd /home/mesos/build/src; { test -z
> > > >
> > >
> > > > > > > > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || {
> > LIBRARY_PATH=;
> > > > > > > export
> > >
> > > > > > > > > LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" ||
> > unset
> > > > >
> > > >
> > >
> > > > > > > > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; };
> > }; { test
> > > > > > > -z
> > > > > > > > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || {
> > > > > > > > GCC_EXEC_PREFIX=;
> > > >
> > >
> > > > > > > > > export GCC_EXEC_PREFIX; }; }; { test -z
> > \"\${LD_RUN_PATH+set}\" ||
> > > > > > > unset
> > > > > > > > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> > > > > >
> >
> LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/;
> > > > > > > > > export LD_LIBRARY_PATH;
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> > > > > >
> >
> PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin;
> > > > > > > > > export PATH; g++ -g -g2 -O2 -o \$progdir/\$file
> > > > > > > > > launcher/mesos_executor-executor.o  ./.libs/libmesos.so
> > > > >
> > > >
> > >
> > > > > > > > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread
> > -lcurl -lssl
> > > > >
> > > >
> > >
> > > > > > > > > -lcrypto -lz -lrt -pthread -Wl,-rpath
> > -Wl,/home/mesos/build/src/.libs
> > > > > > > > > -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > >
> > >
> > > > > > > > > Did I upload the right file, and set it up in the conf file
> > correctly?
> > > > > > Thanks
> > > > > > > > > very much!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Wang Yu
> > > > > > > > >
> > > > > > > > > From: 王国栋
> > > > > > > > > Date: 2013-04-23 13:32
> > > > > > > > > To: wangyu
> > > > > > > > > CC: mesos-dev
> > > > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > Unknown/exited
> > > > > > > > > TaskTracker: http://slave5:50060
> > > > > > > > > Hmm, it seems that the mapred.mesos.master is not set correctly.
> > > > > > > > >
> > >
> > > > > > > > > if you run hadoop in local mode, the following setting is ok:
> > > > > > > > >   <property>
> > > > > > > > >     <name>mapred.mesos.master</name>
> > > > > > > > >     <value>local</value>
> > > > > > > > >   </property>
> > > > > > > > >
> > >
> > > > > > > > > if you want to start the cluster, set mapred.mesos.master to
> > > > > > > > > mesos-master-hostname:mesos-master-port (see the sketch below).
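> > > > > > > > > For your cluster that would look like the block above with the
> > > > > > > > > master's address substituted in (a sketch; 5050 is the master port
> > > > > > > > > shown in your logs):
> > > > > > > > >   <property>
> > > > > > > > >     <name>mapred.mesos.master</name>
> > > > > > > > >     <value>master:5050</value>
> > > > > > > > >   </property>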
> > > > > > > > >
> > >
> > > > > > > > > Make sure mesos-master-hostname resolves in DNS to the right ip.
> > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > BTW: when you start the jobtracker, you can check the mesos webUI
> > > > > > > > > to see whether the hadoop framework is registered.
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > > Guodong
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <
> wangyu@nfs.iscas.ac.cn
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Guodong,
> > > > > > > > > >
> > > > > > > > > > I started hadoop as you said, and then I saw this error:
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from
> > scheduler
> > > > > > > > > driver: Cannot parse
> > > > > > > > > > '@0.0.0.0:0'
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > What does this mean? Where should I change the MesosScheduler
> > code to fix
> > > > > > > > this?
> > >
> > > > > > > > > > Thanks very much! I am so sorry to interrupt you once
> > again...
> > > > > > > > > >
> > > > > > > > > > The whole log is as follows:
> > > > > > > > > >
> > > > > > > > > >  [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > > > > > > >
> > /************************************************************
> > > > > > > > > > STARTUP_MSG: Starting JobTracker
> > > > > > > > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > > > > > > > STARTUP_MSG:   args = []
> > > > > > > > > > STARTUP_MSG:   version = 0.20.205.0
> > > > > > > > > >
> > > > > > > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat
> Apr
> > 13
> > > > > > > 11:19:33
> > > > > > > > > CST 2013
> > > > > > > > > >
> > ************************************************************/
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded
> > properties from
> > > > > > > > > hadoop-metrics2.properties
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > > > MetricsSystem,sub=Stats registered.
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled
> > snapshot
> > > > > > > > period
> > > > > > > > > at 10 second(s).
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker
> > metrics
> > > > > > > > system
> > > > > > > > > started
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > > > QueueMetrics,q=default registered.
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > ugi
> > > > > > > > > registered.
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:04 INFO
> > > > > > > delegation.AbstractDelegationTokenSecretManager:
> > >
> > > > > > > > > Updating the current master key for generating delegation
> > tokens
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:04 INFO
> > > > > > > delegation.AbstractDelegationTokenSecretManager:
> > > > > > > > > Starting expired delegation token remover thread,
> > > > > > > > > tokenRemoverScanInterval=60 min(s)
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler
> > configured with
> > > > > > > > > (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> > > > > > > limitMaxMemForMapTasks,
> > > > > > > > > limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:04 INFO
> > > > > > > delegation.AbstractDelegationTokenSecretManager:
> > >
> > > > > > > > > Updating the current master key for generating delegation
> > tokens
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing
> > hosts
> > > > > > > > > (include/exclude) list
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting
> > jobtracker with
> > > > > > > > owner
> > > > > > > > > as root
> > > > > > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > > > RpcDetailedActivityForPort9001 registered.
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > > > RpcActivityForPort9001 registered.
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to
> > > > > > > > > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> > > > > > > > > org.mortbay.log.Slf4jLog
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global
> > filtersafety
> > > > > > > > >
> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by
> > >
> > > > > > > > > webServer.getConnectors()[0].getLocalPort() before open()
> is
> > -1.
> > > > > > > Opening
> > > > > > > > > the listener on 50030
> > > > > > > > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer:
> > listener.getLocalPort()
> > > > > > > > returned
> > >
> > > > > > > > > 50030 webServer.getConnectors()[0].getLocalPort() returned
> > 50030
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to
> > port 50030
> > > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started
> > > > > > > > > > SelectChannelConnector@0.0.0.0:50030
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > jvm
> > > > > > > > > registered.
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean
> > for source
> > > > > > > > > JobTrackerMetrics registered.
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up
> > at: 9001
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker
> > webserver:
> > > > > > 50030
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the
> > system
> > > > > > > > > directory
> > > > > > > > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server
> > being
> > > > > > > > > initialized in embedded mode
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started
> > job history
> > > > > > > > > server at: localhost:50030
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History
> > Server web
> > > > > > > > > address: localhost:50030
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore:
> > Completed
> > > > > > job
> > > > > > > > > store is inactive
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting
> > > > > > MesosScheduler
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing
> hosts
> > > > > > > information
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from
> > scheduler
> > > > > > > > > driver: Cannot parse '@
> > > > > > > > > > 0.0.0.0:0'
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the
> > includes
> > > > > > > file
> > > > > > > > to
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the
> > excludes
> > > > > > > file
> > > > > > > > to
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing
> > hosts
> > > > > > > > > (include/exclude) list
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning
> > 0 nodes
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder:
> > starting
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7
> on
> > 9001:
> > > > > > > > starting
> > > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting
> RUNNING
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8
> on
> > 9001:
> > > > > > > > starting
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9
> on
> > 9001:
> > > > > > > > starting
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to
> > load
> > > > >
> > > >
> > >
> > > > > > > > > native-hadoop library for your platform... using
> > builtin-java classes
> > > > > > > > where
> > > > > > > > > applicable
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress:
> > job_201304231321_0001:
> > > > > > > > > nMaps=0 nReduces=0 max=-1
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job
> > > > > > > > > job_201304231321_0001
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job
> > job_201304231321_0001
> > > > > > > > > added successfully for user 'root' to queue 'default'
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root
> > > > > > >  IP=192.168.0.2
> > > > > > > > >  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001
> > > > > >  RESULT=SUCCESS
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing
> > > > > > > > > job_201304231321_0001
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing
> > > > > > > > > job_201304231321_0001
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken
> > generated and
> > > > > > > > > stored with users keys in
> > >
> > > > > > > > >
> > /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > > > > > >
> > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size
> > for job
> > > > > > > > > job_201304231321_0001 = 0. Number of splits = 0
> > > > > > > > > >
> > > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job
> > > > > > > job_201304231321_0001
> > > > > > > > > initialized successfully with 0 map tasks and 0 reduce
> tasks.
> > > > > > > > > >
> > > > > > > > > > ------------------------------
> > > > > > > > > > Wang Yu
> > > > > > > > > >
> > > > > > > > > >  *From:* 王国栋 <wa...@gmail.com>
> > > > > > > > > > *Date:* 2013-04-23 11:34
> > > > > > > > > > *To:* mesos-dev <me...@incubator.apache.org>;
> wangyu<
> > > > > > > > > wangyu@nfs.iscas.ac.cn>
> > > > > > > > > > *Subject:* Re: Re:
> org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > > Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > > > >  Hi Yu,
> > > > > > > > > >
> > > >
> > >
> > > > > > > > > > Mesos will just launch tasktracker on each slave node as
> > long as
> > > > > > the
> > > > >
> > > >
> > >
> > > > > > > > > > required resource is enough for the tasktracker. So you
> > have to run
> > > > > > > > > > NameNode, Jobtracker and DataNode on your own.
> > > > > > > > > >
> > > > > > > > > > Basically, starting hadoop on mesos is like this (see the shell sketch after the steps):
> > > > > > > > > > 1. start the dfs. use hadoop/bin/start-dfs.sh. (you
> should
> > > > > > configure
> > >
> > > > > > > > > > core-sites.xml and hdfs-site.xml). dfs is no different
> > from the
> > > > > > > normal
> > > > > > > > > one.
> > > >
> > >
> > > > > > > > > > 2. start jobtracker, use hadoop/bin/hadoop jobtracker
> (you
> > should
> > >
> > > > > > > > > > configure mapred-site.xml, this jobtracker should
> contains
> > the
> > > > > > patch
> > > > > > > > for
> > > > > > > > > > mesos)
> > > > > > > > > >
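> > > > > > > > > > In shell form, the two steps are just (a sketch; paths assume the
> > > > > > > > > > patched hadoop tree):
> > > > > > > > > >
> > > > > > > > > >   hadoop/bin/start-dfs.sh       # step 1: bring up HDFS as usual
> > > > > > > > > >   hadoop/bin/hadoop jobtracker  # step 2: the Mesos-patched jobtracker
> > > > > > > > > >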
> > > >
> > >
> > > > > > > > > > Then, you can use mesos web UI and jobtracker web UI to
> > check the
> > > > > > > > status
> > > > > > > > > > of Jobtracker.
> > > > > > > > > >
> > > > > > > > > >  Guodong
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <
> > wangyu@nfs.iscas.ac.cn
> > > >
> > > > > > wrote:
> > > > > > > > > >
> > > > >
> > > >
> > >
> > > > > > > > > >> Oh, yes, I start my hadoop using "start-all.sh". I know
> > what's my
> > > > > > > > > >> problem. Thanks very much!
> > > > > > > > > >>
> > > > >
> > > >
> > >
> > > > > > > > > >> ps: Besides TaskTracker, is there any other roles(like
> > JobTracker,
> > > > > > > > > >> DataNode) I should stop it first?
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> Wang Yu
> > > > > > > > > >>
> > > > > > > > > >> From: Benjamin Mahler
> > > > > > > > > >> Sent: 2013-04-23 10:56
> > > > > > > > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > > > > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > Unknown/exited
> > > > > > > > > >> TaskTracker: http://slave5:50060
> > > > > > > > > >>  The scheduler we wrote for Hadoop will start its own
> > > > > > TaskTrackers,
> > > > > > > > > >> meaning
> > > > > > > > > >> you do not have to start any TaskTrackers yourself
> > > > > > > > > >>
> > > > >
> > > >
> > >
> > > > > > > > > >> Are you starting your own TaskTrackers? Are there any
> > TaskTrackers
> > > > > > > > > running
> > > > > > > > > >> in your cluster?
> > > > > > > > > >>
> > > > > > > > > >> Looking at your jps output, is there already a
> TaskTracker
> > > > > > running?
> > > > > > > > > >> [root@master logs]# jps
> > > > > > > > > >> 13896 RunJar
> > > > > > > > > >> 14123 Jps
> > > > > > > > > >> 12718 NameNode
> > > > > > > > > >> 12900 DataNode
> > > > > > > > > >> 13374 TaskTracker  <--- How was this started?
> > > > > > > > > >> 13218 JobTracker
> > > > > > > > > >>
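> > > > > > > > > >> If a manually started TaskTracker is still around, something like
> > > > > > > > > >> this should confirm and stop it (a sketch; hadoop-daemon.sh is the
> > > > > > > > > >> stock script under bin/ of your hadoop install):
> > > > > > > > > >>
> > > > > > > > > >>   jps | grep TaskTracker
> > > > > > > > > >>   bin/hadoop-daemon.sh stop tasktracker
> > > > > > > > > >>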
> > > > > > > > > >>
> > > > > > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <
> > wangyu@nfs.iscas.ac.cn
> > > >
> > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi, Ben and Guodong,
> > > > > > > > > >> >
> > > > >
> > > >
> > >
> > > > > > > > > >> > What do you mean "managing your own TaskTrackers"? How
> > should I
> > > > > > > know
> > >
> > > > > > > > > >> > whether I am managing my own TaskTrackers? Sorry, I am
> > > > > > > > > >> > not very familiar with mesos.
> > > > > > > > > >> > Does it mean I do not need to configure hdfs-site.xml and
> > > > > > > core-site.xml
> > > > > > > > > in
> > > > >
> > > >
> > >
> > > > > > > > > >> > hadoop? I do not want to run my own TaskTracker, I
> just
> > want to
> > > > > > > set
> > > > > > > > up
> > > > > > > > > >> > hadoop on mesos, and run my MR tasks.
> > > > > > > > > >> >
> > > >
> > >
> > > > > > > > > >> > Thanks very much for your patient reply...Maybe I have
> > a long
> > > > > > way
> > > > > > > to
> > > > > > > > > >> go...
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > The log messages you see:
> > > > > > > > > >> > 2013-04-18 16:47:19,645 INFO
> > > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> >
> > > >
> > >
> > > > > > > > > >> > Are printed when mesos does not know about the
> > TaskTracker. We
> > > > > > > > > currently
> > > > > > > > > >> > don't support running your own TaskTrackers, as the
> > > > > > MesosScheduler
> > > > > > > > > will
> > > > > > > > > >> > launch them on your behalf when needed.
> > > > > > > > > >> >
> > >
> > > > > > > > > >> > Are you managing your own TaskTrackers? The purpose of
> > using
> > > > > > > Hadoop
> > > > > > > > > with
> > > > >
> > > >
> > >
> > > > > > > > > >> > mesos is that you no longer have to do that. We will
> > detect that
> > > > > > > > jobs
> > > > > > > > > >> have
> > > > >
> > > >
> > >
> > > > > > > > > >> > pending map / reduce tasks and launch TaskTrackers
> > accordingly.
> > > > > > > > > >> >
> > > > > > > > > >> > Guodong may be able to help further getting set up!
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > Wang Yu
> > > > > > > > > >> >
> > > > > > > > > >> > From: 王国栋
> > > > > > > > > >> > Date: 2013-04-18 17:10
> > > > > > > > > >> > To: mesos-dev; wangyu
> > > > > > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > Unknown/exited
> > > > > > > > > >> > TaskTracker: http://slave5:50060
> > > > >
> > > >
> > >
> > > > > > > > > >> > You can check the slave log and the mesos-executor
> log,
> > which is
> > > > > > > > > >> normally
> > > > > > > > > >> > located in the dir like
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> > > > > >
> >
> "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > > > > > > >> > That log is the tasktracker log.
> > > > > > > > > >> >
> > > > > > > > > >> > I hope it will help.
> > > > > > > > > >> >
> > > > > > > > > >> > Guodong
> > > > > > > > > >> >
> > > > > > > > > >> >
> > > > > > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <
> > > wangyu@nfs.iscas.ac.cn
> > > > >
> > > > > > > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > Hi All,
> > > > > > > > > >> > >
> > > >
> > >
> > > > > > > > > >> > > I have deployed mesos on three nodes: master, slave1,
> > slave5.
> > > > > > and
> > > > > > > > it
> > > > > > > > > >> works
> > > > > > > > > >> > > well.
> > >
> > > > > > > > > >> > >  Then I set hadoop over it, using master as
> namenode,
> > and
> > > > > > > master,
> > > > > > > > > >> slave1,
> > > >
> > >
> > > > > > > > > >> > > slave5 as datanode. When I run 'jps', it looks like it
> > works well.
> > > > > > > > > >> > >  [root@master logs]# jps
> > > > > > > > > >> > > 13896 RunJar
> > > > > > > > > >> > > 14123 Jps
> > > > > > > > > >> > > 12718 NameNode
> > > > > > > > > >> > > 12900 DataNode
> > > > > > > > > >> > > 13374 TaskTracker
> > > > > > > > > >> > > 13218 JobTracker
> > > > > > > > > >> > >
> > > > > > > > > >> > > Then I ran the test benchmark, but it stopped making progress...
> working...
> > > > > > > > > >> > >  [root@master
> > > > > > > > > >> > >  hadoop-0.20.205.0]# bin/hadoop jar
> > > > > > > hadoop-examples-0.20.205.0.jar
> > > > > > > > > >> > randomwriter -Dtest.randomwrite.bytes_per_map=6710886
> > > > > > > > > >> > -Dtest.randomwriter.maps_per_host=10 rand
> > > > > > > > > >> > > Running 30 maps.
> > > > > > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > > > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running
> job:
> > > > > > > > > >> > job_201304181646_0001
> > >
> > > > > > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0%
> > reduce 0%
> > > > > > > > > >> > > It stopped here.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Then I read the log file hadoop-root-jobtracker-master.log; it shows:
> > > > > > > > > >> > > 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > > > > > > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > > > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > > > > > > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > > > > > > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:04,609 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:07,618 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:10,625 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:13,632 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave5
> > > > > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_slave5:127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > > > > > >> > > 2013-04-18 16:47:13,687 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:16,638 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:16,697 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:19,707 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:22,651 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:22,715 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:25,658 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:25,725 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > > >> > > 2013-04-18 16:47:28,665 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > > >> > > Can anybody help me? Thanks very much!
> > > > > > > > > >> > >

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
The HDFS setup is the same as for the former Hadoop deployment. Namenode: master; datanodes: slave1, slave5.

hdfs-site.xml is as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.name.dir</name>
                <value>/home/HadoopRun/name1,/home/HadoopRun/name2</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>/home/HadoopRun/data1,/home/HadoopRun/data2</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.web.ugi</name>
                <value>root,supergroup</value>
        </property>
</configuration>
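
As a quick sanity check that each slave can reach this HDFS setup, something
like the following can be run on slave1 and slave5 (just a sketch, assuming
the executor bundle lives under /user/mesos as in mapred-site.xml below):

  hadoop fs -ls hdfs://master/user/mesos/
  hadoop fs -copyToLocal hdfs://master/user/mesos/mesos-executor /tmp/executor-check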

Is there any other mesos/hdfs setup information I need to provide?

mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.MesosScheduler</value>
  </property>
  <property>
    <name>mapred.mesos.taskScheduler</name>
    <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
  </property>
  <property>
    <name>mapred.mesos.master</name>
    <value>master:5050</value>
  </property>
  <!-- Make sure to uncomment the 'mapred.mesos.executor' property
       when running the Hadoop JobTracker on a real Mesos cluster.
       NOTE: You need to MANUALLY upload the Mesos executor bundle
       to the location that is set as the value of this property. -->
  <property>
    <name>mapred.mesos.executor</name>
    <!-- <value>hdfs://hdfs.name.node:port/hadoop.zip</value> -->
    <value>hdfs://master/user/mesos/mesos-executor</value>
  </property>
  <!-- The properties below indicate the amount of resources
       that are allocated to a Hadoop slot (i.e., map/reduce task) by Mesos. -->
  <property>
    <name>mapred.mesos.slot.cpus</name>
    <value>0.2</value>
  </property>
  <property>
    <name>mapred.mesos.slot.disk</name>
    <!-- The value is in MB. -->
    <value>1024</value>
  </property>
  <property>
    <name>mapred.mesos.slot.mem</name>
    <!-- Note that this is the total memory required for
         JVM overhead (256 MB) and the heap (-Xmx) of the task.
         The value is in MB. -->
    <value>512</value>
  </property>
</configuration>
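
For reference, a minimal sketch of the re-upload that keeps the file
extension, as suggested earlier in this thread (the extension is how Mesos
knows to extract the bundle):

  hadoop fs -copyFromLocal \
    /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz \
    /user/mesos/mesos-executor.tar.gz

with mapred.mesos.executor then pointing at
hdfs://master/user/mesos/mesos-executor.tar.gz.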

Thanks very much!




Wang Yu

From: Vinod Kone
Date: 2013-05-13 10:15
To: mesos-dev@incubator.apache.org; wangyu
Cc: Benjamin Mahler
Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
What is your mesos/hdfs cluster setup? Also, can you paste your
mapred-site.xml?


On Sun, May 12, 2013 at 6:40 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> 1. I opened all the executor directories, but all of them are empty. I do not
> know what happened to them...
> [root@slave1 logs]# cd
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
> total 0
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
> .  ..
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
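> A broader sweep over every run directory shows the same thing (just a
> sketch; the glob assumes the default /tmp/mesos work directory):
> for d in /tmp/mesos/slaves/*/frameworks/*/executors/*/runs/*; do
>   echo "== $d"; ls -la "$d"; cat "$d/stderr" 2>/dev/null
> done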
> 2. I added "--isolation=cgroups" for the slaves, but it still does not work.
> Tasks are always lost, but there is no error any more, and I still do not know
> what happened to the executor... Logs from one slave are as follows. Please
> help me, thanks very much!
>
> mesos-slave.INFO
> Log file created at: 2013/05/13 09:12:54
> Running on machine: slave1
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
> I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by
> root
> I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
> I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@
> 192.168.0.3:36668
> I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24;
> mem=63356; ports=[31000-32000]; disk=29143
> I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as
> cgroups hierarchy root
> I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at
> master@192.168.0.2:5050
> I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file
> '/home/mesos/build/logs/mesos-slave.INFO'
> I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master
> detected at master@192.168.0.2:5050
> I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
> I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
> I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given
> slave ID 201305130913-33597632-5050-3893-0
> I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
> I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
> I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
> I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
> I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
> I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
> I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
> I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%. Max
> allowed age: 5.11days
> I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%. Max
> allowed age: 5.11days
> I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%. Max
> allowed age: 5.11days
> I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task
> Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated 'cpu.shares'
> to 1024 for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_0
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening
> for OOM events for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at =
> 24552
> I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task
> Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated 'cpu.shares'
> to 1024 for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_1
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening
> for OOM events for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at =
> 24628
> I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor
> executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> terminated with status 256
> I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor
> executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is
> triggered for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM
> notifier for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> after 1 attempts
> I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully
> destroyed cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.477439 24190 slave.cpp:1479] Executor
> 'executor_Task_Tracker_0' of framework 201305130913-33597632-5050-3893-0000
> has exited with status 1
> I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update
> TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update
> TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 to the status update manager
> I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update
> resources for an unknown/killed executor
> I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received status
> update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating
> StatusUpdate stream for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling UPDATE
> for status update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding
> status update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 to the master at
> master@192.168.0.2:5050
> I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status
> update for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received status
> update acknowledgement for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK
> for status update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487547 24185 status_update_manager.cpp:434] Cleaning up
> status update stream for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487788 24207 slave.cpp:1016] Status update manager
> successfully handled status update acknowledgement for task Task_Tracker_0
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.488142 24202 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> for removal
> I0513 09:16:35.063462 24199 slave.cpp:587] Got assigned task
> Task_Tracker_2 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:35.066090 24199 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> I0513 09:16:35.066673 24188 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> I0513 09:16:35.066985 24205 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_2 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:35.068594 24205 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:35.069341 24205 cgroups_isolator.cpp:841] Updated 'cpu.shares'
> to 1024 for executor executor_Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:35.070061 24205 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_2
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:35.070828 24205 cgroups_isolator.cpp:1005] Started listening
> for OOM events for executor executor_Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:35.071966 24205 cgroups_isolator.cpp:555] Forked executor at =
> 24704
> I0513 09:16:40.464987 24197 cgroups_isolator.cpp:806] Executor
> executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> terminated with status 256
> I0513 09:16:40.465175 24197 cgroups_isolator.cpp:635] Killing executor
> executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.467118 24197 cgroups_isolator.cpp:1025] OOM notifier is
> triggered for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.467269 24197 cgroups_isolator.cpp:1030] Discarded OOM
> notifier for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.468596 24198 cgroups.cpp:1175] Trying to freeze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.468945 24198 cgroups.cpp:1214] Successfully froze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> after 1 attempts
> I0513 09:16:40.471577 24200 cgroups.cpp:1190] Trying to thaw cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.471850 24200 cgroups.cpp:1298] Successfully thawed
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.480960 24185 cgroups_isolator.cpp:1144] Successfully
> destroyed cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.481230 24196 slave.cpp:1479] Executor
> 'executor_Task_Tracker_1' of framework 201305130913-33597632-5050-3893-0000
> has exited with status 1
> I0513 09:16:40.483572 24196 slave.cpp:1232] Handling status update
> TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.483801 24196 slave.cpp:1280] Forwarding status update
> TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 to the status update manager
> I0513 09:16:40.483846 24193 cgroups_isolator.cpp:666] Asked to update
> resources for an unknown/killed executor
> I0513 09:16:40.484094 24205 status_update_manager.cpp:254] Received status
> update TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.484267 24205 status_update_manager.cpp:403] Creating
> StatusUpdate stream for task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.484412 24205 status_update_manager.hpp:314] Handling UPDATE
> for status update TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.484558 24205 status_update_manager.cpp:289] Forwarding
> status update TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 to the master at
> master@192.168.0.2:5050
> I0513 09:16:40.487229 24202 slave.cpp:979] Got acknowledgement of status
> update for task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.487457 24196 status_update_manager.cpp:314] Received status
> update acknowledgement for task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.487607 24196 status_update_manager.hpp:314] Handling ACK
> for status update TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.487741 24196 status_update_manager.cpp:434] Cleaning up
> status update stream for task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.487949 24207 slave.cpp:1016] Status update manager
> successfully handled status update acknowledgement for task Task_Tracker_1
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.488278 24193 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> for removal
> I0513 09:16:41.072098 24194 slave.cpp:587] Got assigned task
> Task_Tracker_3 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:41.074632 24194 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
> I0513 09:16:41.075546 24198 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
> I0513 09:16:41.076081 24194 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_3 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:41.077606 24194 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:41.078402 24194 cgroups_isolator.cpp:841] Updated 'cpu.shares'
> to 1024 for executor executor_Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:41.079186 24194 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_3
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:41.080008 24194 cgroups_isolator.cpp:1005] Started listening
> for OOM events for executor executor_Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:41.081447 24194 cgroups_isolator.cpp:555] Forked executor at =
> 24780
> I0513 09:16:44.482589 24200 status_update_manager.cpp:379] Checking for
> unacknowledged status updates
> I0513 09:16:46.473145 24199 cgroups_isolator.cpp:806] Executor
> executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
> terminated with status 256
> I0513 09:16:46.473307 24199 cgroups_isolator.cpp:635] Killing executor
> executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.475491 24199 cgroups_isolator.cpp:1025] OOM notifier is
> triggered for executor executor_Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.475649 24199 cgroups_isolator.cpp:1030] Discarded OOM
> notifier for executor executor_Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.476820 24192 cgroups.cpp:1175] Trying to freeze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.477181 24192 cgroups.cpp:1214] Successfully froze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> after 1 attempts
> I0513 09:16:46.479907 24201 cgroups.cpp:1190] Trying to thaw cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.480229 24201 cgroups.cpp:1298] Successfully thawed
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.493069 24200 cgroups_isolator.cpp:1144] Successfully
> destroyed cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.493391 24184 slave.cpp:1479] Executor
> 'executor_Task_Tracker_2' of framework 201305130913-33597632-5050-3893-0000
> has exited with status 1
> I0513 09:16:46.495689 24184 slave.cpp:1232] Handling status update
> TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.495933 24184 slave.cpp:1280] Forwarding status update
> TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000 to the status update manager
> I0513 09:16:46.495980 24189 cgroups_isolator.cpp:666] Asked to update
> resources for an unknown/killed executor
> I0513 09:16:46.496305 24193 status_update_manager.cpp:254] Received status
> update TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.496553 24193 status_update_manager.cpp:403] Creating
> StatusUpdate stream for task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.496707 24193 status_update_manager.hpp:314] Handling UPDATE
> for status update TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.496868 24193 status_update_manager.cpp:289] Forwarding
> status update TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000 to the master at
> master@192.168.0.2:5050
> I0513 09:16:46.499631 24201 slave.cpp:979] Got acknowledgement of status
> update for task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.499961 24193 status_update_manager.cpp:314] Received status
> update acknowledgement for task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.500128 24193 status_update_manager.hpp:314] Handling ACK
> for status update TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.500257 24193 status_update_manager.cpp:434] Cleaning up
> status update stream for task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.500452 24192 slave.cpp:1016] Status update manager
> successfully handled status update acknowledgement for task Task_Tracker_2
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.500743 24204 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> for removal
> I0513 09:16:47.079013 24193 slave.cpp:587] Got assigned task
> Task_Tracker_4 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:47.081650 24193 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
> I0513 09:16:47.082447 24198 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
> I0513 09:16:47.082861 24194 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_4 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_4_tag_8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> I0513 09:16:47.084478 24194 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_4 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:47.085273 24194 cgroups_isolator.cpp:841] Updated 'cpu.shares'
> to 1024 for executor executor_Task_Tracker_4 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:47.086045 24194 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_4
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:47.086853 24194 cgroups_isolator.cpp:1005] Started listening
> for OOM events for executor executor_Task_Tracker_4 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:47.088227 24194 cgroups_isolator.cpp:555] Forked executor at =
> 24856
> I0513 09:16:50.485791 24194 status_update_manager.cpp:379] Checking for
> unacknowledged status updates
> I0513 09:16:52.480471 24185 cgroups_isolator.cpp:806] Executor
> executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
> terminated with status 256
> I0513 09:16:52.480622 24185 cgroups_isolator.cpp:635] Killing executor
> executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:52.482652 24185 cgroups_isolator.cpp:1025] OOM notifier is
> triggered for executor executor_Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.482805 24185 cgroups_isolator.cpp:1030] Discarded OOM
> notifier for executor executor_Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.484110 24195 cgroups.cpp:1175] Trying to freeze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.484447 24195 cgroups.cpp:1214] Successfully froze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> after 1 attempts
> I0513 09:16:52.487893 24184 cgroups.cpp:1190] Trying to thaw cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.488129 24184 cgroups.cpp:1298] Successfully thawed
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.496047 24207 cgroups_isolator.cpp:1144] Successfully
> destroyed cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.496247 24203 slave.cpp:1479] Executor
> 'executor_Task_Tracker_3' of framework 201305130913-33597632-5050-3893-0000
> has exited with status 1
> I0513 09:16:52.498538 24203 slave.cpp:1232] Handling status update
> TASK_LOST from task Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000
> ......
>
>
>
>
> Wang Yu
>
> From: Benjamin Mahler
> Date: 2013-05-11 02:32
> To: wangyu
> Cc: Benjamin Mahler; mesos-dev
> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
> 1. If you look at a slave log, you can see that the process isolator
> launched the task and then notified the slave that it was lost. Can you
> look inside one of the executor directories? There should be an stderr file
> there. E.g.:
>
> I0510 09:44:33.801655  7412 paths.hpp:302] Created executor directory
>
> '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1/frameworks/201305100938-33597632-5050-19520-0000/executors/executor_Task_Tracker_5/runs/2981a5c2-84e5-4868-9507-8aecb32ee163'
>
> Look for these in the logs and read the stderr present inside. Can you
> report back with the contents?
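>
> One way to find all of them is something like this (a sketch; adjust the
> path to wherever your slave writes its INFO log):
>
> grep 'Created executor directory' /path/to/logs/mesos-slave.INFO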
>
> 2. Are you running on Linux? You may want to consider using
> --isolation=cgroups when starting your slaves. This uses Linux control
> groups to do process / CPU / memory isolation between executors running on
> the slave.
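>
> For example, each slave could be started with something like the following
> (a sketch; any other flags depend on your deployment):
>
> mesos-slave --master=master:5050 --isolation=cgroups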
>
> Thanks!
>
>
> On Thu, May 9, 2013 at 7:07 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>
> > Hi Ben,
> >
> > Logs for the mesos master and slaves are attached; thanks for helping me with
> > this problem. I very much appreciate your patient reply.
> >
> > Three servers: "master", "slave1", "slave5"
> > Mesos master: "master"
> > Mesos slaves: "master", "slave1", "slave5"
> >
> > ------------------------------
> > Wang Yu
> >
> >  *From:* Benjamin Mahler <be...@gmail.com>
> > *Date:* 2013-05-10 07:22
> > *To:* wangyu <wa...@nfs.iscas.ac.cn>
> > *Cc:* mesos-dev <me...@incubator.apache.org>; Benjamin Mahler <benjamin.mahler@gmail.com>
> > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> >  Ah I see them now, looks like you uploaded the NameNode logs? Can you
> > upload the mesos-master and mesos-slave logs instead? What will be
> > interesting here is what happened on the slave that is trying to run the
> > TaskTracker.
> >
> >
> > On Wed, May 8, 2013 at 8:32 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> >
> > > I have uploaded them in the former email; I will send them again. PS: Will
> > > the email list reject the attachments?
> > >
> > > Can you see them?
> > >
> > > ------------------------------
> > > Wang Yu
> > >
> > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > *Date:* 2013-05-09 10:00
> > > *To:* mesos-dev@incubator.apache.org; wangyu <wa...@nfs.iscas.ac.cn>
> > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > >  Did you forget to attach them?
> > >
> > >
> > > On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > >
> > > > OK.
> > > > Logs are attached. I used Ctrl+C to stop the jobtracker when the TASK_LOST
> > > > happened.
> > > >
> > > > Thanks very much for your help!
> > > >
> > > > ------------------------------
> > > > Wang Yu
> > > >
> > > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > > *Date:* 2013-05-09 01:23
> > > > *To:* mesos-dev@incubator.apache.org
> > > > *Cc:* wangyu <wa...@nfs.iscas.ac.cn>
> > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > TaskTracker: http://slave5:50060
> > > >
> > >
> >
> > > > Hey Brenden, are there any bugs in particular here that you're referring to?
> > > >
> > > > Wang, can you provide the logs for the JobTracker, the slave, and the
> > > > master?
> > > >
> > > >
> > > > On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <
> > > > brenden.matthews@airbedandbreakfast.com> wrote:
> > > >
> > > > > You may want to try Airbnb's dist of Mesos:
> > > > >
> > > > > https://github.com/airbnb/mesos/tree/testing
> > > > >
> > > > > A good number of these Mesos bugs have been fixed but aren't yet merged
> > > > > into upstream.
> > > > >
> > > > >
> > > > > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > > >
> > > > > > The log on each slave of the lost task is: No executor found with ID:
> > > > > > executor_Task_Tracker_XXX.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Wang Yu
> > > > > >
> > > > > > From: 王瑜
> > > > > > Date: 2013-05-07 11:13
> > > > > > To: mesos-dev
> > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > TaskTracker: http://slave5:50060
> > > > > > Hi all,
> > > > > >
> > > > > > I have tried adding the file extension when uploading the executor, as
> > > > > > well as in the conf file, but it still does not work.
> > > > > >
> > > > > > And I have seen
> > > > > > /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > > > > > but it is an empty directory.
> > > > > >
> > > > > >
> > > > > > Are there any other logs I can read to find out why the TASK_LOST
> > > > > > happened? I really need your help, thanks very much!
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Wang Yu
> > > > > >
> > > > > > From: Vinod Kone
> > > > > > Date: 2013-04-26 01:31
> > > > > > To: mesos-dev@incubator.apache.org
> > > > > > Cc: wangyu
> > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > TaskTracker: http://slave5:50060
> > > > > > Also, you could look at the executor logs (default:
> > > > > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why the
> > > > > > TASK_LOST happened.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> > > > > > benjamin.mahler@gmail.com> wrote:
> > > > > >
> > > > > > Can you maintain the file extension? That is how mesos knows to
> > > > > > extract it:
> > > > > > hadoop fs -copyFromLocal
> > > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > /user/mesos/mesos-executor.tar.gz
> > > > > >
> > > > > > Also make sure your mapred-site.xml has the extension as well.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > >
> > > > > > > Hi, Ben,
> > > > > > >
> > > > > > > I have tried as you said, but it still does not work.
> > > > > > > I have uploaded the mesos-executor using: hadoop fs -copyFromLocal
> > > > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > > /user/mesos/mesos-executor
> > > > > > > Did I do the right thing? Thanks very much!
> > > > > > >
> > > > > > > The log in jobtracker is:
> > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > > > > > > Task_Tracker_82 on http://slave1:31000
> > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > > > > slots needed.
> > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update of
> > > > > > > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker Status
> > > > > > >       Pending Map Tasks: 2
> > > > > > >    Pending Reduce Tasks: 1
> > > > > > >          Idle Map Slots: 0
> > > > > > >       Idle Reduce Slots: 0
> > > > > > >      Inactive Map Slots: 6 (launched but no heartbeat yet)
> > > > > > >   Inactive Reduce Slots: 6 (launched but no heartbeat yet)
> > > > > > >        Needed Map Slots: 2
> > > > > > >     Needed Reduce Slots: 1
> > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > > > > > > Task_Tracker_83 on http://slave1:31000
> > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > > > > slots needed.
> > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update of
> > > > > > > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > > > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker Status
> > > > > > >       Pending Map Tasks: 2
> > > > > > >    Pending Reduce Tasks: 1
> > > > > > >          Idle Map Slots: 0
> > > > > > >       Idle Reduce Slots: 0
> > > > > > >      Inactive Map Slots: 6 (launched but no heartbeat yet)
> > > > > > >   Inactive Reduce Slots: 6 (launched but no heartbeat yet)
> > > > > > >        Needed Map Slots: 2
> > > > > > >     Needed Reduce Slots: 1
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Wang Yu
> > > > > > >
> > > > > > > From: Benjamin Mahler
> > > > > > > Date: 2013-04-24 07:49
> > > > > > > To: mesos-dev@incubator.apache.org; wangyu
> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > > TaskTracker: http://slave5:50060
> > > > > > > You need to instead upload the hadoop.tar.gz generated by the tutorial.
> > > > > > > Then point the conf file to the hdfs directory (you had the right idea,
> > > > > > > just uploaded the wrong file). :)
> > > > > > >
> > > > > > > Can you try that and report back?
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > >
> > > > > > > > Guodong,
> > > > > > > >
> > > > > > > > There are still problems; I think there is some problem with my
> > > > > > > > executor setting.
> > > > > > > >
> > > > > > > > In mapred-site.xml, I set ("master" is the mesos master hostname):
> > > > > > > >   <property>
> > > > > > > >     <name>mapred.mesos.executor</name>
> > > > > > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > > > > > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > > > > > > >   </property>
> > > > > > > >
> > > > > > > > And I upload mesos-executor in /user/mesos/mesos-executor
> > > > > > > >
> > > > > > > > The head content is as follows:
> > > > > > > >
> > > > > > > > #! /bin/sh
> > > > > > > >
> > > > > > > > # mesos-executor - temporary wrapper script for .libs/mesos-executor
> > > > > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > > > > > > #
> > > > > > > > # The mesos-executor program cannot be directly executed until all the libtool
> > > > > > > > # libraries that it depends on are installed.
> > > > > > > > #
> > > > > > > > # This wrapper script should never be moved out of the build
> > > > > directory.
> > > > > > > > # If it is, it will not operate correctly.
> > > > > > > >
> > > > > > > > # Sed substitution that helps us do robust quoting.  It backslashifies
> > > > > > > > # metacharacters that are still active within double-quoted strings.
> > > > > > > > Xsed='/bin/sed -e 1s/^X//'
> > > > > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > > > > > > >
> > > > > > > > # Be Bourne compatible
> > > > > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
> > > > > > > >   emulate sh
> > > > > > > >   NULLCMD=:
> > > > > > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
> > > > > > > >   # is contrary to our usage.  Disable this feature.
> > > > > > > >   alias -g '${1+"$@"}'='"$@"'
> > > > > > > >   setopt NO_GLOB_SUBST
> > > > > > > > else
> > > > > > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac
> > > > > > > > fi
> > > > > > > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > > > > > > DUALCASE=1; export DUALCASE # for MKS sh
> > > > > > > >
> > > > > > > > # The HP-UX ksh and POSIX shell print the target directory to stdout
> > > > > > > > # if CDPATH is set.
> > > > > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > > > > > > >
> > > > > > > > relink_command="(cd /home/mesos/build/src; { test -z
> > > > > > > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || { LIBRARY_PATH=; export
> > > > > > > > LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" || unset
> > > > > > > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; }; }; { test -z
> > > > > > > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || { GCC_EXEC_PREFIX=;
> > > > > > > > export GCC_EXEC_PREFIX; }; }; { test -z \"\${LD_RUN_PATH+set}\" || unset
> > > > > > > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
> > > > > > > > LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/;
> > > > > > > > export LD_LIBRARY_PATH;
> > > > > > > > PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin;
> > > > > > > > export PATH; g++ -g -g2 -O2 -o \$progdir/\$file
> > > > > > > > launcher/mesos_executor-executor.o  ./.libs/libmesos.so
> > > > > > > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread -lcurl -lssl
> > > > > > > > -lcrypto -lz -lrt -pthread -Wl,-rpath -Wl,/home/mesos/build/src/.libs
> > > > > > > > -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > > > > > > ...
> > > > > > > >
> > > > > > > >
> > > > > > > > Did I upload the right file, and set it up in the conf file
> > > > > > > > correctly? Thanks very much!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Wang Yu
> > > > > > > >
> > > > > > > > From: 王国栋
> > > > > > > > Date: 2013-04-23 13:32
> > > > > > > > To: wangyu
> > > > > > > > CC: mesos-dev
> > > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > > > TaskTracker: http://slave5:50060
> > > > > > > > Hmm, it seems that mapred.mesos.master is set correctly.
> > > > > > > >
> >
> > > > > > > > If you run Hadoop in local mode, the following setting is OK:
> > > > > > > >   <property>
> > > > > > > >     <name>mapred.mesos.master</name>
> > > > > > > >     <value>local</value>
> > > > > > > >   </property>
> > > > > > > >
> >
> > > > > > > > If you want to start the cluster, set mapred.mesos.master to
> > > > > > > > mesos-master-hostname:mesos-master-port.
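> > > > > > > > For example (just a sketch; substitute your real master host and port):
> > > > > > > >   <property>
> > > > > > > >     <name>mapred.mesos.master</name>
> > > > > > > >     <value>master:5050</value>
> > > > > > > >   </property>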
> > > > > > > >
> >
> > > > > > > > Make sure the DNS lookup result for mesos-master-hostname is the
> > > > > > > > right IP.
> > > > > > > >
> > > > > > > > BTW: when you start the jobtracker, you can check the Mesos web UI
> > > > > > > > to see whether the Hadoop framework is registered.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > Guodong
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > >
> > > > > > > > > Hi, Guodong,
> > > > > > > > >
> > > > > > > > > I started Hadoop as you said, and then I saw this error:
> > > > > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from scheduler
> > > > > > > > > driver: Cannot parse '@0.0.0.0:0'
> > > > > > > > >
> > > > > > > > > What does this mean? Where should I change the MesosScheduler
> > > > > > > > > code to fix this?
> > > > > > > > > Thanks very much! I am so sorry for interrupting you once again...
> > > > > > > > >
> > > > > > > > > The whole log is as follows:
> > > > > > > > >
> > > > > > > > >  [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > > > > > > /************************************************************
> > > > > > > > > STARTUP_MSG: Starting JobTracker
> > > > > > > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > > > > > > STARTUP_MSG:   args = []
> > > > > > > > > STARTUP_MSG:   version = 0.20.205.0
> > > > > > > > >
> > > > > > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat Apr 13 11:19:33 CST 2013
> > > > > > > > > ************************************************************/
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker metrics system started
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source QueueMetrics,q=default registered.
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
> > > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
> > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting jobtracker with owner as root
> > > > > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort9001 registered.
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort9001 registered.
> > > > > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
> > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
> > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to port 50030
> > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
> > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
> > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source JobTrackerMetrics registered.
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver: 50030
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system directory
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being initialized in embedded mode
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job history server at: localhost:50030
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web address: localhost:50030
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed job store is inactive
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting MesosScheduler
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts information
> > > > > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from scheduler driver: Cannot parse '@0.0.0.0:0'
> > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the includes file to
> > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the excludes file to
> > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 nodes
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on
> 9001:
> > > > > > > starting
> > >
> >
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on
> 9001:
> > > > > > > starting
> > >
> >
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on
> 9001:
> > > > > > > starting
> > >
> >
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on
> 9001:
> > > > > > > starting
> > >
> >
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on
> 9001:
> > > > > > > starting
> > >
> >
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on
> 9001:
> > > > > > > starting
> > >
> >
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on
> 9001:
> > > > > > > starting
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING
> > >
> >
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on
> 9001:
> > > > > > > starting
> > >
> >
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on
> 9001:
> > > > > > > starting
> > > > > > > > >
> > > > > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to
> load
> > > >
> > >
> >
> > > > > > > > native-hadoop library for your platform... using
> builtin-java classes
> > > > > > > where
> > > > > > > > applicable
> > > > > > > > >
> > > >
> > >
> >
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress:
> job_201304231321_0001:
> > > > > > > > nMaps=0 nReduces=0 max=-1
> > > > > > > > >
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job
> > > > > > > > job_201304231321_0001
> > > > > > > > >
> > > >
> > >
> >
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job
> job_201304231321_0001
> > > > > > > > added successfully for user 'root' to queue 'default'
> > > > > > > > >
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root
> > > > > >  IP=192.168.0.2
> > > > > > > >  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001
> > > > >  RESULT=SUCCESS
> > > > > > > > >
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing
> > > > > > > > job_201304231321_0001
> > > > > > > > >
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing
> > > > > > > > job_201304231321_0001
> > > > > > > > >
> > > >
> > >
> >
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken
> generated and
> > > > > > > > stored with users keys in
> >
> > > > > > > >
> /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > > > > >
> >
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size
> for job
> > > > > > > > job_201304231321_0001 = 0. Number of splits = 0
> > > > > > > > >
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job
> > > > > > job_201304231321_0001
> > > > > > > > initialized successfully with 0 map tasks and 0 reduce tasks.
> > > > > > > > >
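Side note: the only ERROR in this log is the scheduler driver's "Cannot parse '@0.0.0.0:0'", which suggests the driver was handed an empty or malformed Mesos master address. Assuming the Hadoop-on-Mesos patch reads the master from a mapred.mesos.master property (an assumption; check your patch if it uses a different name), a quick sanity check looks like:

  # Hypothetical check; the property name is an assumption, not confirmed in this thread:
  grep -A 1 mapred.mesos.master $HADOOP_HOME/conf/mapred-site.xml
  # The value should be a reachable master address such as master:5050, not
  # empty -- an empty value can surface as '@0.0.0.0:0' in the driver.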
> > > > > > > > > ------------------------------
> > > > > > > > > Wang Yu
> > > > > > > > >
> > > > > > > > >  *From:* 王国栋 <wa...@gmail.com>
> > > > > > > > > *Date:* 2013-04-23 11:34
> > > > > > > > > *To:* mesos-dev <me...@incubator.apache.org>; wangyu<
> > > > > > > > wangyu@nfs.iscas.ac.cn>
> > > > > > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > > >  Hi Yu,
> > > > > > > > >
> > > > > > > > > Mesos will just launch a tasktracker on each slave node as long as the
> > > > > > > > > required resources are enough for the tasktracker. So you have to run the
> > > > > > > > > NameNode, Jobtracker and DataNode on your own.
> > > > > > > > >
> > > > > > > > > Basically, starting hadoop on mesos works like this:
> > > > > > > > > 1. Start the dfs with hadoop/bin/start-dfs.sh (you should configure
> > > > > > > > > core-site.xml and hdfs-site.xml). The dfs is no different from the
> > > > > > > > > normal one.
> > > > > > > > > 2. Start the jobtracker with hadoop/bin/hadoop jobtracker (you should
> > > > > > > > > configure mapred-site.xml, and this jobtracker must contain the patch
> > > > > > > > > for mesos). A condensed sketch of these two steps follows below.
> > > > > > > > >
> > > > > > > > > Then, you can use the mesos web UI and the jobtracker web UI to check
> > > > > > > > > the status of the Jobtracker.
> > > > > > > > >
> > > > > > > > >  Guodong
> > > > > > > > >
> > > > > > > > >
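To make Guodong's two steps concrete, here is a minimal sketch (HADOOP_HOME and paths are illustrative, not taken from this thread):

  # 1. Start HDFS normally; core-site.xml / hdfs-site.xml need nothing Mesos-specific.
  $HADOOP_HOME/bin/start-dfs.sh

  # 2. Start the Mesos-patched JobTracker in the foreground; mapred-site.xml
  #    carries the Mesos settings.
  $HADOOP_HOME/bin/hadoop jobtracker

  # Do NOT run start-all.sh and do not start TaskTrackers yourself: the
  # MesosScheduler launches TaskTracker executors on the slaves whenever a
  # job has pending map/reduce tasks.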
> > > > > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > > >
> > > >
> > >
> >
> > > > > > > > >> Oh, yes, I started my hadoop using "start-all.sh". Now I know what my
> > > > > > > > >> problem is. Thanks very much!
> > > > > > > > >>
> > > > > > > > >> PS: Besides the TaskTracker, are there any other roles (like JobTracker
> > > > > > > > >> or DataNode) that I should stop first?
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Wang Yu
> > > > > > > > >>
> > > > > > > > >> From: Benjamin Mahler
> > > > > > > > >> Date: 2013-04-23 10:56
> > > > > > > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > > > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > > > >> TaskTracker: http://slave5:50060
> > > > > > > > >>  The scheduler we wrote for Hadoop will start its own TaskTrackers,
> > > > > > > > >> meaning you do not have to start any TaskTrackers yourself.
> > > > > > > > >>
> > > > > > > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers
> > > > > > > > >> running in your cluster?
> > > > > > > > >>
> > > > > > > > >> Looking at your jps output, is there already a TaskTracker running?
> > > > > > > > >> [root@master logs]# jps
> > > > > > > > >> 13896 RunJar
> > > > > > > > >> 14123 Jps
> > > > > > > > >> 12718 NameNode
> > > > > > > > >> 12900 DataNode
> > > > > > > > >> 13374 TaskTracker  <--- How was this started?
> > > > > > > > >> 13218 JobTracker
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi, Ben and Guodong,
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > What do you mean by "managing your own TaskTrackers"? How should I know
> > > > > > > > >> > whether I am managing my own TaskTrackers? Sorry, I am not very familiar
> > > > > > > > >> > with mesos.
> > > > > > > > >> > Does it mean I do not need to configure hdfs-site.xml and core-site.xml
> > > > > > > > >> > in hadoop? I do not want to run my own TaskTrackers; I just want to set
> > > > > > > > >> > up hadoop on mesos and run my MR tasks.
> > > > > > > > >> >
> > > > > > > > >> > Thanks very much for your patient reply... Maybe I have a long way to go...
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > The log messages you see:
> > > > > > > > >> > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> >
> > > > > > > > >> > are printed when mesos does not know about the TaskTracker. We currently
> > > > > > > > >> > don't support running your own TaskTrackers, as the MesosScheduler will
> > > > > > > > >> > launch them on your behalf when needed.
> > > > > > > > >> >
> > > > > > > > >> > Are you managing your own TaskTrackers? The purpose of using Hadoop with
> > > > > > > > >> > mesos is that you no longer have to do that. We will detect that jobs have
> > > > > > > > >> > pending map / reduce tasks and launch TaskTrackers accordingly.
> > > > > > > > >> >
> > > > > > > > >> > Guodong may be able to help further getting set up!
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > Wang Yu
> > > > > > > > >> >
> > > > > > > > >> > From: 王国栋
> > > > > > > > >> > Date: 2013-04-18 17:10
> > > > > > > > >> > To: mesos-dev; wangyu
> > > > > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > Unknown/exited
> > > > > > > > >> > TaskTracker: http://slave5:50060
> > > > > > > > >> > You can check the slave log and the mesos-executor log, which is normally
> > > > > > > > >> > located in a dir like
> > > > > > > > >> > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > > > > > >> > That log is the tasktracker log.
> > > > > > > > >> >
> > > > > > > > >> > I hope it will help.
> > > > > > > > >> >
> > > > > > > > >> > Guodong
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > > >> >
> > > > > > > > >> > > **
> > > > > > > > >> > > Hi All,
> > > > > > > > >> > >
> > > > > > > > >> > > I have deployed mesos on three nodes: master, slave1 and slave5, and
> > > > > > > > >> > > it works well.
> > > > > > > > >> > >  Then I set up hadoop over it, using master as the namenode, and
> > > > > > > > >> > > master, slave1 and slave5 as datanodes. When I use 'jps', it looks
> > > > > > > > >> > > like it works well.
> > > > > > > > >> > >  [root@master logs]# jps
> > > > > > > > >> > > 13896 RunJar
> > > > > > > > >> > > 14123 Jps
> > > > > > > > >> > > 12718 NameNode
> > > > > > > > >> > > 12900 DataNode
> > > > > > > > >> > > 13374 TaskTracker
> > > > > > > > >> > > 13218 JobTracker
> > > > > > > > >> > >
> > > > > > > > >> > > Then I ran the test benchmark, but it could not make progress...
> > > > > > > > >> > >  [root@master hadoop-0.20.205.0]# bin/hadoop jar hadoop-examples-0.20.205.0.jar randomwriter -Dtest.randomwrite.bytes_per_map=6710886 -Dtest.randomwriter.maps_per_host=10 rand
> > > > > > > > >> > > Running 30 maps.
> > > > > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running job: job_201304181646_0001
> > > > > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0% reduce 0%
> > > > > > > > >> > > It stopped here.
> > > > > > > > >> > >
> > > > > > > > >> > > Then I read the log file hadoop-root-jobtracker-master.log; it shows:
> > > > > > > > >> > > 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > > > > > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > > > > > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > > > > > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:04,609 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:07,618 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:10,625 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:13,632 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave5
> > > > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_slave5:127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > > > > >> > > 2013-04-18 16:47:13,687 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > >> > > 2013-04-18 16:47:16,638 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:16,697 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > >> > > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:19,707 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > >> > > 2013-04-18 16:47:22,651 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:22,715 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > >> > > 2013-04-18 16:47:25,658 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:25,725 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > >> > > 2013-04-18 16:47:28,665 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > >
> > > > > > > > >> > > Can anybody help me? Thanks very much!
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by Vinod Kone <vi...@gmail.com>.
What is your mesos/hdfs cluster setup? Also, can you paste your
mapred-site.xml?


On Sun, May 12, 2013 at 6:40 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> 1. I opened all the executor directories, but all of them are empty. I do
> not know what happened to them...
> [root@slave1 logs]# cd
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
> total 0
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
> .  ..
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
> 2. I added "--isolation=cgroups" for the slaves, but it still does not work.
> Tasks are always lost. There is no error any more, but I still do not know
> what happened to the executor... The logs on one slave are as follows.
> Please help me, thanks very much!
>
> mesos-slave.INFO
> Log file created at: 2013/05/13 09:12:54
> Running on machine: slave1
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
> I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by
> root
> I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
> I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@
> 192.168.0.3:36668
> I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24;
> mem=63356; ports=[31000-32000]; disk=29143
> I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as
> cgroups hierarchy root
> I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at
> master@192.168.0.2:5050
> I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file
> '/home/mesos/build/logs/mesos-slave.INFO'
> I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master
> detected at master@192.168.0.2:5050
> I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
> I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
> I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given
> slave ID 201305130913-33597632-5050-3893-0
> I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
> I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
> I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
> I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
> I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
> I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
> I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
> I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%. Max
> allowed age: 5.11days
> I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%. Max
> allowed age: 5.11days
> I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%. Max
> allowed age: 5.11days
> I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task
> Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated 'cpu.shares'
> to 1024 for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_0
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening
> for OOM events for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at =
> 24552
> I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task
> Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated 'cpu.shares'
> to 1024 for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_1
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening
> for OOM events for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at =
> 24628
> I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor
> executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> terminated with status 256
> I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor
> executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is
> triggered for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM
> notifier for executor executor_Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> after 1 attempts
> I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully
> destroyed cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
> I0513 09:16:34.477439 24190 slave.cpp:1479] Executor
> 'executor_Task_Tracker_0' of framework 201305130913-33597632-5050-3893-0000
> has exited with status 1
> I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update
> TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update
> TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 to the status update manager
> I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update
> resources for an unknown/killed executor
> I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received status
> update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating
> StatusUpdate stream for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling UPDATE
> for status update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding
> status update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000 to the master at
> master@192.168.0.2:5050
> I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status
> update for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received status
> update acknowledgement for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK
> for status update TASK_LOST from task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487547 24185 status_update_manager.cpp:434] Cleaning up
> status update stream for task Task_Tracker_0 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.487788 24207 slave.cpp:1016] Status update manager
> successfully handled status update acknowledgement for task Task_Tracker_0
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:34.488142 24202 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
> for removal
> I0513 09:16:35.063462 24199 slave.cpp:587] Got assigned task
> Task_Tracker_2 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:35.066090 24199 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> I0513 09:16:35.066673 24188 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> I0513 09:16:35.066985 24205 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_2 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:35.068594 24205 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:35.069341 24205 cgroups_isolator.cpp:841] Updated 'cpu.shares'
> to 1024 for executor executor_Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:35.070061 24205 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_2
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:35.070828 24205 cgroups_isolator.cpp:1005] Started listening
> for OOM events for executor executor_Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:35.071966 24205 cgroups_isolator.cpp:555] Forked executor at =
> 24704
> I0513 09:16:40.464987 24197 cgroups_isolator.cpp:806] Executor
> executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> terminated with status 256
> I0513 09:16:40.465175 24197 cgroups_isolator.cpp:635] Killing executor
> executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.467118 24197 cgroups_isolator.cpp:1025] OOM notifier is
> triggered for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.467269 24197 cgroups_isolator.cpp:1030] Discarded OOM
> notifier for executor executor_Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.468596 24198 cgroups.cpp:1175] Trying to freeze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.468945 24198 cgroups.cpp:1214] Successfully froze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> after 1 attempts
> I0513 09:16:40.471577 24200 cgroups.cpp:1190] Trying to thaw cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.471850 24200 cgroups.cpp:1298] Successfully thawed
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.480960 24185 cgroups_isolator.cpp:1144] Successfully
> destroyed cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
> I0513 09:16:40.481230 24196 slave.cpp:1479] Executor
> 'executor_Task_Tracker_1' of framework 201305130913-33597632-5050-3893-0000
> has exited with status 1
> I0513 09:16:40.483572 24196 slave.cpp:1232] Handling status update
> TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.483801 24196 slave.cpp:1280] Forwarding status update
> TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 to the status update manager
> I0513 09:16:40.483846 24193 cgroups_isolator.cpp:666] Asked to update
> resources for an unknown/killed executor
> I0513 09:16:40.484094 24205 status_update_manager.cpp:254] Received status
> update TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.484267 24205 status_update_manager.cpp:403] Creating
> StatusUpdate stream for task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.484412 24205 status_update_manager.hpp:314] Handling UPDATE
> for status update TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.484558 24205 status_update_manager.cpp:289] Forwarding
> status update TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000 to the master at
> master@192.168.0.2:5050
> I0513 09:16:40.487229 24202 slave.cpp:979] Got acknowledgement of status
> update for task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.487457 24196 status_update_manager.cpp:314] Received status
> update acknowledgement for task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.487607 24196 status_update_manager.hpp:314] Handling ACK
> for status update TASK_LOST from task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.487741 24196 status_update_manager.cpp:434] Cleaning up
> status update stream for task Task_Tracker_1 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.487949 24207 slave.cpp:1016] Status update manager
> successfully handled status update acknowledgement for task Task_Tracker_1
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:40.488278 24193 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
> for removal
> I0513 09:16:41.072098 24194 slave.cpp:587] Got assigned task
> Task_Tracker_3 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:41.074632 24194 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
> I0513 09:16:41.075546 24198 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
> I0513 09:16:41.076081 24194 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_3 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:41.077606 24194 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:41.078402 24194 cgroups_isolator.cpp:841] Updated 'cpu.shares'
> to 1024 for executor executor_Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:41.079186 24194 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_3
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:41.080008 24194 cgroups_isolator.cpp:1005] Started listening
> for OOM events for executor executor_Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:41.081447 24194 cgroups_isolator.cpp:555] Forked executor at =
> 24780
> I0513 09:16:44.482589 24200 status_update_manager.cpp:379] Checking for
> unacknowledged status updates
> I0513 09:16:46.473145 24199 cgroups_isolator.cpp:806] Executor
> executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
> terminated with status 256
> I0513 09:16:46.473307 24199 cgroups_isolator.cpp:635] Killing executor
> executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.475491 24199 cgroups_isolator.cpp:1025] OOM notifier is
> triggered for executor executor_Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.475649 24199 cgroups_isolator.cpp:1030] Discarded OOM
> notifier for executor executor_Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.476820 24192 cgroups.cpp:1175] Trying to freeze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.477181 24192 cgroups.cpp:1214] Successfully froze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> after 1 attempts
> I0513 09:16:46.479907 24201 cgroups.cpp:1190] Trying to thaw cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.480229 24201 cgroups.cpp:1298] Successfully thawed
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.493069 24200 cgroups_isolator.cpp:1144] Successfully
> destroyed cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
> I0513 09:16:46.493391 24184 slave.cpp:1479] Executor
> 'executor_Task_Tracker_2' of framework 201305130913-33597632-5050-3893-0000
> has exited with status 1
> I0513 09:16:46.495689 24184 slave.cpp:1232] Handling status update
> TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.495933 24184 slave.cpp:1280] Forwarding status update
> TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000 to the status update manager
> I0513 09:16:46.495980 24189 cgroups_isolator.cpp:666] Asked to update
> resources for an unknown/killed executor
> I0513 09:16:46.496305 24193 status_update_manager.cpp:254] Received status
> update TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.496553 24193 status_update_manager.cpp:403] Creating
> StatusUpdate stream for task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.496707 24193 status_update_manager.hpp:314] Handling UPDATE
> for status update TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.496868 24193 status_update_manager.cpp:289] Forwarding
> status update TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000 to the master at
> master@192.168.0.2:5050
> I0513 09:16:46.499631 24201 slave.cpp:979] Got acknowledgement of status
> update for task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.499961 24193 status_update_manager.cpp:314] Received status
> update acknowledgement for task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.500128 24193 status_update_manager.hpp:314] Handling ACK
> for status update TASK_LOST from task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.500257 24193 status_update_manager.cpp:434] Cleaning up
> status update stream for task Task_Tracker_2 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.500452 24192 slave.cpp:1016] Status update manager
> successfully handled status update acknowledgement for task Task_Tracker_2
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:46.500743 24204 gc.cpp:56] Scheduling
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
> for removal
> I0513 09:16:47.079013 24193 slave.cpp:587] Got assigned task
> Task_Tracker_4 for framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:47.081650 24193 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
> I0513 09:16:47.082447 24198 slave.cpp:436] Successfully attached file
> '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
> I0513 09:16:47.082861 24194 cgroups_isolator.cpp:525] Launching
> executor_Task_Tracker_4 (cd hadoop && ./bin/mesos-executor) in
> /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> with resources cpus=1; mem=1280 for framework
> 201305130913-33597632-5050-3893-0000 in cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_4_tag_8a4dd631-1ec0-4946-a1bc-0644a7238e3c
> I0513 09:16:47.084478 24194 cgroups_isolator.cpp:670] Changing cgroup
> controls for executor executor_Task_Tracker_4 of framework
> 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
> I0513 09:16:47.085273 24194 cgroups_isolator.cpp:841] Updated 'cpu.shares'
> to 1024 for executor executor_Task_Tracker_4 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:47.086045 24194 cgroups_isolator.cpp:979] Updated
> 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_4
> of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:47.086853 24194 cgroups_isolator.cpp:1005] Started listening
> for OOM events for executor executor_Task_Tracker_4 of framework
> 201305130913-33597632-5050-3893-0000
> I0513 09:16:47.088227 24194 cgroups_isolator.cpp:555] Forked executor at =
> 24856
> I0513 09:16:50.485791 24194 status_update_manager.cpp:379] Checking for
> unacknowledged status updates
> I0513 09:16:52.480471 24185 cgroups_isolator.cpp:806] Executor
> executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
> terminated with status 256
> I0513 09:16:52.480622 24185 cgroups_isolator.cpp:635] Killing executor
> executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
> I0513 09:16:52.482652 24185 cgroups_isolator.cpp:1025] OOM notifier is
> triggered for executor executor_Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.482805 24185 cgroups_isolator.cpp:1030] Discarded OOM
> notifier for executor executor_Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000 with uuid
> 22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.484110 24195 cgroups.cpp:1175] Trying to freeze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.484447 24195 cgroups.cpp:1214] Successfully froze cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> after 1 attempts
> I0513 09:16:52.487893 24184 cgroups.cpp:1190] Trying to thaw cgroup
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.488129 24184 cgroups.cpp:1298] Successfully thawed
> /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.496047 24207 cgroups_isolator.cpp:1144] Successfully
> destroyed cgroup
> mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
> I0513 09:16:52.496247 24203 slave.cpp:1479] Executor
> 'executor_Task_Tracker_3' of framework 201305130913-33597632-5050-3893-0000
> has exited with status 1
> I0513 09:16:52.498538 24203 slave.cpp:1232] Handling status update
> TASK_LOST from task Task_Tracker_3 of framework
> 201305130913-33597632-5050-3893-0000
> ......
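The pattern above repeats for every TaskTracker: the executor is forked, exits almost immediately (wait status 256 corresponds to exit code 1), and its sandbox stays empty. As a sketch of how one might dig further by hand (the IDs below are copied from the log above and will differ on each run; this is an illustration, not a fixed recipe):

  # Hypothetical manual check on slave1:
  cd /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
  ls -a   # an empty sandbox means the executor bundle was never fetched/extracted here
  # The isolator's command was: cd hadoop && ./bin/mesos-executor
  # If there is no 'hadoop' directory to cd into, re-check the
  # mapred.mesos.executor URI (and its file extension) first.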
>
>
>
>
> Wang Yu
>
> From: Benjamin Mahler
> Date: 2013-05-11 02:32
> To: wangyu
> Cc: Benjamin Mahler; mesos-dev
> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
> 1. If you look at a slave log, you can see that the process isolator
> launched the task and then notified the slave that it was lost. Can you
> look inside one of the executor directories? There should be a stderr file
> there. E.g.:
>
> I0510 09:44:33.801655  7412 paths.hpp:302] Created executor directory
> '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1/frameworks/201305100938-33597632-5050-19520-0000/executors/executor_Task_Tracker_5/runs/2981a5c2-84e5-4868-9507-8aecb32ee163'
>
> Look for these in the logs and read the stderr file inside. Can you
> report back with the contents?
>
> 2. Are you running on Linux? You may want to consider using
> --isolation=cgroups when starting your slaves. This uses Linux control
> groups to do process / cpu / memory isolation between executors running on
> the slave.
>
> Thanks!
>
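Both suggestions, condensed into commands (the slave flags shown match builds of this era but are illustrative; check --help on your build):

  # 1. Read the stderr of recent TaskTracker executor runs on a slave:
  cat /tmp/mesos/slaves/*/frameworks/*/executors/executor_Task_Tracker_*/runs/latest/stderr

  # 2. Restart each slave with cgroups isolation:
  ./bin/mesos-slave.sh --master=192.168.0.2:5050 --isolation=cgroups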
>
> On Thu, May 9, 2013 at 7:07 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>
> > **
> > Hi Ben,
> >
> > Logs for the mesos master and slaves are attached; thanks for helping me
> > with this problem. I really appreciate your patient reply.
> >
> > Three servers: "master", "slave1", "slave5"
> > Mesos master: "master"
> > Mesos slaves: "master", "slave1", "slave5"
> >
> > ------------------------------
> > Wang Yu
> >
> >  *From:* Benjamin Mahler <be...@gmail.com>
> > *Date:* 2013-05-10 07:22
> > *To:* wangyu <wa...@nfs.iscas.ac.cn>
> > *Cc:* mesos-dev <me...@incubator.apache.org>; Benjamin Mahler <benjamin.mahler@gmail.com>
> > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> >  Ah I see them now, looks like you uploaded the NameNode logs? Can you
> > upload the mesos-master and mesos-slave logs instead? What will be
> > interesting here is what happened on the slave that is trying to run the
> > TaskTracker.
> >
> >
> > On Wed, May 8, 2013 at 8:32 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> > > **
> >
> > > I have uploaded them in the previous email; I will send them again. PS:
> > > Will the email list reject the attachments?
> > >
> > > Can you see them?
> > >
> > > ------------------------------
> > > Wang Yu
> > >
> > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > *Date:* 2013-05-09 10:00
> > > *To:* mesos-dev@incubator.apache.org; wangyu <wa...@nfs.iscas.ac.cn>
> > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > >  Did you forget to attach them?
> > >
> > >
> > > On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > >
> > > > **
> > > > OK.
> > > > Logs are attached. I used Ctrl+C to stop the jobtracker when the
> > > > TASK_LOST happened.
> > > >
> > > > Thanks very much for your help!
> > > >
> > > > ------------------------------
> > > > Wang Yu
> > > >
> > > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > > *Date:* 2013-05-09 01:23
> > > > *To:* mesos-dev@incubator.apache.org
> > > > *Cc:* wangyu <wa...@nfs.iscas.ac.cn>
> > > > *Subject:* Re: 回复: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > TaskTracker: http://slave5:50060
> > > >
> > > > Hey Brenden, are there any bugs in particular here that you're referring to?
> > > >
> > > > Wang, can you provide the logs for the JobTracker, the slave, and the
> > > > master?
> > > >
> > > >
> > > > On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <
> > > > brenden.matthews@airbedandbreakfast.com> wrote:
> > > >
> > > > > You may want to try Airbnb's dist of Mesos:
> > > > >
> > > > > https://github.com/airbnb/mesos/tree/testing
> > > > >
> > > > >
> > > > > A good number of these Mesos bugs have been fixed but aren't yet merged
> > > > > into upstream.
> > > > >
> > > > >
> > > > > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > > >
> > >
> >
> > > > > > The log on each slave of the lost task is: No executor found with ID:
> > > > > > executor_Task_Tracker_XXX.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Wang Yu
> > > > > >
> > > > > > From: 王瑜
> > > > > > Date: 2013-05-07 11:13
> > > > > > To: mesos-dev
> > > > > > Subject: 回复: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > TaskTracker: http://slave5:50060
> > > > > > Hi all,
> > > > > >
> > > > > > I have tried adding the file extension when uploading the executor, as
> > > > > > well as in the conf file, but it still does not work.
> > > > > >
> > > > > > And I have seen
> > > > > > /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > > > > > but it is an empty directory.
> > > > > >
> > > > > > Are there any other logs I can read to find out why the TASK_LOST
> > > > > > happened? I really need your help, thanks very much!
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Wang Yu
> > > > > >
> > > > > > From: Vinod Kone
> > > > > > Date: 2013-04-26 01:31
> > > > > > To: mesos-dev@incubator.apache.org
> > > > > > Cc: wangyu
> > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > TaskTracker: http://slave5:50060
> > > > > > Also, you could look at the executor logs (default:
> > > > > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why the
> > > > > > TASK_LOST happened.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> > > > > > benjamin.mahler@gmail.com> wrote:
> > > > > >
> > > > > > Can you maintain the file extension? That is how mesos knows to
> > > > > > extract it:
> > > > > > hadoop fs -copyFromLocal
> > > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > /user/mesos/mesos-executor.tar.gz
> > > > > >
> > > > > > Also make sure your mapred-site.xml has the extension as well.
> > > > > >
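Condensed, the fix looks like this (the HDFS destination path is illustrative; the property name matches the mapred.mesos.executor setting shown later in this thread):

  # Upload the tutorial-built bundle, keeping the .tar.gz extension so Mesos
  # knows to extract it:
  hadoop fs -copyFromLocal \
      /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz \
      /user/mesos/hadoop.tar.gz

  # Then mapred-site.xml must reference the same name, e.g.:
  #   <name>mapred.mesos.executor</name>
  #   <value>hdfs://master/user/mesos/hadoop.tar.gz</value>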
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > >
> > > > > > > Hi, Ben,
> > > > > > >
> > > > > > > I have tried as you said, but it still does not work.
> > > > > > > I have uploaded the mesos-executor using: hadoop fs -copyFromLocal
> > > > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > > /user/mesos/mesos-executor
> > > > > > > Did I do the right thing? Thanks very much!
> > > > > > >
> > > > > > > The log in jobtracker is:
> > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > > > > > > Task_Tracker_82 on http://slave1:31000
> > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > > > > slots needed.
> > > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update of
> > > > > > > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker Status
> > > > > > >       Pending Map Tasks: 2
> > > > > > >    Pending Reduce Tasks: 1
> > > > > > >          Idle Map Slots: 0
> > > > > > >       Idle Reduce Slots: 0
> > > > > > >      Inactive Map Slots: 6 (launched but no heartbeat yet)
> > > > > > >   Inactive Reduce Slots: 6 (launched but no heartbeat yet)
> > > > > > >        Needed Map Slots: 2
> > > > > > >     Needed Reduce Slots: 1
> > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > > > > > > Task_Tracker_83 on http://slave1:31000
> > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > > > > slots needed.
> > > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update of
> > > > > > > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > > > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker Status
> > > > > > >       Pending Map Tasks: 2
> > > > > > >    Pending Reduce Tasks: 1
> > > > > > >          Idle Map Slots: 0
> > > > > > >       Idle Reduce Slots: 0
> > > > > > >      Inactive Map Slots: 6 (launched but no heartbeat yet)
> > > > > > >   Inactive Reduce Slots: 6 (launched but no heartbeat yet)
> > > > > > >        Needed Map Slots: 2
> > > > > > >     Needed Reduce Slots: 1
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Wang Yu
> > > > > > >
> > > > > > > From: Benjamin Mahler
> > > > > > > Sent: 2013-04-24 07:49
> > > > > > > To: mesos-dev@incubator.apache.org; wangyu
> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > > TaskTracker: http://slave5:50060
> > > > > > > You need to instead upload the hadoop.tar.gz generated by the tutorial.
> > > > > > >
> > > > > > > Then point the conf file to the hdfs directory (you had the right idea,
> > > > > > > just uploaded the wrong file). :)
> > > > > > >
> > > > > > > Can you try that and report back?
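> > > > > > >
> > > > > > > Roughly (a sketch; the ant target is the one the tutorial runs, and the
> > > > > > > hdfs destination is just an example):
> > > > > > >
> > > > > > >   cd hadoop-0.20.205.0
> > > > > > >   ant -Dversion=0.20.205.0 compile bin-package   # produces build/hadoop.tar.gz
> > > > > > >   hadoop fs -copyFromLocal build/hadoop.tar.gz /user/mesos/hadoop.tar.gz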
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > >
> > > > > > > > Guodong,
> > > > > > > >
> > > > > > > > There are still problems on my side; I think there is a problem with
> > > > > > > > my executor setting.
> > > > > > > >
> > > > > > > > In mapred-site.xml, I set the following ("master" is the mesos master hostname):
> > > > > > > >   <property>
> > > > > > > >     <name>mapred.mesos.executor</name>
> > > > > > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > > > > > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > > > > > > >   </property>
> > > > > > > >
> > > > > > > > And I uploaded mesos-executor to /user/mesos/mesos-executor
> > > > > > > >
> > > > > > > > The head of that file is as follows:
> > > > > > > >
> > > > > > > > #! /bin/sh
> > > > > > > >
> > > > > > > > # mesos-executor - temporary wrapper script for .libs/mesos-executor
> > > > > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > > > > > > #
> > > > > > > > # The mesos-executor program cannot be directly executed until all the libtool
> > > > > > > > # libraries that it depends on are installed.
> > > > > > > > #
> > > > > > > > # This wrapper script should never be moved out of the build
> > > > > directory.
> > > > > > > > # If it is, it will not operate correctly.
> > > > > > > >
> > > > > > > > # Sed substitution that helps us do robust quoting.  It backslashifies
> > > > > > > > # metacharacters that are still active within double-quoted strings.
> > > > > > > > Xsed='/bin/sed -e 1s/^X//'
> > > > > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > > > > > > >
> > > > > > > > # Be Bourne compatible
> > > > > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
> > > > > > > >   emulate sh
> > > > > > > >   NULLCMD=:
> > > > > > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
> > > > > > > >   # is contrary to our usage.  Disable this feature.
> > > > > > > >   alias -g '${1+"$@"}'='"$@"'
> > > > > > > >   setopt NO_GLOB_SUBST
> > > > > > > > else
> > > > > > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac
> > > > > > > > fi
> > > > > > > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > > > > > > DUALCASE=1; export DUALCASE # for MKS sh
> > > > > > > >
> > > > > > > > # The HP-UX ksh and POSIX shell print the target directory to stdout
> > > > > > > > # if CDPATH is set.
> > > > > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > > > > > > >
> > > > > > > > relink_command="(cd /home/mesos/build/src; { test -z \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || { LIBRARY_PATH=; export LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" || unset COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; }; }; { test -z \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || { GCC_EXEC_PREFIX=; export GCC_EXEC_PREFIX; }; }; { test -z \"\${LD_RUN_PATH+set}\" || unset LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
> > > > > > > > LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/; export LD_LIBRARY_PATH;
> > > > > > > > PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin; export PATH;
> > > > > > > > g++ -g -g2 -O2 -o \$progdir/\$file launcher/mesos_executor-executor.o  ./.libs/libmesos.so -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread -lcurl -lssl -lcrypto -lz -lrt -pthread -Wl,-rpath -Wl,/home/mesos/build/src/.libs -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > > > > > > ...
> > > > > > > >
> > > > > > > >
> > > > > > > > Did I upload the right file, and set it up correctly in the conf file?
> > > > > > > > Thanks very much!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Wang Yu
> > > > > > > >
> > > > > > > > From: 王国栋
> > > > > > > > Date: 2013-04-23 13:32
> > > > > > > > To: wangyu
> > > > > > > > CC: mesos-dev
> > > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > > > TaskTracker: http://slave5:50060
> > > > > > > > Hmm, it seems that the mapred.mesos.master is set correctly.
> > > > > > > >
> > > > > > > > If you run hadoop in local mode, the following setting is OK:
> > > > > > > >   <property>
> > > > > > > >     <name>mapred.mesos.master</name>
> > > > > > > >     <value>local</value>
> > > > > > > >   </property>
> > > > > > > >
> > > > > > > > If you want to start the cluster, set mapred.mesos.master to
> > > > > > > > mesos-master-hostname:mesos-master-port.
> > > > > > > >
> > > > > > > > Make sure the DNS lookup for mesos-master-hostname resolves to the right IP.
> > > > > > > >
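> > > > > > > > For example (a sketch, assuming the master runs on host "master" with
> > > > > > > > the default port 5050):
> > > > > > > >   <property>
> > > > > > > >     <name>mapred.mesos.master</name>
> > > > > > > >     <value>master:5050</value>
> > > > > > > >   </property>
> > > > > > > >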
> > > > > > > > BTW: when starting the jobtracker, you can check the mesos web UI to
> > > > > > > > see whether the hadoop framework has registered.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > Guodong
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > >
> > > > > > > > > Hi, Guodong,
> > > > > > > > >
> > > > > > > > > I started hadoop as you said, and then I saw this error:
> > > > > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from scheduler
> > > > > > > > > driver: Cannot parse '@0.0.0.0:0'
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > What does this mean? Where should I change the MesosScheduler code to
> > > > > > > > > fix this? Thanks very much! I am so sorry for interrupting you once again...
> > > > > > > > >
> > > > > > > > > The whole log is as follows:
> > > > > > > > >
> > > > > > > > >  [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > > > > > > /************************************************************
> > > > > > > > > STARTUP_MSG: Starting JobTracker
> > > > > > > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > > > > > > STARTUP_MSG:   args = []
> > > > > > > > > STARTUP_MSG:   version = 0.20.205.0
> > > > > > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat Apr 13 11:19:33 CST 2013
> > > > > > > > > ************************************************************/
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker metrics system started
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source QueueMetrics,q=default registered.
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
> > > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
> > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting jobtracker with owner as root
> > > > > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort9001 registered.
> > > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort9001 registered.
> > > > > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
> > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
> > > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to port 50030
> > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
> > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
> > > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source JobTrackerMetrics registered.
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver: 50030
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system directory
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being initialized in embedded mode
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job history server at: localhost:50030
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web address: localhost:50030
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed job store is inactive
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting MesosScheduler
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts information
> > > > > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from scheduler driver: Cannot parse '@0.0.0.0:0'
> > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the includes file to
> > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the excludes file to
> > > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 nodes
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: job_201304231321_0001: nMaps=0 nReduces=0 max=-1
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job job_201304231321_0001
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job job_201304231321_0001 added successfully for user 'root' to queue 'default'
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root  IP=192.168.0.2  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001    RESULT=SUCCESS
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing job_201304231321_0001
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing job_201304231321_0001
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken generated and stored with users keys in /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size for job job_201304231321_0001 = 0. Number of splits = 0
> > > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job job_201304231321_0001 initialized successfully with 0 map tasks and 0 reduce tasks.
> > > > > > > > >
> > > > > > > > > ------------------------------
> > > > > > > > > Wang Yu
> > > > > > > > >
> > > > > > > > >  *From:* 王国栋 <wa...@gmail.com>
> > > > > > > > > *Date:* 2013-04-23 11:34
> > > > > > > > > *To:* mesos-dev <me...@incubator.apache.org>; wangyu <wangyu@nfs.iscas.ac.cn>
> > > > > > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > > Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > > >  Hi Yu,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Mesos will just launch a tasktracker on each slave node, as long as
> > > > > > > > > the available resources are enough for the tasktracker. So you have
> > > > > > > > > to run the NameNode, JobTracker and DataNode on your own.
> > > > > > > > >
> > > > > > > > > Basically, starting hadoop on mesos goes like this (both steps are
> > > > > > > > > sketched just below):
> > > > > > > > > 1. Start the dfs with hadoop/bin/start-dfs.sh (you should configure
> > > > > > > > > core-site.xml and hdfs-site.xml). The dfs is no different from the normal one.
> > > > > > > > > 2. Start the jobtracker with hadoop/bin/hadoop jobtracker (you should
> > > > > > > > > configure mapred-site.xml; this jobtracker should contain the patch for mesos).
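> > > > > > > > >
> > > > > > > > > As a sketch (paths relative to your hadoop distribution):
> > > > > > > > >   hadoop/bin/start-dfs.sh       # step 1: NameNode + DataNodes
> > > > > > > > >   hadoop/bin/hadoop jobtracker  # step 2: the mesos-patched JobTracker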
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Then you can use the mesos web UI and the jobtracker web UI to check
> > > > > > > > > the status of the JobTracker.
> > > > > > > > >
> > > > > > > > >  Guodong
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > > >
> > > > > > > > >> Oh, yes, I started my hadoop using "start-all.sh". I see what my
> > > > > > > > >> problem is. Thanks very much!
> > > > > > > > >>
> > > > > > > > >> PS: Besides the TaskTracker, are there any other roles (like the
> > > > > > > > >> JobTracker or DataNode) that I should stop first?
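> > > > > > > > >>
> > > > > > > > >> (For example, if the extra TaskTracker came from start-all.sh, something
> > > > > > > > >> like "bin/hadoop-daemon.sh stop tasktracker" on that node should stop it;
> > > > > > > > >> the script name assumes the stock 0.20.205 layout.)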
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Wang Yu
> > > > > > > > >>
> > > > > > > > >> From: Benjamin Mahler
> > > > > > > > >> Sent: 2013-04-23 10:56
> > > > > > > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > > > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > >> Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > > >> The scheduler we wrote for Hadoop will start its own TaskTrackers,
> > > > > > > > >> meaning you do not have to start any TaskTrackers yourself.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers
> > > > > > > > >> running in your cluster?
> > > > > > > > >>
> > > > > > > > >> Looking at your jps output, is there already a TaskTracker running?
> > > > > > > > >> [root@master logs]# jps
> > > > > > > > >> 13896 RunJar
> > > > > > > > >> 14123 Jps
> > > > > > > > >> 12718 NameNode
> > > > > > > > >> 12900 DataNode
> > > > > > > > >> 13374 TaskTracker  <--- How was this started?
> > > > > > > > >> 13218 JobTracker
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi, Ben and Guodong,
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > What do you mean by "managing your own TaskTrackers"? How should I
> > > > > > > > >> > know whether I am managing my own TaskTrackers? Sorry, I am not
> > > > > > > > >> > very familiar with mesos.
> > > > > > > > >> > Does it mean I do not need to configure hdfs-site.xml and
> > > > > > > > >> > core-site.xml in hadoop? I do not want to run my own TaskTracker; I
> > > > > > > > >> > just want to set up hadoop on mesos and run my MR tasks.
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > Thanks very much for your patient reply... Maybe I have a long way
> > > > > > > > >> > to go...
> > > > > > > > >> >
> > > > > > > > >> > The log messages you see:
> > > > > > > > >> > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > are printed when mesos does not know about the TaskTracker. We
> > > > > > > > >> > currently don't support running your own TaskTrackers, as the
> > > > > > > > >> > MesosScheduler will launch them on your behalf when needed.
> > > > > > > > >> >
> > > > > > > > >> > Are you managing your own TaskTrackers? The purpose of using Hadoop
> > > > > > > > >> > with mesos is that you no longer have to do that. We will detect that
> > > > > > > > >> > jobs have pending map / reduce tasks and launch TaskTrackers
> > > > > > > > >> > accordingly.
> accordingly.
> > > > > > > > >> >
> > > > > > > > >> > Guodong may be able to help further getting set up!
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > Wang Yu
> > > > > > > > >> >
> > > > > > > > >> > From: 王国栋
> > > > > > > > >> > Date: 2013-04-18 17:10
> > > > > > > > >> > To: mesos-dev; wangyu
> > > > > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > > > >> > TaskTracker: http://slave5:50060
> > > > > > > > >> > You can check the slave log and the mesos-executor log, which is
> > > > > > > > >> > normally located in a dir like
> > > > > > > > >> > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > > > > > >> > That log is the tasktracker log.
> > > > > > > > >> >
> > > > > > > > >> > I hope it will help.
> > > > > > > > >> >
> > > > > > > > >> > Guodong
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Hi All,
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > > I have deployed mesos on three nodes: master, slave1, and slave5,
> > > > > > > > >> > > and it works well.
> > > > > > > > >> > > Then I set up hadoop on top of it, using master as the namenode
> > > > > > > > >> > > and master, slave1, and slave5 as datanodes. When I run 'jps', it
> > > > > > > > >> > > looks like it is working well.
> > > > > > > > >> > >  [root@master logs]# jps
> > > > > > > > >> > > 13896 RunJar
> > > > > > > > >> > > 14123 Jps
> > > > > > > > >> > > 12718 NameNode
> > > > > > > > >> > > 12900 DataNode
> > > > > > > > >> > > 13374 TaskTracker
> > > > > > > > >> > > 13218 JobTracker
> > > > > > > > >> > >
> > > > > > > > >> > > Then I ran the test benchmark, but it cannot keep working...
> > > > > > > > >> > > [root@master hadoop-0.20.205.0]# bin/hadoop jar hadoop-examples-0.20.205.0.jar randomwriter -Dtest.randomwrite.bytes_per_map=6710886 -Dtest.randomwriter.maps_per_host=10 rand
> > > > > > > > >> > > Running 30 maps.
> > > > > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running job: job_201304181646_0001
> > > > > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0% reduce 0%
> > > > > > > > >> > > It stopped here.
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > > Then I read the log file hadoop-root-jobtracker-master.log; it shows:
> > > > > > > > >> > > 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > > > > > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > > > > > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > > > > > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > >
> > > > > > > > >> > > 2013-04-18 16:47:04,609 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:07,618 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:10,625 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:13,632 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave5
> > > > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_slave5:127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > > > > >> > > 2013-04-18 16:47:13,687 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > >> > > 2013-04-18 16:47:16,638 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:16,697 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > >> > > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:19,707 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > >> > > 2013-04-18 16:47:22,651 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:22,715 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > >> > > 2013-04-18 16:47:25,658 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > > 2013-04-18 16:47:25,725 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > > >> > > 2013-04-18 16:47:28,665 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > > >> > >
> > > > > > > > >> > > Can anybody help me? Thanks very much!
> > > > > > > > >> > >

Re: Re: Tasks always lost when running hadoop test!

Posted by Wang Yu <wa...@nfs.iscas.ac.cn>.
1. There are no logs in directories like "/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c"
2. I use the git version; I just downloaded it using "git clone git://git.apache.org/mesos.git". This is what you told me before...

2013-05-15



Wang Yu



From: Vinod Kone <vi...@twitter.com>
Sent: 2013-05-15 23:14
Subject: Re: Tasks always lost when running hadoop test!
To: "mesos-dev@incubator.apache.org"<me...@incubator.apache.org>
Cc: "mesos-dev"<me...@incubator.apache.org>, "Benjamin Mahler"<be...@gmail.com>

logs? Also what version of mesos? 

@vinodkone 
Sent from my mobile  

On May 15, 2013, at 12:00 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote: 

> Hi Ben, 
>  
> I think the problem is that mesos has found the executor at hdfs://master/user/mesos/hadoop.tar.gz, but it did not download it, and so did not use it. 
> Mesos found the executor, so it did not output an error and just updated the task status to lost; but since mesos did not use the executor, the executor directory contains nothing! 
>  
> But I am not very familiar with the source code, so I do not know why mesos cannot use the executor. And I also do not know whether my analysis is right. Thanks very much for your help! 
>  
>  
>  
>  
> Wang Yu 
>  
> From: 王瑜 
> Sent: 2013-05-15 11:04 
> To: mesos-dev 
> Cc: Benjamin Mahler 
> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060 
> Hi, Ben, 
>  
> I have reworked the test and checked the log directory again; it is still empty, the same as below. 
> I think there is a problem with my executor, but I do not know how to make the executor work. The logs follow... 
> "Asked to update resources for an unknown/killed executor": why does it always kill the executor? 
>  
> 1. I opened all the executor directories, but all of them are empty. I do not know what happened to them... 
> [root@slave1 logs]# cd /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c 
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls 
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l 
> total 0 
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a 
> .  .. 
> [root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# 
> 2. I added "--isolation=cgroups" to the slaves, but it still does not work: tasks are always lost. There is no error any more, but I still do not know what happened to the executor... The logs on one slave are as follows. Please help me, thanks very much! 
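> 
> (For reference, I start each slave with something like the following; the master address matches the logs below, and other flags are left at their defaults: 
>   mesos-slave --master=192.168.0.2:5050 --isolation=cgroups ) 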
>  
> mesos-slave.INFO 
> Log file created at: 2013/05/13 09:12:54 
> Running on machine: slave1 
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg 
> I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator 
> I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by root 
> I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave 
> I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@192.168.0.3:36668 
> I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24; mem=63356; ports=[31000-32000]; disk=29143 
> I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as cgroups hierarchy root 
> I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at master@192.168.0.2:5050 
> I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file '/home/mesos/build/logs/mesos-slave.INFO' 
> I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master detected at master@192.168.0.2:5050 
> I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator 
> I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery 
> I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given slave ID 201305130913-33597632-5050-3893-0 
> I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal 
> I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal 
> I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal 
> I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal 
> I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal 
> I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal 
> I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal 
> I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days 
> I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days 
> I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days 
> I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495' 
> I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495' 
> I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 
> I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280 
> I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at = 24552 
> I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b' 
> I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b' 
> I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b 
> I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280 
> I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at = 24628 
> I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256 
> I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495 
> I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495 
> I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 
> I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 after 1 attempts 
> I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 
> I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 
> I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 
> I0513 09:16:34.477439 24190 slave.cpp:1479] Executor 'executor_Task_Tracker_0' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1 
> I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the status update manager 
> I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update resources for an unknown/killed executor 
> I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating StatusUpdate stream for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling UPDATE for status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the master at master@192.168.0.2:5050 
> I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status update for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received status update acknowledgement for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK for status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 
> I0513 09:16:34.487547 24185 status_upda 

Re: /home/mesos/build/hadoop/hadoop-0.20.205.0/build.xml:666: The following error occurred while executing this line

Posted by Benjamin Mahler <be...@gmail.com>.
Hi Wang,

We will be releasing 0.12.0 shortly, which contains a completely new Hadoop
framework that we wrote a while back. However, you'll want to keep in mind
that the bundled Hadoop framework is not production vetted in 0.12.0.

Brenden Matthews has been running it extensively and has been a great help
with fixing bugs and improving the framework since then.

Keep an eye out for the VOTE and release of 0.12.0!

Ben


On Sun, Jun 9, 2013 at 1:25 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> Hi all,
>
> When I compile the new stable version of mesos and deploy hadoop on it, it
> cannot build the hadoop.tar.gz file for the task executor. The log is as
> follows; thanks very much for helping me.
> It seems there is a problem where javac cannot find some symbols.
>
> compile:
>      [echo] contrib: mesos
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/build-contrib.xml:185:
> warning: 'includeantruntime' was not set, defaulting to
> build.sysclasspath=last; set to false for repeatable builds
>     [javac] Compiling 5 source files to
> /home/mesos/build/hadoop/hadoop-0.20.205.0/build/contrib/mesos/classes
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkExecutor.java:61:
> error: cannot find symbol
>     [javac]       Class<?>[] instClasses =
> TaskTracker.getInstrumentationClasses(conf);
>     [javac]                                           ^
>     [javac]   symbol:   method getInstrumentationClasses(JobConf)
>     [javac]   location: class TaskTracker
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkExecutor.java:136:
> error: cannot find symbol
>     [javac]     if (task.extraData.equals("")) {
>     [javac]             ^
>     [javac]   symbol:   variable extraData
>     [javac]   location: variable task of type Task
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkExecutor.java:143:
> error: cannot find symbol
>     [javac]       .setValue(task.extraData)
>     [javac]                     ^
>     [javac]   symbol:   variable extraData
>     [javac]   location: variable task of type Task
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkExecutor.java:176:
> error: cannot find symbol
>     [javac]
> .setTaskId(TaskID.newBuilder().setValue(task.extraData).build())
>     [javac]                                                     ^
>     [javac]   symbol:   variable extraData
>     [javac]   location: variable task of type Task
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkScheduler.java:143:
> error: jobTracker has private access in MesosScheduler
>     [javac]     this.jobTracker = mesosSched.jobTracker;
>     [javac]                                 ^
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkScheduler.java:557:
> error: cannot find symbol
>     [javac]                 task.extraData = "" + nt.mesosId.getValue();
>     [javac]                     ^
>     [javac]   symbol:   variable extraData
>     [javac]   location: variable task of type Task
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkScheduler.java:572:
> error: cannot find symbol
>     [javac]                 task.extraData = "" + nt.mesosId.getValue();
>     [javac]                     ^
>     [javac]   symbol:   variable extraData
>     [javac]   location: variable task of type Task
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkScheduler.java:725:
> error: cannot find symbol
>     [javac]       int maxLevel = job.getMaxCacheLevel();
>     [javac]                         ^
>     [javac]   symbol:   method getMaxCacheLevel()
>     [javac]   location: variable job of type JobInProgress
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java:545:
> error: cannot find symbol
>     [javac]                     .setName("Hadoop TaskTracker")
>     [javac]                     ^
>     [javac]   symbol:   method setName(String)
>     [javac]   location: class Builder
>     [javac]
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/MesosTaskTrackerInstrumentation.java:24:
> error: method does not override or implement a method from a supertype
>     [javac]   @Override
>     [javac]   ^
>     [javac] Note:
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkScheduler.java uses or overrides a deprecated API.
>     [javac] Note: Recompile with -Xlint:deprecation for details.
>     [javac] 10 errors
>
> BUILD FAILED
> /home/mesos/build/hadoop/hadoop-0.20.205.0/build.xml:666: The following
> error occurred while executing this line:
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/build.xml:30: The
> following error occurred while executing this line:
> /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/build-contrib.xml:185:
> Compile failed; see the compiler error output for details.
>
> Total time: 24 seconds
>
> Oh no! We failed to run 'ant -Dversion=0.20.205.0 compile bin-package'. If
> you need help try emailing:
>
>   mesos-dev@incubator.apache.org
>
> (Remember to include as much debug information as possible.)
>
>
>
>
> Wang Yu

Re: Re: Tasks always lost when running hadoop test!

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
Hi all,

Here an even stranger thing happened: when I run spark on mesos, the tasks are lost just like with hadoop. What is wrong with my mesos? Do you have any suggestions?

[root@master spark]# ./run spark.examples.SparkPi master:5050
13/05/16 10:43:17 INFO spark.BoundedMemoryCache: BoundedMemoryCache.maxBytes = 6791445872
13/05/16 10:43:17 INFO spark.CacheTrackerActor: Registered actor on port 7077
13/05/16 10:43:17 INFO spark.CacheTrackerActor: Started slave cache (size 6.3GB) on master
13/05/16 10:43:17 INFO spark.MapOutputTrackerActor: Registered actor on port 7077
13/05/16 10:43:17 INFO spark.ShuffleManager: Shuffle dir: /tmp/spark-local-79237de3-efea-48d6-bb13-c32d98c1d7ec/shuffle
13/05/16 10:43:17 INFO server.Server: jetty-7.5.3.v20111011
13/05/16 10:43:17 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:44208 STARTING
13/05/16 10:43:17 INFO spark.ShuffleManager: Local URI: http://192.168.0.2:44208
13/05/16 10:43:17 INFO server.Server: jetty-7.5.3.v20111011
13/05/16 10:43:17 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:51063 STARTING
13/05/16 10:43:17 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.0.2:51063
13/05/16 10:43:17 INFO spark.MesosScheduler: Registered as framework ID 201305151124-33597632-5050-20218-0001
13/05/16 10:43:17 INFO spark.SparkContext: Starting job...
13/05/16 10:43:17 INFO spark.CacheTracker: Registering RDD ID 1 with cache
13/05/16 10:43:17 INFO spark.CacheTrackerActor: Registering RDD 1 with 2 partitions
13/05/16 10:43:17 INFO spark.CacheTracker: Registering RDD ID 0 with cache
13/05/16 10:43:17 INFO spark.CacheTrackerActor: Registering RDD 0 with 2 partitions
13/05/16 10:43:17 INFO spark.CacheTrackerActor: Asked for current cache locations
13/05/16 10:43:17 INFO spark.MesosScheduler: Final stage: Stage 0
13/05/16 10:43:17 INFO spark.MesosScheduler: Parents of final stage: List()
13/05/16 10:43:17 INFO spark.MesosScheduler: Missing parents: List()
13/05/16 10:43:17 INFO spark.MesosScheduler: Submitting Stage 0, which has no missing parents
13/05/16 10:43:17 INFO spark.MesosScheduler: Got a job with 2 tasks
13/05/16 10:43:17 INFO spark.MesosScheduler: Adding job with ID 0
13/05/16 10:43:17 INFO spark.SimpleJob: Starting task 0:0 as TID 0 on slave 201305151124-33597632-5050-20218-1: slave5 (preferred)
13/05/16 10:43:17 INFO spark.SimpleJob: Size of task 0:0 is 1606 bytes and took 72 ms to serialize by spark.JavaSerializerInstance
13/05/16 10:43:17 INFO spark.SimpleJob: Starting task 0:1 as TID 1 on slave 201305151124-33597632-5050-20218-0: slave1 (preferred)
13/05/16 10:43:17 INFO spark.SimpleJob: Size of task 0:1 is 1606 bytes and took 1 ms to serialize by spark.JavaSerializerInstance
13/05/16 10:43:18 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
13/05/16 10:43:18 INFO spark.SimpleJob: Starting task 0:1 as TID 2 on slave 201305151124-33597632-5050-20218-0: slave1 (preferred)
13/05/16 10:43:18 INFO spark.SimpleJob: Size of task 0:1 is 1606 bytes and took 1 ms to serialize by spark.JavaSerializerInstance
13/05/16 10:43:18 INFO spark.SimpleJob: Lost TID 0 (task 0:0)
13/05/16 10:43:19 INFO spark.SimpleJob: Lost TID 2 (task 0:1)
13/05/16 10:43:19 INFO spark.SimpleJob: Starting task 0:1 as TID 3 on slave 201305151124-33597632-5050-20218-1: slave5 (preferred)
13/05/16 10:43:19 INFO spark.SimpleJob: Size of task 0:1 is 1606 bytes and took 1 ms to serialize by spark.JavaSerializerInstance
13/05/16 10:43:19 INFO spark.SimpleJob: Starting task 0:0 as TID 4 on slave 201305151124-33597632-5050-20218-0: slave1 (preferred)
13/05/16 10:43:19 INFO spark.SimpleJob: Size of task 0:0 is 1606 bytes and took 2 ms to serialize by spark.JavaSerializerInstance
13/05/16 10:43:19 INFO spark.SimpleJob: Lost TID 3 (task 0:1)
13/05/16 10:43:20 INFO spark.SimpleJob: Lost TID 4 (task 0:0)
13/05/16 10:43:20 INFO spark.SimpleJob: Starting task 0:0 as TID 5 on slave 201305151124-33597632-5050-20218-1: slave5 (preferred)
13/05/16 10:43:20 INFO spark.SimpleJob: Size of task 0:0 is 1606 bytes and took 2 ms to serialize by spark.JavaSerializerInstance
13/05/16 10:43:20 INFO spark.SimpleJob: Starting task 0:1 as TID 6 on slave 201305151124-33597632-5050-20218-0: slave1 (preferred)
13/05/16 10:43:20 INFO spark.SimpleJob: Size of task 0:1 is 1606 bytes and took 1 ms to serialize by spark.JavaSerializerInstance
13/05/16 10:43:20 INFO spark.SimpleJob: Lost TID 5 (task 0:0)
13/05/16 10:43:21 INFO spark.SimpleJob: Lost TID 6 (task 0:1)
13/05/16 10:43:21 INFO spark.SimpleJob: Starting task 0:1 as TID 7 on slave 201305151124-33597632-5050-20218-1: slave5 (preferred)
13/05/16 10:43:21 INFO spark.SimpleJob: Size of task 0:1 is 1606 bytes and took 1 ms to serialize by spark.JavaSerializerInstance
13/05/16 10:43:21 INFO spark.SimpleJob: Starting task 0:0 as TID 8 on slave 201305151124-33597632-5050-20218-0: slave1 (preferred)
13/05/16 10:43:21 INFO spark.SimpleJob: Size of task 0:0 is 1606 bytes and took 2 ms to serialize by spark.JavaSerializerInstance
13/05/16 10:43:21 INFO spark.SimpleJob: Lost TID 7 (task 0:1)
13/05/16 10:43:21 ERROR spark.SimpleJob: Task 0:1 failed more than 4 times; aborting job
13/05/16 10:43:22 INFO spark.MesosScheduler: Ignoring update from TID 8 because its job is gone
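For reference, the abort above is plain scheduler-side bookkeeping: every launch of task 0:1 that comes back as lost bumps a per-task failure counter, and once the counter exceeds the limit (4 here) the whole job is aborted. A minimal sketch of that bookkeeping, assuming a fixed retry budget; the RetryBudget class is made up for illustration and is not Spark's actual SimpleJob code:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative retry bookkeeping: relaunch a lost task until it has
    // failed more than MAX_FAILURES times, then abort the whole job.
    class RetryBudget {
        static final int MAX_FAILURES = 4;  // matches the log above

        private final Map<String, Integer> failures = new HashMap<>();

        // Returns true if the task may be relaunched, false to abort the job.
        boolean onTaskLost(String taskId) {
            int n = failures.merge(taskId, 1, Integer::sum);
            if (n > MAX_FAILURES) {
                System.err.println("Task " + taskId + " failed more than "
                    + MAX_FAILURES + " times; aborting job");
                return false;
            }
            return true;  // wait for the next resource offer and try again
        }
    }

In the log, task 0:1 ran as TIDs 1, 2, 3, 6 and 7; the fifth loss tripped the limit, and the remaining update for TID 8 was ignored because the job was already gone.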




Wang Yu

From: Vinod Kone
Date: 2013-05-15 23:45
To: Wang Yu
CC: mesos-dev@incubator.apache.org; Benjamin Mahler
Subject: Re: Tasks always lost when running hadoop test!
What is the git sha of your HEAD?

Also can you post the scheduler/master/slave logs?

@vinodkone
Sent from my mobile 

On May 15, 2013, at 8:22 AM, "Wang Yu" <wa...@nfs.iscas.ac.cn> wrote:

> 1. There is no log in directories like "/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c"
> 2. I use the git version; I just downloaded it using "git clone git://git.apache.org/mesos.git". This is what you told me before...
>  
> 2013-05-15
> Wang Yu

/home/mesos/build/hadoop/hadoop-0.20.205.0/build.xml:666: The following error occurred while executing this line

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
Hi all,

When I compile the new stable version of mesos and deploy hadoop on it, it can not build the hadoop.tar.gz file for the task executor; the log is as follows. Thanks very much for helping me.
It seems there is a problem where javac cannot find several symbols.

compile:
     [echo] contrib: mesos
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/build-contrib.xml:185: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
    [javac] Compiling 5 source files to /home/mesos/build/hadoop/hadoop-0.20.205.0/build/contrib/mesos/classes
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkExecutor.java:61: error: cannot find symbol
    [javac]       Class<?>[] instClasses = TaskTracker.getInstrumentationClasses(conf);
    [javac]                                           ^
    [javac]   symbol:   method getInstrumentationClasses(JobConf)
    [javac]   location: class TaskTracker
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkExecutor.java:136: error: cannot find symbol
    [javac]     if (task.extraData.equals("")) {
    [javac]             ^
    [javac]   symbol:   variable extraData
    [javac]   location: variable task of type Task
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkExecutor.java:143: error: cannot find symbol
    [javac]       .setValue(task.extraData)
    [javac]                     ^
    [javac]   symbol:   variable extraData
    [javac]   location: variable task of type Task
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkExecutor.java:176: error: cannot find symbol
    [javac]         .setTaskId(TaskID.newBuilder().setValue(task.extraData).build())
    [javac]                                                     ^
    [javac]   symbol:   variable extraData
    [javac]   location: variable task of type Task
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkScheduler.java:143: error: jobTracker has private access in MesosScheduler
    [javac]     this.jobTracker = mesosSched.jobTracker;
    [javac]                                 ^
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkScheduler.java:557: error: cannot find symbol
    [javac]                 task.extraData = "" + nt.mesosId.getValue();
    [javac]                     ^
    [javac]   symbol:   variable extraData
    [javac]   location: variable task of type Task
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkScheduler.java:572: error: cannot find symbol
    [javac]                 task.extraData = "" + nt.mesosId.getValue();
    [javac]                     ^
    [javac]   symbol:   variable extraData
    [javac]   location: variable task of type Task
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkScheduler.java:725: error: cannot find symbol
    [javac]       int maxLevel = job.getMaxCacheLevel();
    [javac]                         ^
    [javac]   symbol:   method getMaxCacheLevel()
    [javac]   location: variable job of type JobInProgress
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/MesosScheduler.java:545: error: cannot find symbol
    [javac]                     .setName("Hadoop TaskTracker")
    [javac]                     ^
    [javac]   symbol:   method setName(String)
    [javac]   location: class Builder
    [javac] /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/MesosTaskTrackerInstrumentation.java:24: error: method does not override or implement a method from a supertype
    [javac]   @Override
    [javac]   ^
    [javac] Note: /home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/mesos/src/java/org/apache/hadoop/mapred/FrameworkScheduler.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] 10 errors

BUILD FAILED
/home/mesos/build/hadoop/hadoop-0.20.205.0/build.xml:666: The following error occurred while executing this line:
/home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/build.xml:30: The following error occurred while executing this line:
/home/mesos/build/hadoop/hadoop-0.20.205.0/src/contrib/build-contrib.xml:185: Compile failed; see the compiler error output for details.

Total time: 24 seconds

Oh no! We failed to run 'ant -Dversion=0.20.205.0 compile bin-package'. If you need help try emailing:

  mesos-dev@incubator.apache.org

(Remember to include as much debug information as possible.)
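Looking at the list above, every "cannot find symbol" points at a member that stock Hadoop 0.20.205.0 simply does not have (TaskTracker.getInstrumentationClasses, the Task.extraData field, JobInProgress.getMaxCacheLevel, and a jobTracker field that is still private). My reading, which may be wrong, is that these members are supposed to be added to the Hadoop sources by the patch shipped with the Mesos Hadoop integration, so a tree compiled without that patch fails exactly like this. As a purely hypothetical illustration, the contrib code behaves as if the patched Task class carried something like:

    // Hypothetical sketch of what the contrib code expects from a patched
    // org.apache.hadoop.mapred.Task; stock 0.20.205.0 has no such field,
    // which is exactly what javac complains about above.
    public abstract class Task {
        // Extra slot used to carry the Mesos task id through the Hadoop
        // task; "" would mean "not launched via Mesos".
        public String extraData = "";
    }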




Wang Yu

Re: Re: Tasks always lost when running hadoop test!

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
Hi Vinod,
The Mesos version is 0.13.0, and the master and slave logs are attached; can you get them?
Where should I get the scheduler logs?

Thanks very much for your help!





Wang Yu


Re: Tasks always lost when running hadoop test!

Posted by Vinod Kone <vi...@gmail.com>.
What is the git sha of your HEAD?

Also can you post the scheduler/master/slave logs?

@vinodkone
Sent from my mobile 


Re: Tasks always lost when running hadoop test!

Posted by Vinod Kone <vi...@twitter.com>.
logs? Also what version of mesos?

@vinodkone
Sent from my mobile 


Tasks always lost when running hadoop test!

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
Hi Ben,

I think the problem is that Mesos has found the executor at hdfs://master/user/mesos/hadoop.tar.gz, but it did not download it, and so did not use it.
Mesos found the executor, so it did not output an error and simply updated the task status to lost; but because it did not use the executor, the executor directory contains nothing!

But I am not very familiar with the source code, so I do not know why Mesos cannot use the executor, and I also do not know whether my analysis is right. Thanks very much for your help!
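If I understand the flow, the executor URI travels inside the framework's ExecutorInfo, and the slave is supposed to fetch every CommandInfo URI into the sandbox (extracting archives with a recognized extension) before running the command. A rough reconstruction against the 0.13-era Java API, just to show where the hdfs:// URI enters the picture (this is my sketch, not the actual Hadoop-on-Mesos scheduler code):

    import org.apache.mesos.Protos.CommandInfo;
    import org.apache.mesos.Protos.ExecutorID;
    import org.apache.mesos.Protos.ExecutorInfo;

    class ExecutorInfoSketch {
        // The URI below is what the slave should download and unpack into
        // the run directory; an empty run directory suggests the fetch
        // never happened or failed very early.
        static ExecutorInfo hadoopExecutor() {
            return ExecutorInfo.newBuilder()
                .setExecutorId(ExecutorID.newBuilder().setValue("executor_Task_Tracker_0"))
                .setCommand(CommandInfo.newBuilder()
                    .setValue("cd hadoop && ./bin/mesos-executor")
                    .addUris(CommandInfo.URI.newBuilder()
                        .setValue("hdfs://master/user/mesos/hadoop.tar.gz")))
                .build();
        }
    }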




Wang Yu

From: Wang Yu
Date: 2013-05-15 11:04
To: mesos-dev
CC: Benjamin Mahler
Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
Hi, Ben,

I have reworked the test and checked the log directory again; it is still empty, the same as before.
I think there is a problem with my executor, but I do not know how to make the executor work. The logs are as follows...
The log says "Asked to update resources for an unknown/killed executor"; why does it always kill the executor?

1. I opened all the executor directories, but all of them are empty. I do not know what happened to them...
[root@slave1 logs]# cd /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
total 0
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
.  ..
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
2. I added "--isolation=cgroups" for the slaves, but it still does not work. Tasks are always lost. There is no error any more, but I still do not know what happened to the executor... The logs from one slave are as follows. Please help me, thanks very much!

mesos-slave.INFO
Log file created at: 2013/05/13 09:12:54
Running on machine: slave1
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by root
I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@192.168.0.3:36668
I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24; mem=63356; ports=[31000-32000]; disk=29143
I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as cgroups hierarchy root
I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at master@192.168.0.2:5050
I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file '/home/mesos/build/logs/mesos-slave.INFO'
I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master detected at master@192.168.0.2:5050
I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given slave ID 201305130913-33597632-5050-3893-0
I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at = 24552
I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at = 24628
I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 after 1 attempts
I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.477439 24190 slave.cpp:1479] Executor 'executor_Task_Tracker_0' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the status update manager
I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update resources for an unknown/killed executor
I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating StatusUpdate stream for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling UPDATE for status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the master at master@192.168.0.2:5050
I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status update for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received status update acknowledgement for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK for status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487547 24185 status_update_manager.cpp:434] Cleaning up status update stream for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487788 24207 slave.cpp:1016] Status update manager successfully handled status update acknowledgement for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.488142 24202 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495' for removal
I0513 09:16:35.063462 24199 slave.cpp:587] Got assigned task Task_Tracker_2 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.066090 24199 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
I0513 09:16:35.066673 24188 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
I0513 09:16:35.066985 24205 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_2 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:35.068594 24205 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:35.069341 24205 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.070061 24205 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.070828 24205 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.071966 24205 cgroups_isolator.cpp:555] Forked executor at = 24704
I0513 09:16:40.464987 24197 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
I0513 09:16:40.465175 24197 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.467118 24197 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with uuid 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.467269 24197 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with uuid 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.468596 24198 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
......




Wang Yu

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
Hi, Ben,

I have rerun the test and checked the log directory again; it is still empty, the same as below.
I think there is a problem with my executor, but I do not know how to make the executor work. The logs are as follows...
The log says "Asked to update resources for an unknown/killed executor"; why does it always kill the executor?

1. I opened all the executor directories, but all of them are empty. I do not know what happened to them...
[root@slave1 logs]# cd /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
total 0
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
.  ..
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
2. I added "--isolation=cgroups" for the slaves, but it still does not work. Tasks are always lost, though there is no error any more, and I still do not know what happened to the executor... The logs on one slave are as follows. Please help me, thanks very much!

mesos-slave.INFO
Log file created at: 2013/05/13 09:12:54
Running on machine: slave1
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by root
I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@192.168.0.3:36668
I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24; mem=63356; ports=[31000-32000]; disk=29143
I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as cgroups hierarchy root
I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at master@192.168.0.2:5050
I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file '/home/mesos/build/logs/mesos-slave.INFO'
I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master detected at master@192.168.0.2:5050
I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given slave ID 201305130913-33597632-5050-3893-0
I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at = 24552
I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at = 24628
I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 after 1 attempts
I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.477439 24190 slave.cpp:1479] Executor 'executor_Task_Tracker_0' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the status update manager
I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update resources for an unknown/killed executor
I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating StatusUpdate stream for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling UPDATE for status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the master at master@192.168.0.2:5050
I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status update for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received status update acknowledgement for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK for status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487547 24185 status_update_manager.cpp:434] Cleaning up status update stream for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487788 24207 slave.cpp:1016] Status update manager successfully handled status update acknowledgement for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.488142 24202 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495' for removal
I0513 09:16:35.063462 24199 slave.cpp:587] Got assigned task Task_Tracker_2 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.066090 24199 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
I0513 09:16:35.066673 24188 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
I0513 09:16:35.066985 24205 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_2 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:35.068594 24205 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:35.069341 24205 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.070061 24205 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.070828 24205 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.071966 24205 cgroups_isolator.cpp:555] Forked executor at = 24704
I0513 09:16:40.464987 24197 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
I0513 09:16:40.465175 24197 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.467118 24197 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with uuid 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.467269 24197 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with uuid 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.468596 24198 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.468945 24198 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b after 1 attempts
I0513 09:16:40.471577 24200 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.471850 24200 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.480960 24185 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.481230 24196 slave.cpp:1479] Executor 'executor_Task_Tracker_1' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
I0513 09:16:40.483572 24196 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.483801 24196 slave.cpp:1280] Forwarding status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 to the status update manager
I0513 09:16:40.483846 24193 cgroups_isolator.cpp:666] Asked to update resources for an unknown/killed executor
I0513 09:16:40.484094 24205 status_update_manager.cpp:254] Received status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.484267 24205 status_update_manager.cpp:403] Creating StatusUpdate stream for task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.484412 24205 status_update_manager.hpp:314] Handling UPDATE for status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.484558 24205 status_update_manager.cpp:289] Forwarding status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 to the master at master@192.168.0.2:5050
I0513 09:16:40.487229 24202 slave.cpp:979] Got acknowledgement of status update for task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.487457 24196 status_update_manager.cpp:314] Received status update acknowledgement for task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.487607 24196 status_update_manager.hpp:314] Handling ACK for status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.487741 24196 status_update_manager.cpp:434] Cleaning up status update stream for task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.487949 24207 slave.cpp:1016] Status update manager successfully handled status update acknowledgement for task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.488278 24193 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b' for removal
I0513 09:16:41.072098 24194 slave.cpp:587] Got assigned task Task_Tracker_3 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:41.074632 24194 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
I0513 09:16:41.075546 24198 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
I0513 09:16:41.076081 24194 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_3 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:41.077606 24194 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:41.078402 24194 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:41.079186 24194 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:41.080008 24194 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:41.081447 24194 cgroups_isolator.cpp:555] Forked executor at = 24780
I0513 09:16:44.482589 24200 status_update_manager.cpp:379] Checking for unacknowledged status updates
I0513 09:16:46.473145 24199 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
I0513 09:16:46.473307 24199 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.475491 24199 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 with uuid f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.475649 24199 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 with uuid f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.476820 24192 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.477181 24192 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27 after 1 attempts
I0513 09:16:46.479907 24201 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.480229 24201 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.493069 24200 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.493391 24184 slave.cpp:1479] Executor 'executor_Task_Tracker_2' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
I0513 09:16:46.495689 24184 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.495933 24184 slave.cpp:1280] Forwarding status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 to the status update manager
I0513 09:16:46.495980 24189 cgroups_isolator.cpp:666] Asked to update resources for an unknown/killed executor
I0513 09:16:46.496305 24193 status_update_manager.cpp:254] Received status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.496553 24193 status_update_manager.cpp:403] Creating StatusUpdate stream for task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.496707 24193 status_update_manager.hpp:314] Handling UPDATE for status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.496868 24193 status_update_manager.cpp:289] Forwarding status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 to the master at master@192.168.0.2:5050
I0513 09:16:46.499631 24201 slave.cpp:979] Got acknowledgement of status update for task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.499961 24193 status_update_manager.cpp:314] Received status update acknowledgement for task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.500128 24193 status_update_manager.hpp:314] Handling ACK for status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.500257 24193 status_update_manager.cpp:434] Cleaning up status update stream for task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.500452 24192 slave.cpp:1016] Status update manager successfully handled status update acknowledgement for task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.500743 24204 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27' for removal
I0513 09:16:47.079013 24193 slave.cpp:587] Got assigned task Task_Tracker_4 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:47.081650 24193 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
I0513 09:16:47.082447 24198 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
I0513 09:16:47.082861 24194 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_4 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_4_tag_8a4dd631-1ec0-4946-a1bc-0644a7238e3c
I0513 09:16:47.084478 24194 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_4 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:47.085273 24194 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_4 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:47.086045 24194 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_4 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:47.086853 24194 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_4 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:47.088227 24194 cgroups_isolator.cpp:555] Forked executor at = 24856
I0513 09:16:50.485791 24194 status_update_manager.cpp:379] Checking for unacknowledged status updates
I0513 09:16:52.480471 24185 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
I0513 09:16:52.480622 24185 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:52.482652 24185 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000 with uuid 22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.482805 24185 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000 with uuid 22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.484110 24195 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.484447 24195 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642 after 1 attempts
I0513 09:16:52.487893 24184 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.488129 24184 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.496047 24207 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.496247 24203 slave.cpp:1479] Executor 'executor_Task_Tracker_3' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
I0513 09:16:52.498538 24203 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
......




Wang Yu

From: Benjamin Mahler
Date: 2013-05-11 02:32
To: wangyu
Cc: Benjamin Mahler; mesos-dev
Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
1. If you look at a slave log, you can see that the process isolator
launched the task and then notified the slave that it was lost. Can you
look inside one of the executor directories? There should be an stderr file
there. E.g.:

I0510 09:44:33.801655  7412 paths.hpp:302] Created executor directory
'/tmp/mesos/slaves/201305100938-33597632-5050-19520-1/frameworks/201305100938-33597632-5050-19520-0000/executors/executor_Task_Tracker_5/runs/2981a5c2-84e5-4868-9507-8aecb32ee163'

Look for these in the logs and read the stderr present inside. Can you
report back with the contents?
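
As a minimal sketch of that check (assuming the default /tmp/mesos work
directory seen in the logs above; adjust the path if your slaves were
started with a different work directory), you could dump every executor
stderr on a slave like this:

# Print the stderr of each executor run under the slave work directory.
for f in $(find /tmp/mesos/slaves -type f -name stderr); do
  echo "==== $f ===="
  cat "$f"
done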

2. Are you running on Linux? You may want to consider using
--isolation=cgroups when starting your slaves. This uses linux control
groups to do process / cpu / memory isolation between executors running on
the slave.
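
For example, the slave launch line might look roughly like the following
(a sketch only: the master address is taken from the logs earlier in this
thread, and any additional flags are assumptions that depend on your setup):

# Start the slave with the cgroups isolation module instead of the
# default process isolator.
mesos-slave --master=master:5050 --isolation=cgroups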

Thanks!


On Thu, May 9, 2013 at 7:07 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> Hi Ben,
>
> Logs for the mesos master and slaves are attached; thanks for helping me with
> this problem. I really appreciate your patient reply.
>
> Three servers: "master", "slave1", "slave5"
> Mesos master: "master"
> Mesos slaves: "master", "slave1", "slave5"
>
> ------------------------------
> Wang Yu
>
>  From: Benjamin Mahler <be...@gmail.com>
> Date: 2013-05-10 07:22
> To: wangyu <wa...@nfs.iscas.ac.cn>
> Cc: mesos-dev <me...@incubator.apache.org>; Benjamin Mahler <be...@gmail.com>
> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
>  Ah, I see them now; it looks like you uploaded the NameNode logs. Can you
> upload the mesos-master and mesos-slave logs instead? What will be
> interesting here is what happened on the slave that is trying to run the
> TaskTracker.
>
>
> On Wed, May 8, 2013 at 8:32 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>
>
> > I have uploaded them in the previous email; I will send them again. PS: Will
> > the mailing list reject the attachments?
> >
> > Can you see them?
> >
> > ------------------------------
> > Wang Yu
> >
> >  From: Benjamin Mahler <be...@gmail.com>
> > Date: 2013-05-09 10:00
> > To: mesos-dev@incubator.apache.org; wangyu <wa...@nfs.iscas.ac.cn>
> > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> >  Did you forget to attach them?
> >
> >
> > On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> > > OK.
> > > Logs are attached. I used Ctrl+C to stop the jobtracker when the task_lost
> > > happened.
> > >
> > > Thanks very much for your help!
> > >
> > > ------------------------------
> > > Wang Yu
> > >
> > >  From: Benjamin Mahler <be...@gmail.com>
> > > Date: 2013-05-09 01:23
> > > To: mesos-dev@incubator.apache.org
> > > Cc: wangyu <wa...@nfs.iscas.ac.cn>
> > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > >
> >
>
> > > Hey Brenden, are there any bugs in particular here that you're referring to?
> > >
> > > Wang, can you provide the logs for the JobTracker, the slave, and the
> > > master?
> > >
> > >
> > > On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <
> > > brenden.matthews@airbedandbreakfast.com> wrote:
> > >
> > > > You may want to try Airbnb's dist of Mesos:
> > > >
> > > > https://github.com/airbnb/mesos/tree/testing
> > > >
>
> > > > A good number of these Mesos bugs have been fixed but aren't yet merged
> > > > into upstream.
> > > >
> > > >
> > > > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > >
> >
>
> > > > > The log on each slave of the lost task is : No executor found with ID:
> > > > > executor_Task_Tracker_XXX.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Wang Yu
> > > > >
> > > > > From: 王瑜
> > > > > Date: 2013-05-07 11:13
> > > > > To: mesos-dev
> > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > TaskTracker: http://slave5:50060
> > > > > Hi all,
> > > > >
> >
>
> > > > > I have tried adding file extension when upload executor as well as the
> > > > > conf file, but it still can not work.
> > > > >
> > > > > And I have seen
> > > > >
> > >
> >
>
> > > > /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > > > > but it is a null directory.
> > > > >
> > >
> >
>
> > > > > Is there any other logs I can read to know why the TASK_LOST happened? I
> > > > > really need your help, thanks very much!
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Wang Yu
> > > > >
> > > > > From: Vinod Kone
> > > > > Date: 2013-04-26 01:31
> > > > > To: mesos-dev@incubator.apache.org
> > > > > Cc: wangyu
> > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > TaskTracker: http://slave5:50060
> > > > > Also, you could look at the executor logs (default:
> > > > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why the
> > > > >  TASK_LOST happened.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> > > > > benjamin.mahler@gmail.com> wrote:
> > > > >
> >
>
> > > > > Can you maintain the file extension? That is how mesos knows to extract
> > > > it:
> > > > > hadoop fs -copyFromLocal
> > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > /user/mesos/mesos-executor.tar.gz
> > > > >
> > > > > Also make sure your mapred-site.xml has the extension as well.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wangyu@nfs.iscas.ac.cn
> > wrote:
> > > > >
> > > > > > Hi, Ben,
> > > > > >
> > > > > > I have tried as you said, but It still can not work.
> > > > > > I have upload mesos-executor using: hadoop fs -copyFromLocal
> > > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > /user/mesos/mesos-executor
> > > > > > Did I do the right thing? Thanks very much!
> > > > > >
> > > > > > The log in jobtracker is:
> > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > > > > > Task_Tracker_82 on http://slave1:31000
> > >
> >
>
> > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > > > slots needed.
> > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update of
> > > > > > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker Status
> > > > > >       Pending Map Tasks: 2
> > > > > >    Pending Reduce Tasks: 1
> > > > > >          Idle Map Slots: 0
> > > > > >       Idle Reduce Slots: 0
> > > > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > > > >        Needed Map Slots: 2
> > > > > >     Needed Reduce Slots: 1
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > > > > > Task_Tracker_83 on http://slave1:31000
> > >
> >
>
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > > > slots needed.
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update of
> > > > > > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker Status
> > > > > >       Pending Map Tasks: 2
> > > > > >    Pending Reduce Tasks: 1
> > > > > >          Idle Map Slots: 0
> > > > > >       Idle Reduce Slots: 0
> > > > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > > > >        Needed Map Slots: 2
> > > > > >     Needed Reduce Slots: 1
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Wang Yu
> > > > > >
> > > > > > From: Benjamin Mahler
> > > > > > Date: 2013-04-24 07:49
> > > > > > To: mesos-dev@incubator.apache.org; wangyu
>
> > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > TaskTracker: http://slave5:50060
> > >
> >
>
> > > > > > You need to instead upload the hadoop.tar.gz generated by the tutorial.
> > >
> >
>
> > > > > > Then point the conf file to the hdfs directory (you had the right idea,
> > > > > > just uploaded the wrong file). :)
> > > > > >
> > > > > > Can you try that and report back?
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wangyu@nfs.iscas.ac.cn
> > > wrote:
> > > > > >
> > > > > > > Guodong,
> > > > > > >
> > >
> >
>
> > > > > > > There are still problems; I think there is some problem with my
> > > > > > > executor setting.
> > > > > > >
> > > > > > > In mapred-site.xml, I set:("master" is the hostname of
> > > > > > > mesos-master-hostname)
> > > > > > >   <property>
> > > > > > >     <name>mapred.mesos.executor</name>
> > > > > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > > > > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > > > > > >   </property>
> > > > > > >
> > > > > > > And I uploaded mesos-executor to /user/mesos/mesos-executor
> > > > > > >
> > > > > > > The head content is as follows:
> > > > > > >
> > > > > > > #! /bin/sh
> > > > > > >
> > >
> >
>
> > > > > > > # mesos-executor - temporary wrapper script for .libs/mesos-executor
> > > > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > > > > > #
> >
>
> > > > > > > # The mesos-executor program cannot be directly executed until all
> > > > the
> > > > > > > libtool
> > > > > > > # libraries that it depends on are installed.
> > > > > > > #
> > > > > > > # This wrapper script should never be moved out of the build
> > > > directory.
> > > > > > > # If it is, it will not operate correctly.
> > > > > > >
> > > > > > > # Sed substitution that helps us do robust quoting.  It
> > > > backslashifies
> > >
> >
>
> > > > > > > # metacharacters that are still active within double-quoted strings.
> > > > > > > Xsed='/bin/sed -e 1s/^X//'
> > > > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > > > > > >
> > > > > > > # Be Bourne compatible
> > >
> >
>
> > > > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
> > > > > > >   emulate sh
> > > > > > >   NULLCMD=:
> > > > > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
> > > > > > >   # is contrary to our usage.  Disable this feature.
> > > > > > >   alias -g '${1+"$@"}'='"$@"'
> > > > > > >   setopt NO_GLOB_SUBST
> > > > > > > else
> > > > > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac
> > > > > > > fi
> > > > > > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > > > > > DUALCASE=1; export DUALCASE # for MKS sh
> > > > > > >
> > >
> >
>
> > > > > > > # The HP-UX ksh and POSIX shell print the target directory to stdout
> > > > > > > # if CDPATH is set.
> > > > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > > > > > >
> > > > > > > relink_command="(cd /home/mesos/build/src; { test -z
> >
>
> > > > > > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || { LIBRARY_PATH=;
> > > > > export
>
> > > > > > > LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" || unset
> > >
> >
>
> > > > > > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; }; }; { test
> > > > > -z
> > > > > > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || {
> > > > > > GCC_EXEC_PREFIX=;
> >
>
> > > > > > > export GCC_EXEC_PREFIX; }; }; { test -z \"\${LD_RUN_PATH+set}\" ||
> > > > > unset
> > > > > > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>
> > > > LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/;
> > > > > > > export LD_LIBRARY_PATH;
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>
> > > > PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin;
> > > > > > > export PATH; g++ -g -g2 -O2 -o \$progdir/\$file
> > > > > > > launcher/mesos_executor-executor.o  ./.libs/libmesos.so
> > >
> >
>
> > > > > > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread -lcurl -lssl
> > >
> >
>
> > > > > > > -lcrypto -lz -lrt -pthread -Wl,-rpath -Wl,/home/mesos/build/src/.libs
> > > > > > > -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > > > > > ...
> > > > > > >
> > > > > > >
>
> > > > > > > Did I upload the right file, and set it up in the conf file
> > > > > > > correctly? Thanks very much!
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Wang Yu
> > > > > > >
> > > > > > > From: 王国栋
> > > > > > > Date: 2013-04-23 13:32
> > > > > > > To: wangyu
> > > > > > > CC: mesos-dev
> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > Unknown/exited
> > > > > > > TaskTracker: http://slave5:50060
> > > > > > > Hmm, it seems that mapred.mesos.master is set correctly.
> > > > > > >
>
> > > > > > > if you run hadoop in local mode, the following setting is ok:
> > > > > > >   <property>
> > > > > > >     <name>mapred.mesos.master</name>
> > > > > > >     <value>local</value>
> > > > > > >   </property>
> > > > > > >
>
> > > > > > > if you want to start the cluster, set mapred.mesos.master to
> > > > > > > mesos-master-hostname:mesos-master-port.
> > > > > > >
>
> > > > > > > Make sure the DNS lookup result for mesos-master-hostname is the
> > > > > > > right IP.
> > > > > > >
> > >
> >
>
> > > > > > > BTW: when starting the jobtracker, you can check the mesos webUI to
> > > > > > > see if the hadoop framework is registered.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > Guodong
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <wangyu@nfs.iscas.ac.cn
> > > > wrote:
> > > > > > >
> > > > > > > > Hi, Guodong,
> > > > > > > >
> > > > > > > > I started hadoop as you said, and then I saw this error:
> > >
> >
>
> > > > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from scheduler
> > > > > > > driver: Cannot parse
> > > > > > > > '@0.0.0.0:0'
> > > > > > > >
> > >
> >
>
> > > > > > > > What does this mean? Where should I change the MesosScheduler code
> > > > > > > > to fix this?
> > > > > > > >
> > > > > > > > Thanks very much! I am so sorry to interrupt you once again...
> > > > > > > >
> > > > > > > > The whole log is as follows:
> > > > > > > >
> > > > > > > > [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > > > > > /************************************************************
> > > > > > > > STARTUP_MSG: Starting JobTracker
> > > > > > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > > > > > STARTUP_MSG:   args = []
> > > > > > > > STARTUP_MSG:   version = 0.20.205.0
> > > > > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat Apr 13 11:19:33 CST 2013
> > > > > > > > ************************************************************/
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker metrics system started
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source QueueMetrics,q=default registered.
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
> > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
> > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting jobtracker with owner as root
> > > > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort9001 registered.
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort9001 registered.
> > > > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
> > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
> > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to port 50030
> > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
> > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
> > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source JobTrackerMetrics registered.
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver: 50030
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system directory
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being initialized in embedded mode
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job history server at: localhost:50030
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web address: localhost:50030
> > > > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed job store is inactive
> > > > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting MesosScheduler
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts information
> > > > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from scheduler driver: Cannot parse '@0.0.0.0:0'
> > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the includes file to
> > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the excludes file to
> > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 nodes
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: job_201304231321_0001: nMaps=0 nReduces=0 max=-1
> > > > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job job_201304231321_0001
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job job_201304231321_0001 added successfully for user 'root' to queue 'default'
> > > > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root  IP=192.168.0.2  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001    RESULT=SUCCESS
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing job_201304231321_0001
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing job_201304231321_0001
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken generated and stored with users keys in /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size for job job_201304231321_0001 = 0. Number of splits = 0
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job job_201304231321_0001 initialized successfully with 0 map tasks and 0 reduce tasks.
> > > > > > > > ------------------------------
> > > > > > > > Wang Yu
> > > > > > > >
> > > > > > > > *From:* 王国栋 <wa...@gmail.com>
> > > > > > > > *Date:* 2013-04-23 11:34
> > > > > > > > *To:* mesos-dev <me...@incubator.apache.org>; wangyu <wangyu@nfs.iscas.ac.cn>
> > > > > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > > Hi Yu,
> > > > > > > >
> > > > > > > > Mesos will just launch a tasktracker on each slave node as long as the required resources are enough for the tasktracker. So you have to run the NameNode, JobTracker and DataNode on your own.
> > > > > > > >
> > > > > > > > Basically, starting hadoop on mesos works like this:
> > > > > > > > 1. Start the dfs with hadoop/bin/start-dfs.sh (you should configure core-site.xml and hdfs-site.xml). The dfs is no different from the normal one.
> > > > > > > > 2. Start the jobtracker with hadoop/bin/hadoop jobtracker (you should configure mapred-site.xml; this jobtracker should contain the patch for mesos).
> > > > > > > >
> > > > > > > > Then you can use the mesos web UI and the jobtracker web UI to check the status of the JobTracker.
> > > > > > > >
> > > > > > > > Guodong
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > >
> > > > > > > >> Oh, yes, I start my hadoop using "start-all.sh". I know what my problem is now. Thanks very much!
> > > > > > > >>
> > > > > > > >> ps: Besides the TaskTracker, are there any other roles (like JobTracker, DataNode) I should stop first?
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Wang Yu
> > > > > > > >>
> > > > > > > >> From: Benjamin Mahler
> > > > > > > >> Sent: 2013-04-23 10:56
> > > > > > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > >> The scheduler we wrote for Hadoop will start its own TaskTrackers, meaning you do not have to start any TaskTrackers yourself.
> > > > > > > >>
> > > > > > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers running in your cluster?
> > > > > > > >>
> > > > > > > >> Looking at your jps output, is there already a TaskTracker running?
> > > > > > > >> [root@master logs]# jps
> > > > > > > >> 13896 RunJar
> > > > > > > >> 14123 Jps
> > > > > > > >> 12718 NameNode
> > > > > > > >> 12900 DataNode
> > > > > > > >> 13374 TaskTracker  <--- How was this started?
> > > > > > > >> 13218 JobTracker
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > >>
> > > > > > > >> > Hi, Ben and Guodong,
> > > > > > > >> >
> > > > > > > >> > What do you mean by "managing your own TaskTrackers"? How should I know whether I am managing my own TaskTrackers? Sorry, I am not very familiar with mesos.
> > > > > > > >> > Does it mean I do not need to configure hdfs-site.xml and core-site.xml in hadoop? I do not want to run my own TaskTracker; I just want to set up hadoop on mesos and run my MR tasks.
> > > > > > > >> >
> > > > > > > >> > Thanks very much for your patient reply... Maybe I have a long way to go...
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > The log messages you see:
> > > > > > > >> > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> >
> > > > > > > >> > Are printed when mesos does not know about the TaskTracker. We currently don't support running your own TaskTrackers, as the MesosScheduler will launch them on your behalf when needed.
> > > > > > > >> >
> > > > > > > >> > Are you managing your own TaskTrackers? The purpose of using Hadoop with mesos is that you no longer have to do that. We will detect that jobs have pending map / reduce tasks and launch TaskTrackers accordingly.
> > > > > > > >> >
> > > > > > > >> > Guodong may be able to help further getting set up!
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > Wang Yu
> > > > > > > >> >
> > > > > > > >> > From: 王国栋
> > > > > > > >> > Date: 2013-04-18 17:10
> > > > > > > >> > To: mesos-dev; wangyu
> > > > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > >> > You can check the slave log and the mesos-executor log, which is normally located in a dir like
> > > > > > > >> > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > > > > >> > The log is the tasktracker log.
> > > > > > > >> >
> > > > > > > >> > I hope it will help.
> > > > > > > >> >
> > > > > > > >> > Guodong
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > >> >
> > > > > > > >> > > Hi All,
> > > > > > > >> > >
> > > > > > > >> > > I have deployed mesos on three nodes: master, slave1, slave5, and it works well.
> > > > > > > >> > > Then I set up hadoop over it, using master as the namenode, and master, slave1, slave5 as datanodes. When I run 'jps', it looks like it works well:
> > > > > > > >> > > [root@master logs]# jps
> > > > > > > >> > > 13896 RunJar
> > > > > > > >> > > 14123 Jps
> > > > > > > >> > > 12718 NameNode
> > > > > > > >> > > 12900 DataNode
> > > > > > > >> > > 13374 TaskTracker
> > > > > > > >> > > 13218 JobTracker
> > > > > > > >> > >
> > > > > > > >> > > Then I run the test benchmark, and it cannot keep working:
> > > > > > > >> > > [root@master hadoop-0.20.205.0]# bin/hadoop jar hadoop-examples-0.20.205.0.jar randomwriter -Dtest.randomwrite.bytes_per_map=6710886 -Dtest.randomwriter.maps_per_host=10 rand
> > > > > > > >> > > Running 30 maps.
> > > > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running job: job_201304181646_0001
> > > > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0% reduce 0%
> > > > > > > >> > > It stopped here.
> > > > > > > >> > >
> > > > > > > >> > > Then I read the log file hadoop-root-jobtracker-master.log, which shows:
> > > > > > > >> > > 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > > > > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > > > > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > > > > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:04,609 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:07,618 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:10,625 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:13,632 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave5
> > > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_slave5:127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > > > >> > > 2013-04-18 16:47:13,687 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > >> > > 2013-04-18 16:47:16,638 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:16,697 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > >> > > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:19,707 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > >> > > 2013-04-18 16:47:22,651 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:22,715 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > >> > > 2013-04-18 16:47:25,658 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:25,725 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > >> > > 2013-04-18 16:47:28,665 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > Can anybody help me? Thanks very much!

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
1. I opened all of the executor directories, but all of them are empty. I do not know what happened to them... (A quick way to check them all at once is sketched below.)
[root@slave1 logs]# cd /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -l
total 0
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]# ls -a
.  ..
[root@slave1 8a4dd631-1ec0-4946-a1bc-0644a7238e3c]#
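A minimal sketch for scanning every executor sandbox at once, assuming the default /tmp/mesos work directory (adjust the path if your slaves were started with a different --work_dir):

# Print each stderr/stdout file under the slave work directory with its
# size; all zeros means the executors died before writing anything.
find /tmp/mesos/slaves -type f \( -name stderr -o -name stdout \) -exec ls -l {} \;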
2. I added "--isolation=cgroups" for the slaves, but it still does not work. Tasks are always lost. There is no error any more, but I still do not know what happened to the executor... The logs on one slave are as follows. Please help me, thanks very much!

mesos-slave.INFO
Log file created at: 2013/05/13 09:12:54
Running on machine: slave1
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0513 09:12:54.170383 24183 main.cpp:124] Creating "cgroups" isolator
I0513 09:12:54.171617 24183 main.cpp:132] Build: 2013-04-10 16:07:43 by root
I0513 09:12:54.171656 24183 main.cpp:133] Starting Mesos slave
I0513 09:12:54.173495 24197 slave.cpp:203] Slave started on 1)@192.168.0.3:36668
I0513 09:12:54.173578 24197 slave.cpp:204] Slave resources: cpus=24; mem=63356; ports=[31000-32000]; disk=29143
I0513 09:12:54.174486 24192 cgroups_isolator.cpp:242] Using /cgroup as cgroups hierarchy root
I0513 09:12:54.179914 24197 slave.cpp:453] New master detected at master@192.168.0.2:5050
I0513 09:12:54.180809 24197 slave.cpp:436] Successfully attached file '/home/mesos/build/logs/mesos-slave.INFO'
I0513 09:12:54.180817 24207 status_update_manager.cpp:132] New master detected at master@192.168.0.2:5050
I0513 09:12:54.194345 24192 cgroups_isolator.cpp:730] Recovering isolator
I0513 09:12:54.195453 24189 slave.cpp:377] Finished recovery
I0513 09:12:54.197798 24206 slave.cpp:487] Registered with master; given slave ID 201305130913-33597632-5050-3893-0
I0513 09:12:54.198086 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081719-33597632-5050-4050-1' for removal
I0513 09:12:54.198329 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305100938-33597632-5050-19520-1' for removal
I0513 09:12:54.198490 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081625-33597632-5050-2991-1' for removal
I0513 09:12:54.198593 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081746-33597632-5050-12378-1' for removal
I0513 09:12:54.198874 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305090914-33597632-5050-5072-1' for removal
I0513 09:12:54.199028 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305081730-33597632-5050-8558-1' for removal
I0513 09:12:54.199149 24201 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201304131144-33597632-5050-4949-2' for removal
I0513 09:13:54.176460 24204 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
I0513 09:14:54.178444 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
I0513 09:15:54.180680 24203 slave.cpp:1811] Current disk usage 26.93%. Max allowed age: 5.11days
I0513 09:16:23.051203 24200 slave.cpp:587] Got assigned task Task_Tracker_0 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.054324 24200 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
I0513 09:16:23.055605 24188 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495'
I0513 09:16:23.056043 24190 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_0 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:23.059368 24190 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:23.060478 24190 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.061101 24190 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.061807 24190 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:23.063297 24190 cgroups_isolator.cpp:555] Forked executor at = 24552
I0513 09:16:29.055598 24190 slave.cpp:587] Got assigned task Task_Tracker_1 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.058297 24190 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
I0513 09:16:29.059012 24203 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b'
I0513 09:16:29.059865 24200 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_1 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:29.061282 24200 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:29.062208 24200 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.062940 24200 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.063705 24200 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:29.065239 24200 cgroups_isolator.cpp:555] Forked executor at = 24628
I0513 09:16:34.457746 24188 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
I0513 09:16:34.457909 24188 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.459873 24188 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.460028 24188 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 with uuid 6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.461314 24190 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.461675 24190 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495 after 1 attempts
I0513 09:16:34.464400 24197 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.464659 24197 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.477118 24199 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_0_tag_6522748a-9d43-41b7-8f88-cd537a502495
I0513 09:16:34.477439 24190 slave.cpp:1479] Executor 'executor_Task_Tracker_0' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
I0513 09:16:34.479852 24190 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.480123 24190 slave.cpp:1280] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the status update manager
I0513 09:16:34.480136 24199 cgroups_isolator.cpp:666] Asked to update resources for an unknown/killed executor
I0513 09:16:34.480480 24185 status_update_manager.cpp:254] Received status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.480716 24185 status_update_manager.cpp:403] Creating StatusUpdate stream for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.480927 24185 status_update_manager.hpp:314] Handling UPDATE for status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.481107 24185 status_update_manager.cpp:289] Forwarding status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000 to the master at master@192.168.0.2:5050
I0513 09:16:34.487007 24194 slave.cpp:979] Got acknowledgement of status update for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487257 24185 status_update_manager.cpp:314] Received status update acknowledgement for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487412 24185 status_update_manager.hpp:314] Handling ACK for status update TASK_LOST from task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487547 24185 status_update_manager.cpp:434] Cleaning up status update stream for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.487788 24207 slave.cpp:1016] Status update manager successfully handled status update acknowledgement for task Task_Tracker_0 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:34.488142 24202 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_0/runs/6522748a-9d43-41b7-8f88-cd537a502495' for removal
I0513 09:16:35.063462 24199 slave.cpp:587] Got assigned task Task_Tracker_2 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.066090 24199 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
I0513 09:16:35.066673 24188 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27'
I0513 09:16:35.066985 24205 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_2 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:35.068594 24205 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:35.069341 24205 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.070061 24205 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.070828 24205 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:35.071966 24205 cgroups_isolator.cpp:555] Forked executor at = 24704
I0513 09:16:40.464987 24197 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
I0513 09:16:40.465175 24197 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.467118 24197 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with uuid 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.467269 24197 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 with uuid 38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.468596 24198 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.468945 24198 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b after 1 attempts
I0513 09:16:40.471577 24200 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.471850 24200 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.480960 24185 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_1_tag_38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b
I0513 09:16:40.481230 24196 slave.cpp:1479] Executor 'executor_Task_Tracker_1' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
I0513 09:16:40.483572 24196 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.483801 24196 slave.cpp:1280] Forwarding status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 to the status update manager
I0513 09:16:40.483846 24193 cgroups_isolator.cpp:666] Asked to update resources for an unknown/killed executor
I0513 09:16:40.484094 24205 status_update_manager.cpp:254] Received status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.484267 24205 status_update_manager.cpp:403] Creating StatusUpdate stream for task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.484412 24205 status_update_manager.hpp:314] Handling UPDATE for status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.484558 24205 status_update_manager.cpp:289] Forwarding status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000 to the master at master@192.168.0.2:5050
I0513 09:16:40.487229 24202 slave.cpp:979] Got acknowledgement of status update for task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.487457 24196 status_update_manager.cpp:314] Received status update acknowledgement for task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.487607 24196 status_update_manager.hpp:314] Handling ACK for status update TASK_LOST from task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.487741 24196 status_update_manager.cpp:434] Cleaning up status update stream for task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.487949 24207 slave.cpp:1016] Status update manager successfully handled status update acknowledgement for task Task_Tracker_1 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:40.488278 24193 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_1/runs/38d83d3a-ef9d-4118-b28c-c6c3cfba6c4b' for removal
I0513 09:16:41.072098 24194 slave.cpp:587] Got assigned task Task_Tracker_3 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:41.074632 24194 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
I0513 09:16:41.075546 24198 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642'
I0513 09:16:41.076081 24194 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_3 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_3/runs/22f6e84b-d07f-430a-a322-6f804b3cd642 with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:41.077606 24194 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:41.078402 24194 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:41.079186 24194 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:41.080008 24194 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:41.081447 24194 cgroups_isolator.cpp:555] Forked executor at = 24780
I0513 09:16:44.482589 24200 status_update_manager.cpp:379] Checking for unacknowledged status updates
I0513 09:16:46.473145 24199 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
I0513 09:16:46.473307 24199 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.475491 24199 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 with uuid f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.475649 24199 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 with uuid f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.476820 24192 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.477181 24192 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27 after 1 attempts
I0513 09:16:46.479907 24201 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.480229 24201 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.493069 24200 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_2_tag_f4729d73-5000-4c40-9c0e-1e77ad414f27
I0513 09:16:46.493391 24184 slave.cpp:1479] Executor 'executor_Task_Tracker_2' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
I0513 09:16:46.495689 24184 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.495933 24184 slave.cpp:1280] Forwarding status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 to the status update manager
I0513 09:16:46.495980 24189 cgroups_isolator.cpp:666] Asked to update resources for an unknown/killed executor
I0513 09:16:46.496305 24193 status_update_manager.cpp:254] Received status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.496553 24193 status_update_manager.cpp:403] Creating StatusUpdate stream for task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.496707 24193 status_update_manager.hpp:314] Handling UPDATE for status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.496868 24193 status_update_manager.cpp:289] Forwarding status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000 to the master at master@192.168.0.2:5050
I0513 09:16:46.499631 24201 slave.cpp:979] Got acknowledgement of status update for task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.499961 24193 status_update_manager.cpp:314] Received status update acknowledgement for task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.500128 24193 status_update_manager.hpp:314] Handling ACK for status update TASK_LOST from task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.500257 24193 status_update_manager.cpp:434] Cleaning up status update stream for task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.500452 24192 slave.cpp:1016] Status update manager successfully handled status update acknowledgement for task Task_Tracker_2 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:46.500743 24204 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_2/runs/f4729d73-5000-4c40-9c0e-1e77ad414f27' for removal
I0513 09:16:47.079013 24193 slave.cpp:587] Got assigned task Task_Tracker_4 for framework 201305130913-33597632-5050-3893-0000
I0513 09:16:47.081650 24193 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
I0513 09:16:47.082447 24198 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c'
I0513 09:16:47.082861 24194 cgroups_isolator.cpp:525] Launching executor_Task_Tracker_4 (cd hadoop && ./bin/mesos-executor) in /tmp/mesos/slaves/201305130913-33597632-5050-3893-0/frameworks/201305130913-33597632-5050-3893-0000/executors/executor_Task_Tracker_4/runs/8a4dd631-1ec0-4946-a1bc-0644a7238e3c with resources cpus=1; mem=1280 for framework 201305130913-33597632-5050-3893-0000 in cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_4_tag_8a4dd631-1ec0-4946-a1bc-0644a7238e3c
I0513 09:16:47.084478 24194 cgroups_isolator.cpp:670] Changing cgroup controls for executor executor_Task_Tracker_4 of framework 201305130913-33597632-5050-3893-0000 with resources cpus=1; mem=1280
I0513 09:16:47.085273 24194 cgroups_isolator.cpp:841] Updated 'cpu.shares' to 1024 for executor executor_Task_Tracker_4 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:47.086045 24194 cgroups_isolator.cpp:979] Updated 'memory.limit_in_bytes' to 1342177280 for executor executor_Task_Tracker_4 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:47.086853 24194 cgroups_isolator.cpp:1005] Started listening for OOM events for executor executor_Task_Tracker_4 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:47.088227 24194 cgroups_isolator.cpp:555] Forked executor at = 24856
I0513 09:16:50.485791 24194 status_update_manager.cpp:379] Checking for unacknowledged status updates
I0513 09:16:52.480471 24185 cgroups_isolator.cpp:806] Executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000 terminated with status 256
I0513 09:16:52.480622 24185 cgroups_isolator.cpp:635] Killing executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
I0513 09:16:52.482652 24185 cgroups_isolator.cpp:1025] OOM notifier is triggered for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000 with uuid 22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.482805 24185 cgroups_isolator.cpp:1030] Discarded OOM notifier for executor executor_Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000 with uuid 22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.484110 24195 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.484447 24195 cgroups.cpp:1214] Successfully froze cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642 after 1 attempts
I0513 09:16:52.487893 24184 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.488129 24184 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.496047 24207 cgroups_isolator.cpp:1144] Successfully destroyed cgroup mesos/framework_201305130913-33597632-5050-3893-0000_executor_executor_Task_Tracker_3_tag_22f6e84b-d07f-430a-a322-6f804b3cd642
I0513 09:16:52.496247 24203 slave.cpp:1479] Executor 'executor_Task_Tracker_3' of framework 201305130913-33597632-5050-3893-0000 has exited with status 1
I0513 09:16:52.498538 24203 slave.cpp:1232] Handling status update TASK_LOST from task Task_Tracker_3 of framework 201305130913-33597632-5050-3893-0000
......
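Since every executor exits with status 1 before writing anything to its sandbox, one quick sanity check is whether the bundle behind the launch command ("cd hadoop && ./bin/mesos-executor") actually fetches and extracts to that layout. A minimal sketch, assuming the executor bundle was uploaded as /user/mesos/mesos-executor.tar.gz per the advice quoted below:

# Pull the bundle back out of HDFS and confirm it unpacks to a hadoop/
# directory that contains bin/mesos-executor:
hadoop fs -get /user/mesos/mesos-executor.tar.gz /tmp/check.tar.gz
tar -tzf /tmp/check.tar.gz | grep -m1 'bin/mesos-executor'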




Wang Yu

From: Benjamin Mahler
Sent: 2013-05-11 02:32
To: wangyu
Cc: Benjamin Mahler; mesos-dev
Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
1. If you look at a slave log, you can see that the process isolator
launched the task and then notified the slave that it was lost. Can you
look inside one of the executor directories? There should be an stderr
file there. E.g.:

I0510 09:44:33.801655  7412 paths.hpp:302] Created executor directory
'/tmp/mesos/slaves/201305100938-33597632-5050-19520-1/frameworks/201305100938-33597632-5050-19520-0000/executors/executor_Task_Tracker_5/runs/2981a5c2-84e5-4868-9507-8aecb32ee163'

Look for these in the logs and read the stderr present inside. Can you
report back with the contents?

2. Are you running on Linux? You may want to consider using
--isolation=cgroups when starting your slaves. This uses Linux control
groups to do process / CPU / memory isolation between executors running on
the slave.

Thanks!
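For reference, a minimal sketch of a slave started this way, assuming the master at 192.168.0.2:5050 and the default /tmp/mesos work directory as seen in the logs above:

# Run as root so the slave can manage the cgroup hierarchy (mounted at
# /cgroup in the slave logs earlier in this thread):
mesos-slave --master=192.168.0.2:5050 --isolation=cgroups --work_dir=/tmp/mesos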


On Thu, May 9, 2013 at 7:07 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

>
> Hi Ben,
>
> Logs for the mesos master and slaves are attached; thanks for helping me with
> this problem. I really appreciate your patient reply.
>
> Three servers: "master", "slave1", "slave5"
> Mesos master: "master"
> Mesos slaves: "master", "slave1", "slave5"
>
> ------------------------------
> Wang Yu
>
>  *From:* Benjamin Mahler <be...@gmail.com>
> *Sent:* 2013-05-10 07:22
> *To:* wangyu <wa...@nfs.iscas.ac.cn>
> *Cc:* mesos-dev <me...@incubator.apache.org>; Benjamin Mahler <be...@gmail.com>
> *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
>  Ah, I see them now. It looks like you uploaded the NameNode logs. Can you
> upload the mesos-master and mesos-slave logs instead? What will be
> interesting here is what happened on the slave that is trying to run the
> TaskTracker.
>
>
> On Wed, May 8, 2013 at 8:32 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>
> >
> > I have uploaded them in the former email; I will send them again. PS: Will
> > the email list reject the attachments?
> >
> > Can you see them?
> >
> > ------------------------------
> > Wang Yu
> >
> >  *From:* Benjamin Mahler <be...@gmail.com>
> > *Sent:* 2013-05-09 10:00
> > *To:* mesos-dev@incubator.apache.org; wangyu <wa...@nfs.iscas.ac.cn>
> > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> >  Did you forget to attach them?
> >
> >
> > On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> > >
> > > OK.
> > > Logs are attached. I used Ctrl+C to stop the jobtracker when the TASK_LOST
> > > happened.
> > >
> > > Thanks very much for your help!
> > >
> > > ------------------------------
> > > Wang Yu
> > >
> > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > *Sent:* 2013-05-09 01:23
> > > *To:* mesos-dev@incubator.apache.org
> > > *Cc:* wangyu <wa...@nfs.iscas.ac.cn>
> > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > >
> > > Hey Brenden, are there any bugs in particular here that you're referring to?
> > >
> > > Wang, can you provide the logs for the JobTracker, the slave, and the
> > > master?
> > >
> > >
> > > On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <brenden.matthews@airbedandbreakfast.com> wrote:
> > >
> > > > You may want to try Airbnb's dist of Mesos:
> > > >
> > > > https://github.com/airbnb/mesos/tree/testing
> > > >
> > > > A good number of these Mesos bugs have been fixed but aren't yet merged
> > > > into upstream.
> > > >
> > > >
> > > > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > >
> > > > > The log on each slave of the lost task is: No executor found with ID:
> > > > > executor_Task_Tracker_XXX.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Wang Yu
> > > > >
> > > > > From: 王瑜
> > > > > Sent: 2013-05-07 11:13
> > > > > To: mesos-dev
> > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > TaskTracker: http://slave5:50060
> > > > > Hi all,
> > > > >
> > > > > I have tried adding the file extension when uploading the executor, as
> > > > > well as in the conf file, but it still does not work.
> > > > >
> > > > > And I have seen
> > > > > /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > > > > but it is an empty directory.
> > > > >
> > > > > Are there any other logs I can read to find out why the TASK_LOST happened?
> > > > > I really need your help, thanks very much!
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Wang Yu
> > > > >
> > > > > From: Vinod Kone
> > > > > Sent: 2013-04-26 01:31
> > > > > To: mesos-dev@incubator.apache.org
> > > > > Cc: wangyu
> > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > TaskTracker: http://slave5:50060
> > > > > Also, you could look at the executor logs (default:
> > > > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why the
> > > > >  TASK_LOST happened.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> > > > > benjamin.mahler@gmail.com> wrote:
> > > > >
> > > > > Can you maintain the file extension? That is how mesos knows to extract it:
> > > > > hadoop fs -copyFromLocal
> > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > /user/mesos/mesos-executor.tar.gz
> > > > >
> > > > > Also make sure your mapred-site.xml has the extension as well.
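For reference, the matching mapred-site.xml entry would then look something like this; a sketch assuming the bundle lives at /user/mesos/mesos-executor.tar.gz on the "master" namenode, as discussed elsewhere in this thread:

  <property>
    <name>mapred.mesos.executor</name>
    <!-- The .tar.gz extension is what tells mesos to extract the bundle. -->
    <value>hdfs://master/user/mesos/mesos-executor.tar.gz</value>
  </property>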
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > >
> > > > > > Hi, Ben,
> > > > > >
> > > > > > I have tried as you said, but it still does not work.
> > > > > > I have uploaded the mesos-executor using: hadoop fs -copyFromLocal
> > > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > > /user/mesos/mesos-executor
> > > > > > Did I do the right thing? Thanks very much!
> > > > > >
> > > > > > The log in jobtracker is:
> > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > > > > > Task_Tracker_82 on http://slave1:31000
> > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > > > slots needed.
> > > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update of
> > > > > > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker Status
> > > > > >       Pending Map Tasks: 2
> > > > > >    Pending Reduce Tasks: 1
> > > > > >          Idle Map Slots: 0
> > > > > >       Idle Reduce Slots: 0
> > > > > >      Inactive Map Slots: 6 (launched but no heartbeat yet)
> > > > > >   Inactive Reduce Slots: 6 (launched but no heartbeat yet)
> > > > > >        Needed Map Slots: 2
> > > > > >     Needed Reduce Slots: 1
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > > > > > Task_Tracker_83 on http://slave1:31000
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > > > slots needed.
> > > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update of
> > > > > > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker Status
> > > > > >       Pending Map Tasks: 2
> > > > > >    Pending Reduce Tasks: 1
> > > > > >          Idle Map Slots: 0
> > > > > >       Idle Reduce Slots: 0
> > > > > >      Inactive Map Slots: 6 (launched but no heartbeat yet)
> > > > > >   Inactive Reduce Slots: 6 (launched but no heartbeat yet)
> > > > > >        Needed Map Slots: 2
> > > > > >     Needed Reduce Slots: 1
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Wang Yu
> > > > > >
> > > > > > From: Benjamin Mahler
> > > > > > Sent: 2013-04-24 07:49
> > > > > > To: mesos-dev@incubator.apache.org; wangyu
> > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > TaskTracker: http://slave5:50060
> > > > > >
> > > > > > You need to instead upload the hadoop.tar.gz generated by the tutorial.
> > > > > > Then point the conf file to the hdfs directory (you had the right idea,
> > > > > > just uploaded the wrong file). :)
> > > > > >
> > > > > > Can you try that and report back?
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > >
> > > > > > > Guodong,
> > > > > > >
> > > > > > > There are still problems on my side; I think there is some problem with
> > > > > > > my executor setting.
> > > > > > >
> > > > > > > In mapred-site.xml, I set ("master" is the hostname of the mesos master):
> > > > > > >   <property>
> > > > > > >     <name>mapred.mesos.executor</name>
> > > > > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > > > > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > > > > > >   </property>
> > > > > > >
> > > > > > > And I uploaded the mesos-executor to /user/mesos/mesos-executor.
> > > > > > >
> > > > > > > The head content is as follows:
> > > > > > >
> > > > > > > #! /bin/sh
> > > > > > >
> > >
> >
>
> > > > > > > # mesos-executor - temporary wrapper script for .libs/mesos-executor
> > > > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > > > > > #
> > > > > > > # The mesos-executor program cannot be directly executed until all the libtool
> > > > > > > # libraries that it depends on are installed.
> > > > > > > #
> > > > > > > # This wrapper script should never be moved out of the build directory.
> > > > > > > # If it is, it will not operate correctly.
> > > > > > >
> > > > > > > # Sed substitution that helps us do robust quoting.  It backslashifies
> > > > > > > # metacharacters that are still active within double-quoted strings.
> > > > > > > Xsed='/bin/sed -e 1s/^X//'
> > > > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > > > > > >
> > > > > > > # Be Bourne compatible
> > > > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
> > > > > > >   emulate sh
> > > > > > >   NULLCMD=:
> > > > > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
> > > > > > >   # is contrary to our usage.  Disable this feature.
> > > > > > >   alias -g '${1+"$@"}'='"$@"'
> > > > > > >   setopt NO_GLOB_SUBST
> > > > > > > else
> > > > > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac
> > > > > > > fi
> > > > > > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > > > > > DUALCASE=1; export DUALCASE # for MKS sh
> > > > > > >
> > > > > > > # The HP-UX ksh and POSIX shell print the target directory to stdout
> > > > > > > # if CDPATH is set.
> > > > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > > > > > >
> > > > > > > relink_command="(cd /home/mesos/build/src; { test -z
> > > > > > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || { LIBRARY_PATH=; export
> > > > > > > LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" || unset
> > > > > > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; }; }; { test -z
> > > > > > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || { GCC_EXEC_PREFIX=;
> > > > > > > export GCC_EXEC_PREFIX; }; }; { test -z \"\${LD_RUN_PATH+set}\" || unset
> > > > > > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
> > > > > > > LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/;
> > > > > > > export LD_LIBRARY_PATH;
> > > > > > > PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin;
> > > > > > > export PATH; g++ -g -g2 -O2 -o \$progdir/\$file
> > > > > > > launcher/mesos_executor-executor.o  ./.libs/libmesos.so
> > > > > > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread -lcurl -lssl
> > > > > > > -lcrypto -lz -lrt -pthread -Wl,-rpath -Wl,/home/mesos/build/src/.libs
> > > > > > > -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > > > > > ...
> > > > > > >
> > > > > > >
> > > > > > > Did I upload the right file, and set it up correctly in the conf file?
> > > > > > > Thanks very much!
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Wang Yu
> > > > > > >
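(Side note: the paste above is a libtool wrapper script, which is exactly the wrong artifact per the reply above -- the thing to upload is the hadoop.tar.gz generated by the tutorial build. A quick sanity check before uploading, as a sketch:)

    file mesos-executor            # "POSIX shell script" here means it is the wrapper, not the executor bundle
    tar tzf hadoop.tar.gz | head   # the tutorial-built tarball is what should go to HDFS
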
> > > > > > > From: 王国栋
> > > > > > > Date: 2013-04-23 13:32
> > > > > > > To: wangyu
> > > > > > > CC: mesos-dev
> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > > TaskTracker: http://slave5:50060
> > > > > > > Hmm, it seems that the mapred.mesos.master is not set correctly.
> > > > > > >
> > > > > > > If you run hadoop in local mode, using the following setting is OK:
> > > > > > >   <property>
> > > > > > >     <name>mapred.mesos.master</name>
> > > > > > >     <value>local</value>
> > > > > > >   </property>
> > > > > > >
> > > > > > > If you want to start the cluster, set mapred.mesos.master to
> > > > > > > mesos-master-hostname:mesos-master-port.
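(As a concrete sketch of that cluster setting -- "master" is the hostname used throughout this thread, and the port assumes the default mesos master port of 5050:)

  <property>
    <name>mapred.mesos.master</name>
    <value>master:5050</value>
  </property>
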
> > > > > > >
> > > > > > > Make sure the DNS lookup for mesos-master-hostname resolves to the
> > > > > > > right IP.
> > > > > > >
> > > > > > > BTW: when you start the jobtracker, you can check the mesos webUI to see
> > > > > > > if the hadoop framework is registered.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > Guodong
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > >
> > > > > > > > **
> > > > > > > > Hi, Guodong,
> > > > > > > >
> > > > > > > > I started hadoop as you said, and then I saw this error:
> > > > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from scheduler
> > > > > > > > driver: Cannot parse '@0.0.0.0:0'
> > > > > > > >
> > > > > > > > What does this mean? Where should I change the MesosScheduler code to
> > > > > > > > fix this?
> > > > > > > > Thanks very much! I am so sorry for interrupting you once again...
> > > > > > > >
> > > > > > > > The whole log is as follows:
> > > > > > > >
> > > > > > > >  [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > > > > > /************************************************************
> > > > > > > > STARTUP_MSG: Starting JobTracker
> > > > > > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > > > > > STARTUP_MSG:   args = []
> > > > > > > > STARTUP_MSG:   version = 0.20.205.0
> > > > > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat Apr 13 11:19:33 CST 2013
> > > > > > > > ************************************************************/
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker metrics system started
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source QueueMetrics,q=default registered.
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
> > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
> > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting jobtracker with owner as root
> > > > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort9001 registered.
> > > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort9001 registered.
> > > > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
> > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
> > > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to port 50030
> > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
> > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
> > > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source JobTrackerMetrics registered.
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver: 50030
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system directory
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being initialized in embedded mode
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job history server at: localhost:50030
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web address: localhost:50030
> > > > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed job store is inactive
> > > > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting MesosScheduler
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts information
> > > > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from scheduler driver: Cannot parse '@0.0.0.0:0'
> > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the includes file to
> > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the excludes file to
> > > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 nodes
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: job_201304231321_0001: nMaps=0 nReduces=0 max=-1
> > > > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job job_201304231321_0001
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job job_201304231321_0001 added successfully for user 'root' to queue 'default'
> > > > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root  IP=192.168.0.2  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001  RESULT=SUCCESS
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing job_201304231321_0001
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing job_201304231321_0001
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken generated and stored with users keys in /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size for job job_201304231321_0001 = 0. Number of splits = 0
> > > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job job_201304231321_0001 initialized successfully with 0 map tasks and 0 reduce tasks.
> > > > > > > >
> > > > > > > > ------------------------------
> > > > > > > > Wang Yu
> > > > > > > >
> > > > > > > >  *From:* 王国栋 <wa...@gmail.com>
> > > > > > > > *Date:* 2013-04-23 11:34
> > > > > > > > *To:* mesos-dev <me...@incubator.apache.org>; wangyu <wangyu@nfs.iscas.ac.cn>
> > > > > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > > Unknown/exited TaskTracker: http://slave5:50060
> > > > > > > >  Hi Yu,
> > > > > > > >
> > > > > > > > Mesos will just launch a tasktracker on each slave node as long as the
> > > > > > > > required resource is enough for the tasktracker. So you have to run the
> > > > > > > > NameNode, Jobtracker and DataNode on your own.
> > > > > > > >
> > > > > > > > Basically, starting hadoop on mesos goes like this.
> > > > > > > > 1. start the dfs. use hadoop/bin/start-dfs.sh. (you should configure
> > > > > > > > core-site.xml and hdfs-site.xml). dfs is no different from the normal one.
> > > > > > > > 2. start the jobtracker, use hadoop/bin/hadoop jobtracker (you should
> > > > > > > > configure mapred-site.xml; this jobtracker should contain the patch for
> > > > > > > > mesos)
> > > > > > > >
> > > > > > > > Then, you can use the mesos web UI and the jobtracker web UI to check the
> > > > > > > > status of the Jobtracker.
> > > > > > > >
> > > > > > > >  Guodong
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > >
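(Condensing the sequence above into a sketch, with paths relative to the patched hadoop distribution:)

    bin/start-dfs.sh        # step 1: HDFS as usual (core-site.xml, hdfs-site.xml)
    bin/hadoop jobtracker   # step 2: mesos-patched JobTracker; TaskTrackers then come from mesos
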
> > > > > > > >> Oh, yes, I started my hadoop using "start-all.sh". Now I know what my
> > > > > > > >> problem is. Thanks very much!
> > > > > > > >>
> > > > > > > >> ps: Besides the TaskTracker, are there any other roles (like JobTracker,
> > > > > > > >> DataNode) I should stop first?
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Wang Yu
> > > > > > > >>
> > > > > > > >> From: Benjamin Mahler
> > > > > > > >> Date: 2013-04-23 10:56
> > > > > > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > > >> TaskTracker: http://slave5:50060
> > > > > > > >>  The scheduler we wrote for Hadoop will start its own TaskTrackers,
> > > > > > > >> meaning you do not have to start any TaskTrackers yourself.
> > > > > > > >>
> > > > > > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers
> > > > > > > >> running in your cluster?
> > > > > > > >>
> > > > > > > >> Looking at your jps output, is there already a TaskTracker running?
> > > > > > > >> [root@master logs]# jps
> > > > > > > >> 13896 RunJar
> > > > > > > >> 14123 Jps
> > > > > > > >> 12718 NameNode
> > > > > > > >> 12900 DataNode
> > > > > > > >> 13374 TaskTracker  <--- How was this started?
> > > > > > > >> 13218 JobTracker
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > >>
> > > > > > > >> > Hi, Ben and Guodong,
> > > > > > > >> >
> > > > > > > >> > What do you mean by "managing your own TaskTrackers"? How should I know
> > > > > > > >> > whether I have managed my own TaskTrackers? Sorry, I am not very
> > > > > > > >> > familiar with mesos.
> > > > > > > >> > Does it mean I do not need to configure hdfs-site.xml and core-site.xml
> > > > > > > >> > in hadoop? I do not want to run my own TaskTracker; I just want to set
> > > > > > > >> > up hadoop on mesos and run my MR tasks.
> > > > > > > >> >
> > > > > > > >> > Thanks very much for your patient reply... Maybe I have a long way to go...
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > The log messages you see:
> > > > > > > >> > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> >
> > > > > > > >> > Are printed when mesos does not know about the TaskTracker. We currently
> > > > > > > >> > don't support running your own TaskTrackers, as the MesosScheduler will
> > > > > > > >> > launch them on your behalf when needed.
> > > > > > > >> >
> > > > > > > >> > Are you managing your own TaskTrackers? The purpose of using Hadoop with
> > > > > > > >> > mesos is that you no longer have to do that. We will detect that jobs
> > > > > > > >> > have pending map / reduce tasks and launch TaskTrackers accordingly.
> > > > > > > >> >
> > > > > > > >> > Guodong may be able to help further getting set up!
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > Wang Yu
> > > > > > > >> >
> > > > > > > >> > From: 王国栋
> > > > > > > >> > Date: 2013-04-18 17:10
> > > > > > > >> > To: mesos-dev; wangyu
> > > > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > > >> > TaskTracker: http://slave5:50060
> > > > > > > >> > You can check the slave log and the mesos-executor log, which is
> > > > > > > >> > normally located in a dir like
> > > > > > > >> > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > > > > >> > The log is the tasktracker log.
> > > > > > > >> >
> > > > > > > >> > I hope it will help.
> > > > > > > >> >
> > > > > > > >> > Guodong
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > > >> >
> > > > > > > >> > > **
> > > > > > > >> > > Hi All,
> > > > > > > >> > >
> > > > > > > >> > > I have deployed mesos on three nodes: master, slave1, slave5, and it
> > > > > > > >> > > works well.
> > > > > > > >> > > Then I set hadoop over it, using master as the namenode, and master,
> > > > > > > >> > > slave1, slave5 as datanodes. When I use 'jps', it looks like it works well.
> > > > > > > >> > >  [root@master logs]# jps
> > > > > > > >> > > 13896 RunJar
> > > > > > > >> > > 14123 Jps
> > > > > > > >> > > 12718 NameNode
> > > > > > > >> > > 12900 DataNode
> > > > > > > >> > > 13374 TaskTracker
> > > > > > > >> > > 13218 JobTracker
> > > > > > > >> > >
> > > > > > > >> > > Then I ran the test benchmark, but it could not make progress...
> > > > > > > >> > >  [root@master hadoop-0.20.205.0]# bin/hadoop jar hadoop-examples-0.20.205.0.jar randomwriter -Dtest.randomwrite.bytes_per_map=6710886 -Dtest.randomwriter.maps_per_host=10 rand
> > > > > > > >> > > Running 30 maps.
> > > > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running job: job_201304181646_0001
> > > > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0% reduce 0%
> > > > > > > >> > > It stopped here.
> > > > > > > >> > >
> > > > > > > >> > > Then I read the log file hadoop-root-jobtracker-master.log; it shows:
> > > > > > > >> > > 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > > > > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > > > > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > > > > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:04,609 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:07,618 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:10,625 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:13,632 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave5
> > > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_slave5:127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > > > >> > > 2013-04-18 16:47:13,687 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > >> > > 2013-04-18 16:47:16,638 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:16,697 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > >> > > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:19,707 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > >> > > 2013-04-18 16:47:22,651 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:22,715 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > >> > > 2013-04-18 16:47:25,658 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > > 2013-04-18 16:47:25,725 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > > >> > > 2013-04-18 16:47:28,665 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > Can anybody help me? Thanks very much!
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
> >
>
>

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by Benjamin Mahler <be...@gmail.com>.
1. If you look at a slave log, you can see that the process isolator
launched the task and then notified the slave that it was lost. Can you
look inside one of the executor directories? There should be a stderr file
there. E.g.:

I0510 09:44:33.801655  7412 paths.hpp:302] Created executor directory
'/tmp/mesos/slaves/201305100938-33597632-5050-19520-1/frameworks/201305100938-33597632-5050-19520-0000/executors/executor_Task_Tracker_5/runs/2981a5c2-84e5-4868-9507-8aecb32ee163'

Look for these in the logs and read the stderr present inside. Can you
report back with the contents?
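A sketch for pulling all of them up at once (assuming the default /tmp/mesos
slave work directory shown in the paths above):

    # Print the path and contents of every executor stderr on this slave.
    find /tmp/mesos/slaves/*/frameworks/*/executors/*/runs \
        -maxdepth 2 -name stderr \
        -exec sh -c 'echo "== $1 =="; cat "$1"' _ {} \;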

2. Are you running on Linux? You may want to consider using
--isolation=cgroups when starting your slaves. This uses Linux control
groups to do process / CPU / memory isolation between executors running on
the slave.
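
For example, a sketch of the slave invocation (host and port as used in this
cluster; double-check the flag spellings against `mesos-slave --help` on
your build):

    mesos-slave --master=master:5050 --isolation=cgroups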

Thanks!


On Thu, May 9, 2013 at 7:07 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> **
> Hi Ben,
>
> Logs for the mesos master and slaves are attached; thanks for helping me with
> this problem. I very much appreciate your patient reply.
>
> Three servers: "master", "slave1", "slave5"
> Mesos master: "master"
> Mesos slaves: "master", "slave1", "slave5"
>
> ------------------------------
> Wang Yu
>
>  *From:* Benjamin Mahler <be...@gmail.com>
> *Date:* 2013-05-10 07:22
> *To:* wangyu <wa...@nfs.iscas.ac.cn>
> *Cc:* mesos-dev <me...@incubator.apache.org>; Benjamin Mahler<be...@gmail.com>
> *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
>  Ah I see them now, looks like you uploaded the NameNode logs? Can you
> upload the mesos-master and mesos-slave logs instead? What will be
> interesting here is what happened on the slave that is trying to run the
> TaskTracker.
>
>
> On Wed, May 8, 2013 at 8:32 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>
> > **
> > I have uploaded them in the previous email; I will send them again. PS: Will
> > the mailing list reject the attachments?
> >
> > Can you see them?
> >
> > ------------------------------
> > Wang Yu
> >
> >  *From:* Benjamin Mahler <be...@gmail.com>
> > *Date:* 2013-05-09 10:00
> > *To:* mesos-dev@incubator.apache.org; wangyu <wa...@nfs.iscas.ac.cn>
> > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> >  Did you forget to attach them?
> >
> >
> > On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> > > **
> > > OK.
> > > Logs are attached. I used Ctrl+C to stop the jobtracker when the task_lost
> > > happened.
> > >
> > > Thanks very much for your help!
> > >
> > > ------------------------------
> > > Wang Yu
> > >
> > >  *From:* Benjamin Mahler <be...@gmail.com>
> > > *Date:* 2013-05-09 01:23
> > > *To:* mesos-dev@incubator.apache.org
> > > *Cc:* wangyu <wa...@nfs.iscas.ac.cn>
> > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > > Hey Brenden, are there any bugs in particular here that you're referring to?
> > >
> > > Wang, can you provide the logs for the JobTracker, the slave, and the
> > > master?
> > > > > > > >> > > 2013-04-18 16
> > > > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server
> > > > > handler 6
> > > > > > > on
> > > > > > > >> > 9001: starting
> > > > > > > >> > > 2013-04-18 16
> > > > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server
> > > > > handler 9
> > > > > > > on
> > > > > > > >> > 9001: starting
> > > > > > > >> > > 2013-04-18 16
> > > > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server
> > > > > handler 7
> > > > > > > on
> > > > > > > >> > 9001: starting
> > > > > > > >> > > 2013-04-18 16
> > > > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server
> > > > > handler 8
> > > > > > > on
> > > > > > > >> > 9001: starting
> > > > > > > >> > > 2013-04-18 16
> > >
> >
>
> > > > > > > >> > > :46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding
> > > > a
> > > > > > new
> > > > > > > >> > node: /default-rack/master
> > > > > > > >> > > 2013-04-18 16
> >
>
> > > > > > > >> > > :46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding
> > > > > > tracker
> > > > > > > >> > tracker_master:localhost/
> > > > > > > >> > > 127.0.0.1:44997 to host master
> > > > > > > >> > > 2013-04-18 16
> > > > > > > >> > > :46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> Unknown/exited
> > > > > > > >> > TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > > 2013-04-18 16
> > > > > > > >> > > :46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> Unknown/exited
> > > > > > > >> > TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > > 2013-04-18 16
> > > > > > > >> > > :46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> Unknown/exited
> > > > > > > >> > TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > > 2013-04-18 16
> > > > > > > >> > > :47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> Unknown/exited
> > > > > > > >> > TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:04,609 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:07,618 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:10,625 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:13,632 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:13,686 INFO
> > > > > > org.apache.hadoop.net.NetworkTopology:
> > > > > > > >> > Adding a new node: /default-rack/slave5
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:13,686 INFO
> > > > > org.apache.hadoop.mapred.JobTracker:
> > > > > > > >> Adding
> > > > > > > >> > tracker tracker_slave5:
> > > > > > > >> > > 127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:13,687 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://slave5:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:16,638 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:16,697 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://slave5:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:19,645 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:19,707 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://slave5:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:22,651 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:22,715 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://slave5:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:25,658 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:25,725 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://slave5:50060.
> > > > > > > >> > >
> > > > > > > >> > > 2013-04-18 16:47:28,665 INFO
> > > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > >> > Unknown/exited TaskTracker:
> > > > > > > >> > > http://master:50060.
> > > > > > > >> > >
> > > > > > > >> > > Does anybody can help me? Thanks very much!
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
> >
>
>

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
Hi Ben,

Logs for the mesos master and slaves are attached; thanks for helping me with this problem. I very much appreciate your patient replies.

Three servers: "master", "slave1", "slave5"
Mesos master: "master"
Mesos slaves: "master", "slave1", "slave5"




Wang Yu

From: Benjamin Mahler
Sent: 2013-05-10 07:22
To: wangyu
Cc: mesos-dev; Benjamin Mahler
Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
Ah I see them now, looks like you uploaded the NameNode logs? Can you
upload the mesos-master and mesos-slave logs instead? What will be
interesting here is what happened on the slave that is trying to run the
TaskTracker.


On Wed, May 8, 2013 at 8:32 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> I have uploaded them in the former email; I will send them again. PS: Will
> the mailing list reject the attachments?
>
> Can you see them?
>
> ------------------------------
> Wang Yu
>
> From: Benjamin Mahler <be...@gmail.com>
> Sent: 2013-05-09 10:00
> To: mesos-dev@incubator.apache.org; wangyu <wa...@nfs.iscas.ac.cn>
> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
>  Did you forget to attach them?
>
>
> On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>
> > OK.
> > Logs are attached. I used Ctrl+C to stop the jobtracker when the TASK_LOST
> > happened.
> >
> > Thanks very much for your help!
> >
> > ------------------------------
> > Wang Yu
> >
> > From: Benjamin Mahler <be...@gmail.com>
> > Sent: 2013-05-09 01:23
> > To: mesos-dev@incubator.apache.org
> > Cc: wangyu <wa...@nfs.iscas.ac.cn>
> > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> >
> > Hey Brenden, are there any bugs in particular here that you're referring to?
> >
> > Wang, can you provide the logs for the JobTracker, the slave, and the
> > master?
> >
> >
> > On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <
> > brenden.matthews@airbedandbreakfast.com> wrote:
> >
> > > You may want to try Airbnb's dist of Mesos:
> > >
> > > https://github.com/airbnb/mesos/tree/testing
> > >
> > > A good number of these Mesos bugs have been fixed but aren't yet merged
> > > into upstream.
> > >
> > >
> > > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > >
> > > > The log on each slave of the lost task is: No executor found with ID:
> > > > executor_Task_Tracker_XXX.
> > > >
> > > >
> > > >
> > > >
> > > > Wang Yu
> > > >
> > > > From: 王瑜
> > > > Sent: 2013-05-07 11:13
> > > > To: mesos-dev
> > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > TaskTracker: http://slave5:50060
> > > > Hi all,
> > > >
> > > > I have tried adding the file extension when uploading the executor, as
> > > > well as in the conf file, but it still does not work.
> > > >
> > > > And I have looked at
> > > > /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > > > but it is an empty directory.
> > > >
> > > > Are there any other logs I can read to find out why the TASK_LOST
> > > > happened? I really need your help, thanks very much!
> > > >
> > > >
> > > >
> > > >
> > > > Wang Yu
> > > >
> > > > From: Vinod Kone
> > > > Sent: 2013-04-26 01:31
> > > > To: mesos-dev@incubator.apache.org
> > > > Cc: wangyu
> > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > TaskTracker: http://slave5:50060
> > > > Also, you could look at the executor logs (default:
> > > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why the
> > > >  TASK_LOST happened.
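
For reference, a hedged sketch of how one might inspect those executor logs on
a slave; the wildcards stand in for the slave, framework and executor IDs,
which vary per cluster (nothing below is quoted from this thread):

    # Each TaskTracker task runs in its own executor sandbox; the executor's
    # stdout/stderr end up under .../runs/latest/.
    ls /tmp/mesos/slaves/*/frameworks/*/executors/executor_Task_Tracker_*/runs/latest/
    cat /tmp/mesos/slaves/*/frameworks/*/executors/executor_Task_Tracker_*/runs/latest/stderr
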
> > > >
> > > >
> > > >
> > > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> > > > benjamin.mahler@gmail.com> wrote:
> > > >
> > > > Can you maintain the file extension? That is how mesos knows to extract it:
> > > > hadoop fs -copyFromLocal
> > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > /user/mesos/mesos-executor.tar.gz
> > > >
> > > > Also make sure your mapred-site.xml has the extension as well.
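
For reference, a minimal sketch of the matching pair -- the upload and the
mapred-site.xml entry pointing at it (the hdfs://master URI is illustrative;
substitute your own namenode address):

    hadoop fs -copyFromLocal \
        /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz \
        /user/mesos/mesos-executor.tar.gz

    <!-- mapred-site.xml: keep the .tar.gz extension so mesos extracts it -->
    <property>
      <name>mapred.mesos.executor</name>
      <value>hdfs://master/user/mesos/mesos-executor.tar.gz</value>
    </property>
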
> > > >
> > > >
> > > >
> > > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > >
> > > > > Hi, Ben,
> > > > >
> > > > > I have tried as you said, but it still does not work.
> > > > > I uploaded the mesos-executor using: hadoop fs -copyFromLocal
> > > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > > /user/mesos/mesos-executor
> > > > > Did I do the right thing? Thanks very much!
> > > > >
> > > > > The log in jobtracker is:
> > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > > > > Task_Tracker_82 on http://slave1:31000
> > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > > slots needed.
> > > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update of
> > > > > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker Status
> > > > >       Pending Map Tasks: 2
> > > > >    Pending Reduce Tasks: 1
> > > > >          Idle Map Slots: 0
> > > > >       Idle Reduce Slots: 0
> > > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > > >        Needed Map Slots: 2
> > > > >     Needed Reduce Slots: 1
> > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > > > > Task_Tracker_83 on http://slave1:31000
> > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > > slots needed.
> > > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update of
> > > > > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker Status
> > > > >       Pending Map Tasks: 2
> > > > >    Pending Reduce Tasks: 1
> > > > >          Idle Map Slots: 0
> > > > >       Idle Reduce Slots: 0
> > > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > > >        Needed Map Slots: 2
> > > > >     Needed Reduce Slots: 1
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Wang Yu
> > > > >
> > > > > From: Benjamin Mahler
> > > > > Sent: 2013-04-24 07:49
> > > > > To: mesos-dev@incubator.apache.org; wangyu
> > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > TaskTracker: http://slave5:50060
> > > > > You need to instead upload the hadoop.tar.gz generated by the tutorial.
> > > > > Then point the conf file to the hdfs directory (you had the right idea,
> > > > > just uploaded the wrong file). :)
> > > > >
> > > > > Can you try that and report back?
> > > > >
> > > > >
> > > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > >
> > > > > > Guodong,
> > > > > >
> > > > > > I still have problems; I think there is some problem with my
> > > > > > executor setting.
> > > > > >
> > > > > > In mapred-site.xml, I set ("master" is the hostname of the mesos
> > > > > > master):
> > > > > >   <property>
> > > > > >     <name>mapred.mesos.executor</name>
> > > > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > > > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > > > > >   </property>
> > > > > >
> > > > > > And I uploaded mesos-executor to /user/mesos/mesos-executor
> > > > > >
> > > > > > The head of the file is as follows:
> > > > > >
> > > > > > #! /bin/sh
> > > > > >
> > > > > > # mesos-executor - temporary wrapper script for .libs/mesos-executor
> > > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > > > > #
> > > > > > # The mesos-executor program cannot be directly executed until all the libtool
> > > > > > # libraries that it depends on are installed.
> > > > > > #
> > > > > > # This wrapper script should never be moved out of the build directory.
> > > > > > # If it is, it will not operate correctly.
> > > > > >
> > > > > > # Sed substitution that helps us do robust quoting.  It backslashifies
> > > > > > # metacharacters that are still active within double-quoted strings.
> > > > > > Xsed='/bin/sed -e 1s/^X//'
> > > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > > > > >
> > > > > > # Be Bourne compatible
> > > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
> > > > > >   emulate sh
> > > > > >   NULLCMD=:
> > > > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
> > > > > >   # is contrary to our usage.  Disable this feature.
> > > > > >   alias -g '${1+"$@"}'='"$@"'
> > > > > >   setopt NO_GLOB_SUBST
> > > > > > else
> > > > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac
> > > > > > fi
> > > > > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > > > > DUALCASE=1; export DUALCASE # for MKS sh
> > > > > >
> > > > > > # The HP-UX ksh and POSIX shell print the target directory to stdout
> > > > > > # if CDPATH is set.
> > > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > > > > >
> > > > > > relink_command="(cd /home/mesos/build/src; { test -z
> > > > > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || { LIBRARY_PATH=; export
> > > > > > LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" || unset
> > > > > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; }; }; { test -z
> > > > > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || { GCC_EXEC_PREFIX=;
> > > > > > export GCC_EXEC_PREFIX; }; }; { test -z \"\${LD_RUN_PATH+set}\" || unset
> > > > > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
> > > > > > LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/;
> > > > > > export LD_LIBRARY_PATH;
> > > > > > PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin;
> > > > > > export PATH; g++ -g -g2 -O2 -o \$progdir/\$file
> > > > > > launcher/mesos_executor-executor.o  ./.libs/libmesos.so
> > > > > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread -lcurl -lssl
> > > > > > -lcrypto -lz -lrt -pthread -Wl,-rpath -Wl,/home/mesos/build/src/.libs
> > > > > > -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > > > > ...
> > > > > >
> > > > > >
> > > > > > Did I upload the right file, and set it up correctly in the conf
> > > > > > file? Thanks very much!
> > > > > >
> > > > > >
> > > > > >
> > > > > > Wang Yu
> > > > > >
> > > > > > From: 王国栋
> > > > > > Date: 2013-04-23 13:32
> > > > > > To: wangyu
> > > > > > CC: mesos-dev
> > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > TaskTracker: http://slave5:50060
> > > > > > Hmm, it seems that the mapred.mesos.master is set correctly.
> > > > > >
> > > > > > If you run hadoop in local mode, the following setting is OK:
> > > > > >   <property>
> > > > > >     <name>mapred.mesos.master</name>
> > > > > >     <value>local</value>
> > > > > >   </property>
> > > > > >
> > > > > > If you want to start the cluster, set mapred.mesos.master to
> > > > > > mesos-master-hostname:mesos-master-port.
> > > > > >
> > > > > > Make sure the DNS resolution of mesos-master-hostname gives the
> > > > > > right IP.
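
For reference, a cluster-mode variant of the property above, assuming the
mesos master runs on host "master" and listens on the default master port
5050 (both values are illustrative):

    <property>
      <name>mapred.mesos.master</name>
      <value>master:5050</value>
    </property>
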
> > > > > >
> > > > > > BTW: when you start the jobtracker, you can check the mesos web UI
> > > > > > and see whether the hadoop framework is registered.
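
A hedged way to make that check from a shell rather than the browser,
assuming the master's web UI listens on the default port 5050 and exposes
the state endpoint (an assumption about the Mesos version in use):

    # List the frameworks the master knows about; the Hadoop JobTracker
    # should appear among them once it has registered.
    curl -s http://master:5050/master/state.json | grep '"name"'
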
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > Guodong
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > >
> > > > > > > Hi, Guodong,
> > > > > > >
> > > > > > > I started hadoop as you said, then I saw this error:
> > > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from scheduler
> > > > > > > driver: Cannot parse '@0.0.0.0:0'
> > > > > > >
> > > > > > >
> > > > > > > What does this mean? Where should I change the MesosScheduler code
> > > > > > > to fix this? Thanks very much! I am so sorry to interrupt you once
> > > > > > > again...
> > > > > > >
> > > > > > > The whole log is as follows:
> > > > > > >
> > > > > > > [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > > > > /************************************************************
> > > > > > > STARTUP_MSG: Starting JobTracker
> > > > > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > > > > STARTUP_MSG:   args = []
> > > > > > > STARTUP_MSG:   version = 0.20.205.0
> > > > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat Apr 13 11:19:33 CST 2013
> > > > > > > ************************************************************/
> > > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker metrics system started
> > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source QueueMetrics,q=default registered.
> > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
> > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
> > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting jobtracker with owner as root
> > > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort9001 registered.
> > > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort9001 registered.
> > > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
> > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
> > > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to port 50030
> > > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
> > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
> > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source JobTrackerMetrics registered.
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver: 50030
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system directory
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being initialized in embedded mode
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job history server at: localhost:50030
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web address: localhost:50030
> > > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed job store is inactive
> > > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting MesosScheduler
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts information
> > > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from scheduler driver: Cannot parse '@0.0.0.0:0'
> > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the includes file to
> > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the excludes file to
> > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 nodes
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on 9001: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on 9001: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on 9001: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on 9001: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: job_201304231321_0001: nMaps=0 nReduces=0 max=-1
> > > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job job_201304231321_0001
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job job_201304231321_0001 added successfully for user 'root' to queue 'default'
> > > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root  IP=192.168.0.2  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001    RESULT=SUCCESS
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing job_201304231321_0001
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing job_201304231321_0001
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken generated and stored with users keys in /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size for job job_201304231321_0001 = 0. Number of splits = 0
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job job_201304231321_0001 initialized successfully with 0 map tasks and 0 reduce tasks.
> > > > > > >
> > > > > > > ------------------------------
> > > > > > > Wang Yu
> > > > > > >
> > > > > > > From: 王国栋 <wa...@gmail.com>
> > > > > > > Date: 2013-04-23 11:34
> > > > > > > To: mesos-dev <me...@incubator.apache.org>; wangyu <wangyu@nfs.iscas.ac.cn>
> > > > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > Unknown/exited TaskTracker: http://slave5:50060
> > > > > > >  Hi Yu,
> > > > > > >
> > > > > > > Mesos will just launch a tasktracker on each slave node as long as
> > > > > > > the required resources are enough for the tasktracker. So you have
> > > > > > > to run the NameNode, JobTracker and DataNode on your own.
> > > > > > >
> > > > > > > Basically, starting hadoop on mesos goes like this:
> > > > > > > 1. Start the dfs: use hadoop/bin/start-dfs.sh (you should configure
> > > > > > > core-site.xml and hdfs-site.xml). The dfs is no different from the
> > > > > > > normal one.
> > > > > > > 2. Start the jobtracker: use hadoop/bin/hadoop jobtracker (you
> > > > > > > should configure mapred-site.xml; this jobtracker should contain
> > > > > > > the patch for mesos).
> > > > > > >
> > > > > > > Then you can use the mesos web UI and the jobtracker web UI to
> > > > > > > check the status of the Jobtracker.
> > > > > > >
> > > > > > >  Guodong
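
For reference, a minimal sketch of Guodong's two steps, assuming Hadoop is
unpacked under /home/mesos/hadoop-0.20.205.0 and already carries the mesos
patch (the path is illustrative):

    cd /home/mesos/hadoop-0.20.205.0
    # 1. Start HDFS only -- not start-all.sh, which would also start
    #    standalone TaskTrackers that mesos knows nothing about.
    bin/start-dfs.sh
    # 2. Start the mesos-aware JobTracker; it launches TaskTrackers on the
    #    slaves itself once jobs need slots.
    bin/hadoop jobtracker
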
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > > > > >
> > > > > > >> Oh, yes, I start my hadoop using "start-all.sh". I know what my
> > > > > > >> problem is. Thanks very much!
> > > > > > >>
> > > > > > >> ps: Besides the TaskTracker, are there any other roles (like
> > > > > > >> JobTracker, DataNode) I should stop first?
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> Wang Yu
> > > > > > >>
> > > > > > >> From: Benjamin Mahler
> > > > > > >> Sent: 2013-04-23 10:56
> > > > > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > > > >> TaskTracker: http://slave5:50060
> > > > > > >> The scheduler we wrote for Hadoop will start its own TaskTrackers,
> > > > > > >> meaning you do not have to start any TaskTrackers yourself.
> > > > > > >>
> > > > > > >>
> > > > > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers
> > > > > > >> running in your cluster?
> > > > > > >>
> > > > > > >> Looking at your jps output, is there already a TaskTracker running?
> > > > > > >> [root@master logs]# jps
> > > > > > >> 13896 RunJar
> > > > > > >> 14123 Jps
> > > > > > >> 12718 NameNode
> > > > > > >> 12900 DataNode
> > > > > > >> 13374 TaskTracker  <--- How was this started?
> > > > > > >> 13218 JobTracker
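
For reference, a hedged sketch of shutting such a standalone TaskTracker
down with the stock 0.20.x daemon script, run on every node where jps shows
one (the script path assumes the usual Hadoop layout):

    bin/hadoop-daemon.sh stop tasktracker
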
> > > > > > >>
> > > > > > >>
> > > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > > > > >>
> > > > > > >> > Hi, Ben and Guodong,
> > > > > > >> >
> > > > > > >> > What do you mean by "managing your own TaskTrackers"? How should
> > > > > > >> > I know whether I am managing my own TaskTrackers? Sorry, I am
> > > > > > >> > not very familiar with mesos.
> > > > > > >> > Does it mean I do not need to configure hdfs-site.xml and
> > > > > > >> > core-site.xml in hadoop? I do not want to run my own TaskTracker;
> > > > > > >> > I just want to set up hadoop on mesos and run my MR tasks.
> > > > > > >> >
> > > > > > >> > Thanks very much for your patient reply... Maybe I have a long
> > > > > > >> > way to go...
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > The log messages you see:
> > > > > > >> > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> >
> > > > > > >> > are printed when mesos does not know about the TaskTracker. We
> > > > > > >> > currently don't support running your own TaskTrackers, as the
> > > > > > >> > MesosScheduler will launch them on your behalf when needed.
> > > > > > >> >
> > > > > > >> > Are you managing your own TaskTrackers? The purpose of using
> > > > > > >> > Hadoop with mesos is that you no longer have to do that. We will
> > > > > > >> > detect that jobs have pending map / reduce tasks and launch
> > > > > > >> > TaskTrackers accordingly.
> > > > > > >> >
> > > > > > >> > Guodong may be able to help further getting set up!
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Wang Yu
> > > > > > >> >
> > > > > > >> > From: 王国栋
> > > > > > >> > Date: 2013-04-18 17:10
> > > > > > >> > To: mesos-dev; wangyu
> > > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > Unknown/exited
> > > > > > >> > TaskTracker: http://slave5:50060
> > > > > > >> > You can check the slave log and the mesos-executor log, which
> > > > > > >> > is normally located in a dir like
> > > > > > >> >
> > > > > > >> > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > > > >> > That log is the tasktracker log.
> > > > > > >> >
> > > > > > >> > I hope it will help.
> > > > > > >> >
> > > > > > >> > Guodong
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > > > >> >
> > > > > > >> > > Hi All,
> > > > > > >> > >
> > > > > > >> > > I have deployed mesos on three nodes: master, slave1, slave5,
> > > > > > >> > > and it works well.
> > > > > > >> > > Then I set up hadoop over it, using master as the namenode, and
> > > > > > >> > > master, slave1, slave5 as datanodes. When I use 'jps', it looks
> > > > > > >> > > like it works well:
> > > > > > >> > >  [root@master logs]# jps
> > > > > > >> > > 13896 RunJar
> > > > > > >> > > 14123 Jps
> > > > > > >> > > 12718 NameNode
> > > > > > >> > > 12900 DataNode
> > > > > > >> > > 13374 TaskTracker
> > > > > > >> > > 13218 JobTracker
> > > > > > >> > >
> > > > > > >> > > Then I ran the test benchmark, but it could not keep working...
> > > > > > >> > > [root@master hadoop-0.20.205.0]# bin/hadoop jar hadoop-examples-0.20.205.0.jar
> > > > > > >> > > randomwriter -Dtest.randomwrite.bytes_per_map=6710886
> > > > > > >> > > -Dtest.randomwriter.maps_per_host=10 rand
> > > > > > >> > > Running 30 maps.
> > > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running job: job_201304181646_0001
> > > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0% reduce 0%
> > > > > > >> > > It stopped here.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > Then I read the log file hadoop-root-jobtracker-master.log; it shows:
> > > > > > >> > > 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > > > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > > > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > > > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:04,609 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:07,618 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:10,625 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:13,632 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave5
> > > > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_slave5:127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > > >> > > 2013-04-18 16:47:13,687 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > >> > > 2013-04-18 16:47:16,638 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:16,697 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > >> > > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:19,707 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > >> > > 2013-04-18 16:47:22,651 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:22,715 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > >> > > 2013-04-18 16:47:25,658 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:25,725 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > > > >> > > 2013-04-18 16:47:28,665 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > >
> > > > > > >> > > Can anybody help me? Thanks very much!
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
>
>

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by Benjamin Mahler <be...@gmail.com>.
Ah I see them now, looks like you uploaded the NameNode logs? Can you
upload the mesos-master and mesos-slave logs instead? What will be
interesting here is what happened on the slave that is trying to run the
TaskTracker.


> > > > > > >
> >
>
> > > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > > JobTrackerMetrics registered.
>
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver:
> > > 50030
>
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system
> > > > > > directory
> > > > > > >
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being
> > > > > > initialized in embedded mode
> > > > > > >
> >
>
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job history
> > > > > > server at: localhost:50030
> > > > > > >
>
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web
> > > > > > address: localhost:50030
> > > > > > >
>
> > > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed
> > > job
> > > > > > store is inactive
> > > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting
> > > MesosScheduler
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts
> > > > information
> > > > > > >
> >
>
> > > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from
> > > > > > > scheduler driver: Cannot parse '@0.0.0.0:0'
>
> > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the includes
> > > > file
> > > > > to
>
> > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the excludes
> > > > file
> > > > > to
> > > > > > >
> > > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts
> > > > > > (include/exclude) list
>
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 nodes
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: starting
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001:
> > > > > starting
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001:
> > > > > starting
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on 9001:
> > > > > starting
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on 9001:
> > > > > starting
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on 9001:
> > > > > starting
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on 9001:
> > > > > starting
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on 9001:
> > > > > starting
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on 9001:
> > > > > starting
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on 9001:
> > > > > starting
> > > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on 9001:
> > > > > starting
>
> > > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on 9001:
> > > > > starting
> > > > > > >
> > > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to load
> >
>
> > > > > > native-hadoop library for your platform... using builtin-java classes
> > > > > where
> > > > > > applicable
> > > > > > >
> >
>
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: job_201304231321_0001:
> > > > > > nMaps=0 nReduces=0 max=-1
> > > > > > >
> > > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job
> > > > > > job_201304231321_0001
> > > > > > >
> >
>
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job job_201304231321_0001
> > > > > > added successfully for user 'root' to queue 'default'
> > > > > > >
> > > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root
> > > >  IP=192.168.0.2
> > > > > >  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001
> > >  RESULT=SUCCESS
> > > > > > >
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing
> > > > > > job_201304231321_0001
> > > > > > >
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing
> > > > > > job_201304231321_0001
> > > > > > >
> >
>
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken generated and
> > > > > > stored with users keys in
> > > > > > /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > > >
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size for job
> > > > > > job_201304231321_0001 = 0. Number of splits = 0
> > > > > > >
> > > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job
> > > > job_201304231321_0001
> > > > > > initialized successfully with 0 map tasks and 0 reduce tasks.
> > > > > > >
> > > > > > > ------------------------------
> > > > > > > Wang Yu
> > > > > > >
> > > > > > >  *From:* 王国栋 <wa...@gmail.com>
> > > > > > > *Date:* 2013-04-23 11:34
> > > > > > > *To:* mesos-dev <me...@incubator.apache.org>; wangyu<
> > > > > > wangyu@nfs.iscas.ac.cn>
> > > > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > > Unknown/exited TaskTracker: http://slave5:50060
> > > > > > >  Hi Yu,
> > > > > > >
> > > > > > > Mesos will just launch a tasktracker on each slave node as long
> > > > > > > as there are enough resources for the tasktracker. So you have to
> > > > > > > run the NameNode, JobTracker and DataNode on your own.
> > > > > > >
> > > > > > > Basically, starting hadoop on mesos goes like this:
> > > > > > > 1. Start the dfs with hadoop/bin/start-dfs.sh (you should
> > > > > > > configure core-site.xml and hdfs-site.xml). The dfs is no
> > > > > > > different from the normal one.
> > > > > > > 2. Start the jobtracker with hadoop/bin/hadoop jobtracker (you
> > > > > > > should configure mapred-site.xml; this jobtracker should contain
> > > > > > > the patch for mesos). A command sketch follows below.
> > > > > > >
> > > > > > > Then, you can use the mesos web UI and the jobtracker web UI to
> > > > > > > check the status of the Jobtracker.
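> > > > > > > In command form, those two steps look roughly like this (a
> > > > > > > sketch; run from the patched hadoop build directory used in this
> > > > > > > thread):
> > > > > > >
> > > > > > >   # 1. start HDFS (NameNode + DataNodes) as usual
> > > > > > >   bin/start-dfs.sh
> > > > > > >   # 2. start the mesos-aware JobTracker in the foreground
> > > > > > >   bin/hadoop jobtracker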
> > > > > > >
> > > > > > >  Guodong
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <wa...@nfs.iscas.ac.cn>
> > > wrote:
> > > > > > >
> >
>
> > > > > > >> Oh, yes, I start my hadoop using "start-all.sh". I know what my
> > > > > > >> problem is. Thanks very much!
> > > > > > >>
> > > > > > >> PS: Besides the TaskTracker, are there any other roles (like
> > > > > > >> JobTracker, DataNode) I should stop first?
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> Wang Yu
> > > > > > >>
> > > > > > >> From: Benjamin Mahler
> > > > > > >> Date: 2013-04-23 10:56
> > > > > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > Unknown/exited
> > > > > > >> TaskTracker: http://slave5:50060
> > > > > > >> The scheduler we wrote for Hadoop will start its own
> > > > > > >> TaskTrackers, meaning you do not have to start any TaskTrackers
> > > > > > >> yourself.
> > > > > > >>
> >
>
> > > > > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers
> > > > > > running
> > > > > > >> in your cluster?
> > > > > > >>
> > > > > > >> Looking at your jps output, is there already a TaskTracker
> > > running?
> > > > > > >> [root@master logs]# jps
> > > > > > >> 13896 RunJar
> > > > > > >> 14123 Jps
> > > > > > >> 12718 NameNode
> > > > > > >> 12900 DataNode
> > > > > > >> 13374 TaskTracker  <--- How was this started?
> > > > > > >> 13218 JobTracker
> > > > > > >>
> > > > > > >>
> > > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
> > > wrote:
> > > > > > >>
> > > > > > >> > Hi, Ben and Guodong,
> > > > > > >> >
> >
>
> > > > > > >> > What do you mean by "managing your own TaskTrackers"? How
> > > > > > >> > should I know whether I am managing my own TaskTrackers?
> > > > > > >> > Sorry, I am not very familiar with mesos.
> > > > > > >> > Does it mean I do not need to configure hdfs-site.xml and
> > > > > > >> > core-site.xml in hadoop? I do not want to run my own
> > > > > > >> > TaskTracker, I just want to set up hadoop on mesos and run my
> > > > > > >> > MR tasks.
> > > > > > >> >
> > > > > > >> > Thanks very much for your patient reply... Maybe I have a long
> > > > > > >> > way to go...
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > The log messages you see:
> > > > > > >> > 2013-04-18 16:47:19,645 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> >
>
> > > > > > >> > are printed when mesos does not know about the TaskTracker. We
> > > > > > >> > currently don't support running your own TaskTrackers, as the
> > > > > > >> > MesosScheduler will launch them on your behalf when needed.
> > > > > > >> >
> > > > > > >> > Are you managing your own TaskTrackers? The purpose of using
> > > > Hadoop
> > > > > > with
> >
>
> > > > > > >> > mesos is that you no longer have to do that. We will detect that
> > > > > jobs
> > > > > > >> have
> >
>
> > > > > > >> > pending map / reduce tasks and launch TaskTrackers accordingly.
> > > > > > >> >
> > > > > > >> > Guodong may be able to help further getting set up!
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > Wang Yu
> > > > > > >> >
> > > > > > >> > From: 王国栋
> > > > > > >> > Date: 2013-04-18 17:10
> > > > > > >> > To: mesos-dev; wangyu
> > > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > Unknown/exited
> > > > > > >> > TaskTracker: http://slave5:50060
> >
>
> > > > > > >> > You can check the slave log and the mesos-executor log, which
> > > > > > >> > is normally located in a dir like
> > > > > > >> >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> >
>
> > > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > > > >> > That log is the tasktracker log.
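> > > > > > >> > For instance, something like this shows it (a sketch; the
> > > > > > >> > slave, framework and executor IDs will differ on your
> > > > > > >> > cluster):
> > > > > > >> >
> > > > > > >> >   cat /tmp/mesos/slaves/*/frameworks/*/executors/executor_Task_Tracker_*/runs/latest/stderr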
> > > > > > >> >
> > > > > > >> > I hope it will help.
> > > > > > >> >
> > > > > > >> > Guodong
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <wangyu@nfs.iscas.ac.cn
> >
> > > > wrote:
> > > > > > >> >
> > > > > > >> > > **
> > > > > > >> > > Hi All,
> > > > > > >> > >
>
> > > > > > >> > > I have deployed mesos on three nodes: master, slave1, and
> > > > > > >> > > slave5, and it works well.
> > > > > > >> > > Then I set up hadoop on top of it, using master as the
> > > > > > >> > > namenode, and master, slave1, and slave5 as datanodes. When
> > > > > > >> > > I run 'jps', it looks like it works well.
> > > > > > >> > >  [root@master logs]# jps
> > > > > > >> > > 13896 RunJar
> > > > > > >> > > 14123 Jps
> > > > > > >> > > 12718 NameNode
> > > > > > >> > > 12900 DataNode
> > > > > > >> > > 13374 TaskTracker
> > > > > > >> > > 13218 JobTracker
> > > > > > >> > >
> > > > > > >> > > Then I run the test benchmark, but it stops making progress...
> > > > > > >> > >  [root@master
> > > > > > >> > >  hadoop-0.20.205.0]# bin/hadoop jar
> > > > hadoop-examples-0.20.205.0.jar
> > > > > > >> > randomwriter -Dtest.randomwrite.bytes_per_map=6710886
> > > > > > >> > -Dtest.randomwriter.maps_per_host=10 rand
> > > > > > >> > > Running 30 maps.
> > > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running job:
> > > > > > >> > job_201304181646_0001
> > > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0% reduce 0%
> > > > > > >> > > It stopped here.
> > > > > > >> > >
>
> > > > > > >> > > Then I read the log file hadoop-root-jobtracker-master.log;
> > > > > > >> > > it shows:
> > > > > > >> > > 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > > > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > > > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > > > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:04,609 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://master:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:07,618 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://master:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:10,625 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://master:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:13,632 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://master:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:13,686 INFO
> > > > > org.apache.hadoop.net.NetworkTopology:
> > > > > > >> > Adding a new node: /default-rack/slave5
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:13,686 INFO
> > > > org.apache.hadoop.mapred.JobTracker:
> > > > > > >> Adding
> > > > > > >> > tracker tracker_slave5:
> > > > > > >> > > 127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:13,687 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://slave5:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:16,638 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://master:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:16,697 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://slave5:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:19,645 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://master:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:19,707 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://slave5:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:22,651 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://master:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:22,715 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://slave5:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:25,658 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://master:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:25,725 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://slave5:50060.
> > > > > > >> > >
> > > > > > >> > > 2013-04-18 16:47:28,665 INFO
> > > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > > >> > Unknown/exited TaskTracker:
> > > > > > >> > > http://master:50060.
> > > > > > >> > >
> > > > > > >> > > Can anybody help me? Thanks very much!
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
>
>

Reply: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
I uploaded them in the previous email; I will send them again. PS: Will the mailing list reject the attachments?

Can you see them?




Wang Yu

From: Benjamin Mahler
Date: 2013-05-09 10:00
To: mesos-dev@incubator.apache.org; wangyu
Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060
Did you forget to attach them?


On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> **
> OK.
> Logs are attached. I use Ctrl+C to stop jobtracker when the task_lost
> happened.
>
> Thanks very much for your help!
>
> ------------------------------
> Wang Yu
>
>  *From:* Benjamin Mahler <be...@gmail.com>
> *Date:* 2013-05-09 01:23
> *To:* mesos-dev@incubator.apache.org
> *Cc:* wangyu <wa...@nfs.iscas.ac.cn>
> *Subject:* Re: Reply: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
>
> Hey Brenden, are there any bugs in particular here that you're referring to?
>
> Wang, can you provide the logs for the JobTracker, the slave, and the
> master?
>
>
> On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <
> brenden.matthews@airbedandbreakfast.com> wrote:
>
> > You may want to try Airbnb's dist of Mesos:
> >
> > https://github.com/airbnb/mesos/tree/testing
> >
> > A good number of these Mesos bugs have been fixed but aren't yet merged
> > into upstream.
> >
> >
> > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> > > The log on each slave of the lost task is : No executor found with ID:
> > > executor_Task_Tracker_XXX.
> > >
> > >
> > >
> > >
> > > Wang Yu
> > >
> > > From: 王瑜
> > > Date: 2013-05-07 11:13
> > > To: mesos-dev
> > > Subject: Reply: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > > Hi all,
> > >
> > > I have tried adding the file extension when uploading the executor, as
> > > well as in the conf file, but it still does not work.
> > >
> > > And I have seen
> > >
>
> > /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > > but it is an empty directory.
> > >
>
> > > Are there any other logs I can read to find out why the TASK_LOST
> > > happened? I really need your help, thanks very much!
> > >
> > >
> > >
> > >
> > > Wang Yu
> > >
> > > From: Vinod Kone
> > > Date: 2013-04-26 01:31
> > > To: mesos-dev@incubator.apache.org
> > > Cc: wangyu
> > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > > Also, you could look at the executor logs (default:
> > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why the
> > >  TASK_LOST happened.
> > >
> > >
> > >
> > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> > > benjamin.mahler@gmail.com> wrote:
> > >
> > > Can you maintain the file extension? That is how mesos knows to extract
> > it:
> > > hadoop fs -copyFromLocal
> > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > /user/mesos/mesos-executor.tar.gz
> > >
> > > Also make sure your mapred-site.xml has the extension as well.
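> > > For example, a minimal sketch of that property (assuming the NameNode
> > > runs on host "master", as elsewhere in this thread, and the archive was
> > > uploaded as /user/mesos/mesos-executor.tar.gz):
> > >
> > >   <property>
> > >     <name>mapred.mesos.executor</name>
> > >     <value>hdfs://master/user/mesos/mesos-executor.tar.gz</value>
> > >   </property>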
> > >
> > >
> > >
> > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > >
> > > > Hi, Ben,
> > > >
> > > > I have tried as you said, but it still does not work.
> > > > I uploaded the mesos-executor using: hadoop fs -copyFromLocal
> > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > /user/mesos/mesos-executor
> > > > Did I do the right thing? Thanks very much!
> > > >
> > > > The log in jobtracker is:
> > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > > > Task_Tracker_82 on http://slave1:31000
>
> > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > slots needed.
> > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update of
> > > > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker Status
> > > >       Pending Map Tasks: 2
> > > >    Pending Reduce Tasks: 1
> > > >          Idle Map Slots: 0
> > > >       Idle Reduce Slots: 0
> > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > >        Needed Map Slots: 2
> > > >     Needed Reduce Slots: 1
> > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > > > Task_Tracker_83 on http://slave1:31000
>
> > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > slots needed.
> > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update of
> > > > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker Status
> > > >       Pending Map Tasks: 2
> > > >    Pending Reduce Tasks: 1
> > > >          Idle Map Slots: 0
> > > >       Idle Reduce Slots: 0
> > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > >        Needed Map Slots: 2
> > > >     Needed Reduce Slots: 1
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Wang Yu
> > > >
> > > > From: Benjamin Mahler
> > > > Date: 2013-04-24 07:49
> > > > To: mesos-dev@incubator.apache.org; wangyu
> > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > TaskTracker: http://slave5:50060
>
> > > > You need to instead upload the hadoop.tar.gz generated by the tutorial.
>
> > > > Then point the conf file to the hdfs directory (you had the right idea,
> > > > just uploaded the wrong file). :)
> > > >
> > > > Can you try that and report back?
> > > >
> > > >
> > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > >
> > > > > Guodong,
> > > > >
>
> > > > > I still have problems; I think there is some problem with my
> > > > > executor setting.
> > > > >
> > > > > In mapred-site.xml, I set ("master" is the mesos-master hostname):
> > > > >   <property>
> > > > >     <name>mapred.mesos.executor</name>
> > > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > > > >   </property>
> > > > >
> > > > > And I upload mesos-executor in /user/mesos/mesos-executor
> > > > >
> > > > > The head content is as follows:
> > > > >
> > > > > #! /bin/sh
> > > > >
>
> > > > > # mesos-executor - temporary wrapper script for .libs/mesos-executor
> > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > > > #
> > > > > # The mesos-executor program cannot be directly executed until all
> > the
> > > > > libtool
> > > > > # libraries that it depends on are installed.
> > > > > #
> > > > > # This wrapper script should never be moved out of the build
> > directory.
> > > > > # If it is, it will not operate correctly.
> > > > >
> > > > > # Sed substitution that helps us do robust quoting.  It
> > backslashifies
>
> > > > > # metacharacters that are still active within double-quoted strings.
> > > > > Xsed='/bin/sed -e 1s/^X//'
> > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > > > >
> > > > > # Be Bourne compatible
>
> > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
> > > > >   emulate sh
> > > > >   NULLCMD=:
> > > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
> > > > >   # is contrary to our usage.  Disable this feature.
> > > > >   alias -g '${1+"$@"}'='"$@"'
> > > > >   setopt NO_GLOB_SUBST
> > > > > else
> > > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac
> > > > > fi
> > > > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > > > DUALCASE=1; export DUALCASE # for MKS sh
> > > > >
>
> > > > > # The HP-UX ksh and POSIX shell print the target directory to stdout
> > > > > # if CDPATH is set.
> > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > > > >
> > > > > relink_command="(cd /home/mesos/build/src; { test -z
> > > > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || { LIBRARY_PATH=;
> > > > > export LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" || unset
> > > > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; }; }; { test -z
> > > > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || { GCC_EXEC_PREFIX=;
> > > > > export GCC_EXEC_PREFIX; }; }; { test -z \"\${LD_RUN_PATH+set}\" || unset
> > > > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
> > > > > LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/;
> > > > > export LD_LIBRARY_PATH;
> > > > > PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin;
> > > > > export PATH; g++ -g -g2 -O2 -o \$progdir/\$file
> > > > > launcher/mesos_executor-executor.o  ./.libs/libmesos.so
> > > > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread -lcurl -lssl
> > > > > -lcrypto -lz -lrt -pthread -Wl,-rpath -Wl,/home/mesos/build/src/.libs
> > > > > -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > > > ...
> > > > >
> > > > >
> > > > > Did I upload the right file, and set it up correctly in the conf
> > > > > file? Thanks very much!
> > > > >
> > > > >
> > > > >
> > > > > Wang Yu
> > > > >
> > > > > From: 王国栋
> > > > > Date: 2013-04-23 13:32
> > > > > To: wangyu
> > > > > CC: mesos-dev
> > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > Unknown/exited
> > > > > TaskTracker: http://slave5:50060
> > > > > Hmm, it seems that mapred.mesos.master is set correctly.
> > > > >
> > > > > If you run hadoop in local mode, the following setting is OK:
> > > > >   <property>
> > > > >     <name>mapred.mesos.master</name>
> > > > >     <value>local</value>
> > > > >   </property>
> > > > >
> > > > > If you want to start the cluster, set mapred.mesos.master to
> > > > > mesos-master-hostname:mesos-master-port.
> > > > >
> > > > > Make sure the DNS lookup for mesos-master-hostname resolves to the
> > > > > right IP.
> > > > >
>
> > > > > BTW: when you start the jobtracker, you can check the Mesos web UI
> > > > > to see if the hadoop framework is registered.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Guodong
> > > > >
> > > > >
> > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <wangyu@nfs.iscas.ac.cn
> > wrote:
> > > > >
> > > > > > **
> > > > > > Hi, Guodong,
> > > > > >
> > > > > > I start hadoop as you said, then I saw this error:
>
> > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from
> > > > > > scheduler driver: Cannot parse '@0.0.0.0:0'
> > > > > >
>
> > > > > > What does this mean? Where should I change the MesosScheduler code
> > > > > > to fix this? Thanks very much! I am so sorry for interrupting you
> > > > > > once again...
> > > > > >
> > > > > > The whole log is as follows:
> > > > > >
> > > > > >  [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > > > /************************************************************
> > > > > > STARTUP_MSG: Starting JobTracker
> > > > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > > > STARTUP_MSG:   args = []
> > > > > > STARTUP_MSG:   version = 0.20.205.0
> > > > > >
> > > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat Apr 13
> > > 11:19:33
> > > > > CST 2013
> > > > > > ************************************************************/
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded properties from
> > > > > hadoop-metrics2.properties
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > MetricsSystem,sub=Stats registered.
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled snapshot
> > > > period
> > > > > at 10 second(s).
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker metrics
> > > > system
> > > > > started
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > QueueMetrics,q=default registered.
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > ugi
> > > > > registered.
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO
> > > delegation.AbstractDelegationTokenSecretManager:
> > > > > Updating the current master key for generating delegation tokens
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO
> > > delegation.AbstractDelegationTokenSecretManager:
> > > > > Starting expired delegation token remover thread,
> > > > > tokenRemoverScanInterval=60 min(s)
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler configured with
> > > > > (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> > > limitMaxMemForMapTasks,
> > > > > limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO
> > > delegation.AbstractDelegationTokenSecretManager:
> > > > > Updating the current master key for generating delegation tokens
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing hosts
> > > > > (include/exclude) list
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting jobtracker with
> > > > owner
> > > > > as root
> > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > RpcDetailedActivityForPort9001 registered.
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > RpcActivityForPort9001 registered.
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to
> > > > > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> > > > > org.mortbay.log.Slf4jLog
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global filtersafety
> > > > > (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by
> > > > > webServer.getConnectors()[0].getLocalPort() before open() is -1.
> > > Opening
> > > > > the listener on 50030
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO http.HttpServer: listener.getLocalPort()
> > > > returned
> > > > > 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
> > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to port 50030
> > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started
> > > > > > SelectChannelConnector@0.0.0.0:50030
> > > > > >
>
> > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source
> > > jvm
> > > > > registered.
> > > > > >
>
> > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > JobTrackerMetrics registered.
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver:
> > 50030
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system
> > > > > directory
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being
> > > > > initialized in embedded mode
> > > > > >
>
> > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job history
> > > > > server at: localhost:50030
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web
> > > > > address: localhost:50030
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed
> > job
> > > > > store is inactive
> > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting
> > MesosScheduler
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts
> > > information
> > > > > >
>
> > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from
> > > > > > scheduler driver: Cannot parse '@0.0.0.0:0'
> > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the includes
> > > file
> > > > to
> > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the excludes
> > > file
> > > > to
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts
> > > > > (include/exclude) list
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 nodes
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on 9001:
> > > > starting
> > > > > >
> > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to load
>
> > > > > native-hadoop library for your platform... using builtin-java classes
> > > > where
> > > > > applicable
> > > > > >
>
> > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: job_201304231321_0001:
> > > > > nMaps=0 nReduces=0 max=-1
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job
> > > > > job_201304231321_0001
> > > > > >
>
> > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job job_201304231321_0001
> > > > > added successfully for user 'root' to queue 'default'
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root
> > >  IP=192.168.0.2
> > > > >  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001
> >  RESULT=SUCCESS
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing
> > > > > job_201304231321_0001
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing
> > > > > job_201304231321_0001
> > > > > >
>
> > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken generated and
> > > > > stored with users keys in
> > > > > /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size for job
> > > > > job_201304231321_0001 = 0. Number of splits = 0
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job
> > > job_201304231321_0001
> > > > > initialized successfully with 0 map tasks and 0 reduce tasks.
> > > > > >
> > > > > > ------------------------------
> > > > > > Wang Yu
> > > > > >
> > > > > >  *From:* 王国栋 <wa...@gmail.com>
> > > > > > *Date:* 2013-04-23 11:34
> > > > > > *To:* mesos-dev <me...@incubator.apache.org>; wangyu<
> > > > > wangyu@nfs.iscas.ac.cn>
> > > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > Unknown/exited TaskTracker: http://slave5:50060
> > > > > >  Hi Yu,
> > > > > >
> > > > > > Mesos will just launch a tasktracker on each slave node as long as
> > > > > > there are enough resources for the tasktracker. So you have to run
> > > > > > the NameNode, JobTracker and DataNode on your own.
> > > > > >
> > > > > > Basically, starting hadoop on mesos goes like this:
> > > > > > 1. Start the dfs with hadoop/bin/start-dfs.sh (you should configure
> > > > > > core-site.xml and hdfs-site.xml). The dfs is no different from the
> > > > > > normal one.
> > > > > > 2. Start the jobtracker with hadoop/bin/hadoop jobtracker (you
> > > > > > should configure mapred-site.xml; this jobtracker should contain
> > > > > > the patch for mesos).
> > > > > >
> > > > > > Then, you can use the mesos web UI and the jobtracker web UI to
> > > > > > check the status of the Jobtracker.
> > > > > >
> > > > > >  Guodong
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <wa...@nfs.iscas.ac.cn>
> > wrote:
> > > > > >
>
> > > > > >> Oh, yes, I start my hadoop using "start-all.sh". I know what my
> > > > > >> problem is. Thanks very much!
> > > > > >>
> > > > > >> PS: Besides the TaskTracker, are there any other roles (like
> > > > > >> JobTracker, DataNode) I should stop first?
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> Wang Yu
> > > > > >>
> > > > > >> From: Benjamin Mahler
> > > > > >> Date: 2013-04-23 10:56
> > > > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > Unknown/exited
> > > > > >> TaskTracker: http://slave5:50060
> > > > > >> The scheduler we wrote for Hadoop will start its own
> > > > > >> TaskTrackers, meaning you do not have to start any TaskTrackers
> > > > > >> yourself.
> > > > > >>
>
> > > > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers
> > > > > running
> > > > > >> in your cluster?
> > > > > >>
> > > > > >> Looking at your jps output, is there already a TaskTracker
> > running?
> > > > > >> [root@master logs]# jps
> > > > > >> 13896 RunJar
> > > > > >> 14123 Jps
> > > > > >> 12718 NameNode
> > > > > >> 12900 DataNode
> > > > > >> 13374 TaskTracker  <--- How was this started?
> > > > > >> 13218 JobTracker
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
> > wrote:
> > > > > >>
> > > > > >> > Hi, Ben and Guodong,
> > > > > >> >
>
> > > > > >> > What do you mean by "managing your own TaskTrackers"? How
> > > > > >> > should I know whether I am managing my own TaskTrackers? Sorry,
> > > > > >> > I am not very familiar with mesos.
> > > > > >> > Does it mean I do not need to configure hdfs-site.xml and
> > > > > >> > core-site.xml in hadoop? I do not want to run my own
> > > > > >> > TaskTracker, I just want to set up hadoop on mesos and run my
> > > > > >> > MR tasks.
> > > > > >> >
> > > > > >> > Thanks very much for your patient reply... Maybe I have a long
> > > > > >> > way to go...
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > The log messages you see:
> > > > > >> > 2013-04-18 16:47:19,645 INFO
> > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > > >> >
> > > > > >> > are printed when mesos does not know about the TaskTracker. We
> > > > > >> > currently don't support running your own TaskTrackers, as the
> > > > > >> > MesosScheduler will launch them on your behalf when needed.
> > > > > >> >
> > > > > >> > Are you managing your own TaskTrackers? The purpose of using
> > > Hadoop
> > > > > with
>
> > > > > >> > mesos is that you no longer have to do that. We will detect that
> > > > jobs
> > > > > >> have
>
> > > > > >> > pending map / reduce tasks and launch TaskTrackers accordingly.
> > > > > >> >
> > > > > >> > Guodong may be able to help further getting set up!
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > Wang Yu
> > > > > >> >
> > > > > >> > From: 王国栋
> > > > > >> > Date: 2013-04-18 17:10
> > > > > >> > To: mesos-dev; wangyu
> > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > Unknown/exited
> > > > > >> > TaskTracker: http://slave5:50060
>
> > > > > >> > You can check the slave log and the mesos-executor log, which
> > > > > >> > is normally located in a dir like
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
>
> > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > > >> > That log is the tasktracker log.
> > > > > >> >
> > > > > >> > I hope it will help.
> > > > > >> >
> > > > > >> > Guodong
> > > > > >> >
> > > > > >> >
> > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
> > > wrote:
> > > > > >> >
> > > > > >> > > **
> > > > > >> > > Hi All,
> > > > > >> > >
> > > > > >> > > I have deployed mesos on three nodes: master, slave1, and
> > > > > >> > > slave5, and it works well.
> > > > > >> > > Then I set up hadoop on top of it, using master as the
> > > > > >> > > namenode, and master, slave1, and slave5 as datanodes. When I
> > > > > >> > > run 'jps', it looks like it works well.
> > > > > >> > >  [root@master logs]# jps
> > > > > >> > > 13896 RunJar
> > > > > >> > > 14123 Jps
> > > > > >> > > 12718 NameNode
> > > > > >> > > 12900 DataNode
> > > > > >> > > 13374 TaskTracker
> > > > > >> > > 13218 JobTracker
> > > > > >> > >
> > > > > >> > > Then I run the test benchmark, but it stops making progress...
> > > > > >> > >  [root@master
> > > > > >> > >  hadoop-0.20.205.0]# bin/hadoop jar
> > > hadoop-examples-0.20.205.0.jar
> > > > > >> > randomwriter -Dtest.randomwrite.bytes_per_map=6710886
> > > > > >> > -Dtest.randomwriter.maps_per_host=10 rand
> > > > > >> > > Running 30 maps.
> > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running job:
> > > > > >> > job_201304181646_0001
> > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0% reduce 0%
> > > > > >> > > It stopped here.
> > > > > >> > >
> > > > > >> > > Then I read the log file hadoop-root-jobtracker-master.log;
> > > > > >> > > it shows:
> > > > > >> > > 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:04,609 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:07,618 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:10,625 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:13,632 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:13,686 INFO
> > > > org.apache.hadoop.net.NetworkTopology:
> > > > > >> > Adding a new node: /default-rack/slave5
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:13,686 INFO
> > > org.apache.hadoop.mapred.JobTracker:
> > > > > >> Adding
> > > > > >> > tracker tracker_slave5:
> > > > > >> > > 127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:13,687 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://slave5:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:16,638 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:16,697 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://slave5:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:19,645 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:19,707 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://slave5:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:22,651 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:22,715 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://slave5:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:25,658 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:25,725 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://slave5:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:28,665 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > Can anybody help me? Thanks very much!
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by Benjamin Mahler <be...@gmail.com>.
Did you forget to attach them?


On Wed, May 8, 2013 at 6:48 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:

> **
> OK.
> Logs are attached. I use Ctrl+C to stop jobtracker when the task_lost
> happened.
>
> Thanks very much for your help!
>
> ------------------------------
> Wang Yu
>
>  *From:* Benjamin Mahler <be...@gmail.com>
> *Date:* 2013-05-09 01:23
> *To:* mesos-dev@incubator.apache.org
> *Cc:* wangyu <wa...@nfs.iscas.ac.cn>
> *Subject:* Re: Reply: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> TaskTracker: http://slave5:50060
>
> Hey Brenden, are there any bugs in particular here that you're referring to?
>
> Wang, can you provide the logs for the JobTracker, the slave, and the
> master?
>
>
> On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <
> brenden.matthews@airbedandbreakfast.com> wrote:
>
> > You may want to try Airbnb's dist of Mesos:
> >
> > https://github.com/airbnb/mesos/tree/testing
> >
> > A good number of these Mesos bugs have been fixed but aren't yet merged
> > into upstream.
> >
> >
> > On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> > > The log on each slave of the lost task is : No executor found with ID:
> > > executor_Task_Tracker_XXX.
> > >
> > >
> > >
> > >
> > > Wang Yu
> > >
> > > From: 王瑜
> > > Date: 2013-05-07 11:13
> > > To: mesos-dev
> > > Subject: Reply: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > > Hi all,
> > >
> > > I have tried adding the file extension when uploading the executor, as
> > > well as in the conf file, but it still does not work.
> > >
> > > And I have seen
> > >
>
> > /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > > but it is an empty directory.
> > >
>
> > > Are there any other logs I can read to find out why the TASK_LOST
> > > happened? I really need your help, thanks very much!
> > >
> > >
> > >
> > >
> > > Wang Yu
> > >
> > > From: Vinod Kone
> > > Date: 2013-04-26 01:31
> > > To: mesos-dev@incubator.apache.org
> > > Cc: wangyu
> > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > > Also, you could look at the executor logs (default:
> > > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why the
> > >  TASK_LOST happened.
> > >
> > >
> > >
> > > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> > > benjamin.mahler@gmail.com> wrote:
> > >
> > > Can you maintain the file extension? That is how mesos knows to extract
> > it:
> > > hadoop fs -copyFromLocal
> > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > /user/mesos/mesos-executor.tar.gz
> > >
> > > Also make sure your mapred-site.xml has the extension as well.
> > >
> > >
> > >
> > > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > >
> > > > Hi, Ben,
> > > >
> > > > I have tried as you said, but it still does not work.
> > > > I uploaded the mesos-executor using: hadoop fs -copyFromLocal
> > > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > > /user/mesos/mesos-executor
> > > > Did I do the right thing? Thanks very much!
> > > >
> > > > The log in jobtracker is:
> > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > > > Task_Tracker_82 on http://slave1:31000
>
> > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > slots needed.
> > > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update of
> > > > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker Status
> > > >       Pending Map Tasks: 2
> > > >    Pending Reduce Tasks: 1
> > > >          Idle Map Slots: 0
> > > >       Idle Reduce Slots: 0
> > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > >        Needed Map Slots: 2
> > > >     Needed Reduce Slots: 1
> > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > > > Task_Tracker_83 on http://slave1:31000
>
> > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > > slots needed.
> > > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update of
> > > > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker Status
> > > >       Pending Map Tasks: 2
> > > >    Pending Reduce Tasks: 1
> > > >          Idle Map Slots: 0
> > > >       Idle Reduce Slots: 0
> > > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > > >        Needed Map Slots: 2
> > > >     Needed Reduce Slots: 1
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Wang Yu
> > > >
> > > > From: Benjamin Mahler
> > > > Date: 2013-04-24 07:49
> > > > To: mesos-dev@incubator.apache.org; wangyu
> > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > TaskTracker: http://slave5:50060
>
> > > > You need to instead upload the hadoop.tar.gz generated by the tutorial.
>
> > > > Then point the conf file to the hdfs directory (you had the right idea,
> > > > just uploaded the wrong file). :)
> > > >
> > > > Can you try that and report back?
> > > >
> > > >
> > > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > >
> > > > > Guodong,
> > > > >
> > > > > There are still problems; I think there is a problem with my
> > > > > executor setting.
> > > > >
> > > > > In mapred-site.xml, I set the following ("master" is the
> > > > > mesos-master hostname):
> > > > >   <property>
> > > > >     <name>mapred.mesos.executor</name>
> > > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > > > >   </property>
> > > > >
> > > > > And I uploaded the mesos-executor to /user/mesos/mesos-executor
> > > > >
> > > > > The beginning of the file is as follows:
> > > > >
> > > > > #! /bin/sh
> > > > >
>
> > > > > # mesos-executor - temporary wrapper script for .libs/mesos-executor
> > > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > > > #
> > > > > # The mesos-executor program cannot be directly executed until all
> > the
> > > > > libtool
> > > > > # libraries that it depends on are installed.
> > > > > #
> > > > > # This wrapper script should never be moved out of the build
> > directory.
> > > > > # If it is, it will not operate correctly.
> > > > >
> > > > > # Sed substitution that helps us do robust quoting.  It
> > backslashifies
>
> > > > > # metacharacters that are still active within double-quoted strings.
> > > > > Xsed='/bin/sed -e 1s/^X//'
> > > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > > > >
> > > > > # Be Bourne compatible
>
> > > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
> > > > >   emulate sh
> > > > >   NULLCMD=:
> > > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
> > > > >   # is contrary to our usage.  Disable this feature.
> > > > >   alias -g '${1+"$@"}'='"$@"'
> > > > >   setopt NO_GLOB_SUBST
> > > > > else
> > > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac
> > > > > fi
> > > > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > > > DUALCASE=1; export DUALCASE # for MKS sh
> > > > >
>
> > > > > # The HP-UX ksh and POSIX shell print the target directory to stdout
> > > > > # if CDPATH is set.
> > > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > > > >
> > > > > relink_command="(cd /home/mesos/build/src; { test -z
> > > > > \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || { LIBRARY_PATH=;
> > > export
> > > > > LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" || unset
>
> > > > > COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; }; }; { test
> > > -z
> > > > > \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || {
> > > > GCC_EXEC_PREFIX=;
> > > > > export GCC_EXEC_PREFIX; }; }; { test -z \"\${LD_RUN_PATH+set}\" ||
> > > unset
> > > > > LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; };
> > > > >
> > > >
> > >
>
> > LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/;
> > > > > export LD_LIBRARY_PATH;
> > > > >
> > > >
> > >
>
> > PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin;
> > > > > export PATH; g++ -g -g2 -O2 -o \$progdir/\$file
> > > > > launcher/mesos_executor-executor.o  ./.libs/libmesos.so
>
> > > > > -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread -lcurl -lssl
>
> > > > > -lcrypto -lz -lrt -pthread -Wl,-rpath -Wl,/home/mesos/build/src/.libs
> > > > > -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > > > ...
> > > > >
> > > > >
> > > > > Did I upload the right file, and did I set it up correctly in the
> > > > > conf file? Thanks very much!
> > > > >
> > > > >
> > > > >
> > > > > Wang Yu
> > > > >
> > > > > From: 王国栋
> > > > > Date: 2013-04-23 13:32
> > > > > To: wangyu
> > > > > CC: mesos-dev
> > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > Unknown/exited
> > > > > TaskTracker: http://slave5:50060
> > > > > Hmm, it seems that mapred.mesos.master is set correctly.
> > > > >
> > > > > If you run hadoop in local mode, the following setting is OK:
> > > > >   <property>
> > > > >     <name>mapred.mesos.master</name>
> > > > >     <value>local</value>
> > > > >   </property>
> > > > >
> > > > > If you want to start the cluster, set mapred.mesos.master to
> > > > > mesos-master-hostname:mesos-master-port.
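> > > > > For example (a sketch; "master" is a placeholder hostname, and 5050
> > > > > is the default mesos master port):
> > > > >   <property>
> > > > >     <name>mapred.mesos.master</name>
> > > > >     <value>master:5050</value>
> > > > >   </property>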
> > > > >
> > > > > Make sure the DNS lookup for mesos-master-hostname resolves to the
> > > > > right IP.
> > > > >
> > > > > BTW: when you start the jobtracker, you can check the mesos web UI
> > > > > to see whether the hadoop framework is registered.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Guodong
> > > > >
> > > > >
> > > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <wangyu@nfs.iscas.ac.cn> wrote:
> > > > >
> > > > > > **
> > > > > > Hi, Guodong,
> > > > > >
> > > > > > I started hadoop as you said, and then I saw this error:
>
> > > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from scheduler
> > > > > driver: Cannot parse
> > > > > > '@0.0.0.0:0'
> > > > > >
> > > > > > What does this mean? Where should I change the MesosScheduler code
> > > > > > to fix this? Thanks very much! I am so sorry to interrupt you once
> > > > > > again...
> > > > > >
> > > > > > The whole log is as follows:
> > > > > >
> > > > > >  [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > > > /************************************************************
> > > > > > STARTUP_MSG: Starting JobTracker
> > > > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > > > STARTUP_MSG:   args = []
> > > > > > STARTUP_MSG:   version = 0.20.205.0
> > > > > >
> > > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat Apr 13
> > > 11:19:33
> > > > > CST 2013
> > > > > > ************************************************************/
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded properties from
> > > > > hadoop-metrics2.properties
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > MetricsSystem,sub=Stats registered.
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled snapshot
> > > > period
> > > > > at 10 second(s).
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker metrics
> > > > system
> > > > > started
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > QueueMetrics,q=default registered.
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > ugi
> > > > > registered.
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO
> > > delegation.AbstractDelegationTokenSecretManager:
> > > > > Updating the current master key for generating delegation tokens
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO
> > > delegation.AbstractDelegationTokenSecretManager:
> > > > > Starting expired delegation token remover thread,
> > > > > tokenRemoverScanInterval=60 min(s)
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler configured with
> > > > > (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> > > limitMaxMemForMapTasks,
> > > > > limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO
> > > delegation.AbstractDelegationTokenSecretManager:
> > > > > Updating the current master key for generating delegation tokens
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing hosts
> > > > > (include/exclude) list
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting jobtracker with
> > > > owner
> > > > > as root
> > > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > RpcDetailedActivityForPort9001 registered.
> > > > > >
>
> > > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > RpcActivityForPort9001 registered.
> > > > > >
> > > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to
> > > > > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> > > > > org.mortbay.log.Slf4jLog
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global filtersafety
> > > > > (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by
> > > > > webServer.getConnectors()[0].getLocalPort() before open() is -1.
> > > Opening
> > > > > the listener on 50030
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO http.HttpServer: listener.getLocalPort()
> > > > returned
> > > > > 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
> > > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to port 50030
> > > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > > > 13/04/23 13:21:05 INFO mortbay.log: Started
> > > > > > SelectChannelConnector@0.0.0.0:50030
> > > > > >
>
> > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source
> > > jvm
> > > > > registered.
> > > > > >
>
> > > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source
> > > > > JobTrackerMetrics registered.
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver:
> > 50030
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system
> > > > > directory
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being
> > > > > initialized in embedded mode
> > > > > >
>
> > > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job history
> > > > > server at: localhost:50030
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web
> > > > > address: localhost:50030
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed
> > job
> > > > > store is inactive
> > > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting
> > MesosScheduler
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts
> > > information
> > > > > >
>
> > > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from scheduler
> > > > > driver: Cannot parse '@
> > > > > > 0.0.0.0:0'
> > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the includes
> > > file
> > > > to
> > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the excludes
> > > file
> > > > to
> > > > > >
> > > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts
> > > > > (include/exclude) list
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 nodes
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on 9001:
> > > > starting
> > > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on 9001:
> > > > starting
> > > > > >
> > > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to load
>
> > > > > native-hadoop library for your platform... using builtin-java classes
> > > > where
> > > > > applicable
> > > > > >
>
> > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: job_201304231321_0001:
> > > > > nMaps=0 nReduces=0 max=-1
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job
> > > > > job_201304231321_0001
> > > > > >
>
> > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job job_201304231321_0001
> > > > > added successfully for user 'root' to queue 'default'
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root
> > >  IP=192.168.0.2
> > > > >  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001
> >  RESULT=SUCCESS
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing
> > > > > job_201304231321_0001
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing
> > > > > job_201304231321_0001
> > > > > >
>
> > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken generated and
> > > > > stored with users keys in
> > > > > /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size for job
> > > > > job_201304231321_0001 = 0. Number of splits = 0
> > > > > >
> > > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job
> > > job_201304231321_0001
> > > > > initialized successfully with 0 map tasks and 0 reduce tasks.
> > > > > >
> > > > > > ------------------------------
> > > > > > Wang Yu
> > > > > >
> > > > > >  *From:* 王国栋 <wa...@gmail.com>
> > > > > > *Date:* 2013-04-23 11:34
> > > > > > *To:* mesos-dev <me...@incubator.apache.org>; wangyu<
> > > > > wangyu@nfs.iscas.ac.cn>
> > > > > > *Subject:* Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > > Unknown/exited TaskTracker: http://slave5:50060
> > > > > >  Hi Yu,
> > > > > >
> > > > > > Mesos will just launch a tasktracker on each slave node as long as
> > > > > > the available resources are enough for the tasktracker. So you have
> > > > > > to run the NameNode, Jobtracker and DataNode on your own.
> > > > > >
> > > > > > Basically, starting hadoop on mesos works like this (see the
> > > > > > sketch after this list):
> > > > > > 1. Start the dfs: use hadoop/bin/start-dfs.sh (you should
> > > > > > configure core-site.xml and hdfs-site.xml). The dfs is no
> > > > > > different from the normal one.
> > > > > > 2. Start the jobtracker: use hadoop/bin/hadoop jobtracker (you
> > > > > > should configure mapred-site.xml, and this jobtracker should
> > > > > > contain the patch for mesos).
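> > > > > >
> > > > > > A minimal sketch of those two steps, run from the hadoop directory
> > > > > > (layout per stock hadoop-0.20.205.0):
> > > > > >   bin/start-dfs.sh       # step 1: bring up the NameNode/DataNodes
> > > > > >   bin/hadoop jobtracker  # step 2: start the mesos-patched jobtracker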
> > > > > >
> > > > > > Then, you can use mesos web UI and jobtracker web UI to check the
> > > > status
> > > > > > of Jobtracker.
> > > > > >
> > > > > >  Guodong
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <wa...@nfs.iscas.ac.cn>
> > wrote:
> > > > > >
>
> > > > > >> Oh, yes, I started my hadoop using "start-all.sh". Now I know
> > > > > >> what my problem is. Thanks very much!
> > > > > >>
>
> > > > > >> PS: Besides the TaskTracker, are there any other roles (like
> > > > > >> JobTracker, DataNode) that I should stop first?
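> > > > > >>
> > > > > >> (As a sketch of one way to do that: stock 0.20-era hadoop ships
> > > > > >> bin/stop-mapred.sh, which stops the JobTracker and TaskTrackers
> > > > > >> started by start-all.sh; HDFS can then be brought back up with
> > > > > >> bin/start-dfs.sh.)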
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> Wang Yu
> > > > > >>
> > > > > >> From: Benjamin Mahler
> > > > > >> Date: 2013-04-23 10:56
> > > > > >> To: mesos-dev@incubator.apache.org; wangyu
> > > > > >> Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > Unknown/exited
> > > > > >> TaskTracker: http://slave5:50060
> > > > > >>  The scheduler we wrote for Hadoop will start its own
> > > > > >> TaskTrackers, meaning you do not have to start any TaskTrackers
> > > > > >> yourself.
> > > > > >>
>
> > > > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers
> > > > > running
> > > > > >> in your cluster?
> > > > > >>
> > > > > >> Looking at your jps output, is there already a TaskTracker
> > running?
> > > > > >> [root@master logs]# jps
> > > > > >> 13896 RunJar
> > > > > >> 14123 Jps
> > > > > >> 12718 NameNode
> > > > > >> 12900 DataNode
> > > > > >> 13374 TaskTracker  <--- How was this started?
> > > > > >> 13218 JobTracker
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
> > wrote:
> > > > > >>
> > > > > >> > Hi, Ben and Guodong,
> > > > > >> >
>
> > > > > >> > What do you mean by "managing your own TaskTrackers"? How
> > > > > >> > would I know whether I am managing my own TaskTrackers? Sorry,
> > > > > >> > I am not very familiar with mesos.
> > > > > >> > Does it mean I do not need to configure hdfs-site.xml and
> > > > > >> > core-site.xml in hadoop? I do not want to run my own
> > > > > >> > TaskTracker; I just want to set up hadoop on mesos and run my
> > > > > >> > MR tasks.
> > > > > >> >
> > > > > >> > Thanks very much for your patient reply... Maybe I have a long
> > > > > >> > way to go...
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > The log messages you see:
> > > > > >> > 2013-04-18 16:47:19,645 INFO
> > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > > >> >
> > > > > >> > Are printed when mesos does not know about the TaskTracker. We
> > > > > currently
> > > > > >> > don't support running your own TaskTrackers, as the
> > MesosScheduler
> > > > > will
> > > > > >> > launch them on your behalf when needed.
> > > > > >> >
> > > > > >> > Are you managing your own TaskTrackers? The purpose of using
> > > Hadoop
> > > > > with
>
> > > > > >> > mesos is that you no longer have to do that. We will detect that
> > > > jobs
> > > > > >> have
>
> > > > > >> > pending map / reduce tasks and launch TaskTrackers accordingly.
> > > > > >> >
> > > > > >> > Guodong may be able to help further getting set up!
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > Wang Yu
> > > > > >> >
> > > > > >> > From: 王国栋
> > > > > >> > Date: 2013-04-18 17:10
> > > > > >> > To: mesos-dev; wangyu
> > > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > Unknown/exited
> > > > > >> > TaskTracker: http://slave5:50060
>
> > > > > >> > You can check the slave log and the mesos-executor log, which
> > > > > >> > is normally located in a dir like
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
>
> > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > > >> > That log is the tasktracker log.
> > > > > >> >
> > > > > >> > I hope it will help.
> > > > > >> >
> > > > > >> > Guodong
> > > > > >> >
> > > > > >> >
> > > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
> > > wrote:
> > > > > >> >
> > > > > >> > > **
> > > > > >> > > Hi All,
> > > > > >> > >
> > > > > >> > > I have deployed mesos on three nodes: master, slave1, and
> > > > > >> > > slave5, and it works well.
> > > > > >> > >  Then I set up hadoop on top of it, using master as the
> > > > > >> > > namenode and master, slave1, and slave5 as datanodes. When I
> > > > > >> > > run 'jps', it looks like it is working well.
> > > > > >> > >  [root@master logs]# jps
> > > > > >> > > 13896 RunJar
> > > > > >> > > 14123 Jps
> > > > > >> > > 12718 NameNode
> > > > > >> > > 12900 DataNode
> > > > > >> > > 13374 TaskTracker
> > > > > >> > > 13218 JobTracker
> > > > > >> > >
> > > > > >> > > Then I ran the test benchmark, and it stopped making progress...
> > > > > >> > >  [root@master
> > > > > >> > >  hadoop-0.20.205.0]# bin/hadoop jar
> > > hadoop-examples-0.20.205.0.jar
> > > > > >> > randomwriter -Dtest.randomwrite.bytes_per_map=6710886
> > > > > >> > -Dtest.randomwriter.maps_per_host=10 rand
> > > > > >> > > Running 30 maps.
> > > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running job:
> > > > > >> > job_201304181646_0001
> > > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0% reduce 0%
> > > > > >> > > It stopped here.
> > > > > >> > >
> > > > > >> > > Then I read the log file hadoop-root-jobtracker-master.log;
> > > > > >> > > it shows:
> > > > > >> > >  2013-04-18 16
>
> > > > > >> > > :46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting
> > > > > RUNNING
> > > > > >> > > 2013-04-18 16
> > > > > >> > > :46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server
> > > handler 5
> > > > > on
> > > > > >> > 9001: starting
> > > > > >> > > 2013-04-18 16
> > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server
> > > handler 6
> > > > > on
> > > > > >> > 9001: starting
> > > > > >> > > 2013-04-18 16
> > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server
> > > handler 9
> > > > > on
> > > > > >> > 9001: starting
> > > > > >> > > 2013-04-18 16
> > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server
> > > handler 7
> > > > > on
> > > > > >> > 9001: starting
> > > > > >> > > 2013-04-18 16
> > > > > >> > > :46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server
> > > handler 8
> > > > > on
> > > > > >> > 9001: starting
> > > > > >> > > 2013-04-18 16
>
> > > > > >> > > :46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding
> > a
> > > > new
> > > > > >> > node: /default-rack/master
> > > > > >> > > 2013-04-18 16
> > > > > >> > > :46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding
> > > > tracker
> > > > > >> > tracker_master:localhost/
> > > > > >> > > 127.0.0.1:44997 to host master
> > > > > >> > > 2013-04-18 16
> > > > > >> > > :46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> Unknown/exited
> > > > > >> > TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > > 2013-04-18 16
> > > > > >> > > :46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> Unknown/exited
> > > > > >> > TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > > 2013-04-18 16
> > > > > >> > > :46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> Unknown/exited
> > > > > >> > TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > > 2013-04-18 16
> > > > > >> > > :47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> Unknown/exited
> > > > > >> > TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:04,609 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:07,618 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:10,625 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:13,632 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:13,686 INFO
> > > > org.apache.hadoop.net.NetworkTopology:
> > > > > >> > Adding a new node: /default-rack/slave5
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:13,686 INFO
> > > org.apache.hadoop.mapred.JobTracker:
> > > > > >> Adding
> > > > > >> > tracker tracker_slave5:
> > > > > >> > > 127.0.0.1/127.0.0.1:60621 to host slave5
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:13,687 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://slave5:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:16,638 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:16,697 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://slave5:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:19,645 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:19,707 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://slave5:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:22,651 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:22,715 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://slave5:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:25,658 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:25,725 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://slave5:50060.
> > > > > >> > >
> > > > > >> > > 2013-04-18 16:47:28,665 INFO
> > > > > org.apache.hadoop.mapred.MesosScheduler:
> > > > > >> > Unknown/exited TaskTracker:
> > > > > >> > > http://master:50060.
> > > > > >> > >
> > > > > >> > > Can anybody help me? Thanks very much!
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>

Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by 王瑜 <wa...@nfs.iscas.ac.cn>.
OK.
Logs are attached. I used Ctrl+C to stop the jobtracker when the TASK_LOST happened.

Thanks very much for your help!




Wang Yu


Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060

Posted by Benjamin Mahler <be...@gmail.com>.
Hey Brenden, are there any bugs in particular here that you're referring to?

Wang, can you provide the logs for the JobTracker, the slave, and the
master?
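
For reference, a sketch of where those logs usually live (assumptions:
the JobTracker log name below is the one quoted earlier in this thread,
and the mesos daemons only write log files if started with --log_dir):

cat $HADOOP_HOME/logs/hadoop-root-jobtracker-master.log   # JobTracker
cat /var/log/mesos/mesos-master.INFO    # if started with --log_dir=/var/log/mesos
cat /var/log/mesos/mesos-slave.INFO     # if started with --log_dir=/var/log/mesos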


On Tue, May 7, 2013 at 11:50 AM, Brenden Matthews <
brenden.matthews@airbedandbreakfast.com> wrote:

> You may want to try Airbnb's dist of Mesos:
>
> https://github.com/airbnb/mesos/tree/testing
>
> A good number of these Mesos bugs have been fixed but aren't yet merged
> into upstream.
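>
> A sketch of grabbing that branch (assuming a stock git install; it
> builds the same way as upstream Mesos):
>
> git clone https://github.com/airbnb/mesos.git
> cd mesos
> git checkout testing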
>
>
> On Mon, May 6, 2013 at 8:34 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
>
> > The log on each slave of the lost task is: No executor found with ID:
> > executor_Task_Tracker_XXX.
> >
> >
> >
> >
> > Wang Yu
> >
> > From: 王瑜
> > Sent: 2013-05-07 11:13
> > To: mesos-dev
> > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> > Hi all,
> >
> > I have tried adding the file extension when uploading the executor, and
> > in the conf file as well, but it still does not work.
> >
> > I have also looked at
> > /tmp/mesos/slaves/201304131144-33597632-5050-4949-0/frameworks/201304131144-33597632-5050-4949-0006/executors/executor_Task_Tracker_63/runs/latest,
> > but it is an empty directory.
> >
> > Are there any other logs I can read to find out why the TASK_LOST
> > happened? I really need your help, thanks very much!
> >
> >
> >
> >
> > Wang Yu
> >
> > From: Vinod Kone
> > Sent: 2013-04-26 01:31
> > To: mesos-dev@incubator.apache.org
> > Cc: wangyu
> > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > TaskTracker: http://slave5:50060
> > Also, you could look at the executor logs (default:
> > /tmp/mesos/slaves/....../executors/../runs/latest/) to see why the
> >  TASK_LOST happened.
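> >
> > For example, a sketch (the slave, framework, and executor IDs under
> > /tmp/mesos will differ on your cluster):
> >
> > cat /tmp/mesos/slaves/*/frameworks/*/executors/executor_Task_Tracker_*/runs/latest/stderr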
> >
> >
> >
> > On Thu, Apr 25, 2013 at 10:19 AM, Benjamin Mahler <
> > benjamin.mahler@gmail.com> wrote:
> >
> > Can you maintain the file extension? That is how mesos knows to extract it:
> > hadoop fs -copyFromLocal
> > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > /user/mesos/mesos-executor.tar.gz
> >
> > Also make sure your mapred-site.xml has the extension as well.
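> >
> > i.e. something like this in mapred-site.xml (a sketch; the namenode
> > host and port here are placeholders for your own setup):
> >
> >   <property>
> >     <name>mapred.mesos.executor</name>
> >     <value>hdfs://master:9000/user/mesos/mesos-executor.tar.gz</value>
> >   </property>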
> >
> >
> >
> > On Thu, Apr 25, 2013 at 1:08 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> >
> > > Hi, Ben,
> > >
> > > I have tried what you said, but it still does not work.
> > > I uploaded the mesos-executor using: hadoop fs -copyFromLocal
> > > /home/mesos/build/hadoop/hadoop-0.20.205.0/build/hadoop.tar.gz
> > > /user/mesos/mesos-executor
> > > Did I do the right thing? Thanks very much!
> > >
> > > The log in jobtracker is:
> > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Launching task
> > > Task_Tracker_82 on http://slave1:31000
> > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > slots needed.
> > > 13/04/25 16:00:55 INFO mapred.MesosScheduler: Status update of
> > > Task_Tracker_82 to TASK_LOST with message Executor terminated
> > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: JobTracker Status
> > >       Pending Map Tasks: 2
> > >    Pending Reduce Tasks: 1
> > >          Idle Map Slots: 0
> > >       Idle Reduce Slots: 0
> > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > >        Needed Map Slots: 2
> > >     Needed Reduce Slots: 1
> > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Launching task
> > > Task_Tracker_83 on http://slave1:31000
> > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Satisfied map and reduce
> > > slots needed.
> > > 13/04/25 16:00:56 INFO mapred.MesosScheduler: Status update of
> > > Task_Tracker_83 to TASK_LOST with message Executor terminated
> > > 13/04/25 16:00:57 INFO mapred.MesosScheduler: JobTracker Status
> > >       Pending Map Tasks: 2
> > >    Pending Reduce Tasks: 1
> > >          Idle Map Slots: 0
> > >       Idle Reduce Slots: 0
> > >      Inactive Map Slots: 6 (launched but no hearbeat yet)
> > >   Inactive Reduce Slots: 6 (launched but no hearbeat yet)
> > >        Needed Map Slots: 2
> > >     Needed Reduce Slots: 1
> > >
> > >
> > >
> > >
> > >
> > > Wang Yu
> > >
> > > From: Benjamin Mahler
> > > Sent: 2013-04-24 07:49
> > > To: mesos-dev@incubator.apache.org; wangyu
> > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > TaskTracker: http://slave5:50060
> > > You need to instead upload the hadoop.tar.gz generated by the tutorial.
> > > Then point the conf file to the hdfs directory (you had the right idea,
> > > just uploaded the wrong file). :)
> > >
> > > Can you try that and report back?
> > >
> > >
> > > On Tue, Apr 23, 2013 at 12:45 AM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > >
> > > > Guodong,
> > > >
> > > > I still have some problems; I think there is a problem with my
> > > > executor setting.
> > > >
> > > > In mapred-site.xml, I set ("master" is the hostname of the mesos
> > > > master):
> > > >   <property>
> > > >     <name>mapred.mesos.executor</name>
> > > > #    <value>hdfs://hdfs.name.node:port/hadoop.zip</value>
> > > >     <value>hdfs://master/user/mesos/mesos-executor</value>
> > > >   </property>
> > > >
> > > > And I uploaded the mesos-executor to /user/mesos/mesos-executor
> > > >
> > > > The head of the file is as follows:
> > > >
> > > > #! /bin/sh
> > > >
> > > > # mesos-executor - temporary wrapper script for .libs/mesos-executor
> > > > # Generated by ltmain.sh (GNU libtool) 2.2.6b
> > > > #
> > > > > # The mesos-executor program cannot be directly executed until all the libtool
> > > > > # libraries that it depends on are installed.
> > > > #
> > > > > # This wrapper script should never be moved out of the build directory.
> > > > # If it is, it will not operate correctly.
> > > >
> > > > > # Sed substitution that helps us do robust quoting.  It backslashifies
> > > > # metacharacters that are still active within double-quoted strings.
> > > > Xsed='/bin/sed -e 1s/^X//'
> > > > sed_quote_subst='s/\([`"$\\]\)/\\\1/g'
> > > >
> > > > # Be Bourne compatible
> > > > if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
> > > >   emulate sh
> > > >   NULLCMD=:
> > > >   # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
> > > >   # is contrary to our usage.  Disable this feature.
> > > >   alias -g '${1+"$@"}'='"$@"'
> > > >   setopt NO_GLOB_SUBST
> > > > else
> > > >   case `(set -o) 2>/dev/null` in *posix*) set -o posix;; esac
> > > > fi
> > > > BIN_SH=xpg4; export BIN_SH # for Tru64
> > > > DUALCASE=1; export DUALCASE # for MKS sh
> > > >
> > > > # The HP-UX ksh and POSIX shell print the target directory to stdout
> > > > # if CDPATH is set.
> > > > (unset CDPATH) >/dev/null 2>&1 && unset CDPATH
> > > >
> > > > > relink_command="(cd /home/mesos/build/src; { test -z \"\${LIBRARY_PATH+set}\" || unset LIBRARY_PATH || { LIBRARY_PATH=; export LIBRARY_PATH; }; }; { test -z \"\${COMPILER_PATH+set}\" || unset COMPILER_PATH || { COMPILER_PATH=; export COMPILER_PATH; }; }; { test -z \"\${GCC_EXEC_PREFIX+set}\" || unset GCC_EXEC_PREFIX || { GCC_EXEC_PREFIX=; export GCC_EXEC_PREFIX; }; }; { test -z \"\${LD_RUN_PATH+set}\" || unset LD_RUN_PATH || { LD_RUN_PATH=; export LD_RUN_PATH; }; }; LD_LIBRARY_PATH=/home/wangyu/protobuf/lib:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64/; export LD_LIBRARY_PATH; PATH=/home/wangyu/protobuf/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib/jvm/java-7-sun/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/home/mesos/mesos-0.10.0/hadoop/hadoop-0.20.205.0/bin:/usr/lib/ant/apache-ant-1.8.4/bin:/opt/scala-2.9.1.final/bin:/home/haidong/zookeeper-3.4.5/bin:/home/hadoop/hive-0.9.0/bin:/home/hadoop/pig-0.10.0/bin:/home/mesos/mpi/build/bin:/home/mesos/torque/torque-4.1.3:/home/mesos/mesos-0.9.0/build/hadoop/hadoop-0.20.205.0/bin:/root/bin; export PATH; g++ -g -g2 -O2 -o \$progdir/\$file launcher/mesos_executor-executor.o  ./.libs/libmesos.so -L/usr/lib/jvm/java-7-sun/jre/lib/amd64/server -lpthread -lcurl -lssl -lcrypto -lz -lrt -pthread -Wl,-rpath -Wl,/home/mesos/build/src/.libs -Wl,-rpath -Wl,/home/mesos/build/lib)"
> > > > ...
> > > >
> > > >
> > > > Did I upload the right file, and set it up correctly in the conf
> > > > file? Thanks very much!
> > > >
> > > >
> > > >
> > > > Wang Yu
> > > >
> > > > From: 王国栋
> > > > Date: 2013-04-23 13:32
> > > > To: wangyu
> > > > CC: mesos-dev
> > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler: Unknown/exited
> > > > TaskTracker: http://slave5:50060
> > > > Hmm, it seems that mapred.mesos.master is set correctly.
> > > >
> > > > If you run hadoop in local mode, the following setting is OK:
> > > >   <property>
> > > >     <name>mapred.mesos.master</name>
> > > >     <value>local</value>
> > > >   </property>
> > > >
> > > > If you want to start the cluster, set mapred.mesos.master to
> > > > mesos-master-hostname:mesos-master-port.
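> > > >
> > > > For example, a sketch (substitute your real master hostname; 5050
> > > > is only the default mesos master port):
> > > >   <property>
> > > >     <name>mapred.mesos.master</name>
> > > >     <value>master:5050</value>
> > > >   </property>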
> > > >
> > > > Make sure the DNS resolution of mesos-master-hostname gives the
> > > > right IP.
> > > >
> > > > BTW: when you start the jobtracker, you can check the mesos web UI
> > > > to see whether the hadoop framework is registered.
> > > >
> > > > Thanks.
> > > >
> > > > Guodong
> > > >
> > > >
> > > > On Tue, Apr 23, 2013 at 1:24 PM, 王瑜 <wa...@nfs.iscas.ac.cn> wrote:
> > > >
> > > > > Hi, Guodong,
> > > > >
> > > > > I started hadoop as you said, and then I saw this error:
> > > > > 13/04/23 13:03:43 ERROR mapred.MesosScheduler: Error from scheduler
> > > > > driver: Cannot parse '@0.0.0.0:0'
> > > > >
> > > > > What does this mean? Where should I change the MesosScheduler code
> > > > > to fix this? Thanks very much! I am so sorry to interrupt you once
> > > > > again...
> > > > >
> > > > > The whole log is as follows:
> > > > >
> > > > >  [root@master hadoop-0.20.205.0]# hadoop jobtracker
> > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: STARTUP_MSG:
> > > > > /************************************************************
> > > > > STARTUP_MSG: Starting JobTracker
> > > > > STARTUP_MSG:   host = master/192.168.0.2
> > > > > STARTUP_MSG:   args = []
> > > > > STARTUP_MSG:   version = 0.20.205.0
> > > > > STARTUP_MSG:   build =  -r ; compiled by 'root' on Sat Apr 13 11:19:33 CST 2013
> > > > > ************************************************************/
> > > > > 13/04/23 13:21:04 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
> > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> > > > > 13/04/23 13:21:04 INFO impl.MetricsSystemImpl: JobTracker metrics system started
> > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source QueueMetrics,q=default registered.
> > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source ugi registered.
> > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
> > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> > > > > 13/04/23 13:21:04 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
> > > > > 13/04/23 13:21:04 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > 13/04/23 13:21:04 INFO mapred.JobTracker: Starting jobtracker with owner as root
> > > > > 13/04/23 13:21:04 INFO ipc.Server: Starting SocketReader
> > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort9001 registered.
> > > > > 13/04/23 13:21:04 INFO impl.MetricsSourceAdapter: MBean for source RpcActivityForPort9001 registered.
> > > > > 13/04/23 13:21:04 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> > > > > 13/04/23 13:21:05 INFO http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> > > > > 13/04/23 13:21:05 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
> > > > > 13/04/23 13:21:05 INFO http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
> > > > > 13/04/23 13:21:05 INFO http.HttpServer: Jetty bound to port 50030
> > > > > 13/04/23 13:21:05 INFO mortbay.log: jetty-6.1.26
> > > > > 13/04/23 13:21:05 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
> > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source jvm registered.
> > > > > 13/04/23 13:21:05 INFO impl.MetricsSourceAdapter: MBean for source JobTrackerMetrics registered.
> > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker up at: 9001
> > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: JobTracker webserver: 50030
> > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Cleaning up the system directory
> > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: History server being initialized in embedded mode
> > > > > 13/04/23 13:21:05 INFO mapred.JobHistoryServer: Started job history server at: localhost:50030
> > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Job History Server web address: localhost:50030
> > > > > 13/04/23 13:21:05 INFO mapred.CompletedJobStatusStore: Completed job store is inactive
> > > > > 13/04/23 13:21:05 INFO mapred.MesosScheduler: Starting MesosScheduler
> > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Refreshing hosts information
> > > > > 13/04/23 13:21:05 ERROR mapred.MesosScheduler: Error from scheduler driver: Cannot parse '@0.0.0.0:0'
> > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the includes file to
> > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Setting the excludes file to
> > > > > 13/04/23 13:21:05 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Decommissioning 0 nodes
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server Responder: starting
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server listener on 9001: starting
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 0 on 9001: starting
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 1 on 9001: starting
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 3 on 9001: starting
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 2 on 9001: starting
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 5 on 9001: starting
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 4 on 9001: starting
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 6 on 9001: starting
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 7 on 9001: starting
> > > > > 13/04/23 13:21:05 INFO mapred.JobTracker: Starting RUNNING
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 8 on 9001: starting
> > > > > 13/04/23 13:21:05 INFO ipc.Server: IPC Server handler 9 on 9001: starting
> > > > > 13/04/23 13:21:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: job_201304231321_0001: nMaps=0 nReduces=0 max=-1
> > > > > 13/04/23 13:21:32 INFO mapred.MesosScheduler: Added job job_201304231321_0001
> > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Job job_201304231321_0001 added successfully for user 'root' to queue 'default'
> > > > > 13/04/23 13:21:32 INFO mapred.AuditLogger: USER=root  IP=192.168.0.2  OPERATION=SUBMIT_JOB    TARGET=job_201304231321_0001  RESULT=SUCCESS
> > > > > 13/04/23 13:21:32 INFO mapred.JobTracker: Initializing job_201304231321_0001
> > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Initializing job_201304231321_0001
> > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: jobToken generated and stored with users keys in /home/HadoopRun/tmp/mapred/system/job_201304231321_0001/jobToken
> > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Input size for job job_201304231321_0001 = 0. Number of splits = 0
> > > > > 13/04/23 13:21:32 INFO mapred.JobInProgress: Job job_201304231321_0001 initialized successfully with 0 map tasks and 0 reduce tasks.
> > > > >
> > > > > ------------------------------
> > > > > Wang Yu
> > > > >
> > > > > From: 王国栋 <wa...@gmail.com>
> > > > > Date: 2013-04-23 11:34
> > > > > To: mesos-dev <me...@incubator.apache.org>; wangyu <wangyu@nfs.iscas.ac.cn>
> > > > > Subject: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> > > > > Unknown/exited TaskTracker: http://slave5:50060
> > > > >  Hi Yu,
> > > > >
> > > > > Mesos will just launch a tasktracker on each slave node as long as
> > > > > there are enough resources for the tasktracker. So you have to run
> > > > > the NameNode, JobTracker and DataNodes on your own.
> > > > >
> > > > > Basically, starting hadoop on mesos is like this (see the sketch
> > > > > after the next paragraph).
> > > > > 1. Start the dfs with hadoop/bin/start-dfs.sh (you should configure
> > > > > core-site.xml and hdfs-site.xml). The dfs is no different from the
> > > > > normal one.
> > > > > 2. Start the jobtracker with hadoop/bin/hadoop jobtracker (you
> > > > > should configure mapred-site.xml; this jobtracker must contain the
> > > > > mesos patch).
> > > > >
> > > > > Then you can use the mesos web UI and the jobtracker web UI to
> > > > > check the status of the jobtracker.
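> > > > >
> > > > > A minimal sketch of those two steps (assuming $HADOOP_HOME points
> > > > > at the mesos-patched hadoop dir and the conf files are in place):
> > > > >
> > > > > $HADOOP_HOME/bin/start-dfs.sh        # starts NameNode and DataNodes
> > > > > $HADOOP_HOME/bin/hadoop jobtracker   # starts the mesos-patched JobTracker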
> > > > >
> > > > >  Guodong
> > > > >
> > > > >
> > > > > On Tue, Apr 23, 2013 at 11:06 AM, 王瑜 <wa...@nfs.iscas.ac.cn>
> wrote:
> > > > >
> > > > >> Oh, yes, I started my hadoop using "start-all.sh". Now I know what
> > > > >> my problem is. Thanks very much!
> > > > >>
> > > > >> ps: Besides the TaskTracker, are there any other roles (like the
> > > > >> JobTracker or DataNode) I should stop first?
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> Wang Yu
> > > > >>
> > > > >> 发件人: Benjamin Mahler
> > > > >> 发送时间: 2013-04-23 10:56
> > > > >> 收件人: mesos-dev@incubator.apache.org; wangyu
> > > > >> 主题: Re: Re: org.apache.hadoop.mapred.MesosScheduler:
> Unknown/exited
> > > > >> TaskTracker: http://slave5:50060
> > > > >>  The scheduler we wrote for Hadoop will start its own
> > > > >> TaskTrackers, meaning you do not have to start any TaskTrackers
> > > > >> yourself.
> > > > >>
> > > > >> Are you starting your own TaskTrackers? Are there any TaskTrackers
> > > > >> running in your cluster?
> > > > >>
> > > > >> Looking at your jps output, is there already a TaskTracker
> > > > >> running?
> > > > >> [root@master logs]# jps
> > > > >> 13896 RunJar
> > > > >> 14123 Jps
> > > > >> 12718 NameNode
> > > > >> 12900 DataNode
> > > > >> 13374 TaskTracker  <--- How was this started?
> > > > >> 13218 JobTracker
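> > > > >>
> > > > >> If that TaskTracker came from start-all.sh, a sketch of stopping
> > > > >> just that daemon while leaving the others running:
> > > > >>
> > > > >> $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker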
> > > > >>
> > > > >>
> > > > >> On Mon, Apr 22, 2013 at 7:47 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
> wrote:
> > > > >>
> > > > >> > Hi, Ben and Guodong,
> > > > >> >
> > > > >> > What do you mean by "managing your own TaskTrackers"? How should
> > > > >> > I know whether I am managing my own TaskTrackers? Sorry, I am not
> > > > >> > very familiar with mesos.
> > > > >> > Does it mean I do not need to configure hdfs-site.xml and
> > > > >> > core-site.xml in hadoop? I do not want to run my own TaskTracker,
> > > > >> > I just want to set up hadoop on mesos and run my MR tasks.
> > > > >> >
> > > > >> > Thanks very much for your patient reply... Maybe I have a long
> > > > >> > way to go...
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > The log messages you see:
> > > > >> > 2013-04-18 16:47:19,645 INFO
> > > org.apache.hadoop.mapred.MesosScheduler:
> > > > >> > Unknown/exited TaskTracker: http://master:50060.
> > > > >> >
> > > > >> > Are printed when mesos does not know about the TaskTracker. We
> > > > >> > currently don't support running your own TaskTrackers, as the
> > > > >> > MesosScheduler will launch them on your behalf when needed.
> > > > >> >
> > > > >> > Are you managing your own TaskTrackers? The purpose of using
> > > > >> > Hadoop with mesos is that you no longer have to do that. We will
> > > > >> > detect that jobs have pending map / reduce tasks and launch
> > > > >> > TaskTrackers accordingly.
> > > > >> >
> > > > >> > Guodong may be able to help you further with getting set up!
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > Wang Yu
> > > > >> >
> > > > >> > From: 王国栋
> > > > >> > Date: 2013-04-18 17:10
> > > > >> > To: mesos-dev; wangyu
> > > > >> > Subject: Re: org.apache.hadoop.mapred.MesosScheduler:
> > Unknown/exited
> > > > >> > TaskTracker: http://slave5:50060
> > > > >> > You can check the slave log and the mesos-executor log, which is
> > > > >> > normally located in a dir like
> > > > >> > "/tmp/mesos/slaves/201304181115-16842879-5050-4680-13/frameworks/201304181115-16842879-5050-4680-0003/executors/executor_Task_Tracker_16/runs/latest/stderr".
> > > > >> > That log is the tasktracker log.
> > > > >> >
> > > > >> > I hope it will help.
> > > > >> >
> > > > >> > Guodong
> > > > >> >
> > > > >> >
> > > > >> > On Thu, Apr 18, 2013 at 5:03 PM, 王瑜 <wa...@nfs.iscas.ac.cn>
> > wrote:
> > > > >> >
> > > > >> > > Hi All,
> > > > >> > >
> > > > >> > > I have deployed mesos on three nodes: master, slave1 and
> > > > >> > > slave5, and it works well.
> > > > >> > > Then I set up hadoop on top of it, using master as the
> > > > >> > > namenode, and master, slave1 and slave5 as datanodes. When I
> > > > >> > > run 'jps', it looks like it is working well.
> > > > >> > >  [root@master logs]# jps
> > > > >> > > 13896 RunJar
> > > > >> > > 14123 Jps
> > > > >> > > 12718 NameNode
> > > > >> > > 12900 DataNode
> > > > >> > > 13374 TaskTracker
> > > > >> > > 13218 JobTracker
> > > > >> > >
> > > > >> > > Then I ran the test benchmark, but it made no further progress...
> > > > >> > > [root@master hadoop-0.20.205.0]# bin/hadoop jar hadoop-examples-0.20.205.0.jar randomwriter -Dtest.randomwrite.bytes_per_map=6710886 -Dtest.randomwriter.maps_per_host=10 rand
> > > > >> > > Running 30 maps.
> > > > >> > > Job started: Thu Apr 18 16:49:36 CST 2013
> > > > >> > > 13/04/18 16:49:36 INFO mapred.JobClient: Running job: job_201304181646_0001
> > > > >> > > 13/04/18 16:49:37 INFO mapred.JobClient:  map 0% reduce 0%
> > > > >> > > It stopped here.
> > > > >> > >
> > > > >> > > Then I read the log file hadoop-root-jobtracker-master.log, which shows:
> > > > >> > > 2013-04-18 16:46:51,724 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> > > > >> > > 2013-04-18 16:46:51,726 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
> > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
> > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
> > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
> > > > >> > > 2013-04-18 16:46:51,727 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
> > > > >> > > 2013-04-18 16:46:52,557 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/master
> > > > >> > > 2013-04-18 16:46:52,560 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_master:localhost/127.0.0.1:44997 to host master
> > > > >> > > 2013-04-18 16:46:52,568 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:46:55,581 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:46:58,590 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:47:01,600 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:47:04,609 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:47:07,618 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:47:10,625 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:47:13,632 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/slave5
> > > > >> > > 2013-04-18 16:47:13,686 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_slave5:127.0.0.1/127.0.0.1:60621 to host slave5
> > > > >> > > 2013-04-18 16:47:13,687 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > >> > > 2013-04-18 16:47:16,638 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:47:16,697 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > >> > > 2013-04-18 16:47:19,645 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:47:19,707 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > >> > > 2013-04-18 16:47:22,651 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:47:22,715 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > >> > > 2013-04-18 16:47:25,658 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > > 2013-04-18 16:47:25,725 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://slave5:50060.
> > > > >> > > 2013-04-18 16:47:28,665 INFO org.apache.hadoop.mapred.MesosScheduler: Unknown/exited TaskTracker: http://master:50060.
> > > > >> > >
> > > > >> > > Can anybody help me? Thanks very much!
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>