Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2013/07/22 22:28:32 UTC

Fwd: Issues when running Hadoop on Mesos

---------- Forwarded message ----------
From: 夏俊鸾 <xi...@gmail.com>
Date: Mon, Jul 22, 2013 at 6:48 AM
Subject: Issues when running Hadoop on Mesos
To: vinodkone@gmail.com


Hi Vinod,

Sorry to email you directly with Mesos questions; it seems that the Mesos
mailing list mesos-dev-subscribe@incubator.apache.org is not available right
now.
    I downloaded Mesos from the trunk branch (I would like to support
hadoop-2.0.0-cdh4.1.2), built it (./configure && make && make install), and
then ran make hadoop-2.0.0-mr1-cdh4.1.2, which automatically launches a
JobTracker and a wordcount test application. Everything seemed OK at that
point.
    Next, I configured core-site.xml/hdfs-site.xml/mapred-site.xml to run
Hadoop on a Mesos cluster; the details are below.

========core-site.xml============
<property>
  <name>io.native.lib.available</name>
  <value>true</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://10.0.2.19:9000</value>
</property>
==========mapred-site.xml===========
  <property>
    <name>mapred.job.tracker</name>
    <value>10.0.2.19:54311</value>
  </property>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.MesosScheduler</value>
  </property>
  <property>
    <name>mapred.mesos.taskScheduler</name>
    <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
  </property>
  <property>
    <name>mapred.mesos.master</name>
    <value>10.0.2.19:5050</value>
  </property>
#
# Make sure to uncomment the 'mapred.mesos.executor' property,
# when running the Hadoop JobTracker on a real Mesos cluster.
# NOTE: You need to MANUALLY upload the Mesos executor bundle
# to the location that is set as the value of this property.
  <property>
    <name>mapred.mesos.executor</name>
    <value>hdfs://10.0.2.19:9000/hadoop.tar.gz</value>
  </property>

# The properties below indicate the amount of resources
# that are allocated to a Hadoop slot (i.e., map/reduce task) by Mesos.
  <property>
    <name>mapred.mesos.slot.cpus</name>
    <value>0.2</value>
  </property>
  <property>
    <name>mapred.mesos.slot.disk</name>
    <!-- The value is in MB. -->
    <value>1024</value>
  </property>
  <property>
    <name>mapred.mesos.slot.mem</name>
    <!-- Note that this is the total memory required for
         JVM overhead (256 MB) and the heap (-Xmx) of the task.
         The value is in MB. -->
    <value>512</value>
  </property>
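
To double-check which of the settings above are actually present in a site
file, the properties can be read back programmatically. The following is only
a minimal sketch in Python: it assumes the properties sit inside the usual
<configuration> root element (the fragment quoted above omits it) and that the
'#' comment lines are not part of the real file. The names read_properties,
missing_keys and SAMPLE are made up for illustration and are not part of
Hadoop or Mesos.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened copy of the mapred-site.xml values quoted above,
# wrapped in a <configuration> root (an assumption; the email shows no root).
SAMPLE = """<configuration>
  <property>
    <name>mapred.mesos.master</name>
    <value>10.0.2.19:5050</value>
  </property>
  <property>
    <name>mapred.mesos.executor</name>
    <value>hdfs://10.0.2.19:9000/hadoop.tar.gz</value>
  </property>
  <property>
    <name>mapred.mesos.slot.cpus</name>
    <value>0.2</value>
  </property>
  <property>
    <name>mapred.mesos.slot.mem</name>
    <!-- The value is in MB. -->
    <value>512</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def missing_keys(props, wanted):
    """Return the entries of `wanted` that are not set in `props`."""
    return [key for key in wanted if key not in props]


if __name__ == "__main__":
    props = read_properties(SAMPLE)
    for key in sorted(props):
        print(key, "=", props[key])
    # The list of wanted keys is just the ones mentioned in this message.
    print("missing:", missing_keys(props, ["mapred.mesos.master",
                                           "mapred.jobtracker.taskScheduler"]))
```

This only confirms that a key is set; it says nothing about whether the value
is one the scheduler accepts.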

I then launched the JobTracker (./bin/hadoop jobtracker) and the wordcount
application manually, but the following errors occurred:

============word count ==================
[andrew@sr419 hadoop-2.0.0-mr1-cdh4.1.2]$ ./bin/hadoop jar hadoop-examples-2.0.0-mr1-cdh4.1.2.jar wordcount /user/andrew/tmp out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/andrew/incubator-mesos/hadoop/hadoop-2.0.0-mr1-cdh4.1.2/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/andrew/incubator-mesos/hadoop/hadoop-2.0.0-mr1-cdh4.1.2/build/ivy/lib/Hadoop/common/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/07/22 20:33:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/22 20:33:43 INFO input.FileInputFormat: Total input paths to process : 1
13/07/22 20:33:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/22 20:33:43 INFO mapred.JobClient: Running job: job_201307222033_0002
13/07/22 20:33:44 INFO mapred.JobClient:  map 0% reduce 0% // word count seems to be pending

============job tracker (TASK_LOST repeats in a loop)===================

13/07/22 20:33:43 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/22 20:33:43 INFO mapred.MesosScheduler: Added job job_201307222033_0002
13/07/22 20:33:43 INFO mapred.JobTracker: Job job_201307222033_0002 added successfully for user 'andrew' to queue 'default'
13/07/22 20:33:43 INFO mapred.JobTracker: Initializing job_201307222033_0002
13/07/22 20:33:43 INFO mapred.JobInProgress: Initializing job_201307222033_0002
13/07/22 20:33:43 INFO mapred.AuditLogger: USER=andrew IP=10.0.2.19 OPERATION=SUBMIT_JOB TARGET=job_201307222033_0002 RESULT=SUCCESS
13/07/22 20:33:43 INFO mapred.JobInProgress: jobToken generated and stored with users keys in /tmp/hadoop-andrew/mapred/system/job_201307222033_0002/jobToken
13/07/22 20:33:43 INFO mapred.JobInProgress: Input size for job job_201307222033_0002 = 4010. Number of splits = 1
13/07/22 20:33:43 INFO net.NetworkTopology: Adding a new node: /default-rack/sr419
13/07/22 20:33:43 INFO mapred.JobInProgress: tip:task_201307222033_0002_m_000000 has split on node:/default-rack/sr419
13/07/22 20:33:43 INFO mapred.JobInProgress: job_201307222033_0002 LOCALITY_WAIT_FACTOR=1.0
13/07/22 20:33:43 INFO mapred.JobInProgress: Job job_201307222033_0002 initialized successfully with 1 map tasks and 1 reduce tasks.
13/07/22 20:33:48 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 1
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 1
    Needed Reduce Slots: 1
13/07/22 20:33:48 INFO mapred.MesosScheduler: Launching task Task_Tracker_0 on http://sr419:31000
13/07/22 20:33:48 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:48 INFO mapred.MesosScheduler: Status update of Task_Tracker_0 to TASK_LOST with message Executor terminated
13/07/22 20:33:48 INFO mapred.MesosScheduler: Removing terminated TaskTracker: http://sr419:31000
13/07/22 20:33:49 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 1
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 1
    Needed Reduce Slots: 1
13/07/22 20:33:49 INFO mapred.MesosScheduler: Launching task Task_Tracker_1 on http://sr419:31000
13/07/22 20:33:49 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:49 INFO mapred.MesosScheduler: Status update of Task_Tracker_1 to TASK_LOST with message Executor terminated
13/07/22 20:33:49 INFO mapred.MesosScheduler: Removing terminated TaskTracker: http://sr419:31000
13/07/22 20:33:50 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 1
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 1
    Needed Reduce Slots: 1
13/07/22 20:33:50 INFO mapred.MesosScheduler: Launching task Task_Tracker_2 on http://sr419:31000
13/07/22 20:33:50 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:50 INFO mapred.MesosScheduler: Status update of Task_Tracker_2 to TASK_LOST with message Executor terminated
13/07/22 20:33:50 INFO mapred.MesosScheduler: Removing terminated TaskTracker: http://sr419:31000
13/07/22 20:33:51 INFO mapred.MesosScheduler: JobTracker Status

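
The loop in the JobTracker log above can be summarized by counting the
"Status update of ... to ..." lines per task state. This is a rough sketch in
Python, not anything shipped with Mesos or Hadoop; the regular expression is
written against the log lines quoted in this message only, and the names
STATUS_RE, count_states and SAMPLE_LOG are made up for illustration.

```python
import re
from collections import Counter

# Matches the scheduler lines quoted above. \s+ also matches newlines, so
# lines that were hard-wrapped by the mail client still match.
STATUS_RE = re.compile(r"Status update of\s+(\S+)\s+to\s+(\S+)")

# Three lines copied from the JobTracker log in this message.
SAMPLE_LOG = """\
13/07/22 20:33:48 INFO mapred.MesosScheduler: Status update of
Task_Tracker_0 to TASK_LOST with message Executor terminated
13/07/22 20:33:49 INFO mapred.MesosScheduler: Status update of
Task_Tracker_1 to TASK_LOST with message Executor terminated
13/07/22 20:33:50 INFO mapred.MesosScheduler: Status update of
Task_Tracker_2 to TASK_LOST with message Executor terminated
"""


def count_states(log_text):
    """Count how many status updates were reported for each task state."""
    return Counter(state for _task, state in STATUS_RE.findall(log_text))


def tasks_in_state(log_text, wanted_state):
    """Return the task IDs whose status update reported `wanted_state`."""
    return [task for task, state in STATUS_RE.findall(log_text)
            if state == wanted_state]


if __name__ == "__main__":
    print(count_states(SAMPLE_LOG))
    print(tasks_in_state(SAMPLE_LOG, "TASK_LOST"))
```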
=============mesos-slave.INFO===================
Registered with master master@10.0.2.19:5050; given slave ID 201307222033-318898186-5050-19972-0
I0722 20:33:48.378780 20034 slave.cpp:739] Got assigned task Task_Tracker_0 for framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.379360 20034 slave.cpp:837] Launching task Task_Tracker_0 for framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.380995 20034 paths.hpp:303] Created executor directory '/var/run/mesos/slaves/201307222033-318898186-5050-19972-0/frameworks/201307222033-318898186-5050-19972-0000/executors/executor_Task_Tracker_0/runs/114ae051-f03a-4728-af0d-6caeb1d3240a'
I0722 20:33:48.381255 20034 slave.cpp:948] Queuing task 'Task_Tracker_0' for executor executor_Task_Tracker_0 of framework '201307222033-318898186-5050-19972-0000
I0722 20:33:48.381343 20026 process_isolator.cpp:99] Launching executor_Task_Tracker_0 (cd hadoop-* && ./bin/mesos-executor) in /var/run/mesos/slaves/201307222033-318898186-5050-19972-0/frameworks/201307222033-318898186-5050-19972-0000/executors/executor_Task_Tracker_0/runs/114ae051-f03a-4728-af0d-6caeb1d3240a with resources cpus=1; mem=1280' for framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.381484 20015 slave.cpp:511] Successfully attached file '/var/run/mesos/slaves/201307222033-318898186-5050-19972-0/frameworks/201307222033-318898186-5050-19972-0000/executors/executor_Task_Tracker_0/runs/114ae051-f03a-4728-af0d-6caeb1d3240a'
I0722 20:33:48.382462 20026 process_isolator.cpp:161] Forked executor at 20434
I0722 20:33:48.479176 20035 process_isolator.cpp:461] Telling slave of terminated executor 'executor_Task_Tracker_0' of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.479310 20015 slave.cpp:2060] Executor 'executor_Task_Tracker_0' of framework 201307222033-318898186-5050-19972-0000 has exited with status 255
I0722 20:33:48.480988 20015 slave.cpp:1692] Handling status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 from @ 0.0.0.0:0
I0722 20:33:48.481205 20025 status_update_manager.cpp:290] Received status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 with checkpoint=false
I0722 20:33:48.481266 20025 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.481461 20025 status_update_manager.cpp:336] Forwarding status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 to master@10.0.2.19:5050
I0722 20:33:48.481613 20025 slave.cpp:1809] Sending acknowledgement for status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 to @ 0.0.0.0:0
I0722 20:33:48.485322 20030 status_update_manager.cpp:360] Received status update acknowledgement 61050093-911f-47ad-a7df-bebffd2a753a for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.485424 20030 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.479262 20035 process_isolator.cpp:259] Performing killtree operation on 20434
Failed to stop 20434: No such process
  Children of 20434: {  }
Signaled 20434
I0722 20:33:48.505930 20035 process_isolator.cpp:287] Asked to update resources for an unknown/killed executor 'executor_Task_Tracker_0' of framework 201307222033-318898186-5050-19972-0000
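
The slave log above shows the executor being started with
"cd hadoop-* && ./bin/mesos-executor", which suggests the bundle is expected
to unpack into a top-level directory whose name starts with "hadoop-" and
which contains bin/mesos-executor. One thing that might be worth ruling out is
an archive with a different layout. The following is only a sketch in Python
under that assumption: it inspects a local copy of the archive, does not talk
to HDFS, and checks the contents only, not the archive's file name. The
function names find_executor and write_demo_archive are made up for
illustration.

```python
import fnmatch
import io
import os
import tarfile
import tempfile


def find_executor(archive_path, dir_pattern="hadoop-*",
                  rel_path="bin/mesos-executor"):
    """Return archive members of the form <dir_pattern>/<rel_path>."""
    matches = []
    with tarfile.open(archive_path) as tar:  # compression is auto-detected
        for name in tar.getnames():
            # Some archives prefix every member with "./"; ignore that prefix.
            clean = name[2:] if name.startswith("./") else name
            top, _, rest = clean.partition("/")
            if fnmatch.fnmatch(top, dir_pattern) and rest == rel_path:
                matches.append(name)
    return matches


def write_demo_archive(path, member_name):
    """Write a tiny .tar.gz holding one fake executor script (demo only)."""
    payload = b"#!/bin/sh\n"
    info = tarfile.TarInfo(member_name)
    info.size = len(payload)
    info.mode = 0o755
    with tarfile.open(path, "w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "hadoop.tar.gz")
        write_demo_archive(demo,
                           "hadoop-2.0.0-mr1-cdh4.1.2/bin/mesos-executor")
        # An empty list would mean "cd hadoop-*" or the executor path
        # cannot be satisfied by this archive.
        print(find_executor(demo))
```

An empty result for the real bundle would point at the archive layout; a
non-empty result only rules that one cause out.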

===========log in /tmp for 'executor_Task_Tracker_0' is empty==========

I have been stuck on the issues above for several days and have not been
able to resolve them. One point I would like to highlight: I am not sure how
to set the property "mapred.mesos.executor" (must the file be named
hadoop.tar.gz? The template confused me). Could you help me analyze these
issues? Thank you in advance.

regards,
Andrew