Posted to dev@mesos.apache.org by "Florian Leibert (flo) (JIRA)" <ji...@apache.org> on 2013/03/05 02:38:13 UTC
[jira] [Comment Edited] (MESOS-377) Tasks stuck in STAGING
[ https://issues.apache.org/jira/browse/MESOS-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592869#comment-13592869 ]
Florian Leibert (flo) edited comment on MESOS-377 at 3/5/13 1:37 AM:
---------------------------------------------------------------------
Here is an entry from the Mesos web UI for a fairly recent task (same format as above):
ct:create_omg_table-fact_ad_stats:1362429000000:0 ChronosTask:create_omg_table-fact_ad_stats STAGING
Logs matching this task_id on the mesos-master:
2013-03-05T00Z:2013-03-05T00:49:12.555936+00:00 i-a6db08d5 authpriv.notice sudo: - ubuntu : TTY=pts/1 ; PWD=/tmp ; USER=root ; COMMAND=/bin/grep ct:create_omg_table-fact_ad_stats:1362429000000:0 ConfiguratorTest_CommandLine_b78ixP ConfiguratorTest_CommandLineConfFlag_pZqsKv ConfiguratorTest_ConfigFileSpacesIgnored_jdDPdc ConfiguratorTest_ConfigFileWithConfDir_wyj4Ba ConfiguratorTest_DefaultOptions_8oc9vu ConfiguratorTest_Environment_XcOsy9 ConfiguratorTest_LoadingPriorities_QDkkXQ ConfiguratorTest_MalformedConfigFile_kKHlxx etc ExamplesTest_JavaException_xWLaeC ExamplesTest_JavaFramework_0s25mq ExamplesTest_NoExecutorFramework_18Ge9l ExamplesTest_PythonFramework_2496FO ExamplesTest_TestFramework_Y6Y16m Exception Framework (Java).INFO Exception Framework (Java).ip-10-47-49-113.invalid-user.log.INFO.20130221-020112.23425 Exception Framework (Java).ip-10-47-49-113.invalid-user.log.WARNING.20130221-020112.23425 Exception Framework (Java).WARNING FilesTest_AttachTest_Sah2wJ FilesTest_BrowseTest_YdNoxU FilesTest_DetachTest_rXjAOd
lt-mesos-master.i-a6db08d5.invalid-user.log.INFO.20130221-020215.24408:I0304 20:30:07.787580 24441 master.cpp:1580] Launching task ct:create_omg_table-fact_ad_stats:1362429000000:0 of framework chronos with resources cpus=1; mem=1; disk=1 on slave 201302210202-1899048714-5050-24408-332 (i-d40e75a7)
lt-mesos-master.INFO:I0304 20:30:07.787580 24441 master.cpp:1580] Launching task ct:create_omg_table-fact_ad_stats:1362429000000:0 of framework chronos with resources cpus=1; mem=1; disk=1 on slave 201302210202-1899048714-5050-24408-332 (i-d40e75a7)
Logs from Chronos (the framework). This is the last line that contains this task ID, and it only records that the task was launched:
2013-03-04T20Z:2013-03-04T20:30:07.783132+00:00 INFO user.notice - [2013-03-04 20:30:07,783] com.airbnb.scheduler.mesos.MesosJobFramework: Task 'ct:create_omg_table-fact_ad_stats:1362429000000:0' launched, status: 'DRIVER_RUNNING'
None of the slaves' logs contain this task ID. It's as if the slaves never saw this task come in.
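One way to make the "never reached the slave" check repeatable is to pair each master `Launching task` line with the logs of the slave host it names. The Python sketch below is hypothetical: the regular expression is modeled on the single master line quoted above, not on any documented Mesos log format, and `launched_tasks` / `missing_on_slave` are illustrative helpers that assume the slave logs have already been collected per EC2 host name such as `i-d40e75a7`.

```python
import re
from typing import Dict, Iterable, List

# Modeled on the master line quoted above (an assumption, not a Mesos spec):
# "... Launching task <id> of framework <fw> with resources ... on slave <slave_id> (<host>)"
LAUNCH_RE = re.compile(
    r"Launching task (?P<task>\S+) of framework (?P<framework>\S+) "
    r"with resources .* on slave (?P<slave>\S+) \((?P<host>[^)]+)\)"
)


def launched_tasks(master_lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map task ID -> {task, framework, slave, host} parsed from master log lines."""
    tasks: Dict[str, Dict[str, str]] = {}
    for line in master_lines:
        match = LAUNCH_RE.search(line)
        if match:
            # Keyed by task ID, so the same line seen in two log files counts once.
            tasks[match.group("task")] = match.groupdict()
    return tasks


def missing_on_slave(master_lines: Iterable[str],
                     slave_logs: Dict[str, List[str]]) -> List[str]:
    """Return IDs of tasks the master launched that never appear in the
    collected log lines of the slave host they were sent to."""
    missing = []
    for task_id, info in launched_tasks(master_lines).items():
        host_lines = slave_logs.get(info["host"], [])
        if not any(task_id in line for line in host_lines):
            missing.append(task_id)
    return missing
```

Keying the result by task ID also collapses duplicate `grep` hits, such as the two identical master lines above that were found under both the dated log file name and `lt-mesos-master.INFO`.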
> Tasks stuck in STAGING
> ----------------------
>
> Key: MESOS-377
> URL: https://issues.apache.org/jira/browse/MESOS-377
> Project: Mesos
> Issue Type: Bug
> Reporter: Florian Leibert (flo)
> Priority: Blocker
>
> GIT SHA: ac9fb5b0c713653140d853f6af29aaa3e3829476
> I see more and more tasks stuck in STAGING - they were launched a long time ago but still have no slave assigned.
> Is this a known bug?
> ct:update_s3_deployment:1362348902328:1 ChronosTask:update_s3_deployment FINISHED i-6282fa11
> ct:update_s3_deployment:1362348900000:0 ChronosTask:update_s3_deployment STAGING
> ct:update_s3_deployment:1362348000000:0 ChronosTask:update_s3_deployment FINISHED i-6282fa11
> ct:update_s3_deployment:1362347101316:1 ChronosTask:update_s3_deployment FINISHED i-6282fa11
> ct:update_s3_deployment:1362347100000:0 ChronosTask:update_s3_deployment STAGING
> ct:update_s3_deployment:1362346200000:0 ChronosTask:update_s3_deployment FINISHED i-6282fa11
> ct:update_s3_deployment:1362345300000:0 ChronosTask:update_s3_deployment FINISHED i-6282fa11
> ct:update_s3_deployment:1362344400000:0 ChronosTask:update_s3_deployment FINISHED i-6282fa11
> ct:update_s3_deployment:1362343500000:0 ChronosTask:update_s3_deployment FINISHED
> ct:update_s3_deployment:1362342600000:0 ChronosTask:update_s3_deployment FINISHED i-04097277
> ct:update_s3_deployment:1362341700000:0 ChronosTask:update_s3_deployment FINISHED i-6282fa11
> ct:update_mobile_use:1362355210893:1 ChronosTask:update_mobile_use STAGING
> ct:update_mobile_use:1362355208879:0 ChronosTask:update_mobile_use FAILED i-6282fa11
> ct:update_mobile_use:1362355204743:2 ChronosTask:update_mobile_use STAGING
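The listing above can be triaged mechanically. This Python sketch assumes the task ID layout `ct:<job>:<epoch milliseconds>:<attempt>`, which is inferred from the IDs shown here rather than taken from Chronos documentation; `TaskRow`, `parse_row` and `stuck_in_staging` are hypothetical helpers. Under that reading, `1362348900000` decodes to 2013-03-03 22:15:00 UTC, exactly 15 minutes after `1362348000000`, and `1362429000000` from the comment above decodes to 2013-03-04 20:30:00 UTC, seconds before the master's 20:30:07 launch line.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, NamedTuple, Optional

# Reference point for decoding the millisecond field of a task ID.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class TaskRow(NamedTuple):
    task_id: str
    job: str
    scheduled: datetime  # third ":"-separated field, assumed to be epoch millis
    attempt: int         # last ":"-separated field, assumed to be a retry counter
    state: str
    host: Optional[str]  # None when the web UI shows no slave host


def parse_row(line: str) -> TaskRow:
    """Parse one row: '<task_id> ChronosTask:<job> <STATE> [<host>]' (quote marker allowed)."""
    fields = line.lstrip("> ").split()
    task_id, state = fields[0], fields[2]
    host = fields[3] if len(fields) > 3 else None
    _prefix, job, millis, attempt = task_id.split(":")
    # timedelta keeps the milliseconds exact instead of going through a float.
    scheduled = EPOCH + timedelta(milliseconds=int(millis))
    return TaskRow(task_id, job, scheduled, int(attempt), state, host)


def stuck_in_staging(lines: Iterable[str]) -> List[TaskRow]:
    """Keep only rows whose state is STAGING."""
    return [row for row in map(parse_row, lines) if row.state == "STAGING"]
```

The filter deliberately checks the state rather than a missing host: the `1362343500000` row is FINISHED yet also shows no host, so an empty host column alone does not identify a stuck task.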