Posted to dev@mesos.apache.org by "Florian Leibert (flo) (JIRA)" <ji...@apache.org> on 2013/03/05 02:38:13 UTC

[jira] [Comment Edited] (MESOS-377) Tasks stuck in STAGING

    [ https://issues.apache.org/jira/browse/MESOS-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592869#comment-13592869 ] 

Florian Leibert (flo) edited comment on MESOS-377 at 3/5/13 1:37 AM:
---------------------------------------------------------------------

Here is the mesos-web-ui entry for a fairly recent task (same format as in the issue description):

ct:create_omg_table-fact_ad_stats:1362429000000:0	ChronosTask:create_omg_table-fact_ad_stats	STAGING	


Logs matching this task_id on the mesos-master:

2013-03-05T00Z:2013-03-05T00:49:12.555936+00:00 i-a6db08d5 authpriv.notice sudo: -   ubuntu : TTY=pts/1 ; PWD=/tmp ; USER=root ; COMMAND=/bin/grep ct:create_omg_table-fact_ad_stats:1362429000000:0 ConfiguratorTest_CommandLine_b78ixP ConfiguratorTest_CommandLineConfFlag_pZqsKv ConfiguratorTest_ConfigFileSpacesIgnored_jdDPdc ConfiguratorTest_ConfigFileWithConfDir_wyj4Ba ConfiguratorTest_DefaultOptions_8oc9vu ConfiguratorTest_Environment_XcOsy9 ConfiguratorTest_LoadingPriorities_QDkkXQ ConfiguratorTest_MalformedConfigFile_kKHlxx etc ExamplesTest_JavaException_xWLaeC ExamplesTest_JavaFramework_0s25mq ExamplesTest_NoExecutorFramework_18Ge9l ExamplesTest_PythonFramework_2496FO ExamplesTest_TestFramework_Y6Y16m Exception Framework (Java).INFO Exception Framework (Java).ip-10-47-49-113.invalid-user.log.INFO.20130221-020112.23425 Exception Framework (Java).ip-10-47-49-113.invalid-user.log.WARNING.20130221-020112.23425 Exception Framework (Java).WARNING FilesTest_AttachTest_Sah2wJ FilesTest_BrowseTest_YdNoxU FilesTest_DetachTest_rXjAOd
lt-mesos-master.i-a6db08d5.invalid-user.log.INFO.20130221-020215.24408:I0304 20:30:07.787580 24441 master.cpp:1580] Launching task ct:create_omg_table-fact_ad_stats:1362429000000:0 of framework chronos with resources cpus=1; mem=1; disk=1 on slave 201302210202-1899048714-5050-24408-332 (i-d40e75a7)
lt-mesos-master.INFO:I0304 20:30:07.787580 24441 master.cpp:1580] Launching task ct:create_omg_table-fact_ad_stats:1362429000000:0 of framework chronos with resources cpus=1; mem=1; disk=1 on slave 201302210202-1899048714-5050-24408-332 (i-d40e75a7)



Logs from chronos (the framework). This is the last line that contains this task ID; it only records that the task was launched:
2013-03-04T20Z:2013-03-04T20:30:07.783132+00:00 INFO user.notice  - [2013-03-04 20:30:07,783] com.airbnb.scheduler.mesos.MesosJobFramework: Task 'ct:create_omg_table-fact_ad_stats:1362429000000:0' launched, status: 'DRIVER_RUNNING'

None of the slave logs contain this task ID. It's as if the slaves never received the task.
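
For anyone lining these IDs up against log timestamps: the numeric field in the task ID looks like a millisecond epoch timestamp, and decoding it gives a time a few seconds before the "Launching task" line on the master. The sketch below assumes the ID layout is ct:<job>:<epoch_ms>:<attempt>. That layout is inferred from the samples in this thread, not taken from any Chronos documentation, and parse_task_id is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_task_id(task_id):
    """Split a task ID assumed to look like 'ct:<job>:<epoch_ms>:<attempt>'.

    The field layout is an assumption inferred from the IDs pasted above.
    """
    # Drop the leading 'ct' prefix, then peel the last two fields off the
    # right so a job name containing ':' would still survive intact.
    _prefix, rest = task_id.split(":", 1)
    job, epoch_ms, attempt = rest.rsplit(":", 2)
    # Interpret the numeric field as milliseconds since the Unix epoch (UTC).
    scheduled = datetime.fromtimestamp(int(epoch_ms) / 1000, tz=timezone.utc)
    return job, scheduled, int(attempt)


job, scheduled, attempt = parse_task_id(
    "ct:create_omg_table-fact_ad_stats:1362429000000:0"
)
print(job, scheduled.isoformat(), attempt)
```

If that reading of the ID is right, the decoded time gives a window to search in the slave logs even when the task ID itself never appears there.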


                
                  
> Tasks stuck in STAGING
> ----------------------
>
>                 Key: MESOS-377
>                 URL: https://issues.apache.org/jira/browse/MESOS-377
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Florian Leibert (flo)
>            Priority: Blocker
>
> GIT SHA: ac9fb5b0c713653140d853f6af29aaa3e3829476
> I see more and more tasks stuck in STAGING. They were launched a long time ago but still have no slave assigned.
> Is this a known bug?
> ct:update_s3_deployment:1362348902328:1	ChronosTask:update_s3_deployment	FINISHED	 i-6282fa11
> ct:update_s3_deployment:1362348900000:0	ChronosTask:update_s3_deployment	STAGING	
> ct:update_s3_deployment:1362348000000:0	ChronosTask:update_s3_deployment	FINISHED	 i-6282fa11
> ct:update_s3_deployment:1362347101316:1	ChronosTask:update_s3_deployment	FINISHED	 i-6282fa11
> ct:update_s3_deployment:1362347100000:0	ChronosTask:update_s3_deployment	STAGING	
> ct:update_s3_deployment:1362346200000:0	ChronosTask:update_s3_deployment	FINISHED	 i-6282fa11
> ct:update_s3_deployment:1362345300000:0	ChronosTask:update_s3_deployment	FINISHED	 i-6282fa11
> ct:update_s3_deployment:1362344400000:0	ChronosTask:update_s3_deployment	FINISHED	 i-6282fa11
> ct:update_s3_deployment:1362343500000:0	ChronosTask:update_s3_deployment	FINISHED	
> ct:update_s3_deployment:1362342600000:0	ChronosTask:update_s3_deployment	FINISHED	 i-04097277
> ct:update_s3_deployment:1362341700000:0	ChronosTask:update_s3_deployment	FINISHED	 i-6282fa11
> ct:update_mobile_use:1362355210893:1	ChronosTask:update_mobile_use	STAGING	
> ct:update_mobile_use:1362355208879:0	ChronosTask:update_mobile_use	FAILED	 i-6282fa11
> ct:update_mobile_use:1362355204743:2	ChronosTask:update_mobile_use	STAGING	
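
A listing like the one above can be filtered mechanically to pull out the affected tasks. This is a minimal sketch that assumes the rows are tab-separated in the order task ID, task name, state, slave host (with the slave column empty when none was assigned). That column layout is inferred from the paste, and stuck_in_staging is a hypothetical helper, not part of Mesos or Chronos.

```python
def stuck_in_staging(listing):
    """Return task IDs whose state is STAGING and whose slave column is empty.

    Assumes tab-separated rows: task ID, task name, state, slave host.
    """
    stuck = []
    for line in listing.splitlines():
        # Tolerate the '> ' quote prefix that email clients add.
        fields = [field.strip() for field in line.lstrip("> ").split("\t")]
        if len(fields) < 3:
            continue  # blank or unrelated line
        task_id, _name, state = fields[:3]
        slave = fields[3] if len(fields) > 3 else ""
        if state == "STAGING" and not slave:
            stuck.append(task_id)
    return stuck


sample = (
    "ct:update_s3_deployment:1362348902328:1\tChronosTask:update_s3_deployment\tFINISHED\t i-6282fa11\n"
    "ct:update_s3_deployment:1362348900000:0\tChronosTask:update_s3_deployment\tSTAGING\t\n"
    "ct:update_mobile_use:1362355208879:0\tChronosTask:update_mobile_use\tFAILED\t i-6282fa11\n"
)
print(stuck_in_staging(sample))
```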
