Posted to dev@mesos.apache.org by "Matei Zaharia (JIRA)" <ji...@apache.org> on 2012/10/22 20:14:12 UTC

[jira] [Created] (MESOS-294) Slaves should report last lines of stdout/stderr when an executor crashes

Matei Zaharia created MESOS-294:
-----------------------------------

             Summary: Slaves should report last lines of stdout/stderr when an executor crashes
                 Key: MESOS-294
                 URL: https://issues.apache.org/jira/browse/MESOS-294
             Project: Mesos
          Issue Type: Improvement
          Components: slave
            Reporter: Matei Zaharia
            Priority: Minor


A lot of the questions I see about running Spark on Mesos have to be answered with "log into that slave and look at stderr/stdout", followed by an explanation of where to find those files. It would be cool to have a feature where the last 20-30 lines of those files are sent back automatically to the master so it can print them. Often the problem is 'java not found' or something similar.
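A minimal sketch of the proposed feature, assuming the executor's output lands in plain `stdout` and `stderr` files in its work directory on the slave, and picking 25 lines from the requested 20-30 range. The helper name `tail_lines` is hypothetical, not part of Mesos. It reads backwards from the end of the file in fixed-size chunks, so a large log does not have to be loaded whole.

```python
import os


def tail_lines(path, n=25, chunk_size=4096):
    """Return the last n lines of the file at path (hypothetical helper)."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Read backwards chunk by chunk until we hold more than n newlines
        # (enough to guarantee n complete lines) or reach the file start.
        while pos > 0 and data.count(b"\n") <= n:
            read_size = min(chunk_size, pos)
            pos -= read_size
            f.seek(pos)
            data = f.read(read_size) + data
    # A possibly partial first line is dropped by keeping only the last n.
    lines = data.decode("utf-8", errors="replace").splitlines()
    return lines[-n:]
```

Reading from the end keeps the cost proportional to the tail rather than to the whole log, which matters for long-running executors that write a lot of output.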

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MESOS-294) Slaves should report last lines of stdout/stderr when an executor crashes

Posted by "Vinod Kone (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483912#comment-13483912 ] 

Vinod Kone commented on MESOS-294:
----------------------------------

Sending back the tail of stdout/stderr sounds hacky to me.

I think a better way to solve this problem is to use the error message string (status.message) inside the StatusUpdate protobuf. Have you tried printing out those messages in Spark when you get a LOST update? Unfortunately, we do not always fill in that error message field when sending a LOST update, but that could easily be fixed. Does that work?
                

[jira] [Commented] (MESOS-294) Slaves should report last lines of stdout/stderr when an executor crashes

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484576#comment-13484576 ] 

Matei Zaharia commented on MESOS-294:
-------------------------------------

What would you put in that string? If it just says "child process exited with code 143", that's not very helpful, but if you add the last few lines of stdout/stderr, that might work. Just to be clear, I'm concerned only about the LOST updates, which are sent by Mesos, not by my framework. The framework sends its own status updates for any errors it can catch, but it cannot catch things such as the executor never being able to start because Scala was not found, or the JVM crashing with an OutOfMemoryError.
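One hedged sketch of what the string could contain: a decoded exit status plus the tail of stderr. It relies on two general conventions rather than anything Mesos-specific: POSIX shells report 127 when a command is not found (the 'java not found' case), and shells and many runtimes report death by signal N as exit code 128 + N, so the 143 above usually means SIGTERM (15). The names `describe_exit` and `lost_message` are hypothetical.

```python
import signal


def describe_exit(code):
    """Turn a raw exit code into a short human-readable phrase (hypothetical)."""
    if code == 127:
        return "exited with code 127 (command not found, per POSIX shell convention)"
    if code > 128:
        # Convention: a process killed by signal N is reported as 128 + N.
        try:
            name = signal.Signals(code - 128).name
            return f"exited with code {code} (likely killed by {name})"
        except ValueError:
            pass  # Not a known signal number; fall through to the plain form.
    return f"exited with code {code}"


def lost_message(code, stderr_tail, max_lines=25):
    """Build a hypothetical status.message for a LOST update."""
    lines = stderr_tail[-max_lines:]
    msg = "Executor " + describe_exit(code)
    if lines:
        msg += "\nLast lines of stderr:\n" + "\n".join(lines)
    return msg
```

Capping the tail at a fixed number of lines keeps the message small enough to ride along in a single status update.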
                

[jira] [Commented] (MESOS-294) Slaves should report last lines of stdout/stderr when an executor crashes

Posted by "Benjamin Mahler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481616#comment-13481616 ] 

Benjamin Mahler commented on MESOS-294:
---------------------------------------

The new web UI may help with this issue, as we've added the ability to browse files for executor runs.
                

[jira] [Commented] (MESOS-294) Slaves should report last lines of stdout/stderr when an executor crashes

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483908#comment-13483908 ] 

Matei Zaharia commented on MESOS-294:
-------------------------------------

That's certainly useful but it's still not great because it's a step away from the user. What the user might see when running Spark is something like this:

   Starting task 171 (map at MyJob.scala:50) as TID 171 on slave X
   ...
   Lost TID 171

I'd like to say why it was lost there. For other cases, such as when the task throws an exception (which Spark's executor code catches), we propagate the exception back to the master. That's super useful when you're just sitting at a console and running the job.
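A sketch of the framework side, assuming the LOST update carries a human-readable reason in a message field (the status.message discussed earlier in this thread). `TaskStatus` and `console_line` are hypothetical stand-ins, not the real Mesos protobuf class or any Spark API; only the `TASK_LOST` state name matches Mesos. It shows how the console output in the example could say why TID 171 was lost.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    # Hypothetical stand-in for a status update; only the fields used here.
    task_id: int
    state: str
    message: str = ""


def console_line(status):
    """Format a status update for the driver console, including any reason."""
    if status.state == "TASK_LOST":
        line = f"Lost TID {status.task_id}"
        if status.message:
            # Indent each line of the reason under the "Lost TID" line.
            reason = "\n".join("    " + ln for ln in status.message.splitlines())
            line += " because:\n" + reason
        return line
    return f"TID {status.task_id} is {status.state}"
```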
                