Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2015/09/14 21:32:45 UTC

[jira] [Updated] (MESOS-3423) Perf event isolator stops performing sampling if a single timeout occurs.

     [ https://issues.apache.org/jira/browse/MESOS-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-3423:
-----------------------------------
             Shepherd: Jie Yu
               Sprint: Twitter Mesos Q3 Sprint 5
    Affects Version/s: 0.24.0
     Target Version/s: 0.25.0
               Labels: twitter  (was: )
          Description: 
Currently the perf event isolator times out a sample once a fixed 2 seconds beyond the sample duration has elapsed:

{code}
    Duration timeout = flags.perf_duration + Seconds(2);
{code}

This should be based on the reap interval maximum.

Also, the code stops sampling altogether when a single timeout occurs. We've observed timeouts during normal operation, so it would be better for the isolator to continue perf sampling after a timeout. It may also make sense to continue sampling after errors, since these may be transient.

  was:[~jieyu] can you fill in the details here?

          Component/s: slave
                       isolation
              Summary: Perf event isolator stops performing sampling if a single timeout occurs.  (was: perf sampling stops after a timeout occurs)

> Perf event isolator stops performing sampling if a single timeout occurs.
> -------------------------------------------------------------------------
>
>                 Key: MESOS-3423
>                 URL: https://issues.apache.org/jira/browse/MESOS-3423
>             Project: Mesos
>          Issue Type: Bug
>          Components: isolation, slave
>    Affects Versions: 0.24.0
>            Reporter: Vinod Kone
>            Assignee: Cong Wang
>              Labels: twitter
>
> Currently the perf event isolator times out a sample once a fixed 2 seconds beyond the sample duration has elapsed:
> {code}
>     Duration timeout = flags.perf_duration + Seconds(2);
> {code}
> This should be based on the reap interval maximum.
> Also, the code stops sampling altogether when a single timeout occurs. We've observed timeouts during normal operation, so it would be better for the isolator to continue perf sampling after a timeout. It may also make sense to continue sampling after errors, since these may be transient.
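
For illustration only, the behaviour requested above could be sketched roughly as follows. This is not the Mesos isolator code: `kMaxReapInterval`, `sampleTimeout`, `runSampling`, `SampleStatus` and the `sampleOnce` callback are all hypothetical names, the 1 second reap interval maximum is an assumed value, and adding it to the sample duration is just one reading of "based on the reap interval maximum". The sketch derives the timeout from the sample duration plus that maximum instead of a fixed 2 seconds, and keeps sampling when a round times out or fails.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

using Duration = std::chrono::milliseconds;

// Hypothetical stand-in for the reap interval maximum mentioned in the issue.
// The 1 second value is an assumption made for this sketch.
constexpr Duration kMaxReapInterval = std::chrono::seconds(1);

// How a single sampling round ended.
enum class SampleStatus { Ok, Timeout, Error };

struct SamplingStats {
  int ok = 0;
  int timeouts = 0;
  int errors = 0;
};

// Timeout derived from the sample duration plus the reap interval maximum,
// rather than from a fixed 2 second allowance.
Duration sampleTimeout(Duration perfDuration) {
  assert(perfDuration.count() >= 0);
  return perfDuration + kMaxReapInterval;
}

// Runs `rounds` sampling attempts. `sampleOnce` is a hypothetical callback
// that performs one sample with the given timeout and reports how it ended.
// A timeout or an error is counted and the loop moves on to the next round,
// so one bad round does not stop sampling.
SamplingStats runSampling(
    int rounds,
    Duration perfDuration,
    const std::function<SampleStatus(Duration)>& sampleOnce) {
  SamplingStats stats;
  const Duration timeout = sampleTimeout(perfDuration);

  for (int i = 0; i < rounds; ++i) {
    switch (sampleOnce(timeout)) {
      case SampleStatus::Ok:      ++stats.ok;       break;
      case SampleStatus::Timeout: ++stats.timeouts; break;
      case SampleStatus::Error:   ++stats.errors;   break;
    }
  }
  return stats;
}
```

Calling `runSampling` with a callback that times out on one round still invokes the callback for every remaining round, which is the difference from the behaviour described in this issue. The counters are only there so the caller can see how many rounds timed out or failed.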


