Posted to issues@bigtop.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/07/07 17:07:00 UTC
[jira] [Commented] (BIGTOP-2836) charm metric collector race condition

    [ https://issues.apache.org/jira/browse/BIGTOP-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078392#comment-16078392 ] 

ASF GitHub Bot commented on BIGTOP-2836:
----------------------------------------

GitHub user kwmonroe opened a pull request:

    https://github.com/apache/bigtop/pull/252

    BIGTOP-2836: charm metric collector race condition

    Ensure `echo 0` is the last thing to run so that the metric hook does not cause a failed deployment. Works in all tested scenarios:
    
    False && False || echo 0
    - 0
    
    True && False || echo 0
    - 0
    
    False && True || echo 0
    - 0
    
    True && True || echo 0
    - True
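
The scenarios above can be reproduced with a minimal shell sketch. The function names (`ready_yes`, `ready_no`, `collect_ok`, `collect_fail`, `metric`) and the sample value `3` are hypothetical stand-ins for illustration, not the charm's actual precondition or collector commands:

```shell
#!/bin/sh
# Hypothetical stand-ins for the precondition check (True/False).
ready_yes() { return 0; }
ready_no()  { return 1; }

# Hypothetical stand-ins for the collector command (True/False).
collect_ok()   { echo 3; }     # succeeds and prints a sample metric value
collect_fail() { return 1; }   # fails, as a collector might while a service is still starting

# Run precondition ($1), then collector ($2); fall back to 'echo 0'
# if either one fails, so the chain as a whole still exits 0.
metric() {
    "$1" && "$2" || echo 0
}

metric ready_no  collect_fail   # False && False -> prints 0
metric ready_yes collect_fail   # True  && False -> prints 0
metric ready_no  collect_ok     # False && True  -> prints 0
metric ready_yes collect_ok     # True  && True  -> prints 3
```

Because `echo 0` sits at the end of the chain, it runs whenever the precondition or the collector returns non-zero. One caveat of this pattern: if a collector prints partial output before failing, that output would appear ahead of the fallback `0`.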

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/juju-solutions/bigtop bug/BIGTOP-2836/metric-race

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/bigtop/pull/252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #252
    
----
commit 839891282853684f550385d23775d9e5044f623a
Author: Kevin W Monroe <ke...@canonical.com>
Date:   2017-07-07T16:55:46Z

    BIGTOP-2836: charm metric collector race condition
    
    Ensure 'echo 0' is the last thing to run so that the metric hook does
    not cause a failed deployment. Works in all tested scenarios:
    
    False && False || echo 0
    - 0
    True && False || echo 0
    - 0
    False && True || echo 0
    - 0
    True && True || echo 0
    - True

----


> charm metric collector race condition
> -------------------------------------
>
>                 Key: BIGTOP-2836
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-2836
>             Project: Bigtop
>          Issue Type: Bug
>          Components: deployment
>    Affects Versions: 1.2.0, 1.2.1
>            Reporter: Kevin W Monroe
>            Assignee: Kevin W Monroe
>            Priority: Minor
>             Fix For: 1.3.0
>
>
> Although this was initially thought to be fixed in BIGTOP-2801, it seems the charm metric collector can still cause a failed deployment.  As a refresher, metrics give users the ability to see stuff like how many datanodes or zookeeper peers are deployed in an environment.
> The first attempt at fixing this was to include a precondition before collecting metrics, for example, ensure the namenode is "ready" before running "hdfs getconf".
> However, in this example, there can be a period of time where the charm tells the NN to start (at which point the "ready" state is set), yet the NN takes a while to format HDFS.  If the metric collector runs during this time, 'hdfs getconf' will fail, which means the metric hook fails, which means the deployment fails.
> There are a variety of ways to mitigate this:
> 1. Don't set "ready" until the NN is all the way up.
> 2. Don't let a metric hook fail the entire deployment.
> 3. Alter the collector so it handles a failed 'hdfs getconf' gracefully.
> #1: added to our todo, but will take more time to implement.
> #2: opened an issue against the metric layer to see if this is possible.
> This JIRA will focus on fixing the problem with option #3.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)