Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/09/12 13:28:09 UTC

[GitHub] [beam] damccorm commented on a diff in pull request #23157: [GitHub Actions] - Adding Monitor Self-hosted Runners Workflow

damccorm commented on code in PR #23157:
URL: https://github.com/apache/beam/pull/23157#discussion_r968412131


##########
.github/workflows/monitor_self_hosted_runners.yml:
##########
@@ -0,0 +1,54 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Monitor Self-Hosted Runners Status
+on:
+  schedule:
+    - cron: "0 */12 * * *"

Review Comment:
   ```suggestion
       - cron: "*/30 * * * *"
   ```
   
   Let's maybe run this more frequently, since it should be cheap?
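   
   To put rough numbers on the two schedules, here is a minimal sketch that counts how often each one fires per day. It assumes only the `*`, `*/n`, and single-number forms of a cron field and looks only at the minute and hour fields (both schedules leave the remaining fields as `*`). `expandField` and `runsPerDay` are hypothetical helper names for illustration, not part of this PR.
   
   ```javascript
   // Expand one cron field into the list of values it matches.
   // Assumption: only "*", "*/n", and a single number are supported here;
   // ranges ("1-5") and lists ("1,2,3") are out of scope for this sketch.
   function expandField(field, min, max) {
       const values = [];
       if (field === "*") {
           for (let v = min; v <= max; v++) values.push(v);
       } else if (field.startsWith("*/")) {
           const step = Number(field.slice(2));
           for (let v = min; v <= max; v += step) values.push(v);
       } else {
           values.push(Number(field));
       }
       return values;
   }
   
   // Count how many times per day a schedule fires, using only the
   // minute (0-59) and hour (0-23) fields of the expression.
   function runsPerDay(cron) {
       const [minute, hour] = cron.trim().split(/\s+/);
       return expandField(minute, 0, 59).length * expandField(hour, 0, 23).length;
   }
   
   console.log(runsPerDay("0 */12 * * *")); // 2
   console.log(runsPerDay("*/30 * * * *")); // 48
   ```
   
   So the suggested schedule would run the check 48 times a day instead of twice.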



##########
.github/workflows/monitor_self_hosted_runners.yml:
##########
@@ -0,0 +1,54 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: Monitor Self-Hosted Runners Status
+on:
+  schedule:
+    - cron: "0 */12 * * *"
+  workflow_dispatch:
+jobs:
+  monitor-runners:
+    name: Monitor Self-hosted Runners Status
+    runs-on: ubuntu-latest
+    steps:
+    - name: Set up Cloud SDK
+      uses: google-github-actions/setup-gcloud@v0
+    - name: Set up node
+      uses: actions/setup-node@v3
+      with:
+        node-version: 16
+    - name: Setup checkout
+      uses: actions/checkout@v3
+    - name: Setup GCP account
+      run: |
+            echo "${{ secrets.GCP_PLAYGROUND_SA_KEY }}" | base64 -d > /tmp/gcp_access.json
+            which gcloud
+            gcloud auth activate-service-account --project=apache-beam-testing --key-file=/tmp/gcp_access.json
+    - name: Exporting ID
+      run: echo "IDENTITY_TOKEN=$(gcloud auth print-identity-token)" >> $GITHUB_ENV
+            
+    - name: Run monitor script
+      run: |
+        npm install
+        node sendRunnersReport.js
+      working-directory: 'scripts/ci/self-hosted-runners-report' 
+      env:
+        ISSUE_REPORT_SENDER_EMAIL_ADDRESS: "${{ secrets.ISSUE_REPORT_SENDER_EMAIL_ADDRESS }}"
+        ISSUE_REPORT_SENDER_EMAIL_PASSWORD: ${{ secrets.ISSUE_REPORT_SENDER_EMAIL_PASSWORD }}
+        ISSUE_REPORT_RECIPIENT_EMAIL_ADDRESS: "dev@beam.apache.org"
+        ISSUE_REPORT_SENDER_EMAIL_SERVICE: "gmail"
+        ENDPOINT: "https://us-central1-apache-beam-testing.cloudfunctions.net/monitorRunnersStatus" #we suggest adding this ENDPOINT as a repo secret too

Review Comment:
   > we suggest adding this ENDPOINT as a repo secret too
   
   Does keeping this secret actually get us anything?



##########
scripts/ci/self-hosted-runners-report/sendRunnersReport.js:
##########
@@ -0,0 +1,101 @@
+//  Licensed to the Apache Software Foundation (ASF) under one
+//  or more contributor license agreements.  See the NOTICE file
+//  distributed with this work for additional information
+//  regarding copyright ownership.  The ASF licenses this file
+//  to you under the Apache License, Version 2.0 (the
+//  "License"); you may not use this file except in compliance
+//  with the License.  You may obtain a copy of the License at
+// 
+//    http://www.apache.org/licenses/LICENSE-2.0
+// 
+//  Unless required by applicable law or agreed to in writing,
+//  software distributed under the License is distributed on an
+//  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+//  KIND, either express or implied.  See the License for the
+//  specific language governing permissions and limitations
+//  under the License.
+
+const nodemailer = require("nodemailer");
+const axios = require('axios');
+
+
+async function getRunnersStatus() {
+    let status = await axios.post(process.env["ENDPOINT"], {}, {
+        headers: {
+            Accept: "application/json",
+            Authorization: "bearer " + process.env["IDENTITY_TOKEN"]
+        }
+    });
+    return status.data;
+}
+
+async function sendAlertEmail(status) {
+    statusTables = {}
+    //Creating status tables
+    for (let OS of ["Linux", "Windows"]) {
+        statusTables[OS] = `
+        <h3> ${OS} </h3>
+        <table style='border: 1px solid grey;'>
+            <tr>
+                <th>Total Runners</th>
+                <th>Online Runners </th>
+                <th>Offline Runners</th>
+            </tr>
+        
+            <tr>
+                <td>${status[OS].totalRunners}</td>
+                <td>${status[OS].onlineRunners}</td>
+                <td>${status[OS].offlineRunners}</td>
+            </tr>
+        </table>
+        `
+    }
+
+    const htmlMsg = ` 
+        <p>Here is the runners status per Operative System, please inspect GCP console for further details: </p> <br>

Review Comment:
   ```suggestion
           <p>Here is the runners status per Operating System, please inspect GCP console for further details: </p> <br>
   ```



##########
scripts/ci/self-hosted-runners-report/sendRunnersReport.js:
##########
@@ -0,0 +1,101 @@
+//  Licensed to the Apache Software Foundation (ASF) under one
+//  or more contributor license agreements.  See the NOTICE file
+//  distributed with this work for additional information
+//  regarding copyright ownership.  The ASF licenses this file
+//  to you under the Apache License, Version 2.0 (the
+//  "License"); you may not use this file except in compliance
+//  with the License.  You may obtain a copy of the License at
+// 
+//    http://www.apache.org/licenses/LICENSE-2.0
+// 
+//  Unless required by applicable law or agreed to in writing,
+//  software distributed under the License is distributed on an
+//  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+//  KIND, either express or implied.  See the License for the
+//  specific language governing permissions and limitations
+//  under the License.
+
+const nodemailer = require("nodemailer");
+const axios = require('axios');
+
+
+async function getRunnersStatus() {
+    let status = await axios.post(process.env["ENDPOINT"], {}, {
+        headers: {
+            Accept: "application/json",
+            Authorization: "bearer " + process.env["IDENTITY_TOKEN"]
+        }
+    });
+    return status.data;
+}
+
+async function sendAlertEmail(status) {
+    statusTables = {}
+    //Creating status tables
+    for (let OS of ["Linux", "Windows"]) {
+        statusTables[OS] = `
+        <h3> ${OS} </h3>
+        <table style='border: 1px solid grey;'>
+            <tr>
+                <th>Total Runners</th>
+                <th>Online Runners </th>
+                <th>Offline Runners</th>
+            </tr>
+        
+            <tr>
+                <td>${status[OS].totalRunners}</td>
+                <td>${status[OS].onlineRunners}</td>
+                <td>${status[OS].offlineRunners}</td>
+            </tr>
+        </table>
+        `
+    }
+
+    const htmlMsg = ` 
+        <p>Here is the runners status per Operative System, please inspect GCP console for further details: </p> <br>
+        ` + statusTables["Linux"] + "<br>" + statusTables["Windows"];
+
+    nodemailer.createTransport({
+        service: process.env['ISSUE_REPORT_SENDER_EMAIL_SERVICE'], 
+        auth: {
+            user: process.env['ISSUE_REPORT_SENDER_EMAIL_ADDRESS'],
+            pass: process.env['ISSUE_REPORT_SENDER_EMAIL_PASSWORD']
+        }
+    }).sendMail({
+        from: process.env['ISSUE_REPORT_SENDER_EMAIL_ADDRESS'],
+        to: process.env['ISSUE_REPORT_RECIPIENT_EMAIL_ADDRESS'],
+        subject: "Alert; self-hosted runners are not healthy",
+        html: htmlMsg,
+    }, function (error, info) {
+        if (error) {
+            throw new Error(`Failed to send email with error: ${error}`);
+        } else {
+            console.log('Email sent: ' + info.response);
+        }
+    });
+}
+
+async function monitorRunnersStatus() {
+    const status = await getRunnersStatus().catch(console.error);
+    console.log(status);
+    if (status.Linux.onlineRunners == 0 || status.Windows.onlineRunners == 0) {

Review Comment:
   How many runners do we expect to be online? I'd anticipate it's much more than 1 each; can we alert on higher numbers here? (e.g. `if (status.Linux.onlineRunners <= 3 || status.Windows.onlineRunners <= 3) {` - but with realistic numbers instead of `3`)
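   
   One possible shape for this, as a sketch only: keep the per-OS minimums in one place so they are easy to tune. `MIN_ONLINE_RUNNERS` and `unhealthyPlatforms` are hypothetical names, the `3`s are placeholders rather than realistic numbers, and treating a missing status entry as zero online runners is an assumption of this sketch.
   
   ```javascript
   // Hypothetical per-OS minimums; placeholders, to be replaced with
   // numbers that reflect the expected size of each runner pool.
   const MIN_ONLINE_RUNNERS = { Linux: 3, Windows: 3 };
   
   // Return the operating systems whose online runner count is at or
   // below its minimum. Assumption: a missing status (or a missing OS
   // entry) is counted as zero online runners, so it is also reported.
   function unhealthyPlatforms(status, minimums = MIN_ONLINE_RUNNERS) {
       return Object.keys(minimums).filter((os) => {
           const online = status && status[os] ? status[os].onlineRunners : 0;
           return online <= minimums[os];
       });
   }
   
   // Example with made-up numbers: only Windows is at or below its minimum.
   const example = {
       Linux: { onlineRunners: 5 },
       Windows: { onlineRunners: 2 },
   };
   console.log(unhealthyPlatforms(example)); // [ 'Windows' ]
   ```
   
   The condition in `monitorRunnersStatus` could then become something like `if (unhealthyPlatforms(status).length > 0)`.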


