Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/07 17:12:14 UTC

[GitHub] [beam] damccorm opened a new pull request, #21736: Gather metrics on GH Issues

damccorm opened a new pull request, #21736:
URL: https://github.com/apache/beam/pull/21736

   Now that we're using GH Issues, we should start gathering metrics on them so that we can update the dashboard.
   
   This is step 1 of #21735
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [x] Add a link to the appropriate issue in your description, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make the review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #21736: Gather metrics on GH Issues

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #21736:
URL: https://github.com/apache/beam/pull/21736#discussion_r892747723


##########
.test-infra/metrics/sync/github/sync.py:
##########
@@ -322,63 +379,107 @@ def upsertIntoPRsTable(cursor, values):
   cursor.execute(upsertPRRowQuery, values)
 
 
+def upsertIntoIssuesTable(cursor, values):
+  upsertPRRowQuery = f'''INSERT INTO {GH_ISSUES_TABLE_NAME}
+                            (issue_id,
+                            author,
+                            created_ts,
+                            updated_ts,
+                            closed_ts,
+                            title,
+                            assignees,
+                            labels)
+                          VALUES
+                            (%s, %s, %s, %s, %s, %s, %s, %s)
+                          ON CONFLICT (issue_id) DO UPDATE
+                            SET
+                            issue_id=excluded.issue_id,
+                            author=excluded.author,
+                            created_ts=excluded.created_ts,
+                            updated_ts=excluded.updated_ts,
+                            closed_ts=excluded.closed_ts,
+                            title=excluded.title,
+                            assignees=excluded.assignees,
+                            labels=excluded.labels
+                          '''
+  cursor.execute(upsertPRRowQuery, values)
+
+
 def fetchNewData():
   '''
   Main workhorse method. Fetches data from GitHub and puts it in metrics table.
   '''
-  connection = initDBConnection()
-  cursor = connection.cursor()
-  lastSyncTimestamp = fetchLastSyncTimestamp(cursor)
-  cursor.close()
-  connection.close()
-
-  currTS = lastSyncTimestamp
-
-  resultsPresent = True
-  while resultsPresent:
-    print("Syncing data for: ", currTS)
-    jsonData = fetchGHData(currTS)
+  for i in range(2):
+    kind = 'issue'
+    if i == 0:
+      kind = 'pr'
 
     connection = initDBConnection()
     cursor = connection.cursor()
+    lastSyncTimestamp = fetchLastSyncTimestamp(cursor, f'gh_{kind}_sync')
+    cursor.close()
+    connection.close()
+    if lastSyncTimestamp is None:
+      if kind == 'pr':
+        connection = initDBConnection()
+        cursor = connection.cursor()
+        lastSyncTimestamp = fetchLastSyncTimestampFallback(cursor)
+        cursor.close()
+        connection.close()
+      else:
+        lastSyncTimestamp = datetime(year=1980, month=1, day=1)
+
+    currTS = lastSyncTimestamp
+
+    resultsPresent = True
+    while resultsPresent:
+      print(f'Syncing data for {kind}s: ', currTS)
+      query = queries.MAIN_PR_QUERY if kind == 'pr' else queries.MAIN_ISSUES_QUERY
+      jsonData = fetchGHData(currTS, query)
+
+      connection = initDBConnection()
+      cursor = connection.cursor()
+
+      if "errors" in jsonData:
+        print("Failed to fetch data, error:", jsonData)
+        return
 
-    if "errors" in jsonData:
-      print("Failed to fetch data, error:", jsonData)
-      return
-
-    prs = None
-    try:
-      prs = jsonData["data"]["search"]["edges"]
-    except:
-      # TODO This means that API returned error.
-      # We might want to bring this to stderr or utilize other means of logging.
-      # Examples: we hit throttling, etc
-      print("Got bad json format: ", jsonData)
-      return
-
-    if not prs:
-      resultsPresent = False
-
-    for edge in prs:
-      pr = edge["node"]
+      data = None
       try:
-        rowValues = extractRowValuesFromPr(pr)
-      except Exception as e:
-        print("Failed to extract data. Exception: ", e, " PR: ", edge)
-        traceback.print_tb(e.__traceback__)
+        data = jsonData["data"]["search"]["edges"]
+      except:
+        # TODO This means that API returned error.
+        # We might want to bring this to stderr or utilize other means of logging.
+        # Examples: we hit throttling, etc
+        print("Got bad json format: ", jsonData)
         return
 
-      upsertIntoPRsTable(cursor, rowValues)
+      if not data:
+        resultsPresent = False
 
-      prUpdateTime = ghutilities.datetimeFromGHTimeStr(pr["updatedAt"])
+      for edge in data:
+        node = edge["node"]
+        try:
+          rowValues = extractRowValuesFromPr(node) if kind == 'pr' else extractRowValuesFromIssue(node)
+        except Exception as e:
+          print("Failed to extract data. Exception: ", e, f" {kind}: ", edge)
+          traceback.print_tb(e.__traceback__)
+          return
 
-      currTS = currTS if currTS > prUpdateTime else prUpdateTime
+        if kind == 'pr':
+          upsertIntoPRsTable(cursor, rowValues)
+        else:
+          upsertIntoIssuesTable(cursor, rowValues)
 
-    cursor.close()
-    connection.commit()
-    connection.close()
+        updateTime = ghutilities.datetimeFromGHTimeStr(node["updatedAt"])
+
+        currTS = currTS if currTS > updateTime else updateTime
+
+      cursor.close()
+      connection.commit()
+      connection.close()
 
-    updateLastSyncTimestamp(currTS)
+      updateLastSyncTimestamp(currTS, 'gh_pr_sync')

Review Comment:
   Yes, that's a good catch. Updated!





[GitHub] [beam] damccorm commented on pull request #21736: Gather metrics on GH Issues

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #21736:
URL: https://github.com/apache/beam/pull/21736#issuecomment-1149068518

   R: @kileys - if there's a better reviewer with more affinity, let me know; I'm basing this on the history in the .test-infra directory.




[GitHub] [beam] kileys merged pull request #21736: Gather metrics on GH Issues

Posted by GitBox <gi...@apache.org>.
kileys merged PR #21736:
URL: https://github.com/apache/beam/pull/21736




[GitHub] [beam] kileys commented on pull request #21736: Gather metrics on GH Issues

Posted by GitBox <gi...@apache.org>.
kileys commented on PR #21736:
URL: https://github.com/apache/beam/pull/21736#issuecomment-1150278166

   LGTM




[GitHub] [beam] asf-ci commented on pull request #21736: Gather metrics on GH Issues

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #21736:
URL: https://github.com/apache/beam/pull/21736#issuecomment-1148945702

   Can one of the admins verify this patch?




[GitHub] [beam] kileys commented on a diff in pull request #21736: Gather metrics on GH Issues

Posted by GitBox <gi...@apache.org>.
kileys commented on code in PR #21736:
URL: https://github.com/apache/beam/pull/21736#discussion_r892734210


##########
.test-infra/metrics/sync/github/sync.py:
##########
@@ -322,63 +379,107 @@ def upsertIntoPRsTable(cursor, values):
   cursor.execute(upsertPRRowQuery, values)
 
 
+def upsertIntoIssuesTable(cursor, values):
+  upsertPRRowQuery = f'''INSERT INTO {GH_ISSUES_TABLE_NAME}
+                            (issue_id,
+                            author,
+                            created_ts,
+                            updated_ts,
+                            closed_ts,
+                            title,
+                            assignees,
+                            labels)
+                          VALUES
+                            (%s, %s, %s, %s, %s, %s, %s, %s)
+                          ON CONFLICT (issue_id) DO UPDATE
+                            SET
+                            issue_id=excluded.issue_id,
+                            author=excluded.author,
+                            created_ts=excluded.created_ts,
+                            updated_ts=excluded.updated_ts,
+                            closed_ts=excluded.closed_ts,
+                            title=excluded.title,
+                            assignees=excluded.assignees,
+                            labels=excluded.labels
+                          '''
+  cursor.execute(upsertPRRowQuery, values)
+
+
 def fetchNewData():
   '''
   Main workhorse method. Fetches data from GitHub and puts it in metrics table.
   '''
-  connection = initDBConnection()
-  cursor = connection.cursor()
-  lastSyncTimestamp = fetchLastSyncTimestamp(cursor)
-  cursor.close()
-  connection.close()
-
-  currTS = lastSyncTimestamp
-
-  resultsPresent = True
-  while resultsPresent:
-    print("Syncing data for: ", currTS)
-    jsonData = fetchGHData(currTS)
+  for i in range(2):
+    kind = 'issue'
+    if i == 0:
+      kind = 'pr'
 
     connection = initDBConnection()
     cursor = connection.cursor()
+    lastSyncTimestamp = fetchLastSyncTimestamp(cursor, f'gh_{kind}_sync')
+    cursor.close()
+    connection.close()
+    if lastSyncTimestamp is None:
+      if kind == 'pr':
+        connection = initDBConnection()
+        cursor = connection.cursor()
+        lastSyncTimestamp = fetchLastSyncTimestampFallback(cursor)
+        cursor.close()
+        connection.close()
+      else:
+        lastSyncTimestamp = datetime(year=1980, month=1, day=1)
+
+    currTS = lastSyncTimestamp
+
+    resultsPresent = True
+    while resultsPresent:
+      print(f'Syncing data for {kind}s: ', currTS)
+      query = queries.MAIN_PR_QUERY if kind == 'pr' else queries.MAIN_ISSUES_QUERY
+      jsonData = fetchGHData(currTS, query)
+
+      connection = initDBConnection()
+      cursor = connection.cursor()
+
+      if "errors" in jsonData:
+        print("Failed to fetch data, error:", jsonData)
+        return
 
-    if "errors" in jsonData:
-      print("Failed to fetch data, error:", jsonData)
-      return
-
-    prs = None
-    try:
-      prs = jsonData["data"]["search"]["edges"]
-    except:
-      # TODO This means that API returned error.
-      # We might want to bring this to stderr or utilize other means of logging.
-      # Examples: we hit throttling, etc
-      print("Got bad json format: ", jsonData)
-      return
-
-    if not prs:
-      resultsPresent = False
-
-    for edge in prs:
-      pr = edge["node"]
+      data = None
       try:
-        rowValues = extractRowValuesFromPr(pr)
-      except Exception as e:
-        print("Failed to extract data. Exception: ", e, " PR: ", edge)
-        traceback.print_tb(e.__traceback__)
+        data = jsonData["data"]["search"]["edges"]
+      except:
+        # TODO This means that API returned error.
+        # We might want to bring this to stderr or utilize other means of logging.
+        # Examples: we hit throttling, etc
+        print("Got bad json format: ", jsonData)
         return
 
-      upsertIntoPRsTable(cursor, rowValues)
+      if not data:
+        resultsPresent = False
 
-      prUpdateTime = ghutilities.datetimeFromGHTimeStr(pr["updatedAt"])
+      for edge in data:
+        node = edge["node"]
+        try:
+          rowValues = extractRowValuesFromPr(node) if kind == 'pr' else extractRowValuesFromIssue(node)
+        except Exception as e:
+          print("Failed to extract data. Exception: ", e, f" {kind}: ", edge)
+          traceback.print_tb(e.__traceback__)
+          return
 
-      currTS = currTS if currTS > prUpdateTime else prUpdateTime
+        if kind == 'pr':
+          upsertIntoPRsTable(cursor, rowValues)
+        else:
+          upsertIntoIssuesTable(cursor, rowValues)
 
-    cursor.close()
-    connection.commit()
-    connection.close()
+        updateTime = ghutilities.datetimeFromGHTimeStr(node["updatedAt"])
+
+        currTS = currTS if currTS > updateTime else updateTime
+
+      cursor.close()
+      connection.commit()
+      connection.close()
 
-    updateLastSyncTimestamp(currTS)
+      updateLastSyncTimestamp(currTS, 'gh_pr_sync')

Review Comment:
   Should this be `gh_{kind}_sync` as well?
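
A minimal sketch of why the key matters, not the code from the PR. It assumes the sync state behaves like a key-to-timestamp mapping; `sync_state`, `sync_key`, `update_last_sync_timestamp`, and `run_sync` are hypothetical stand-ins, and the dates are made up for illustration. With the hardcoded `'gh_pr_sync'` key, the issue pass overwrites the PR timestamp and never records its own; deriving the key from `kind`, as the read side already does with `f'gh_{kind}_sync'`, keeps the two independent.

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the sync-state storage; the real
# script persists these timestamps in a database.
sync_state = {}


def sync_key(kind):
    """Build the per-kind key, mirroring the f'gh_{kind}_sync' read in the diff."""
    return f'gh_{kind}_sync'


def update_last_sync_timestamp(ts, key):
    """Hypothetical stand-in for updateLastSyncTimestamp(currTS, key)."""
    sync_state[key] = ts


def run_sync(newest_by_kind, hardcode_pr_key):
    """Simulate one pass over both kinds and return the stored timestamps."""
    sync_state.clear()
    for kind in ('pr', 'issue'):
        # The hardcoded branch reproduces the line under review.
        key = 'gh_pr_sync' if hardcode_pr_key else sync_key(kind)
        update_last_sync_timestamp(newest_by_kind[kind], key)
    return dict(sync_state)


# Illustrative timestamps only: the newest update seen for each kind.
newest = {
    'pr': datetime(2022, 6, 7),
    'issue': datetime(2022, 6, 1),
}

buggy = run_sync(newest, hardcode_pr_key=True)
fixed = run_sync(newest, hardcode_pr_key=False)
```

In the hardcoded case `buggy` holds a single `gh_pr_sync` entry carrying the issue timestamp, so the next run would re-fetch PRs from the wrong point and treat issues as never synced. In the per-kind case `fixed` holds one entry for each kind.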


