Posted to jira@kafka.apache.org by "John Roesler (Jira)" <ji...@apache.org> on 2023/05/12 15:42:00 UTC

[jira] [Updated] (KAFKA-14995) Automate asf.yaml collaborators refresh

     [ https://issues.apache.org/jira/browse/KAFKA-14995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Roesler updated KAFKA-14995:
---------------------------------
    Labels: newbie  (was: )

> Automate asf.yaml collaborators refresh
> ---------------------------------------
>
>                 Key: KAFKA-14995
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14995
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: John Roesler
>            Priority: Minor
>              Labels: newbie
>
> We have added a policy to use the asf.yaml GitHub Collaborators: [https://github.com/apache/kafka-site/pull/510]
> The policy states that we set this list to be the top 20 commit authors who are not Kafka committers. Unfortunately, it's not trivial to compute this list.
> Here is the process I followed to generate the list the first time (I generated it on 2023-04-28 with a one-year lookback, hence the --since=2022-04-28 below):
> 1. List authors by commit volume in the last year:
> {code:java}
> $ git shortlog --email --numbered --summary --since=2022-04-28 | vim - {code}
> 2. Manually filter out the authors who are committers, based on [https://kafka.apache.org/committers]
> 3. Truncate the list to 20 authors
> 4. For each author:
> 4a. Find a commit in `git log` that they authored:
> {code:java}
> commit 440bed2391338dc10fe4d36ab17dc104b61b85e8
> Author: hudeqi <12...@qq.com>
> Date:   Fri May 12 14:03:17 2023 +0800
> ...{code}
> 4b. Look up that commit on GitHub: [https://github.com/apache/kafka/commit/440bed2391338dc10fe4d36ab17dc104b61b85e8]
> 4c. Copy their GitHub username into .asf.yaml under both the PR whitelist and the Collaborators lists.
> 5. Send a PR to update .asf.yaml: [https://github.com/apache/kafka/pull/13713]
>  
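
Steps 1 to 3 above could be scripted roughly as follows. This is a minimal Python sketch, not an existing Kafka tool: the function names (parse_shortlog, top_non_committers, shortlog_since) are hypothetical, and it assumes the committers' Git author emails are already available as a set, which the Committers page does not provide today. HEAD is passed explicitly because git shortlog reads from standard input when no revision is given and stdin is not a terminal.

```python
import re
import subprocess

# Matches one line of `git shortlog --summary --numbered --email` output,
# e.g. "    42\tJane Doe <jane@example.org>".
SHORTLOG_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s*<([^>]*)>\s*$")


def parse_shortlog(text):
    """Return a list of (commit_count, name, email) tuples in input order."""
    authors = []
    for line in text.splitlines():
        match = SHORTLOG_LINE.match(line)
        if match:
            count, name, email = match.groups()
            authors.append((int(count), name, email.lower()))
    return authors


def top_non_committers(authors, committer_emails, limit=20):
    """Drop known committers, then keep the `limit` most active authors."""
    # Assumption: committers are identified by their Git author email.
    committers = {email.lower() for email in committer_emails}
    remaining = [a for a in authors if a[2] not in committers]
    remaining.sort(key=lambda a: a[0], reverse=True)  # most commits first
    return remaining[:limit]


def shortlog_since(since, repo="."):
    """Run git shortlog on HEAD for commits newer than `since` (YYYY-MM-DD)."""
    result = subprocess.run(
        ["git", "-C", repo, "shortlog", "--email", "--numbered", "--summary",
         f"--since={since}", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, top_non_committers(parse_shortlog(shortlog_since("2022-04-28")), committer_emails) would yield at most 20 (count, name, email) tuples, with committer_emails supplied by hand for now. An author who commits under several emails is counted once per email unless the repository has a .mailmap that merges them.
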
> This is pretty time-consuming but very scriptable. Two complications:
>  * To do the filtering, we need a mapping from the Git log "Author" to the documented Kafka "Committer". Suggestion: just update the structure of the "Committers" page to include each committer's Git "Author" name and email ([https://github.com/apache/kafka-site/blob/asf-site/committers.html])
>  * To generate the YAML lists, we need a mapping from the Git log "Author" to the GitHub username. There's presumably some way to do this in the GitHub REST API (the mapping is based on the email, IIUC), or we could update the Committers page to also document each committer's GitHub username.
>  
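
The commit lookup in steps 4a and 4b can also go through the GitHub REST API: the "get a commit" endpoint (GET /repos/{owner}/{repo}/commits/{ref}) returns an "author" object whose "login" is the GitHub username, or null when GitHub cannot link the commit email to an account. A minimal Python sketch follows; the helper names and the optional token parameter are assumptions made for illustration, not part of any existing script.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def login_from_commit_payload(payload):
    """Extract the GitHub login from a decoded commit response, or None."""
    author = payload.get("author")  # null when GitHub cannot link the email
    return author.get("login") if author else None


def github_login_for_commit(sha, repo="apache/kafka", token=None):
    """Fetch one commit from the GitHub REST API and return its author's login."""
    request = urllib.request.Request(
        f"{API_ROOT}/repos/{repo}/commits/{sha}",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:  # optional; unauthenticated requests are rate-limited
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return login_from_commit_payload(json.load(response))
```

A commit hash for a given author can be obtained with something like git log -1 --format=%H --author=<email> and then passed to github_login_for_commit. Authors for whom the function returns None would still need a manual lookup.
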
> Ideally, we would write this script (to be stored in the Apache Kafka repo) and create a GitHub Action to run it every three months.
>  