Posted to commits@spark.apache.org by do...@apache.org on 2022/01/14 04:33:47 UTC

[spark] branch master updated: [SPARK-37905][INFRA] Make `merge_spark_pr.py` set primary author from the first commit in case of ties

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new ef837ca  [SPARK-37905][INFRA] Make `merge_spark_pr.py` set primary author from the first commit in case of ties
ef837ca is described below

commit ef837ca71020950b841f9891c70dc4b29d968bf1
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Thu Jan 13 20:32:46 2022 -0800

    [SPARK-37905][INFRA] Make `merge_spark_pr.py` set primary author from the first commit in case of ties
    
    ### What changes were proposed in this pull request?
    
    This PR aims to make `merge_spark_pr.py` set the primary author from the first commit in case of ties.
    
    ### Why are the changes needed?
    
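    Currently, `merge_spark_pr.py` chooses the primary author arbitrarily when authors are tied on commit count, for example when there are two commits from two different authors. The candidates are collected into a `set`, whose iteration order is not defined.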
    
    https://github.com/apache/spark/pull/35190
    
    Ideally, the primary author would be chosen based on the number of lines changed, but that seems too hard. So, this PR only aims to improve on the current behavior.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. This is a dev-only change.
    
    ### How was this patch tested?
    
    Manually.
    
    Closes #35205 from dongjoon-hyun/SPARK-37905.
    
    Authored-by: Dongjoon Hyun <do...@apache.org>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 dev/merge_spark_pr.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 8d09c53..e21a39a 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -135,11 +135,12 @@ def merge_pr(pr_num, target_ref, title, body, pr_repo_desc):
         continue_maybe(msg)
         had_conflicts = True
 
+    # First commit author should be considered as the primary author when the rank is the same
     commit_authors = run_cmd(
-        ["git", "log", "HEAD..%s" % pr_branch_name, "--pretty=format:%an <%ae>"]
+        ["git", "log", "HEAD..%s" % pr_branch_name, "--pretty=format:%an <%ae>", "--reverse"]
     ).split("\n")
     distinct_authors = sorted(
-        set(commit_authors), key=lambda x: commit_authors.count(x), reverse=True
+        list(dict.fromkeys(commit_authors)), key=lambda x: commit_authors.count(x), reverse=True
     )
     primary_author = input(
         'Enter primary author in the format of "name <email>" [%s]: ' % distinct_authors[0]
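
The tie-breaking behavior in the hunk above can be illustrated in isolation. The following is a minimal sketch, not the script itself: the helper name `rank_authors` and the sample author strings are hypothetical, and the input list is assumed to be ordered oldest commit first, as `git log --reverse` would produce. It relies on two documented Python behaviors: `dict.fromkeys` preserves insertion order (Python 3.7+), and `sorted` is stable even with `reverse=True`.

```python
def rank_authors(commit_authors):
    """Rank authors by commit count; ties go to whoever committed first.

    `commit_authors` is assumed to be ordered oldest commit first.
    """
    # dict.fromkeys de-duplicates while keeping first-seen order,
    # unlike set(), whose iteration order is arbitrary.
    first_seen = list(dict.fromkeys(commit_authors))
    # sorted() is stable, so authors with equal counts keep their
    # first-seen order even when reverse=True.
    return sorted(first_seen, key=commit_authors.count, reverse=True)


# Hypothetical commit history, oldest first: Alice and Bob are tied at 2.
commits = [
    "Alice <alice@example.com>",
    "Bob <bob@example.com>",
    "Bob <bob@example.com>",
    "Alice <alice@example.com>",
    "Carol <carol@example.com>",
]
ranked = rank_authors(commits)
print(ranked[0])  # Alice <alice@example.com>
```

With `set(commits)` in place of `dict.fromkeys(commits)`, either Alice or Bob could come first from one run to the next, which is the arbitrary choice the commit message describes.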
