You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/22 17:38:38 UTC

[GitHub] [airflow] rossturk opened a new pull request, #25239: Adjust heuristics for PR of the Month script

rossturk opened a new pull request, #25239:
URL: https://github.com/apache/airflow/pull/25239

   We've begun using a script to help us sift through all the month's PRs, as part of the voting process for the PR of the Month in the newsletter.
   
   This alters the script in a few ways:
   
   * Adds `--load` and `--save` flags that serialize/unserialize the PR data retrieved from GitHub. This allows for quicker iteration on the scoring heuristics - you can re-run it with the same data and it will generate new scores.
   
   * Captures additional metrics on the body length of the PR, the total length of all the PR comments, and the number of additions/deletions. I reviewed all of the PRs in this month's date range and noticed that the PRs I found most interesting had *lengthy* descriptions and comments.
   
   * Splits out metrics that were previously used in scoring functions but not persisted (like num_reviewers) into separate properties so they can be `--save`d.
   
   * Adjusts the heuristics to incorporate new metrics & adjust weighting.
   
   * Adds some more verbose output, i.e.: 
   <img width="1141" alt="Screen Shot 2022-07-22 at 1 35 46 PM" src="https://user-images.githubusercontent.com/1434475/180493932-7586ea07-ead4-41a4-895e-c5c762a0539b.png">
   
   This also contributes a notebook that I used to inspect the contents of the `--save` file and export it to CSV. Sometimes a pivot table is just the tool we need; this makes it easy to create one.
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25239: Adjust heuristics for PR of the Month script

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25239:
URL: https://github.com/apache/airflow/pull/25239#issuecomment-1193898788

   It looks great and I am happy to merge, but I have one thought about another contributing factor that we can include.
   
   We are pretty rigorously follow "linking issues to PRs". There are a number of PRs that are linked to some issues our users raised. And I think some of the issues have a long history and a lot of comments, and a lot of people commenting. I think having closing such a heavily commented issue with multiple people commenting should be a great "boost" for importance of such a PR (and solving a long standing issue might be one as well).
   
   WDYT @rossturk ? Should be rather easy to implement. We can also split it to a separate PR  if you think it's too much.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #25239: Adjust heuristics for PR of the Month script

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on PR #25239:
URL: https://github.com/apache/airflow/pull/25239#issuecomment-1195371631

   Awesome work, congrats on your first merged pull request!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #25239: Adjust heuristics for PR of the Month script

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #25239:
URL: https://github.com/apache/airflow/pull/25239


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] rossturk commented on a diff in pull request #25239: Adjust heuristics for PR of the Month script

Posted by GitBox <gi...@apache.org>.
rossturk commented on code in PR #25239:
URL: https://github.com/apache/airflow/pull/25239#discussion_r927880233


##########
dev/stats/get_important_pr_candidates.py:
##########
@@ -45,73 +46,180 @@
     required=True,
     help=textwrap.dedent(
         """
-        GitHub token used to authenticate.
-        You can set omit it if you have GITHUB_TOKEN env variable set
+        A GitHub token is required, and can also be provided by setting the GITHUB_TOKEN env variable.
         Can be generated with:
         https://github.com/settings/tokens/new?description=Read%20issues&scopes=repo:status"""
     ),
     envvar='GITHUB_TOKEN',
 )
 
-PROVIDER_SCORE = 0.5
-REGULAR_SCORE = 1.0
 
-REVIEW_INTERACTION_VALUE = 1.0
-COMMENT_INTERACTION_VALUE = 1.0
-REACTION_INTERACTION_VALUE = 0.1
+class PrStat:
+    PROVIDER_SCORE = 0.8
+    REGULAR_SCORE = 1.0
 
+    REVIEW_INTERACTION_VALUE = 2.0
+    COMMENT_INTERACTION_VALUE = 1.0
+    REACTION_INTERACTION_VALUE = 0.5
 
-class PrStat:
     def __init__(self, pull_request: PullRequest):
         self.pull_request = pull_request
         self._users: Set[str] = set()
 
-    @cached_property
+    @property

Review Comment:
   All the scoring methods shouldn't be cached, just the counting ones.



##########
dev/stats/get_important_pr_candidates.py:
##########
@@ -45,73 +46,180 @@
     required=True,
     help=textwrap.dedent(
         """
-        GitHub token used to authenticate.
-        You can set omit it if you have GITHUB_TOKEN env variable set
+        A GitHub token is required, and can also be provided by setting the GITHUB_TOKEN env variable.
         Can be generated with:
         https://github.com/settings/tokens/new?description=Read%20issues&scopes=repo:status"""
     ),
     envvar='GITHUB_TOKEN',
 )
 
-PROVIDER_SCORE = 0.5
-REGULAR_SCORE = 1.0
 
-REVIEW_INTERACTION_VALUE = 1.0
-COMMENT_INTERACTION_VALUE = 1.0
-REACTION_INTERACTION_VALUE = 0.1
+class PrStat:
+    PROVIDER_SCORE = 0.8
+    REGULAR_SCORE = 1.0
 
+    REVIEW_INTERACTION_VALUE = 2.0
+    COMMENT_INTERACTION_VALUE = 1.0
+    REACTION_INTERACTION_VALUE = 0.5
 
-class PrStat:
     def __init__(self, pull_request: PullRequest):
         self.pull_request = pull_request
         self._users: Set[str] = set()
 
-    @cached_property
+    @property

Review Comment:
   This is because all the scoring methods shouldn't be cached, just the counting ones.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #25239: Adjust heuristics for PR of the Month script

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #25239:
URL: https://github.com/apache/airflow/pull/25239#issuecomment-1195371284

   Works for me :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] rossturk commented on a diff in pull request #25239: Adjust heuristics for PR of the Month script

Posted by GitBox <gi...@apache.org>.
rossturk commented on code in PR #25239:
URL: https://github.com/apache/airflow/pull/25239#discussion_r927879982


##########
dev/stats/get_important_pr_candidates.py:
##########
@@ -45,73 +46,180 @@
     required=True,
     help=textwrap.dedent(
         """
-        GitHub token used to authenticate.
-        You can set omit it if you have GITHUB_TOKEN env variable set
+        A GitHub token is required, and can also be provided by setting the GITHUB_TOKEN env variable.
         Can be generated with:
         https://github.com/settings/tokens/new?description=Read%20issues&scopes=repo:status"""
     ),
     envvar='GITHUB_TOKEN',
 )
 
-PROVIDER_SCORE = 0.5
-REGULAR_SCORE = 1.0
 
-REVIEW_INTERACTION_VALUE = 1.0

Review Comment:
   Moving these inside the class lets us import the class into a notebook, needed for successful unpickling.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] rossturk commented on pull request #25239: Adjust heuristics for PR of the Month script

Posted by GitBox <gi...@apache.org>.
rossturk commented on PR #25239:
URL: https://github.com/apache/airflow/pull/25239#issuecomment-1194320592

   @potiuk This is a great idea! I think it would also be cool to implement the suggestion from @dstandish about giving non-committer PRs more weight.
   
   Since this would be for next month at this point, let's do it in another PR. I think I may loop in @merobi-hub!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] rossturk commented on a diff in pull request #25239: Adjust heuristics for PR of the Month script

Posted by GitBox <gi...@apache.org>.
rossturk commented on code in PR #25239:
URL: https://github.com/apache/airflow/pull/25239#discussion_r927881584


##########
dev/stats/get_important_pr_candidates.py:
##########
@@ -123,58 +231,84 @@ def __str__(self) -> str:
 
 
 @click.command()
+@option_github_token  # TODO: this should only be required if --load isn't provided

Review Comment:
   I attempted to implement a custom click.Option class to accomplish this and burned too much time. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #25239: Adjust heuristics for PR of the Month script

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on PR #25239:
URL: https://github.com/apache/airflow/pull/25239#issuecomment-1192798040

   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org