Posted to dev@beam.apache.org by Valentyn Tymofieiev <va...@google.com> on 2021/09/23 02:12:58 UTC

Re: Please help triage issues!

Thanks, Kyle!

Since adding labels is one of the possible triaging actions, it would be good
to make the labels more useful.

Possible ideas:
- Add a triaging recommendation page that shows commonly used labels that
still make sense to use, with a few words on when to use each one if that is
not obvious. It would be nice if this page were easily discoverable when
users are looking for how to report issues.
- Given that labels are arbitrary and we don't control them, we should
create filters[1] and Kanban boards[2], which we control, and use them to
look up triaged issues in the future. Filters can pull information from
multiple labels. For example the flaky test filter[3] should pull issues
with labels 'flake', 'flaky', 'flaky-test', 'flakey'... We may need to
update filters periodically when synonymous labels start to appear.
- Some filters/boards I'd like to see: issues suitable for a first-time
PR,  ease-of-use issues, filters for medium-size self-contained projects
suitable for new contributors who are ramping up on a certain area of Beam.
- Use components instead of labels when an identical component is available.
- Maybe clean up some of the labels that are no longer useful. Perhaps there
is an automated way to do that easily.
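
To make the "one filter, many labels" idea concrete, here is a rough sketch
of a helper that turns a list of synonymous labels into a JQL query that
could be pasted into a filter. The function name labels_to_jql is made up for
illustration, and the project key and the "resolution = Unresolved" clause
are assumptions about how such a filter might be scoped; this is not the
query behind any existing filter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a JQL query that matches any of the given labels.
# Usage: labels_to_jql LABEL [LABEL...]
labels_to_jql() {
  local list="" label
  for label in "$@"; do
    # Append each label as a quoted JQL string, comma-separated.
    list="${list:+$list, }\"$label\""
  done
  # Assumed scoping: open issues in the BEAM project only.
  printf 'project = BEAM AND resolution = Unresolved AND labels in (%s)\n' "$list"
}

# Example: the flaky-test synonyms mentioned above.
labels_to_jql flake flaky flaky-test flakey
```

When a new synonym shows up, only the argument list changes, which is the
periodic filter update described above.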

[1]
https://issues.apache.org/jira/secure/ManageFilters.jspa?filterView=search
[2] https://issues.apache.org/jira/secure/ManageRapidViews.jspa
[3]
https://issues.apache.org/jira/secure/RapidView.jspa?rapidView=464&tab=filter

Here are some more labels I collected among all open and closed issues.



   :jira$ cat ./get_all_labels.sh
   #!/usr/bin/env bash
   # Count label usage across all Jira CSV exports in the current directory.
   # First collect every "file,column-index" pair whose header cell is
   # "Labels" (an export may contain more than one such column).
   for file_column in $(
     for file in *.csv; do
       head -n1 "$file" | awk -F ',' -v fname="$file" \
         '{for(c=1;c<=NF;c++) if($c=="Labels") print fname "," c }'
     done
     );
   do
     file=$(echo "$file_column" | cut -d',' -f1)
     column=$(echo "$file_column" | cut -d',' -f2)
     # Alternative without csvtool (does not handle quoted commas):
     # tail -n +2 "$file" | awk -F "\"*,\"*" '{print $'$column'}'
     csvtool format '%('"$column"')\n' "$file" | tail -n +2
   done | grep -v '^[[:space:]]*$' | sort | uniq -c | sort -h -r

   :jira$ ./get_all_labels.sh
      1166 Done
       530 Clarified
       382 starter
       332 portability
       252 flake
       222 stale-assigned
       194 newbie
       185 currently-failing
       154 stale-P2
       128 portability-spark
       122 website-revamp-2020
       115 beam-fixit
       113 portability-flink
       111 backward-incompatible
        71 dataframe-api
        69 dsl_sql_merge
        60 easyfix
        49 triaged
        48 errorprone
        47 beginner
        39 documentation
        35 zetasql-compliance
        31 python
        31 flaky-test
        30 sickbay
        30 portability-samza
        28 gradle
        25 structured-streaming
        24 BeamSummitEU2019
        24 beamsummit
        22 zetasql-java-udf
        22 findbugs
        22 easy
        22 BeamSummitWebsite
        21 io
        21 dataflow
        20 performance
        20 nexmark
        20 features
        20 beamevents
        19 GCP
        19 dsl_sql_review
        16 types
        16 pipeline-patterns
        15 pull-request-available
        15 ccoss2019
        15 bigquery
        14 test
        14 simple
        14 beam-site-automation-reliability
        13 mentor
        13 gcp
        13 flaky
        12 sdk-consistency
        12 gsoc
        12 beam
        11 website-revamp-sprint-8
        10 java
         9 website-revamp-sprint-5
         9 schema-io
         8 website-revamp-sprint-4
         8 usability
         7 test-failure
         7 patch
         7 JdbcIO
         6 Windowing
         6 security
         6 mongodb
         6 KafkaIO
         6 infra
         6 gsoc2020
         6 gsoc2019
         6 flink
         6 build
         6 backwards-incompatible
         5 website-revamp-sprint-9
         5 website-revamp-sprint-6
         5 website-revamp-sprint-10
         5 Triggers
         5 State
         5 outreachy19dec
         5 jenkins
         5 google-cloud-spanner
         5 easy-fix
         5 datastore
         4 windowing
         4 website-revamp-sprint-3
         4 thrift
         4 SQL
         4 Python
         4 MongoDB
         4 java11
         4 infrastructure
         4 FlinkRunner
         4 community-metrics
         4 beam-website-sprint-2
         4 beamsummitsponsor
         4 apache-beam
         3 website-revamp-sprint-12
         3 website-revamp-sprint-11
         3 testing
         3 sql
         3 spark
         3 python-wheel
         3 PubSubIO
         3 pubsubio
         3 pubsub
         3 portable-metrics-bugs
         3 P2
         3 noob
         3 metrics
         3 maven
         3 kafka
         3 jdbc
         3 intellij
         3 GSoC2019
         3 google-dataflow
         3 document
         3 community-onboarding
         3 cloud
         3 CI
         3 bundle
         3 bug
         3 azureblob
         2 website-revamp-sprint-7
         2 watermark
         2 test-failures
         2 test-fail
         2 Starter
         2 split
         2 spark-runner
         2 schema
         2 release
         2 regression
         2 reference
         2 python-packages
         2 PubsubLiteIO
         2 perfomance
         2 parallel-deployment
         2 MySQL
         2 MQTT
         2 mitigated
         2 join
         2 Jenkins
         2 javadoc
         2 Java8
         2 Java11
         2 IO
         2 has-pr
         2 gsod2019
         2 gsod
         2 gsoc2018
         2 gsoc2017
         2 google-cloud-bigquery
         2 golang
         2 gcs
         2 documentaion
         2 docker
         2 dataflow-runner-v2
         2 containers
         2 cassandra
         2 buid
         2 blocking-postcommit
         2 bigdata
         2 azure
         2 AWS
         1 www
         1 windows
         1 web
         1 Watermark
         1 vulnerabilities
         1 Update
         1 typo
         1 Triaged
         1 TFX+Beam
         1 text
         1 tests
         1 test-patch
         1 testlabel
         1 test-infra
         1 test-framework
         1 tensorflow-datasets
         1 tensorflow
         1 T5
         1 streaming
         1 storage
         1 starer
         1 SSLException
         1 SQS
         1 sql-engine
         1 spring-boot
         1 spotbugs
         1 spark-streaming
         1 sparkrunner
         1 spam
         1 Snappy
         1 SLF4J
         1 sideinput
         1 shade
         1 SESSION
         1 session
         1 serialization
         1 serializable
         1 sdk-py-core
         1 sdk
         1 savepoints
         1 S3
         1 runner
         1 restful
         1 requirements
         1 rabbitmq
         1 quickstart
         1 python-sqltransform
         1 python-conversion
         1 python3
         1 precommit
         1 pom.xml
         1 Periodic
         1 Parquet
         1 parquet
         1 parameter
         1 P3
         1 p2
         1 p1
         1 oracle
         1 OOM
         1 on-hold
         1 offset
         1 Novice
         1 node.js
         1 newbie,
         1 n00b
         1 multi-threading
         1 mongo
         1 low-hanging-fruit
         1 logging,
         1 log-aggregation
         1 log4j
         1 log
         1 Learning
         1 label123
         1 kubernetes
         1 kotlin
         1 kafkaio
         1 jdbc_connector
         1 JavaDoc
         1 java9
         1 Java
         1 I/O
         1 hash
         1 Guava
         1 gsoc2021
         1 Grouping
         1 gradle-wrapper
         1 google-cloud-dataflow
         1 google
         1 github
         1 Flink
         1 flakey
         1 file-component
         1 fieldtype
         1 feature-request
         1 failed-test
         1 experimental
         1 examples
         1 eos
         1 elasticsearch
         1 Eclipse
         1 EaseOfUse
         1 duplicate
         1 Documentation
         1 docuentation
         1 doc_cleanup
         1 Doc
         1 doc
         1 dependencies
         1 cross-platform
         1 Couchbase
         1 contribution-guide
         1 compile-error
         1 codehealth
         1 ClassNotFoundException
         1 ClassCastException
         1 CI/CD
         1 ci-builds
         1 ci
         1 calcite
         1 C4
         1 c
         1 blog
         1 blocking
         1 Bigtable
         1 aws-s3
         1 aws
         1 auth
         1 apex-runner
         1 apache
         1 annotation
         1 2.2.0
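
Many of the entries above differ only in case (for example 'Python' and
'python', or 'GCP' and 'gcp'). A rough sketch of how the counts could be
merged case-insensitively, assuming the labels arrive one per line on stdin,
as they do just before the "sort | uniq -c" stage of the script above. The
function name count_labels_ci is hypothetical, and misspellings such as
'documentaion' would still need to be mapped by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count labels case-insensitively.
# Reads one label per line on stdin and prints "<count> <label>" lines,
# most frequent first.
count_labels_ci() {
  tr '[:upper:]' '[:lower:]' |       # fold case so Python and python merge
    grep -v '^[[:space:]]*$' |       # drop empty lines
    sort |
    uniq -c |
    sort -k1,1nr -k2 |               # by count descending, then by label
    awk '{print $1, $2}'             # strip the padding added by uniq -c
}

# Example with a handful of labels from the list above.
printf '%s\n' Python python GCP gcp gcp flake | count_labels_ci
```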


On Thu, May 13, 2021 at 12:08 PM Kyle Weaver <kc...@google.com> wrote:

> It's a little cumbersome, but you can query JIRA and export a CSV with the
> labels, and run a script to count them. Also, it won't let you export
> results from a query with more than 1000 results.
>
> Here's the list from query "project = beam and created > startOfYear()"
>
> dataframe-api 45
> stale-P2 36
> currently-failing 31
> stale-assigned 31
> website-revamp-2020 28
> flake 27
> zetasql-java-udf 17
> portability-spark 6
> portability-flink 4
> test-failure 4
> starter 4
> MongoDB 3
> Python 3
> PubSubIO 3
> GCP 3
> pipeline-patterns 3
> newbie 2
> python 2
> PubsubLiteIO 2
> beam-fixit 2
> vulnerabilities 1
> documentation 1
> containers 1
> types 1
> mongo 1
> mongodb 1
> elasticsearch 1
> dataflow 1
> java 1
> Grouping 1
> Windowing 1
> Doc 1
> Learning 1
> ClassNotFoundException 1
> jdbc 1
> gcp 1
> pubsub 1
> pubsubio 1
> apache-beam 1
> ClassCastException 1
> JdbcIO 1
> MySQL 1
> easyfix 1
> pull-request-available 1
> gsoc 1
> gsoc2021 1
> mentor 1
> python-sqltransform 1
> OOM 1
> AWS 1
> multi-threading 1
> S3 1
> log4j 1
> log-aggregation 1
> "logging 1
> " 1
> SLF4J 1
> google-cloud-spanner 1
> kafka 1
> savepoints 1
> flaky-test 1
> website-revamp-sprint-12 1
> structured-streaming 1
> nexmark 1
>
>
> On Wed, May 12, 2021 at 3:10 PM Valentyn Tymofieiev <va...@google.com>
> wrote:
>
>> Is there a way to see the list of labels used in Beam? I found a
>> discussion on using a labels gadget and some SQL queries to pull the
>> labels[1], but did not find a way to use them. Does anyone have hands-on
>> experience with any of these approaches? Does adding a gadget require PMC
>> privileges?
>>
>> Thanks!
>>
>> [1]
>> https://community.atlassian.com/t5/Jira-questions/Is-there-a-way-to-get-a-list-of-all-labels-being-used-in-a/qaq-p/344778
>>
>> On Mon, Mar 29, 2021 at 10:59 AM Kenneth Knowles <ke...@apache.org> wrote:
>>
>>> We are down to about 550.
>>>
>>> I randomly selected some long-time contributors who I am sure know about
>>> components and priorities well enough. There are 10-15 issues across a
>>> number of people. If these are already good, then it would close out a lot
>>> of them and help focus on the ones that need attention.
>>>
>>> This Jira search searches by "current user" so you should see the bugs
>>> that you have reported that are still marked as "Triage Needed". Take a
>>> quick look and if you are confident you got the components, priority,
>>> labels (especially "currently-failing" and "flake") then you could bulk
>>> edit them to "Open" status:
>>>
>>>
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20%3D%20%22Triage%20Needed%22%20AND%20reporter%20in%20(currentUser())
>>>
>>> Kenn
>>>
>>> On Mon, Mar 15, 2021 at 10:28 AM Tyson Hamilton <ty...@google.com>
>>> wrote:
>>>
>>>> There is a 'Triaged' button that I click:
>>>> https://photos.app.goo.gl/Ub5Qwnpp6aFrmaDZ9
>>>>
>>>> On Mon, Mar 15, 2021 at 9:48 AM Alex Amato <aj...@google.com> wrote:
>>>>
>>>>> (Do I need certain permissions to be able to do this?)
>>>>>
>>>>> On Mon, Mar 15, 2021 at 9:47 AM Alex Amato <aj...@google.com> wrote:
>>>>>
>>>>>> Would you mind posting a screenshot of exactly where you are supposed
>>>>>> to click to move a Jira issue to "Open" status? I honestly can't find where
>>>>>> to click. I don't see the option in the edit dialog box.
>>>>>>
>>>>>> On Sun, Mar 14, 2021 at 8:03 PM Kenneth Knowles <ke...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> No need for feeling any guilt :-)
>>>>>>>
>>>>>>> I'm just hoping that by everyone randomly doing a very small amount
>>>>>>> of work, this could be in good shape very quickly. I've done a number of
>>>>>>> bulk edits, such as for automated dependency upgrade requests, which
>>>>>>> bring the number down to just over 600.
>>>>>>>
>>>>>>> Your message does highlight some easy cases: issues filed to track
>>>>>>> your own feature work. I did build automation for this: "On Issue Created"
>>>>>>> -> "If Assignee == Issue Creator" -> "Transition to 'Open'". If the
>>>>>>> automation isn't working, that can probably be fixed. Some of the issues
>>>>>>> might just predate the automation.
>>>>>>>
>>>>>>> To be super clear: I don't mean to ask anyone to waste time looking
>>>>>>> at things that don't need attention, but to be able to notice things that
>>>>>>> do need attention. I did a few manually too, and the components, issue
>>>>>>> type, and priority very often need fixing up. I especially want to get
>>>>>>> untriaged P0s and P1s to zero.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Fri, Mar 12, 2021 at 5:07 PM Tyson Hamilton <ty...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm guilty of creating issues and not moving them to 'open'. I'll
>>>>>>>> do better to move them to open in the future. To recompense I will spend
>>>>>>>> some additional time triaging =)
>>>>>>>>
>>>>>>>> Thanks for the review of the flow.
>>>>>>>>
>>>>>>>> On Thu, Mar 11, 2021 at 12:39 PM Kenneth Knowles <ke...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> You may or may not think about this very often, but our Jira
>>>>>>>>> workflow goes like this:
>>>>>>>>>
>>>>>>>>> Needs Triage --> Open --> In Progress --> Resolved
>>>>>>>>>
>>>>>>>>> "Needs Triage" means someone needs to look at it briefly:
>>>>>>>>>
>>>>>>>>>  - component(s)
>>>>>>>>>  - label(s)
>>>>>>>>>  - issue type
>>>>>>>>>  - priority (see
>>>>>>>>> https://beam.apache.org/contribute/jira-priorities/)
>>>>>>>>>  - if appropriate, ping someone or write to dev@ especially for
>>>>>>>>> P1 and P0
>>>>>>>>>
>>>>>>>>> Then transition the issue to "Open".
>>>>>>>>>
>>>>>>>>> Currently there is a big backlog but I don't think it is actually
>>>>>>>>> accurate. I also think we have enough people to keep up with this and even
>>>>>>>>> to eliminate the backlog pretty quick.
>>>>>>>>>
>>>>>>>>> Here are some things you can do when you are waiting for Jenkins
>>>>>>>>> tests to complete:
>>>>>>>>>
>>>>>>>>>  - check your assigned issues
>>>>>>>>>  - open up this filter and triage a couple issues at random:
>>>>>>>>> https://issues.apache.org/jira/issues/?filter=12345682
>>>>>>>>>
>>>>>>>>> 800+ may seem like a lot, but dev@ had 65 participants in the
>>>>>>>>> last 28 days (126 participants in the last 3 months). I would guess it
>>>>>>>>> averages less than a minute per issue so this could be done in less than a
>>>>>>>>> day, especially considering our CI times :-)
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>>

Re: Please help triage issues!

Posted by Valentyn Tymofieiev <va...@google.com>.
Re: using filters to query several labels:

I created a filter [1] that collects open issues carrying any of the various
'flake'-related labels. The query should be editable by committers. The
filter is accessible via the short link http://s.apache.org/beam-flakes

There is already a filter for starter tasks[2], accessible via
http://s.apache.org/beam-starter-tasks. That filter is currently not editable
but captures the currently open starter issues. A few other labels (simple,
noob, n00b, novice, Learning) are not included in this filter, but every
issue with one of those labels either also has a label that is in the query
or is fixed now.

Finding all relevant filters/hotlists is tricky: there appear to be at
least three different ways, each returning some unique results:

- Filters with the word 'Beam' [3]
- Filters shared with 'Project' 'Beam' [4]
- Filters shared with 'Group' beam[5]

The links in [3]-[5] are not readily copyable from the Jira URLs (I had to
fish out the form IDs).
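
For reference, the search links [3]-[5] below differ only in a few query
parameters, so they can be assembled by hand. The following is a sketch under
that assumption, with the parameter names copied from those three URLs. The
function name beam_filter_search_url is made up, values are not URL-encoded,
and other parameter combinations have not been checked against Jira.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Jira "Manage Filters" search URL.
# Usage: beam_filter_search_url name NAME     (filters whose name matches NAME)
#        beam_filter_search_url project ID    (filters shared with project ID)
#        beam_filter_search_url group NAME    (filters shared with group NAME)
beam_filter_search_url() {
  local base='https://issues.apache.org/jira/secure/ManageFilters.jspa'
  local suffix='roleShare=&returnUrl=ManageFilters.jspa&Search=Search'
  case "$1" in
    name)
      printf '%s?search=search&searchName=%s&searchShareType=any&%s\n' \
        "$base" "$2" "$suffix" ;;
    project)
      printf '%s?search=search&searchName=&searchShareType=project&projectShare=%s&%s&filterView=search\n' \
        "$base" "$2" "$suffix" ;;
    group)
      printf '%s?search=search&searchName=&searchShareType=group&groupShare=%s&%s&filterView=search\n' \
        "$base" "$2" "$suffix" ;;
    *)
      echo "usage: beam_filter_search_url name|project|group VALUE" >&2
      return 1 ;;
  esac
}

# Example: reproduces the group search link [5].
beam_filter_search_url group beam
```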

As a best practice, perhaps we should include the word 'Beam' in filter
names, given that filter names live in a global namespace, and make queries
editable by Beam committers. Then we could use a shortlink
http://s.apache.org/beam-filters (new link pointing to [5]) to look up
relevant filters, and modify queries as necessary.

To edit filter settings one can use a link like [6].

[1] https://issues.apache.org/jira/issues/?filter=12350929
[2] https://issues.apache.org/jira/issues/?filter=12343676
[3]
https://issues.apache.org/jira/secure/ManageFilters.jspa?search=search&searchName=Beam&searchShareType=any&roleShare=&returnUrl=ManageFilters.jspa&Search=Search
[4]
https://issues.apache.org/jira/secure/ManageFilters.jspa?search=search&searchName=&searchShareType=project&projectShare=12319527&roleShare=&returnUrl=ManageFilters.jspa&Search=Search&filterView=search
[5]
https://issues.apache.org/jira/secure/ManageFilters.jspa?search=search&searchName=&searchShareType=group&groupShare=beam&roleShare=&returnUrl=ManageFilters.jspa&Search=Search&filterView=search
[6]
https://issues.apache.org/jira/secure/EditFilter!default.jspa?filterId=12349495

On Wed, Sep 22, 2021 at 7:12 PM Valentyn Tymofieiev <va...@google.com>
wrote:

> Thanks, Kyle!
>
> Since adding labels is a part of possible triaging action, it would be
> good to increase their usefulness.
>
> Possible ideas:
> - Add a triaging recommendation page that shows commonly used labels that
> still make sense to use, add few word recommendation on when to use them if
> not obvious. It would be nice if this page was easily discoverable when
> users are looking for how to report issues.
> - Given that labels are arbitrary and we don't control them, we should
> create filters[1] and Kanban boards[2], which we control, and use them to
> look up triaged issues in the future. Filters can pull information from
> multiple labels. For example the flaky test filter[3] should pull issues
> with labels 'flake', 'flaky', 'flaky-test', 'flakey'... We may need to
> update filters periodically when synonymous labels start to appear.
> - Some filters/boards I'd like to see: issues suitable for a first-time
> PR,  ease-of-use issues, filters for medium-size self-contained projects
> suitable for new contributors who are ramping up on a certain area of Beam.
> - Use components instead of labels when a identical component is available.
> - Maybe cleanup some of the labels if no longer useful. perhaps there is
> an automated way to do that easily.
>
> [1]
> https://issues.apache.org/jira/secure/ManageFilters.jspa?filterView=search
> [2] https://issues.apache.org/jira/secure/ManageRapidViews.jspa
> [3]
> https://issues.apache.org/jira/secure/RapidView.jspa?rapidView=464&tab=filter
>
> Here are some more labels I collected among all open and closed issues.
>
>
>
>    1. :jira$ cat ./get_all_labels.sh
>    2. for file_column in $(
>    3.   for file in *.csv; do
>    4.     cat $file | head -n1 | awk -F ',' -v fname="$file" '{for(c=1;c<=NF;c++) if($c=="Labels") print fname "," c }' ;
>    5.   done
>    6.   );
>    7. do
>    8.   file=$(echo $file_column | cut -d',' -f1);
>    9.   column=$(echo $file_column | cut -d',' -f2);
>    10.   # cat $file | tail -n +2 | awk -F  "\"*,\"*" '{print $'$colname'}'
>    11.   csvtool format '%('$column')\n' $file | tail -n +2
>    12. done  | grep -v "^\s*$" | sort | uniq -c | sort -h -r
>    13.
>    14.
>    15. :jira$ ./get_all_labels.sh
>    16.    1166 Done
>    17.     530 Clarified
>    18.     382 starter
>    19.     332 portability
>    20.     252 flake
>    21.     222 stale-assigned
>    22.     194 newbie
>    23.     185 currently-failing
>    24.     154 stale-P2
>    25.     128 portability-spark
>    26.     122 website-revamp-2020
>    27.     115 beam-fixit
>    28.     113 portability-flink
>    29.     111 backward-incompatible
>    30.      71 dataframe-api
>    31.      69 dsl_sql_merge
>    32.      60 easyfix
>    33.      49 triaged
>    34.      48 errorprone
>    35.      47 beginner
>    36.      39 documentation
>    37.      35 zetasql-compliance
>    38.      31 python
>    39.      31 flaky-test
>    40.      30 sickbay
>    41.      30 portability-samza
>    42.      28 gradle
>    43.      25 structured-streaming
>    44.      24 BeamSummitEU2019
>    45.      24 beamsummit
>    46.      22 zetasql-java-udf
>    47.      22 findbugs
>    48.      22 easy
>    49.      22 BeamSummitWebsite
>    50.      21 io
>    51.      21 dataflow
>    52.      20 performance
>    53.      20 nexmark
>    54.      20 features
>    55.      20 beamevents
>    56.      19 GCP
>    57.      19 dsl_sql_review
>    58.      16 types
>    59.      16 pipeline-patterns
>    60.      15 pull-request-available
>    61.      15 ccoss2019
>    62.      15 bigquery
>    63.      14 test
>    64.      14 simple
>    65.      14 beam-site-automation-reliability
>    66.      13 mentor
>    67.      13 gcp
>    68.      13 flaky
>    69.      12 sdk-consistency
>    70.      12 gsoc
>    71.      12 beam
>    72.      11 website-revamp-sprint-8
>    73.      10 java
>    74.       9 website-revamp-sprint-5
>    75.       9 schema-io
>    76.       8 website-revamp-sprint-4
>    77.       8 usability
>    78.       7 test-failure
>    79.       7 patch
>    80.       7 JdbcIO
>    81.       6 Windowing
>    82.       6 security
>    83.       6 mongodb
>    84.       6 KafkaIO
>    85.       6 infra
>    86.       6 gsoc2020
>    87.       6 gsoc2019
>    88.       6 flink
>    89.       6 build
>    90.       6 backwards-incompatible
>    91.       5 website-revamp-sprint-9
>    92.       5 website-revamp-sprint-6
>    93.       5 website-revamp-sprint-10
>    94.       5 Triggers
>    95.       5 State
>    96.       5 outreachy19dec
>    97.       5 jenkins
>    98.       5 google-cloud-spanner
>    99.       5 easy-fix
>    100.       5 datastore
>    101.       4 windowing
>    102.       4 website-revamp-sprint-3
>    103.       4 thrift
>    104.       4 SQL
>    105.       4 Python
>    106.       4 MongoDB
>    107.       4 java11
>    108.       4 infrastructure
>    109.       4 FlinkRunner
>    110.       4 community-metrics
>    111.       4 beam-website-sprint-2
>    112.       4 beamsummitsponsor
>    113.       4 apache-beam
>    114.       3 website-revamp-sprint-12
>    115.       3 website-revamp-sprint-11
>    116.       3 testing
>    117.       3 sql
>    118.       3 spark
>    119.       3 python-wheel
>    120.       3 PubSubIO
>    121.       3 pubsubio
>    122.       3 pubsub
>    123.       3 portable-metrics-bugs
>    124.       3 P2
>    125.       3 noob
>    126.       3 metrics
>    127.       3 maven
>    128.       3 kafka
>    129.       3 jdbc
>    130.       3 intellij
>    131.       3 GSoC2019
>    132.       3 google-dataflow
>    133.       3 document
>    134.       3 community-onboarding
>    135.       3 cloud
>    136.       3 CI
>    137.       3 bundle
>    138.       3 bug
>    139.       3 azureblob
>    140.       2 website-revamp-sprint-7
>    141.       2 watermark
>    142.       2 test-failures
>    143.       2 test-fail
>    144.       2 Starter
>    145.       2 split
>    146.       2 spark-runner
>    147.       2 schema
>    148.       2 release
>    149.       2 regression
>    150.       2 reference
>    151.       2 python-packages
>    152.       2 PubsubLiteIO
>    153.       2 perfomance
>    154.       2 parallel-deployment
>    155.       2 MySQL
>    156.       2 MQTT
>    157.       2 mitigated
>    158.       2 join
>    159.       2 Jenkins
>    160.       2 javadoc
>    161.       2 Java8
>    162.       2 Java11
>    163.       2 IO
>    164.       2 has-pr
>    165.       2 gsod2019
>    166.       2 gsod
>    167.       2 gsoc2018
>    168.       2 gsoc2017
>    169.       2 google-cloud-bigquery
>    170.       2 golang
>    171.       2 gcs
>    172.       2 documentaion
>    173.       2 docker
>    174.       2 dataflow-runner-v2
>    175.       2 containers
>    176.       2 cassandra
>    177.       2 buid
>    178.       2 blocking-postcommit
>    179.       2 bigdata
>    180.       2 azure
>    181.       2 AWS
>    182.       1 www
>    183.       1 windows
>    184.       1 web
>    185.       1 Watermark
>    186.       1 vulnerabilities
>    187.       1 Update
>    188.       1 typo
>    189.       1 Triaged
>    190.       1 TFX+Beam
>    191.       1 text
>    192.       1 tests
>    193.       1 test-patch
>    194.       1 testlabel
>    195.       1 test-infra
>    196.       1 test-framework
>    197.       1 tensorflow-datasets
>    198.       1 tensorflow
>    199.       1 T5
>    200.       1 streaming
>    201.       1 storage
>    202.       1 starer
>    203.       1 SSLException
>    204.       1 SQS
>    205.       1 sql-engine
>    206.       1 spring-boot
>    207.       1 spotbugs
>    208.       1 spark-streaming
>    209.       1 sparkrunner
>    210.       1 spam
>    211.       1 Snappy
>    212.       1 SLF4J
>    213.       1 sideinput
>    214.       1 shade
>    215.       1 SESSION
>    216.       1 session
>    217.       1 serialization
>    218.       1 serializable
>    219.       1 sdk-py-core
>    220.       1 sdk
>    221.       1 savepoints
>    222.       1 S3
>    223.       1 runner
>    224.       1 restful
>    225.       1 requirements
>    226.       1 rabbitmq
>    227.       1 quickstart
>    228.       1 python-sqltransform
>    229.       1 python-conversion
>    230.       1 python3
>    231.       1 precommit
>    232.       1 pom.xml
>    233.       1 Periodic
>    234.       1 Parquet
>    235.       1 parquet
>    236.       1 parameter
>    237.       1 P3
>    238.       1 p2
>    239.       1 p1
>    240.       1 oracle
>    241.       1 OOM
>    242.       1 on-hold
>    243.       1 offset
>    244.       1 Novice
>    245.       1 node.js
>    246.       1 newbie,
>    247.       1 n00b
>    248.       1 multi-threading
>    249.       1 mongo
>    250.       1 low-hanging-fruit
>    251.       1 logging,
>    252.       1 log-aggregation
>    253.       1 log4j
>    254.       1 log
>    255.       1 Learning
>    256.       1 label123
>    257.       1 kubernetes
>    258.       1 kotlin
>    259.       1 kafkaio
>    260.       1 jdbc_connector
>    261.       1 JavaDoc
>    262.       1 java9
>    263.       1 Java
>    264.       1 I/O
>    265.       1 hash
>    266.       1 Guava
>    267.       1 gsoc2021
>    268.       1 Grouping
>    269.       1 gradle-wrapper
>    270.       1 google-cloud-dataflow
>    271.       1 google
>    272.       1 github
>    273.       1 Flink
>    274.       1 flakey
>    275.       1 file-component
>    276.       1 fieldtype
>    277.       1 feature-request
>    278.       1 failed-test
>    279.       1 experimental
>    280.       1 examples
>    281.       1 eos
>    282.       1 elasticsearch
>    283.       1 Eclipse
>    284.       1 EaseOfUse
>    285.       1 duplicate
>    286.       1 Documentation
>    287.       1 docuentation
>    288.       1 doc_cleanup
>    289.       1 Doc
>    290.       1 doc
>    291.       1 dependencies
>    292.       1 cross-platform
>    293.       1 Couchbase
>    294.       1 contribution-guide
>    295.       1 compile-error
>    296.       1 codehealth
>    297.       1 ClassNotFoundException
>    298.       1 ClassCastException
>    299.       1 CI/CD
>    300.       1 ci-builds
>    301.       1 ci
>    302.       1 calcite
>    303.       1 C4
>    304.       1 c
>    305.       1 blog
>    306.       1 blocking
>    307.       1 Bigtable
>    308.       1 aws-s3
>    309.       1 aws
>    310.       1 auth
>    311.       1 apex-runner
>    312.       1 apache
>    313.       1 annotation
>    314.       1 2.2.0
>
>
> On Thu, May 13, 2021 at 12:08 PM Kyle Weaver <kc...@google.com> wrote:
>
>> It's a little cumbersome, but you can query JIRA and export a CSV with
>> the labels, and run a script to count them. Also, it won't let you export
>> results from a query with more than 1000 results.
>>
>> Here's the list from query "project = beam and created > startOfYear()"
>>
>> dataframe-api 45
>> stale-P2 36
>> currently-failing 31
>> stale-assigned 31
>> website-revamp-2020 28
>> flake 27
>> zetasql-java-udf 17
>> portability-spark 6
>> portability-flink 4
>> test-failure 4
>> starter 4
>> MongoDB 3
>> Python 3
>> PubSubIO 3
>> GCP 3
>> pipeline-patterns 3
>> newbie 2
>> python 2
>> PubsubLiteIO 2
>> beam-fixit 2
>> vulnerabilities 1
>> documentation 1
>> containers 1
>> types 1
>> mongo 1
>> mongodb 1
>> elasticsearch 1
>> dataflow 1
>> java 1
>> Grouping 1
>> Windowing 1
>> Doc 1
>> Learning 1
>> ClassNotFoundException 1
>> jdbc 1
>> gcp 1
>> pubsub 1
>> pubsubio 1
>> apache-beam 1
>> ClassCastException 1
>> JdbcIO 1
>> MySQL 1
>> easyfix 1
>> pull-request-available 1
>> gsoc 1
>> gsoc2021 1
>> mentor 1
>> python-sqltransform 1
>> OOM 1
>> AWS 1
>> multi-threading 1
>> S3 1
>> log4j 1
>> log-aggregation 1
>> "logging 1
>> " 1
>> SLF4J 1
>> google-cloud-spanner 1
>> kafka 1
>> savepoints 1
>> flaky-test 1
>> website-revamp-sprint-12 1
>> structured-streaming 1
>> nexmark 1
>>
>>
>> On Wed, May 12, 2021 at 3:10 PM Valentyn Tymofieiev <va...@google.com>
>> wrote:
>>
>>> Is there a way to see the list of labels used in Beam ? I found a
>>> discussion on using labels gadget and some SQL queries to pull the
>>> labels[1], but did not find a way to use them  - does anyone have hands-on
>>> experience with any of these approaches? Does adding a gadget require PMC
>>> privileges?
>>>
>>> Thanks!
>>>
>>> [1]
>>> https://community.atlassian.com/t5/Jira-questions/Is-there-a-way-to-get-a-list-of-all-labels-being-used-in-a/qaq-p/344778
>>>
>>> On Mon, Mar 29, 2021 at 10:59 AM Kenneth Knowles <ke...@apache.org>
>>> wrote:
>>>
>>>> We are down to about 550.
>>>>
>>>> I randomly selected some long-time contributors who I am sure know
>>>> about components and priorities well enough. There are 10-15 issues across
>>>> a number of people. If these are already good, then it would close out a
>>>> lot of them and help focus on the ones that need attention.
>>>>
>>>> This Jira search searches by "current user" so you should see the bugs
>>>> that you have reported that are still marked as "Triage Needed". Take a
>>>> quick look and if you are confident you got the components, priority,
>>>> labels (especially "currently-failing" and "flake") then you could bulk
>>>> edit them to "Open" status:
>>>>
>>>>
>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20status%20%3D%20%22Triage%20Needed%22%20AND%20reporter%20in%20(currentUser())
>>>>
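
For readers who want to adapt the search link above, here is a small sketch of how the URL can be rebuilt from its JQL text with the Python standard library. The JQL string is simply the decoded form of the link in the message; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

BASE = "https://issues.apache.org/jira/issues/?jql="

# Decoded form of the query in the link above.
jql = 'project = BEAM AND status = "Triage Needed" AND reporter in (currentUser())'

# Percent-encode the query; parentheses are kept literal to match the link.
url = BASE + quote(jql, safe="()")
print(url)

# Decoding the encoded part gives back the original JQL.
print(unquote(url[len(BASE):]))
```

Changing the reporter clause (for example to a component filter) and re-encoding gives a link for a different slice of the backlog.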
>>>> Kenn
>>>>
>>>> On Mon, Mar 15, 2021 at 10:28 AM Tyson Hamilton <ty...@google.com>
>>>> wrote:
>>>>
>>>>> There is a 'Triaged' button that I click:
>>>>> https://photos.app.goo.gl/Ub5Qwnpp6aFrmaDZ9
>>>>>
>>>>> On Mon, Mar 15, 2021 at 9:48 AM Alex Amato <aj...@google.com> wrote:
>>>>>
>>>>>> (Do I need certain permissions to be able to do this?)
>>>>>>
>>>>>> On Mon, Mar 15, 2021 at 9:47 AM Alex Amato <aj...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Would you mind posting a screenshot of exactly where you are
>>>>>>> supposed to click to move a jira issue to "Open" status? I honestly can't
>>>>>>> find where to click. I don't see the option in the edit dialog box.
>>>>>>>
>>>>>>> On Sun, Mar 14, 2021 at 8:03 PM Kenneth Knowles <ke...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> No need for feeling any guilt :-)
>>>>>>>>
>>>>>>>> I'm just hoping that by everyone randomly doing a very small amount
>>>>>>>> of work, this could be in good shape very quickly. I've done a number of
>>>>>>>> bulk edits like automated dependency upgrade requests which brings the
>>>>>>>> number down to just over 600.
>>>>>>>>
>>>>>>>> Your message does highlight some easy cases: issues filed to track
>>>>>>>> your own feature work. I did build automation for this: "On Issue Created"
>>>>>>>> -> "If Assignee == Issue Creator" -> "Transition to 'Open'". If the
>>>>>>>> automation isn't working, that can probably be fixed. Some of the issues
>>>>>>>> might just predate the automation.
>>>>>>>>
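
The automation rule described above ("On Issue Created" -> "If Assignee == Issue Creator" -> "Transition to 'Open'") can be pictured with the sketch below. It is only a model of the rule's logic: the Issue class and the on_issue_created function are hypothetical names, and nothing here calls Jira or reflects how the rule is actually configured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Issue:
    """Hypothetical stand-in for a Jira issue; not a Jira API type."""
    reporter: str
    assignee: Optional[str] = None
    status: str = "Triage Needed"


def on_issue_created(issue: Issue) -> Issue:
    """Model of the rule: self-assigned new issues skip triage."""
    if issue.assignee is not None and issue.assignee == issue.reporter:
        issue.status = "Open"
    return issue


self_assigned = on_issue_created(Issue(reporter="alice", assignee="alice"))
unassigned = on_issue_created(Issue(reporter="bob"))
print(self_assigned.status)
print(unassigned.status)
```

Under this model, issues created before the rule existed never pass through on_issue_created, which is consistent with the remark that some issues might predate the automation.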
>>>>>>>> To be super clear: I don't mean to ask anyone to waste time looking
>>>>>>>> at things that don't need attention, but to be able to notice things that
>>>>>>>> do need attention. I did a few manually too, and the components, issue
>>>>>>>> type, and priority very often need fixing up. I especially want to get
>>>>>>>> untriaged P0s and P1s to zero.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Fri, Mar 12, 2021 at 5:07 PM Tyson Hamilton <ty...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I'm guilty of creating issues and not moving them to 'open'. I'll
>>>>>>>>> do better to move them to open in the future. To recompense I will spend
>>>>>>>>> some additional time triaging =)
>>>>>>>>>
>>>>>>>>> Thanks for the review of the flow.
>>>>>>>>>
>>>>>>>>> On Thu, Mar 11, 2021 at 12:39 PM Kenneth Knowles <ke...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> You may or may not think about this very often, but our Jira
>>>>>>>>>> workflow goes like this:
>>>>>>>>>>
>>>>>>>>>> Triage Needed --> Open --> In Progress --> Resolved
>>>>>>>>>>
>>>>>>>>>> "Triage Needed" means someone needs to look at it briefly:
>>>>>>>>>>
>>>>>>>>>>  - component(s)
>>>>>>>>>>  - label(s)
>>>>>>>>>>  - issue type
>>>>>>>>>>  - priority (see
>>>>>>>>>> https://beam.apache.org/contribute/jira-priorities/)
>>>>>>>>>>  - if appropriate, ping someone or write to dev@ especially for
>>>>>>>>>> P1 and P0
>>>>>>>>>>
>>>>>>>>>> Then transition the issue to "Open".
>>>>>>>>>>
>>>>>>>>>> Currently there is a big backlog but I don't think it is actually
>>>>>>>>>> accurate. I also think we have enough people to keep up with this and even
>>>>>>>>>> to eliminate the backlog pretty quickly.
>>>>>>>>>>
>>>>>>>>>> Here are some things you can do when you are waiting for Jenkins
>>>>>>>>>> tests to complete:
>>>>>>>>>>
>>>>>>>>>>  - check your assigned issues
>>>>>>>>>>  - open up this filter and triage a couple issues at random:
>>>>>>>>>> https://issues.apache.org/jira/issues/?filter=12345682
>>>>>>>>>>
>>>>>>>>>> 800+ may seem like a lot, but dev@ had 65 participants in the
>>>>>>>>>> last 28 days (126 participants in the last 3 months). I would guess it
>>>>>>>>>> averages less than a minute per issue so this could be done in less than a
>>>>>>>>>> day, especially considering our CI times :-)
>>>>>>>>>>
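
The estimate above can be checked with a few lines of arithmetic. The figures 800, 65 and 126 come from the message; treating "less than a minute per issue" as exactly one minute, and assuming the work is split evenly across participants, are simplifications made for this sketch.

```python
backlog = 800            # untriaged issues ("800+")
participants_28d = 65    # dev@ participants in the last 28 days
participants_3mo = 126   # dev@ participants in the last 3 months
minutes_per_issue = 1.0  # assumed upper bound per issue

# Issues each person would need to triage if the work were split evenly.
per_person_28d = backlog / participants_28d
per_person_3mo = backlog / participants_3mo

# Total effort if a single person did everything.
total_hours = backlog * minutes_per_issue / 60

print(f"{per_person_28d:.1f} issues each across 65 people")
print(f"{per_person_3mo:.1f} issues each across 126 people")
print(f"{total_hours:.1f} person-hours in total")
```

With these assumptions each of the 65 recent participants would spend roughly a quarter of an hour, which fits the claim that the backlog could be cleared in less than a day.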
>>>>>>>>>> Kenn
>>>>>>>>>>
>>>>>>>>>>