Posted to dev@beam.apache.org by be...@gmail.com on 2022/06/22 10:02:30 UTC

P1 issues report (70)

This is your daily summary of Beam's current P1 issues, not including flaky tests.

    See https://beam.apache.org/contribute/issue-priorities/#p1-critical for the meaning and expectations around P1 issues.



https://api.github.com/repos/apache/beam/issues/21978: [Playground] Implement Share Any Code feature on the frontend
https://api.github.com/repos/apache/beam/issues/21946: [Bug]: No way to read or write to file when running Beam in Flink
https://api.github.com/repos/apache/beam/issues/21935: [Bug]: Reject illformed GBK Coders
https://api.github.com/repos/apache/beam/issues/21897: [Feature Request]: Flink runner savepoint backward compatibility 
https://api.github.com/repos/apache/beam/issues/21893: [Bug]: BigQuery Storage Write API implementation does not support table partitioning
https://api.github.com/repos/apache/beam/issues/21794: Dataflow runner creates a new timer whenever the output timestamp is change
https://api.github.com/repos/apache/beam/issues/21763: [Playground Task]: Migrate from Google Analytics to Matomo Cloud
https://api.github.com/repos/apache/beam/issues/21715: Data missing when using CassandraIO.Read
https://api.github.com/repos/apache/beam/issues/21713: 404s in BigQueryIO don't get output to Failed Inserts PCollection
https://api.github.com/repos/apache/beam/issues/21711: Python Streaming job failing to drain with BigQueryIO write errors
https://api.github.com/repos/apache/beam/issues/21703: pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 and V2
https://api.github.com/repos/apache/beam/issues/21702: SpannerWriteIT failing in beam PostCommit Java V1
https://api.github.com/repos/apache/beam/issues/21700: --dataflowServiceOptions=use_runner_v2 is broken
https://api.github.com/repos/apache/beam/issues/21695: DataflowPipelineResult does not raise exception for unsuccessful states.
https://api.github.com/repos/apache/beam/issues/21694: BigQuery Storage API insert with writeResult retry and write to error table
https://api.github.com/repos/apache/beam/issues/21479: Install Python wheel and dependencies to local venv in SDK harness
https://api.github.com/repos/apache/beam/issues/21478: KafkaIO.read.withDynamicRead() doesn't pick up new TopicPartitions
https://api.github.com/repos/apache/beam/issues/21477: Add integration testing for BQ Storage API  write modes
https://api.github.com/repos/apache/beam/issues/21476: WriteToBigQuery Dynamic table destinations returns wrong tableId
https://api.github.com/repos/apache/beam/issues/21475: Beam x-lang Dataflow tests failing due to _InactiveRpcError
https://api.github.com/repos/apache/beam/issues/21473: PVR_Spark2_Streaming perma-red
https://api.github.com/repos/apache/beam/issues/21466: Simplify version override for Dev versions of the Go SDK.
https://api.github.com/repos/apache/beam/issues/21465: Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle
https://api.github.com/repos/apache/beam/issues/21269: Delete orphaned files
https://api.github.com/repos/apache/beam/issues/21268: Race between member variable being accessed due to leaking uninitialized state via OutboundObserverFactory
https://api.github.com/repos/apache/beam/issues/21267: WriteToBigQuery submits a duplicate BQ load job if a 503 error code is returned from googleapi
https://api.github.com/repos/apache/beam/issues/21265: apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://api.github.com/repos/apache/beam/issues/21263: (Broken Pipe induced) Bricked Dataflow Pipeline 
https://api.github.com/repos/apache/beam/issues/21262: Python AfterAny, AfterAll do not follow spec
https://api.github.com/repos/apache/beam/issues/21260: Python DirectRunner does not emit data at GC time
https://api.github.com/repos/apache/beam/issues/21259: Consumer group with random prefix
https://api.github.com/repos/apache/beam/issues/21258: Dataflow error in CombinePerKey operation
https://api.github.com/repos/apache/beam/issues/21257: Either Create or DirectRunner fails to produce all elements to the following transform
https://api.github.com/repos/apache/beam/issues/21123: Multiple jobs running on Flink session cluster reuse the persistent Python environment.
https://api.github.com/repos/apache/beam/issues/21119: Migrate to the next version of Python `requests` when released
https://api.github.com/repos/apache/beam/issues/21117: "Java IO IT Tests" - missing data in grafana
https://api.github.com/repos/apache/beam/issues/21115: JdbcIO date conversion is sensitive to OS
https://api.github.com/repos/apache/beam/issues/21112: Dataflow SocketException (SSLException) error while trying to send message from Cloud Pub/Sub to BigQuery
https://api.github.com/repos/apache/beam/issues/21111: Java creates an incorrect pipeline proto when core-construction-java jar is not in the CLASSPATH
https://api.github.com/repos/apache/beam/issues/21110: codecov/patch has poor behavior
https://api.github.com/repos/apache/beam/issues/21109: SDF BoundedSource seems to execute significantly slower than 'normal' BoundedSource
https://api.github.com/repos/apache/beam/issues/21108: java.io.InvalidClassException With Flink Kafka
https://api.github.com/repos/apache/beam/issues/20979: Portable runners should be able to issue checkpoints to Splittable DoFn
https://api.github.com/repos/apache/beam/issues/20978: PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some Avro logical types
https://api.github.com/repos/apache/beam/issues/20973: Python Beam SDK Harness hangs when installing pip packages
https://api.github.com/repos/apache/beam/issues/20818: XmlIO.Read does not handle XML encoding per spec
https://api.github.com/repos/apache/beam/issues/20814: JmsIO is not acknowledging messages correctly
https://api.github.com/repos/apache/beam/issues/20813: No trigger early repeatedly for session windows
https://api.github.com/repos/apache/beam/issues/20812: Cross-language consistency (RequiresStableInputs) is quietly broken (at least on portable flink runner)
https://api.github.com/repos/apache/beam/issues/20692: Timer with dataflow runner can be set multiple times (dataflow runner)
https://api.github.com/repos/apache/beam/issues/20691: Beam metrics should be displayed in Flink UI "Metrics" tab
https://api.github.com/repos/apache/beam/issues/20689: Kafka commitOffsetsInFinalize OOM on Flink
https://api.github.com/repos/apache/beam/issues/20532: Support for coder argument in WriteToBigQuery
https://api.github.com/repos/apache/beam/issues/20531: FileBasedSink: allow setting temp directory provider per dynamic destination
https://api.github.com/repos/apache/beam/issues/20530: Make non-portable Splittable DoFn the only option when executing Java "Read" transforms
https://api.github.com/repos/apache/beam/issues/20529: SpannerIO tests don't actually assert anything.
https://api.github.com/repos/apache/beam/issues/20528: python CombineGlobally().with_fanout() cause duplicate combine results for sliding windows
https://api.github.com/repos/apache/beam/issues/20333: beam_PerformanceTests_Kafka_IO failing due to " provided port is already allocated"
https://api.github.com/repos/apache/beam/issues/20332: FileIO writeDynamic with AvroIO.sink not writing all data
https://api.github.com/repos/apache/beam/issues/20330: Remove insecure ssl options from MongoDBIO
https://api.github.com/repos/apache/beam/issues/20109: SortValues should fail if SecondaryKey coder is not deterministic
https://api.github.com/repos/apache/beam/issues/20108: Python direct runner doesn't emit empty pane when it should
https://api.github.com/repos/apache/beam/issues/20009: Environment-sensitive provisioning for Dataflow
https://api.github.com/repos/apache/beam/issues/19971: [SQL] Some Hive tests throw NullPointerException, but get marked as passing (Direct Runner)
https://api.github.com/repos/apache/beam/issues/19817: datetime and decimal should be logical types
https://api.github.com/repos/apache/beam/issues/19815: Add support for remaining data types in python RowCoder 
https://api.github.com/repos/apache/beam/issues/19813: PubsubIO returns empty message bodies for all messages read
https://api.github.com/repos/apache/beam/issues/19556: User reports protobuf ClassChangeError running against 2.6.0 or above
https://api.github.com/repos/apache/beam/issues/19369: KafkaIO doesn't commit offsets while being used as bounded source
https://api.github.com/repos/apache/beam/issues/17950: [Bug]: Java Precommit permared
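
A report like the one above boils down to one paginated GitHub API call. A minimal sketch, assuming the report simply pages through open issues labeled P1 (the actual Beam automation may differ):

    import requests

    def fetch_p1_issues(repo="apache/beam"):
        """Yield (url, title) for every open issue labeled P1."""
        api = f"https://api.github.com/repos/{repo}/issues"
        params = {"labels": "P1", "state": "open", "per_page": 100, "page": 1}
        while True:
            resp = requests.get(api, params=params)
            resp.raise_for_status()
            items = resp.json()
            if not items:
                break
            for issue in items:
                if "pull_request" in issue:
                    continue  # this endpoint also returns PRs; skip them
                yield issue["html_url"], issue["title"]
            params["page"] += 1

    for url, title in fetch_p1_issues():
        print(f"{url}: {title}")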

Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Brian Hulette <bh...@google.com>.
Thanks Danny! I just merged the PR

On Fri, Jun 24, 2022 at 8:52 AM Danny McCormick <da...@google.com>
wrote:

> I put up a pr to make these changes -
> https://github.com/apache/beam/pull/22045
>
> > 2. The links in this report start with api.github.* and don’t take us
> directly to the issues.
>
> > Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
>
> This is already fixed - Pablo actually beat me to it!
> <https://github.com/apache/beam/pull/22033>
>
> Thanks,
> Danny
>
> On Thu, Jun 23, 2022 at 8:30 PM Brian Hulette <bh...@google.com> wrote:
>
>> +1 for that proposal!
>>
>> > 1. P2 and P3 issues should be noticed and resolved as well. Shall we
>> have a longer time window for the remaining untriaged or stagnant issues
>> and include them?
>>
>> I worry these lists would get _very_ long and wouldn't be actionable. But
>> maybe it's worth reporting something like "There are 376 P2's with no
>> update in the last 6 months" with a link to a query?
>>
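The query link Brian suggests could be an ordinary GitHub issue search, e.g. (the cutoff date here is illustrative):

    https://github.com/apache/beam/issues?q=is:issue+is:open+label:P2+updated:<2022-01-01
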
>> > 2. The links in this report start with api.github.* and don’t take us
>> directly to the issues.
>>
>> Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
>>
>> On Thu, Jun 23, 2022 at 2:37 PM Pablo Estrada <pa...@google.com> wrote:
>>
>>> Thanks. I like the proposal, and I've found the emails useful.
>>> Best
>>> -P.
>>>
>>> On Thu, Jun 23, 2022 at 2:33 PM Manu Zhang <ow...@gmail.com>
>>> wrote:
>>>
>>>> Sounds good! It’s like our internal reports of JIRA tickets exceeding
>>>> SLA time with no response from engineers. We either resolve them or
>>>> downgrade the priority to extend the time window.
>>>>
>>>> Besides,
>>>> 1. P2 and P3 issues should be noticed and resolved as well. Shall we
>>>> have a longer time window for the remaining untriaged or stagnant issues
>>>> and include them?
>>>> 2. The links in this report start with api.github.* and don’t take us
>>>> directly to the issues.
>>>>
>>>>
>>>> Danny McCormick <da...@google.com> wrote on Fri, Jun 24, 2022 at 04:48:
>>>>
>>>>> That generally sounds right to me - I also would vote that we
>>>>> consolidate to 1 email and stop distinguishing between flaky P1s and normal
>>>>> P1s.
>>>>>
>>>>> So the single daily report would be:
>>>>>
>>>>> - Unassigned P0s
>>>>> - P0s with no update in the last 36 hours
>>>>> - Unassigned P1s
>>>>> - P1s with no update in the last 7 days
>>>>>
>>>>> I think that will generate a pretty good list of issues that require
>>>>> some kind of action.
>>>>>
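Expressed as GitHub issue-search queries, Danny's four buckets might look like the sketch below; the no:assignee and updated:< qualifiers are standard GitHub search syntax, while the labels and cutoffs are taken from the proposal:

    from datetime import datetime, timedelta

    def bucket_queries(now=None):
        """Build the four proposed report buckets as GitHub search queries."""
        now = now or datetime.utcnow()
        p0_cutoff = (now - timedelta(hours=36)).strftime("%Y-%m-%dT%H:%M:%S+00:00")
        p1_cutoff = (now - timedelta(days=7)).strftime("%Y-%m-%d")
        base = "repo:apache/beam is:issue is:open"
        return {
            "Unassigned P0s": f"{base} label:P0 no:assignee",
            "P0s with no update in 36 hours": f"{base} label:P0 updated:<{p0_cutoff}",
            "Unassigned P1s": f"{base} label:P1 no:assignee",
            "P1s with no update in 7 days": f"{base} label:P1 updated:<{p1_cutoff}",
        }

Each query can be run against https://api.github.com/search/issues or linked directly from the report.
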
>>>>> On Thu, Jun 23, 2022 at 4:43 PM Kenneth Knowles <ke...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Sounds good to me. Perhaps P0s > 36 hours ago (presumably they are
>>>>>> more like ~hours for true outages of CI/website/etc) and P1s > 7 days?
>>>>>>
>>>>>> On Thu, Jun 23, 2022 at 1:27 PM Brian Hulette <bh...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I think that Danny's alternate proposal (a daily email that shows
>>>>>>> only issues last updated >7 days ago, and those with no assignee) fits well
>>>>>>> with the two goals you describe, if we include "triage needed" issues in
>>>>>>> the latter category. Maybe we also explicitly separate these two concerns
>>>>>>> in the report?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 23, 2022 at 1:14 PM Kenneth Knowles <ke...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Forking thread because lots of people may just ignore this topic,
>>>>>>>> per the discussion :-)
>>>>>>>>
>>>>>>>> (sometimes gmail doesn't fork thread properly, but here's hoping...)
>>>>>>>>
>>>>>>>> I'll add some other outcomes of these emails:
>>>>>>>>
>>>>>>>>  - people file P0s that are not outages and P1s that are not data
>>>>>>>> loss and I downgrade them
>>>>>>>>  - I randomly open up a few flaky test bugs and see if I can fix
>>>>>>>> them really quick
>>>>>>>>  - people file legit P0s and P1s and I subscribe and follow them
>>>>>>>>
>>>>>>>> Of these, only the last one seems important (not just that *I*
>>>>>>>> follow them, but that new P0s and P1s get immediate attention from many
>>>>>>>> eyes)
>>>>>>>>
>>>>>>>> So maybe one take on the goal is to:
>>>>>>>>
>>>>>>>>  - have new P0s and P1s evaluated quickly: P0s are an outage or
>>>>>>>> outage-like occurrence that needs immediate remedy, and P1s need to be
>>>>>>>> evaluated for release blocking, etc.
>>>>>>>>  - make sure P0s and P1s get attention appropriate to their priority
>>>>>>>>
>>>>>>>> It can also be helpful to just state the failure modes which would
>>>>>>>> happen by default if we don't have a good process or automation:
>>>>>>>>
>>>>>>>>  - Real P0 gets filed and not noticed or fixed in a timely manner,
>>>>>>>> blocking users and/or community in real time
>>>>>>>>  - Real P1 gets filed and not noticed, so release goes out with
>>>>>>>> known data loss bug or other total loss of functionality
>>>>>>>>  - Non-real P0s and P1s accumulate, throwing off our data and
>>>>>>>> making it hard to find the real problems
>>>>>>>>  - Flakes are never fixed
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> If we have P0s and P1s in the "awaiting triage" state, those are
>>>>>>>> the ones we need to notice. Then for a P0 or P1 outside of that state, we
>>>>>>>> just need some way of making sure it doesn't stagnate. Or if it does
>>>>>>>> stagnate, that empirically demonstrates it isn't really P1 (just like our
>>>>>>>> P2 to P3 downgrade automation). If everything is P1, nothing is, as they
>>>>>>>> say.
>>>>>>>>
>>>>>>>> Kenn
>>>>>>>>
>>>>>>>> On Thu, Jun 23, 2022 at 10:01 AM Danny McCormick <
>>>>>>>> dannymccormick@google.com> wrote:
>>>>>>>>
>>>>>>>>> > Maybe it would be helpful to sort these by last update time (and
>>>>>>>>> potentially include that information in the email). Then we can at least
>>>>>>>>> prioritize them instead of looking at a big wall of issues.
>>>>>>>>>
>>>>>>>>> I agree that this is a good idea (and pretty trivial to do). I'll
>>>>>>>>> update the automation to do that once we get consensus on an approach.
>>>>>>>>>
>>>>>>>>> > I think the motivation for daily emails is that per the
>>>>>>>>> priorities guide [1] P1 issues should be getting "continuous status
>>>>>>>>> updates". If these issues aren't actually that important, I think the noise
>>>>>>>>> is good as it should motivate us to prioritize them correctly. In practice
>>>>>>>>> that hasn't been happening though...
>>>>>>>>>
>>>>>>>>> I guess the questions here are:
>>>>>>>>>
>>>>>>>>> 1) What is the goal of this email?
>>>>>>>>> 2) Is it effective at accomplishing that goal?
>>>>>>>>>
>>>>>>>>> I think you're saying that the goal (or a goal) is to highlight
>>>>>>>>> issues that aren't getting the attention they need; if that's our goal,
>>>>>>>>> then I don't think this is a particularly effective mechanism for it
>>>>>>>>> because (a) it's very unclear which issues fall into that category and (b)
>>>>>>>>> there are too many to manually go through on a daily basis. From the email
>>>>>>>>> alone, it's not clear to me that any of the issues above "shouldn't" be P1s
>>>>>>>>> (though I'd guess you're right that some/many of them don't belong since
>>>>>>>>> most were created before the Jira -> GH migration based on the titles). I'd
>>>>>>>>> also argue that a daily email just desensitizes us to them since
>>>>>>>>> there almost always will be *some* valid P1s that don't need
>>>>>>>>> extra attention.
>>>>>>>>>
>>>>>>>>> I do still think this could have value as a weekly email, with the
>>>>>>>>> goal being "it's probably a good idea for someone to take a look at each of
>>>>>>>>> these". Another option would be to only include issues with no action in
>>>>>>>>> the last 7 days and/or no assignees and keep it daily.
>>>>>>>>>
>>>>>>>>> A couple side notes:
>>>>>>>>> - No matter what we do, if we keep the current automation in any
>>>>>>>>> form we should fix the url from
>>>>>>>>> https://api.github.com/repos/apache/beam/issues/# to
>>>>>>>>> https://github.com/apache/beam/issues/# - the current links are
>>>>>>>>> very annoying.
>>>>>>>>> - After I send this, I will do a pass of the current P1s since it
>>>>>>>>> does indeed seem like too many are P1s and many should actually be P2s (or
>>>>>>>>> lower).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Danny
>>>>>>>>>
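The URL fix Danny describes is mechanical; a minimal sketch (the helper name is hypothetical):

    def to_html_url(api_url):
        """Map https://api.github.com/repos/apache/beam/issues/N
        to the human-facing https://github.com/apache/beam/issues/N."""
        return api_url.replace(
            "https://api.github.com/repos/", "https://github.com/", 1)

Simpler still, the GitHub API response for each issue already carries an html_url field, so the report can use that directly instead of rewriting the API URL.
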
>>>>>>>>> On Thu, Jun 23, 2022 at 12:21 PM Brian Hulette <
>>>>>>>>> bhulette@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> I think the motivation for daily emails is that per the
>>>>>>>>>> priorities guide [1] P1 issues should be getting "continuous status
>>>>>>>>>> updates". If these issues aren't actually that important, I think the noise
>>>>>>>>>> is good as it should motivate us to prioritize them correctly. In practice
>>>>>>>>>> that hasn't been happening though...
>>>>>>>>>>
>>>>>>>>>> Maybe it would be helpful to sort these by last update time (and
>>>>>>>>>> potentially include that information in the email). Then we can at least
>>>>>>>>>> prioritize them instead of looking at a big wall of issues.
>>>>>>>>>>
>>>>>>>>>> Brian
>>>>>>>>>>
>>>>>>>>>> [1] https://beam.apache.org/contribute/issue-priorities/
>>>>>>>>>>
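Brian's sorting suggestion is nearly a one-liner, since the API's ISO-8601 updated_at timestamps sort correctly as plain strings; a sketch:

    def stalest_first(issues):
        """Order issues so the longest-untouched ones head the report."""
        return sorted(issues, key=lambda issue: issue["updated_at"])
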
>>>>>>>>>> On Thu, Jun 23, 2022 at 6:07 AM Danny McCormick <
>>>>>>>>>> dannymccormick@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think a weekly summary seems like a good idea for the P1
>>>>>>>>>>> issues and flaky tests, though daily still seems appropriate for P0 issues.
>>>>>>>>>>> I put up https://github.com/apache/beam/pull/22017 to just send
>>>>>>>>>>> the P1/flaky test reports on Wednesdays, if anyone objects please let me
>>>>>>>>>>> know - I'll wait on merging til tomorrow to leave time for feedback (and
>>>>>>>>>>> it's always reversible 🙂).
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Danny
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 22, 2022 at 7:05 PM Manu Zhang <
>>>>>>>>>>> owenzhang1990@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> What is this daily summary intended for? Not all issues look
>>>>>>>>>>>> like P1. And would a weekly summary be less noisy?
>>>>>>>>>>>>
>>>>>>>>>>>> <be...@gmail.com> wrote on Wed, Jun 22, 2022 at 23:45:
>>>>>>>>>>>>
>>>>>>>>>>>>> This is your daily summary of Beam's current P1 issues, not
>>>>>>>>>>>>> including flaky tests.
>>>>>>>>>>>>>
>>>>>>>>>>>>>     See
>>>>>>>>>>>>> https://beam.apache.org/contribute/issue-priorities/#p1-critical
>>>>>>>>>>>>> for the meaning and expectations around P1 issues.
>>>>>>>>>>>>>

Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Kenneth Knowles <ke...@apache.org>.
Regarding P2s and P3s getting resolved as well: pretty much every healthy
project has a backlog that grows without bound. So we do need a place to
put that backlog. I think P3 is where things tend to end up, because P2s
that do not receive a comment are automatically downgraded to P3. These may
still be resolved, but there isn't any hope that _all_ of them get resolved.

Kenn
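
For readers unfamiliar with the downgrade automation Kenn mentions, the rule could look roughly like the sketch below; it approximates "no comment" with GitHub's "no update" qualifier, and the 60-day cutoff and token handling are illustrative, not Beam's actual workflow:

    import requests
    from datetime import datetime, timedelta

    def downgrade_stale_p2s(repo="apache/beam", token="...", days=60):
        """Relabel open P2 issues untouched for `days` days down to P3."""
        cutoff = (datetime.utcnow() - timedelta(days=days)).strftime("%Y-%m-%d")
        headers = {"Authorization": f"token {token}"}
        query = f"repo:{repo} is:issue is:open label:P2 updated:<{cutoff}"
        found = requests.get("https://api.github.com/search/issues",
                             params={"q": query}, headers=headers)
        found.raise_for_status()
        for issue in found.json()["items"]:
            n = issue["number"]
            requests.delete(
                f"https://api.github.com/repos/{repo}/issues/{n}/labels/P2",
                headers=headers)
            requests.post(
                f"https://api.github.com/repos/{repo}/issues/{n}/labels",
                json={"labels": ["P3"]}, headers=headers)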

On Fri, Jun 24, 2022 at 10:32 AM Alexey Romanenko <ar...@gmail.com>
wrote:

> Thanks, Danny!
>
> On 24 Jun 2022, at 19:23, Danny McCormick <da...@google.com>
> wrote:
>
> Sure, I put up a fix - https://github.com/apache/beam/pull/22048
>
> On Fri, Jun 24, 2022 at 1:20 PM Alexey Romanenko <ar...@gmail.com>
> wrote:
>
>>
>>
>> > 2. The links in this report start with api.github.* and don’t take us
>> directly to the issues.
>>
>> > Yeah Danny pointed that out as well. I'm assuming he knows how to fix
>> it?
>>
>> This is already fixed - Pablo actually beat me to it!
>> <https://github.com/apache/beam/pull/22033>
>>
>>
>> It also adds a colon after the URL, and some mail clients consider it
>> part of the URL, which leads to a broken link.
>> Should we just remove the colon or add a space between?
>>
>> —
>> Alexey
>>
>>
>> Thanks,
>> Danny
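
On Alexey's colon question: either fix works, as long as the character adjacent to the URL is whitespace; a sketch of one option (the separator choice is illustrative):

    def format_line(url, title):
        # "url: title" breaks in clients that fold the trailing colon into
        # the link; a space after the URL (or no colon at all) keeps it clean.
        return f"{url} - {title}"

    print(format_line("https://github.com/apache/beam/issues/21978",
                      "[Playground] Implement Share Any Code feature on the frontend"))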

Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Alexey Romanenko <ar...@gmail.com>.
Thanks, Danny!

> On 24 Jun 2022, at 19:23, Danny McCormick <da...@google.com> wrote:
> 
> Sure, I put up a fix - https://github.com/apache/beam/pull/22048
> On Fri, Jun 24, 2022 at 1:20 PM Alexey Romanenko <aromanenko.dev@gmail.com> wrote:
> 
> 
>> > 2. The links in this report start with api.github.* and don’t take us directly to the issues.
>> 
>> > Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
>> 
>> This is already fixed - Pablo actually beat me to it! <https://github.com/apache/beam/pull/22033>
> It also adds a colon after the URL, and some mail clients consider it part of the URL, which leads to a broken link.
> Should we just remove the colon or add a space between?
> 
> —
> Alexey
> 
>> 
>> Thanks,
>> Danny
>> 
>> On Thu, Jun 23, 2022 at 8:30 PM Brian Hulette <bhulette@google.com <ma...@google.com>> wrote:
>> +1 for that proposal!
>> 
>> > 1. P2 and P3 issues should be noticed and resolved as well. Shall we have a longer time window for the rest of not triaged or stagnate issues and include them?
>> 
>> I worry these lists would get _very_ long and wouldn't be actionable. But maybe it's worth reporting something like "There are 376 P2's with no update in the last 6 months" with a link to a query?
>> 
>> > 2. The links in this report start with api.github.* and don’t take us directly to the issues.
>> 
>> Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
>> 
>> On Thu, Jun 23, 2022 at 2:37 PM Pablo Estrada <pabloem@google.com <ma...@google.com>> wrote:
>> Thanks. I like the proposal, and I've found the emails useful.
>> Best
>> -P.
>> 
>> On Thu, Jun 23, 2022 at 2:33 PM Manu Zhang <owenzhang1990@gmail.com <ma...@gmail.com>> wrote:
>> Sounds good! It’s like our internal reports of JIRA tickets exceeding SLA time and having no response from engineers.  We either resolve them or downgrade the priority to extend time window.
>> 
>> Besides,
>> 1. P2 and P3 issues should be noticed and resolved as well. Shall we have a longer time window for the rest of not triaged or stagnate issues and include them?
>> 2. The links in this report start with api.github.* and don’t take us directly to the issues.
>> 
>> 
>> Danny McCormick <dannymccormick@google.com <ma...@google.com>>于2022年6月24日 周五04:48写道:
>> That generally sounds right to me - I also would vote that we consolidate to 1 email and stop distinguishing between flaky P1s and normal P1s.
>> 
>> So the single daily report would be:
>> 
>> - Unassigned P0s
>> - P0s with no update in the last 36 hours
>> - Unassigned P1s
>> - P1s with no update in the last 7 days
>> 
>> I think that will generate a pretty good list of issues that require some kind of action.
>> 
>> On Thu, Jun 23, 2022 at 4:43 PM Kenneth Knowles <kenn@apache.org <ma...@apache.org>> wrote:
>> Sounds good to me. Perhaps P0s > 36 hours ago (presumably they are more like ~hours for true outages of CI/website/etc) and P1s > 7 days?
>> 
>> On Thu, Jun 23, 2022 at 1:27 PM Brian Hulette <bhulette@google.com <ma...@google.com>> wrote:
>> I think that Danny's alternate proposal (a daily email that show only issues last updated >7 days ago, and those with no assignee) fits well with the two goals you describe, if we include "triage needed" issues in the latter category. Maybe we also explicitly separate these two concerns in the report?
>> 
>> 
>> On Thu, Jun 23, 2022 at 1:14 PM Kenneth Knowles <kenn@apache.org <ma...@apache.org>> wrote:
>> Forking thread because lots of people may just ignore this topic, per the discussion :-)
>> 
>> (sometimes gmail doesn't fork thread properly, but here's hoping...)
>> 
>> I'll add some other outcomes of these emails:
>> 
>>  - people file P0s that are not outages and P1s that are not data loss and I downgrade them
>>  - I randomly open up a few flaky test bugs and see if I can fix them really quick
>>  - people file legit P0s and P1s and I subscribe and follow them
>> 
>> Of these, only the last one seems important (not just that *I* follow them, but that new P0s and P1s get immediate attention from many eyes)
>> 
>> So maybe one take on the goal is to:
>> 
>>  - have new P0s and P1s evaluated quickly: P0s are an outage or outage-like occurrence that needs immediate remedy, and P1s need to be evaluated for release blocking, etc.
>>  - make sure P0s and P1s get attention appropriate to their priority
>> 
>> It can also be helpful to just state the failure modes which would happen by default if we don't have a good process or automation:
>> 
>>  - Real P0 gets filed and not noticed or fixed in a timely manner, blocking users and/or community in real time
>>  - Real P1 gets filed and not noticed, so release goes out with known data loss bug or other total loss of functionality
>>  - Non-real P0s and P1s accumulate, throwing off our data and making it hard to find the real problems
>>  - Flakes are never fixed
>> 
>> WDYT?
>> 
>> If we have P0s and P1s in the "awaiting triage" state, those are the ones we need to notice. Then for a P0 or P1 outside of that state, we just need some way of making sure it doesn't stagnate. Or if it does stagnate, that empirically demonstrates it isn't really P1 (just like our P2 to P3 downgrade automation). If everything is P1, nothing is, as they say.
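For example, assuming the repo uses an "awaiting triage" label, the "needs noticing" bucket reduces to a single GitHub issue search (illustrative only):

    # Illustrative query string; the "awaiting triage" label name is an assumption.
    TRIAGE_QUERY = 'repo:apache/beam is:issue is:open label:P1 label:"awaiting triage"'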
>> 
>> Kenn
>> 
>> On Thu, Jun 23, 2022 at 10:01 AM Danny McCormick <dannymccormick@google.com> wrote:
>> > Maybe it would be helpful to sort these by last update time (and potentially include that information in the email). Then we can at least prioritize them instead of looking at a big wall of issues.
>> 
>> I agree that this is a good idea (and pretty trivial to do). I'll update the automation to do that once we get consensus on an approach.
>> 
>> > I think the motivation for daily emails is that per the priorities guide [1] P1 issues should be getting "continuous status updates". If these issues aren't actually that important, I think the noise is good as it should motivate us to prioritize them correctly. In practice that hasn't been happening though...
>> 
>> I guess the questions here are:
>> 
>> 1) What is the goal of this email?
>> 2) Is it effective at accomplishing that goal?
>> 
>> I think you're saying that the goal (or a goal) is to highlight issues that aren't getting the attention they need; if that's our goal, then I don't think this is a particularly effective mechanism for it because (a) it's very unclear which issues fall into that category and (b) there are too many to manually go through on a daily basis. From the email alone, it's not clear to me that any of the issues above "shouldn't" be P1s (though I'd guess you're right that some/many of them don't belong, since based on the titles most were created before the Jira -> GH migration). I'd also argue that a daily email just desensitizes us to them, since there almost always will be some valid P1s that don't need extra attention.
>> 
>> I do still think this could have value as a weekly email, with the goal being "it's probably a good idea for someone to take a look at each of these". Another option would be to only include issues with no action in the last 7 days and/or no assignees and keep it daily.
>> 
>> A couple side notes:
>> - No matter what we do, if we keep the current automation in any form we should fix the url from https://api.github.com/repos/apache/beam/issues/# to https://github.com/apache/beam/issues/# - the current links are very annoying (see the sketch after these notes).
>> - After I send this, I will do a pass of the current P1s since it does indeed seem like too many are P1s and many should actually be P2s (or lower).
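As a hypothetical sketch of that URL fix (the actual change may differ), the rewrite is mechanical, and the GitHub REST API also exposes the browser-facing link directly:

    # Hypothetical: rewrite the API URL into the browser-facing one...
    url = issue["url"].replace("https://api.github.com/repos/", "https://github.com/")
    # ...or skip the rewrite and emit the issue's `html_url` field instead.
    url = issue["html_url"]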
>> 
>> Thanks,
>> Danny
>> 
>> On Thu, Jun 23, 2022 at 12:21 PM Brian Hulette <bhulette@google.com> wrote:
>> I think the motivation for daily emails is that per the priorities guide [1] P1 issues should be getting "continuous status updates". If these issues aren't actually that important, I think the noise is good as it should motivate us to prioritize them correctly. In practice that hasn't been happening though...
>> 
>> Maybe it would be helpful to sort these by last update time (and potentially include that information in the email). Then we can at least prioritize them instead of looking at a big wall of issues.
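A minimal sketch of that sorting (field names as in the GitHub REST API; the report's actual code may differ):

    # `updated_at` is an ISO-8601 timestamp, so a lexicographic sort is chronological.
    issues.sort(key=lambda issue: issue["updated_at"])  # stalest first
    for issue in issues:
        print(f"{issue['updated_at'][:10]} {issue['html_url']}: {issue['title']}")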
>> 
>> Brian
>> 
>> [1] https://beam.apache.org/contribute/issue-priorities/
>> On Thu, Jun 23, 2022 at 6:07 AM Danny McCormick <dannymccormick@google.com> wrote:
>> I think a weekly summary seems like a good idea for the P1 issues and flaky tests, though daily still seems appropriate for P0 issues. I put up https://github.com/apache/beam/pull/22017 to just send the P1/flaky test reports on Wednesdays; if anyone objects, please let me know - I'll wait on merging till tomorrow to leave time for feedback (and it's always reversible 🙂).
>> 
>> Thanks,
>> Danny
>> 
>> On Wed, Jun 22, 2022 at 7:05 PM Manu Zhang <owenzhang1990@gmail.com> wrote:
>> Hi all,
>> 
>> What is this daily summary intended for? Not all issues look like P1. And would a weekly summary be less noisy?
>> 
>> <beamactions@gmail.com> wrote on Wed, Jun 22, 2022 at 23:45:
>> This is your daily summary of Beam's current P1 issues, not including flaky tests.
>> 
>>     See https://beam.apache.org/contribute/issue-priorities/#p1-critical for the meaning and expectations around P1 issues.
>> 
> 


Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Danny McCormick <da...@google.com>.
Sure, I put up a fix - https://github.com/apache/beam/pull/22048

On Fri, Jun 24, 2022 at 1:20 PM Alexey Romanenko <ar...@gmail.com>
wrote:

>
>
> > 2. The links in this report start with api.github.* and don’t take us
> > directly to the issues.
>
> > Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
>
> This is already fixed - Pablo actually beat me to it!
> <https://github.com/apache/beam/pull/22033>
>
>
> It also adds a colon after the URL, and some mail clients consider it part
> of the URL, which leads to a broken link.
> Should we just remove the colon there, or add a space in between?
>
> —
> Alexey
>

Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Alexey Romanenko <ar...@gmail.com>.

> > 2. The links in this report start with api.github.* and don’t take us directly to the issues.
> 
> > Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
> 
> This is already fixed - Pablo actually beat me to it! <https://github.com/apache/beam/pull/22033>
It also adds a colon after the URL, and some mail clients consider it part of the URL, which leads to a broken link.
Should we just remove the colon there, or add a space in between?
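For instance (a sketch only - the report's real line template may differ), the line could be assembled with separating whitespace or without the colon:

    # Roughly the current shape: the trailing colon can fuse into the autolinked URL.
    line = f"{issue['html_url']}: {issue['title']}"
    # Hypothetical alternatives that keep punctuation off the URL:
    line = f"{issue['html_url']} : {issue['title']}"
    line = f"{issue['html_url']} - {issue['title']}"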

—
Alexey


Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Danny McCormick <da...@google.com>.
I put up a PR to make these changes -
https://github.com/apache/beam/pull/22045

> 2. The links in this report start with api.github.* and don’t take us
directly to the issues.

> Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?

This is already fixed - Pablo actually beat me to it!
<https://github.com/apache/beam/pull/22033>
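For reference, the fix is mechanical: the GitHub REST API returns both an
API url and a browser-facing html_url for every issue, so the report only
needs to prefer the latter. A minimal Python sketch of the idea
(hypothetical; not the actual contents of either PR):

    def issue_link(issue: dict) -> str:
        # Prefer the browser-facing link the API already provides, falling
        # back to rewriting the API url if "html_url" is ever absent.
        return issue.get("html_url") or issue["url"].replace(
            "https://api.github.com/repos/", "https://github.com/")

    # issue_link({"url": "https://api.github.com/repos/apache/beam/issues/21978"})
    # -> "https://github.com/apache/beam/issues/21978"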

Thanks,
Danny

On Thu, Jun 23, 2022 at 8:30 PM Brian Hulette <bh...@google.com> wrote:

> +1 for that proposal!
>
> > 1. P2 and P3 issues should be noticed and resolved as well. Shall we
> have a longer time window for the rest of the untriaged or stagnant issues
> and include them?
>
> I worry these lists would get _very_ long and wouldn't be actionable. But
> maybe it's worth reporting something like "There are 376 P2's with no
> update in the last 6 months" with a link to a query?
>
> > 2. The links in this report start with api.github.* and don’t take us
> directly to the issues.
>
> Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?
>
> On Thu, Jun 23, 2022 at 2:37 PM Pablo Estrada <pa...@google.com> wrote:
>
>> Thanks. I like the proposal, and I've found the emails useful.
>> Best
>> -P.
>>
>> On Thu, Jun 23, 2022 at 2:33 PM Manu Zhang <ow...@gmail.com>
>> wrote:
>>
>>> Sounds good! It’s like our internal reports of JIRA tickets exceeding
>>> SLA time and having no response from engineers. We either resolve them or
>>> downgrade the priority to extend the time window.
>>>
>>> Besides,
>>> 1. P2 and P3 issues should be noticed and resolved as well. Shall we
>>> have a longer time window for the rest of the untriaged or stagnant issues
>>> and include them?
>>> 2. The links in this report start with api.github.* and don’t take us
>>> directly to the issues.
>>>
>>>
>>> On Fri, Jun 24, 2022 at 04:48, Danny McCormick <da...@google.com> wrote:
>>>
>>>> That generally sounds right to me - I also would vote that we
>>>> consolidate to 1 email and stop distinguishing between flaky P1s and normal
>>>> P1s.
>>>>
>>>> So the single daily report would be:
>>>>
>>>> - Unassigned P0s
>>>> - P0s with no update in the last 36 hours
>>>> - Unassigned P1s
>>>> - P1s with no update in the last 7 days
>>>>
>>>> I think that will generate a pretty good list of issues that require
>>>> some kind of action.
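For concreteness, the four buckets above map almost one-to-one onto standard
GitHub issue-search filters. A sketch in Python, in which the P0/P1 label
names and the cutoff dates (for a hypothetical 2022-06-24 run) are
assumptions:

    # Hypothetical bucket -> search query mapping for apache/beam issues.
    REPORT_QUERIES = {
        "Unassigned P0s":            "is:issue is:open label:P0 no:assignee",
        "P0s with no update in 36h": "is:issue is:open label:P0 updated:<2022-06-23",
        "Unassigned P1s":            "is:issue is:open label:P1 no:assignee",
        "P1s with no update in 7d":  "is:issue is:open label:P1 updated:<2022-06-17",
    }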
>>>>
>>>> On Thu, Jun 23, 2022 at 4:43 PM Kenneth Knowles <ke...@apache.org>
>>>> wrote:
>>>>
>>>>> Sounds good to me. Perhaps P0s > 36 hours ago (presumably they are
>>>>> more like ~hours for true outages of CI/website/etc) and P1s > 7 days?
>>>>>
>>>>> On Thu, Jun 23, 2022 at 1:27 PM Brian Hulette <bh...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I think that Danny's alternate proposal (a daily email that shows only
>>>>>> issues last updated >7 days ago, and those with no assignee) fits well with
>>>>>> the two goals you describe, if we include "triage needed" issues in the
>>>>>> latter category. Maybe we also explicitly separate these two concerns in
>>>>>> the report?
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 23, 2022 at 1:14 PM Kenneth Knowles <ke...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Forking thread because lots of people may just ignore this topic,
>>>>>>> per the discussion :-)
>>>>>>>
>>>>>>> (sometimes gmail doesn't fork thread properly, but here's hoping...)
>>>>>>>
>>>>>>> I'll add some other outcomes of these emails:
>>>>>>>
>>>>>>>  - people file P0s that are not outages and P1s that are not data
>>>>>>> loss and I downgrade them
>>>>>>>  - I randomly open up a few flaky test bugs and see if I can fix
>>>>>>> them really quick
>>>>>>>  - people file legit P0s and P1s and I subscribe and follow them
>>>>>>>
>>>>>>> Of these, only the last one seems important (not just that *I*
>>>>>>> follow them, but that new P0s and P1s get immediate attention from many
>>>>>>> eyes)
>>>>>>>
>>>>>>> So maybe one take on the goal is to:
>>>>>>>
>>>>>>>  - have new P0s and P1s evaluated quickly: P0s are an outage or
>>>>>>> outage-like occurrence that needs immediate remedy, and P1s need to be
>>>>>>> evaluated for release blocking, etc.
>>>>>>>  - make sure P0s and P1s get attention appropriate to their priority
>>>>>>>
>>>>>>> It can also be helpful to just state the failure modes which would
>>>>>>> happen by default if we don't have a good process or automation:
>>>>>>>
>>>>>>>  - Real P0 gets filed and not noticed or fixed in a timely manner,
>>>>>>> blocking users and/or community in real time
>>>>>>>  - Real P1 gets filed and not noticed, so release goes out with
>>>>>>> known data loss bug or other total loss of functionality
>>>>>>>  - Non-real P0s and P1s accumulate, throwing off our data and making
>>>>>>> it hard to find the real problems
>>>>>>>  - Flakes are never fixed
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>> If we have P0s and P1s in the "awaiting triage" state, those are the
>>>>>>> ones we need to notice. Then for a P0 or P1 outside of that state, we just
>>>>>>> need some way of making sure it doesn't stagnate. Or if it does stagnate,
>>>>>>> that empirically demonstrates it isn't really P1 (just like our P2 to P3
>>>>>>> downgrade automation). If everything is P1, nothing is, as they say.
>>>>>>>
>>>>>>> Kenn
>>>>>>>
>>>>>>> On Thu, Jun 23, 2022 at 10:01 AM Danny McCormick <
>>>>>>> dannymccormick@google.com> wrote:
>>>>>>>
>>>>>>>> > Maybe it would be helpful to sort these by last update time (and
>>>>>>>> potentially include that information in the email). Then we can at least
>>>>>>>> prioritize them instead of looking at a big wall of issues.
>>>>>>>>
>>>>>>>> I agree that this is a good idea (and pretty trivial to do). I'll
>>>>>>>> update the automation to do that once we get consensus on an approach.
>>>>>>>>
>>>>>>>> > I think the motivation for daily emails is that per the
>>>>>>>> priorities guide [1] P1 issues should be getting "continuous status
>>>>>>>> updates". If these issues aren't actually that important, I think the noise
>>>>>>>> is good as it should motivate us to prioritize them correctly. In practice
>>>>>>>> that hasn't been happening though...
>>>>>>>>
>>>>>>>> I guess the questions here are:
>>>>>>>>
>>>>>>>> 1) What is the goal of this email?
>>>>>>>> 2) Is it effective at accomplishing that goal?
>>>>>>>>
>>>>>>>> I think you're saying that the goal (or a goal) is to highlight
>>>>>>>> issues that aren't getting the attention they need; if that's our goal,
>>>>>>>> then I don't think this is a particularly effective mechanism for it
>>>>>>>> because (a) it's very unclear which issues fall into that category and (b)
>>>>>>>> there are too many to manually go through on a daily basis. From the email
>>>>>>>> alone, it's not clear to me that any of the issues above "shouldn't" be P1s
>>>>>>>> (though I'd guess you're right that some/many of them don't belong since
>>>>>>>> most were created before the Jira -> GH migration based on the titles). I'd
>>>>>>>> also argue that a daily email just desensitizes us to them since
>>>>>>>> there almost always will be *some* valid P1s that don't need extra
>>>>>>>> attention.
>>>>>>>>
>>>>>>>> I do still think this could have value as a weekly email, with the
>>>>>>>> goal being "it's probably a good idea for someone to take a look at each of
>>>>>>>> these". Another option would be to only include issues with no action in
>>>>>>>> the last 7 days and/or no assignees and keep it daily.
>>>>>>>>
>>>>>>>> A couple side notes:
>>>>>>>> - No matter what we do, if we keep the current automation in any
>>>>>>>> form we should fix the URL from
>>>>>>>> https://api.github.com/repos/apache/beam/issues/# to
>>>>>>>> https://github.com/apache/beam/issues/# - the current links are
>>>>>>>> very annoying.
>>>>>>>> - After I send this, I will do a pass of the current P1s since it
>>>>>>>> does indeed seem like too many are P1s and many should actually be P2s (or
>>>>>>>> lower).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Danny
>>>>>>>>
>>>>>>>> On Thu, Jun 23, 2022 at 12:21 PM Brian Hulette <bh...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I think the motivation for daily emails is that per the priorities
>>>>>>>>> guide [1] P1 issues should be getting "continuous status updates". If these
>>>>>>>>> issues aren't actually that important, I think the noise is good as it
>>>>>>>>> should motivate us to prioritize them correctly. In practice that hasn't
>>>>>>>>> been happening though...
>>>>>>>>>
>>>>>>>>> Maybe it would be helpful to sort these by last update time (and
>>>>>>>>> potentially include that information in the email). Then we can at least
>>>>>>>>> prioritize them instead of looking at a big wall of issues.
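A Python sketch of that sorting idea, assuming the report is built from
GitHub REST API issue objects (whose "updated_at" is an ISO-8601 UTC
timestamp, so string order is chronological order); report_line is a
hypothetical helper:

    def stalest_first(issues: list) -> list:
        # Oldest update first, so the most-neglected issues lead the report.
        return sorted(issues, key=lambda issue: issue["updated_at"])

    def report_line(issue: dict) -> str:
        return (f"{issue['html_url']} "
                f"(last update {issue['updated_at'][:10]}): {issue['title']}")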
>>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>> [1] https://beam.apache.org/contribute/issue-priorities/
>>>>>>>>>
>>>>>>>>> On Thu, Jun 23, 2022 at 6:07 AM Danny McCormick <
>>>>>>>>> dannymccormick@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> I think a weekly summary seems like a good idea for the P1 issues
>>>>>>>>>> and flaky tests, though daily still seems appropriate for P0 issues. I put
>>>>>>>>>> up https://github.com/apache/beam/pull/22017 to just send the
>>>>>>>>>> P1/flaky test reports on Wednesdays, if anyone objects please let me know -
>>>>>>>>>> I'll wait on merging til tomorrow to leave time for feedback (and it's
>>>>>>>>>> always reversible 🙂).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Danny
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 22, 2022 at 7:05 PM Manu Zhang <
>>>>>>>>>> owenzhang1990@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> What is this daily summary intended for? Not all issues look
>>>>>>>>>>> like P1. And would a weekly summary be less noisy?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 22, 2022 at 23:45, <be...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This is your daily summary of Beam's current P1 issues, not
>>>>>>>>>>>> including flaky tests.
>>>>>>>>>>>>
>>>>>>>>>>>>     See
>>>>>>>>>>>> https://beam.apache.org/contribute/issue-priorities/#p1-critical
>>>>>>>>>>>> for the meaning and expectations around P1 issues.
>>>>>>>>>>>>

Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Brian Hulette <bh...@google.com>.
+1 for that proposal!

> 1. P2 and P3 issues should be noticed and resolved as well. Shall we have
a longer time window for the rest of the untriaged or stagnant issues and
include them?

I worry these lists would get _very_ long and wouldn't be actionable. But
maybe it's worth reporting something like "There are 376 P2's with no
update in the last 6 months" with a link to a query?
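Such a query is easy to link. A sketch of one, assuming the P2 label and a
cutoff six months before this report's date:

    is:issue is:open label:P2 updated:<2021-12-23

i.e. https://github.com/apache/beam/issues?q=is%3Aissue+is%3Aopen+label%3AP2+updated%3A%3C2021-12-23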

> 2. The links in this report start with api.github.* and don’t take us
directly to the issues.

Yeah Danny pointed that out as well. I'm assuming he knows how to fix it?


Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Pablo Estrada <pa...@google.com>.
Thanks. I like the proposal, and I've found the emails useful.
Best
-P.

>>>>>>>>>> CombineGlobally().with_fanout() cause duplicate combine results for sliding
>>>>>>>>>> windows
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/20333:
>>>>>>>>>> beam_PerformanceTests_Kafka_IO failing due to " provided port is already
>>>>>>>>>> allocated"
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/20332: FileIO
>>>>>>>>>> writeDynamic with AvroIO.sink not writing all data
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/20330: Remove
>>>>>>>>>> insecure ssl options from MongoDBIO
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/20109:
>>>>>>>>>> SortValues should fail if SecondaryKey coder is not deterministic
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/20108: Python
>>>>>>>>>> direct runner doesn't emit empty pane when it should
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/20009:
>>>>>>>>>> Environment-sensitive provisioning for Dataflow
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/19971: [SQL]
>>>>>>>>>> Some Hive tests throw NullPointerException, but get marked as passing
>>>>>>>>>> (Direct Runner)
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/19817: datetime
>>>>>>>>>> and decimal should be logical types
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/19815: Add
>>>>>>>>>> support for remaining data types in python RowCoder
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/19813: PubsubIO
>>>>>>>>>> returns empty message bodies for all messages read
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/19556: User
>>>>>>>>>> reports protobuf ClassChangeError running against 2.6.0 or above
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/19369: KafkaIO
>>>>>>>>>> doesn't commit offsets while being used as bounded source
>>>>>>>>>> https://api.github.com/repos/apache/beam/issues/17950: [Bug]:
>>>>>>>>>> Java Precommit permared
>>>>>>>>>>
>>>>>>>>>

Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Manu Zhang <ow...@gmail.com>.
Sounds good! It’s like our internal reports of JIRA tickets that exceed their SLA
time with no response from engineers. We either resolve them or
downgrade the priority to extend the time window.

Besides,
1. P2 and P3 issues should be noticed and resolved as well. Shall we use a
longer time window for the remaining untriaged or stagnant issues and
include them too?
2. The links in this report start with api.github.* and don’t take us
directly to the issues (a possible rewrite is sketched below).
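
A minimal sketch of that rewrite, assuming the report builder holds each
issue URL as a plain string (the helper name here is hypothetical, not
Beam's actual automation code):

    # Hypothetical helper: turn an API link into a browser link
    # before the report is sent.
    def to_browser_url(api_url: str) -> str:
        # https://api.github.com/repos/apache/beam/issues/N
        # -> https://github.com/apache/beam/issues/N
        return api_url.replace(
            "https://api.github.com/repos/", "https://github.com/", 1)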


Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Danny McCormick <da...@google.com>.
That generally sounds right to me - I also would vote that we consolidate
to 1 email and stop distinguishing between flaky P1s and normal P1s.

So the single daily report would be:

- Unassigned P0s
- P0s with no update in the last 36 hours
- Unassigned P1s
- P1s with no update in the last 7 days

I think that will generate a pretty good list of issues that require some
kind of action.
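
As a rough illustration only (not Beam's actual automation; the literal
"P0"/"P1" label names are assumptions, and the thresholds are the ones
proposed above), those four buckets map directly onto GitHub issue-search
qualifiers:

    # Sketch of the proposed single daily report via GitHub's search API.
    # Assumes priority labels are literally "P0"/"P1"; adjust as needed.
    from datetime import datetime, timedelta, timezone
    import requests

    def search(q: str) -> list:
        resp = requests.get("https://api.github.com/search/issues",
                            params={"q": q, "per_page": 100})
        resp.raise_for_status()
        return resp.json()["items"]

    def report_buckets() -> dict:
        now = datetime.now(timezone.utc)
        base = "repo:apache/beam is:issue is:open"
        p0_cutoff = (now - timedelta(hours=36)).strftime(
            "%Y-%m-%dT%H:%M:%S") + "+00:00"
        p1_cutoff = (now - timedelta(days=7)).strftime("%Y-%m-%d")
        return {
            "Unassigned P0s": search(f"{base} label:P0 no:assignee"),
            "P0s, no update in 36h": search(
                f"{base} label:P0 updated:<{p0_cutoff}"),
            "Unassigned P1s": search(f"{base} label:P1 no:assignee"),
            "P1s, no update in 7 days": search(
                f"{base} label:P1 updated:<{p1_cutoff}"),
        }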


Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Kenneth Knowles <ke...@apache.org>.
Sounds good to me. Perhaps P0s > 36 hours ago (presumably they are more
like ~hours for true outages of CI/website/etc) and P1s > 7 days?


Re: [DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Brian Hulette <bh...@google.com>.
I think that Danny's alternate proposal (a daily email that shows only
issues last updated >7 days ago, and those with no assignee) fits well with
the two goals you describe, if we include "triage needed" issues in the
latter category. Maybe we could also explicitly separate these two concerns
in the report?
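
For concreteness, here is a rough sketch of that split (a hypothetical
helper, not the report script's actual code; it assumes we already have the
open P1 issues as parsed GitHub REST API objects, and that "awaiting triage"
is the label that marks the triage-needed state):

    from datetime import datetime, timedelta, timezone

    STALE_AFTER = timedelta(days=7)

    def split_for_report(p1_issues):
        """Bucket P1 issues into 'triage needed' and 'stale' sections."""
        now = datetime.now(timezone.utc)
        triage_needed, stale = [], []
        for issue in p1_issues:
            # Label name is an assumption; adjust to the repo's real label.
            labels = {label["name"] for label in issue["labels"]}
            # Strip the trailing "Z" for fromisoformat on Python < 3.11.
            updated = datetime.fromisoformat(
                issue["updated_at"].replace("Z", "+00:00"))
            if "awaiting triage" in labels or not issue["assignees"]:
                triage_needed.append(issue)
            elif now - updated > STALE_AFTER:
                stale.append(issue)
        return triage_needed, stale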


On Thu, Jun 23, 2022 at 1:14 PM Kenneth Knowles <ke...@apache.org> wrote:

> Forking thread because lots of people may just ignore this topic, per the
> discussion :-)
>
> (sometimes gmail doesn't fork thread properly, but here's hoping...)
>
> I'll add some other outcomes of these emails:
>
>  - people file P0s that are not outages and P1s that are not data loss and
> I downgrade them
>  - I randomly open up a few flaky test bugs and see if I can fix them
> really quick
>  - people file legit P0s and P1s and I subscribe and follow them
>
> Of these, only the last one seems important (not just that *I* follow
> them, but that new P0s and P1s get immediate attention from many eyes)
>
> So maybe one take on the goal is to:
>
>  - have new P0s and P1s evaluated quickly: P0s are outages or
> outage-like occurrences that need immediate remedy, and P1s need to be
> evaluated for release blocking, etc.
>  - make sure P0s and P1s get attention appropriate to their priority
>
> It can also be helpful to just state the failure modes which would happen
> by default if we don't have a good process or automation:
>
>  - Real P0 gets filed and not noticed or fixed in a timely manner,
> blocking users and/or community in real time
>  - Real P1 gets filed and not noticed, so release goes out with known data
> loss bug or other total loss of functionality
>  - Non-real P0s and P1s accumulate, throwing off our data and making it
> hard to find the real problems
>  - Flakes are never fixed
>
> WDYT?
>
> If we have P0s and P1s in the "awaiting triage" state, those are the ones
> we need to notice. Then for a P0 or P1 outside of that state, we just need
> some way of making sure it doesn't stagnate. Or if it does stagnate, that
> empirically demonstrates it isn't really P1 (just like our P2 to P3
> downgrade automation). If everything is P1, nothing is, as they say.
>
> Kenn
>
> On Thu, Jun 23, 2022 at 10:01 AM Danny McCormick <
> dannymccormick@google.com> wrote:
>
>> > Maybe it would be helpful to sort these by last update time (and
>> potentially include that information in the email). Then we can at least
>> prioritize them instead of looking at a big wall of issues.
>>
>> I agree that this is a good idea (and pretty trivial to do). I'll update
>> the automation to do that once we get consensus on an approach.
>>
>> > I think the motivation for daily emails is that per the priorities
>> guide [1] P1 issues should be getting "continuous status updates". If these
>> issues aren't actually that important, I think the noise is good as it
>> should motivate us to prioritize them correctly. In practice that hasn't
>> been happening though...
>>
>> I guess the questions here are:
>>
>> 1) What is the goal of this email?
>> 2) Is it effective at accomplishing that goal?
>>
>> I think you're saying that the goal (or a goal) is to highlight issues
>> that aren't getting the attention they need; if that's our goal, then I
>> don't think this is a particularly effective mechanism for it because (a)
>> it's very unclear which issues fall into that category and (b) there are too
>> many to manually go through on a daily basis. From the email alone, it's
>> not clear to me that any of the issues above "shouldn't" be P1s (though I'd
>> guess you're right that some/many of them don't belong since most were
>> created before the Jira -> GH migration based on the titles). I'd also
>> argue that a daily email just desensitizes us to them since there almost
>> always will be *some* valid P1s that don't need extra attention.
>>
>> I do still think this could have value as a weekly email, with the goal
>> being "it's probably a good idea for someone to take a look at each of
>> these". Another option would be to only include issues with no action in
>> the last 7 days and/or no assignees and keep it daily.
>>
>> A couple side notes:
>> - No matter what we do, if we keep the current automation in any form we
>> should fix the url from https://api.github.com/repos/apache/beam/issues/#
>> to https://github.com/apache/beam/issues/# - the current links are very
>> annoying.
>> - After I send this, I will do a pass of the current P1s since it does
>> indeed seem like too many are P1s and many should actually be P2s (or
>> lower).
>>
>> Thanks,
>> Danny
>>
>> On Thu, Jun 23, 2022 at 12:21 PM Brian Hulette <bh...@google.com>
>> wrote:
>>
>>> I think the motivation for daily emails is that per the priorities guide
>>> [1] P1 issues should be getting "continuous status updates". If these
>>> issues aren't actually that important, I think the noise is good as it
>>> should motivate us to prioritize them correctly. In practice that hasn't
>>> been happening though...
>>>
>>> Maybe it would be helpful to sort these by last update time (and
>>> potentially include that information in the email). Then we can at least
>>> prioritize them instead of looking at a big wall of issues.
>>>
>>> Brian
>>>
>>> [1] https://beam.apache.org/contribute/issue-priorities/
>>>
>>> On Thu, Jun 23, 2022 at 6:07 AM Danny McCormick <
>>> dannymccormick@google.com> wrote:
>>>
>>>> I think a weekly summary seems like a good idea for the P1 issues and
>>>> flaky tests, though daily still seems appropriate for P0 issues. I put up
>>>> https://github.com/apache/beam/pull/22017 to just send the P1/flaky
>>>> test reports on Wednesdays. If anyone objects, please let me know - I'll
>>>> wait on merging til tomorrow to leave time for feedback (and it's always
>>>> reversible 🙂).
>>>>
>>>> Thanks,
>>>> Danny
>>>>
>>>> On Wed, Jun 22, 2022 at 7:05 PM Manu Zhang <ow...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> What is this daily summary intended for? Not all issues look like P1.
>>>>> And will a weekly summary be less noisy?
>>>>>
>>>>> On Wed, Jun 22, 2022 at 11:45 PM <be...@gmail.com> wrote:
>>>>>
>>>>>> This is your daily summary of Beam's current P1 issues, not including
>>>>>> flaky tests.
>>>>>>
>>>>>>     See
>>>>>> https://beam.apache.org/contribute/issue-priorities/#p1-critical for
>>>>>> the meaning and expectations around P1 issues.
>>>>>>
>>>>>

[DISCUSS] What to do about P0/P1/flake automation Was: P1 issues report (70)

Posted by Kenneth Knowles <ke...@apache.org>.
Forking thread because lots of people may just ignore this topic, per the
discussion :-)

(sometimes gmail doesn't fork thread properly, but here's hoping...)

I'll add some other outcomes of these emails:

 - people file P0s that are not outages and P1s that are not data loss and
I downgrade them
 - I randomly open up a few flaky test bugs and see if I can fix them
really quick
 - people file legit P0s and P1s and I subscribe and follow them

Of these, only the last one seems important (not just that *I* follow them,
but that new P0s and P1s get immediate attention from many eyes)

So maybe one take on the goal is to:

 - have new P0s and P1s evaluated quickly: P0s are outages or outage-like
occurrences that need immediate remedy, and P1s need to be evaluated for
release blocking, etc.
 - make sure P0s and P1s get attention appropriate to their priority

It can also be helpful to just state the failure modes which would happen
by default if we don't have a good process or automation:

 - Real P0 gets filed and not noticed or fixed in a timely manner, blocking
users and/or community in real time
 - Real P1 gets filed and not noticed, so release goes out with known data
loss bug or other total loss of functionality
 - Non-real P0s and P1s accumulate, throwing off our data and making it
hard to find the real problems
 - Flakes are never fixed

WDYT?

If we have P0s and P1s in the "awaiting triage" state, those are the ones
we need to notice. Then for a P0 or P1 outside of that state, we just need
some way of making sure it doesn't stagnate. Or if it does stagnate, that
empirically demonstrates it isn't really P1 (just like our P2 to P3
downgrade automation). If everything is P1, nothing is, as they say.
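
(As an illustration: if the "awaiting triage" label is what marks that
state, the set we need to notice is one search on
https://github.com/apache/beam/issues away, something like

    is:issue is:open label:P1 label:"awaiting triage"

and everything outside it only needs the staleness check.)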

Kenn

On Thu, Jun 23, 2022 at 10:01 AM Danny McCormick <da...@google.com>
wrote:

> > Maybe it would be helpful to sort these by last update time (and
> potentially include that information in the email). Then we can at least
> prioritize them instead of looking at a big wall of issues.
>
> I agree that this is a good idea (and pretty trivial to do). I'll update
> the automation to do that once we get consensus on an approach.
>
> > I think the motivation for daily emails is that per the priorities guide
> [1] P1 issues should be getting "continuous status updates". If these
> issues aren't actually that important, I think the noise is good as it
> should motivate us to prioritize them correctly. In practice that hasn't
> been happening though...
>
> I guess the questions here are:
>
> 1) What is the goal of this email?
> 2) Is it effective at accomplishing that goal?
>
> I think you're saying that the goal (or a goal) is to highlight issues
> that aren't getting the attention they need; if that's our goal, then I
> don't think this is a particularly effective mechanism for it because (a)
> it's very unclear which issues fall into that category and (b) there are too
> many to manually go through on a daily basis. From the email alone, it's
> not clear to me that any of the issues above "shouldn't" be P1s (though I'd
> guess you're right that some/many of them don't belong since most were
> created before the Jira -> GH migration based on the titles). I'd also
> argue that a daily email just desensitizes us to them since there almost
> always will be *some* valid P1s that don't need extra attention.
>
> I do still think this could have value as a weekly email, with the goal
> being "it's probably a good idea for someone to take a look at each of
> these". Another option would be to only include issues with no action in
> the last 7 days and/or no assignees and keep it daily.
>
> A couple side notes:
> - No matter what we do, if we keep the current automation in any form we
> should fix the url from https://api.github.com/repos/apache/beam/issues/#
> to https://github.com/apache/beam/issues/# - the current links are very
> annoying.
> - After I send this, I will do a pass of the current P1s since it does
> indeed seem like too many are P1s and many should actually be P2s (or
> lower).
>
> Thanks,
> Danny
>
> On Thu, Jun 23, 2022 at 12:21 PM Brian Hulette <bh...@google.com>
> wrote:
>
>> I think the motivation for daily emails is that per the priorities guide
>> [1] P1 issues should be getting "continuous status updates". If these
>> issues aren't actually that important, I think the noise is good as it
>> should motivate us to prioritize them correctly. In practice that hasn't
>> been happening though...
>>
>> Maybe it would be helpful to sort these by last update time (and
>> potentially include that information in the email). Then we can at least
>> prioritize them instead of looking at a big wall of issues.
>>
>> Brian
>>
>> [1] https://beam.apache.org/contribute/issue-priorities/
>>
>> On Thu, Jun 23, 2022 at 6:07 AM Danny McCormick <
>> dannymccormick@google.com> wrote:
>>
>>> I think a weekly summary seems like a good idea for the P1 issues and
>>> flaky tests, though daily still seems appropriate for P0 issues. I put up
>>> https://github.com/apache/beam/pull/22017 to just send the P1/flaky
>>> test reports on Wednesdays. If anyone objects, please let me know - I'll
>>> wait on merging til tomorrow to leave time for feedback (and it's always
>>> reversible 🙂).
>>>
>>> Thanks,
>>> Danny
>>>
>>> On Wed, Jun 22, 2022 at 7:05 PM Manu Zhang <ow...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> What is this daily summary intended for? Not all issues look like P1.
>>>> And will a weekly summary be less noisy?
>>>>
>>>> On Wed, Jun 22, 2022 at 11:45 PM <be...@gmail.com> wrote:
>>>>
>>>>> This is your daily summary of Beam's current P1 issues, not including
>>>>> flaky tests.
>>>>>
>>>>>     See
>>>>> https://beam.apache.org/contribute/issue-priorities/#p1-critical for
>>>>> the meaning and expectations around P1 issues.
>>>>>
>>>>>
>>>>

Re: P1 issues report (70)

Posted by Danny McCormick <da...@google.com>.
> Maybe it would be helpful to sort these by last update time (and
> potentially include that information in the email). Then we can at least
> prioritize them instead of looking at a big wall of issues.

I agree that this is a good idea (and pretty trivial to do). I'll update
the automation to do that once we get consensus on an approach.
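
For reference, roughly what I have in mind (a sketch, not the actual
automation code; `issues` is the list of open P1 issue objects from the
GitHub REST API, and the ISO 8601 `updated_at` strings sort correctly as
plain strings):

    issues.sort(key=lambda issue: issue["updated_at"])  # stalest first

    for issue in issues:
        updated = issue["updated_at"][:10]  # YYYY-MM-DD
        print(f"{issue['html_url']}: {issue['title']} "
              f"(last updated {updated})")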

> I think the motivation for daily emails is that per the priorities guide
> [1] P1 issues should be getting "continuous status updates". If these
> issues aren't actually that important, I think the noise is good as it
> should motivate us to prioritize them correctly. In practice that hasn't
> been happening though...

I guess the questions here are:

1) What is the goal of this email?
2) Is it effective at accomplishing that goal?

I think you're saying that the goal (or a goal) is to highlight issues that
aren't getting the attention they need; if that's our goal, then I don't
think this is a particularly effective mechanism for it because (a) it's
very unclear which issues fall into that category and (b) there are too
many to manually go through on a daily basis. From the email alone, it's
not clear to me that any of the issues above "shouldn't" be P1s (though I'd
guess you're right that some/many of them don't belong since most were
created before the Jira -> GH migration based on the titles). I'd also
argue that a daily email just desensitizes us to them since there almost
always will be *some* valid P1s that don't need extra attention.

I do still think this could have value as a weekly email, with the goal
being "it's probably a good idea for someone to take a look at each of
these". Another option would be to only include issues with no action in
the last 7 days and/or no assignees and keep it daily.

A couple side notes:
- No matter what we do, if we keep the current automation in any form we
should fix the URL from https://api.github.com/repos/apache/beam/issues/#
to https://github.com/apache/beam/issues/# - the current links are very
annoying. (See the one-line sketch after these notes.)
- After I send this, I will do a pass of the current P1s since it does
indeed seem like too many are P1s and many should actually be P2s (or
lower).
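
For the URL fix specifically, it should be enough to print the `html_url`
field the API already returns instead of the API `url` - or, if all we have
is the string, a one-line rewrite along these lines (`api_url` and `web_url`
are illustrative names):

    web_url = api_url.replace(
        "https://api.github.com/repos/", "https://github.com/")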

Thanks,
Danny

On Thu, Jun 23, 2022 at 12:21 PM Brian Hulette <bh...@google.com> wrote:

> I think the motivation for daily emails is that per the priorities guide
> [1] P1 issues should be getting "continuous status updates". If these
> issues aren't actually that important, I think the noise is good as it
> should motivate us to prioritize them correctly. In practice that hasn't
> been happening though...
>
> Maybe it would be helpful to sort these by last update time (and
> potentially include that information in the email). Then we can at least
> prioritize them instead of looking at a big wall of issues.
>
> Brian
>
> [1] https://beam.apache.org/contribute/issue-priorities/
>
> On Thu, Jun 23, 2022 at 6:07 AM Danny McCormick <da...@google.com>
> wrote:
>
>> I think a weekly summary seems like a good idea for the P1 issues and
>> flaky tests, though daily still seems appropriate for P0 issues. I put up
>> https://github.com/apache/beam/pull/22017 to just send the P1/flaky test
>> reports on Wednesdays. If anyone objects, please let me know - I'll wait on
>> merging til tomorrow to leave time for feedback (and it's always reversible
>> 🙂).
>>
>> Thanks,
>> Danny
>>
>> On Wed, Jun 22, 2022 at 7:05 PM Manu Zhang <ow...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> What is this daily summary intended for? Not all issues look like P1.
>>> And will a weekly summary be less noisy?
>>>
>>> On Wed, Jun 22, 2022 at 11:45 PM <be...@gmail.com> wrote:
>>>
>>>> This is your daily summary of Beam's current P1 issues, not including
>>>> flaky tests.
>>>>
>>>>     See
>>>> https://beam.apache.org/contribute/issue-priorities/#p1-critical for
>>>> the meaning and expectations around P1 issues.
>>>>
>>>>
>>>

Re: P1 issues report (70)

Posted by Brian Hulette <bh...@google.com>.
I think the motivation for daily emails is that per the priorities guide
[1] P1 issues should be getting "continuous status updates". If these
issues aren't actually that important, I think the noise is good as it
should motivate us to prioritize them correctly. In practice that hasn't
been happening though...

Maybe it would be helpful to sort these by last update time (and
potentially include that information in the email). Then we can at least
prioritize them instead of looking at a big wall of issues.
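
If it helps, the issues API can even do that sort server-side. A
hypothetical fetch, assuming the priority label is literally "P1":

    import requests

    resp = requests.get(
        "https://api.github.com/repos/apache/beam/issues",
        params={"labels": "P1", "state": "open",
                "sort": "updated", "direction": "asc", "per_page": 100},
    )
    # This endpoint also returns pull requests; drop them before reporting.
    issues = [i for i in resp.json() if "pull_request" not in i]

The report could then show each issue's updated_at next to its link.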

Brian

[1] https://beam.apache.org/contribute/issue-priorities/

On Thu, Jun 23, 2022 at 6:07 AM Danny McCormick <da...@google.com>
wrote:

> I think a weekly summary seems like a good idea for the P1 issues and
> flaky tests, though daily still seems appropriate for P0 issues. I put up
> https://github.com/apache/beam/pull/22017 to just send the P1/flaky test
> reports on Wednesdays. If anyone objects, please let me know - I'll wait on
> merging til tomorrow to leave time for feedback (and it's always reversible
> 🙂).
>
> Thanks,
> Danny
>
> On Wed, Jun 22, 2022 at 7:05 PM Manu Zhang <ow...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> What is this daily summary intended for? Not all issues look like P1. And
>> will a weekly summary be less noisy?
>>
>> On Wed, Jun 22, 2022 at 11:45 PM <be...@gmail.com> wrote:
>>
>>> This is your daily summary of Beam's current P1 issues, not including
>>> flaky tests.
>>>
>>>     See https://beam.apache.org/contribute/issue-priorities/#p1-critical
>>> for the meaning and expectations around P1 issues.
>>>
>>>
>>

Re: P1 issues report (70)

Posted by Danny McCormick <da...@google.com>.
I think a weekly summary seems like a good idea for the P1 issues and flaky
tests, though daily still seems appropriate for P0 issues. I put up
https://github.com/apache/beam/pull/22017 to just send the P1/flaky test
reports on Wednesdays. If anyone objects, please let me know - I'll wait on
merging til tomorrow to leave time for feedback (and it's always reversible
🙂).

Thanks,
Danny

On Wed, Jun 22, 2022 at 7:05 PM Manu Zhang <ow...@gmail.com> wrote:

> Hi all,
>
> What is this daily summary intended for? Not all issues look like P1. And
> will a weekly summary be less noisy?
>
> On Wed, Jun 22, 2022 at 11:45 PM <be...@gmail.com> wrote:
>
>> This is your daily summary of Beam's current P1 issues, not including
>> flaky tests.
>>
>>     See https://beam.apache.org/contribute/issue-priorities/#p1-critical
>> for the meaning and expectations around P1 issues.
>>
>> with random prefix
>> https://api.github.com/repos/apache/beam/issues/21258: Dataflow error in
>> CombinePerKey operation
>> https://api.github.com/repos/apache/beam/issues/21257: Either Create or
>> DirectRunner fails to produce all elements to the following transform
>> https://api.github.com/repos/apache/beam/issues/21123: Multiple jobs
>> running on Flink session cluster reuse the persistent Python environment.
>> https://api.github.com/repos/apache/beam/issues/21119: Migrate to the
>> next version of Python `requests` when released
>> https://api.github.com/repos/apache/beam/issues/21117: "Java IO IT
>> Tests" - missing data in grafana
>> https://api.github.com/repos/apache/beam/issues/21115: JdbcIO date
>> conversion is sensitive to OS
>> https://api.github.com/repos/apache/beam/issues/21112: Dataflow
>> SocketException (SSLException) error while trying to send message from
>> Cloud Pub/Sub to BigQuery
>> https://api.github.com/repos/apache/beam/issues/21111: Java creates an
>> incorrect pipeline proto when core-construction-java jar is not in the
>> CLASSPATH
>> https://api.github.com/repos/apache/beam/issues/21110: codecov/patch has
>> poor behavior
>> https://api.github.com/repos/apache/beam/issues/21109: SDF BoundedSource
>> seems to execute significantly slower than 'normal' BoundedSource
>> https://api.github.com/repos/apache/beam/issues/21108:
>> java.io.InvalidClassException With Flink Kafka
>> https://api.github.com/repos/apache/beam/issues/20979: Portable runners
>> should be able to issue checkpoints to Splittable DoFn
>> https://api.github.com/repos/apache/beam/issues/20978:
>> PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode
>> some Avro logical types
>> https://api.github.com/repos/apache/beam/issues/20973: Python Beam SDK
>> Harness hangs when installing pip packages
>> https://api.github.com/repos/apache/beam/issues/20818: XmlIO.Read does
>> not handle XML encoding per spec
>> https://api.github.com/repos/apache/beam/issues/20814: JmsIO is not
>> acknowledging messages correctly
>> https://api.github.com/repos/apache/beam/issues/20813: No trigger early
>> repeatedly for session windows
>> https://api.github.com/repos/apache/beam/issues/20812: Cross-language
>> consistency (RequiresStableInputs) is quietly broken (at least on portable
>> flink runner)
>> https://api.github.com/repos/apache/beam/issues/20692: Timer with
>> dataflow runner can be set multiple times (dataflow runner)
>> https://api.github.com/repos/apache/beam/issues/20691: Beam metrics
>> should be displayed in Flink UI "Metrics" tab
>> https://api.github.com/repos/apache/beam/issues/20689: Kafka
>> commitOffsetsInFinalize OOM on Flink
>> https://api.github.com/repos/apache/beam/issues/20532: Support for coder
>> argument in WriteToBigQuery
>> https://api.github.com/repos/apache/beam/issues/20531: FileBasedSink:
>> allow setting temp directory provider per dynamic destination
>> https://api.github.com/repos/apache/beam/issues/20530: Make non-portable
>> Splittable DoFn the only option when executing Java "Read" transforms
>> https://api.github.com/repos/apache/beam/issues/20529: SpannerIO tests
>> don't actually assert anything.
>> https://api.github.com/repos/apache/beam/issues/20528: python
>> CombineGlobally().with_fanout() cause duplicate combine results for sliding
>> windows
>> https://api.github.com/repos/apache/beam/issues/20333:
>> beam_PerformanceTests_Kafka_IO failing due to " provided port is already
>> allocated"
>> https://api.github.com/repos/apache/beam/issues/20332: FileIO
>> writeDynamic with AvroIO.sink not writing all data
>> https://api.github.com/repos/apache/beam/issues/20330: Remove insecure
>> ssl options from MongoDBIO
>> https://api.github.com/repos/apache/beam/issues/20109: SortValues should
>> fail if SecondaryKey coder is not deterministic
>> https://api.github.com/repos/apache/beam/issues/20108: Python direct
>> runner doesn't emit empty pane when it should
>> https://api.github.com/repos/apache/beam/issues/20009:
>> Environment-sensitive provisioning for Dataflow
>> https://api.github.com/repos/apache/beam/issues/19971: [SQL] Some Hive
>> tests throw NullPointerException, but get marked as passing (Direct Runner)
>> https://api.github.com/repos/apache/beam/issues/19817: datetime and
>> decimal should be logical types
>> https://api.github.com/repos/apache/beam/issues/19815: Add support for
>> remaining data types in python RowCoder
>> https://api.github.com/repos/apache/beam/issues/19813: PubsubIO returns
>> empty message bodies for all messages read
>> https://api.github.com/repos/apache/beam/issues/19556: User reports
>> protobuf ClassChangeError running against 2.6.0 or above
>> https://api.github.com/repos/apache/beam/issues/19369: KafkaIO doesn't
>> commit offsets while being used as bounded source
>> https://api.github.com/repos/apache/beam/issues/17950: [Bug]: Java
>> Precommit permared
>>
>

Re: P1 issues report (70)

Posted by Manu Zhang <ow...@gmail.com>.
Hi all,

What is this daily summary intended for? Not all issues look like P1, and
would a weekly summary be less noisy?

On Wed, Jun 22, 2022 at 23:45, <be...@gmail.com> wrote:

> This is your daily summary of Beam's current P1 issues, not including
> flaky tests.
>
>     See https://beam.apache.org/contribute/issue-priorities/#p1-critical
> for the meaning and expectations around P1 issues.