You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Chad Dombrova <ch...@gmail.com> on 2019/10/28 17:34:28 UTC

RFC: python static typing PR

Hi all,
I've been working on a PR to add static typing to the beam python sdk for
the past 4 months or so.  This has been an epic journey which has required
chasing down numerous fixes across several other projects (mypy, pylint,
python-future), but the mypy tests are now passing!

I'm not sure how much convincing I need to do with this group on the
benefits of static typing, especially considering the heavy Java influence
on this project, but for those who are curious, there's some good info
here:  https://realpython.com/python-type-checking/#pros-and-cons.  Suffice
it to say, it's a game changer for projects like Beam (large code base,
high level of quality, good testing infrastructure), and it dovetails
perfectly into the efforts surrounding pipeline type analysis and
serialization.  Anecdotally speaking, all of the developers I've worked
with who were originally resistant to static typing in python ("it's
ugly!"  "it's not pythonic!") have changed their tune and fully embraced it
because of the time that it saves them every day.

More details over at the PR:  https://github.com/apache/beam/pull/9056

I look forward to your feedback!

-chad

Re: RFC: python static typing PR

Posted by Ismaël Mejía <ie...@gmail.com>.
The herculean term is perfect to describe this impressive achievement Chad.

Congratulations and thanks for the effort to make this happen.  This will give
Beam users not only improved functionality but as Robert mentioned help
others to understand more quickly the internals of the python SDK.
Maintenance is a
small price to pay for the big wins of typing.

Huge +1



On Wed, Oct 30, 2019 at 9:26 PM Chad Dombrova <ch...@gmail.com> wrote:
>
>
>>
>> Do you believe that a future mypy plugin could replace pipeline type checks in Beam, or are there limits to what it can do?
>
>
> mypy will get us quite far on its own once we completely annotate the beam code.  That said, my PR does not include my efforts to turn PTransforms into Generics, which will be required to properly analyze pipelines, so there's still a lot more work to do.  I've experimented with a mypy plugin to smooth over some of the rough spots in that workflow and I will just say that the mypy API has a very steep learning curve.
>
> Another thing to note: mypy is very explicit about function annotations.  It does not do the "implicit" inference that Beam does, such as automatically detecting function return types.  I think it should be possible to do a lot of that as a mypy plugin, and in fact, since it has little to do with Beam it could grow into its own project with outside contributors.
>
> -chad
>

Re: RFC: python static typing PR

Posted by Chad Dombrova <ch...@gmail.com>.
Thanks for the kind words, everyone.

Note that the PR that was merged was the first half, so the mypy-lint job
is not yet setup to trigger failures.

Part 2 is up now:  https://github.com/apache/beam/pull/10367

My goal is to bundle up changes into smaller PRs from here on out.  It
might take another 3 PRs to get through the rest.

-chad


On Wed, Dec 11, 2019 at 2:13 PM Ahmet Altay <al...@google.com> wrote:

> Thank you, Chad! This is awesome.
>
> On Wed, Dec 11, 2019 at 2:05 PM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> +1. Thanks!
>>
>> On Wed, Dec 11, 2019 at 1:44 PM Pablo Estrada <pa...@google.com> wrote:
>> >
>> > Love it. I've merged it, since it was already approved by Robert - and
>> yes, we don't want to hit a merge conflict.
>> >
>> > On Wed, Dec 11, 2019 at 1:36 PM Heejong Lee <he...@google.com> wrote:
>> >>
>> >> Wow, this is a big step forward. As a long fan of strongly typed
>> functional languages, I'm glad to see this change :)
>> >>
>> >> On Wed, Dec 11, 2019 at 9:44 AM Chad Dombrova <ch...@gmail.com>
>> wrote:
>> >>>
>> >>> Hi all,
>> >>> Robert has diligently reviewed the first batch of changes for this
>> PR, and all review notes are addressed and tests are passing:
>> https://github.com/apache/beam/pull/9915
>> >>>
>> >>> Due to the number of file touched there's a short window of about one
>> or two days before a merge conflict arrives on master, and after resolving
>> that it usually takes another 1-2 days of pasting "Run Python PreCommit"
>> until they pass again, so it would be great to get this merged while the
>> window is open!  Despite the number of files touched, the changes are
>> almost entirely type comments, so the PR is designed to be quite safe.
>> >>>
>> >>> -chad
>> >>>
>> >>>
>> >>> On Tue, Nov 5, 2019 at 2:50 PM Chad Dombrova <ch...@gmail.com>
>> wrote:
>> >>>>
>> >>>> Glad to hear we have such a forward-thinking community!
>> >>>>
>> >>>>
>> >>>> On Tue, Nov 5, 2019 at 2:43 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>> >>>>>
>> >>>>> Sounds like we have consensus. Let's move forward. I'll follow up
>> with
>> >>>>> the discussions on the PRs themselves.
>> >>>>>
>> >>>>> On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw <
>> robertwb@google.com> wrote:
>> >>>>> >
>> >>>>> > On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova <ch...@gmail.com>
>> wrote:
>> >>>>> > >
>> >>>>> > >> Do you believe that a future mypy plugin could replace
>> pipeline type checks in Beam, or are there limits to what it can do?
>> >>>>> > >
>> >>>>> > > mypy will get us quite far on its own once we completely
>> annotate the beam code.  That said, my PR does not include my efforts to
>> turn PTransforms into Generics, which will be required to properly analyze
>> pipelines, so there's still a lot more work to do.  I've experimented with
>> a mypy plugin to smooth over some of the rough spots in that workflow and I
>> will just say that the mypy API has a very steep learning curve.
>> >>>>> > >
>> >>>>> > > Another thing to note: mypy is very explicit about function
>> annotations.  It does not do the "implicit" inference that Beam does, such
>> as automatically detecting function return types.  I think it should be
>> possible to do a lot of that as a mypy plugin, and in fact, since it has
>> little to do with Beam it could grow into its own project with outside
>> contributors.
>> >>>>> >
>> >>>>> > Yeah, I don't think, as is, it can replace what we do, but with
>> >>>>> > plugins I think it could possibly come closer. Certainly there is
>> >>>>> > information that is only available at runtime (e.g. reading from a
>> >>>>> > database or avro/parquet file could provide the schema which can
>> be
>> >>>>> > used for downstream checking) which may limit the ability to do
>> >>>>> > everything statically (even Beam Java is moving this direction).
>> Mypy
>> >>>>> > clearly has an implementation of the "is compatible with" operator
>> >>>>> > that I would love to borrow, but unfortunately it's not (easily?)
>> >>>>> > exposed.
>> >>>>> >
>> >>>>> > That being said, we should leverage what we can for pipeline
>> >>>>> > authoring, and it'll be a great development too regardless.
>>
>

Re: RFC: python static typing PR

Posted by Ahmet Altay <al...@google.com>.
Thank you, Chad! This is awesome.

On Wed, Dec 11, 2019 at 2:05 PM Robert Bradshaw <ro...@google.com> wrote:

> +1. Thanks!
>
> On Wed, Dec 11, 2019 at 1:44 PM Pablo Estrada <pa...@google.com> wrote:
> >
> > Love it. I've merged it, since it was already approved by Robert - and
> yes, we don't want to hit a merge conflict.
> >
> > On Wed, Dec 11, 2019 at 1:36 PM Heejong Lee <he...@google.com> wrote:
> >>
> >> Wow, this is a big step forward. As a long fan of strongly typed
> functional languages, I'm glad to see this change :)
> >>
> >> On Wed, Dec 11, 2019 at 9:44 AM Chad Dombrova <ch...@gmail.com>
> wrote:
> >>>
> >>> Hi all,
> >>> Robert has diligently reviewed the first batch of changes for this PR,
> and all review notes are addressed and tests are passing:
> https://github.com/apache/beam/pull/9915
> >>>
> >>> Due to the number of file touched there's a short window of about one
> or two days before a merge conflict arrives on master, and after resolving
> that it usually takes another 1-2 days of pasting "Run Python PreCommit"
> until they pass again, so it would be great to get this merged while the
> window is open!  Despite the number of files touched, the changes are
> almost entirely type comments, so the PR is designed to be quite safe.
> >>>
> >>> -chad
> >>>
> >>>
> >>> On Tue, Nov 5, 2019 at 2:50 PM Chad Dombrova <ch...@gmail.com>
> wrote:
> >>>>
> >>>> Glad to hear we have such a forward-thinking community!
> >>>>
> >>>>
> >>>> On Tue, Nov 5, 2019 at 2:43 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >>>>>
> >>>>> Sounds like we have consensus. Let's move forward. I'll follow up
> with
> >>>>> the discussions on the PRs themselves.
> >>>>>
> >>>>> On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >>>>> >
> >>>>> > On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova <ch...@gmail.com>
> wrote:
> >>>>> > >
> >>>>> > >> Do you believe that a future mypy plugin could replace pipeline
> type checks in Beam, or are there limits to what it can do?
> >>>>> > >
> >>>>> > > mypy will get us quite far on its own once we completely
> annotate the beam code.  That said, my PR does not include my efforts to
> turn PTransforms into Generics, which will be required to properly analyze
> pipelines, so there's still a lot more work to do.  I've experimented with
> a mypy plugin to smooth over some of the rough spots in that workflow and I
> will just say that the mypy API has a very steep learning curve.
> >>>>> > >
> >>>>> > > Another thing to note: mypy is very explicit about function
> annotations.  It does not do the "implicit" inference that Beam does, such
> as automatically detecting function return types.  I think it should be
> possible to do a lot of that as a mypy plugin, and in fact, since it has
> little to do with Beam it could grow into its own project with outside
> contributors.
> >>>>> >
> >>>>> > Yeah, I don't think, as is, it can replace what we do, but with
> >>>>> > plugins I think it could possibly come closer. Certainly there is
> >>>>> > information that is only available at runtime (e.g. reading from a
> >>>>> > database or avro/parquet file could provide the schema which can be
> >>>>> > used for downstream checking) which may limit the ability to do
> >>>>> > everything statically (even Beam Java is moving this direction).
> Mypy
> >>>>> > clearly has an implementation of the "is compatible with" operator
> >>>>> > that I would love to borrow, but unfortunately it's not (easily?)
> >>>>> > exposed.
> >>>>> >
> >>>>> > That being said, we should leverage what we can for pipeline
> >>>>> > authoring, and it'll be a great development too regardless.
>

Re: RFC: python static typing PR

Posted by Robert Bradshaw <ro...@google.com>.
+1. Thanks!

On Wed, Dec 11, 2019 at 1:44 PM Pablo Estrada <pa...@google.com> wrote:
>
> Love it. I've merged it, since it was already approved by Robert - and yes, we don't want to hit a merge conflict.
>
> On Wed, Dec 11, 2019 at 1:36 PM Heejong Lee <he...@google.com> wrote:
>>
>> Wow, this is a big step forward. As a long fan of strongly typed functional languages, I'm glad to see this change :)
>>
>> On Wed, Dec 11, 2019 at 9:44 AM Chad Dombrova <ch...@gmail.com> wrote:
>>>
>>> Hi all,
>>> Robert has diligently reviewed the first batch of changes for this PR, and all review notes are addressed and tests are passing:  https://github.com/apache/beam/pull/9915
>>>
>>> Due to the number of file touched there's a short window of about one or two days before a merge conflict arrives on master, and after resolving that it usually takes another 1-2 days of pasting "Run Python PreCommit" until they pass again, so it would be great to get this merged while the window is open!  Despite the number of files touched, the changes are almost entirely type comments, so the PR is designed to be quite safe.
>>>
>>> -chad
>>>
>>>
>>> On Tue, Nov 5, 2019 at 2:50 PM Chad Dombrova <ch...@gmail.com> wrote:
>>>>
>>>> Glad to hear we have such a forward-thinking community!
>>>>
>>>>
>>>> On Tue, Nov 5, 2019 at 2:43 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>>
>>>>> Sounds like we have consensus. Let's move forward. I'll follow up with
>>>>> the discussions on the PRs themselves.
>>>>>
>>>>> On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>> >
>>>>> > On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova <ch...@gmail.com> wrote:
>>>>> > >
>>>>> > >> Do you believe that a future mypy plugin could replace pipeline type checks in Beam, or are there limits to what it can do?
>>>>> > >
>>>>> > > mypy will get us quite far on its own once we completely annotate the beam code.  That said, my PR does not include my efforts to turn PTransforms into Generics, which will be required to properly analyze pipelines, so there's still a lot more work to do.  I've experimented with a mypy plugin to smooth over some of the rough spots in that workflow and I will just say that the mypy API has a very steep learning curve.
>>>>> > >
>>>>> > > Another thing to note: mypy is very explicit about function annotations.  It does not do the "implicit" inference that Beam does, such as automatically detecting function return types.  I think it should be possible to do a lot of that as a mypy plugin, and in fact, since it has little to do with Beam it could grow into its own project with outside contributors.
>>>>> >
>>>>> > Yeah, I don't think, as is, it can replace what we do, but with
>>>>> > plugins I think it could possibly come closer. Certainly there is
>>>>> > information that is only available at runtime (e.g. reading from a
>>>>> > database or avro/parquet file could provide the schema which can be
>>>>> > used for downstream checking) which may limit the ability to do
>>>>> > everything statically (even Beam Java is moving this direction). Mypy
>>>>> > clearly has an implementation of the "is compatible with" operator
>>>>> > that I would love to borrow, but unfortunately it's not (easily?)
>>>>> > exposed.
>>>>> >
>>>>> > That being said, we should leverage what we can for pipeline
>>>>> > authoring, and it'll be a great development too regardless.

Re: RFC: python static typing PR

Posted by Pablo Estrada <pa...@google.com>.
Love it. I've merged it, since it was already approved by Robert - and yes,
we don't want to hit a merge conflict.

On Wed, Dec 11, 2019 at 1:36 PM Heejong Lee <he...@google.com> wrote:

> Wow, this is a big step forward. As a long fan of strongly typed
> functional languages, I'm glad to see this change :)
>
> On Wed, Dec 11, 2019 at 9:44 AM Chad Dombrova <ch...@gmail.com> wrote:
>
>> Hi all,
>> Robert has diligently reviewed the first batch of changes for this PR,
>> and all review notes are addressed and tests are passing:
>> https://github.com/apache/beam/pull/9915
>>
>> Due to the number of file touched there's a short window of about one or
>> two days before a merge conflict arrives on master, and after resolving
>> that it usually takes another 1-2 days of pasting "Run Python PreCommit"
>> until they pass again, so it would be great to get this merged while the
>> window is open!  Despite the number of files touched, the changes are
>> almost entirely type comments, so the PR is designed to be quite safe.
>>
>> -chad
>>
>>
>> On Tue, Nov 5, 2019 at 2:50 PM Chad Dombrova <ch...@gmail.com> wrote:
>>
>>> Glad to hear we have such a forward-thinking community!
>>>
>>>
>>> On Tue, Nov 5, 2019 at 2:43 PM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> Sounds like we have consensus. Let's move forward. I'll follow up with
>>>> the discussions on the PRs themselves.
>>>>
>>>> On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>> >
>>>> > On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova <ch...@gmail.com>
>>>> wrote:
>>>> > >
>>>> > >> Do you believe that a future mypy plugin could replace pipeline
>>>> type checks in Beam, or are there limits to what it can do?
>>>> > >
>>>> > > mypy will get us quite far on its own once we completely annotate
>>>> the beam code.  That said, my PR does not include my efforts to turn
>>>> PTransforms into Generics, which will be required to properly analyze
>>>> pipelines, so there's still a lot more work to do.  I've experimented with
>>>> a mypy plugin to smooth over some of the rough spots in that workflow and I
>>>> will just say that the mypy API has a very steep learning curve.
>>>> > >
>>>> > > Another thing to note: mypy is very explicit about function
>>>> annotations.  It does not do the "implicit" inference that Beam does, such
>>>> as automatically detecting function return types.  I think it should be
>>>> possible to do a lot of that as a mypy plugin, and in fact, since it has
>>>> little to do with Beam it could grow into its own project with outside
>>>> contributors.
>>>> >
>>>> > Yeah, I don't think, as is, it can replace what we do, but with
>>>> > plugins I think it could possibly come closer. Certainly there is
>>>> > information that is only available at runtime (e.g. reading from a
>>>> > database or avro/parquet file could provide the schema which can be
>>>> > used for downstream checking) which may limit the ability to do
>>>> > everything statically (even Beam Java is moving this direction). Mypy
>>>> > clearly has an implementation of the "is compatible with" operator
>>>> > that I would love to borrow, but unfortunately it's not (easily?)
>>>> > exposed.
>>>> >
>>>> > That being said, we should leverage what we can for pipeline
>>>> > authoring, and it'll be a great development too regardless.
>>>>
>>>

Re: RFC: python static typing PR

Posted by Heejong Lee <he...@google.com>.
Wow, this is a big step forward. As a long fan of strongly typed functional
languages, I'm glad to see this change :)

On Wed, Dec 11, 2019 at 9:44 AM Chad Dombrova <ch...@gmail.com> wrote:

> Hi all,
> Robert has diligently reviewed the first batch of changes for this PR, and
> all review notes are addressed and tests are passing:
> https://github.com/apache/beam/pull/9915
>
> Due to the number of file touched there's a short window of about one or
> two days before a merge conflict arrives on master, and after resolving
> that it usually takes another 1-2 days of pasting "Run Python PreCommit"
> until they pass again, so it would be great to get this merged while the
> window is open!  Despite the number of files touched, the changes are
> almost entirely type comments, so the PR is designed to be quite safe.
>
> -chad
>
>
> On Tue, Nov 5, 2019 at 2:50 PM Chad Dombrova <ch...@gmail.com> wrote:
>
>> Glad to hear we have such a forward-thinking community!
>>
>>
>> On Tue, Nov 5, 2019 at 2:43 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> Sounds like we have consensus. Let's move forward. I'll follow up with
>>> the discussions on the PRs themselves.
>>>
>>> On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>> >
>>> > On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova <ch...@gmail.com>
>>> wrote:
>>> > >
>>> > >> Do you believe that a future mypy plugin could replace pipeline
>>> type checks in Beam, or are there limits to what it can do?
>>> > >
>>> > > mypy will get us quite far on its own once we completely annotate
>>> the beam code.  That said, my PR does not include my efforts to turn
>>> PTransforms into Generics, which will be required to properly analyze
>>> pipelines, so there's still a lot more work to do.  I've experimented with
>>> a mypy plugin to smooth over some of the rough spots in that workflow and I
>>> will just say that the mypy API has a very steep learning curve.
>>> > >
>>> > > Another thing to note: mypy is very explicit about function
>>> annotations.  It does not do the "implicit" inference that Beam does, such
>>> as automatically detecting function return types.  I think it should be
>>> possible to do a lot of that as a mypy plugin, and in fact, since it has
>>> little to do with Beam it could grow into its own project with outside
>>> contributors.
>>> >
>>> > Yeah, I don't think, as is, it can replace what we do, but with
>>> > plugins I think it could possibly come closer. Certainly there is
>>> > information that is only available at runtime (e.g. reading from a
>>> > database or avro/parquet file could provide the schema which can be
>>> > used for downstream checking) which may limit the ability to do
>>> > everything statically (even Beam Java is moving this direction). Mypy
>>> > clearly has an implementation of the "is compatible with" operator
>>> > that I would love to borrow, but unfortunately it's not (easily?)
>>> > exposed.
>>> >
>>> > That being said, we should leverage what we can for pipeline
>>> > authoring, and it'll be a great development too regardless.
>>>
>>

Re: RFC: python static typing PR

Posted by Chad Dombrova <ch...@gmail.com>.
Hi all,
Robert has diligently reviewed the first batch of changes for this PR, and
all review notes are addressed and tests are passing:
https://github.com/apache/beam/pull/9915

Due to the number of file touched there's a short window of about one or
two days before a merge conflict arrives on master, and after resolving
that it usually takes another 1-2 days of pasting "Run Python PreCommit"
until they pass again, so it would be great to get this merged while the
window is open!  Despite the number of files touched, the changes are
almost entirely type comments, so the PR is designed to be quite safe.

-chad


On Tue, Nov 5, 2019 at 2:50 PM Chad Dombrova <ch...@gmail.com> wrote:

> Glad to hear we have such a forward-thinking community!
>
>
> On Tue, Nov 5, 2019 at 2:43 PM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> Sounds like we have consensus. Let's move forward. I'll follow up with
>> the discussions on the PRs themselves.
>>
>> On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>> >
>> > On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova <ch...@gmail.com>
>> wrote:
>> > >
>> > >> Do you believe that a future mypy plugin could replace pipeline type
>> checks in Beam, or are there limits to what it can do?
>> > >
>> > > mypy will get us quite far on its own once we completely annotate the
>> beam code.  That said, my PR does not include my efforts to turn
>> PTransforms into Generics, which will be required to properly analyze
>> pipelines, so there's still a lot more work to do.  I've experimented with
>> a mypy plugin to smooth over some of the rough spots in that workflow and I
>> will just say that the mypy API has a very steep learning curve.
>> > >
>> > > Another thing to note: mypy is very explicit about function
>> annotations.  It does not do the "implicit" inference that Beam does, such
>> as automatically detecting function return types.  I think it should be
>> possible to do a lot of that as a mypy plugin, and in fact, since it has
>> little to do with Beam it could grow into its own project with outside
>> contributors.
>> >
>> > Yeah, I don't think, as is, it can replace what we do, but with
>> > plugins I think it could possibly come closer. Certainly there is
>> > information that is only available at runtime (e.g. reading from a
>> > database or avro/parquet file could provide the schema which can be
>> > used for downstream checking) which may limit the ability to do
>> > everything statically (even Beam Java is moving this direction). Mypy
>> > clearly has an implementation of the "is compatible with" operator
>> > that I would love to borrow, but unfortunately it's not (easily?)
>> > exposed.
>> >
>> > That being said, we should leverage what we can for pipeline
>> > authoring, and it'll be a great development too regardless.
>>
>

Re: RFC: python static typing PR

Posted by Chad Dombrova <ch...@gmail.com>.
Glad to hear we have such a forward-thinking community!


On Tue, Nov 5, 2019 at 2:43 PM Robert Bradshaw <ro...@google.com> wrote:

> Sounds like we have consensus. Let's move forward. I'll follow up with
> the discussions on the PRs themselves.
>
> On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >
> > On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova <ch...@gmail.com> wrote:
> > >
> > >> Do you believe that a future mypy plugin could replace pipeline type
> checks in Beam, or are there limits to what it can do?
> > >
> > > mypy will get us quite far on its own once we completely annotate the
> beam code.  That said, my PR does not include my efforts to turn
> PTransforms into Generics, which will be required to properly analyze
> pipelines, so there's still a lot more work to do.  I've experimented with
> a mypy plugin to smooth over some of the rough spots in that workflow and I
> will just say that the mypy API has a very steep learning curve.
> > >
> > > Another thing to note: mypy is very explicit about function
> annotations.  It does not do the "implicit" inference that Beam does, such
> as automatically detecting function return types.  I think it should be
> possible to do a lot of that as a mypy plugin, and in fact, since it has
> little to do with Beam it could grow into its own project with outside
> contributors.
> >
> > Yeah, I don't think, as is, it can replace what we do, but with
> > plugins I think it could possibly come closer. Certainly there is
> > information that is only available at runtime (e.g. reading from a
> > database or avro/parquet file could provide the schema which can be
> > used for downstream checking) which may limit the ability to do
> > everything statically (even Beam Java is moving this direction). Mypy
> > clearly has an implementation of the "is compatible with" operator
> > that I would love to borrow, but unfortunately it's not (easily?)
> > exposed.
> >
> > That being said, we should leverage what we can for pipeline
> > authoring, and it'll be a great development too regardless.
>

Re: RFC: python static typing PR

Posted by Robert Bradshaw <ro...@google.com>.
Sounds like we have consensus. Let's move forward. I'll follow up with
the discussions on the PRs themselves.

On Wed, Oct 30, 2019 at 2:38 PM Robert Bradshaw <ro...@google.com> wrote:
>
> On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova <ch...@gmail.com> wrote:
> >
> >> Do you believe that a future mypy plugin could replace pipeline type checks in Beam, or are there limits to what it can do?
> >
> > mypy will get us quite far on its own once we completely annotate the beam code.  That said, my PR does not include my efforts to turn PTransforms into Generics, which will be required to properly analyze pipelines, so there's still a lot more work to do.  I've experimented with a mypy plugin to smooth over some of the rough spots in that workflow and I will just say that the mypy API has a very steep learning curve.
> >
> > Another thing to note: mypy is very explicit about function annotations.  It does not do the "implicit" inference that Beam does, such as automatically detecting function return types.  I think it should be possible to do a lot of that as a mypy plugin, and in fact, since it has little to do with Beam it could grow into its own project with outside contributors.
>
> Yeah, I don't think, as is, it can replace what we do, but with
> plugins I think it could possibly come closer. Certainly there is
> information that is only available at runtime (e.g. reading from a
> database or avro/parquet file could provide the schema which can be
> used for downstream checking) which may limit the ability to do
> everything statically (even Beam Java is moving this direction). Mypy
> clearly has an implementation of the "is compatible with" operator
> that I would love to borrow, but unfortunately it's not (easily?)
> exposed.
>
> That being said, we should leverage what we can for pipeline
> authoring, and it'll be a great development too regardless.

Re: RFC: python static typing PR

Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Oct 30, 2019 at 1:26 PM Chad Dombrova <ch...@gmail.com> wrote:
>
>> Do you believe that a future mypy plugin could replace pipeline type checks in Beam, or are there limits to what it can do?
>
> mypy will get us quite far on its own once we completely annotate the beam code.  That said, my PR does not include my efforts to turn PTransforms into Generics, which will be required to properly analyze pipelines, so there's still a lot more work to do.  I've experimented with a mypy plugin to smooth over some of the rough spots in that workflow and I will just say that the mypy API has a very steep learning curve.
>
> Another thing to note: mypy is very explicit about function annotations.  It does not do the "implicit" inference that Beam does, such as automatically detecting function return types.  I think it should be possible to do a lot of that as a mypy plugin, and in fact, since it has little to do with Beam it could grow into its own project with outside contributors.

Yeah, I don't think, as is, it can replace what we do, but with
plugins I think it could possibly come closer. Certainly there is
information that is only available at runtime (e.g. reading from a
database or avro/parquet file could provide the schema which can be
used for downstream checking) which may limit the ability to do
everything statically (even Beam Java is moving this direction). Mypy
clearly has an implementation of the "is compatible with" operator
that I would love to borrow, but unfortunately it's not (easily?)
exposed.

That being said, we should leverage what we can for pipeline
authoring, and it'll be a great development too regardless.

Re: RFC: python static typing PR

Posted by Chad Dombrova <ch...@gmail.com>.
> Do you believe that a future mypy plugin could replace pipeline type
> checks in Beam, or are there limits to what it can do?
>

mypy will get us quite far on its own once we completely annotate the beam
code.  That said, my PR does not include my efforts to turn PTransforms
into Generics, which will be required to properly analyze pipelines, so
there's still a lot more work to do.  I've experimented with a mypy plugin
to smooth over some of the rough spots in that workflow and I will just say
that the mypy API has a very steep learning curve.

Another thing to note: mypy is very explicit about function annotations.
It does not do the "implicit" inference that Beam does, such as
automatically detecting function return types.  I *think* it should be
possible to do a lot of that as a mypy plugin, and in fact, since it has
little to do with Beam it could grow into its own project with outside
contributors.

-chad

Re: RFC: python static typing PR

Posted by Udi Meiri <eh...@google.com>.
Thank you for this monumental work Chad. I've added my review comments,
though I haven't had time to look at all the files.
Do you believe that a future mypy plugin could replace pipeline type checks
in Beam, or are there limits to what it can do?

On Wed, Oct 30, 2019 at 10:25 AM Chad Dombrova <ch...@gmail.com> wrote:

> As Beam devs will be gaining more first-hand experience with the tooling,
>> we may need to add a style guide/best practices/FAQ to our contributor
>> guide to clarify known issues.
>>
>
> I'm happy to help out with that, just let me know.
>
> -chad
>
>
>
>

Re: RFC: python static typing PR

Posted by Chad Dombrova <ch...@gmail.com>.
>
> As Beam devs will be gaining more first-hand experience with the tooling,
> we may need to add a style guide/best practices/FAQ to our contributor
> guide to clarify known issues.
>

I'm happy to help out with that, just let me know.

-chad

Re: RFC: python static typing PR

Posted by Luke Cwik <lc...@google.com>.
+1 for type annotations.

On Mon, Oct 28, 2019 at 7:41 PM Robert Burke <ro...@frantil.com> wrote:

> As someone who cribs from the Python SDK to make changes in the Go SDK,
> this will make things much easier to follow! Thank you.
>
> On Mon, Oct 28, 2019, 6:52 PM Chad Dombrova <ch...@gmail.com> wrote:
>
>>
>> Wow, that is an incredible amount of work!
>>>
>>
>> Some people meditate.  I annotate ;)
>>
>> I'm definitely of the opinion that there's no viable counterargument to
>>> the value of types, especially for large or complex codebases.
>>>
>>
>> Agreed.  That's part of why I waited until I got the whole thing passing
>> before really promoting the review of this PR.
>>
>> Robert and I have worked out a rough plan for merging:
>>
>>    - I'll make a new PR with the foundations of the existing PR -- these
>>    are almost entirely type comments so no appreciable runtime changes --
>>    along with the mypy lint, which will be part of python precommit but not
>>    affect its failure status.   We'll get that merged first.
>>    - From there it should be easy for others to review and merge the
>>    remaining commits in parallel.
>>    - Once everything is in, we'll make a final commit to stop ignoring
>>    failures for the mypy job.
>>
>>
>> For those concerned about yet another lint (a very reasonable concern!)
>> the py37-mypy tox job completes in 37s on my macbook pro (excluding
>> virtualenv setup time).
>>
>> -chad
>>
>>
>>
>>
>

Re: RFC: python static typing PR

Posted by Robert Burke <ro...@frantil.com>.
As someone who cribs from the Python SDK to make changes in the Go SDK,
this will make things much easier to follow! Thank you.

On Mon, Oct 28, 2019, 6:52 PM Chad Dombrova <ch...@gmail.com> wrote:

>
> Wow, that is an incredible amount of work!
>>
>
> Some people meditate.  I annotate ;)
>
> I'm definitely of the opinion that there's no viable counterargument to
>> the value of types, especially for large or complex codebases.
>>
>
> Agreed.  That's part of why I waited until I got the whole thing passing
> before really promoting the review of this PR.
>
> Robert and I have worked out a rough plan for merging:
>
>    - I'll make a new PR with the foundations of the existing PR -- these
>    are almost entirely type comments so no appreciable runtime changes --
>    along with the mypy lint, which will be part of python precommit but not
>    affect its failure status.   We'll get that merged first.
>    - From there it should be easy for others to review and merge the
>    remaining commits in parallel.
>    - Once everything is in, we'll make a final commit to stop ignoring
>    failures for the mypy job.
>
>
> For those concerned about yet another lint (a very reasonable concern!)
> the py37-mypy tox job completes in 37s on my macbook pro (excluding
> virtualenv setup time).
>
> -chad
>
>
>
>

Re: RFC: python static typing PR

Posted by Chad Dombrova <ch...@gmail.com>.
> Wow, that is an incredible amount of work!
>

Some people meditate.  I annotate ;)

I'm definitely of the opinion that there's no viable counterargument to the
> value of types, especially for large or complex codebases.
>

Agreed.  That's part of why I waited until I got the whole thing passing
before really promoting the review of this PR.

Robert and I have worked out a rough plan for merging:

   - I'll make a new PR with the foundations of the existing PR -- these
   are almost entirely type comments so no appreciable runtime changes --
   along with the mypy lint, which will be part of python precommit but not
   affect its failure status.   We'll get that merged first.
   - From there it should be easy for others to review and merge the
   remaining commits in parallel.
   - Once everything is in, we'll make a final commit to stop ignoring
   failures for the mypy job.


For those concerned about yet another lint (a very reasonable concern!) the
py37-mypy tox job completes in 37s on my macbook pro (excluding virtualenv
setup time).

-chad

Re: RFC: python static typing PR

Posted by Ahmet Altay <al...@google.com>.
Thank you Chad, everyone else who helped with reviews so far. I think this
is a positive change.

I do share the worry of one more lint. I hope that we can hear from
majority of beam python contributors. Good tooling and educational
information would be helpful here. Ideally this will serve current
developers and future first time contributors equally well.

A question about this particular approach is the choice of type comments.
Assuming this was necessary for python 2 and that will eventually phase
out, what would be the next steps for converting this into annotations?

Ahmet

On Mon, Oct 28, 2019 at 6:28 PM Kenneth Knowles <ke...@apache.org> wrote:

> Wow, that is an incredible amount of work!
>
> I'm definitely of the opinion that there's no viable counterargument to
> the value of types, especially for large or complex codebases.
>
> This kind of check must be in precommit or it will become perma-red very
> quickly.
>
> Kenn
>
> On Mon, Oct 28, 2019 at 4:21 PM Valentyn Tymofieiev <va...@google.com>
> wrote:
>
>> Thanks a lot, Chad. Looking at the PR, I am incredibly happy to see
>> explicit type annotations throughout Beam codebase. I believe this is a
>> step in the right direction even if the tooling were not able to do any
>> inference at all. The effort required from developers to add annotations in
>> their code changes will be well compensated by reducing cognitive load
>> required to read the codebase, and will lower the entry barrier for new
>> contributions. Lack of docstrings in Beam internals in Beam codebase,
>> especially the lack of typing information, has repeatedly come up as a
>> source of frustration among several folks with whom I work.
>>
>> As Beam devs will be gaining more first-hand experience with the tooling,
>> we may need to add a style guide/best practices/FAQ to our contributor
>> guide to clarify known issues.
>>
>> Happy to help with reviewing individual commits once we have a merge plan
>> in place.
>>
>> On Mon, Oct 28, 2019 at 2:41 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> Thanks, Chad, this has been a herculean task. I'm excited for the
>>> additional tooling and documentation explicit types can bring to our
>>> code, even if tooling such as mypy isn't able to do as much inference
>>> for obvious cases as I would like.
>>>
>>> This will, of course, put another burden on developers in contributing
>>> code, similar to lint and writing unit tests. IMHO this will be worth
>>> the cost in terms of encouraging better code, but I think it's
>>> important to establish consensus if this is the direction we're
>>> moving. So if you have any thoughts or questions, positive or
>>> negative, please comment.
>>>
>>>
>>> On Mon, Oct 28, 2019 at 10:34 AM Chad Dombrova <ch...@gmail.com>
>>> wrote:
>>> >
>>> > Hi all,
>>> > I've been working on a PR to add static typing to the beam python sdk
>>> for the past 4 months or so.  This has been an epic journey which has
>>> required chasing down numerous fixes across several other projects (mypy,
>>> pylint, python-future), but the mypy tests are now passing!
>>> >
>>> > I'm not sure how much convincing I need to do with this group on the
>>> benefits of static typing, especially considering the heavy Java influence
>>> on this project, but for those who are curious, there's some good info
>>> here:  https://realpython.com/python-type-checking/#pros-and-cons.
>>> Suffice it to say, it's a game changer for projects like Beam (large code
>>> base, high level of quality, good testing infrastructure), and it dovetails
>>> perfectly into the efforts surrounding pipeline type analysis and
>>> serialization.  Anecdotally speaking, all of the developers I've worked
>>> with who were originally resistant to static typing in python ("it's
>>> ugly!"  "it's not pythonic!") have changed their tune and fully embraced it
>>> because of the time that it saves them every day.
>>> >
>>> > More details over at the PR:  https://github.com/apache/beam/pull/9056
>>> >
>>> > I look forward to your feedback!
>>> >
>>> > -chad
>>> >
>>>
>>

Re: RFC: python static typing PR

Posted by Kenneth Knowles <ke...@apache.org>.
Wow, that is an incredible amount of work!

I'm definitely of the opinion that there's no viable counterargument to the
value of types, especially for large or complex codebases.

This kind of check must be in precommit or it will become perma-red very
quickly.

Kenn

On Mon, Oct 28, 2019 at 4:21 PM Valentyn Tymofieiev <va...@google.com>
wrote:

> Thanks a lot, Chad. Looking at the PR, I am incredibly happy to see
> explicit type annotations throughout Beam codebase. I believe this is a
> step in the right direction even if the tooling were not able to do any
> inference at all. The effort required from developers to add annotations in
> their code changes will be well compensated by reducing cognitive load
> required to read the codebase, and will lower the entry barrier for new
> contributions. Lack of docstrings in Beam internals in Beam codebase,
> especially the lack of typing information, has repeatedly come up as a
> source of frustration among several folks with whom I work.
>
> As Beam devs will be gaining more first-hand experience with the tooling,
> we may need to add a style guide/best practices/FAQ to our contributor
> guide to clarify known issues.
>
> Happy to help with reviewing individual commits once we have a merge plan
> in place.
>
> On Mon, Oct 28, 2019 at 2:41 PM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> Thanks, Chad, this has been a herculean task. I'm excited for the
>> additional tooling and documentation explicit types can bring to our
>> code, even if tooling such as mypy isn't able to do as much inference
>> for obvious cases as I would like.
>>
>> This will, of course, put another burden on developers in contributing
>> code, similar to lint and writing unit tests. IMHO this will be worth
>> the cost in terms of encouraging better code, but I think it's
>> important to establish consensus if this is the direction we're
>> moving. So if you have any thoughts or questions, positive or
>> negative, please comment.
>>
>>
>> On Mon, Oct 28, 2019 at 10:34 AM Chad Dombrova <ch...@gmail.com> wrote:
>> >
>> > Hi all,
>> > I've been working on a PR to add static typing to the beam python sdk
>> for the past 4 months or so.  This has been an epic journey which has
>> required chasing down numerous fixes across several other projects (mypy,
>> pylint, python-future), but the mypy tests are now passing!
>> >
>> > I'm not sure how much convincing I need to do with this group on the
>> benefits of static typing, especially considering the heavy Java influence
>> on this project, but for those who are curious, there's some good info
>> here:  https://realpython.com/python-type-checking/#pros-and-cons.
>> Suffice it to say, it's a game changer for projects like Beam (large code
>> base, high level of quality, good testing infrastructure), and it dovetails
>> perfectly into the efforts surrounding pipeline type analysis and
>> serialization.  Anecdotally speaking, all of the developers I've worked
>> with who were originally resistant to static typing in python ("it's
>> ugly!"  "it's not pythonic!") have changed their tune and fully embraced it
>> because of the time that it saves them every day.
>> >
>> > More details over at the PR:  https://github.com/apache/beam/pull/9056
>> >
>> > I look forward to your feedback!
>> >
>> > -chad
>> >
>>
>

Re: RFC: python static typing PR

Posted by Valentyn Tymofieiev <va...@google.com>.
Thanks a lot, Chad. Looking at the PR, I am incredibly happy to see
explicit type annotations throughout Beam codebase. I believe this is a
step in the right direction even if the tooling were not able to do any
inference at all. The effort required from developers to add annotations in
their code changes will be well compensated by reducing cognitive load
required to read the codebase, and will lower the entry barrier for new
contributions. Lack of docstrings in Beam internals in Beam codebase,
especially the lack of typing information, has repeatedly come up as a
source of frustration among several folks with whom I work.

As Beam devs will be gaining more first-hand experience with the tooling,
we may need to add a style guide/best practices/FAQ to our contributor
guide to clarify known issues.

Happy to help with reviewing individual commits once we have a merge plan
in place.

On Mon, Oct 28, 2019 at 2:41 PM Robert Bradshaw <ro...@google.com> wrote:

> Thanks, Chad, this has been a herculean task. I'm excited for the
> additional tooling and documentation explicit types can bring to our
> code, even if tooling such as mypy isn't able to do as much inference
> for obvious cases as I would like.
>
> This will, of course, put another burden on developers in contributing
> code, similar to lint and writing unit tests. IMHO this will be worth
> the cost in terms of encouraging better code, but I think it's
> important to establish consensus if this is the direction we're
> moving. So if you have any thoughts or questions, positive or
> negative, please comment.
>
>
> On Mon, Oct 28, 2019 at 10:34 AM Chad Dombrova <ch...@gmail.com> wrote:
> >
> > Hi all,
> > I've been working on a PR to add static typing to the beam python sdk
> for the past 4 months or so.  This has been an epic journey which has
> required chasing down numerous fixes across several other projects (mypy,
> pylint, python-future), but the mypy tests are now passing!
> >
> > I'm not sure how much convincing I need to do with this group on the
> benefits of static typing, especially considering the heavy Java influence
> on this project, but for those who are curious, there's some good info
> here:  https://realpython.com/python-type-checking/#pros-and-cons.
> Suffice it to say, it's a game changer for projects like Beam (large code
> base, high level of quality, good testing infrastructure), and it dovetails
> perfectly into the efforts surrounding pipeline type analysis and
> serialization.  Anecdotally speaking, all of the developers I've worked
> with who were originally resistant to static typing in python ("it's
> ugly!"  "it's not pythonic!") have changed their tune and fully embraced it
> because of the time that it saves them every day.
> >
> > More details over at the PR:  https://github.com/apache/beam/pull/9056
> >
> > I look forward to your feedback!
> >
> > -chad
> >
>

Re: RFC: python static typing PR

Posted by Robert Bradshaw <ro...@google.com>.
Thanks, Chad, this has been a herculean task. I'm excited for the
additional tooling and documentation explicit types can bring to our
code, even if tooling such as mypy isn't able to do as much inference
for obvious cases as I would like.

This will, of course, put another burden on developers in contributing
code, similar to lint and writing unit tests. IMHO this will be worth
the cost in terms of encouraging better code, but I think it's
important to establish consensus if this is the direction we're
moving. So if you have any thoughts or questions, positive or
negative, please comment.


On Mon, Oct 28, 2019 at 10:34 AM Chad Dombrova <ch...@gmail.com> wrote:
>
> Hi all,
> I've been working on a PR to add static typing to the beam python sdk for the past 4 months or so.  This has been an epic journey which has required chasing down numerous fixes across several other projects (mypy, pylint, python-future), but the mypy tests are now passing!
>
> I'm not sure how much convincing I need to do with this group on the benefits of static typing, especially considering the heavy Java influence on this project, but for those who are curious, there's some good info here:  https://realpython.com/python-type-checking/#pros-and-cons.  Suffice it to say, it's a game changer for projects like Beam (large code base, high level of quality, good testing infrastructure), and it dovetails perfectly into the efforts surrounding pipeline type analysis and serialization.  Anecdotally speaking, all of the developers I've worked with who were originally resistant to static typing in python ("it's ugly!"  "it's not pythonic!") have changed their tune and fully embraced it because of the time that it saves them every day.
>
> More details over at the PR:  https://github.com/apache/beam/pull/9056
>
> I look forward to your feedback!
>
> -chad
>