Posted to dev@beam.apache.org by Robbe Sneyders <ro...@ml6.eu> on 2018/03/23 16:27:43 UTC

[PROPOSAL] Python 3 support

Hello everyone,

In the next month(s), my colleague Matthias and I will commit a lot of
time and effort to python 3 support for beam and we would like to discuss
the best way to go forward with this.

We have drawn up a document [1] with a high-level outline of the proposed
approach and would like to get your feedback on this.

The main Jira issue [2] for python 3 support has been mostly inactive for
the past year. Other smaller issues have been opened, but it's hard to
track the general progress. It would be great if anyone could offer some
insights on how to best handle this project on Jira.

@Holden Karau, you seem to have already put in a lot of effort to add
python 3 support, so it would be great to get your insights and find a way
to merge our efforts.

Kind regards,
Robbe

[1]
https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing

[2] https://issues.apache.org/jira/browse/BEAM-1251
-- 

[image: https://ml6.eu] <https://ml6.eu/>

* Robbe Sneyders*

ML6 Gent
<https://www.google.be/maps/place/ML6/@51.037408,3.7044893,17z/data=!3m1!4b1!4m5!3m4!1s0x47c37161feeca14b:0xb8f72585fdd21c90!8m2!3d51.037408!4d3.706678?hl=nl>

M: +32 474 71 31 08

Re: [PROPOSAL] Python 3 support

Posted by Ahmet Altay <al...@google.com>.
Robbe, I added you as a contributor to our JIRA. You should now be able to
assign issues to yourself. The board will update itself automatically based
on the issues. Give it a try.

On Wed, Apr 18, 2018 at 1:15 AM, Robbe Sneyders <ro...@ml6.eu>
wrote:

> Thanks!
>
> Can someone give me permission to assign issues to myself?
> And edit rights to the Kanban board?
>
> Robbe
>
> On Tue, 17 Apr 2018 at 22:56 Ahmet Altay <al...@google.com> wrote:
>
>> Kanban board for python 3:
>> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245
>>
>> (Thank you Davor!)
>>
>> Ahmet
>>
>> On Fri, Apr 6, 2018 at 6:32 PM, Reuven Lax <re...@google.com> wrote:
>>
>>> I had a similar problem.
>>>
>>> On Fri, Apr 6, 2018, 6:23 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> I tried to create a shared kanban board but I failed. I think I am
>>>> lacking some permission to create a shared filter. Could someone help with
>>>> creating this?
>>>>
>>>> The filter I planned to use was "project = BEAM AND (parent = BEAM-2784
>>>> OR parent = BEAM-1251) ORDER BY Rank ASC"
>>>>
>>>> Ahmet
>>>>
>>>> On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders <ro...@ml6.eu>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I don't seem to have the permissions to create a Kanban board or even
>>>>> assign tasks to myself. Who could help me with this?
>>>>>
>>>>> I've updated the coders package pull request [1] and added the applied
>>>>> strategy to the proposal document [2].
>>>>> It would be great to get some feedback on this, so we can start moving
>>>>> forward with other subpackages.
>>>>>
>>>>> Kind regards,
>>>>> Robbe
>>>>>
>>>>> [1] https://github.com/apache/beam/pull/4990
>>>>> [2] https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>
>>>>> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders <ro...@ml6.eu>
>>>>> wrote:
>>>>>
>>>>>> Hello Robert,
>>>>>>
>>>>>> I think a Kanban board on Jira as proposed by Ahmet can be helpful
>>>>>> for this. I'll look into setting one up tomorrow.
>>>>>>
>>>>>> In the meantime, you can find the first pull request with the updated
>>>>>> coders package here:
>>>>>> https://github.com/apache/beam/pull/4990
>>>>>>
>>>>>> Kind regards,
>>>>>> Robbe
>>>>>>
>>>>>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <ro...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <
>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>
>>>>>>>> Thanks Ahmet and Robert,
>>>>>>>>
>>>>>>>> I think we can work on different subpackages in parallel, but it's
>>>>>>>> important to apply the same strategy everywhere. I'm currently working on
>>>>>>>> applying step 1 (was mostly done already) and 2 of the proposal to the
>>>>>>>> coders subpackage to create a first pull request. We can then discuss the
>>>>>>>> applied strategy in detail before merging and applying it to the other
>>>>>>>> subpackages.
>>>>>>>>
>>>>>>>
>>>>>>> Sounds good. Again, could you document (in a more permanent/easy to
>>>>>>> look up state than email) when packages are started/done?
>>>>>>>
>>>>>>>
>>>>>>>> This strategy also includes the choice of automated tools. I'm
>>>>>>>> focusing on writing python 3 code with python 2 compatibility, which means
>>>>>>>> depending on the future package instead of the six package (which is
>>>>>>>> already used in some places in the current code base). I have already
>>>>>>>> noticed that this indeed requires a lot of manual work after running the
>>>>>>>> automated script.
>>>>>>>> The future package supports python 3.3+ compatibility, so I don't
>>>>>>>> think there is a higher cost supporting 3.4 compared to 3.5+.
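
The "python 3 code with python 2 compatibility" style described above could look roughly like the sketch below. It is only an illustration under assumptions: the function names are hypothetical and not taken from the Beam code base, and on python 2 the `builtins` module is assumed to come from the future package.

```python
from __future__ import absolute_import, division, print_function

# On Python 3 `builtins` is part of the standard library; on Python 2 the
# `future` package is assumed to provide a module with the same name.
from builtins import str


def mean(values):
    """Average of a non-empty sequence; `/` is true division on 2 and 3."""
    return sum(values) / len(values)


def describe(counts):
    """Format a dict with the plain Python 3 idiom `.items()`.

    The six style would call `six.iteritems(counts)` instead.
    """
    return ", ".join(
        "{}={}".format(key, value) for key, value in sorted(counts.items())
    )


def is_text(value):
    """True for text strings and False for bytes."""
    return isinstance(value, str)
```

The idea is that the code reads as ordinary python 3, and the compatibility layer is confined to the imports at the top of the module.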
>>>>>>>>
>>>>>>>
>>>>>>> Sure. It may incur a higher maintenance burden long-term though.
>>>>>>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>>>>>>> some time to come.)
>>>>>>>
>>>>>>>
>>>>>>>> I have already added a tox environment to run pylint2 with the
>>>>>>>> --py3k argument per updated subpackage, which should help avoid regression
>>>>>>>> between step 2 and step 3 of the proposal. This update will be pushed with
>>>>>>>> the first pull request.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Robbe
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <ro...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thank you, Robbe, for your offer to help with contributions here.
>>>>>>>>> I read over your doc and the one thing I'd like to add is that this work is
>>>>>>>>> very parallelizable, but if we have enough people looking at it we'll want
>>>>>>>>> some way to coordinate so as to not overlap work (or just waste time
>>>>>>>>> discovering what's been done). Tracking individual JIRAs and PRs gets
>>>>>>>>> unwieldy; perhaps a spreadsheet with modules/packages on one axis and the
>>>>>>>>> various automated/manual conversions along the other would be helpful?
>>>>>>>>>
>>>>>>>>> A note on automated tools: they're sometimes overly conservative,
>>>>>>>>> so we should be sure to review the changes manually. (A typical example of
>>>>>>>>> this is unnecessarily importing six.moves.xrange when there was no big
>>>>>>>>> reason to use xrange over range in Python 2, or conversely using
>>>>>>>>> list(range(...)) in Python 3 where a plain range(...) would do.)
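
To illustrate the point about range made above, here is a small sketch (the helper names are made up for this example). Plain iteration needs neither six.moves.xrange nor a list(...) wrapper; the wrapper is only warranted when an actual list is required, for instance for concatenation.

```python
def squares(n):
    # Plain range is fine for iteration on both Python 2 and 3, unless n is
    # so large that Python 2's list-building range becomes a concern.
    return [i * i for i in range(n)]


def with_sentinel(n):
    # On Python 3, range(n) + [-1] raises TypeError, so list(...) is needed.
    return list(range(n)) + [-1]
```

A reviewer looking at an automated conversion could use this distinction to decide which of the generated wrappers to keep.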
>>>>>>>>>
>>>>>>>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent
>>>>>>>>> regressions. If there's a cost to supporting 3.4 as opposed to requiring
>>>>>>>>> 3.5+ we should identify it and decide that before widespread announcement.
>>>>>>>>>
>>>>>>>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <
>>>>>>>>>> holden@pigscanfly.ca> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <
>>>>>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Anand,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> It should be no problem to run everything on DataflowRunner as
>>>>>>>>>>>> well.
>>>>>>>>>>>> Are there any performance tests in place to check for
>>>>>>>>>>>> performance regressions?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> Yes, there is a suite
>>>>>>>>>> (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>>>>>>>>> It may not be very comprehensive and seems to have been failing for a
>>>>>>>>>> while. I would not block python 3 work on performance for now. That is the
>>>>>>>>>> unfortunate state of things.
>>>>>>>>>>
>>>>>>>>>> If anybody in the community is interested, this would be a great
>>>>>>>>>> opportunity to help with benchmarks in general.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Some questions were raised in the proposal document which I
>>>>>>>>>>>> want to add to this conversation:
>>>>>>>>>>>>
>>>>>>>>>>>> The first comment was about the targeted python 3 versions. We
>>>>>>>>>>>> proposed to target 3.6 since it is the latest version available and added
>>>>>>>>>>>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>>>>>>>>> sources on this though).
>>>>>>>>>>>> If the beam community prefers 3.4, I would propose to target
>>>>>>>>>>>> 3.4 only during porting and add 3.5 and 3.6 later so we don't slow down the
>>>>>>>>>>>> porting progress. 3.4 has the advantage of already being installed on the
>>>>>>>>>>>> workers and allows pySpark pipelines to be moved over to beam more easily.
>>>>>>>>>>>> It would be great to get some opinions on this.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> My preference is to support 3.4+. I searched a bit on the web to
>>>>>>>>>> understand the usage statistics for python 3. It seems that python 3.4 has
>>>>>>>>>> ~20% usage and python 3.4+ has 99%
>>>>>>>>>> (https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>>>>>>>>> Based on that, I think it makes sense to support it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Another comment was made on how to avoid regression during the
>>>>>>>>>>>> porting progress.
>>>>>>>>>>>> After applying step 1 and step 2, no python 3 compatibility
>>>>>>>>>>>> lint warnings should remain, so it would be great if we could enforce this
>>>>>>>>>>>> check for every pull request on an already updated subpackage.
>>>>>>>>>>>> After applying step 3, all tests should run on python 3, so
>>>>>>>>>>>> again it would be great if we can enforce these per updated subpackage.
>>>>>>>>>>>> Any insights on how to best accomplish this?
>>>>>>>>>>>>
>>>>>>>>>>> So you can look at some of the recent changes to tox.ini in the
>>>>>>>>>>> git log to see what we’ve done so far around this. I suspect you can repeat
>>>>>>>>>>> that same pattern.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh
>>>>>>>>>> would help a lot to prevent regressions.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Robbe
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you Robbe.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I reviewed the document and it looks reasonable to me. I will
>>>>>>>>>>>>> touch on some points that were not mentioned:
>>>>>>>>>>>>> - Runners exercise different code paths. Doing auto conversions
>>>>>>>>>>>>> and focusing on DirectRunner is not enough. It is worthwhile to run things
>>>>>>>>>>>>> on DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>>>>>>>>> validate that we are still compatible with python 2.
>>>>>>>>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For project tracking on JIRA, please feel free to create any
>>>>>>>>>>>>> new issues, close stale ones, or take ownership of any open issues. All
>>>>>>>>>>>>> JIRAs should be assigned to the people actively working on them. If you want
>>>>>>>>>>>>> to track it in a separate way, you can also propose that. (For example, a
>>>>>>>>>>>>> kanban board, which is fully supported in JIRA, is used for the portability
>>>>>>>>>>>>> effort.)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will also call out to a few other people in addition to
>>>>>>>>>>>>> Holden who helped out or showed interest in helping with Python 3. @cclaus,
>>>>>>>>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You
>>>>>>>>>>>>> can include these people (and myself) for reviews and other questions that
>>>>>>>>>>>>> you have.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>> Ahmet
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>>>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the next month(s), my colleague Matthias and I will
>>>>>>>>>>>>>> commit a lot of time and effort to python 3 support for beam and we would
>>>>>>>>>>>>>> like to discuss the best way to go forward with this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We have drawn up a document [1] with a high-level outline of
>>>>>>>>>>>>>> the proposed approach and would like to get your feedback on this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>>>>>>>>> inactive for the past year. Other smaller issues have been opened, but it's
>>>>>>>>>>>>>> hard to track the general progress. It would be great if anyone could offer
>>>>>>>>>>>>>> some insights on how to best handle this project on Jira.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> @Holden Karau, you seem to have already put in a lot of
>>>>>>>>>>>>>> effort to add python 3 support, so it would be great to get your insights
>>>>>>>>>>>>>> and find a way to merge our efforts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>> Robbe
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>

Re: [PROPOSAL] Python 3 support

Posted by Robbe Sneyders <ro...@ml6.eu>.
Thanks!

Can someone give me permission to assign issues to myself?
And edit rights to the Kanban board?

Robbe

On Tue, 17 Apr 2018 at 22:56 Ahmet Altay <al...@google.com> wrote:

> Kanban board for python 3:
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245
>
> (Thank you Davor!)
>
> Ahmet
>
> On Fri, Apr 6, 2018 at 6:32 PM, Reuven Lax <re...@google.com> wrote:
>
>> I had a similar problem.
>>
>> On Fri, Apr 6, 2018, 6:23 PM Ahmet Altay <al...@google.com> wrote:
>>
>>> I tried to create a shared kanban board but I failed. I think I am
>>> lacking some permission to create a shared filter. Could someone help with
>>> creating this?
>>>
>>> The filter I planned to use was "project = BEAM AND (parent = BEAM-2784
>>> OR parent = BEAM-1251) ORDER BY Rank ASC"
>>>
>>> Ahmet
>>>
>>> On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders <ro...@ml6.eu>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I don't seem to have the permissions to create a Kanban board or even
>>>> assign tasks to myself. Who could help me with this?
>>>>
>>>> I've updated the coders package pull request [1] and added the applied
>>>> strategy to the proposal document [2].
>>>> It would be great to get some feedback on this, so we can start moving
>>>> forward with other subpackages.
>>>>
>>>> Kind regards,
>>>> Robbe
>>>>
>>>> [1] https://github.com/apache/beam/pull/4990
>>>> [2]
>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>
>>>>
>>>> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders <ro...@ml6.eu>
>>>> wrote:
>>>>
>>>>> Hello Robert,
>>>>>
>>>>> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
>>>>> this. I'll look into setting one up tomorrow.
>>>>>
>>>>> In the meantime, you can find the first pull request with the updated
>>>>> coders package here:
>>>>> https://github.com/apache/beam/pull/4990
>>>>>
>>>>> Kind regards,
>>>>> Robbe
>>>>>
>>>>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>>
>>>>>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <ro...@ml6.eu>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Ahmet and Robert,
>>>>>>>
>>>>>>> I think we can work on different subpackages in parallel, but it's
>>>>>>> important to apply the same strategy everywhere. I'm currently working on
>>>>>>> applying step 1 (was mostly done already) and 2 of the proposal to the
>>>>>>> coders subpackage to create a first pull request. We can then discuss the
>>>>>>> applied strategy in detail before merging and applying it to the other
>>>>>>> subpackages.
>>>>>>>
>>>>>>
>>>>>> Sounds good. Again, could you document (in a more permanent/easy to
>>>>>> look up state than email) when packages are started/done?
>>>>>>
>>>>>>
>>>>>>> This strategy also includes the choice of automated tools. I'm
>>>>>>> focusing on writing python 3 code with python 2 compatibility, which means
>>>>>>> depending on the future package instead of the six package (which is
>>>>>>> already used in some places in the current code base). I have already
>>>>>>> noticed that this indeed requires a lot of manual work after running the
>>>>>>> automated script.
>>>>>>> The future package supports python 3.3+ compatibility, so I don't
>>>>>>> think there is a higher cost supporting 3.4 compared to 3.5+.
>>>>>>>
>>>>>>
>>>>>> Sure. It may incur a higher maintenance burden long-term though.
>>>>>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>>>>>> some time to come.)
>>>>>>
>>>>>>
>>>>>>> I have already added a tox environment to run pylint2 with the
>>>>>>> --py3k argument per updated subpackage, which should help avoid regression
>>>>>>> between step 2 and step 3 of the proposal. This update will be pushed with
>>>>>>> the first pull request.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Robbe
>>>>>>>
>>>>>>>
>>>>>>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <ro...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thank you, Robbe, for your offer to help with contributions here. I
>>>>>>>> read over your doc and the one thing I'd like to add is that this work is
>>>>>>>> very parallelizable, but if we have enough people looking at it we'll want
>>>>>>>> some way to coordinate so as to not overlap work (or just waste time
>>>>>>>> discovering what's been done). Tracking individual JIRAs and PRs gets
>>>>>>>> unwieldy; perhaps a spreadsheet with modules/packages on one axis and the
>>>>>>>> various automated/manual conversions along the other would be helpful?
>>>>>>>>
>>>>>>>> A note on automated tools: they're sometimes overly conservative,
>>>>>>>> so we should be sure to review the changes manually. (A typical example of
>>>>>>>> this is unnecessarily importing six.moves.xrange when there was no big
>>>>>>>> reason to use xrange over range in Python 2, or conversely using
>>>>>>>> list(range(...)) in Python 3 where a plain range(...) would do.)
>>>>>>>>
>>>>>>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent
>>>>>>>> regressions. If there's a cost to supporting 3.4 as opposed to requiring
>>>>>>>> 3.5+ we should identify it and decide that before widespread announcement.
>>>>>>>>
>>>>>>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <
>>>>>>>>> holden@pigscanfly.ca> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <
>>>>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Anand,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the feedback.
>>>>>>>>>>>
>>>>>>>>>>> It should be no problem to run everything on DataflowRunner as
>>>>>>>>>>> well.
>>>>>>>>>>> Are there any performance tests in place to check for
>>>>>>>>>>> performance regressions?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Yes, there is a suite (
>>>>>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>>>>>>>> It may not be very comprehensive and seems to have been failing for a
>>>>>>>>> while. I would not block python 3 work on performance for now. That is the
>>>>>>>>> unfortunate state of things.
>>>>>>>>>
>>>>>>>>> If anybody in the community is interested, this would be a great
>>>>>>>>> opportunity to help with benchmarks in general.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Some questions were raised in the proposal document which I want
>>>>>>>>>>> to add to this conversation:
>>>>>>>>>>>
>>>>>>>>>>> The first comment was about the targeted python 3 versions. We
>>>>>>>>>>> proposed to target 3.6 since it is the latest version available and added
>>>>>>>>>>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>>>>>>>> sources on this though).
>>>>>>>>>>> If the beam community prefers 3.4, I would propose to target 3.4
>>>>>>>>>>> only during porting and add 3.5 and 3.6 later so we don't slow down the
>>>>>>>>>>> porting progress. 3.4 has the advantage of already being installed on the
>>>>>>>>>>> workers and allows pySpark pipelines to be moved over to beam more easily.
>>>>>>>>>>> It would be great to get some opinions on this.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> My preference is to support 3.4+. I searched a bit on the web to
>>>>>>>>> understand the usage statistics for python 3. It seems that python 3.4 has
>>>>>>>>> ~20% usage and python 3.4+ has 99% (
>>>>>>>>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>>>>>>>> Based on that, I think it makes sense to support it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Another comment was made on how to avoid regression during the
>>>>>>>>>>> porting progress.
>>>>>>>>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>>>>>>>>> warnings should remain, so it would be great if we could enforce this check
>>>>>>>>>>> for every pull request on an already updated subpackage.
>>>>>>>>>>> After applying step 3, all tests should run on python 3, so
>>>>>>>>>>> again it would be great if we can enforce these per updated subpackage.
>>>>>>>>>>> Any insights on how to best accomplish this?
>>>>>>>>>>>
>>>>>>>>>> So you can look at some of the recent changes to tox.ini in the
>>>>>>>>>> git log to see what we’ve done so far around this. I suspect you can repeat
>>>>>>>>>> that same pattern.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh
>>>>>>>>> would help a lot to prevent regressions.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Robbe
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you Robbe.
>>>>>>>>>>>>
>>>>>>>>>>>> I reviewed the document and it looks reasonable to me. I will touch
>>>>>>>>>>>> on some points that were not mentioned:
>>>>>>>>>>>> - Runners exercise different code paths. Doing auto conversions
>>>>>>>>>>>> and focusing on DirectRunner is not enough. It is worthwhile to run things
>>>>>>>>>>>> on DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>>>>>>>> validate that we are still compatible with python 2.
>>>>>>>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>>>>>>>
>>>>>>>>>>>> For project tracking on JIRA, please feel free to create any
>>>>>>>>>>>> new issues, close stale ones, or take ownership of any open issues. All
>>>>>>>>>>>> JIRAs should be assigned to the people actively working on them. If you want
>>>>>>>>>>>> to track it in a separate way, you can also propose that. (For example, a
>>>>>>>>>>>> kanban board, which is fully supported in JIRA, is used for the portability
>>>>>>>>>>>> effort.)
>>>>>>>>>>>>
>>>>>>>>>>>> I will also call out to a few other people in addition to
>>>>>>>>>>>> Holden who helped out or showed interest in helping with Python 3. @cclaus,
>>>>>>>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You
>>>>>>>>>>>> can include these people (and myself) for reviews and other questions that
>>>>>>>>>>>> you have.
>>>>>>>>>>>>
>>>>>>>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> Ahmet
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the next month(s), my colleague Matthias and I will commit
>>>>>>>>>>>>> a lot of time and effort to python 3 support for beam and we would like to
>>>>>>>>>>>>> discuss the best way to go forward with this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have drawn up a document [1] with a high-level outline of
>>>>>>>>>>>>> the proposed approach and would like to get your feedback on this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>>>>>>>> inactive for the past year. Other smaller issues have been opened, but it's
>>>>>>>>>>>>> hard to track the general progress. It would be great if anyone could offer
>>>>>>>>>>>>> some insights on how to best handle this project on Jira.
>>>>>>>>>>>>>
>>>>>>>>>>>>> @Holden Karau, you seem to have already put in a lot of effort
>>>>>>>>>>>>> to add python 3 support, so it would be great to get your insights and find
>>>>>>>>>>>>> a way to merge our efforts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>> Robbe
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>>

Re: [PROPOSAL] Python 3 support

Posted by Ahmet Altay <al...@google.com>.
Kanban board for python 3:
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245

(Thank you Davor!)

Ahmet

On Fri, Apr 6, 2018 at 6:32 PM, Reuven Lax <re...@google.com> wrote:

> I had a similar problem.
>
> On Fri, Apr 6, 2018, 6:23 PM Ahmet Altay <al...@google.com> wrote:
>
>> I tried to create a shared kanban board but I failed. I think I am
>> lacking some permission to create a shared filter. Could someone help with
>> creating this?
>>
>> The filter I planned to use was "project = BEAM AND (parent = BEAM-2784
>> OR parent = BEAM-1251) ORDER BY Rank ASC"
>>
>> Ahmet
>>
>> On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders <ro...@ml6.eu>
>> wrote:
>>
>>> Hi all,
>>>
>>> I don't seem to have the permissions to create a Kanban board or even
>>> assign tasks to myself. Who could help me with this?
>>>
>>> I've updated the coders package pull request [1] and added the applied
>>> strategy to the proposal document [2].
>>> It would be great to get some feedback on this, so we can start moving
>>> forward with other subpackages.
>>>
>>> Kind regards,
>>> Robbe
>>>
>>> [1] https://github.com/apache/beam/pull/4990
>>> [2] https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>
>>> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders <ro...@ml6.eu>
>>> wrote:
>>>
>>>> Hello Robert,
>>>>
>>>> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
>>>> this. I'll look into setting one up tomorrow.
>>>>
>>>> In the meantime, you can find the first pull request with the updated
>>>> coders package here:
>>>> https://github.com/apache/beam/pull/4990
>>>>
>>>> Kind regards,
>>>> Robbe
>>>>
>>>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>>
>>>>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <ro...@ml6.eu>
>>>>> wrote:
>>>>>
>>>>>> Thanks Ahmet and Robert,
>>>>>>
>>>>>> I think we can work on different subpackages in parallel, but it's
>>>>>> important to apply the same strategy everywhere. I'm currently working on
>>>>>> applying step 1 (was mostly done already) and 2 of the proposal to the
>>>>>> coders subpackage to create a first pull request. We can then discuss the
>>>>>> applied strategy in detail before merging and applying it to the other
>>>>>> subpackages.
>>>>>>
>>>>>
>>>>> Sounds good. Again, could you document (in a more permanent/easy to
>>>>> look up state than email) when packages are started/done?
>>>>>
>>>>>
>>>>>> This strategy also includes the choice of automated tools. I'm
>>>>>> focusing on writing python 3 code with python 2 compatibility, which means
>>>>>> depending on the future package instead of the six package (which is
>>>>>> already used in some places in the current code base). I have already
>>>>>> noticed that this indeed requires a lot of manual work after running the
>>>>>> automated script.
>>>>>> The future package supports python 3.3+ compatibility, so I don't
>>>>>> think there is a higher cost supporting 3.4 compared to 3.5+.
>>>>>>
>>>>>
>>>>> Sure. It may incur a higher maintenance burden long-term though.
>>>>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>>>>> some time to come.)
>>>>>
>>>>>
>>>>>> I have already added a tox environment to run pylint2 with the --py3k
>>>>>> argument per updated subpackage, which should help avoid regression between
>>>>>> step 2 and step 3 of the proposal. This update will be pushed with the
>>>>>> first pull request.
>>>>>>
>>>>>> Kind regards,
>>>>>> Robbe
>>>>>>
>>>>>>
>>>>>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <ro...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you, Robbe, for your offer to help with contribution here. I
>>>>>>> read over your doc and the one thing I'd like to add is that this work is
>>>>>>> very parallelizable, but if we have enough people looking at it we'll want
>>>>>>> some way to coordinate so as to not overlap work (or just waste time
>>>>>>> discovering what's been done). Tracking individual JIRAs and PRs gets
>>>>>>> unwieldy; perhaps a spreadsheet with modules/packages on one axis and the
>>>>>>> various automated/manual conversions along the other would be helpful?
>>>>>>>
>>>>>>> A note on automated tools: they're sometimes overly conservative, so
>>>>>>> we should be sure to review the changes manually. (A typical example of
>>>>>>> this is unnecessarily importing six.moves.xrange when there was no big
>>>>>>> reason to use xrange over range in Python 2, or conversely using
>>>>>>> list(range(...)) in Python 3.)
>>>>>>>
>>>>>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent
>>>>>>> regressions. If there's a cost to supporting 3.4 as opposed to requiring
>>>>>>> 3.5+ we should identify it and decide that before widespread announcement.
>>>>>>>
>>>>>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <holden@pigscanfly.ca>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <
>>>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Anand,
>>>>>>>>>>
>>>>>>>>>> Thanks for the feedback.
>>>>>>>>>>
>>>>>>>>>> It should be no problem to run everything on DataflowRunner as
>>>>>>>>>> well.
>>>>>>>>>> Are there any performance tests in place to check for performance
>>>>>>>>>> regressions?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes there is a suite
>>>>>>>> (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>>>>>>> It may not be very comprehensive and seems to have been failing for a
>>>>>>>> while. I would not block python 3 work on performance for now. That is
>>>>>>>> the unfortunate state of things.
>>>>>>>>
>>>>>>>> If anybody in the community is interested, this would be a great
>>>>>>>> opportunity to help with benchmarks in general.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Some questions were raised in the proposal document which I want
>>>>>>>>>> to add to this conversation:
>>>>>>>>>>
>>>>>>>>>> The first comment was about the targeted python 3 versions. We
>>>>>>>>>> proposed to target 3.6 since it is the latest version available and added
>>>>>>>>>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>>>>>>> sources on this though).
>>>>>>>>>> If the beam community prefers 3.4, I would propose to target 3.4
>>>>>>>>>> only during porting and add 3.5 and 3.6 later so we don't slow down the
>>>>>>>>>> porting progress. 3.4 has the advantage of already being installed on the
>>>>>>>>>> workers and allows pySpark pipelines to be moved over to beam more easily.
>>>>>>>>>> It would be great to get some opinions on this.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> My preference is to support 3.4+. I searched a bit on the web to
>>>>>>>> understand the usage statistics for python 3; it seems like python 3.4 has
>>>>>>>> ~20% usage and python 3.4+ has 99%
>>>>>>>> (https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>>>>>>> Based on that, I think it makes sense to support it.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Another comment was made on how to avoid regression during the
>>>>>>>>>> porting progress.
>>>>>>>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>>>>>>>> warnings should remain, so it would be great if we could enforce this check
>>>>>>>>>> for every pull request on an already updated subpackage.
>>>>>>>>>> After applying step 3, all tests should run on python 3, so again
>>>>>>>>>> it would be great if we can enforce these per updated subpackage.
>>>>>>>>>> Any insights on how to best accomplish this?
>>>>>>>>>>
>>>>>>>>> So you can look at some of the recent changes to tox.ini in the
>>>>>>>>> git log to see what we’ve done so far around this; I suspect you can
>>>>>>>>> repeat that same pattern.
>>>>>>>>>
>>>>>>>>
>>>>>>>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh
>>>>>>>> would help a lot to prevent regressions.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Robbe
>>>>>>>>>>
>>>>>>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you Robbe.
>>>>>>>>>>>
>>>>>>>>>>> I reviewed the document; it looks reasonable to me. I will touch
>>>>>>>>>>> on some points that were not mentioned:
>>>>>>>>>>> - Runners exercise different code paths. Doing auto conversions
>>>>>>>>>>> and focusing on DirectRunner is not enough. It is worthwhile to run things
>>>>>>>>>>> on DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>>>>>>> validate that we are still compatible with python 2.
>>>>>>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>>>>>>
>>>>>>>>>>> For project tracking on JIRA, please feel free to create any new
>>>>>>>>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>>>>>>>>> should be assigned to the people actively working on them. If you want to
>>>>>>>>>>> track it in a separate way, you can also propose that. (For example, a
>>>>>>>>>>> kanban board, which is fully supported in JIRA, is used for the
>>>>>>>>>>> portability effort.)
>>>>>>>>>>>
>>>>>>>>>>> I will also call out to a few other people in addition to Holden
>>>>>>>>>>> who helped out or showed interest in helping with Python 3. @cclaus,
>>>>>>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You
>>>>>>>>>>> can include these people (and myself) for reviews and other questions that
>>>>>>>>>>> you have.
>>>>>>>>>>>
>>>>>>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Ahmet
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> In the next month(s), my colleague Matthias and I will commit
>>>>>>>>>>>> a lot of time and effort to python 3 support for beam and we would like to
>>>>>>>>>>>> discuss the best way to go forward with this.
>>>>>>>>>>>>
>>>>>>>>>>>> We have drawn up a document [1] with a high level outline of
>>>>>>>>>>>> the proposed approach and would like to get your feedback on this.
>>>>>>>>>>>>
>>>>>>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>>>>>>> inactive for the past year. Other smaller issues have been opened, but it's
>>>>>>>>>>>> hard to track the general progress. It would be great if anyone could offer
>>>>>>>>>>>> some insights on how to best handle this project on Jira.
>>>>>>>>>>>>
>>>>>>>>>>>> @Holden Karau, you seem to have already put in a lot of effort
>>>>>>>>>>>> to add python 3 support, so it would be great to get your insights and find
>>>>>>>>>>>> a way to merge our efforts.
>>>>>>>>>>>>
>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>> Robbe
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>
>>

Re: [PROPOSAL] Python 3 support

Posted by Reuven Lax <re...@google.com>.
I had a similar problem.

On Fri, Apr 6, 2018, 6:23 PM Ahmet Altay <al...@google.com> wrote:

> I tried to create a shared kanban board but I failed. I think I am lacking
> some permission to create a shared filter. Could someone help with creating
> this?
>
> The filter I planned to use was "project = BEAM AND (parent = BEAM-2784 OR
> parent = BEAM-1251) ORDER BY Rank ASC"
>
> Ahmet
>
> On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders <ro...@ml6.eu>
> wrote:
>
>> Hi all,
>>
>> I don't seem to have the permissions to create a Kanban board or even
>> assign tasks to myself. Who could help me with this?
>>
>> I've updated the coders package pull request [1] and added the applied
>> strategy to the proposal document [2].
>> It would be great to get some feedback on this, so we can start moving
>> forward with other subpackages.
>>
>> Kind regards,
>> Robbe
>>
>> [1] https://github.com/apache/beam/pull/4990
>> [2]
>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>
>>
>> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders <ro...@ml6.eu> wrote:
>>
>>> Hello Robert,
>>>
>>> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
>>> this. I'll look into setting one up tomorrow.
>>>
>>> In the meantime, you can find the first pull request with the updated
>>> coders package here:
>>> https://github.com/apache/beam/pull/4990
>>>
>>> Kind regards,
>>> Robbe
>>>
>>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <ro...@ml6.eu>
>>>> wrote:
>>>>
>>>>> Thanks Ahmet and Robert,
>>>>>
>>>>> I think we can work on different subpackages in parallel, but it's
>>>>> important to apply the same strategy everywhere. I'm currently working on
>>>>> applying step 1 (was mostly done already) and 2 of the proposal to the
>>>>> coders subpackage to create a first pull request. We can then discuss the
>>>>> applied strategy in detail before merging and applying it to the other
>>>>> subpackages.
>>>>>
>>>>
>>>> Sounds good. Again, could you document (in a more permanent/easy to
>>>> look up state than email) when packages are started/done?
>>>>
>>>>
>>>>> This strategy also includes the choice of automated tools. I'm
>>>>> focusing on writing python 3 code with python 2 compatibility, which means
>>>>> depending on the future package instead of the six package (which is
>>>>> already used in some places in the current code base). I have already
>>>>> noticed that this indeed requires a lot of manual work after running the
>>>>> automated script.
>>>>> The future package supports python 3.3+ compatibility, so I don't
>>>>> think there is a higher cost supporting 3.4 compared to 3.5+.
>>>>>
>>>>
>>>> Sure. It may incur a higher maintenance burden long-term though.
>>>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>>>> some time to come.)
>>>>
>>>>
>>>>> I have already added a tox environment to run pylint2 with the --py3k
>>>>> argument per updated subpackage, which should help avoid regression between
>>>>> step 2 and step 3 of the proposal. This update will be pushed with the
>>>>> first pull request.
>>>>>
>>>>> Kind regards,
>>>>> Robbe
>>>>>
>>>>>
>>>>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you, Robbe, for your offer to help with contribution here. I
>>>>>> read over your doc and the one thing I'd like to add is that this work is
>>>>>> very parallelizable, but if we have enough people looking at it we'll want
>>>>>> some way to coordinate so as to not overlap work (or just waste time
>>>>>> discovering what's been done). Tracking individual JIRAs and PRs gets
>>>>>> unwieldy; perhaps a spreadsheet with modules/packages on one axis and the
>>>>>> various automated/manual conversions along the other would be helpful?
>>>>>>
>>>>>> A note on automated tools: they're sometimes overly conservative, so
>>>>>> we should be sure to review the changes manually. (A typical example of
>>>>>> this is unnecessarily importing six.moves.xrange when there was no big
>>>>>> reason to use xrange over range in Python 2, or conversely using
>>>>>> list(range(...)) in Python 3.)
>>>>>>
>>>>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions.
>>>>>> If there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>>>>>> identify it and decide that before widespread announcement.
>>>>>>
>>>>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <ho...@pigscanfly.ca>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <
>>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>>
>>>>>>>>> Hi Anand,
>>>>>>>>>
>>>>>>>>> Thanks for the feedback.
>>>>>>>>>
>>>>>>>>> It should be no problem to run everything on DataflowRunner as
>>>>>>>>> well.
>>>>>>>>> Are there any performance tests in place to check for performance
>>>>>>>>> regressions?
>>>>>>>>>
>>>>>>>>
>>>>>>> Yes there is a suite
>>>>>>> (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>>>>>> It may not be very comprehensive and seems to have been failing for a
>>>>>>> while. I would not block python 3 work on performance for now. That is
>>>>>>> the unfortunate state of things.
>>>>>>>
>>>>>>> If anybody in the community is interested, this would be a great
>>>>>>> opportunity to help with benchmarks in general.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Some questions were raised in the proposal document which I want
>>>>>>>>> to add to this conversation:
>>>>>>>>>
>>>>>>>>> The first comment was about the targeted python 3 versions. We
>>>>>>>>> proposed to target 3.6 since it is the latest version available and added
>>>>>>>>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>>>>>> sources on this though).
>>>>>>>>> If the beam community prefers 3.4, I would propose to target 3.4
>>>>>>>>> only during porting and add 3.5 and 3.6 later so we don't slow down the
>>>>>>>>> porting progress. 3.4 has the advantage of already being installed on the
>>>>>>>>> workers and allows pySpark pipelines to be moved over to beam more easily.
>>>>>>>>> It would be great to get some opinions on this.
>>>>>>>>>
>>>>>>>>
>>>>>>> My preference is to support 3.4+. I searched a bit on the web to
>>>>>>> understand the usage statistics for python 3; it seems like python 3.4 has
>>>>>>> ~20% usage and python 3.4+ has 99%
>>>>>>> (https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>>>>>> Based on that, I think it makes sense to support it.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Another comment was made on how to avoid regression during the
>>>>>>>>> porting progress.
>>>>>>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>>>>>>> warnings should remain, so it would be great if we could enforce this check
>>>>>>>>> for every pull request on an already updated subpackage.
>>>>>>>>> After applying step 3, all tests should run on python 3, so again
>>>>>>>>> it would be great if we can enforce these per updated subpackage.
>>>>>>>>> Any insights on how to best accomplish this?
>>>>>>>>>
>>>>>>>> So you can look at some of the recent changes to tox.ini in the git
>>>>>>>> log to see what we’ve done so far around this; I suspect you can repeat
>>>>>>>> that same pattern.
>>>>>>>>
>>>>>>>
>>>>>>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh
>>>>>>> would help a lot to prevent regressions.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Robbe
>>>>>>>>>
>>>>>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you Robbe.
>>>>>>>>>>
>>>>>>>>>> I reviewed the document; it looks reasonable to me. I will touch
>>>>>>>>>> on some points that were not mentioned:
>>>>>>>>>> - Runners exercise different code paths. Doing auto conversions
>>>>>>>>>> and focusing on DirectRunner is not enough. It is worthwhile to run things
>>>>>>>>>> on DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>>>>>> validate that we are still compatible with python 2.
>>>>>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>>>>>
>>>>>>>>>> For project tracking on JIRA, please feel free to create any new
>>>>>>>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>>>>>>>> should be assigned to the people actively working on them. If you want to
>>>>>>>>>> track it in a separate way, you can also propose that. (For example, a
>>>>>>>>>> kanban board, which is fully supported in JIRA, is used for the
>>>>>>>>>> portability effort.)
>>>>>>>>>>
>>>>>>>>>> I will also call out to a few other people in addition to Holden
>>>>>>>>>> who helped out or showed interest in helping with Python 3. @cclaus,
>>>>>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You
>>>>>>>>>> can include these people (and myself) for reviews and other questions that
>>>>>>>>>> you have.
>>>>>>>>>>
>>>>>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Ahmet
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>
>>>>>>>>>>> In the next month(s), my colleague Matthias and I will commit a
>>>>>>>>>>> lot of time and effort to python 3 support for beam and we would like to
>>>>>>>>>>> discuss the best way to go forward with this.
>>>>>>>>>>>
>>>>>>>>>>> We have drawn up a document [1] with a high level outline of the
>>>>>>>>>>> proposed approach and would like to get your feedback on this.
>>>>>>>>>>>
>>>>>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>>>>>> inactive for the past year. Other smaller issues have been opened, but it's
>>>>>>>>>>> hard to track the general progress. It would be great if anyone could offer
>>>>>>>>>>> some insights on how to best handle this project on Jira.
>>>>>>>>>>>
>>>>>>>>>>> @Holden Karau, you seem to have already put in a lot of effort
>>>>>>>>>>> to add python 3 support, so it would be great to get your insights and find
>>>>>>>>>>> a way to merge our efforts.
>>>>>>>>>>>
>>>>>>>>>>> Kind regards,
>>>>>>>>>>> Robbe
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>>>>>
>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>
>

Re: [PROPOSAL] Python 3 support

Posted by Ahmet Altay <al...@google.com>.
I tried to create a shared kanban board but I failed. I think I am lacking
some permission to create a shared filter. Could someone help with creating
this?

The filter I planned to use was "project = BEAM AND (parent = BEAM-2784 OR
parent = BEAM-1251) ORDER BY Rank ASC"
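
For reference, here is a minimal sketch of how that filter string is put
together from the two parent issues. The helper name build_subtask_filter is
hypothetical, and the snippet only assembles the JQL text; it does not talk to
JIRA or create the shared filter itself.

```python
def build_subtask_filter(project, parent_keys):
    """Compose a JQL query selecting the sub-tasks of the given parent issues."""
    # One "parent = KEY" clause per umbrella issue, joined with OR.
    parents = " OR ".join("parent = {}".format(key) for key in parent_keys)
    # Rank ordering is what a kanban board uses to order its cards.
    return "project = {} AND ({}) ORDER BY Rank ASC".format(project, parents)


jql = build_subtask_filter("BEAM", ["BEAM-2784", "BEAM-1251"])
print(jql)
```

Adding another umbrella issue to the board would then only mean adding its key
to the list.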

Ahmet

On Fri, Apr 6, 2018 at 5:45 AM, Robbe Sneyders <ro...@ml6.eu>
wrote:

> Hi all,
>
> I don't seem to have the permissions to create a Kanban board or even
> assign tasks to myself. Who could help me with this?
>
> I've updated the coders package pull request [1] and added the applied
> strategy to the proposal document [2].
> It would be great to get some feedback on this, so we can start moving
> forward with other subpackages.
>
> Kind regards,
> Robbe
>
> [1] https://github.com/apache/beam/pull/4990
> [2] https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>
> On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders <ro...@ml6.eu> wrote:
>
>> Hello Robert,
>>
>> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
>> this. I'll look into setting one up tomorrow.
>>
>> In the meantime, you can find the first pull request with the updated
>> coders package here:
>> https://github.com/apache/beam/pull/4990
>>
>> Kind regards,
>> Robbe
>>
>> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <ro...@google.com> wrote:
>>
>>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <ro...@ml6.eu>
>>> wrote:
>>>
>>>> Thanks Ahmet and Robert,
>>>>
>>>> I think we can work on different subpackages in parallel, but it's
>>>> important to apply the same strategy everywhere. I'm currently working on
>>>> applying step 1 (was mostly done already) and 2 of the proposal to the
>>>> coders subpackage to create a first pull request. We can then discuss the
>>>> applied strategy in detail before merging and applying it to the other
>>>> subpackages.
>>>>
>>>
>>> Sounds good. Again, could you document (in a more permanent/easy to look
>>> up state than email) when packages are started/done?
>>>
>>>
>>>> This strategy also includes the choice of automated tools. I'm focusing
>>>> on writing python 3 code with python 2 compatibility, which means depending
>>>> on the future package instead of the six package (which is already used in
>>>> some places in the current code base). I have already noticed that this
>>>> indeed requires a lot of manual work after running the automated script.
>>>> The future package supports python 3.3+ compatibility, so I don't think
>>>> there is a higher cost supporting 3.4 compared to 3.5+.
>>>>
>>>
>>> Sure. It may incur a higher maintenance burden long-term though.
>>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>>> some time to come.)
>>>
>>>
>>>> I have already added a tox environment to run pylint2 with the --py3k
>>>> argument per updated subpackage, which should help avoid regression between
>>>> step 2 and step 3 of the proposal. This update will be pushed with the
>>>> first pull request.
>>>>
>>>> Kind regards,
>>>> Robbe
>>>>
>>>>
>>>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <ro...@google.com>
>>>> wrote:
>>>>
>>>>> Thank you, Robbe, for your offer to help with contribution here. I
>>>>> read over your doc and the one thing I'd like to add is that this work is
>>>>> very parallelizable, but if we have enough people looking at it we'll want
>>>>> some way to coordinate so as to not overlap work (or just waste time
>>>>> discovering what's been done). Tracking individual JIRAs and PRs gets
>>>>> unwieldy; perhaps a spreadsheet with modules/packages on one axis and the
>>>>> various automated/manual conversions along the other would be helpful?
>>>>>
>>>>> A note on automated tools: they're sometimes overly conservative, so
>>>>> we should be sure to review the changes manually. (A typical example of
>>>>> this is unnecessarily importing six.moves.xrange when there was no big
>>>>> reason to use xrange over range in Python 2, or conversely using
>>>>> list(range(...)) in Python 3.)
>>>>>
>>>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions.
>>>>> If there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>>>>> identify it and decide that before widespread announcement.
>>>>>
>>>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <ho...@pigscanfly.ca>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <
>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>
>>>>>>>> Hi Anand,
>>>>>>>>
>>>>>>>> Thanks for the feedback.
>>>>>>>>
>>>>>>>> It should be no problem to run everything on DataflowRunner as well.
>>>>>>>> Are there any performance tests in place to check for performance
>>>>>>>> regressions?
>>>>>>>>
>>>>>>>
>>>>>> Yes there is a suite
>>>>>> (https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>>>>> It may not be very comprehensive and seems to have been failing for a
>>>>>> while. I would not block python 3 work on performance for now. That is
>>>>>> the unfortunate state of things.
>>>>>>
>>>>>> If anybody in the community is interested, this would be a great
>>>>>> opportunity to help with benchmarks in general.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Some questions were raised in the proposal document which I want to
>>>>>>>> add to this conversation:
>>>>>>>>
>>>>>>>> The first comment was about the targeted python 3 versions. We
>>>>>>>> proposed to target 3.6 since it is the latest version available and added
>>>>>>>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>>>>> sources on this though).
>>>>>>>> If the beam community prefers 3.4, I would propose to target 3.4
>>>>>>>> only during porting and add 3.5 and 3.6 later so we don't slow down the
>>>>>>>> porting progress. 3.4 has the advantage of already being installed on the
>>>>>>>> workers and allows pySpark pipelines to be moved over to beam more easily.
>>>>>>>> It would be great to get some opinions on this.
>>>>>>>>
>>>>>>>
>>>>>> My preference is to support 3.4+. I searched a bit on the web to
>>>>>> understand the usage statistics for python 3; it seems like python 3.4 has
>>>>>> ~20% usage and python 3.4+ has 99%
>>>>>> (https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>>>>> Based on that, I think it makes sense to support it.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Another comment was made on how to avoid regression during the
>>>>>>>> porting progress.
>>>>>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>>>>>> warnings should remain, so it would be great if we could enforce this check
>>>>>>>> for every pull request on an already updated subpackage.
>>>>>>>> After applying step 3, all tests should run on python 3, so again
>>>>>>>> it would be great if we can enforce these per updated subpackage.
>>>>>>>> Any insights on how to best accomplish this?
>>>>>>>>
>>>>>>> So you can look at some of the recent changes to tox.ini in the git
>>>>>>> log to see what we’ve done so far around this; I suspect you can repeat
>>>>>>> that same pattern.
>>>>>>>
>>>>>>
>>>>>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh
>>>>>> would help a lot to prevent regressions.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Robbe
>>>>>>>>
>>>>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Thank you Robbe.
>>>>>>>>>
>>>>>>>>> I reviewed the document; it looks reasonable to me. I will touch on
>>>>>>>>> some points that were not mentioned:
>>>>>>>>> - Runners exercise different code paths. Doing auto conversions and
>>>>>>>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>>>>>>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>>>>> validate that we are still compatible with python 2.
>>>>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>>>>
>>>>>>>>> For project tracking on JIRA, please feel free to create any new
>>>>>>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>>>>>>> should be assigned to the people actively working on them. If you want to
>>>>>>>>> track it in a separate way, you can also propose that. (For example, a
>>>>>>>>> kanban board, which is fully supported in JIRA, is used for the
>>>>>>>>> portability effort.)
>>>>>>>>>
>>>>>>>>> I will also call out to a few other people in addition to Holden
>>>>>>>>> who helped out or showed interest in helping with Python 3. @cclaus,
>>>>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You can
>>>>>>>>> include these people (and myself) for reviews and other questions that you
>>>>>>>>> have.
>>>>>>>>>
>>>>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Ahmet
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>>>
>>>>>>>>>> Hello everyone,
>>>>>>>>>>
>>>>>>>>>> In the next month(s), my colleague Matthias and I will commit a
>>>>>>>>>> lot of time and effort to python 3 support for beam and we would like to
>>>>>>>>>> discuss the best way to go forward with this.
>>>>>>>>>>
>>>>>>>>>> We have drawn up a document [1] with a high level outline of the
>>>>>>>>>> proposed approach and would like to get your feedback on this.
>>>>>>>>>>
>>>>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>>>>> inactive for the past year. Other smaller issues have been opened, but it's
>>>>>>>>>> hard to track the general progress. It would be great if anyone could offer
>>>>>>>>>> some insights on how to best handle this project on Jira.
>>>>>>>>>>
>>>>>>>>>> @Holden Karau, you seem to have already put in a lot of effort to
>>>>>>>>>> add python 3 support, so it would be great to get your insights and find a
>>>>>>>>>> way to merge our efforts.
>>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>> Robbe
>>>>>>>>>>
>>>>>>>>>> [1] https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251

Re: [PROPOSAL] Python 3 support

Posted by Robbe Sneyders <ro...@ml6.eu>.
Hi all,

I don't seem to have the permissions to create a Kanban board or even
assign tasks to myself. Who could help me with this?

I've updated the coders package pull request [1] and added the applied
strategy to the proposal document [2].
It would be great to get some feedback on this, so we can start moving
forward with other subpackages.

Kind regards,
Robbe

[1] https://github.com/apache/beam/pull/4990
[2]
https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing


On Mon, 2 Apr 2018 at 21:07 Robbe Sneyders <ro...@ml6.eu> wrote:

> Hello Robert,
>
> I think a Kanban board on Jira as proposed by Ahmet can be helpful for
> this. I'll look into setting one up tomorrow.
>
> In the meantime, you can find the first pull request with the updated
> coders package here:
> https://github.com/apache/beam/pull/4990
>
> Kind regards,
> Robbe
>
> On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <ro...@google.com> wrote:
>
>> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <ro...@ml6.eu>
>> wrote:
>>
>>> Thanks Ahmet and Robert,
>>>
>>> I think we can work on different subpackages in parallel, but it's
>>> important to apply the same strategy everywhere. I'm currently working on
>>> applying step 1 (was mostly done already) and 2 of the proposal to the
>>> coders subpackage to create a first pull request. We can then discuss the
>>> applied strategy in detail before merging and applying it to the other
>>> subpackages.
>>>
>>
>> Sounds good. Again, could you document (in a more permanent/easy to look
>> up state than email) when packages are started/done?
>>
>>
>>> This strategy also includes the choice of automated tools. I'm focusing
>>> on writing python 3 code with python 2 compatibility, which means depending
>>> on the future package instead of the six package (which is already used in
>>> some places in the current code base). I have already noticed that this
>>> indeed requires a lot of manual work after running the automated script.
>>> The future package supports python 3.3+ compatibility, so I don't think
>>> there is a higher cost supporting 3.4 compared to 3.5+.
>>>
>>
>> Sure. It may incur a higher maintenance burden long-term though.
>> (Basically, if we go out the door with 3.4 it's a promise to support it for
>> some time to come.)
>>
>>
>>> I have already added a tox environment to run pylint2 with the --py3k
>>> argument per updated subpackage, which should help avoid regression between
>>> step 2 and step 3 of the proposal. This update will be pushed with the
>>> first pull request.
>>>
>>> Kind regards,
>>> Robbe
>>>
>>>
>>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> Thank you, Robbe, for your offer to help with contribution here. I
>>>> read over your doc and the one thing I'd like to add is that this work is
>>>> very parallelizable, but if we have enough people looking at it we'll want
>>>> some way to coordinate so as to not overlap work (or just waste time
>>>> discovering what's been done). Tracking individual JIRAs and PRs gets
>>>> unwieldy; perhaps a spreadsheet with modules/packages on one axis and the
>>>> various automated/manual conversions along the other would be helpful?
>>>>
>>>> A note on automated tools: they're sometimes overly conservative, so we
>>>> should be sure to review the changes manually. (A typical example of this
>>>> is unnecessarily importing six.moves.xrange when there was no big reason to
>>>> use xrange over range in Python 2, or conversely using list(range(...)) in
>>>> Python 3.)
>>>>
>>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions.
>>>> If there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>>>> identify it and decide that before widespread announcement.
>>>>
>>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <ho...@pigscanfly.ca>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <ro...@ml6.eu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Anand,
>>>>>>>
>>>>>>> Thanks for the feedback.
>>>>>>>
>>>>>>> It should be no problem to run everything on DataflowRunner as well.
>>>>>>> Are there any performance tests in place to check for performance
>>>>>>> regressions?
>>>>>>>
>>>>>>
>>>>> Yes there is a suite (
>>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>>>> It may not be very comprehensive and seems to have been failing for a
>>>>> while. I would not block python 3 work on performance for now. That is the
>>>>> unfortunate state of things.
>>>>>
>>>>> If anybody in the community is interested, this would be a great
>>>>> opportunity to help with benchmarks in general.
>>>>>
>>>>>
>>>>>>
>>>>>>> Some questions were raised in the proposal document which I want to
>>>>>>> add to this conversation:
>>>>>>>
>>>>>>> The first comment was about the targeted python 3 versions. We
>>>>>>> proposed to target 3.6 since it is the latest version available and added
>>>>>>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>>>> sources on this though).
>>>>>>> If the beam community prefers 3.4, I would propose to target 3.4
>>>>>>> only during porting and add 3.5 and 3.6 later so we don't slow down the
>>>>>>> porting progress. 3.4 has the advantage of already being installed on the
>>>>>>> workers and allows pySpark pipelines to be moved over to beam more easily.
>>>>>>> It would be great to get some opinions on this.
>>>>>>>
>>>>>>
>>>>> My preference is to support 3.4+. I searched a bit on the web to
>>>>> understand the usage statistics for python 3; it seems like python 3.4 has
>>>>> ~20% usage and python 3.4+ has 99% (
>>>>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>>>> Based on that, I think it makes sense to support it.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> Another comment was made on how to avoid regression during the
>>>>>>> porting progress.
>>>>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>>>>> warnings should remain, so it would be great if we could enforce this check
>>>>>>> for every pull request on an already updated subpackage.
>>>>>>> After applying step 3, all tests should run on python 3, so again it
>>>>>>> would be great if we can enforce these per updated subpackage.
>>>>>>> Any insights on how to best accomplish this?
>>>>>>>
>>>>>> So you can look at some of the recent changes to tox.ini in the git
>>>>>> log to see what we’ve done so far around this; I suspect you can repeat that
>>>>>> same pattern.
>>>>>>
>>>>>
>>>>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh would
>>>>> help a lot to prevent regressions.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Robbe
>>>>>>>
>>>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>>>>>>
>>>>>>>> Thank you Robbe.
>>>>>>>>
>>>>>>>> I reviewed the document and it looks reasonable to me. I will touch on
>>>>>>>> some points that were not mentioned:
>>>>>>>> - Runners exercise different code paths. Doing auto conversions and
>>>>>>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>>>>>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>>>> validate that we are still compatible with python 2.
>>>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>>>
>>>>>>>> For project tracking on JIRA, please feel free to create any new
>>>>>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>>>>>> should be assigned to the people actively working on them. If you want to
>>>>>>>> track it in a separate way, you can also propose that. (For example, a
>>>>>>>> kanban board, which is fully supported in JIRA, is used for the
>>>>>>>> portability effort.)
>>>>>>>>
>>>>>>>> I will also call out to a few other people in addition to Holden
>>>>>>>> who helped out or showed interest in helping with Python 3. @cclaus,
>>>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You can
>>>>>>>> include these people (and myself) for reviews and other questions that you
>>>>>>>> have.
>>>>>>>>
>>>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>> Ahmet
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>>
>>>>>>>>> Hello everyone,
>>>>>>>>>
>>>>>>>>> In the next month(s), my colleague Matthias and I will commit a
>>>>>>>>> lot of time and effort to python 3 support for beam and we would like to
>>>>>>>>> discuss the best way to go forward with this.
>>>>>>>>>
>>>>>>>>> We have drawn up a document [1] with a high level outline of the
>>>>>>>>> proposed approach and would like to get your feedback on this.
>>>>>>>>>
>>>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>>>> inactive for the past year. Other smaller issues have been opened, but it's
>>>>>>>>> hard to track the general progress. It would be great if anyone could offer
>>>>>>>>> some insights on how to best handle this project on Jira.
>>>>>>>>>
>>>>>>>>> @Holden Karau, you seem to have already put in a lot of effort to
>>>>>>>>> add python 3 support, so it would be great to get your insights and find a
>>>>>>>>> way to merge our efforts.
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>> Robbe
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>>>
>>>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251

Re: [PROPOSAL] Python 3 support

Posted by Robbe Sneyders <ro...@ml6.eu>.
Hello Robert,

I think a Kanban board on Jira as proposed by Ahmet can be helpful for
this. I'll look into setting one up tomorrow.

In the meantime, you can find the first pull request with the updated
coders package here:
https://github.com/apache/beam/pull/4990

Kind regards,
Robbe

On Fri, 30 Mar 2018 at 18:01 Robert Bradshaw <ro...@google.com> wrote:

> On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <ro...@ml6.eu>
> wrote:
>
>> Thanks Ahmet and Robert,
>>
>> I think we can work on different subpackages in parallel, but it's
>> important to apply the same strategy everywhere. I'm currently working on
>> applying step 1 (was mostly done already) and 2 of the proposal to the
>> coders subpackage to create a first pull request. We can then discuss the
>> applied strategy in detail before merging and applying it to the other
>> subpackages.
>>
>
> Sounds good. Again, could you document (in a more permanent/easy to look
> up state than email) when packages are started/done?
>
>
>> This strategy also includes the choice of automated tools. I'm focusing
>> on writing python 3 code with python 2 compatibility, which means depending
>> on the future package instead of the six package (which is already used in
>> some places in the current code base). I have already noticed that this
>> indeed requires a lot of manual work after running the automated script.
>> The future package supports python 3.3+ compatibility, so I don't think
>> there is a higher cost supporting 3.4 compared to 3.5+.
>>
>
> Sure. It may incur a higher maintenance burden long-term though.
> (Basically, if we go out the door with 3.4 it's a promise to support it for
> some time to come.)
>
>
>> I have already added a tox environment to run pylint2 with the --py3k
>> argument per updated subpackage, which should help avoid regression between
>> step 2 and step 3 of the proposal. This update will be pushed with the
>> first pull request.
>>
>> Kind regards,
>> Robbe
>>
>>
>> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <ro...@google.com> wrote:
>>
>>> Thank you, Robbe, for your offer to help with contribution here. I read
>>> over your doc and the one thing I'd like to add is that this work is very
>>> parallelizable, but if we have enough people looking at it we'll want some
>>> way to coordinate so as to not overlap work (or just waste time discovering
>>> what's been done). Tracking individual JIRAs and PRs gets unwieldy; perhaps
>>> a spreadsheet with modules/packages on one axis and the various
>>> automated/manual conversions along the other would be helpful?
>>>
>>> A note on automated tools: they're sometimes overly conservative, so we
>>> should be sure to review the changes manually. (A typical example of this
>>> is unnecessarily importing six.moves.xrange when there was no big reason to
>>> use xrange over range in Python 2, or conversely using list(range(...)) in
>>> Python 3.)
>>>
>>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
>>> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>>> identify it and decide that before widespread announcement.
>>>
>>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <ho...@pigscanfly.ca>
>>>> wrote:
>>>>
>>>>>
>>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <ro...@ml6.eu>
>>>>> wrote:
>>>>>
>>>>>> Hi Anand,
>>>>>>
>>>>>> Thanks for the feedback.
>>>>>>
>>>>>> It should be no problem to run everything on DataflowRunner as well.
>>>>>> Are there any performance tests in place to check for performance
>>>>>> regressions?
>>>>>>
>>>>>
>>>> Yes there is a suite (
>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>>> It may not be very comprehensive and seems to have been failing for a
>>>> while. I would not block python 3 work on performance for now. That is the
>>>> unfortunate state of things.
>>>>
>>>> If anybody in the community is interested, this would be a great
>>>> opportunity to help with benchmarks in general.
>>>>
>>>>
>>>>>
>>>>>> Some questions were raised in the proposal document which I want to
>>>>>> add to this conversation:
>>>>>>
>>>>>> The first comment was about the targeted python 3 versions. We
>>>>>> proposed to target 3.6 since it is the latest version available and added
>>>>>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>>> sources on this though).
>>>>>> If the beam community prefers 3.4, I would propose to target 3.4 only
>>>>>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>>>>>> progress. 3.4 has the advantage of already being installed on the workers
>>>>>> and allows pySpark pipelines to be moved over to beam more easily.
>>>>>> It would be great to get some opinions on this.
>>>>>>
>>>>>
>>>> My preference is to support 3.4+. I searched a bit on the web to
>>>> understand the usage statistics for python 3; it seems like python 3.4 has
>>>> ~20% usage and python 3.4+ has 99% (
>>>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>>> Based on that, I think it makes sense to support it.
>>>>
>>>>
>>>>
>>>>>
>>>>>> Another comment was made on how to avoid regression during the
>>>>>> porting progress.
>>>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>>>> warnings should remain, so it would be great if we could enforce this check
>>>>>> for every pull request on an already updated subpackage.
>>>>>> After applying step 3, all tests should run on python 3, so again it
>>>>>> would be great if we can enforce these per updated subpackage.
>>>>>> Any insights on how to best accomplish this?
>>>>>>
>>>>> So you can look at some of the recent changes to tox.ini in the git
>>>>> log to see what we’ve done so far around this; I suspect you can repeat that
>>>>> same pattern.
>>>>>
>>>>
>>>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh would
>>>> help a lot to prevent regressions.
>>>>
>>>>
>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Robbe
>>>>>>
>>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>>>>>
>>>>>>> Thank you Robbe.
>>>>>>>
>>>>>>> I reviewed the document and it looks reasonable to me. I will touch on
>>>>>>> some points that were not mentioned:
>>>>>>> - Runners exercise different code paths. Doing auto conversions and
>>>>>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>>>>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>>> validate that we are still compatible with python 2.
>>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>>
>>>>>>> For project tracking on JIRA, please feel free to create any new
>>>>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>>>>> should be assigned to the people actively working on them. If you want to
>>>>>>> track it in a separate way, you can also propose that. (For example, a
>>>>>>> kanban board, which is fully supported in JIRA, is used for the
>>>>>>> portability effort.)
>>>>>>>
>>>>>>> I will also call out to a few other people in addition to Holden who
>>>>>>> helped out or showed interest in helping with Python 3. @cclaus,
>>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You can
>>>>>>> include these people (and myself) for reviews and other questions that you
>>>>>>> have.
>>>>>>>
>>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Ahmet
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>>
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> In the next month(s), my colleague Matthias and I will commit a
>>>>>>>> lot of time and effort to python 3 support for beam and we would like to
>>>>>>>> discuss the best way to go forward with this.
>>>>>>>>
>>>>>>>> We have drawn up a document [1] with a high level outline of the
>>>>>>>> proposed approach and would like to get your feedback on this.
>>>>>>>>
>>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>>> inactive for the past year. Other smaller issues have been opened, but it's
>>>>>>>> hard to track the general progress. It would be great if anyone could offer
>>>>>>>> some insights on how to best handle this project on Jira.
>>>>>>>>
>>>>>>>> @Holden Karau, you seem to have already put in a lot of effort to
>>>>>>>> add python 3 support, so it would be great to get your insights and find a
>>>>>>>> way to merge our efforts.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Robbe
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>>
>>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251

Re: [PROPOSAL] Python 3 support

Posted by Robert Bradshaw <ro...@google.com>.
On Fri, Mar 30, 2018 at 8:39 AM Robbe Sneyders <ro...@ml6.eu>
wrote:

> Thanks Ahmet and Robert,
>
> I think we can work on different subpackages in parallel, but it's
> important to apply the same strategy everywhere. I'm currently working on
> applying step 1 (was mostly done already) and 2 of the proposal to the
> coders subpackage to create a first pull request. We can then discuss the
> applied strategy in detail before merging and applying it to the other
> subpackages.
>

Sounds good. Again, could you document (in a more permanent/easy to look up
state than email) when packages are started/done?


> This strategy also includes the choice of automated tools. I'm focusing on
> writing python 3 code with python 2 compatibility, which means depending on
> the future package instead of the six package (which is already used in
> some places in the current code base). I have already noticed that this
> indeed requires a lot of manual work after running the automated script.
> The future package supports python 3.3+ compatibility, so I don't think
> there is a higher cost supporting 3.4 compared to 3.5+.
>

Sure. It may incur a higher maintenance burden long-term though.
(Basically, if we go out the door with 3.4 it's a promise to support it for
some time to come.)
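
For illustration, here is a minimal sketch of the "python 3 code with
python 2 compatibility" style discussed above. It assumes the third-party
future package is installed when running on python 2 (on python 3, builtins
is part of the standard library). The helper names are hypothetical and are
not taken from the Beam code base.

```python
# Python-3-first style. On python 2 the `builtins` module would come from
# the third-party `future` package (assumption: it is installed there);
# on python 3 it is part of the standard library.
from __future__ import absolute_import, division, print_function

from builtins import range, str


def encoded_lengths(values):
    """Hypothetical helper: (index, UTF-8 byte length) for each value."""
    # `range` is lazy and `str` is a text type on both major versions,
    # so no six.moves.xrange or list(range(...)) wrapper is needed.
    return [(i, len(str(values[i]).encode("utf-8")))
            for i in range(len(values))]


def mean(values):
    """Hypothetical helper: `/` is true division on both major versions."""
    return sum(values) / len(values)
```

Code written this way reads as plain python 3, which should keep the
compatibility shims easy to drop once python 2 support ends.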


> I have already added a tox environment to run pylint2 with the --py3k
> argument per updated subpackage, which should help avoid regression between
> step 2 and step 3 of the proposal. This update will be pushed with the
> first pull request.
>
> Kind regards,
> Robbe
>
>
> On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <ro...@google.com> wrote:
>
>> Thank you, Robbe, for your offer to help with contribution here. I read
>> over your doc and the one thing I'd like to add is that this work is very
>> parallelizable, but if we have enough people looking at it we'll want some
>> way to coordinate so as to not overlap work (or just waste time discovering
>> what's been done). Tracking individual JIRAs and PRs gets unwieldy; perhaps
>> a spreadsheet with modules/packages on one axis and the various
>> automated/manual conversions along the other would be helpful?
>>
>> A note on automated tools: they're sometimes overly conservative, so we
>> should be sure to review the changes manually. (A typical example of this
>> is unnecessarily importing six.moves.xrange when there was no big reason to
>> use xrange over range in Python 2, or conversely using list(range(...)) in
>> Python 3.)
>>
>> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
>> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
>> identify it and decide that before widespread announcement.
>>
>> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>>
>>>
>>>
>>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <ho...@pigscanfly.ca>
>>> wrote:
>>>
>>>>
>>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <ro...@ml6.eu>
>>>> wrote:
>>>>
>>>>> Hi Anand,
>>>>>
>>>>> Thanks for the feedback.
>>>>>
>>>>> It should be no problem to run everything on DataflowRunner as well.
>>>>> Are there any performance tests in place to check for performance
>>>>> regressions?
>>>>>
>>>>
>>> Yes there is a suite (
>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>>> It may not be very comprehensive and seems to have been failing for a
>>> while. I would not block python 3 work on performance for now. That is the
>>> unfortunate state of things.
>>>
>>> If anybody in the community is interested, this would be a great
>>> opportunity to help with benchmarks in general.
>>>
>>>
>>>>
>>>>> Some questions were raised in the proposal document which I want to
>>>>> add to this conversation:
>>>>>
>>>>> The first comment was about the targeted python 3 versions. We
>>>>> proposed to target 3.6 since it is the latest version available and added
>>>>> 3.5 because 3.6 adoption seems rather low (hard to find any relevant
>>>>> sources on this though).
>>>>> If the beam community prefers 3.4, I would propose to target 3.4 only
>>>>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>>>>> progress. 3.4 has the advantage of already being installed on the workers
>>>>> and allows pySpark pipelines to be moved over to beam more easily.
>>>>> It would be great to get some opinions on this.
>>>>>
>>>>
>>> My preference is to support 3.4+. I searched a bit on the web to
>>> understand the usage statistics for python 3; it seems like python 3.4 has
>>> ~20% usage and python 3.4+ has 99% (
>>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>>> Based on that, I think it makes sense to support it.
>>>
>>>
>>>
>>>>
>>>>> Another comment was made on how to avoid regression during the porting
>>>>> progress.
>>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>>> warnings should remain, so it would be great if we could enforce this check
>>>>> for every pull request on an already updated subpackage.
>>>>> After applying step 3, all tests should run on python 3, so again it
>>>>> would be great if we can enforce these per updated subpackage.
>>>>> Any insights on how to best accomplish this?
>>>>>
>>>> So you can look at some of the recent changes to tox.ini in the git log
>>>> to see what we’ve done so far around this; I suspect you can repeat that
>>>> same pattern.
>>>>
>>>
>>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh would
>>> help a lot to prevent regressions.
>>>
>>>
>>>
>>>>
>>>>> Thanks,
>>>>> Robbe
>>>>>
>>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> Thank you Robbe.
>>>>>>
>>>>>> I reviewed the document and it looks reasonable to me. I will touch on
>>>>>> some points that were not mentioned:
>>>>>> - Runners exercise different code paths. Doing auto conversions and
>>>>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>>>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>>> validate that we are still compatible with python 2.
>>>>>> - Similar to above but with an eye on perf regressions.
>>>>>>
>>>>>> For project tracking on JIRA, please feel free to create any new
>>>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>>>> should be assigned to the people actively working on them. If you want to
>>>>>> track it in a separate way, you can also propose that. (For example, a
>>>>>> kanban board, which is fully supported in JIRA, is used for the
>>>>>> portability effort.)
>>>>>>
>>>>>> I will also call out to a few other people in addition to Holden who
>>>>>> helped out or showed interest in helping with Python 3. @cclaus,
>>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You can
>>>>>> include these people (and myself) for reviews and other questions that you
>>>>>> have.
>>>>>>
>>>>>> Welcome again, and looking forward to your contributions.
>>>>>>
>>>>>> Thank you,
>>>>>> Ahmet
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <
>>>>>> robbe.sneyders@ml6.eu> wrote:
>>>>>>
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> In the next month(s), me and my colleague Matthias will commit a lot
>>>>>>> of time and effort to python 3 support for beam and we would like to
>>>>>>> discuss the best way to go forward with this.
>>>>>>>
>>>>>>> We have drawn up a document [1] with a high level outline of the
>>>>>>> proposed approach and would like to get your feedback on this.
>>>>>>>
>>>>>>> The main Jira issue [2] for python 3 support has been mostly
>>>>>>> inactive for the past year. Other smaller issues have been opened, but it's
>>>>>>> hard to track the general progress. It would be great if anyone could offer
>>>>>>> some insights on how to best handle this project on Jira.
>>>>>>>
>>>>>>> @Holden Karau, you seem to have already put in a lot of effort to
>>>>>>> add python 3 support, so it would be great to get your insights and find a
>>>>>>> way to merge our efforts.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Robbe
>>>>>>>
>>>>>>> [1]
>>>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>>
>>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> Twitter: https://twitter.com/holdenkarau
>>>>
>>>
>

Re: [PROPOSAL] Python 3 support

Posted by Robbe Sneyders <ro...@ml6.eu>.
Thanks Ahmet and Robert,

I think we can work on different subpackages in parallel, but it's
important to apply the same strategy everywhere. I'm currently working on
applying step 1 (was mostly done already) and 2 of the proposal to the
coders subpackage to create a first pull request. We can then discuss the
applied strategy in detail before merging and applying it to the other
subpackages.

This strategy also includes the choice of automated tools. I'm focusing on
writing python 3 code with python 2 compatibility, which means depending on
the future package instead of the six package (which is already used in
some places in the current code base). I have already noticed that this
indeed requires a lot of manual work after running the automated script.
The future package supports python 3.3+ compatibility, so I don't think
there is a higher cost supporting 3.4 compared to 3.5+.
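
To make the style above concrete, here is a minimal sketch of Python 3 code
that stays compatible with Python 2. It assumes the future package is
installed when running under Python 2 (on Python 3, builtins is part of the
standard library). The helper names chunk_sizes and average are hypothetical
and are not taken from the Beam code base.

```python
from __future__ import absolute_import, division, print_function

# On Python 3 this is the standard library module; on Python 2 the
# `future` package provides a module with the same name (assumption:
# `future` is installed there).
from builtins import range


def chunk_sizes(total, parts):
    """Split `total` items into `parts` chunks whose sizes differ by at most one."""
    base = total // parts       # explicit floor division on both versions
    remainder = total % parts
    return [base + (1 if i < remainder else 0) for i in range(parts)]


def average(values):
    """Return the mean; `/` is true division thanks to the __future__ import."""
    return sum(values) / len(values)
```

The same source file then reads as ordinary Python 3, which is the main
difference from sprinkling six helpers through the code.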

I have already added a tox environment to run pylint2 with the --py3k
argument per updated subpackage, which should help avoid regression between
step 2 and step 3 of the proposal. This update will be pushed with the
first pull request.
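
A rough sketch of how such a per-subpackage check could be driven is shown
below. The subpackage list and the helper names are hypothetical; only the
pylint --py3k argument comes from the paragraph above, and the actual tox
environment in the pull request may look different.

```python
import subprocess

# Hypothetical list of subpackages that have already been ported.
UPDATED_SUBPACKAGES = ["apache_beam/coders"]


def py3k_lint_command(subpackages):
    """Build the pylint invocation that reports Python 3 porting problems."""
    return ["pylint", "--py3k"] + sorted(subpackages)


def run_py3k_lint(subpackages):
    """Run the check and return pylint's exit code (non-zero if it reported messages)."""
    return subprocess.call(py3k_lint_command(subpackages))
```

Growing UPDATED_SUBPACKAGES as each subpackage is ported keeps the check
strict for finished code without failing on code that has not been touched yet.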

Kind regards,
Robbe


On Fri, 30 Mar 2018 at 02:22 Robert Bradshaw <ro...@google.com> wrote:

> Thank you, Robbe, for your offer to help with contribution here. I read
> over your doc and the one thing I'd like to add is that this work is very
> parallelizable, but if we have enough people looking at it we'll want some
> way to coordinate so as to not overlap work (or just waste time discovering
> what's been done). Tracking individual JIRAs and PRs gets unwieldy; perhaps
> a spreadsheet with modules/packages on one axis and the various
> automated/manual conversions along the other would be helpful?
>
> A note on automated tools: they're sometimes overly conservative, so we
> should be sure to review the changes manually. (A typical example of this
> is unnecessarily importing six.moves.xrange when there was no big reason to
> use xrange over range in Python 2, or conversely using list(range(...)) in
> Python 3.)
>
> Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
> there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
> identify it and decide that before widespread announcement.
>
> On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:
>
>>
>>
>> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <ho...@pigscanfly.ca>
>> wrote:
>>
>>>
>>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <ro...@ml6.eu>
>>> wrote:
>>>
>>>> Hi Anand,
>>>>
>>>> Thanks for the feedback.
>>>>
>>>> It should be no problem to run everything on DataflowRunner as well.
>>>> Are there any performance tests in place to check for performance
>>>> regressions?
>>>>
>>>
>> Yes there is a suite (
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
>> It may not be very comprehensive and seems to have been failing for a
>> while. I would not block python 3 work on performance for now. That is the
>> unfortunate state of things.
>>
>> If anybody in the community is interested, this would be a great
>> opportunity to help with benchmarks in general.
>>
>>
>>>
>>>> Some questions were raised in the proposal document which I want to add
>>>> to this conversation:
>>>>
>>>> The first comment was about the targeted python 3 versions. We proposed
>>>> to target 3.6 since it is the latest version available and added 3.5
>>>> because 3.6 adoption seems rather low (hard to find any relevant sources on
>>>> this though).
>>>> If the beam community prefers 3.4, I would propose to target 3.4 only
>>>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>>>> progress. 3.4 has the advantage of already being installed on the workers
>>>> and allows pySpark pipelines to be moved over to beam more easily.
>>>> It would be great to get some opinions on this.
>>>>
>>>
>> My preference is to support 3.4+. I searched a bit on the web to
>> understand the usage statistics for python 3. It seems like python 3.4 has
>> ~20% usage and python 3.4+ has 99% (
>> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
>> Based on that, I think it makes sense to support it.
>>
>>
>>
>>>
>>>> Another comment was made on how to avoid regression during the porting
>>>> progress.
>>>> After applying step 1 and step 2, no python 3 compatibility lint
>>>> warnings should remain, so it would be great if we could enforce this check
>>>> for every pull request on an already updated subpackage.
>>>> After applying step 3, all tests should run on python 3, so again it
>>>> would be great if we can enforce these per updated subpackage.
>>>> Any insights on how to best accomplish this?
>>>>
>>> So you can look at some of the recent changes to tox.ini in the git log
>>> to see what we’ve done so far around this. I suspect you can repeat that
>>> same pattern.
>>>
>>
>> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh would
>> help a lot to prevent regressions.
>>
>>
>>
>>>
>>>> Thanks,
>>>> Robbe
>>>>
>>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> Thank you Robbe.
>>>>>
>>>>> I reviewed the document and it looks reasonable to me. I will touch on
>>>>> some points that were not mentioned:
>>>>> - Runners exercise different code paths. Doing auto conversions and
>>>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>>>> validate that we are still compatible with python 2.
>>>>> - Similar to above but with an eye on perf regressions.
>>>>>
>>>>> For project tracking on JIRA, please feel free to create any new
>>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>>> should be assigned to the people actively working on them. If you want to
>>>>> track it in a separate way, you can also propose that. (For example a
>>>>> kanban board is used for portability effort which is fully supported in
>>>>> JIRA.)
>>>>>
>>>>> I will also call out to a few other people in addition to Holden who
>>>>> helped out or showed interest in helping with Python 3. @cclaus,
>>>>> @luke-zhu, @udim, @robertwb, @charlesccychen, @tvalentyn. You can include
>>>>> these people (and myself) for reviews and other questions that you have.
>>>>>
>>>>> Welcome again, and looking forward to your contributions.
>>>>>
>>>>> Thank you,
>>>>> Ahmet
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders
>>>>> <robbe.sneyders@ml6.eu> wrote:
>>>>>
>>>>>> Hello everyone,
>>>>>>
>>>>>> In the next month(s), me and my colleague Matthias will commit a lot
>>>>>> of time and effort to python 3 support for beam and we would like to
>>>>>> discuss the best way to go forward with this.
>>>>>>
>>>>>> We have drawn up a document [1] with a high level outline of the
>>>>>> proposed approach and would like to get your feedback on this.
>>>>>>
>>>>>> The main Jira issue [2] for python 3 support has been mostly inactive
>>>>>> for the past year. Other smaller issues have been opened, but it's hard to
>>>>>> track the general progress. It would be great if anyone could offer some
>>>>>> insights on how to best handle this project on Jira.
>>>>>>
>>>>>> @Holden Karau, you seem to have already put in a lot of effort to add
>>>>>> python 3 support, so it would be great to get your insights and find a way
>>>>>> to merge our efforts.
>>>>>>
>>>>>> Kind regards,
>>>>>> Robbe
>>>>>>
>>>>>> [1]
>>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>>
>>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>
>>>>>
>>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>>
>>

Re: [PROPOSAL] Python 3 support

Posted by Robert Bradshaw <ro...@google.com>.
Thank you, Robbe, for your offer to help with contribution here. I read
over your doc and the one thing I'd like to add is that this work is very
parallelizable, but if we have enough people looking at it we'll want some
way to coordinate so as to not overlap work (or just waste time discovering
what's been done). Tracking individual JIRAs and PRs gets unwieldy; perhaps
a spreadsheet with modules/packages on one axis and the various
automated/manual conversions along the other would be helpful?

A note on automated tools: they're sometimes overly conservative, so we
should be sure to review the changes manually. (A typical example of this
is unnecessarily importing six.moves.xrange when there was no big reason to
use xrange over range in Python 2, or conversely using list(range(...)) in
Python 3.)
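
As a small illustration of that point (a sketch with hypothetical function
names, not the output of any particular conversion tool): plain range is
enough when the result is only iterated over, and wrapping it in list(...) is
only needed when an actual list is required.

```python
def sum_of_squares(n):
    """Only iterates, so plain range works on Python 2 and Python 3 alike."""
    total = 0
    for i in range(n):  # no six.moves.xrange and no list(...) needed here
        total += i * i
    return total


def evens_descending(n):
    """Needs a real list because the result is modified in place."""
    evens = list(range(0, 2 * n, 2))  # list(...) is justified in this case
    evens.reverse()
    return evens
```

On Python 2 the plain range in sum_of_squares builds a temporary list, which
only matters for very large n; that is the kind of case worth a manual look.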

Also, +1 to targeting 3.4+ and upgrading tox to prevent regressions. If
there's a cost to supporting 3.4 as opposed to requiring 3.5+ we should
identify it and decide that before widespread announcement.

On Tue, Mar 27, 2018 at 2:27 PM Ahmet Altay <al...@google.com> wrote:

>
>
> On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <ho...@pigscanfly.ca>
> wrote:
>
>>
>> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <ro...@ml6.eu>
>> wrote:
>>
>>> Hi Anand,
>>>
>>> Thanks for the feedback.
>>>
>>> It should be no problem to run everything on DataflowRunner as well.
>>> Are there any performance tests in place to check for performance
>>> regressions?
>>>
>>
> Yes there is a suite (
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
> It may not be very comprehensive and seems to have been failing for a
> while. I would not block python 3 work on performance for now. That is the
> unfortunate state of things.
>
> If anybody in the community is interested, this would be a great
> opportunity to help with benchmarks in general.
>
>
>>
>>> Some questions were raised in the proposal document which I want to add
>>> to this conversation:
>>>
>>> The first comment was about the targeted python 3 versions. We proposed
>>> to target 3.6 since it is the latest version available and added 3.5
>>> because 3.6 adoption seems rather low (hard to find any relevant sources on
>>> this though).
>>> If the beam community prefers 3.4, I would propose to target 3.4 only
>>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>>> progress. 3.4 has the advantage of already being installed on the workers
>>> and allows pySpark pipelines to be moved over to beam more easily.
>>> It would be great to get some opinions on this.
>>>
>>
> My preference is to support 3.4+. I searched a bit on the web to
> understand the usage statistics for python 3. It seems like python 3.4 has
> ~20% usage and python 3.4+ has 99% (
> https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
> Based on that, I think it makes sense to support it.
>
>
>
>>
>>> Another comment was made on how to avoid regression during the porting
>>> progress.
>>> After applying step 1 and step 2, no python 3 compatibility lint
>>> warnings should remain, so it would be great if we could enforce this check
>>> for every pull request on an already updated subpackage.
>>> After applying step 3, all tests should run on python 3, so again it
>>> would be great if we can enforce these per updated subpackage.
>>> Any insights on how to best accomplish this?
>>>
>> So you can look at some of the recent changes to tox.ini in the git log
>> to see what we’ve done so far around this. I suspect you can repeat that
>> same pattern.
>>
>
> +1 updating tox.ini and adding new checks to run_mini_py3lint.sh would
> help a lot to prevent regressions.
>
>
>
>>
>>> Thanks,
>>> Robbe
>>>
>>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>>
>>>> Thank you Robbe.
>>>>
>>>> I reviewed the document and it looks reasonable to me. I will touch on
>>>> some points that were not mentioned:
>>>> - Runners exercise different code paths. Doing auto conversions and
>>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>>> validate that we are still compatible with python 2.
>>>> - Similar to above but with an eye on perf regressions.
>>>>
>>>> For project tracking on JIRA, please feel free to create any new
>>>> issues, close stale ones, or take ownership of any open issues. All JIRAs
>>>> should be assigned to the people actively working on them. If you want to
>>>> track it in a separate way, you can also propose that. (For example a
>>>> kanban board is used for portability effort which is fully supported in
>>>> JIRA.)
>>>>
>>>> I will also call out to a few other people in addition to Holden who
>>>> helped out or showed interest in helping with Python 3. @cclaus, @luke-zhu,
>>>> @udim, @robertwb, @charlesccychen, @tvalentyn. You can include these
>>>> people (and myself) for reviews and other questions that you have.
>>>>
>>>> Welcome again, and looking forward to your contributions.
>>>>
>>>> Thank you,
>>>> Ahmet
>>>>
>>>>
>>>>
>>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <ro...@ml6.eu>
>>>> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> In the next month(s), me and my colleague Matthias will commit a lot
>>>>> of time and effort to python 3 support for beam and we would like to
>>>>> discuss the best way to go forward with this.
>>>>>
>>>>> We have drawn up a document [1] with a high level outline of the
>>>>> proposed approach and would like to get your feedback on this.
>>>>>
>>>>> The main Jira issue [2] for python 3 support has been mostly inactive
>>>>> for the past year. Other smaller issues have been opened, but it's hard to
>>>>> track the general progress. It would be great if anyone could offer some
>>>>> insights on how to best handle this project on Jira.
>>>>>
>>>>> @Holden Karau, you seem to have already put in a lot of effort to add
>>>>> python 3 support, so it would be great to get your insights and find a way
>>>>> to merge our efforts.
>>>>>
>>>>> Kind regards,
>>>>> Robbe
>>>>>
>>>>> [1]
>>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>>>
>>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>
>>>>
>>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>>
>
>

Re: [PROPOSAL] Python 3 support

Posted by Ahmet Altay <al...@google.com>.
On Tue, Mar 27, 2018 at 7:12 AM, Holden Karau <ho...@pigscanfly.ca> wrote:

>
> On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <ro...@ml6.eu>
> wrote:
>
>> Hi Anand,
>>
>> Thanks for the feedback.
>>
>> It should be no problem to run everything on DataflowRunner as well.
>> Are there any performance tests in place to check for performance
>> regressions?
>>
>
Yes there is a suite (
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PerformanceTests_Python.groovy).
It may not be very comprehensive and seems to have been failing for a while.
I would not block python 3 work on performance for now. That is the
unfortunate state of things.

If anybody in the community is interested, this would be a great
opportunity to help with benchmarks in general.


>
>> Some questions were raised in the proposal document which I want to add
>> to this conversation:
>>
>> The first comment was about the targeted python 3 versions. We proposed
>> to target 3.6 since it is the latest version available and added 3.5
>> because 3.6 adoption seems rather low (hard to find any relevant sources on
>> this though).
>> If the beam community prefers 3.4, I would propose to target 3.4 only
>> during porting and add 3.5 and 3.6 later so we don't slow down the porting
>> progress. 3.4 has the advantage of already being installed on the workers
>> and allows pySpark pipelines to be moved over to beam more easily.
>> It would be great to get some opinions on this.
>>
>
My preference is to support 3.4+. I searched a bit on the web to understand
the usage statistics for python 3. It seems like python 3.4 has ~20% usage
and python 3.4+ has 99% (
https://semaphoreci.com/blog/2017/10/18/python-versions-used-in-commercial-projects-in-2017.html).
Based on that, I think it makes sense to support it.
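
For illustration only, a "3.4+" policy could be expressed as a version check
like the sketch below. It assumes Python 2.7 remains supported alongside
3.4 and later; the function name is hypothetical and nothing like it is
claimed to exist in Beam.

```python
import sys


def is_supported(version_info=None):
    """Return True for Python 2.7 (assumed baseline) and for Python 3.4 or newer."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor == 7
    return major == 3 and minor >= 4
```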



>
>> Another comment was made on how to avoid regression during the porting
>> progress.
>> After applying step 1 and step 2, no python 3 compatibility lint warnings
>> should remain, so it would be great if we could enforce this check for
>> every pull request on an already updated subpackage.
>> After applying step 3, all tests should run on python 3, so again it
>> would be great if we can enforce these per updated subpackage.
>> Any insights on how to best accomplish this?
>>
> So you can look at some of the recent changes to tox.ini in the git log to
> see what we’ve done so far around this. I suspect you can repeat that same
> pattern.
>

+1 updating tox.ini and adding new checks to run_mini_py3lint.sh would help
a lot to prevent regressions.



>
>> Thanks,
>> Robbe
>>
>> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>>
>>> Thank you Robbe.
>>>
>>> I reviewed the document and it looks reasonable to me. I will touch on
>>> some points that were not mentioned:
>>> - Runners exercise different code paths. Doing auto conversions and
>>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>>> DataflowRunner as well. This can be triggered from Jenkins. It will
>>> validate that we are still compatible with python 2.
>>> - Similar to above but with an eye on perf regressions.
>>>
>>> For project tracking on JIRA, please feel free to create any new issues,
>>> close stale ones, or take ownership of any open issues. All JIRAs should be
>>> assigned to the people actively working on them. If you want to track it in
>>> a separate way, you can also propose that. (For example a kanban board is
>>> used for portability effort which is fully supported in JIRA.)
>>>
>>> I will also call out to a few other people in addition to Holden who
>>> helped out or showed interest in helping with Python 3. @cclaus, @luke-zhu,
>>> @udim, @robertwb, @charlesccychen, @tvalentyn. You can include these
>>> people (and myself) for reviews and other questions that you have.
>>>
>>> Welcome again, and looking forward to your contributions.
>>>
>>> Thank you,
>>> Ahmet
>>>
>>>
>>>
>>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <ro...@ml6.eu>
>>> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> In the next month(s), me and my colleague Matthias will commit a lot of
>>>> time and effort to python 3 support for beam and we would like to discuss
>>>> the best way to go forward with this.
>>>>
>>>> We have drawn up a document [1] with a high level outline of the
>>>> proposed approach and would like to get your feedback on this.
>>>>
>>>> The main Jira issue [2] for python 3 support has been mostly inactive
>>>> for the past year. Other smaller issues have been opened, but it's hard to
>>>> track the general progress. It would be great if anyone could offer some
>>>> insights on how to best handle this project on Jira.
>>>>
>>>> @Holden Karau, you seem to have already put in a lot of effort to add
>>>> python 3 support, so it would be great to get your insights and find a way
>>>> to merge our efforts.
>>>>
>>>> Kind regards,
>>>> Robbe
>>>>
>>>> [1]
>>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>>
>>>
>>
> --
> Twitter: https://twitter.com/holdenkarau
>

Re: [PROPOSAL] Python 3 support

Posted by Holden Karau <ho...@pigscanfly.ca>.
On Tue, Mar 27, 2018 at 4:27 AM Robbe Sneyders <ro...@ml6.eu>
wrote:

> Hi Anand,
>
> Thanks for the feedback.
>
> It should be no problem to run everything on DataflowRunner as well.
> Are there any performance tests in place to check for performance
> regressions?
>
> Some questions were raised in the proposal document which I want to add to
> this conversation:
>
> The first comment was about the targeted python 3 versions. We proposed to
> target 3.6 since it is the latest version available and added 3.5 because
> 3.6 adoption seems rather low (hard to find any relevant sources on this
> though).
> If the beam community prefers 3.4, I would propose to target 3.4 only
> during porting and add 3.5 and 3.6 later so we don't slow down the porting
> progress. 3.4 has the advantage of already being installed on the workers
> and allows pySpark pipelines to be moved over to beam more easily.
> It would be great to get some opinions on this.
>
> Another comment was made on how to avoid regression during the porting
> progress.
> After applying step 1 and step 2, no python 3 compatibility lint warnings
> should remain, so it would be great if we could enforce this check for
> every pull request on an already updated subpackage.
> After applying step 3, all tests should run on python 3, so again it would
> be great if we can enforce these per updated subpackage.
> Any insights on how to best accomplish this?
>
So you can look at some of the recent changes to tox.ini in the git log to
see what we’ve done so far around this. I suspect you can repeat that same
pattern.

>
> Thanks,
> Robbe
>
> On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:
>
>> Thank you Robbe.
>>
>> I reviewed the document and it looks reasonable to me. I will touch on
>> some points that were not mentioned:
>> - Runners exercise different code paths. Doing auto conversions and
>> focusing on DirectRunner is not enough. It is worthwhile to run things on
>> DataflowRunner as well. This can be triggered from Jenkins. It will
>> validate that we are still compatible with python 2.
>> - Similar to above but with an eye on perf regressions.
>>
>> For project tracking on JIRA, please feel free to create any new issues,
>> close stale ones, or take ownership of any open issues. All JIRAs should be
>> assigned to the people actively working on them. If you want to track it in
>> a separate way, you can also propose that. (For example a kanban board is
>> used for portability effort which is fully supported in JIRA.)
>>
>> I will also call out to a few other people in addition to Holden who
>> helped out or showed interest in helping with Python 3. @cclaus, @luke-zhu,
>> @udim, @robertwb, @charlesccychen, @tvalentyn. You can include these
>> people (and myself) for reviews and other questions that you have.
>>
>> Welcome again, and looking forward to your contributions.
>>
>> Thank you,
>> Ahmet
>>
>>
>>
>> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <ro...@ml6.eu>
>> wrote:
>>
>>> Hello everyone,
>>>
>>> In the next month(s), me and my colleague Matthias will commit a lot of
>>> time and effort to python 3 support for beam and we would like to discuss
>>> the best way to go forward with this.
>>>
>>> We have drawn up a document [1] with a high level outline of the
>>> proposed approach and would like to get your feedback on this.
>>>
>>> The main Jira issue [2] for python 3 support has been mostly inactive
>>> for the past year. Other smaller issues have been opened, but it's hard to
>>> track the general progress. It would be great if anyone could offer some
>>> insights on how to best handle this project on Jira.
>>>
>>> @Holden Karau, you seem to have already put in a lot of effort to add
>>> python 3 support, so it would be great to get your insights and find a way
>>> to merge our efforts.
>>>
>>> Kind regards,
>>> Robbe
>>>
>>> [1]
>>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>>
>>> [2] https://issues.apache.org/jira/browse/BEAM-1251
>>>
>>
>
-- 
Twitter: https://twitter.com/holdenkarau

Re: [PROPOSAL] Python 3 support

Posted by Robbe Sneyders <ro...@ml6.eu>.
Hi Anand,

Thanks for the feedback.

It should be no problem to run everything on DataflowRunner as well.
Are there any performance tests in place to check for performance
regressions?

Some questions were raised in the proposal document which I want to add to
this conversation:

The first comment was about the targeted python 3 versions. We proposed to
target 3.6 since it is the latest version available and added 3.5 because
3.6 adoption seems rather low (hard to find any relevant sources on this
though).
If the beam community prefers 3.4, I would propose to target 3.4 only
during porting and add 3.5 and 3.6 later so we don't slow down the porting
progress. 3.4 has the advantage of already being installed on the workers
and allows pySpark pipelines to be moved over to beam more easily.
It would be great to get some opinions on this.

Another comment was made on how to avoid regression during the porting
progress.
After applying step 1 and step 2, no python 3 compatibility lint warnings
should remain, so it would be great if we could enforce this check for
every pull request on an already updated subpackage.
After applying step 3, all tests should run on python 3, so again it would
be great if we can enforce these per updated subpackage.
Any insights on how to best accomplish this?
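
One possible shape for such a per-subpackage gate is sketched below: given
the files touched by a pull request, select only those inside subpackages
that have already been ported, so that the lint or test step runs on them.
The subpackage list and function name are hypothetical, and how the list of
changed files is obtained is left out.

```python
# Hypothetical list of subpackages that have completed steps 1 and 2.
UPDATED_SUBPACKAGES = ("apache_beam/coders",)


def files_to_check(changed_files, updated=UPDATED_SUBPACKAGES):
    """Return the changed .py files that live inside an already updated subpackage."""
    selected = []
    for path in changed_files:
        if not path.endswith(".py"):
            continue
        # Match on a full directory prefix so that "coders_extra" is not
        # mistaken for "coders".
        if any(path.startswith(pkg + "/") for pkg in updated):
            selected.append(path)
    return selected
```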

Thanks,
Robbe

On Fri, 23 Mar 2018 at 19:59 Ahmet Altay <al...@google.com> wrote:

> Thank you Robbe.
>
> I reviewed the document and it looks reasonable to me. I will touch on
> some points that were not mentioned:
> - Runners exercise different code paths. Doing auto conversions and
> focusing on DirectRunner is not enough. It is worthwhile to run things on
> DataflowRunner as well. This can be triggered from Jenkins. It will
> validate that we are still compatible with python 2.
> - Similar to above but with an eye on perf regressions.
>
> For project tracking on JIRA, please feel free to create any new issues,
> close stale ones, or take ownership of any open issues. All JIRAs should be
> assigned to the people actively working on them. If you want to track it in
> a separate way, you can also propose that. (For example a kanban board is
> used for portability effort which is fully supported in JIRA.)
>
> I will also call out to a few other people in addition to Holden who
> helped out or showed interest in helping with Python 3. @cclaus, @luke-zhu,
> @udim, @robertwb, @charlesccychen, @tvalentyn. You can include these
> people (and myself) for reviews and other questions that you have.
>
> Welcome again, and looking forward to your contributions.
>
> Thank you,
> Ahmet
>
>
>
> On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <ro...@ml6.eu>
> wrote:
>
>> Hello everyone,
>>
>> In the next month(s), me and my colleague Matthias will commit a lot of
>> time and effort to python 3 support for beam and we would like to discuss
>> the best way to go forward with this.
>>
>> We have drawn up a document [1] with a high level outline of the proposed
>> approach and would like to get your feedback on this.
>>
>> The main Jira issue [2] for python 3 support has been mostly inactive for
>> the past year. Other smaller issues have been opened, but it's hard to
>> track the general progress. It would be great if anyone could offer some
>> insights on how to best handle this project on Jira.
>>
>> @Holden Karau, you seem to have already put in a lot of effort to add
>> python 3 support, so it would be great to get your insights and find a way
>> to merge our efforts.
>>
>> Kind regards,
>> Robbe
>>
>> [1]
>> https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
>>
>> [2] https://issues.apache.org/jira/browse/BEAM-1251

Re: [PROPOSAL] Python 3 support

Posted by Ahmet Altay <al...@google.com>.
Thank you Robbe.

I reviewed the document and it looks reasonable to me. I will touch on some
points that were not mentioned:
- Runners exercise different code paths. Doing auto conversions and focusing
on DirectRunner is not enough. It is worthwhile to run things on
DataflowRunner as well. This can be triggered from Jenkins. It will
validate that we are still compatible with python 2.
- Similar to above but with an eye on perf regressions.

For project tracking on JIRA, please feel free to create any new issues,
close stale ones, or take ownership of any open issues. All JIRAs should be
assigned to the people actively working on them. If you want to track it in
a separate way, you can also propose that. (For example, a kanban board,
which is fully supported in JIRA, is used for the portability effort.)

I will also call out to a few other people in addition to Holden who helped
out or showed interest in helping with Python 3. @cclaus, @luke-zhu, @udim,
@robertwb, @charlesccychen, @tvalentyn. You can include these people (and
myself) for reviews and other questions that you have.

Welcome again, and looking forward to your contributions.

Thank you,
Ahmet



On Fri, Mar 23, 2018 at 9:27 AM, Robbe Sneyders <ro...@ml6.eu>
wrote:

> Hello everyone,
>
> In the next month(s), me and my colleague Matthias will commit a lot of
> time and effort to python 3 support for beam and we would like to discuss
> the best way to go forward with this.
>
> We have drawn up a document [1] with a high level outline of the proposed
> approach and would like to get your feedback on this.
>
> The main Jira issue [2] for python 3 support has been mostly inactive for
> the past year. Other smaller issues have been opened, but it's hard to
> track the general progress. It would be great if anyone could offer some
> insights on how to best handle this project on Jira.
>
> @Holden Karau, you seem to have already put in a lot of effort to add
> python 3 support, so it would be great to get your insights and find a way
> to merge our efforts.
>
> Kind regards,
> Robbe
>
> [1] https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE/edit?usp=sharing
> [2] https://issues.apache.org/jira/browse/BEAM-1251