You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Eugene Kirpichov <ki...@google.com.INVALID> on 2017/01/31 01:42:59 UTC

Let's make Beam transforms comply with PTransform Style Guide

Hello,

The PTransform Style Guide is live
https://beam.apache.org/contribute/ptransform-style-guide/ - a natural next
step is to audit Beam libraries for compliance and file JIRAs for places
that need to be fixed. It'd be great to finish these cleanups before
declaring Beam stable API.

Please take a look and file JIRAs / post suggestions on this thread!

I think it'll also make a great source of easy and useful work for new
contributors.

Some things I remember off the top of my head:
- TextIO, KafkaIO use coders improperly - coders should not be used as a
general-purpose byte parsing mechanism.
- HadoopFileSource is not packaged as a PTransform
- Some connectors, e.g. KafkaIO, should use AutoValue for their parameter
builders, but don't
- A few connectors improperly use
- Some transforms expose their transform type as "Something.Bound" and
"Something.Unbound", e.g. TextIO.Read.Bound - such names are banned

I filed an umbrella JIRA https://issues.apache.org/jira/browse/BEAM-1353 about
making existing Beam transforms comply with the guide - let's crowdsource
this!

Thanks.

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by tarush grover <ta...@gmail.com>.
No worries.

Regards,
Tarush

On Fri, 21 Apr 2017 at 11:20 AM, Eugene Kirpichov
<ki...@google.com.invalid> wrote:

> Guys, apologies, but I already have Kinesis in review, and Pubsub ready for
> review. I'm afraid there's not much left for volunteers to take on right
> now.
>
> On Thu, Apr 20, 2017 at 10:47 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
> > Cool, I gonna take a look on PubSub later today (I would like to finish
> > CassandraIO, HDFS refactoring and Spark 2 support first ;)).
> >
> > Regards
> > JB
> >
> > On 04/21/2017 06:03 AM, tarush grover wrote:
> > > Hi,
> > >
> > > I can take kinesis one.
> > >
> > > Regards,
> > > Tarush
> > >
> > >
> > > On Thu, 20 Apr 2017 at 11:18 AM, Jean-Baptiste Onofré <jb@nanthrax.net
> >
> > > wrote:
> > >
> > >> Gonna take a look on the pending IOs.
> > >>
> > >> Thanks !
> > >> Regards
> > >> JB
> > >>
> > >> On 04/19/2017 10:05 PM, Eugene Kirpichov wrote:
> > >>> A few more knocked down
> > >>> - I finished Map/FlatMap, XML, TFRecordIO
> > >>> - I'm working on CountingInput; it's nontrivial.
> > >>> - Reuven is working on Text/Avro
> > >>> - @peay is working on removing coders from KafkaIO
> > >>>
> > >>> Kinesis and PubsubIO remain; of these, Kinesis is the easier one.
> > >>>
> > >>> Any takers?
> > >>>
> > >>> On Fri, Apr 7, 2017 at 10:47 PM Jean-Baptiste Onofré <
> jb@nanthrax.net>
> > >>> wrote:
> > >>>
> > >>>> Hi Eugene,
> > >>>>
> > >>>> thanks for the update. I'm volunteer to tackle some those IOs (and
> > make
> > >>>> them
> > >>>> conform with PTransform style guide). I'm pretty sure other people
> > will
> > >>>> jump on ;)
> > >>>>
> > >>>> Regards
> > >>>> JB
> > >>>>
> > >>>> On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
> > >>>>> Hey all,
> > >>>>>
> > >>>>> More progress has been made and we're nearing completion. ParDo,
> > >>>> BigQueryIO
> > >>>>> and Window are fixed; Map/FlatMapElements are in review.
> > >>>>>
> > >>>>> The remaining unclaimed ones are all IOs of some form, and here's a
> > >> list.
> > >>>>> I've marked them all as "starter" in JIRA.
> > >>>>>
> > >>>>> XML - https://issues.apache.org/jira/browse/BEAM-1914
> > >>>>> TFRecordIO (Tensorflow) -
> > >>>> https://issues.apache.org/jira/browse/BEAM-1913
> > >>>>> KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
> > >>>>> PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
> > >>>>> CountingInput - https://issues.apache.org/jira/browse/BEAM-1414
> > >>>>>
> > >>>>> https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO,
> > is
> > >> a
> > >>>>> good model to follow when taking these on, as well as e.g.
> > >>>>> https://github.com/apache/beam/pull/1927 (TextIO)
> > >>>>>
> > >>>>> These are all actually easy to fix, but need volunteers (I do not
> > have
> > >>>> time
> > >>>>> to fix all of these myself, but happy to be a reviewer - @jkff).
> > >>>>> Let's finish this up in time for the first Beam stable release, so
> > >> Beam's
> > >>>>> stable API surface is consistent and polished!
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >> --
> > >> Jean-Baptiste Onofré
> > >> jbonofre@apache.org
> > >> http://blog.nanthrax.net
> > >> Talend - http://www.talend.com
> > >>
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbonofre@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Davor Bonaci <da...@apache.org>.
Thanks Eugene -- this is remarkable.

On Tue, May 2, 2017 at 11:54 PM, Eugene Kirpichov <
kirpichov@google.com.invalid> wrote:

> Hey all,
>
> The effort is complete: all transforms have been brought in accordance with
> the style guide and the JIRAs are closed!
>
> In nearly all cases the fixes introduced small but backward-incompatible
> changes, but always with a simple migration path, and I believe the Beam
> API surface is overall much better for it.
>
> For example, there are no more IOs that use Coder's as their primary way of
> interpreting binary data; no more ugly Bound/Unbound classes; no more IOs
> exposing their Source or Sink API directly (instead of packaging as
> PTransform); the code is cleaner and shorter (due to AutoValue and a more
> principled distinction between factory methods and builder methods) and
> there are a lot more canonical examples of how to write transforms for
> future authors, now that every transform shipped with the SDK is a
> canonical example :)
>
> The only thing remaining is adjusting the website documentation, release
> notes, etc. - I'll work on this tomorrow.
>
>
> On Thu, Apr 20, 2017 at 10:55 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
> > No problem ;)
> >
> > Happy to review if needed ;)
> >
> > Regards
> > JB
> >
> > On 04/21/2017 07:50 AM, Eugene Kirpichov wrote:
> > > Guys, apologies, but I already have Kinesis in review, and Pubsub ready
> > for
> > > review. I'm afraid there's not much left for volunteers to take on
> right
> > > now.
> > >
> > > On Thu, Apr 20, 2017 at 10:47 PM Jean-Baptiste Onofré <jb@nanthrax.net
> >
> > > wrote:
> > >
> > >> Cool, I gonna take a look on PubSub later today (I would like to
> finish
> > >> CassandraIO, HDFS refactoring and Spark 2 support first ;)).
> > >>
> > >> Regards
> > >> JB
> > >>
> > >> On 04/21/2017 06:03 AM, tarush grover wrote:
> > >>> Hi,
> > >>>
> > >>> I can take kinesis one.
> > >>>
> > >>> Regards,
> > >>> Tarush
> > >>>
> > >>>
> > >>> On Thu, 20 Apr 2017 at 11:18 AM, Jean-Baptiste Onofré <
> jb@nanthrax.net
> > >
> > >>> wrote:
> > >>>
> > >>>> Gonna take a look on the pending IOs.
> > >>>>
> > >>>> Thanks !
> > >>>> Regards
> > >>>> JB
> > >>>>
> > >>>> On 04/19/2017 10:05 PM, Eugene Kirpichov wrote:
> > >>>>> A few more knocked down
> > >>>>> - I finished Map/FlatMap, XML, TFRecordIO
> > >>>>> - I'm working on CountingInput; it's nontrivial.
> > >>>>> - Reuven is working on Text/Avro
> > >>>>> - @peay is working on removing coders from KafkaIO
> > >>>>>
> > >>>>> Kinesis and PubsubIO remain; of these, Kinesis is the easier one.
> > >>>>>
> > >>>>> Any takers?
> > >>>>>
> > >>>>> On Fri, Apr 7, 2017 at 10:47 PM Jean-Baptiste Onofré <
> > jb@nanthrax.net>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi Eugene,
> > >>>>>>
> > >>>>>> thanks for the update. I'm volunteer to tackle some those IOs (and
> > >> make
> > >>>>>> them
> > >>>>>> conform with PTransform style guide). I'm pretty sure other people
> > >> will
> > >>>>>> jump on ;)
> > >>>>>>
> > >>>>>> Regards
> > >>>>>> JB
> > >>>>>>
> > >>>>>> On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
> > >>>>>>> Hey all,
> > >>>>>>>
> > >>>>>>> More progress has been made and we're nearing completion. ParDo,
> > >>>>>> BigQueryIO
> > >>>>>>> and Window are fixed; Map/FlatMapElements are in review.
> > >>>>>>>
> > >>>>>>> The remaining unclaimed ones are all IOs of some form, and
> here's a
> > >>>> list.
> > >>>>>>> I've marked them all as "starter" in JIRA.
> > >>>>>>>
> > >>>>>>> XML - https://issues.apache.org/jira/browse/BEAM-1914
> > >>>>>>> TFRecordIO (Tensorflow) -
> > >>>>>> https://issues.apache.org/jira/browse/BEAM-1913
> > >>>>>>> KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
> > >>>>>>> PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
> > >>>>>>> CountingInput - https://issues.apache.org/jira/browse/BEAM-1414
> > >>>>>>>
> > >>>>>>> https://github.com/apache/beam/pull/2149 , which fixes
> BigQueryIO,
> > >> is
> > >>>> a
> > >>>>>>> good model to follow when taking these on, as well as e.g.
> > >>>>>>> https://github.com/apache/beam/pull/1927 (TextIO)
> > >>>>>>>
> > >>>>>>> These are all actually easy to fix, but need volunteers (I do not
> > >> have
> > >>>>>> time
> > >>>>>>> to fix all of these myself, but happy to be a reviewer - @jkff).
> > >>>>>>> Let's finish this up in time for the first Beam stable release,
> so
> > >>>> Beam's
> > >>>>>>> stable API surface is consistent and polished!
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>> --
> > >>>> Jean-Baptiste Onofré
> > >>>> jbonofre@apache.org
> > >>>> http://blog.nanthrax.net
> > >>>> Talend - http://www.talend.com
> > >>>>
> > >>>
> > >>
> > >> --
> > >> Jean-Baptiste Onofré
> > >> jbonofre@apache.org
> > >> http://blog.nanthrax.net
> > >> Talend - http://www.talend.com
> > >>
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbonofre@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
Hey all,

The effort is complete: all transforms have been brought in accordance with
the style guide and the JIRAs are closed!

In nearly all cases the fixes introduced small but backward-incompatible
changes, but always with a simple migration path, and I believe the Beam
API surface is overall much better for it.

For example, there are no more IOs that use Coder's as their primary way of
interpreting binary data; no more ugly Bound/Unbound classes; no more IOs
exposing their Source or Sink API directly (instead of packaging as
PTransform); the code is cleaner and shorter (due to AutoValue and a more
principled distinction between factory methods and builder methods) and
there are a lot more canonical examples of how to write transforms for
future authors, now that every transform shipped with the SDK is a
canonical example :)

The only thing remaining is adjusting the website documentation, release
notes, etc. - I'll work on this tomorrow.


On Thu, Apr 20, 2017 at 10:55 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> No problem ;)
>
> Happy to review if needed ;)
>
> Regards
> JB
>
> On 04/21/2017 07:50 AM, Eugene Kirpichov wrote:
> > Guys, apologies, but I already have Kinesis in review, and Pubsub ready
> for
> > review. I'm afraid there's not much left for volunteers to take on right
> > now.
> >
> > On Thu, Apr 20, 2017 at 10:47 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
> > wrote:
> >
> >> Cool, I gonna take a look on PubSub later today (I would like to finish
> >> CassandraIO, HDFS refactoring and Spark 2 support first ;)).
> >>
> >> Regards
> >> JB
> >>
> >> On 04/21/2017 06:03 AM, tarush grover wrote:
> >>> Hi,
> >>>
> >>> I can take kinesis one.
> >>>
> >>> Regards,
> >>> Tarush
> >>>
> >>>
> >>> On Thu, 20 Apr 2017 at 11:18 AM, Jean-Baptiste Onofré <jb@nanthrax.net
> >
> >>> wrote:
> >>>
> >>>> Gonna take a look on the pending IOs.
> >>>>
> >>>> Thanks !
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 04/19/2017 10:05 PM, Eugene Kirpichov wrote:
> >>>>> A few more knocked down
> >>>>> - I finished Map/FlatMap, XML, TFRecordIO
> >>>>> - I'm working on CountingInput; it's nontrivial.
> >>>>> - Reuven is working on Text/Avro
> >>>>> - @peay is working on removing coders from KafkaIO
> >>>>>
> >>>>> Kinesis and PubsubIO remain; of these, Kinesis is the easier one.
> >>>>>
> >>>>> Any takers?
> >>>>>
> >>>>> On Fri, Apr 7, 2017 at 10:47 PM Jean-Baptiste Onofré <
> jb@nanthrax.net>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Eugene,
> >>>>>>
> >>>>>> thanks for the update. I'm volunteer to tackle some those IOs (and
> >> make
> >>>>>> them
> >>>>>> conform with PTransform style guide). I'm pretty sure other people
> >> will
> >>>>>> jump on ;)
> >>>>>>
> >>>>>> Regards
> >>>>>> JB
> >>>>>>
> >>>>>> On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
> >>>>>>> Hey all,
> >>>>>>>
> >>>>>>> More progress has been made and we're nearing completion. ParDo,
> >>>>>> BigQueryIO
> >>>>>>> and Window are fixed; Map/FlatMapElements are in review.
> >>>>>>>
> >>>>>>> The remaining unclaimed ones are all IOs of some form, and here's a
> >>>> list.
> >>>>>>> I've marked them all as "starter" in JIRA.
> >>>>>>>
> >>>>>>> XML - https://issues.apache.org/jira/browse/BEAM-1914
> >>>>>>> TFRecordIO (Tensorflow) -
> >>>>>> https://issues.apache.org/jira/browse/BEAM-1913
> >>>>>>> KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
> >>>>>>> PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
> >>>>>>> CountingInput - https://issues.apache.org/jira/browse/BEAM-1414
> >>>>>>>
> >>>>>>> https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO,
> >> is
> >>>> a
> >>>>>>> good model to follow when taking these on, as well as e.g.
> >>>>>>> https://github.com/apache/beam/pull/1927 (TextIO)
> >>>>>>>
> >>>>>>> These are all actually easy to fix, but need volunteers (I do not
> >> have
> >>>>>> time
> >>>>>>> to fix all of these myself, but happy to be a reviewer - @jkff).
> >>>>>>> Let's finish this up in time for the first Beam stable release, so
> >>>> Beam's
> >>>>>>> stable API surface is consistent and polished!
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> Jean-Baptiste Onofré
> >>>> jbonofre@apache.org
> >>>> http://blog.nanthrax.net
> >>>> Talend - http://www.talend.com
> >>>>
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbonofre@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
No problem ;)

Happy to review if needed ;)

Regards
JB

On 04/21/2017 07:50 AM, Eugene Kirpichov wrote:
> Guys, apologies, but I already have Kinesis in review, and Pubsub ready for
> review. I'm afraid there's not much left for volunteers to take on right
> now.
>
> On Thu, Apr 20, 2017 at 10:47 PM Jean-Baptiste Onofr� <jb...@nanthrax.net>
> wrote:
>
>> Cool, I gonna take a look on PubSub later today (I would like to finish
>> CassandraIO, HDFS refactoring and Spark 2 support first ;)).
>>
>> Regards
>> JB
>>
>> On 04/21/2017 06:03 AM, tarush grover wrote:
>>> Hi,
>>>
>>> I can take kinesis one.
>>>
>>> Regards,
>>> Tarush
>>>
>>>
>>> On Thu, 20 Apr 2017 at 11:18 AM, Jean-Baptiste Onofr� <jb...@nanthrax.net>
>>> wrote:
>>>
>>>> Gonna take a look on the pending IOs.
>>>>
>>>> Thanks !
>>>> Regards
>>>> JB
>>>>
>>>> On 04/19/2017 10:05 PM, Eugene Kirpichov wrote:
>>>>> A few more knocked down
>>>>> - I finished Map/FlatMap, XML, TFRecordIO
>>>>> - I'm working on CountingInput; it's nontrivial.
>>>>> - Reuven is working on Text/Avro
>>>>> - @peay is working on removing coders from KafkaIO
>>>>>
>>>>> Kinesis and PubsubIO remain; of these, Kinesis is the easier one.
>>>>>
>>>>> Any takers?
>>>>>
>>>>> On Fri, Apr 7, 2017 at 10:47 PM Jean-Baptiste Onofr� <jb...@nanthrax.net>
>>>>> wrote:
>>>>>
>>>>>> Hi Eugene,
>>>>>>
>>>>>> thanks for the update. I'm volunteer to tackle some those IOs (and
>> make
>>>>>> them
>>>>>> conform with PTransform style guide). I'm pretty sure other people
>> will
>>>>>> jump on ;)
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
>>>>>>> Hey all,
>>>>>>>
>>>>>>> More progress has been made and we're nearing completion. ParDo,
>>>>>> BigQueryIO
>>>>>>> and Window are fixed; Map/FlatMapElements are in review.
>>>>>>>
>>>>>>> The remaining unclaimed ones are all IOs of some form, and here's a
>>>> list.
>>>>>>> I've marked them all as "starter" in JIRA.
>>>>>>>
>>>>>>> XML - https://issues.apache.org/jira/browse/BEAM-1914
>>>>>>> TFRecordIO (Tensorflow) -
>>>>>> https://issues.apache.org/jira/browse/BEAM-1913
>>>>>>> KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
>>>>>>> PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
>>>>>>> CountingInput - https://issues.apache.org/jira/browse/BEAM-1414
>>>>>>>
>>>>>>> https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO,
>> is
>>>> a
>>>>>>> good model to follow when taking these on, as well as e.g.
>>>>>>> https://github.com/apache/beam/pull/1927 (TextIO)
>>>>>>>
>>>>>>> These are all actually easy to fix, but need volunteers (I do not
>> have
>>>>>> time
>>>>>>> to fix all of these myself, but happy to be a reviewer - @jkff).
>>>>>>> Let's finish this up in time for the first Beam stable release, so
>>>> Beam's
>>>>>>> stable API surface is consistent and polished!
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Jean-Baptiste Onofr�
>>>> jbonofre@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>
>>
>> --
>> Jean-Baptiste Onofr�
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>

-- 
Jean-Baptiste Onofr�
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
Guys, apologies, but I already have Kinesis in review, and Pubsub ready for
review. I'm afraid there's not much left for volunteers to take on right
now.

On Thu, Apr 20, 2017 at 10:47 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Cool, I gonna take a look on PubSub later today (I would like to finish
> CassandraIO, HDFS refactoring and Spark 2 support first ;)).
>
> Regards
> JB
>
> On 04/21/2017 06:03 AM, tarush grover wrote:
> > Hi,
> >
> > I can take kinesis one.
> >
> > Regards,
> > Tarush
> >
> >
> > On Thu, 20 Apr 2017 at 11:18 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> > wrote:
> >
> >> Gonna take a look on the pending IOs.
> >>
> >> Thanks !
> >> Regards
> >> JB
> >>
> >> On 04/19/2017 10:05 PM, Eugene Kirpichov wrote:
> >>> A few more knocked down
> >>> - I finished Map/FlatMap, XML, TFRecordIO
> >>> - I'm working on CountingInput; it's nontrivial.
> >>> - Reuven is working on Text/Avro
> >>> - @peay is working on removing coders from KafkaIO
> >>>
> >>> Kinesis and PubsubIO remain; of these, Kinesis is the easier one.
> >>>
> >>> Any takers?
> >>>
> >>> On Fri, Apr 7, 2017 at 10:47 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
> >>> wrote:
> >>>
> >>>> Hi Eugene,
> >>>>
> >>>> thanks for the update. I'm volunteer to tackle some those IOs (and
> make
> >>>> them
> >>>> conform with PTransform style guide). I'm pretty sure other people
> will
> >>>> jump on ;)
> >>>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
> >>>>> Hey all,
> >>>>>
> >>>>> More progress has been made and we're nearing completion. ParDo,
> >>>> BigQueryIO
> >>>>> and Window are fixed; Map/FlatMapElements are in review.
> >>>>>
> >>>>> The remaining unclaimed ones are all IOs of some form, and here's a
> >> list.
> >>>>> I've marked them all as "starter" in JIRA.
> >>>>>
> >>>>> XML - https://issues.apache.org/jira/browse/BEAM-1914
> >>>>> TFRecordIO (Tensorflow) -
> >>>> https://issues.apache.org/jira/browse/BEAM-1913
> >>>>> KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
> >>>>> PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
> >>>>> CountingInput - https://issues.apache.org/jira/browse/BEAM-1414
> >>>>>
> >>>>> https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO,
> is
> >> a
> >>>>> good model to follow when taking these on, as well as e.g.
> >>>>> https://github.com/apache/beam/pull/1927 (TextIO)
> >>>>>
> >>>>> These are all actually easy to fix, but need volunteers (I do not
> have
> >>>> time
> >>>>> to fix all of these myself, but happy to be a reviewer - @jkff).
> >>>>> Let's finish this up in time for the first Beam stable release, so
> >> Beam's
> >>>>> stable API surface is consistent and polished!
> >>>>>
> >>>>
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> jbonofre@apache.org
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Cool, I gonna take a look on PubSub later today (I would like to finish 
CassandraIO, HDFS refactoring and Spark 2 support first ;)).

Regards
JB

On 04/21/2017 06:03 AM, tarush grover wrote:
> Hi,
>
> I can take kinesis one.
>
> Regards,
> Tarush
>
>
> On Thu, 20 Apr 2017 at 11:18 AM, Jean-Baptiste Onofr� <jb...@nanthrax.net>
> wrote:
>
>> Gonna take a look on the pending IOs.
>>
>> Thanks !
>> Regards
>> JB
>>
>> On 04/19/2017 10:05 PM, Eugene Kirpichov wrote:
>>> A few more knocked down
>>> - I finished Map/FlatMap, XML, TFRecordIO
>>> - I'm working on CountingInput; it's nontrivial.
>>> - Reuven is working on Text/Avro
>>> - @peay is working on removing coders from KafkaIO
>>>
>>> Kinesis and PubsubIO remain; of these, Kinesis is the easier one.
>>>
>>> Any takers?
>>>
>>> On Fri, Apr 7, 2017 at 10:47 PM Jean-Baptiste Onofr� <jb...@nanthrax.net>
>>> wrote:
>>>
>>>> Hi Eugene,
>>>>
>>>> thanks for the update. I'm volunteer to tackle some those IOs (and make
>>>> them
>>>> conform with PTransform style guide). I'm pretty sure other people will
>>>> jump on ;)
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
>>>>> Hey all,
>>>>>
>>>>> More progress has been made and we're nearing completion. ParDo,
>>>> BigQueryIO
>>>>> and Window are fixed; Map/FlatMapElements are in review.
>>>>>
>>>>> The remaining unclaimed ones are all IOs of some form, and here's a
>> list.
>>>>> I've marked them all as "starter" in JIRA.
>>>>>
>>>>> XML - https://issues.apache.org/jira/browse/BEAM-1914
>>>>> TFRecordIO (Tensorflow) -
>>>> https://issues.apache.org/jira/browse/BEAM-1913
>>>>> KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
>>>>> PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
>>>>> CountingInput - https://issues.apache.org/jira/browse/BEAM-1414
>>>>>
>>>>> https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO, is
>> a
>>>>> good model to follow when taking these on, as well as e.g.
>>>>> https://github.com/apache/beam/pull/1927 (TextIO)
>>>>>
>>>>> These are all actually easy to fix, but need volunteers (I do not have
>>>> time
>>>>> to fix all of these myself, but happy to be a reviewer - @jkff).
>>>>> Let's finish this up in time for the first Beam stable release, so
>> Beam's
>>>>> stable API surface is consistent and polished!
>>>>>
>>>>
>>>
>>
>> --
>> Jean-Baptiste Onofr�
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>

-- 
Jean-Baptiste Onofr�
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by tarush grover <ta...@gmail.com>.
Hi,

I can take kinesis one.

Regards,
Tarush


On Thu, 20 Apr 2017 at 11:18 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Gonna take a look on the pending IOs.
>
> Thanks !
> Regards
> JB
>
> On 04/19/2017 10:05 PM, Eugene Kirpichov wrote:
> > A few more knocked down
> > - I finished Map/FlatMap, XML, TFRecordIO
> > - I'm working on CountingInput; it's nontrivial.
> > - Reuven is working on Text/Avro
> > - @peay is working on removing coders from KafkaIO
> >
> > Kinesis and PubsubIO remain; of these, Kinesis is the easier one.
> >
> > Any takers?
> >
> > On Fri, Apr 7, 2017 at 10:47 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
> > wrote:
> >
> >> Hi Eugene,
> >>
> >> thanks for the update. I'm volunteer to tackle some those IOs (and make
> >> them
> >> conform with PTransform style guide). I'm pretty sure other people will
> >> jump on ;)
> >>
> >> Regards
> >> JB
> >>
> >> On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
> >>> Hey all,
> >>>
> >>> More progress has been made and we're nearing completion. ParDo,
> >> BigQueryIO
> >>> and Window are fixed; Map/FlatMapElements are in review.
> >>>
> >>> The remaining unclaimed ones are all IOs of some form, and here's a
> list.
> >>> I've marked them all as "starter" in JIRA.
> >>>
> >>> XML - https://issues.apache.org/jira/browse/BEAM-1914
> >>> TFRecordIO (Tensorflow) -
> >> https://issues.apache.org/jira/browse/BEAM-1913
> >>> KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
> >>> PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
> >>> CountingInput - https://issues.apache.org/jira/browse/BEAM-1414
> >>>
> >>> https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO, is
> a
> >>> good model to follow when taking these on, as well as e.g.
> >>> https://github.com/apache/beam/pull/1927 (TextIO)
> >>>
> >>> These are all actually easy to fix, but need volunteers (I do not have
> >> time
> >>> to fix all of these myself, but happy to be a reviewer - @jkff).
> >>> Let's finish this up in time for the first Beam stable release, so
> Beam's
> >>> stable API surface is consistent and polished!
> >>>
> >>
> >
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Gonna take a look on the pending IOs.

Thanks !
Regards
JB

On 04/19/2017 10:05 PM, Eugene Kirpichov wrote:
> A few more knocked down
> - I finished Map/FlatMap, XML, TFRecordIO
> - I'm working on CountingInput; it's nontrivial.
> - Reuven is working on Text/Avro
> - @peay is working on removing coders from KafkaIO
>
> Kinesis and PubsubIO remain; of these, Kinesis is the easier one.
>
> Any takers?
>
> On Fri, Apr 7, 2017 at 10:47 PM Jean-Baptiste Onofr� <jb...@nanthrax.net>
> wrote:
>
>> Hi Eugene,
>>
>> thanks for the update. I'm volunteer to tackle some those IOs (and make
>> them
>> conform with PTransform style guide). I'm pretty sure other people will
>> jump on ;)
>>
>> Regards
>> JB
>>
>> On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
>>> Hey all,
>>>
>>> More progress has been made and we're nearing completion. ParDo,
>> BigQueryIO
>>> and Window are fixed; Map/FlatMapElements are in review.
>>>
>>> The remaining unclaimed ones are all IOs of some form, and here's a list.
>>> I've marked them all as "starter" in JIRA.
>>>
>>> XML - https://issues.apache.org/jira/browse/BEAM-1914
>>> TFRecordIO (Tensorflow) -
>> https://issues.apache.org/jira/browse/BEAM-1913
>>> KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
>>> PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
>>> CountingInput - https://issues.apache.org/jira/browse/BEAM-1414
>>>
>>> https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO, is a
>>> good model to follow when taking these on, as well as e.g.
>>> https://github.com/apache/beam/pull/1927 (TextIO)
>>>
>>> These are all actually easy to fix, but need volunteers (I do not have
>> time
>>> to fix all of these myself, but happy to be a reviewer - @jkff).
>>> Let's finish this up in time for the first Beam stable release, so Beam's
>>> stable API surface is consistent and polished!
>>>
>>
>

-- 
Jean-Baptiste Onofr�
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
A few more knocked down
- I finished Map/FlatMap, XML, TFRecordIO
- I'm working on CountingInput; it's nontrivial.
- Reuven is working on Text/Avro
- @peay is working on removing coders from KafkaIO

Kinesis and PubsubIO remain; of these, Kinesis is the easier one.

Any takers?

On Fri, Apr 7, 2017 at 10:47 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Eugene,
>
> thanks for the update. I'm volunteer to tackle some those IOs (and make
> them
> conform with PTransform style guide). I'm pretty sure other people will
> jump on ;)
>
> Regards
> JB
>
> On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
> > Hey all,
> >
> > More progress has been made and we're nearing completion. ParDo,
> BigQueryIO
> > and Window are fixed; Map/FlatMapElements are in review.
> >
> > The remaining unclaimed ones are all IOs of some form, and here's a list.
> > I've marked them all as "starter" in JIRA.
> >
> > XML - https://issues.apache.org/jira/browse/BEAM-1914
> > TFRecordIO (Tensorflow) -
> https://issues.apache.org/jira/browse/BEAM-1913
> > KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
> > PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
> > CountingInput - https://issues.apache.org/jira/browse/BEAM-1414
> >
> > https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO, is a
> > good model to follow when taking these on, as well as e.g.
> > https://github.com/apache/beam/pull/1927 (TextIO)
> >
> > These are all actually easy to fix, but need volunteers (I do not have
> time
> > to fix all of these myself, but happy to be a reviewer - @jkff).
> > Let's finish this up in time for the first Beam stable release, so Beam's
> > stable API surface is consistent and polished!
> >
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Eugene,

thanks for the update. I'm volunteer to tackle some those IOs (and make them 
conform with PTransform style guide). I'm pretty sure other people will jump on ;)

Regards
JB

On 04/08/2017 12:20 AM, Eugene Kirpichov wrote:
> Hey all,
>
> More progress has been made and we're nearing completion. ParDo, BigQueryIO
> and Window are fixed; Map/FlatMapElements are in review.
>
> The remaining unclaimed ones are all IOs of some form, and here's a list.
> I've marked them all as "starter" in JIRA.
>
> XML - https://issues.apache.org/jira/browse/BEAM-1914
> TFRecordIO (Tensorflow) - https://issues.apache.org/jira/browse/BEAM-1913
> KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
> PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
> CountingInput - https://issues.apache.org/jira/browse/BEAM-1414
>
> https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO, is a
> good model to follow when taking these on, as well as e.g.
> https://github.com/apache/beam/pull/1927 (TextIO)
>
> These are all actually easy to fix, but need volunteers (I do not have time
> to fix all of these myself, but happy to be a reviewer - @jkff).
> Let's finish this up in time for the first Beam stable release, so Beam's
> stable API surface is consistent and polished!
>
> On Fri, Mar 3, 2017 at 12:00 PM Eugene Kirpichov <ki...@google.com>
> wrote:
>
>> It sounds like a great idea in general, though there are actually not that
>> many issues remaining - I fixed a bunch, and have PRs out for two more
>> (ParDo and BigQueryIO).
>> The remaining ones are:
>>
>> [image: Bug - A problem which impairs or prevents the functions of the
>> product.] BEAM-1355 <https://issues.apache.org/jira/browse/BEAM-1355> HDFS
>> IO should comply with PTransform style guide
>>
>>    - [image: Major - Major loss of function.]
>>    - OPEN
>>
>> [image: Bug - A problem which impairs or prevents the functions of the
>> product.] BEAM-1402 <https://issues.apache.org/jira/browse/BEAM-1402> Make
>> TextIO and AvroIO use best-practice types.
>>
>>    - [image: Major - Major loss of function.]
>>    - OPEN
>>
>> [image: Bug - A problem which impairs or prevents the functions of the
>> product.] BEAM-1414 <https://issues.apache.org/jira/browse/BEAM-1414> CountingInput
>> should comply with PTransform style guide
>>
>>    - [image: Major - Major loss of function.]
>>    - OPEN
>>
>> [image: Bug - A problem which impairs or prevents the functions of the
>> product.] BEAM-1415 <https://issues.apache.org/jira/browse/BEAM-1415> PubsubIO
>> should comply with PTransfrom style guide
>>
>>    - [image: Major - Major loss of function.]
>>    - OPEN
>>
>> [image: Bug - A problem which impairs or prevents the functions of the
>> product.] BEAM-1418 <https://issues.apache.org/jira/browse/BEAM-1418> MapElements
>> and FlatMapElements should comply with PTransform style guide
>>
>>    - [image: Major - Major loss of function.]
>>    - OPEN
>>
>> [image: Bug - A problem which impairs or prevents the functions of the
>> product.] BEAM-1425 <https://issues.apache.org/jira/browse/BEAM-1425> Window
>> should comply with PTransform style guide
>>
>>    - [image: Major - Major loss of function.]
>>    - OPEN
>>
>> [image: Bug - A problem which impairs or prevents the functions of the
>> product.] BEAM-1428 <https://issues.apache.org/jira/browse/BEAM-1428> KinesisIO
>> should comply with PTransform style guide
>>
>>    - [image: Major - Major loss of function.]
>>    - OPEN
>>
>>
>> HDFS is pending the other mailing list thread.
>> For TextIO/AvroIO there's a pending PR by Reuven.
>> Window and Map/FlatMapElements could use a brief discussion on the mailing
>> list as to what the proper API should be.
>> KinesisIO is, I think, due for a rewrite.
>>
>> However, CountingInput and PubsubIO I think are great targets for people
>> to pick up. Any volunteers?
>>
>>
>> On Fri, Mar 3, 2017 at 11:30 AM Robert Bradshaw
>> <ro...@google.com.invalid> wrote:
>>
>> Here's a crazy idea: what if we had a virtual fixit/hackathon to knock
>> these out (similar to the virtual meet-up, but with an agenda)? I find
>> communal hacking sessions towards a common goal are a good way to get to
>> know each other and get a lot done. Would there be any interest in this?
>>
>> On Wed, Mar 1, 2017 at 11:30 AM, Eugene Kirpichov <
>> kirpichov@google.com.invalid> wrote:
>>
>>> Hey all,
>>>
>>> First couple rounds of fixes are in. Thanks Aviem Zur for contributing
>>> TextIO fixes and Dan Halperin for reviewing! One more fix by Reuven in
>>> progress (https://github.com/apache/beam/pull/1927).
>>>
>>> Follow https://issues.apache.org/jira/browse/BEAM-1353 and sub-issues
>> for
>>> the status.
>>> Many of the changes are backwards-incompatible (though with only minor
>>> changes in pipelines required). I'll make a separate announcement about
>>> that on the users list now.
>>>
>>> Call for help with the other sub-issues still stands! They are pretty
>>> simple work items but pretty important since this is a blocker for
>>> declaring stable BEAM API.
>>>
>>> On Wed, Feb 8, 2017 at 3:51 AM Jean-Baptiste Onofr� <jb...@nanthrax.net>
>>> wrote:
>>>
>>> Thanks Eugene.
>>>
>>> I will tackle some Jira when back next week.
>>>
>>> Regards
>>> JB
>>>
>>> On Feb 7, 2017, 18:16, at 18:16, Eugene Kirpichov
>>> <ki...@google.com.INVALID> wrote:
>>>> Hey all,
>>>>
>>>> I bit the bullet and audited all PTransform classes in Beam Java SDK
>>>> and
>>>> filed JIRA issues for all violations I could find.
>>>> I linked all them to the master JIRA issue
>>>> https://issues.apache.org/jira/browse/BEAM-1353
>>>>
>>>> In general, all of these should be fixed before declaring Beam stable
>>>> API.
>>>> Would appreciate if some senior folks looked at the issues and
>>>> confirmed
>>>> that my suggested changes make sense.
>>>>
>>>> PRs very welcome :) (though I'll be gone for the next few weeks so I
>>>> can't
>>>> review right now)
>>>> Many of these are very easy to fix (a few lines of code); some require
>>>> a
>>>> little more code, but as far as I can tell all of them are mechanical.
>>>>
>>>>
>>>> On Tue, Jan 31, 2017 at 4:10 PM Eugene Kirpichov <ki...@google.com>
>>>> wrote:
>>>>
>>>>> On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin
>>>> <dh...@google.com.invalid>
>>>>> wrote:
>>>>>
>>>>> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
>>>>> kirpichov@google.com.invalid> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> The PTransform Style Guide is live
>>>>>> https://beam.apache.org/contribute/ptransform-style-guide/ - a
>>>> natural
>>>>>> next
>>>>>> step is to audit Beam libraries for compliance and file JIRAs for
>>>> places
>>>>>> that need to be fixed. It'd be great to finish these cleanups
>>>> before
>>>>>> declaring Beam stable API.
>>>>>>
>>>>>> Please take a look and file JIRAs / post suggestions on this
>>>> thread!
>>>>>>
>>>>>> I think it'll also make a great source of easy and useful work for
>>>> new
>>>>>> contributors.
>>>>>>
>>>>>> Some things I remember off the top of my head:
>>>>>> - TextIO, KafkaIO use coders improperly - coders should not be used
>>>> as a
>>>>>> general-purpose byte parsing mechanism.
>>>>>>
>>>>>
>>>>> Can you say more about Kafka? Kafka actually exports byte[] by
>>>> default,
>>>>> whereas Text files are String by default. So it does not seem nearly
>>>> as
>>>>> egregious for Kafka as it is for Text.
>>>>>
>>>>> Agreed that KafkaIO is less egregious, but it still has methods
>>>>> withKeyCoder and withValueCoder - these should be replaced with
>>>> something
>>>>> that doesn't take Coder.
>>>>>
>>>>>
>>>>>
>>>>> - HadoopFileSource is not packaged as a PTransform
>>>>>> - Some connectors, e.g. KafkaIO, should use AutoValue for their
>>>> parameter
>>>>>> builders, but don't
>>>>>>
>>>>>
>>>>> Isn't AutoValue entirely an internal implementation detail that is
>>>> not
>>>>> exposed(*) to users? I think this is irrelevant to a stable API.
>>>>>
>>>>> Agreed - doesn't block stable API, but still a good thing to do
>>>> because it
>>>>> makes the code cleaner (for KafkaIO there's a long-standing PR that
>>>> was
>>>>> blocked on ratifying the style guide
>>>>> https://github.com/apache/beam/pull/1048)
>>>>>
>>>>>
>>>>>
>>>>> (*) except that it makes transforms not able to be final, which is a
>>>>> regression.
>>>>>
>>>>> I think AutoValue use should generally be considered *very* optional.
>>>> In
>>>>> transforms I author, I prefer not to use AutoValue because it makes
>>>> the
>>>>> code more complex and less readable.
>>>>>
>>>>> Yeah, guidance on when to use / not use AutoValue could be improved.
>>>> I
>>>>> think it makes a lot of sense when the transform has more than one or
>>>> two
>>>>> parameters or when the set of parameters can grow.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> - A few connectors improperly use
>>>>>> - Some transforms expose their transform type as "Something.Bound"
>>>> and
>>>>>> "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned
>>>>>>
>>>>>
>>>>> "banned" is a strong word to use here. All of these are just
>>>>> recommendations.
>>>>>
>>>>> In general yes; the goal of the style guide is to be the default,
>>>> where if
>>>>> you deviate from it, you should have a good reason. I don't think
>>>> there
>>>>> ever exists a good reason to name a transform Something.Bound/Unbound
>>>>> though.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> I filed an umbrella JIRA
>>>> https://issues.apache.org/jira/browse/BEAM-1353
>>>>>> about
>>>>>> making existing Beam transforms comply with the guide - let's
>>>> crowdsource
>>>>>> this!
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>
>>>>>
>>>
>>
>>
>

-- 
Jean-Baptiste Onofr�
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
Hey all,

More progress has been made and we're nearing completion. ParDo, BigQueryIO
and Window are fixed; Map/FlatMapElements are in review.

The remaining unclaimed ones are all IOs of some form, and here's a list.
I've marked them all as "starter" in JIRA.

XML - https://issues.apache.org/jira/browse/BEAM-1914
TFRecordIO (Tensorflow) - https://issues.apache.org/jira/browse/BEAM-1913
KinesisIO - https://issues.apache.org/jira/browse/BEAM-1428
PubsubIO - https://issues.apache.org/jira/browse/BEAM-1415
CountingInput - https://issues.apache.org/jira/browse/BEAM-1414

https://github.com/apache/beam/pull/2149 , which fixes BigQueryIO, is a
good model to follow when taking these on, as well as e.g.
https://github.com/apache/beam/pull/1927 (TextIO)

These are all actually easy to fix, but need volunteers (I do not have time
to fix all of these myself, but happy to be a reviewer - @jkff).
Let's finish this up in time for the first Beam stable release, so Beam's
stable API surface is consistent and polished!

On Fri, Mar 3, 2017 at 12:00 PM Eugene Kirpichov <ki...@google.com>
wrote:

> It sounds like a great idea in general, though there are actually not that
> many issues remaining - I fixed a bunch, and have PRs out for two more
> (ParDo and BigQueryIO).
> The remaining ones are:
>
> [image: Bug - A problem which impairs or prevents the functions of the
> product.] BEAM-1355 <https://issues.apache.org/jira/browse/BEAM-1355> HDFS
> IO should comply with PTransform style guide
>
>    - [image: Major - Major loss of function.]
>    - OPEN
>
> [image: Bug - A problem which impairs or prevents the functions of the
> product.] BEAM-1402 <https://issues.apache.org/jira/browse/BEAM-1402> Make
> TextIO and AvroIO use best-practice types.
>
>    - [image: Major - Major loss of function.]
>    - OPEN
>
> [image: Bug - A problem which impairs or prevents the functions of the
> product.] BEAM-1414 <https://issues.apache.org/jira/browse/BEAM-1414> CountingInput
> should comply with PTransform style guide
>
>    - [image: Major - Major loss of function.]
>    - OPEN
>
> [image: Bug - A problem which impairs or prevents the functions of the
> product.] BEAM-1415 <https://issues.apache.org/jira/browse/BEAM-1415> PubsubIO
> should comply with PTransfrom style guide
>
>    - [image: Major - Major loss of function.]
>    - OPEN
>
> [image: Bug - A problem which impairs or prevents the functions of the
> product.] BEAM-1418 <https://issues.apache.org/jira/browse/BEAM-1418> MapElements
> and FlatMapElements should comply with PTransform style guide
>
>    - [image: Major - Major loss of function.]
>    - OPEN
>
> [image: Bug - A problem which impairs or prevents the functions of the
> product.] BEAM-1425 <https://issues.apache.org/jira/browse/BEAM-1425> Window
> should comply with PTransform style guide
>
>    - [image: Major - Major loss of function.]
>    - OPEN
>
> [image: Bug - A problem which impairs or prevents the functions of the
> product.] BEAM-1428 <https://issues.apache.org/jira/browse/BEAM-1428> KinesisIO
> should comply with PTransform style guide
>
>    - [image: Major - Major loss of function.]
>    - OPEN
>
>
> HDFS is pending the other mailing list thread.
> For TextIO/AvroIO there's a pending PR by Reuven.
> Window and Map/FlatMapElements could use a brief discussion on the mailing
> list as to what the proper API should be.
> KinesisIO is, I think, due for a rewrite.
>
> However, CountingInput and PubsubIO I think are great targets for people
> to pick up. Any volunteers?
>
>
> On Fri, Mar 3, 2017 at 11:30 AM Robert Bradshaw
> <ro...@google.com.invalid> wrote:
>
> Here's a crazy idea: what if we had a virtual fixit/hackathon to knock
> these out (similar to the virtual meet-up, but with an agenda)? I find
> communal hacking sessions towards a common goal are a good way to get to
> know each other and get a lot done. Would there be any interest in this?
>
> On Wed, Mar 1, 2017 at 11:30 AM, Eugene Kirpichov <
> kirpichov@google.com.invalid> wrote:
>
> > Hey all,
> >
> > First couple rounds of fixes are in. Thanks Aviem Zur for contributing
> > TextIO fixes and Dan Halperin for reviewing! One more fix by Reuven in
> > progress (https://github.com/apache/beam/pull/1927).
> >
> > Follow https://issues.apache.org/jira/browse/BEAM-1353 and sub-issues
> for
> > the status.
> > Many of the changes are backwards-incompatible (though with only minor
> > changes in pipelines required). I'll make a separate announcement about
> > that on the users list now.
> >
> > Call for help with the other sub-issues still stands! They are pretty
> > simple work items but pretty important since this is a blocker for
> > declaring stable BEAM API.
> >
> > On Wed, Feb 8, 2017 at 3:51 AM Jean-Baptiste Onofré <jb...@nanthrax.net>
> > wrote:
> >
> > Thanks Eugene.
> >
> > I will tackle some Jira when back next week.
> >
> > Regards
> > JB
> >
> > On Feb 7, 2017, 18:16, at 18:16, Eugene Kirpichov
> > <ki...@google.com.INVALID> wrote:
> > >Hey all,
> > >
> > >I bit the bullet and audited all PTransform classes in Beam Java SDK
> > >and
> > >filed JIRA issues for all violations I could find.
> > >I linked all them to the master JIRA issue
> > >https://issues.apache.org/jira/browse/BEAM-1353
> > >
> > >In general, all of these should be fixed before declaring Beam stable
> > >API.
> > >Would appreciate if some senior folks looked at the issues and
> > >confirmed
> > >that my suggested changes make sense.
> > >
> > >PRs very welcome :) (though I'll be gone for the next few weeks so I
> > >can't
> > >review right now)
> > >Many of these are very easy to fix (a few lines of code); some require
> > >a
> > >little more code, but as far as I can tell all of them are mechanical.
> > >
> > >
> > >On Tue, Jan 31, 2017 at 4:10 PM Eugene Kirpichov <ki...@google.com>
> > >wrote:
> > >
> > >> On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin
> > ><dh...@google.com.invalid>
> > >> wrote:
> > >>
> > >> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
> > >> kirpichov@google.com.invalid> wrote:
> > >>
> > >> > Hello,
> > >> >
> > >> > The PTransform Style Guide is live
> > >> > https://beam.apache.org/contribute/ptransform-style-guide/ - a
> > >natural
> > >> > next
> > >> > step is to audit Beam libraries for compliance and file JIRAs for
> > >places
> > >> > that need to be fixed. It'd be great to finish these cleanups
> > >before
> > >> > declaring Beam stable API.
> > >> >
> > >> > Please take a look and file JIRAs / post suggestions on this
> > >thread!
> > >> >
> > >> > I think it'll also make a great source of easy and useful work for
> > >new
> > >> > contributors.
> > >> >
> > >> > Some things I remember off the top of my head:
> > >> > - TextIO, KafkaIO use coders improperly - coders should not be used
> > >as a
> > >> > general-purpose byte parsing mechanism.
> > >> >
> > >>
> > >> Can you say more about Kafka? Kafka actually exports byte[] by
> > >default,
> > >> whereas Text files are String by default. So it does not seem nearly
> > >as
> > >> egregious for Kafka as it is for Text.
> > >>
> > >> Agreed that KafkaIO is less egregious, but it still has methods
> > >> withKeyCoder and withValueCoder - these should be replaced with
> > >something
> > >> that doesn't take Coder.
> > >>
> > >>
> > >>
> > >> - HadoopFileSource is not packaged as a PTransform
> > >> > - Some connectors, e.g. KafkaIO, should use AutoValue for their
> > >parameter
> > >> > builders, but don't
> > >> >
> > >>
> > >> Isn't AutoValue entirely an internal implementation detail that is
> > >not
> > >> exposed(*) to users? I think this is irrelevant to a stable API.
> > >>
> > >> Agreed - doesn't block stable API, but still a good thing to do
> > >because it
> > >> makes the code cleaner (for KafkaIO there's a long-standing PR that
> > >was
> > >> blocked on ratifying the style guide
> > >> https://github.com/apache/beam/pull/1048)
> > >>
> > >>
> > >>
> > >> (*) except that it makes transforms not able to be final, which is a
> > >> regression.
> > >>
> > >> I think AutoValue use should generally be considered *very* optional.
> > >In
> > >> transforms I author, I prefer not to use AutoValue because it makes
> > >the
> > >> code more complex and less readable.
> > >>
> > >> Yeah, guidance on when to use / not use AutoValue could be improved.
> > >I
> > >> think it makes a lot of sense when the transform has more than one or
> > >two
> > >> parameters or when the set of parameters can grow.
> > >>
> > >>
> > >>
> > >>
> > >> > - A few connectors improperly use
> > >> > - Some transforms expose their transform type as "Something.Bound"
> > >and
> > >> > "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned
> > >> >
> > >>
> > >> "banned" is a strong word to use here. All of these are just
> > >> recommendations.
> > >>
> > >> In general yes; the goal of the style guide is to be the default,
> > >where if
> > >> you deviate from it, you should have a good reason. I don't think
> > >there
> > >> ever exists a good reason to name a transform Something.Bound/Unbound
> > >> though.
> > >>
> > >>
> > >>
> > >>
> > >> >
> > >> > I filed an umbrella JIRA
> > >https://issues.apache.org/jira/browse/BEAM-1353
> > >> > about
> > >> > making existing Beam transforms comply with the guide - let's
> > >crowdsource
> > >> > this!
> > >> >
> > >> > Thanks.
> > >> >
> > >>
> > >>
> >
>
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
It sounds like a great idea in general, though there are actually not that
many issues remaining - I fixed a bunch, and have PRs out for two more
(ParDo and BigQueryIO).
The remaining ones are:

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1355 <https://issues.apache.org/jira/browse/BEAM-1355> HDFS
IO should comply with PTransform style guide

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1402 <https://issues.apache.org/jira/browse/BEAM-1402> Make
TextIO and AvroIO use best-practice types.

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1414 <https://issues.apache.org/jira/browse/BEAM-1414>
CountingInput
should comply with PTransform style guide

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1415 <https://issues.apache.org/jira/browse/BEAM-1415> PubsubIO
should comply with PTransfrom style guide

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1418 <https://issues.apache.org/jira/browse/BEAM-1418>
MapElements
and FlatMapElements should comply with PTransform style guide

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1425 <https://issues.apache.org/jira/browse/BEAM-1425> Window
should comply with PTransform style guide

   - [image: Major - Major loss of function.]
   - OPEN

[image: Bug - A problem which impairs or prevents the functions of the
product.] BEAM-1428 <https://issues.apache.org/jira/browse/BEAM-1428> KinesisIO
should comply with PTransform style guide

   - [image: Major - Major loss of function.]
   - OPEN


HDFS is pending the other mailing list thread.
For TextIO/AvroIO there's a pending PR by Reuven.
Window and Map/FlatMapElements could use a brief discussion on the mailing
list as to what the proper API should be.
KinesisIO is, I think, due for a rewrite.

However, CountingInput and PubsubIO I think are great targets for people to
pick up. Any volunteers?


On Fri, Mar 3, 2017 at 11:30 AM Robert Bradshaw <ro...@google.com.invalid>
wrote:

> Here's a crazy idea: what if we had a virtual fixit/hackathon to knock
> these out (similar to the virtual meet-up, but with an agenda)? I find
> communal hacking sessions towards a common goal are a good way to get to
> know each other and get a lot done. Would there be any interest in this?
>
> On Wed, Mar 1, 2017 at 11:30 AM, Eugene Kirpichov <
> kirpichov@google.com.invalid> wrote:
>
> > Hey all,
> >
> > First couple rounds of fixes are in. Thanks Aviem Zur for contributing
> > TextIO fixes and Dan Halperin for reviewing! One more fix by Reuven in
> > progress (https://github.com/apache/beam/pull/1927).
> >
> > Follow https://issues.apache.org/jira/browse/BEAM-1353 and sub-issues
> for
> > the status.
> > Many of the changes are backwards-incompatible (though with only minor
> > changes in pipelines required). I'll make a separate announcement about
> > that on the users list now.
> >
> > Call for help with the other sub-issues still stands! They are pretty
> > simple work items but pretty important since this is a blocker for
> > declaring stable BEAM API.
> >
> > On Wed, Feb 8, 2017 at 3:51 AM Jean-Baptiste Onofré <jb...@nanthrax.net>
> > wrote:
> >
> > Thanks Eugene.
> >
> > I will tackle some Jira when back next week.
> >
> > Regards
> > JB
> >
> > On Feb 7, 2017, 18:16, at 18:16, Eugene Kirpichov
> > <ki...@google.com.INVALID> wrote:
> > >Hey all,
> > >
> > >I bit the bullet and audited all PTransform classes in Beam Java SDK
> > >and
> > >filed JIRA issues for all violations I could find.
> > >I linked all them to the master JIRA issue
> > >https://issues.apache.org/jira/browse/BEAM-1353
> > >
> > >In general, all of these should be fixed before declaring Beam stable
> > >API.
> > >Would appreciate if some senior folks looked at the issues and
> > >confirmed
> > >that my suggested changes make sense.
> > >
> > >PRs very welcome :) (though I'll be gone for the next few weeks so I
> > >can't
> > >review right now)
> > >Many of these are very easy to fix (a few lines of code); some require
> > >a
> > >little more code, but as far as I can tell all of them are mechanical.
> > >
> > >
> > >On Tue, Jan 31, 2017 at 4:10 PM Eugene Kirpichov <ki...@google.com>
> > >wrote:
> > >
> > >> On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin
> > ><dh...@google.com.invalid>
> > >> wrote:
> > >>
> > >> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
> > >> kirpichov@google.com.invalid> wrote:
> > >>
> > >> > Hello,
> > >> >
> > >> > The PTransform Style Guide is live
> > >> > https://beam.apache.org/contribute/ptransform-style-guide/ - a
> > >natural
> > >> > next
> > >> > step is to audit Beam libraries for compliance and file JIRAs for
> > >places
> > >> > that need to be fixed. It'd be great to finish these cleanups
> > >before
> > >> > declaring Beam stable API.
> > >> >
> > >> > Please take a look and file JIRAs / post suggestions on this
> > >thread!
> > >> >
> > >> > I think it'll also make a great source of easy and useful work for
> > >new
> > >> > contributors.
> > >> >
> > >> > Some things I remember off the top of my head:
> > >> > - TextIO, KafkaIO use coders improperly - coders should not be used
> > >as a
> > >> > general-purpose byte parsing mechanism.
> > >> >
> > >>
> > >> Can you say more about Kafka? Kafka actually exports byte[] by
> > >default,
> > >> whereas Text files are String by default. So it does not seem nearly
> > >as
> > >> egregious for Kafka as it is for Text.
> > >>
> > >> Agreed that KafkaIO is less egregious, but it still has methods
> > >> withKeyCoder and withValueCoder - these should be replaced with
> > >something
> > >> that doesn't take Coder.
> > >>
> > >>
> > >>
> > >> - HadoopFileSource is not packaged as a PTransform
> > >> > - Some connectors, e.g. KafkaIO, should use AutoValue for their
> > >parameter
> > >> > builders, but don't
> > >> >
> > >>
> > >> Isn't AutoValue entirely an internal implementation detail that is
> > >not
> > >> exposed(*) to users? I think this is irrelevant to a stable API.
> > >>
> > >> Agreed - doesn't block stable API, but still a good thing to do
> > >because it
> > >> makes the code cleaner (for KafkaIO there's a long-standing PR that
> > >was
> > >> blocked on ratifying the style guide
> > >> https://github.com/apache/beam/pull/1048)
> > >>
> > >>
> > >>
> > >> (*) except that it makes transforms not able to be final, which is a
> > >> regression.
> > >>
> > >> I think AutoValue use should generally be considered *very* optional.
> > >In
> > >> transforms I author, I prefer not to use AutoValue because it makes
> > >the
> > >> code more complex and less readable.
> > >>
> > >> Yeah, guidance on when to use / not use AutoValue could be improved.
> > >I
> > >> think it makes a lot of sense when the transform has more than one or
> > >two
> > >> parameters or when the set of parameters can grow.
> > >>
> > >>
> > >>
> > >>
> > >> > - A few connectors improperly use
> > >> > - Some transforms expose their transform type as "Something.Bound"
> > >and
> > >> > "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned
> > >> >
> > >>
> > >> "banned" is a strong word to use here. All of these are just
> > >> recommendations.
> > >>
> > >> In general yes; the goal of the style guide is to be the default,
> > >where if
> > >> you deviate from it, you should have a good reason. I don't think
> > >there
> > >> ever exists a good reason to name a transform Something.Bound/Unbound
> > >> though.
> > >>
> > >>
> > >>
> > >>
> > >> >
> > >> > I filed an umbrella JIRA
> > >https://issues.apache.org/jira/browse/BEAM-1353
> > >> > about
> > >> > making existing Beam transforms comply with the guide - let's
> > >crowdsource
> > >> > this!
> > >> >
> > >> > Thanks.
> > >> >
> > >>
> > >>
> >
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Robert Bradshaw <ro...@google.com.INVALID>.
Here's a crazy idea: what if we had a virtual fixit/hackathon to knock
these out (similar to the virtual meet-up, but with an agenda)? I find
communal hacking sessions towards a common goal are a good way to get to
know each other and get a lot done. Would there be any interest in this?

On Wed, Mar 1, 2017 at 11:30 AM, Eugene Kirpichov <
kirpichov@google.com.invalid> wrote:

> Hey all,
>
> First couple rounds of fixes are in. Thanks Aviem Zur for contributing
> TextIO fixes and Dan Halperin for reviewing! One more fix by Reuven in
> progress (https://github.com/apache/beam/pull/1927).
>
> Follow https://issues.apache.org/jira/browse/BEAM-1353 and sub-issues for
> the status.
> Many of the changes are backwards-incompatible (though with only minor
> changes in pipelines required). I'll make a separate announcement about
> that on the users list now.
>
> Call for help with the other sub-issues still stands! They are pretty
> simple work items but pretty important since this is a blocker for
> declaring stable BEAM API.
>
> On Wed, Feb 8, 2017 at 3:51 AM Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
> Thanks Eugene.
>
> I will tackle some Jira when back next week.
>
> Regards
> JB
>
> On Feb 7, 2017, 18:16, at 18:16, Eugene Kirpichov
> <ki...@google.com.INVALID> wrote:
> >Hey all,
> >
> >I bit the bullet and audited all PTransform classes in Beam Java SDK
> >and
> >filed JIRA issues for all violations I could find.
> >I linked all them to the master JIRA issue
> >https://issues.apache.org/jira/browse/BEAM-1353
> >
> >In general, all of these should be fixed before declaring Beam stable
> >API.
> >Would appreciate if some senior folks looked at the issues and
> >confirmed
> >that my suggested changes make sense.
> >
> >PRs very welcome :) (though I'll be gone for the next few weeks so I
> >can't
> >review right now)
> >Many of these are very easy to fix (a few lines of code); some require
> >a
> >little more code, but as far as I can tell all of them are mechanical.
> >
> >
> >On Tue, Jan 31, 2017 at 4:10 PM Eugene Kirpichov <ki...@google.com>
> >wrote:
> >
> >> On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin
> ><dh...@google.com.invalid>
> >> wrote:
> >>
> >> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
> >> kirpichov@google.com.invalid> wrote:
> >>
> >> > Hello,
> >> >
> >> > The PTransform Style Guide is live
> >> > https://beam.apache.org/contribute/ptransform-style-guide/ - a
> >natural
> >> > next
> >> > step is to audit Beam libraries for compliance and file JIRAs for
> >places
> >> > that need to be fixed. It'd be great to finish these cleanups
> >before
> >> > declaring Beam stable API.
> >> >
> >> > Please take a look and file JIRAs / post suggestions on this
> >thread!
> >> >
> >> > I think it'll also make a great source of easy and useful work for
> >new
> >> > contributors.
> >> >
> >> > Some things I remember off the top of my head:
> >> > - TextIO, KafkaIO use coders improperly - coders should not be used
> >as a
> >> > general-purpose byte parsing mechanism.
> >> >
> >>
> >> Can you say more about Kafka? Kafka actually exports byte[] by
> >default,
> >> whereas Text files are String by default. So it does not seem nearly
> >as
> >> egregious for Kafka as it is for Text.
> >>
> >> Agreed that KafkaIO is less egregious, but it still has methods
> >> withKeyCoder and withValueCoder - these should be replaced with
> >something
> >> that doesn't take Coder.
> >>
> >>
> >>
> >> - HadoopFileSource is not packaged as a PTransform
> >> > - Some connectors, e.g. KafkaIO, should use AutoValue for their
> >parameter
> >> > builders, but don't
> >> >
> >>
> >> Isn't AutoValue entirely an internal implementation detail that is
> >not
> >> exposed(*) to users? I think this is irrelevant to a stable API.
> >>
> >> Agreed - doesn't block stable API, but still a good thing to do
> >because it
> >> makes the code cleaner (for KafkaIO there's a long-standing PR that
> >was
> >> blocked on ratifying the style guide
> >> https://github.com/apache/beam/pull/1048)
> >>
> >>
> >>
> >> (*) except that it makes transforms not able to be final, which is a
> >> regression.
> >>
> >> I think AutoValue use should generally be considered *very* optional.
> >In
> >> transforms I author, I prefer not to use AutoValue because it makes
> >the
> >> code more complex and less readable.
> >>
> >> Yeah, guidance on when to use / not use AutoValue could be improved.
> >I
> >> think it makes a lot of sense when the transform has more than one or
> >two
> >> parameters or when the set of parameters can grow.
> >>
> >>
> >>
> >>
> >> > - A few connectors improperly use
> >> > - Some transforms expose their transform type as "Something.Bound"
> >and
> >> > "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned
> >> >
> >>
> >> "banned" is a strong word to use here. All of these are just
> >> recommendations.
> >>
> >> In general yes; the goal of the style guide is to be the default,
> >where if
> >> you deviate from it, you should have a good reason. I don't think
> >there
> >> ever exists a good reason to name a transform Something.Bound/Unbound
> >> though.
> >>
> >>
> >>
> >>
> >> >
> >> > I filed an umbrella JIRA
> >https://issues.apache.org/jira/browse/BEAM-1353
> >> > about
> >> > making existing Beam transforms comply with the guide - let's
> >crowdsource
> >> > this!
> >> >
> >> > Thanks.
> >> >
> >>
> >>
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
Hey all,

First couple rounds of fixes are in. Thanks Aviem Zur for contributing
TextIO fixes and Dan Halperin for reviewing! One more fix by Reuven in
progress (https://github.com/apache/beam/pull/1927).

Follow https://issues.apache.org/jira/browse/BEAM-1353 and sub-issues for
the status.
Many of the changes are backwards-incompatible (though with only minor
changes in pipelines required). I'll make a separate announcement about
that on the users list now.

Call for help with the other sub-issues still stands! They are pretty
simple work items but pretty important since this is a blocker for
declaring stable BEAM API.

On Wed, Feb 8, 2017 at 3:51 AM Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:

Thanks Eugene.

I will tackle some Jira when back next week.

Regards
JB

On Feb 7, 2017, 18:16, at 18:16, Eugene Kirpichov
<ki...@google.com.INVALID> wrote:
>Hey all,
>
>I bit the bullet and audited all PTransform classes in Beam Java SDK
>and
>filed JIRA issues for all violations I could find.
>I linked all them to the master JIRA issue
>https://issues.apache.org/jira/browse/BEAM-1353
>
>In general, all of these should be fixed before declaring Beam stable
>API.
>Would appreciate if some senior folks looked at the issues and
>confirmed
>that my suggested changes make sense.
>
>PRs very welcome :) (though I'll be gone for the next few weeks so I
>can't
>review right now)
>Many of these are very easy to fix (a few lines of code); some require
>a
>little more code, but as far as I can tell all of them are mechanical.
>
>
>On Tue, Jan 31, 2017 at 4:10 PM Eugene Kirpichov <ki...@google.com>
>wrote:
>
>> On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin
><dh...@google.com.invalid>
>> wrote:
>>
>> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
>> kirpichov@google.com.invalid> wrote:
>>
>> > Hello,
>> >
>> > The PTransform Style Guide is live
>> > https://beam.apache.org/contribute/ptransform-style-guide/ - a
>natural
>> > next
>> > step is to audit Beam libraries for compliance and file JIRAs for
>places
>> > that need to be fixed. It'd be great to finish these cleanups
>before
>> > declaring Beam stable API.
>> >
>> > Please take a look and file JIRAs / post suggestions on this
>thread!
>> >
>> > I think it'll also make a great source of easy and useful work for
>new
>> > contributors.
>> >
>> > Some things I remember off the top of my head:
>> > - TextIO, KafkaIO use coders improperly - coders should not be used
>as a
>> > general-purpose byte parsing mechanism.
>> >
>>
>> Can you say more about Kafka? Kafka actually exports byte[] by
>default,
>> whereas Text files are String by default. So it does not seem nearly
>as
>> egregious for Kafka as it is for Text.
>>
>> Agreed that KafkaIO is less egregious, but it still has methods
>> withKeyCoder and withValueCoder - these should be replaced with
>something
>> that doesn't take Coder.
>>
>>
>>
>> - HadoopFileSource is not packaged as a PTransform
>> > - Some connectors, e.g. KafkaIO, should use AutoValue for their
>parameter
>> > builders, but don't
>> >
>>
>> Isn't AutoValue entirely an internal implementation detail that is
>not
>> exposed(*) to users? I think this is irrelevant to a stable API.
>>
>> Agreed - doesn't block stable API, but still a good thing to do
>because it
>> makes the code cleaner (for KafkaIO there's a long-standing PR that
>was
>> blocked on ratifying the style guide
>> https://github.com/apache/beam/pull/1048)
>>
>>
>>
>> (*) except that it makes transforms not able to be final, which is a
>> regression.
>>
>> I think AutoValue use should generally be considered *very* optional.
>In
>> transforms I author, I prefer not to use AutoValue because it makes
>the
>> code more complex and less readable.
>>
>> Yeah, guidance on when to use / not use AutoValue could be improved.
>I
>> think it makes a lot of sense when the transform has more than one or
>two
>> parameters or when the set of parameters can grow.
>>
>>
>>
>>
>> > - A few connectors improperly use
>> > - Some transforms expose their transform type as "Something.Bound"
>and
>> > "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned
>> >
>>
>> "banned" is a strong word to use here. All of these are just
>> recommendations.
>>
>> In general yes; the goal of the style guide is to be the default,
>where if
>> you deviate from it, you should have a good reason. I don't think
>there
>> ever exists a good reason to name a transform Something.Bound/Unbound
>> though.
>>
>>
>>
>>
>> >
>> > I filed an umbrella JIRA
>https://issues.apache.org/jira/browse/BEAM-1353
>> > about
>> > making existing Beam transforms comply with the guide - let's
>crowdsource
>> > this!
>> >
>> > Thanks.
>> >
>>
>>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Thanks Eugene.

I will tackle some Jira when back next week.

Regards
JB

On Feb 7, 2017, 18:16, at 18:16, Eugene Kirpichov <ki...@google.com.INVALID> wrote:
>Hey all,
>
>I bit the bullet and audited all PTransform classes in Beam Java SDK
>and
>filed JIRA issues for all violations I could find.
>I linked all them to the master JIRA issue
>https://issues.apache.org/jira/browse/BEAM-1353
>
>In general, all of these should be fixed before declaring Beam stable
>API.
>Would appreciate if some senior folks looked at the issues and
>confirmed
>that my suggested changes make sense.
>
>PRs very welcome :) (though I'll be gone for the next few weeks so I
>can't
>review right now)
>Many of these are very easy to fix (a few lines of code); some require
>a
>little more code, but as far as I can tell all of them are mechanical.
>
>
>On Tue, Jan 31, 2017 at 4:10 PM Eugene Kirpichov <ki...@google.com>
>wrote:
>
>> On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin
><dh...@google.com.invalid>
>> wrote:
>>
>> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
>> kirpichov@google.com.invalid> wrote:
>>
>> > Hello,
>> >
>> > The PTransform Style Guide is live
>> > https://beam.apache.org/contribute/ptransform-style-guide/ - a
>natural
>> > next
>> > step is to audit Beam libraries for compliance and file JIRAs for
>places
>> > that need to be fixed. It'd be great to finish these cleanups
>before
>> > declaring Beam stable API.
>> >
>> > Please take a look and file JIRAs / post suggestions on this
>thread!
>> >
>> > I think it'll also make a great source of easy and useful work for
>new
>> > contributors.
>> >
>> > Some things I remember off the top of my head:
>> > - TextIO, KafkaIO use coders improperly - coders should not be used
>as a
>> > general-purpose byte parsing mechanism.
>> >
>>
>> Can you say more about Kafka? Kafka actually exports byte[] by
>default,
>> whereas Text files are String by default. So it does not seem nearly
>as
>> egregious for Kafka as it is for Text.
>>
>> Agreed that KafkaIO is less egregious, but it still has methods
>> withKeyCoder and withValueCoder - these should be replaced with
>something
>> that doesn't take Coder.
>>
>>
>>
>> - HadoopFileSource is not packaged as a PTransform
>> > - Some connectors, e.g. KafkaIO, should use AutoValue for their
>parameter
>> > builders, but don't
>> >
>>
>> Isn't AutoValue entirely an internal implementation detail that is
>not
>> exposed(*) to users? I think this is irrelevant to a stable API.
>>
>> Agreed - doesn't block stable API, but still a good thing to do
>because it
>> makes the code cleaner (for KafkaIO there's a long-standing PR that
>was
>> blocked on ratifying the style guide
>> https://github.com/apache/beam/pull/1048)
>>
>>
>>
>> (*) except that it makes transforms not able to be final, which is a
>> regression.
>>
>> I think AutoValue use should generally be considered *very* optional.
>In
>> transforms I author, I prefer not to use AutoValue because it makes
>the
>> code more complex and less readable.
>>
>> Yeah, guidance on when to use / not use AutoValue could be improved.
>I
>> think it makes a lot of sense when the transform has more than one or
>two
>> parameters or when the set of parameters can grow.
>>
>>
>>
>>
>> > - A few connectors improperly use
>> > - Some transforms expose their transform type as "Something.Bound"
>and
>> > "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned
>> >
>>
>> "banned" is a strong word to use here. All of these are just
>> recommendations.
>>
>> In general yes; the goal of the style guide is to be the default,
>where if
>> you deviate from it, you should have a good reason. I don't think
>there
>> ever exists a good reason to name a transform Something.Bound/Unbound
>> though.
>>
>>
>>
>>
>> >
>> > I filed an umbrella JIRA
>https://issues.apache.org/jira/browse/BEAM-1353
>> > about
>> > making existing Beam transforms comply with the guide - let's
>crowdsource
>> > this!
>> >
>> > Thanks.
>> >
>>
>>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
Hey all,

I bit the bullet and audited all PTransform classes in Beam Java SDK and
filed JIRA issues for all violations I could find.
I linked all them to the master JIRA issue
https://issues.apache.org/jira/browse/BEAM-1353

In general, all of these should be fixed before declaring Beam stable API.
Would appreciate if some senior folks looked at the issues and confirmed
that my suggested changes make sense.

PRs very welcome :) (though I'll be gone for the next few weeks so I can't
review right now)
Many of these are very easy to fix (a few lines of code); some require a
little more code, but as far as I can tell all of them are mechanical.


On Tue, Jan 31, 2017 at 4:10 PM Eugene Kirpichov <ki...@google.com>
wrote:

> On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin <dh...@google.com.invalid>
> wrote:
>
> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
> kirpichov@google.com.invalid> wrote:
>
> > Hello,
> >
> > The PTransform Style Guide is live
> > https://beam.apache.org/contribute/ptransform-style-guide/ - a natural
> > next
> > step is to audit Beam libraries for compliance and file JIRAs for places
> > that need to be fixed. It'd be great to finish these cleanups before
> > declaring Beam stable API.
> >
> > Please take a look and file JIRAs / post suggestions on this thread!
> >
> > I think it'll also make a great source of easy and useful work for new
> > contributors.
> >
> > Some things I remember off the top of my head:
> > - TextIO, KafkaIO use coders improperly - coders should not be used as a
> > general-purpose byte parsing mechanism.
> >
>
> Can you say more about Kafka? Kafka actually exports byte[] by default,
> whereas Text files are String by default. So it does not seem nearly as
> egregious for Kafka as it is for Text.
>
> Agreed that KafkaIO is less egregious, but it still has methods
> withKeyCoder and withValueCoder - these should be replaced with something
> that doesn't take Coder.
>
>
>
> - HadoopFileSource is not packaged as a PTransform
> > - Some connectors, e.g. KafkaIO, should use AutoValue for their parameter
> > builders, but don't
> >
>
> Isn't AutoValue entirely an internal implementation detail that is not
> exposed(*) to users? I think this is irrelevant to a stable API.
>
> Agreed - doesn't block stable API, but still a good thing to do because it
> makes the code cleaner (for KafkaIO there's a long-standing PR that was
> blocked on ratifying the style guide
> https://github.com/apache/beam/pull/1048)
>
>
>
> (*) except that it makes transforms not able to be final, which is a
> regression.
>
> I think AutoValue use should generally be considered *very* optional. In
> transforms I author, I prefer not to use AutoValue because it makes the
> code more complex and less readable.
>
> Yeah, guidance on when to use / not use AutoValue could be improved. I
> think it makes a lot of sense when the transform has more than one or two
> parameters or when the set of parameters can grow.
>
>
>
>
> > - A few connectors improperly use
> > - Some transforms expose their transform type as "Something.Bound" and
> > "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned
> >
>
> "banned" is a strong word to use here. All of these are just
> recommendations.
>
> In general yes; the goal of the style guide is to be the default, where if
> you deviate from it, you should have a good reason. I don't think there
> ever exists a good reason to name a transform Something.Bound/Unbound
> though.
>
>
>
>
> >
> > I filed an umbrella JIRA https://issues.apache.org/jira/browse/BEAM-1353
> > about
> > making existing Beam transforms comply with the guide - let's crowdsource
> > this!
> >
> > Thanks.
> >
>
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
On Mon, Jan 30, 2017 at 7:56 PM Dan Halperin <dh...@google.com.invalid>
wrote:

> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
> kirpichov@google.com.invalid> wrote:
>
> > Hello,
> >
> > The PTransform Style Guide is live
> > https://beam.apache.org/contribute/ptransform-style-guide/ - a natural
> > next
> > step is to audit Beam libraries for compliance and file JIRAs for places
> > that need to be fixed. It'd be great to finish these cleanups before
> > declaring Beam stable API.
> >
> > Please take a look and file JIRAs / post suggestions on this thread!
> >
> > I think it'll also make a great source of easy and useful work for new
> > contributors.
> >
> > Some things I remember off the top of my head:
> > - TextIO, KafkaIO use coders improperly - coders should not be used as a
> > general-purpose byte parsing mechanism.
> >
>
> Can you say more about Kafka? Kafka actually exports byte[] by default,
> whereas Text files are String by default. So it does not seem nearly as
> egregious for Kafka as it is for Text.
>
Agreed that KafkaIO is less egregious, but it still has methods
withKeyCoder and withValueCoder - these should be replaced with something
that doesn't take Coder.


>
> - HadoopFileSource is not packaged as a PTransform
> > - Some connectors, e.g. KafkaIO, should use AutoValue for their parameter
> > builders, but don't
> >
>
> Isn't AutoValue entirely an internal implementation detail that is not
> exposed(*) to users? I think this is irrelevant to a stable API.
>
Agreed - doesn't block stable API, but still a good thing to do because it
makes the code cleaner (for KafkaIO there's a long-standing PR that was
blocked on ratifying the style guide
https://github.com/apache/beam/pull/1048)


>
> (*) except that it makes transforms not able to be final, which is a
> regression.
>
> I think AutoValue use should generally be considered *very* optional. In
> transforms I author, I prefer not to use AutoValue because it makes the
> code more complex and less readable.
>
Yeah, guidance on when to use / not use AutoValue could be improved. I
think it makes a lot of sense when the transform has more than one or two
parameters or when the set of parameters can grow.


>
>
> > - A few connectors improperly use
> > - Some transforms expose their transform type as "Something.Bound" and
> > "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned
> >
>
> "banned" is a strong word to use here. All of these are just
> recommendations.
>
In general yes; the goal of the style guide is to be the default, where if
you deviate from it, you should have a good reason. I don't think there
ever exists a good reason to name a transform Something.Bound/Unbound
though.


>
>
> >
> > I filed an umbrella JIRA https://issues.apache.org/jira/browse/BEAM-1353
> > about
> > making existing Beam transforms comply with the guide - let's crowdsource
> > this!
> >
> > Thanks.
> >
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Dan Halperin <dh...@google.com.INVALID>.
On Mon, Jan 30, 2017 at 7:56 PM, Dan Halperin <dh...@google.com> wrote:

> On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
> kirpichov@google.com.invalid> wrote:
>
>> Hello,
>>
>> The PTransform Style Guide is live
>> https://beam.apache.org/contribute/ptransform-style-guide/ - a natural
>> next
>> step is to audit Beam libraries for compliance and file JIRAs for places
>> that need to be fixed. It'd be great to finish these cleanups before
>> declaring Beam stable API.
>>
>> Please take a look and file JIRAs / post suggestions on this thread!
>>
>> I think it'll also make a great source of easy and useful work for new
>> contributors.
>>
>> Some things I remember off the top of my head:
>> - TextIO, KafkaIO use coders improperly - coders should not be used as a
>> general-purpose byte parsing mechanism.
>>
>
> Can you say more about Kafka? Kafka actually exports byte[] by default,
> whereas Text files are String by default. So it does not seem nearly as
> egregious for Kafka as it is for Text.
>
> - HadoopFileSource is not packaged as a PTransform
>> - Some connectors, e.g. KafkaIO, should use AutoValue for their parameter
>> builders, but don't
>>
>
> Isn't AutoValue entirely an internal implementation detail that is not
> exposed(*) to users? I think this is irrelevant to a stable API.
>
> (*) except that it makes transforms not able to be final, which is a
> regression.
>
> I think AutoValue use should generally be considered *very* optional. In
> transforms I author, I prefer not to use AutoValue because it makes the
> code more complex and less readable.
>
>
>> - A few connectors improperly use
>> - Some transforms expose their transform type as "Something.Bound" and
>> "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned
>>
>
> "banned" is a strong word to use here. All of these are just
> recommendations.
>

Which is not to say that I disagree at all -- I think we even have a JIRA
open for this one somewhere.

Also, thanks Eugene for all the work writing this doc -- fantastic summary
of your recommendations and a great starting point for the community to add
in.

Dan

>
>
>>
>> I filed an umbrella JIRA https://issues.apache.org/jira/browse/BEAM-1353
>> about
>> making existing Beam transforms comply with the guide - let's crowdsource
>> this!
>>
>> Thanks.
>>
>
>

Re: Let's make Beam transforms comply with PTransform Style Guide

Posted by Dan Halperin <dh...@google.com.INVALID>.
On Mon, Jan 30, 2017 at 5:42 PM, Eugene Kirpichov <
kirpichov@google.com.invalid> wrote:

> Hello,
>
> The PTransform Style Guide is live
> https://beam.apache.org/contribute/ptransform-style-guide/ - a natural
> next
> step is to audit Beam libraries for compliance and file JIRAs for places
> that need to be fixed. It'd be great to finish these cleanups before
> declaring Beam stable API.
>
> Please take a look and file JIRAs / post suggestions on this thread!
>
> I think it'll also make a great source of easy and useful work for new
> contributors.
>
> Some things I remember off the top of my head:
> - TextIO, KafkaIO use coders improperly - coders should not be used as a
> general-purpose byte parsing mechanism.
>

Can you say more about Kafka? Kafka actually exports byte[] by default,
whereas Text files are String by default. So it does not seem nearly as
egregious for Kafka as it is for Text.

- HadoopFileSource is not packaged as a PTransform
> - Some connectors, e.g. KafkaIO, should use AutoValue for their parameter
> builders, but don't
>

Isn't AutoValue entirely an internal implementation detail that is not
exposed(*) to users? I think this is irrelevant to a stable API.

(*) except that it makes transforms not able to be final, which is a
regression.

I think AutoValue use should generally be considered *very* optional. In
transforms I author, I prefer not to use AutoValue because it makes the
code more complex and less readable.


> - A few connectors improperly use
> - Some transforms expose their transform type as "Something.Bound" and
> "Something.Unbound", e.g. TextIO.Read.Bound - such names are banned
>

"banned" is a strong word to use here. All of these are just
recommendations.


>
> I filed an umbrella JIRA https://issues.apache.org/jira/browse/BEAM-1353
> about
> making existing Beam transforms comply with the guide - let's crowdsource
> this!
>
> Thanks.
>