You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Neelesh Salian <ns...@cloudera.com> on 2016/11/02 04:07:20 UTC

Re: [DISCUSS] Using Verbs for Transforms

Thanks everyone. The PR was merged.


On Thu, Oct 27, 2016 at 11:51 AM, Neelesh Salian <ns...@cloudera.com>
wrote:

> Thanks everyone for all the inputs.
> It's really encouraging for a new contributor, as myself, to get valuable
> input and mentoring (like on this thread) and, in turn, help make the
> community better.
>
>
>
> On Thu, Oct 27, 2016 at 11:41 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
>> You did well ! It's an interesting discussion we have and it's great to
>> have it on the mailing list (better than in Jira or PR comments IMHO).
>>
>> Thanks !
>>
>> Regards
>> JB
>>
>> ⁣​
>>
>> On Oct 27, 2016, 20:39, at 20:39, Robert Bradshaw
>> <ro...@google.com.INVALID> wrote:
>> >+1 to all Dan says.
>> >
>> >I only brought this up because it seemed new contributors (yay)
>> >jumping in and renaming a core transform based on "Something to
>> >consider" deserved a couple more more eyeballs, but didn't intend for
>> >it to become a big deal.
>> >
>> >On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin
>> ><dh...@google.com.invalid> wrote:
>> >> Folks, I don't think this needs to be a "vote". This is just not that
>> >big a
>> >> deal :). It is important to be transparent and have these discussions
>> >on
>> >> the list, which is why we brought it here from GitHub/JIRA, but at
>> >the end
>> >> of the day I hope that a small group of committers and developers can
>> >> assess "good enough" consensus for these minor issues.
>> >>
>> >> Here's my assessment:
>> >> * We don't really have any rules about naming transforms. "Should be
>> >a
>> >> verb" is a sort of guiding principle inherited from the Google Flume
>> >> project from which Dataflow evolved, but honestly we violate this
>> >rule for
>> >> clarity all over the place. ("Values", for example).
>> >> * The "Big Data" community is significantly more familiar with the
>> >concept
>> >> of Distinct -- Jesse, who filed the original JIRA, is a good example
>> >here.
>> >> * Finally, nobody feels very strongly. We could argue minor points of
>> >each
>> >> solution, but at the end of the day I don't think anyone wants to
>> >block a
>> >> change.
>> >>
>> >> Let's go with Distinct. It's important to align Beam with the open
>> >source
>> >> big data community. (And thanks Jesse, our newest (*tied) committer,
>> >for
>> >> pushing us in the right direction!)
>> >>
>> >> Jesse, can you please take charge of wrapping up the PR and merging
>> >it?
>> >>
>> >> Thanks!
>> >> Dan
>> >>
>> >> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré
>> ><jb...@nanthrax.net>
>> >> wrote:
>> >>
>> >>> Just to clarify. Davor is right for a code modification change: -1
>> >means a
>> >>> veto.
>> >>> I meant that -1 is not a veto for a release vote.
>> >>>
>> >>> Anyway, even if it's not a formal code, we can have a discussion
>> >with
>> >>> "options" a,b and c.
>> >>>
>> >>> Regards
>> >>> JB
>> >>>
>> >>> ⁣
>> >>>
>> >>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci
>> ><da...@google.com.INVALID>
>> >>> wrote:
>> >>> >In terms of reaching a decision on any code or design changes,
>> >>> >including
>> >>> >this one, I'd suggest going without formal votes. Voting process
>> >for
>> >>> >code
>> >>> >modifications between choices A and B doesn't necessarily end with
>> >a
>> >>> >decision A or B -- a single (qualified) -1 vote is a veto and
>> >cannot be
>> >>> >overridden [1]. Said differently, the guideline is that code
>> >changes
>> >>> >should
>> >>> >be made by consensus; not by one group outvoting another. I'd like
>> >to
>> >>> >avoid
>> >>> >setting such precedent; we should try to drive consensus, as
>> >opposed to
>> >>> >attempting to outvote another part of the community.
>> >>> >
>> >>> >In this particular case, we have had a great discussion. Many
>> >>> >contributors
>> >>> >brought different perspectives. Consequently, some opinions have
>> >been
>> >>> >likely changed. At this point, someone should summarize the
>> >arguments,
>> >>> >try
>> >>> >to critique them from a neutral standpoint, and suggest a refined
>> >>> >proposal
>> >>> >that takes these perspectives into account. If nobody objects in a
>> >>> >short
>> >>> >time, we should consider this decided. [ I can certainly help here,
>> >but
>> >>> >I'd
>> >>> >love to see somebody else do it! ]
>> >>> >
>> >>> >[1] http://www.apache.org/foundation/voting.html
>> >>> >
>> >>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
>> >>> ><bc...@google.com.invalid>
>> >>> >wrote:
>> >>> >
>> >>> >> I also like Distinct since it doesn't make it sound like it
>> >modifies
>> >>> >any
>> >>> >> underlying collection. RemoveDuplicates makes it sound like the
>> >>> >duplicates
>> >>> >> are removed, rather than a new PCollection without duplicates
>> >being
>> >>> >> returned.
>> >>> >>
>> >>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré
>> ><jb...@nanthrax.net>
>> >>> >> wrote:
>> >>> >>
>> >>> >> > Agree. It was more a transition proposal.
>> >>> >> >
>> >>> >> > Regards
>> >>> >> > JB
>> >>> >> >
>> >>> >> > ⁣
>> >>> >> >
>> >>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
>> >>> >> > <ro...@google.com.INVALID> wrote:
>> >>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
>> >>> >> > ><jb...@nanthrax.net> wrote:
>> >>> >> > >> And what about use RemoveDuplicates and create an alias
>> >Distinct
>> >>> >?
>> >>> >> > >
>> >>> >> > >I'd really like to avoid (long term) aliases--you end up
>> >having to
>> >>> >> > >document (and maintain) them both, and it adds confusion as to
>> >>> >which
>> >>> >> > >one to use (especially if they every diverge), and means
>> >searching
>> >>> >for
>> >>> >> > >one or the other yields half the results.
>> >>> >> > >
>> >>> >> > >> It doesn't break the API and would address both SQL users
>> >and
>> >>> >more
>> >>> >> > >"big data" users.
>> >>> >> > >>
>> >>> >> > >> My $0.01 ;)
>> >>> >> > >>
>> >>> >> > >> Regards
>> >>> >> > >> JB
>> >>> >> > >>
>> >>> >> > >> ⁣
>> >>> >> > >>
>> >>> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
>> >>> >> > ><dh...@google.com.INVALID> wrote:
>> >>> >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
>> >>> >> > >>>preference:
>> >>> >> > >>>
>> >>> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important
>> >keywords
>> >>> >are
>> >>> >> > >in
>> >>> >> > >>>the
>> >>> >> > >>>Javadoc. This reduces churn on our users and is honestly
>> >pretty
>> >>> >dang
>> >>> >> > >>> descriptive.
>> >>> >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user
>> >and
>> >>> >> > >likely
>> >>> >> > >>>less clear otherwise. This is a backwards-incompatible API
>> >>> >change, so
>> >>> >> > >>>we
>> >>> >> > >>>should do it before we go stable.
>> >>> >> > >>>
>> >>> >> > >>>I am not super strong that 1 > 2, but I am very strong that
>> >>> >> > >"Distinct"
>> >>> >> > >>>>>>
>> >>> >> > >>>"MakeDistinct" or and "RemoveDuplicates" >>>
>> >"AvoidDuplicate".
>> >>> >> > >>>
>> >>> >> > >>>Dan
>> >>> >> > >>>
>> >>> >> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
>> >>> >> > >>><kl...@google.com.invalid>
>> >>> >> > >>>wrote:
>> >>> >> > >>>
>> >>> >> > >>>> The precedent that we use verbs has many exceptions. We
>> >have
>> >>> >> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I
>> >>> >would
>> >>> >> > >even
>> >>> >> > >>>> include Sum (at least when I read it).
>> >>> >> > >>>>
>> >>> >> > >>>> Historical note: the predilection towards verbs is from
>> >the
>> >>> >Google
>> >>> >> > >>>Style
>> >>> >> > >>>> Guide for Java method names
>> >>> >> > >>>>
>> >>> >> > >>><https://google.github.io/styleguide/javaguide.html#s5.
>> >>> >> 2.3-method-names
>> >>> >> > >,
>> >>> >> > >>>> which states "Method names are typically verbs or verb
>> >>> >phrases".
>> >>> >> > >But
>> >>> >> > >>>even
>> >>> >> > >>>> in Google code there are lots of exceptions when it makes
>> >>> >sense,
>> >>> >> > >like
>> >>> >> > >>>> Guava's
>> >>> >> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(), the
>> >>> >entire
>> >>> >> > >>>> Predicates module, etc. Just an aside; Beam isn't Google
>> >code.
>> >>> >I
>> >>> >> > >>>suggest we
>> >>> >> > >>>> use our judgment rather than a policy.
>> >>> >> > >>>>
>> >>> >> > >>>> I think "Distinct" is one of those exceptions. It is a
>> >>> >standard
>> >>> >> > >>>widespread
>> >>> >> > >>>> name and also reads better as an adjective. I prefer it,
>> >but
>> >>> >also
>> >>> >> > >>>don't
>> >>> >> > >>>> care strongly enough to change it or to change it back :-)
>> >>> >> > >>>>
>> >>> >> > >>>> If we must have a verb, I like it as-is more than
>> >MakeDistinct
>> >>> >and
>> >>> >> > >>>> AvoidDuplicate.
>> >>> >> > >>>>
>> >>> >> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson
>> >>> >> > >>><je...@smokinghand.com>
>> >>> >> > >>>> wrote:
>> >>> >> > >>>>
>> >>> >> > >>>> > My original thought for this change was that Crunch uses
>> >the
>> >>> >> > >class
>> >>> >> > >>>name
>> >>> >> > >>>> > Distinct. SQL also uses the keyword distinct.
>> >>> >> > >>>> >
>> >>> >> > >>>> > Maybe the rule should be changed to adjectives or verbs
>> >>> >depending
>> >>> >> > >>>on the
>> >>> >> > >>>> > context.
>> >>> >> > >>>> >
>> >>> >> > >>>> > Using a verb to describe this class really doesn't
>> >connote
>> >>> >what
>> >>> >> > >the
>> >>> >> > >>>class
>> >>> >> > >>>> > does as succinctly as the adjective.
>> >>> >> > >>>> >
>> >>> >> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian
>> >>> >> > >>><ns...@cloudera.com>
>> >>> >> > >>>> > wrote:
>> >>> >> > >>>> >
>> >>> >> > >>>> > > Hello,
>> >>> >> > >>>> > >
>> >>> >> > >>>> > > First of all, thank you to Daniel, Robert and Jesse
>> >for
>> >>> >their
>> >>> >> > >>>review on
>> >>> >> > >>>> > > this: https://issues.apache.org/jira/browse/BEAM-239
>> >>> >> > >>>> > >
>> >>> >> > >>>> > > A point that came up was using verbs explicitly for
>> >>> >Transforms.
>> >>> >> > >>>> > > Here is the PR:
>> >>> >> > >>>https://github.com/apache/incubator-beam/pull/1164
>> >>> >> > >>>> > >
>> >>> >> > >>>> > > Posting it to help understand if we have a consensus
>> >for
>> >>> >it and
>> >>> >> > >>>if yes,
>> >>> >> > >>>> > we
>> >>> >> > >>>> > > could perhaps document it for future changes.
>> >>> >> > >>>> > >
>> >>> >> > >>>> > > Thank you.
>> >>> >> > >>>> > >
>> >>> >> > >>>> > > --
>> >>> >> > >>>> > > Neelesh Srinivas Salian
>> >>> >> > >>>> > > Engineer
>> >>> >> > >>>> > >
>> >>> >> > >>>> >
>> >>> >> > >>>>
>> >>> >> >
>> >>> >>
>> >>>
>>
>
>
>
> --
> Neelesh Srinivas Salian
> Customer Operations Engineer
>
>
>
>


-- 
Neelesh Srinivas Salian
Customer Operations Engineer

Re: [DISCUSS] Using Verbs for Transforms

Posted by Neelesh Salian <ns...@cloudera.com>.
Will do. Feel free to chime in, if I missed anything.

On Tue, Nov 1, 2016 at 9:16 PM, Jesse Anderson <je...@smokinghand.com>
wrote:

> @Neelesh Could you write an email to the user list explaining the change
> since it is a breaking change?
>
> On Tue, Nov 1, 2016 at 10:08 PM Neelesh Salian <ns...@cloudera.com>
> wrote:
>
> > Thanks everyone. The PR was merged.
> >
> >
> > On Thu, Oct 27, 2016 at 11:51 AM, Neelesh Salian <ns...@cloudera.com>
> > wrote:
> >
> > > Thanks everyone for all the inputs.
> > > It's really encouraging for a new contributor, as myself, to get
> valuable
> > > input and mentoring (like on this thread) and, in turn, help make the
> > > community better.
> > >
> > >
> > >
> > > On Thu, Oct 27, 2016 at 11:41 AM, Jean-Baptiste Onofré <
> jb@nanthrax.net>
> > > wrote:
> > >
> > >> You did well ! It's an interesting discussion we have and it's great
> to
> > >> have it on the mailing list (better than in Jira or PR comments IMHO).
> > >>
> > >> Thanks !
> > >>
> > >> Regards
> > >> JB
> > >>
> > >> ⁣​
> > >>
> > >> On Oct 27, 2016, 20:39, at 20:39, Robert Bradshaw
> > >> <ro...@google.com.INVALID> wrote:
> > >> >+1 to all Dan says.
> > >> >
> > >> >I only brought this up because it seemed new contributors (yay)
> > >> >jumping in and renaming a core transform based on "Something to
> > >> >consider" deserved a couple more more eyeballs, but didn't intend for
> > >> >it to become a big deal.
> > >> >
> > >> >On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin
> > >> ><dh...@google.com.invalid> wrote:
> > >> >> Folks, I don't think this needs to be a "vote". This is just not
> that
> > >> >big a
> > >> >> deal :). It is important to be transparent and have these
> discussions
> > >> >on
> > >> >> the list, which is why we brought it here from GitHub/JIRA, but at
> > >> >the end
> > >> >> of the day I hope that a small group of committers and developers
> can
> > >> >> assess "good enough" consensus for these minor issues.
> > >> >>
> > >> >> Here's my assessment:
> > >> >> * We don't really have any rules about naming transforms. "Should
> be
> > >> >a
> > >> >> verb" is a sort of guiding principle inherited from the Google
> Flume
> > >> >> project from which Dataflow evolved, but honestly we violate this
> > >> >rule for
> > >> >> clarity all over the place. ("Values", for example).
> > >> >> * The "Big Data" community is significantly more familiar with the
> > >> >concept
> > >> >> of Distinct -- Jesse, who filed the original JIRA, is a good
> example
> > >> >here.
> > >> >> * Finally, nobody feels very strongly. We could argue minor points
> of
> > >> >each
> > >> >> solution, but at the end of the day I don't think anyone wants to
> > >> >block a
> > >> >> change.
> > >> >>
> > >> >> Let's go with Distinct. It's important to align Beam with the open
> > >> >source
> > >> >> big data community. (And thanks Jesse, our newest (*tied)
> committer,
> > >> >for
> > >> >> pushing us in the right direction!)
> > >> >>
> > >> >> Jesse, can you please take charge of wrapping up the PR and merging
> > >> >it?
> > >> >>
> > >> >> Thanks!
> > >> >> Dan
> > >> >>
> > >> >> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré
> > >> ><jb...@nanthrax.net>
> > >> >> wrote:
> > >> >>
> > >> >>> Just to clarify. Davor is right for a code modification change: -1
> > >> >means a
> > >> >>> veto.
> > >> >>> I meant that -1 is not a veto for a release vote.
> > >> >>>
> > >> >>> Anyway, even if it's not a formal code, we can have a discussion
> > >> >with
> > >> >>> "options" a,b and c.
> > >> >>>
> > >> >>> Regards
> > >> >>> JB
> > >> >>>
> > >> >>> ⁣
> > >> >>>
> > >> >>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci
> > >> ><da...@google.com.INVALID>
> > >> >>> wrote:
> > >> >>> >In terms of reaching a decision on any code or design changes,
> > >> >>> >including
> > >> >>> >this one, I'd suggest going without formal votes. Voting process
> > >> >for
> > >> >>> >code
> > >> >>> >modifications between choices A and B doesn't necessarily end
> with
> > >> >a
> > >> >>> >decision A or B -- a single (qualified) -1 vote is a veto and
> > >> >cannot be
> > >> >>> >overridden [1]. Said differently, the guideline is that code
> > >> >changes
> > >> >>> >should
> > >> >>> >be made by consensus; not by one group outvoting another. I'd
> like
> > >> >to
> > >> >>> >avoid
> > >> >>> >setting such precedent; we should try to drive consensus, as
> > >> >opposed to
> > >> >>> >attempting to outvote another part of the community.
> > >> >>> >
> > >> >>> >In this particular case, we have had a great discussion. Many
> > >> >>> >contributors
> > >> >>> >brought different perspectives. Consequently, some opinions have
> > >> >been
> > >> >>> >likely changed. At this point, someone should summarize the
> > >> >arguments,
> > >> >>> >try
> > >> >>> >to critique them from a neutral standpoint, and suggest a refined
> > >> >>> >proposal
> > >> >>> >that takes these perspectives into account. If nobody objects in
> a
> > >> >>> >short
> > >> >>> >time, we should consider this decided. [ I can certainly help
> here,
> > >> >but
> > >> >>> >I'd
> > >> >>> >love to see somebody else do it! ]
> > >> >>> >
> > >> >>> >[1] http://www.apache.org/foundation/voting.html
> > >> >>> >
> > >> >>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
> > >> >>> ><bc...@google.com.invalid>
> > >> >>> >wrote:
> > >> >>> >
> > >> >>> >> I also like Distinct since it doesn't make it sound like it
> > >> >modifies
> > >> >>> >any
> > >> >>> >> underlying collection. RemoveDuplicates makes it sound like the
> > >> >>> >duplicates
> > >> >>> >> are removed, rather than a new PCollection without duplicates
> > >> >being
> > >> >>> >> returned.
> > >> >>> >>
> > >> >>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré
> > >> ><jb...@nanthrax.net>
> > >> >>> >> wrote:
> > >> >>> >>
> > >> >>> >> > Agree. It was more a transition proposal.
> > >> >>> >> >
> > >> >>> >> > Regards
> > >> >>> >> > JB
> > >> >>> >> >
> > >> >>> >> > ⁣
> > >> >>> >> >
> > >> >>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
> > >> >>> >> > <ro...@google.com.INVALID> wrote:
> > >> >>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
> > >> >>> >> > ><jb...@nanthrax.net> wrote:
> > >> >>> >> > >> And what about use RemoveDuplicates and create an alias
> > >> >Distinct
> > >> >>> >?
> > >> >>> >> > >
> > >> >>> >> > >I'd really like to avoid (long term) aliases--you end up
> > >> >having to
> > >> >>> >> > >document (and maintain) them both, and it adds confusion as
> to
> > >> >>> >which
> > >> >>> >> > >one to use (especially if they every diverge), and means
> > >> >searching
> > >> >>> >for
> > >> >>> >> > >one or the other yields half the results.
> > >> >>> >> > >
> > >> >>> >> > >> It doesn't break the API and would address both SQL users
> > >> >and
> > >> >>> >more
> > >> >>> >> > >"big data" users.
> > >> >>> >> > >>
> > >> >>> >> > >> My $0.01 ;)
> > >> >>> >> > >>
> > >> >>> >> > >> Regards
> > >> >>> >> > >> JB
> > >> >>> >> > >>
> > >> >>> >> > >> ⁣
> > >> >>> >> > >>
> > >> >>> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
> > >> >>> >> > ><dh...@google.com.INVALID> wrote:
> > >> >>> >> > >>>I find "MakeDistinct" more confusing. My votes in
> decreasing
> > >> >>> >> > >>>preference:
> > >> >>> >> > >>>
> > >> >>> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important
> > >> >keywords
> > >> >>> >are
> > >> >>> >> > >in
> > >> >>> >> > >>>the
> > >> >>> >> > >>>Javadoc. This reduces churn on our users and is honestly
> > >> >pretty
> > >> >>> >dang
> > >> >>> >> > >>> descriptive.
> > >> >>> >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL
> user
> > >> >and
> > >> >>> >> > >likely
> > >> >>> >> > >>>less clear otherwise. This is a backwards-incompatible API
> > >> >>> >change, so
> > >> >>> >> > >>>we
> > >> >>> >> > >>>should do it before we go stable.
> > >> >>> >> > >>>
> > >> >>> >> > >>>I am not super strong that 1 > 2, but I am very strong
> that
> > >> >>> >> > >"Distinct"
> > >> >>> >> > >>>>>>
> > >> >>> >> > >>>"MakeDistinct" or and "RemoveDuplicates" >>>
> > >> >"AvoidDuplicate".
> > >> >>> >> > >>>
> > >> >>> >> > >>>Dan
> > >> >>> >> > >>>
> > >> >>> >> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
> > >> >>> >> > >>><kl...@google.com.invalid>
> > >> >>> >> > >>>wrote:
> > >> >>> >> > >>>
> > >> >>> >> > >>>> The precedent that we use verbs has many exceptions. We
> > >> >have
> > >> >>> >> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and
> I
> > >> >>> >would
> > >> >>> >> > >even
> > >> >>> >> > >>>> include Sum (at least when I read it).
> > >> >>> >> > >>>>
> > >> >>> >> > >>>> Historical note: the predilection towards verbs is from
> > >> >the
> > >> >>> >Google
> > >> >>> >> > >>>Style
> > >> >>> >> > >>>> Guide for Java method names
> > >> >>> >> > >>>>
> > >> >>> >> > >>><https://google.github.io/styleguide/javaguide.html#s5.
> > >> >>> >> 2.3-method-names
> > >> >>> >> > >,
> > >> >>> >> > >>>> which states "Method names are typically verbs or verb
> > >> >>> >phrases".
> > >> >>> >> > >But
> > >> >>> >> > >>>even
> > >> >>> >> > >>>> in Google code there are lots of exceptions when it
> makes
> > >> >>> >sense,
> > >> >>> >> > >like
> > >> >>> >> > >>>> Guava's
> > >> >>> >> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(),
> the
> > >> >>> >entire
> > >> >>> >> > >>>> Predicates module, etc. Just an aside; Beam isn't Google
> > >> >code.
> > >> >>> >I
> > >> >>> >> > >>>suggest we
> > >> >>> >> > >>>> use our judgment rather than a policy.
> > >> >>> >> > >>>>
> > >> >>> >> > >>>> I think "Distinct" is one of those exceptions. It is a
> > >> >>> >standard
> > >> >>> >> > >>>widespread
> > >> >>> >> > >>>> name and also reads better as an adjective. I prefer it,
> > >> >but
> > >> >>> >also
> > >> >>> >> > >>>don't
> > >> >>> >> > >>>> care strongly enough to change it or to change it back
> :-)
> > >> >>> >> > >>>>
> > >> >>> >> > >>>> If we must have a verb, I like it as-is more than
> > >> >MakeDistinct
> > >> >>> >and
> > >> >>> >> > >>>> AvoidDuplicate.
> > >> >>> >> > >>>>
> > >> >>> >> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson
> > >> >>> >> > >>><je...@smokinghand.com>
> > >> >>> >> > >>>> wrote:
> > >> >>> >> > >>>>
> > >> >>> >> > >>>> > My original thought for this change was that Crunch
> uses
> > >> >the
> > >> >>> >> > >class
> > >> >>> >> > >>>name
> > >> >>> >> > >>>> > Distinct. SQL also uses the keyword distinct.
> > >> >>> >> > >>>> >
> > >> >>> >> > >>>> > Maybe the rule should be changed to adjectives or
> verbs
> > >> >>> >depending
> > >> >>> >> > >>>on the
> > >> >>> >> > >>>> > context.
> > >> >>> >> > >>>> >
> > >> >>> >> > >>>> > Using a verb to describe this class really doesn't
> > >> >connote
> > >> >>> >what
> > >> >>> >> > >the
> > >> >>> >> > >>>class
> > >> >>> >> > >>>> > does as succinctly as the adjective.
> > >> >>> >> > >>>> >
> > >> >>> >> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian
> > >> >>> >> > >>><ns...@cloudera.com>
> > >> >>> >> > >>>> > wrote:
> > >> >>> >> > >>>> >
> > >> >>> >> > >>>> > > Hello,
> > >> >>> >> > >>>> > >
> > >> >>> >> > >>>> > > First of all, thank you to Daniel, Robert and Jesse
> > >> >for
> > >> >>> >their
> > >> >>> >> > >>>review on
> > >> >>> >> > >>>> > > this: https://issues.apache.org/
> jira/browse/BEAM-239
> > >> >>> >> > >>>> > >
> > >> >>> >> > >>>> > > A point that came up was using verbs explicitly for
> > >> >>> >Transforms.
> > >> >>> >> > >>>> > > Here is the PR:
> > >> >>> >> > >>>https://github.com/apache/incubator-beam/pull/1164
> > >> >>> >> > >>>> > >
> > >> >>> >> > >>>> > > Posting it to help understand if we have a consensus
> > >> >for
> > >> >>> >it and
> > >> >>> >> > >>>if yes,
> > >> >>> >> > >>>> > we
> > >> >>> >> > >>>> > > could perhaps document it for future changes.
> > >> >>> >> > >>>> > >
> > >> >>> >> > >>>> > > Thank you.
> > >> >>> >> > >>>> > >
> > >> >>> >> > >>>> > > --
> > >> >>> >> > >>>> > > Neelesh Srinivas Salian
> > >> >>> >> > >>>> > > Engineer
> > >> >>> >> > >>>> > >
> > >> >>> >> > >>>> >
> > >> >>> >> > >>>>
> > >> >>> >> >
> > >> >>> >>
> > >> >>>
> > >>
> > >
> > >
> > >
> > > --
> > > Neelesh Srinivas Salian
> > > Customer Operations Engineer
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Neelesh Srinivas Salian
> > Customer Operations Engineer
> >
>



-- 
Neelesh Srinivas Salian
Customer Operations Engineer

Re: [DISCUSS] Using Verbs for Transforms

Posted by Jesse Anderson <je...@smokinghand.com>.
@Neelesh Could you write an email to the user list explaining the change
since it is a breaking change?

On Tue, Nov 1, 2016 at 10:08 PM Neelesh Salian <ns...@cloudera.com> wrote:

> Thanks everyone. The PR was merged.
>
>
> On Thu, Oct 27, 2016 at 11:51 AM, Neelesh Salian <ns...@cloudera.com>
> wrote:
>
> > Thanks everyone for all the inputs.
> > It's really encouraging for a new contributor, as myself, to get valuable
> > input and mentoring (like on this thread) and, in turn, help make the
> > community better.
> >
> >
> >
> > On Thu, Oct 27, 2016 at 11:41 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> > wrote:
> >
> >> You did well ! It's an interesting discussion we have and it's great to
> >> have it on the mailing list (better than in Jira or PR comments IMHO).
> >>
> >> Thanks !
> >>
> >> Regards
> >> JB
> >>
> >> ⁣​
> >>
> >> On Oct 27, 2016, 20:39, at 20:39, Robert Bradshaw
> >> <ro...@google.com.INVALID> wrote:
> >> >+1 to all Dan says.
> >> >
> >> >I only brought this up because it seemed new contributors (yay)
> >> >jumping in and renaming a core transform based on "Something to
> >> >consider" deserved a couple more more eyeballs, but didn't intend for
> >> >it to become a big deal.
> >> >
> >> >On Thu, Oct 27, 2016 at 11:03 AM, Dan Halperin
> >> ><dh...@google.com.invalid> wrote:
> >> >> Folks, I don't think this needs to be a "vote". This is just not that
> >> >big a
> >> >> deal :). It is important to be transparent and have these discussions
> >> >on
> >> >> the list, which is why we brought it here from GitHub/JIRA, but at
> >> >the end
> >> >> of the day I hope that a small group of committers and developers can
> >> >> assess "good enough" consensus for these minor issues.
> >> >>
> >> >> Here's my assessment:
> >> >> * We don't really have any rules about naming transforms. "Should be
> >> >a
> >> >> verb" is a sort of guiding principle inherited from the Google Flume
> >> >> project from which Dataflow evolved, but honestly we violate this
> >> >rule for
> >> >> clarity all over the place. ("Values", for example).
> >> >> * The "Big Data" community is significantly more familiar with the
> >> >concept
> >> >> of Distinct -- Jesse, who filed the original JIRA, is a good example
> >> >here.
> >> >> * Finally, nobody feels very strongly. We could argue minor points of
> >> >each
> >> >> solution, but at the end of the day I don't think anyone wants to
> >> >block a
> >> >> change.
> >> >>
> >> >> Let's go with Distinct. It's important to align Beam with the open
> >> >source
> >> >> big data community. (And thanks Jesse, our newest (*tied) committer,
> >> >for
> >> >> pushing us in the right direction!)
> >> >>
> >> >> Jesse, can you please take charge of wrapping up the PR and merging
> >> >it?
> >> >>
> >> >> Thanks!
> >> >> Dan
> >> >>
> >> >> On Wed, Oct 26, 2016 at 11:12 PM, Jean-Baptiste Onofré
> >> ><jb...@nanthrax.net>
> >> >> wrote:
> >> >>
> >> >>> Just to clarify. Davor is right for a code modification change: -1
> >> >means a
> >> >>> veto.
> >> >>> I meant that -1 is not a veto for a release vote.
> >> >>>
> >> >>> Anyway, even if it's not a formal code, we can have a discussion
> >> >with
> >> >>> "options" a,b and c.
> >> >>>
> >> >>> Regards
> >> >>> JB
> >> >>>
> >> >>> ⁣
> >> >>>
> >> >>> On Oct 27, 2016, 06:48, at 06:48, Davor Bonaci
> >> ><da...@google.com.INVALID>
> >> >>> wrote:
> >> >>> >In terms of reaching a decision on any code or design changes,
> >> >>> >including
> >> >>> >this one, I'd suggest going without formal votes. Voting process
> >> >for
> >> >>> >code
> >> >>> >modifications between choices A and B doesn't necessarily end with
> >> >a
> >> >>> >decision A or B -- a single (qualified) -1 vote is a veto and
> >> >cannot be
> >> >>> >overridden [1]. Said differently, the guideline is that code
> >> >changes
> >> >>> >should
> >> >>> >be made by consensus; not by one group outvoting another. I'd like
> >> >to
> >> >>> >avoid
> >> >>> >setting such precedent; we should try to drive consensus, as
> >> >opposed to
> >> >>> >attempting to outvote another part of the community.
> >> >>> >
> >> >>> >In this particular case, we have had a great discussion. Many
> >> >>> >contributors
> >> >>> >brought different perspectives. Consequently, some opinions have
> >> >been
> >> >>> >likely changed. At this point, someone should summarize the
> >> >arguments,
> >> >>> >try
> >> >>> >to critique them from a neutral standpoint, and suggest a refined
> >> >>> >proposal
> >> >>> >that takes these perspectives into account. If nobody objects in a
> >> >>> >short
> >> >>> >time, we should consider this decided. [ I can certainly help here,
> >> >but
> >> >>> >I'd
> >> >>> >love to see somebody else do it! ]
> >> >>> >
> >> >>> >[1] http://www.apache.org/foundation/voting.html
> >> >>> >
> >> >>> >On Wed, Oct 26, 2016 at 7:35 AM, Ben Chambers
> >> >>> ><bc...@google.com.invalid>
> >> >>> >wrote:
> >> >>> >
> >> >>> >> I also like Distinct since it doesn't make it sound like it
> >> >modifies
> >> >>> >any
> >> >>> >> underlying collection. RemoveDuplicates makes it sound like the
> >> >>> >duplicates
> >> >>> >> are removed, rather than a new PCollection without duplicates
> >> >being
> >> >>> >> returned.
> >> >>> >>
> >> >>> >> On Wed, Oct 26, 2016, 7:36 AM Jean-Baptiste Onofré
> >> ><jb...@nanthrax.net>
> >> >>> >> wrote:
> >> >>> >>
> >> >>> >> > Agree. It was more a transition proposal.
> >> >>> >> >
> >> >>> >> > Regards
> >> >>> >> > JB
> >> >>> >> >
> >> >>> >> > ⁣
> >> >>> >> >
> >> >>> >> > On Oct 26, 2016, 08:31, at 08:31, Robert Bradshaw
> >> >>> >> > <ro...@google.com.INVALID> wrote:
> >> >>> >> > >On Mon, Oct 24, 2016 at 11:02 PM, Jean-Baptiste Onofré
> >> >>> >> > ><jb...@nanthrax.net> wrote:
> >> >>> >> > >> And what about use RemoveDuplicates and create an alias
> >> >Distinct
> >> >>> >?
> >> >>> >> > >
> >> >>> >> > >I'd really like to avoid (long term) aliases--you end up
> >> >having to
> >> >>> >> > >document (and maintain) them both, and it adds confusion as to
> >> >>> >which
> >> >>> >> > >one to use (especially if they every diverge), and means
> >> >searching
> >> >>> >for
> >> >>> >> > >one or the other yields half the results.
> >> >>> >> > >
> >> >>> >> > >> It doesn't break the API and would address both SQL users
> >> >and
> >> >>> >more
> >> >>> >> > >"big data" users.
> >> >>> >> > >>
> >> >>> >> > >> My $0.01 ;)
> >> >>> >> > >>
> >> >>> >> > >> Regards
> >> >>> >> > >> JB
> >> >>> >> > >>
> >> >>> >> > >> ⁣
> >> >>> >> > >>
> >> >>> >> > >> On Oct 24, 2016, 22:23, at 22:23, Dan Halperin
> >> >>> >> > ><dh...@google.com.INVALID> wrote:
> >> >>> >> > >>>I find "MakeDistinct" more confusing. My votes in decreasing
> >> >>> >> > >>>preference:
> >> >>> >> > >>>
> >> >>> >> > >>>1. Keep `RemoveDuplicates` name, ensure that important
> >> >keywords
> >> >>> >are
> >> >>> >> > >in
> >> >>> >> > >>>the
> >> >>> >> > >>>Javadoc. This reduces churn on our users and is honestly
> >> >pretty
> >> >>> >dang
> >> >>> >> > >>> descriptive.
> >> >>> >> > >>>2. Rename to `Distinct`, which is clear if you're a SQL user
> >> >and
> >> >>> >> > >likely
> >> >>> >> > >>>less clear otherwise. This is a backwards-incompatible API
> >> >>> >change, so
> >> >>> >> > >>>we
> >> >>> >> > >>>should do it before we go stable.
> >> >>> >> > >>>
> >> >>> >> > >>>I am not super strong that 1 > 2, but I am very strong that
> >> >>> >> > >"Distinct"
> >> >>> >> > >>>>>>
> >> >>> >> > >>>"MakeDistinct" or and "RemoveDuplicates" >>>
> >> >"AvoidDuplicate".
> >> >>> >> > >>>
> >> >>> >> > >>>Dan
> >> >>> >> > >>>
> >> >>> >> > >>>On Mon, Oct 24, 2016 at 10:12 AM, Kenneth Knowles
> >> >>> >> > >>><kl...@google.com.invalid>
> >> >>> >> > >>>wrote:
> >> >>> >> > >>>
> >> >>> >> > >>>> The precedent that we use verbs has many exceptions. We
> >> >have
> >> >>> >> > >>>> ApproximateQuantiles, Values, Keys, WithTimestamps, and I
> >> >>> >would
> >> >>> >> > >even
> >> >>> >> > >>>> include Sum (at least when I read it).
> >> >>> >> > >>>>
> >> >>> >> > >>>> Historical note: the predilection towards verbs is from
> >> >the
> >> >>> >Google
> >> >>> >> > >>>Style
> >> >>> >> > >>>> Guide for Java method names
> >> >>> >> > >>>>
> >> >>> >> > >>><https://google.github.io/styleguide/javaguide.html#s5.
> >> >>> >> 2.3-method-names
> >> >>> >> > >,
> >> >>> >> > >>>> which states "Method names are typically verbs or verb
> >> >>> >phrases".
> >> >>> >> > >But
> >> >>> >> > >>>even
> >> >>> >> > >>>> in Google code there are lots of exceptions when it makes
> >> >>> >sense,
> >> >>> >> > >like
> >> >>> >> > >>>> Guava's
> >> >>> >> > >>>> Iterables.any(), Iterables.all(), Iterables.toArray(), the
> >> >>> >entire
> >> >>> >> > >>>> Predicates module, etc. Just an aside; Beam isn't Google
> >> >code.
> >> >>> >I
> >> >>> >> > >>>suggest we
> >> >>> >> > >>>> use our judgment rather than a policy.
> >> >>> >> > >>>>
> >> >>> >> > >>>> I think "Distinct" is one of those exceptions. It is a
> >> >>> >standard
> >> >>> >> > >>>widespread
> >> >>> >> > >>>> name and also reads better as an adjective. I prefer it,
> >> >but
> >> >>> >also
> >> >>> >> > >>>don't
> >> >>> >> > >>>> care strongly enough to change it or to change it back :-)
> >> >>> >> > >>>>
> >> >>> >> > >>>> If we must have a verb, I like it as-is more than
> >> >MakeDistinct
> >> >>> >and
> >> >>> >> > >>>> AvoidDuplicate.
> >> >>> >> > >>>>
> >> >>> >> > >>>> On Mon, Oct 24, 2016 at 9:46 AM Jesse Anderson
> >> >>> >> > >>><je...@smokinghand.com>
> >> >>> >> > >>>> wrote:
> >> >>> >> > >>>>
> >> >>> >> > >>>> > My original thought for this change was that Crunch uses
> >> >the
> >> >>> >> > >class
> >> >>> >> > >>>name
> >> >>> >> > >>>> > Distinct. SQL also uses the keyword distinct.
> >> >>> >> > >>>> >
> >> >>> >> > >>>> > Maybe the rule should be changed to adjectives or verbs
> >> >>> >depending
> >> >>> >> > >>>on the
> >> >>> >> > >>>> > context.
> >> >>> >> > >>>> >
> >> >>> >> > >>>> > Using a verb to describe this class really doesn't
> >> >connote
> >> >>> >what
> >> >>> >> > >the
> >> >>> >> > >>>class
> >> >>> >> > >>>> > does as succinctly as the adjective.
> >> >>> >> > >>>> >
> >> >>> >> > >>>> > On Mon, Oct 24, 2016 at 9:40 AM Neelesh Salian
> >> >>> >> > >>><ns...@cloudera.com>
> >> >>> >> > >>>> > wrote:
> >> >>> >> > >>>> >
> >> >>> >> > >>>> > > Hello,
> >> >>> >> > >>>> > >
> >> >>> >> > >>>> > > First of all, thank you to Daniel, Robert and Jesse
> >> >for
> >> >>> >their
> >> >>> >> > >>>review on
> >> >>> >> > >>>> > > this: https://issues.apache.org/jira/browse/BEAM-239
> >> >>> >> > >>>> > >
> >> >>> >> > >>>> > > A point that came up was using verbs explicitly for
> >> >>> >Transforms.
> >> >>> >> > >>>> > > Here is the PR:
> >> >>> >> > >>>https://github.com/apache/incubator-beam/pull/1164
> >> >>> >> > >>>> > >
> >> >>> >> > >>>> > > Posting it to help understand if we have a consensus
> >> >for
> >> >>> >it and
> >> >>> >> > >>>if yes,
> >> >>> >> > >>>> > we
> >> >>> >> > >>>> > > could perhaps document it for future changes.
> >> >>> >> > >>>> > >
> >> >>> >> > >>>> > > Thank you.
> >> >>> >> > >>>> > >
> >> >>> >> > >>>> > > --
> >> >>> >> > >>>> > > Neelesh Srinivas Salian
> >> >>> >> > >>>> > > Engineer
> >> >>> >> > >>>> > >
> >> >>> >> > >>>> >
> >> >>> >> > >>>>
> >> >>> >> >
> >> >>> >>
> >> >>>
> >>
> >
> >
> >
> > --
> > Neelesh Srinivas Salian
> > Customer Operations Engineer
> >
> >
> >
> >
>
>
> --
> Neelesh Srinivas Salian
> Customer Operations Engineer
>