You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Dan Halperin <dh...@google.com> on 2016/12/07 23:55:42 UTC

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

+user@, because this is a user-impacting change and they might not all be
paying attention to the dev@ list.

+1

I'm mildly reluctant because this will break all users that have written
composite transforms -- and I'm the jerk that filed the issue (a few times
now, on different iterations of the SDKs). But, like Ben said, I can't
think of a different way to do this that doesn't break users.

Hopefully, with breaking changes pushed to DoFn and to PTransform, the
worst user churn would be over. I can't think of anything quite so invasive.

IMO: if there is, we should try to push all the remaining "major" breaks
through in the same release.

Dan

On Thu, Dec 8, 2016 at 7:48 AM, Aljoscha Krettek <al...@apache.org>
wrote:

> +1
>
> I've seen this mistake myself in some PRs.
>
> On Thu, 8 Dec 2016 at 06:10 Ben Chambers <bc...@google.com.invalid>
> wrote:
>
> > +1 -- This seems like the best option. It's a mechanical change, and the
> > compiler will let users know it needs to be made. It will make the
> mistake
> > much less common, and when it occurs it will be much clearer what is
> wrong.
> >
> > It would be great if we could make the mis-use a compiler problem or a
> > pipeline construction time error without changing the names, but both of
> > these options are not practical. We can't hide the expansion method,
> since
> > it is what PTransform implementations need to override. We can't make
> this
> > a construction time exception since it would require adding code to every
> > PTransform implementation.
> >
> > On Wed, Dec 7, 2016 at 1:55 PM Thomas Groh <tg...@google.com.invalid>
> > wrote:
> >
> > > +1; This is probably the best way to make sure users don't reverse the
> > > polarity of the PCollection flow.
> > >
> > > This also brings PInput.expand(), POutput.expand(), and
> > > PTransform.expand(PInput) into line - namely, for some composite thing,
> > > "represent yourself as some collection of primitives" (potentially
> > > recursively).
> > >
> > > On Wed, Dec 7, 2016 at 1:37 PM, Kenneth Knowles <klk@google.com.invalid
> >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I want to bring up another major backwards-incompatible change before
> > it
> > > is
> > > > too late, to resolve [BEAM-438].
> > > >
> > > > Summary: Leave PInput.apply the same but rename PTransform.apply to
> > > > PTransform.expand. I have opened [PR #1538] just for reference (it
> took
> > > 30
> > > > seconds using IDE automated refactor)
> > > >
> > > > This change affects *PTransform authors* but does *not* affect
> pipeline
> > > > authors.
> > > >
> > > > This issue was filed a long time ago. It has been a problem many
> times
> > > with
> > > > actual users since before Beam started incubating. This is what goes
> > > wrong
> > > > (often):
> > > >
> > > >    PCollection<Foo> input = ...
> > > >    PTransform<PCollection<Foo>, ...> transform = ...
> > > >
> > > >    transform.apply(input)
> > > >
> > > > This type checks and even looks perfectly normal. Do you see the
> error?
> > > >
> > > > ... what we need the user to write is:
> > > >
> > > >     input.apply(transform)
> > > >
> > > > What a confusing difference! After all, the first one type-checks and
> > the
> > > > first one is how you apply a Function or Predicate or
> > > SerializableFunction,
> > > > etc. But it is broken. With transform.apply(input) the transform is
> not
> > > > registered with the pipeline at all.
> > > >
> > > > We obviously can't (and don't want to) change the most core way that
> > > > pipeline authors use Beam, so PInput.apply (aka PCollection.apply)
> must
> > > > remain the same. But we do need a way to make it impossible to mix
> > these
> > > > up.
> > > >
> > > > The simplest way I can think of is to choose a new name for the other
> > > > method involved. Users probably won't write transform.expand(input)
> > since
> > > > they will never have seen it in any examples, etc. This will just
> make
> > > > PTransform authors need to do a global rename, and the type system
> will
> > > > direct them to all cases so there is no silent failure possible.
> > > >
> > > > What do you think?
> > > >
> > > > Kenn
> > > >
> > > > [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> > > > [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> > > >
> > > > p.s. there is a really amusing and confusing call chain:
> > > PCollection.apply
> > > > -> Pipeline.applyTransform -> Pipeline.applyInternal ->
> > > > PipelineRunner.apply -> PTransform.apply
> > > >
> > > > After this change and work to get the runner out of the loop, it
> > becomes
> > > > PCollection.apply -> Pipeline.applyTransform -> PTransform.expand
> > > >
> > >
> >
>

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

Posted by Kenneth Knowles <kl...@google.com>.
Hi users@,

This has been done, and will show up in the next incubating release.

Kenn

On Wed, Dec 7, 2016 at 5:13 PM, Ben Chambers <bc...@google.com.invalid>
wrote:

> +1 to pushing all remaining "major" (read likely to affect everyone) breaks
> through in a single release.
>
> On Wed, Dec 7, 2016 at 3:56 PM Dan Halperin <dh...@google.com.invalid>
> wrote:
>
> > +user@, because this is a user-impacting change and they might not all
> be
> > paying attention to the dev@ list.
> >
> > +1
> >
> > I'm mildly reluctant because this will break all users that have written
> > composite transforms -- and I'm the jerk that filed the issue (a few
> times
> > now, on different iterations of the SDKs). But, like Ben said, I can't
> > think of a different way to do this that doesn't break users.
> >
> > Hopefully, with breaking changes pushed to DoFn and to PTransform, the
> > worst user churn would be over. I can't think of anything quite so
> > invasive.
> >
> > IMO: if there is, we should try to push all the remaining "major" breaks
> > through in the same release.
> >
> > Dan
> >
> > On Thu, Dec 8, 2016 at 7:48 AM, Aljoscha Krettek <al...@apache.org>
> > wrote:
> >
> > > +1
> > >
> > > I've seen this mistake myself in some PRs.
> > >
> > > On Thu, 8 Dec 2016 at 06:10 Ben Chambers <bchambers@google.com.invalid
> >
> > > wrote:
> > >
> > > > +1 -- This seems like the best option. It's a mechanical change, and
> > the
> > > > compiler will let users know it needs to be made. It will make the
> > > mistake
> > > > much less common, and when it occurs it will be much clearer what is
> > > wrong.
> > > >
> > > > It would be great if we could make the mis-use a compiler problem or
> a
> > > > pipeline construction time error without changing the names, but both
> > of
> > > > these options are not practical. We can't hide the expansion method,
> > > since
> > > > it is what PTransform implementations need to override. We can't make
> > > this
> > > > a construction time exception since it would require adding code to
> > every
> > > > PTransform implementation.
> > > >
> > > > On Wed, Dec 7, 2016 at 1:55 PM Thomas Groh <tgroh@google.com.invalid
> >
> > > > wrote:
> > > >
> > > > > +1; This is probably the best way to make sure users don't reverse
> > the
> > > > > polarity of the PCollection flow.
> > > > >
> > > > > This also brings PInput.expand(), POutput.expand(), and
> > > > > PTransform.expand(PInput) into line - namely, for some composite
> > thing,
> > > > > "represent yourself as some collection of primitives" (potentially
> > > > > recursively).
> > > > >
> > > > > On Wed, Dec 7, 2016 at 1:37 PM, Kenneth Knowles
> > <klk@google.com.invalid
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I want to bring up another major backwards-incompatible change
> > before
> > > > it
> > > > > is
> > > > > > too late, to resolve [BEAM-438].
> > > > > >
> > > > > > Summary: Leave PInput.apply the same but rename PTransform.apply
> to
> > > > > > PTransform.expand. I have opened [PR #1538] just for reference
> (it
> > > took
> > > > > 30
> > > > > > seconds using IDE automated refactor)
> > > > > >
> > > > > > This change affects *PTransform authors* but does *not* affect
> > > pipeline
> > > > > > authors.
> > > > > >
> > > > > > This issue was filed a long time ago. It has been a problem many
> > > times
> > > > > with
> > > > > > actual users since before Beam started incubating. This is what
> > goes
> > > > > wrong
> > > > > > (often):
> > > > > >
> > > > > >    PCollection<Foo> input = ...
> > > > > >    PTransform<PCollection<Foo>, ...> transform = ...
> > > > > >
> > > > > >    transform.apply(input)
> > > > > >
> > > > > > This type checks and even looks perfectly normal. Do you see the
> > > error?
> > > > > >
> > > > > > ... what we need the user to write is:
> > > > > >
> > > > > >     input.apply(transform)
> > > > > >
> > > > > > What a confusing difference! After all, the first one type-checks
> > and
> > > > the
> > > > > > first one is how you apply a Function or Predicate or
> > > > > SerializableFunction,
> > > > > > etc. But it is broken. With transform.apply(input) the transform
> is
> > > not
> > > > > > registered with the pipeline at all.
> > > > > >
> > > > > > We obviously can't (and don't want to) change the most core way
> > that
> > > > > > pipeline authors use Beam, so PInput.apply (aka
> PCollection.apply)
> > > must
> > > > > > remain the same. But we do need a way to make it impossible to
> mix
> > > > these
> > > > > > up.
> > > > > >
> > > > > > The simplest way I can think of is to choose a new name for the
> > other
> > > > > > method involved. Users probably won't write
> transform.expand(input)
> > > > since
> > > > > > they will never have seen it in any examples, etc. This will just
> > > make
> > > > > > PTransform authors need to do a global rename, and the type
> system
> > > will
> > > > > > direct them to all cases so there is no silent failure possible.
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Kenn
> > > > > >
> > > > > > [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> > > > > > [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> > > > > >
> > > > > > p.s. there is a really amusing and confusing call chain:
> > > > > PCollection.apply
> > > > > > -> Pipeline.applyTransform -> Pipeline.applyInternal ->
> > > > > > PipelineRunner.apply -> PTransform.apply
> > > > > >
> > > > > > After this change and work to get the runner out of the loop, it
> > > > becomes
> > > > > > PCollection.apply -> Pipeline.applyTransform -> PTransform.expand
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

Posted by Ben Chambers <bc...@google.com>.
+1 to pushing all remaining "major" (read likely to affect everyone) breaks
through in a single release.

On Wed, Dec 7, 2016 at 3:56 PM Dan Halperin <dh...@google.com.invalid>
wrote:

> +user@, because this is a user-impacting change and they might not all be
> paying attention to the dev@ list.
>
> +1
>
> I'm mildly reluctant because this will break all users that have written
> composite transforms -- and I'm the jerk that filed the issue (a few times
> now, on different iterations of the SDKs). But, like Ben said, I can't
> think of a different way to do this that doesn't break users.
>
> Hopefully, with breaking changes pushed to DoFn and to PTransform, the
> worst user churn would be over. I can't think of anything quite so
> invasive.
>
> IMO: if there is, we should try to push all the remaining "major" breaks
> through in the same release.
>
> Dan
>
> On Thu, Dec 8, 2016 at 7:48 AM, Aljoscha Krettek <al...@apache.org>
> wrote:
>
> > +1
> >
> > I've seen this mistake myself in some PRs.
> >
> > On Thu, 8 Dec 2016 at 06:10 Ben Chambers <bc...@google.com.invalid>
> > wrote:
> >
> > > +1 -- This seems like the best option. It's a mechanical change, and
> the
> > > compiler will let users know it needs to be made. It will make the
> > mistake
> > > much less common, and when it occurs it will be much clearer what is
> > wrong.
> > >
> > > It would be great if we could make the mis-use a compiler problem or a
> > > pipeline construction time error without changing the names, but both
> of
> > > these options are not practical. We can't hide the expansion method,
> > since
> > > it is what PTransform implementations need to override. We can't make
> > this
> > > a construction time exception since it would require adding code to
> every
> > > PTransform implementation.
> > >
> > > On Wed, Dec 7, 2016 at 1:55 PM Thomas Groh <tg...@google.com.invalid>
> > > wrote:
> > >
> > > > +1; This is probably the best way to make sure users don't reverse
> the
> > > > polarity of the PCollection flow.
> > > >
> > > > This also brings PInput.expand(), POutput.expand(), and
> > > > PTransform.expand(PInput) into line - namely, for some composite
> thing,
> > > > "represent yourself as some collection of primitives" (potentially
> > > > recursively).
> > > >
> > > > On Wed, Dec 7, 2016 at 1:37 PM, Kenneth Knowles
> <klk@google.com.invalid
> > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I want to bring up another major backwards-incompatible change
> before
> > > it
> > > > is
> > > > > too late, to resolve [BEAM-438].
> > > > >
> > > > > Summary: Leave PInput.apply the same but rename PTransform.apply to
> > > > > PTransform.expand. I have opened [PR #1538] just for reference (it
> > took
> > > > 30
> > > > > seconds using IDE automated refactor)
> > > > >
> > > > > This change affects *PTransform authors* but does *not* affect
> > pipeline
> > > > > authors.
> > > > >
> > > > > This issue was filed a long time ago. It has been a problem many
> > times
> > > > with
> > > > > actual users since before Beam started incubating. This is what
> goes
> > > > wrong
> > > > > (often):
> > > > >
> > > > >    PCollection<Foo> input = ...
> > > > >    PTransform<PCollection<Foo>, ...> transform = ...
> > > > >
> > > > >    transform.apply(input)
> > > > >
> > > > > This type checks and even looks perfectly normal. Do you see the
> > error?
> > > > >
> > > > > ... what we need the user to write is:
> > > > >
> > > > >     input.apply(transform)
> > > > >
> > > > > What a confusing difference! After all, the first one type-checks
> and
> > > the
> > > > > first one is how you apply a Function or Predicate or
> > > > SerializableFunction,
> > > > > etc. But it is broken. With transform.apply(input) the transform is
> > not
> > > > > registered with the pipeline at all.
> > > > >
> > > > > We obviously can't (and don't want to) change the most core way
> that
> > > > > pipeline authors use Beam, so PInput.apply (aka PCollection.apply)
> > must
> > > > > remain the same. But we do need a way to make it impossible to mix
> > > these
> > > > > up.
> > > > >
> > > > > The simplest way I can think of is to choose a new name for the
> other
> > > > > method involved. Users probably won't write transform.expand(input)
> > > since
> > > > > they will never have seen it in any examples, etc. This will just
> > make
> > > > > PTransform authors need to do a global rename, and the type system
> > will
> > > > > direct them to all cases so there is no silent failure possible.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Kenn
> > > > >
> > > > > [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> > > > > [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> > > > >
> > > > > p.s. there is a really amusing and confusing call chain:
> > > > PCollection.apply
> > > > > -> Pipeline.applyTransform -> Pipeline.applyInternal ->
> > > > > PipelineRunner.apply -> PTransform.apply
> > > > >
> > > > > After this change and work to get the runner out of the loop, it
> > > becomes
> > > > > PCollection.apply -> Pipeline.applyTransform -> PTransform.expand
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] [BEAM-438] Rename one of PTransform.apply or PInput.apply

Posted by Ben Chambers <bc...@google.com.INVALID>.
+1 to pushing all remaining "major" (read likely to affect everyone) breaks
through in a single release.

On Wed, Dec 7, 2016 at 3:56 PM Dan Halperin <dh...@google.com.invalid>
wrote:

> +user@, because this is a user-impacting change and they might not all be
> paying attention to the dev@ list.
>
> +1
>
> I'm mildly reluctant because this will break all users that have written
> composite transforms -- and I'm the jerk that filed the issue (a few times
> now, on different iterations of the SDKs). But, like Ben said, I can't
> think of a different way to do this that doesn't break users.
>
> Hopefully, with breaking changes pushed to DoFn and to PTransform, the
> worst user churn would be over. I can't think of anything quite so
> invasive.
>
> IMO: if there is, we should try to push all the remaining "major" breaks
> through in the same release.
>
> Dan
>
> On Thu, Dec 8, 2016 at 7:48 AM, Aljoscha Krettek <al...@apache.org>
> wrote:
>
> > +1
> >
> > I've seen this mistake myself in some PRs.
> >
> > On Thu, 8 Dec 2016 at 06:10 Ben Chambers <bc...@google.com.invalid>
> > wrote:
> >
> > > +1 -- This seems like the best option. It's a mechanical change, and
> the
> > > compiler will let users know it needs to be made. It will make the
> > mistake
> > > much less common, and when it occurs it will be much clearer what is
> > wrong.
> > >
> > > It would be great if we could make the mis-use a compiler problem or a
> > > pipeline construction time error without changing the names, but both
> of
> > > these options are not practical. We can't hide the expansion method,
> > since
> > > it is what PTransform implementations need to override. We can't make
> > this
> > > a construction time exception since it would require adding code to
> every
> > > PTransform implementation.
> > >
> > > On Wed, Dec 7, 2016 at 1:55 PM Thomas Groh <tg...@google.com.invalid>
> > > wrote:
> > >
> > > > +1; This is probably the best way to make sure users don't reverse
> the
> > > > polarity of the PCollection flow.
> > > >
> > > > This also brings PInput.expand(), POutput.expand(), and
> > > > PTransform.expand(PInput) into line - namely, for some composite
> thing,
> > > > "represent yourself as some collection of primitives" (potentially
> > > > recursively).
> > > >
> > > > On Wed, Dec 7, 2016 at 1:37 PM, Kenneth Knowles
> <klk@google.com.invalid
> > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I want to bring up another major backwards-incompatible change
> before
> > > it
> > > > is
> > > > > too late, to resolve [BEAM-438].
> > > > >
> > > > > Summary: Leave PInput.apply the same but rename PTransform.apply to
> > > > > PTransform.expand. I have opened [PR #1538] just for reference (it
> > took
> > > > 30
> > > > > seconds using IDE automated refactor)
> > > > >
> > > > > This change affects *PTransform authors* but does *not* affect
> > pipeline
> > > > > authors.
> > > > >
> > > > > This issue was filed a long time ago. It has been a problem many
> > times
> > > > with
> > > > > actual users since before Beam started incubating. This is what
> goes
> > > > wrong
> > > > > (often):
> > > > >
> > > > >    PCollection<Foo> input = ...
> > > > >    PTransform<PCollection<Foo>, ...> transform = ...
> > > > >
> > > > >    transform.apply(input)
> > > > >
> > > > > This type checks and even looks perfectly normal. Do you see the
> > error?
> > > > >
> > > > > ... what we need the user to write is:
> > > > >
> > > > >     input.apply(transform)
> > > > >
> > > > > What a confusing difference! After all, the first one type-checks
> and
> > > the
> > > > > first one is how you apply a Function or Predicate or
> > > > SerializableFunction,
> > > > > etc. But it is broken. With transform.apply(input) the transform is
> > not
> > > > > registered with the pipeline at all.
> > > > >
> > > > > We obviously can't (and don't want to) change the most core way
> that
> > > > > pipeline authors use Beam, so PInput.apply (aka PCollection.apply)
> > must
> > > > > remain the same. But we do need a way to make it impossible to mix
> > > these
> > > > > up.
> > > > >
> > > > > The simplest way I can think of is to choose a new name for the
> other
> > > > > method involved. Users probably won't write transform.expand(input)
> > > since
> > > > > they will never have seen it in any examples, etc. This will just
> > make
> > > > > PTransform authors need to do a global rename, and the type system
> > will
> > > > > direct them to all cases so there is no silent failure possible.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Kenn
> > > > >
> > > > > [BEAM-438] https://issues.apache.org/jira/browse/BEAM-438
> > > > > [PR #1538] https://github.com/apache/incubator-beam/pull/1538
> > > > >
> > > > > p.s. there is a really amusing and confusing call chain:
> > > > PCollection.apply
> > > > > -> Pipeline.applyTransform -> Pipeline.applyInternal ->
> > > > > PipelineRunner.apply -> PTransform.apply
> > > > >
> > > > > After this change and work to get the runner out of the loop, it
> > > becomes
> > > > > PCollection.apply -> Pipeline.applyTransform -> PTransform.expand
> > > > >
> > > >
> > >
> >
>