You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Eugene Kirpichov <ki...@google.com.INVALID> on 2017/04/20 20:09:47 UTC

Re: Should you always have a separate PTransform class for a new transform?

The discussion has been reignited on
https://github.com/apache/beam/pull/2603 .
I can see the pretty strong argument for not replicating all the features
of Combine on the hypothetical helper transform like Count.Globally.

I guess, then, I like Robert's option of having an interface like
"CombiningPTransform" with all the builder methods like withoutDefaults
etc., and returning that from transforms like Count.globally(), so that for
now one can simply return Combine.globally() from it (which would of course
implement this interface) but if later the implementation changes, it can
change to a composite transform builder which would also implement this
interface.

Notes on the PR itself:
- The change above would be a backward-compatible change so it wouldn't
necessarily have to go in before Beam stable release.
- However, if the current PR is submitted in the current form (returning a
Combine.Globally), then afterwards this change *would* be incompatible and
would have to be done before stable release.
So I'd suggest to hold off the PR until we reach consensus here.

On Wed, Feb 8, 2017 at 2:09 PM Eugene Kirpichov <ki...@google.com>
wrote:

> I think the value in having Mean.perKey() in addition to Mean.combineFn()
> is that using Mean.perKey() does not require knowledge of the combine
> concept, so easier for users. Generally, when using Beam, to simply compute
> a count or a mean, you should not need to know about combine.
>
> On Wed, Feb 8, 2017 at 2:06 PM Robert Bradshaw <ro...@google.com.invalid>
> wrote:
>
>> On Wed, Feb 8, 2017 at 1:27 PM, Eugene Kirpichov <
>> kirpichov@google.com.invalid> wrote:
>>
>> > So... Would it be fair to say that everybody would be satisfied if we
>> > treated the "glorified combine" transforms (Sum, Count, Mean, Sample,
>> > Latest) the following way:
>> > - For each case, SDK must expose the relevant CombineFn as a static
>> factory
>> > function: e.g. Sum.ofIntegers(), Latest.of(), etc. [it may make sense to
>> > discuss the naming of these CombineFn factory functions so they are not
>> > confused with neighboring PTransform factory functions]
>> >
>>
>> Or just expose the CombineFn class itself, rather than a factory method to
>> construct it? As for naming (if we go for factories), maybe
>> Mean.combineFn().
>>
>>
>> > - Expose also a PTransform factory function, returning either a
>> > PTransform<InputT, OutputT> (if there's nothing to configure on the
>> > transform), or its concrete subclass (if there is something to
>> configure).
>> > Transition from PTransform to subclass is always possible in a
>> > backward-compatible way, so it's safe to err on the side of returning
>> > PTransform.
>> > - Do *not* return concrete types from the PTransform factory function
>> such
>> > as Combine.Globally - instead, if the user has an advanced use case and
>> > wants to configure the combine, they should apply Combine.globally()
>> > themselves to your exposed CombineFn.
>> >
>> > In particular, this means:
>> > - Sum, Mean stay unchanged
>> > - Count, Sample, Latest should additionally expose their CombineFn's:
>> > Count.of()? or how should we name them?
>> > - Count.globally() and Count.perKey() should be changed from returning
>> > Combine.Globally and Combine.PerKey to returning the more general type
>> > PTransform<..., ...>. Cases where the user relies on them returning a
>> > Combine should be changed to applying the Combine manually.
>> >
>> > Makes sense?
>> >
>>
>> So the value in Mean.perKey() is solely in the fact that it's pithier for
>> Combine.perKey(Mean.combineFn())? Or do we assume that the former could
>> possibly get a new implementation, but the latter should be used if
>> additional configuration is needed?
>>
>> On Tue, Feb 7, 2017 at 10:50 PM Dan Halperin <dhalperi@google.com.invalid
>> >
>> > wrote:
>> >
>> > > I am generally persuaded to at least change my number to something
>> like 0
>> > > :). These are pretty reasonable perspectives, especially pointing out
>> > that
>> > > withSideInputs is pretty useless in Count ;)
>> > >
>> > > On Tue, Feb 7, 2017 at 10:04 PM, Kenneth Knowles
>> <klk@google.com.invalid
>> > >
>> > > wrote:
>> > >
>> > > >  On Tue, Feb 7, 2017 at 8:43 PM, Eugene Kirpichov <
>> > > >
>> > > > > kirpichov@google.com.invalid> wrote:
>> > > > > I must admit I didn't quite
>> > > > > understand the option of "implements CombiningTransform".
>> > > > >
>> > > >
>> > > > On Tue, Feb 7, 2017 at 9:04 PM, Robert Bradshaw
>> > > > <robertwb@google.com.invalid
>> > > > > wrote:
>> > > >
>> > > > > Sorry, I'll try to clarify. ... <<codez>>...
>> > > > >
>> > > >
>> > > > FWIW this is also what I meant by my 3b "HasACombineInsideIt". The
>> > > > difference between my suggestion and Robert's is that I used a self
>> > type,
>> > > > which is really not worth the trouble (in fact, it blocked me from
>> > > bringing
>> > > > this to fruition last time we had this same conversation).
>> > > >
>> > > > Kenn
>> > > >
>> > >
>> >
>>
>