Posted to dev@beam.apache.org by Kenneth Knowles <kl...@google.com.INVALID> on 2016/03/23 17:12:58 UTC

Re: Capability matrix question

+1 to considering "metric" / PMetric / etc.

On Wed, Mar 23, 2016 at 8:09 AM, Amit Sela <am...@gmail.com> wrote:

> How about "PMetric" ?
>
> On Wed, Mar 23, 2016, 16:53 Frances Perry <fj...@google.com> wrote:
>
>>
>>>> Perhaps I'm unclear on what an “Aggregator” is. I assumed that a line
>>>> such as the following:
>>>>
>>>> PCollection<KV<String, Double>> meanByName =
>>>> dataPoints.apply(Mean.<String, Double>perKey());
>>>>
>>>> …would be considered an Aggregator, since it applies a mean aggregation
>>>> over a window. Is that correct, with respect to the Beam terminology? If
>>>> not, what would an example of an Aggregator be?
>>>>
>>>
>> Ah, we may have some slightly confusing terminology here.
>>
>> In that code snippet you are using a PTransform (Mean.perKey) to combine
>> a PCollection using the Mean CombineFn
>> <https://github.com/apache/incubator-beam/blob/c199f085473cfcd79014d0a022b5ce3fdd4863ec/sdk/src/main/java/com/google/cloud/dataflow/sdk/transforms/Combine.java#L359>.
>> An Aggregator
>> <https://github.com/apache/incubator-beam/blob/211e76abf9ba34c35ef13cca279cbeefdad7c406/sdk/src/main/java/com/google/cloud/dataflow/sdk/transforms/Aggregator.java#L54>
>> takes a CombineFn and applies it continuously within a DoFn. So it's more
>> analogous to a 'counter'. You can see an example of aggregators in
>> DebuggingWordCount
>> <https://github.com/apache/incubator-beam/blob/master/examples/src/main/java/com/google/cloud/dataflow/examples/DebuggingWordCount.java#L129>.
>>
>> We never really used the term *aggregation* to refer to a general set of
>> PTransforms until we started describing things to the community. But it is
>> a useful word, so we've ended up in a bit of a confusing state. Maybe we
>> should consider renaming Aggregator? Something like "metric" might be
>> clearer.
>>
>>
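
For readers skimming the thread, the distinction drawn in the quoted reply
can be sketched in plain Java. This is only an illustration under stated
assumptions, not SDK code: SimpleAggregator, meanPerKey and dropEmpty are
hypothetical stand-ins invented here, and none of them is the Dataflow/Beam
API. The point it tries to show is that a combining transform such as
Mean.perKey produces a new collection, while an Aggregator is a running
value updated as a side effect inside per-element processing, which is why
it behaves more like a counter or metric.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class AggregatorSketch {

    // Hypothetical stand-in for an Aggregator (NOT the SDK class): a named
    // running value that is updated as a side effect and never becomes part
    // of the output data.
    static final class SimpleAggregator<T> {
        private final String name;
        private final BinaryOperator<T> combiner;
        private T value;

        SimpleAggregator(String name, T initial, BinaryOperator<T> combiner) {
            this.name = name;
            this.value = initial;
            this.combiner = combiner;
        }

        // Fold one more input into the running value.
        void addValue(T input) {
            value = combiner.apply(value, input);
        }

        T current() {
            return value;
        }

        String name() {
            return name;
        }
    }

    // Hypothetical stand-in for applying a per-key mean: the input is a keyed
    // collection and the result is a new keyed collection.
    static Map<String, Double> meanPerKey(List<Map.Entry<String, Double>> points) {
        Map<String, double[]> sumAndCount = new LinkedHashMap<>();
        for (Map.Entry<String, Double> p : points) {
            double[] acc = sumAndCount.computeIfAbsent(p.getKey(), k -> new double[2]);
            acc[0] += p.getValue(); // running sum for this key
            acc[1] += 1;            // running count for this key
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }

    // Hypothetical stand-in for per-element processing (the role a DoFn
    // plays): elements flow through to the output, and the aggregator is
    // bumped on the side.
    static List<String> dropEmpty(List<String> words, SimpleAggregator<Long> emptyWords) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            if (w.isEmpty()) {
                emptyWords.addValue(1L); // side-channel count, not output data
            } else {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> dataPoints = new ArrayList<>();
        dataPoints.add(new SimpleEntry<>("a", 1.0));
        dataPoints.add(new SimpleEntry<>("a", 3.0));
        dataPoints.add(new SimpleEntry<>("b", 4.0));
        // The combine result is itself data that later steps could consume.
        System.out.println(meanPerKey(dataPoints));

        SimpleAggregator<Long> emptyWords =
            new SimpleAggregator<>("emptyWords", 0L, Long::sum);
        List<String> kept = dropEmpty(Arrays.asList("to", "", "be", ""), emptyWords);
        // The output collection and the aggregator value are reported separately.
        System.out.println(kept + " " + emptyWords.name() + "=" + emptyWords.current());
    }
}
```

In this sketch the mean is returned as data, whereas the empty-word count
is only observable by reading the aggregator afterwards, which is the sense
in which the thread calls it "more analogous to a 'counter'".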

Re: Capability matrix question

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
+1 to Metric

Regards
JB

On 03/23/2016 09:56 PM, Dan Halperin wrote:
> +1 @Amit =>  -1 to Counter but +1 to Metric.
>
> On Wed, Mar 23, 2016 at 1:43 PM, Amit Sela <am...@gmail.com> wrote:
>
>> IMHO Counters just count..  Metrics measure things, so I think metrics
>> sounds better. Accumulators and Aggregators would have been good as well if
>> they weren't so overloaded.
>> That's just my thoughts here though..

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Capability matrix question

Posted by Robert Bradshaw <ro...@google.com.INVALID>.
+1 to Metric too.

Sounds like there's consensus on renaming to something, likely
[P]Metric. I created https://issues.apache.org/jira/browse/BEAM-147 to
track the actual work.

On Wed, Mar 23, 2016 at 1:56 PM, Dan Halperin
<dh...@google.com.invalid> wrote:
> +1 @Amit =>  -1 to Counter but +1 to Metric.
>
> On Wed, Mar 23, 2016 at 1:43 PM, Amit Sela <am...@gmail.com> wrote:
>
>> IMHO Counters just count..  Metrics measure things, so I think metrics
>> sounds better. Accumulators and Aggregators would have been good as well if
>> they weren't so overloaded.
>> That's just my thoughts here though..

Re: Capability matrix question

Posted by Dan Halperin <dh...@google.com.INVALID>.
+1 @Amit =>  -1 to Counter but +1 to Metric.

On Wed, Mar 23, 2016 at 1:43 PM, Amit Sela <am...@gmail.com> wrote:

> IMHO Counters just count..  Metrics measure things, so I think metrics
> sounds better. Accumulators and Aggregators would have been good as well if
> they weren't so overloaded.
> That's just my thoughts here though..

Re: Capability matrix question

Posted by Amit Sela <am...@gmail.com>.
IMHO Counters just count..  Metrics measure things, so I think metrics
sounds better. Accumulators and Aggregators would have been good as well if
they weren't so overloaded.
That's just my thoughts here though..

On Wed, Mar 23, 2016 at 10:38 PM Robert Bradshaw
<ro...@google.com.invalid> wrote:

> +1 to renaming this. [P]Counter is another option.

Re: Capability matrix question

Posted by Robert Bradshaw <ro...@google.com.INVALID>.
+1 to renaming this. [P]Counter is another option.

On Wed, Mar 23, 2016 at 9:12 AM, Kenneth Knowles <kl...@google.com.invalid> wrote:
> +1 to considering "metric" / PMetric / etc.