Posted to dev@beam.apache.org by Wesley Tanaka <wt...@wtanaka.com> on 2017/10/07 19:52:01 UTC

Making ReifyTimestampsAndWindowsFn public

GatherAllPanes.ReifyTimestampsAndWindowsFn looks useful for giving
MapElements, Filter, et al. access to PaneInfo and BoundedWindow. Is
there a reason why that functionality shouldn't be made into a public
PTransform?  I filed https://issues.apache.org/jira/browse/BEAM-3035,
which can be resolved as invalid if this is a bad idea.

More generally, it seems like ValueInSingleWindow is hardly used across 
the API.  Is there a reason to avoid it, either in the API or in user 
code or both?

--
Wesley Tanaka
https://wtanaka.com/
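A dependency-free Java sketch of the idea, under a simplified model rather than Beam itself: `Meta`, `Internal`, `Reified`, `mapElements`, and `reify` are hypothetical stand-ins for the runner's windowing metadata, a runner-side windowed element, `ValueInSingleWindow`, `MapElements`, and the reifying DoFn. A `MapElements`-style function only ever sees the value, so reifying first is what lets an ordinary function read the timestamp and window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, dependency-free model; none of these names are Beam classes.
final class ReifyModel {
  // Metadata a runner tracks next to each element (simplified BoundedWindow/PaneInfo).
  static final class Meta {
    final long timestamp; // event time, epoch millis
    final long windowEnd; // end of the element's single window, epoch millis
    final boolean onTime; // stand-in for PaneInfo: was this the on-time pane?

    Meta(long timestamp, long windowEnd, boolean onTime) {
      this.timestamp = timestamp;
      this.windowEnd = windowEnd;
      this.onTime = onTime;
    }
  }

  // A runner-side element: the value plus metadata that user functions cannot see.
  static final class Internal<T> {
    final T value;
    final Meta meta;

    Internal(T value, Meta meta) {
      this.value = value;
      this.meta = meta;
    }
  }

  // Stand-in for ValueInSingleWindow<T>: the metadata is now part of the value.
  static final class Reified<T> {
    final T value;
    final Meta meta;

    Reified(T value, Meta meta) {
      this.value = value;
      this.meta = meta;
    }
  }

  // Like MapElements: fn sees only the value; metadata passes through unchanged.
  static <T, U> List<Internal<U>> mapElements(List<Internal<T>> in, Function<T, U> fn) {
    List<Internal<U>> out = new ArrayList<>();
    for (Internal<T> e : in) {
      out.add(new Internal<>(fn.apply(e.value), e.meta));
    }
    return out;
  }

  // Reification: copy the metadata into the value. It only reads the metadata,
  // so every element keeps its original timestamp and window.
  static <T> List<Internal<Reified<T>>> reify(List<Internal<T>> in) {
    List<Internal<Reified<T>>> out = new ArrayList<>();
    for (Internal<T> e : in) {
      out.add(new Internal<>(new Reified<>(e.value, e.meta), e.meta));
    }
    return out;
  }
}
```

For example, `mapElements(reify(in), r -> r.value + "@" + r.meta.windowEnd)` labels each element with its window end, which a plain `mapElements(in, ...)` call has no way to do.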

Re: Making ReifyTimestampsAndWindowsFn public

Posted by Lukasz Cwik <lc...@google.com.INVALID>.
We can mark things deprecated without removing them immediately, to help
move users in the right direction. In this case, ReifyTimestamps is a
package-private class, so we are free to change it however we want.


On Fri, Oct 13, 2017 at 6:57 PM, Wesley Tanaka <wt...@wtanaka.com>
wrote:

> Grouping everything under "Reify" sounds good because it's a short name
> and thus easier to read in pipeline code.  It also sounds good for early
> learners of the API: people who don't need it can ignore a single class
> called Reify instead of having to ignore many similarly named classes
> (ReifyTimestamps, ReifyValueInSingleWindows, Reify*...).
>
> Would we mark ReifyTimestamps deprecated at the same time, so that in the
> long term there would be only a single Reify class?  Or would that need to
> wait for a 3.x release?
>
> --
> Wesley Tanaka
> https://wtanaka.com/
>
>
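A minimal plain-Java sketch of "deprecate, don't delete". `Reify`, `Stamped`, `timestampedValues`, and the `ReifyTimestamps.reify` shim are hypothetical names modeled on this thread, not Beam's actual signatures, and the timestamp lookup is passed in as a function because there is no runner here. The old entry point stays callable but only delegates to the new one, so existing callers keep working while deprecation warnings steer them toward `Reify`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch: these are not Beam's real classes or signatures.
final class Stamped<T> {
  final T value;
  final long timestamp; // epoch millis

  Stamped(T value, long timestamp) {
    this.value = value;
    this.timestamp = timestamp;
  }
}

// The new, single entry point proposed in the thread.
final class Reify {
  private Reify() {} // static factories only

  // Pairs each value with its timestamp; timestampOf stands in for the runner.
  static <T> List<Stamped<T>> timestampedValues(
      List<T> values, ToLongFunction<? super T> timestampOf) {
    List<Stamped<T>> out = new ArrayList<>();
    for (T v : values) {
      out.add(new Stamped<>(v, timestampOf.applyAsLong(v)));
    }
    return out;
  }
}

// The old entry point, kept so existing callers still compile. Being
// package-private, it could also be changed freely; @Deprecated only
// signals the preferred replacement.
@Deprecated
final class ReifyTimestamps {
  private ReifyTimestamps() {}

  /** @deprecated Use {@link Reify#timestampedValues} instead. */
  @Deprecated
  static <T> List<Stamped<T>> reify(List<T> values, ToLongFunction<? super T> timestampOf) {
    return Reify.timestampedValues(values, timestampOf);
  }
}
```

Because the shim holds no logic of its own, removing it later is a purely mechanical change once callers have migrated.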

Re: Making ReifyTimestampsAndWindowsFn public

Posted by Wesley Tanaka <wt...@wtanaka.com>.
Grouping everything under "Reify" sounds good because it's a short name
and thus easier to read in pipeline code.  It also sounds good for early
learners of the API: people who don't need it can ignore a single class
called Reify instead of having to ignore many similarly named classes
(ReifyTimestamps, ReifyValueInSingleWindows, Reify*...).

Would we mark ReifyTimestamps deprecated at the same time, so that in the
long term there would be only a single Reify class?  Or would that need to
wait for a 3.x release?

On 10/11/2017 11:34 AM, Eugene Kirpichov wrote:
> Luke, I think you're talking about the ability to *output into the given
> window*. Wesley's code is about just *extracting* the current element's
> windowing info and packaging it into a ValueInSingleWindow. I'd say +1,
> this is a safe and potentially handy little utility transform. Such
> reification is also mentioned in s.apache.org/context-fn as an argument
> against needing explicit windowing information in context for user code
> closures.
>
> In terms of API, I'd suggest packaging this under Reify:
> Reify.timestampedValues() could be a synonym for ReifyTimestamps, and
> Reify.valuesInWindows() could be what you've implemented.
>
> Other kinds of reification are possible, though I don't know whether it's
> a good idea to put them under the same namespace. One example is
> Reify.asIterable(): PCollection<T> -> PCollection<Iterable<T>> (equivalent
> to grouping by a Void key and taking the values).

-- 
Wesley Tanaka
https://wtanaka.com/


Re: Making ReifyTimestampsAndWindowsFn public

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
Luke, I think you're talking about the ability to *output into the given
window*. Wesley's code is about just *extracting* the current element's
windowing info and packaging it into a ValueInSingleWindow. I'd say +1,
this is a safe and potentially handy little utility transform. Such
reification is also mentioned in s.apache.org/context-fn as an argument
against needing explicit windowing information in context for user code
closures.

In terms of API, I'd suggest packaging this under Reify:
Reify.timestampedValues() could be a synonym for ReifyTimestamps, and
Reify.valuesInWindows() could be what you've implemented.
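One way the proposed namespace could be shaped, sketched in plain Java: a non-instantiable `Reify` class whose static factories each return a transform. Only the names `timestampedValues()` and `valuesInWindows()` come from the proposal; `Windowed`, `Stamped`, and the use of a list-to-list `Function` in place of a `PTransform` are hypothetical simplifications.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of a single Reify namespace; not a released Beam API.
// A Function over a list stands in for a PTransform over a PCollection.
final class Reify {
  private Reify() {} // namespace only: never instantiated

  // A runner-side element: value, timestamp, and its window's end (epoch millis).
  static final class Windowed<T> {
    final T value;
    final long timestamp;
    final long windowEnd;

    Windowed(T value, long timestamp, long windowEnd) {
      this.value = value;
      this.timestamp = timestamp;
      this.windowEnd = windowEnd;
    }
  }

  // A value paired with its timestamp only.
  static final class Stamped<T> {
    final T value;
    final long timestamp;

    Stamped(T value, long timestamp) {
      this.value = value;
      this.timestamp = timestamp;
    }
  }

  // Analogue of Reify.timestampedValues(): expose the timestamp in the value.
  static <T> Function<List<Windowed<T>>, List<Windowed<Stamped<T>>>> timestampedValues() {
    return in -> in.stream()
        .map(e -> new Windowed<>(new Stamped<>(e.value, e.timestamp), e.timestamp, e.windowEnd))
        .collect(Collectors.toList());
  }

  // Analogue of Reify.valuesInWindows(): expose timestamp and window together.
  static <T> Function<List<Windowed<T>>, List<Windowed<Windowed<T>>>> valuesInWindows() {
    return in -> in.stream()
        .map(e -> new Windowed<>(e, e.timestamp, e.windowEnd))
        .collect(Collectors.toList());
  }
}
```

Both factories leave each element's own timestamp and window untouched and only copy them into the value, which is the read-only property the thread relies on. In a real pipeline a call site would presumably read as `input.apply(Reify.valuesInWindows())`.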

Other kinds of reification are possible, though I don't know whether it's
a good idea to put them under the same namespace. One example is
Reify.asIterable(): PCollection<T> -> PCollection<Iterable<T>> (equivalent
to grouping by a Void key and taking the values).
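The `Reify.asIterable()` equivalence can be modeled in plain Java: key every element with one constant key, group by key, and keep only the grouped values. `AsIterableSketch` and its methods are hypothetical, windowing is ignored, and the enum constant `Unit.INSTANCE` stands in for the `Void` (null) key because `Map.entry` rejects nulls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java model of "group by a single constant key, take the values".
final class AsIterableSketch {
  // Stand-in for Beam's Void key; Map.entry does not accept null keys or values.
  enum Unit { INSTANCE }

  // Group (key, value) pairs by key, preserving first-seen order.
  static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
    Map<K, List<V>> groups = new LinkedHashMap<>();
    for (Map.Entry<K, V> p : pairs) {
      groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
    }
    return groups;
  }

  // asIterable: give every (non-null) element the same key, group, drop the key.
  static <T> List<Iterable<T>> asIterable(List<T> elements) {
    List<Map.Entry<Unit, T>> keyed = new ArrayList<>();
    for (T t : elements) {
      keyed.add(Map.entry(Unit.INSTANCE, t));
    }
    return new ArrayList<>(groupByKey(keyed).values());
  }
}
```

In Beam, `GroupByKey` groups per key and per window, so under non-global windowing the real transform would emit one `Iterable` per window rather than a single one. An empty input produces no groups at all, in the model as in a real `GroupByKey`.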

On Wed, Oct 11, 2017 at 2:14 PM Lukasz Cwik <lc...@google.com.invalid>
wrote:

> Reifying requires outputting records in a given window with a given
> timestamp. Giving access to the underlying information and the ability to
> output arbitrary records into arbitrary windows is dangerous: a user may
> not honor the required windowing/triggering semantics, and a runner may
> drop records, causing confusion for users.

Re: Making ReifyTimestampsAndWindowsFn public

Posted by Lukasz Cwik <lc...@google.com.INVALID>.
Reifying requires outputting records in a given window with a given
timestamp. Giving access to the underlying information and the ability to
output arbitrary records into arbitrary windows is dangerous: a user may
not honor the required windowing/triggering semantics, and a runner may
drop records, causing confusion for users.

On Sat, Oct 7, 2017 at 12:52 PM, Wesley Tanaka <wt...@wtanaka.com>
wrote:

> GatherAllPanes.ReifyTimestampsAndWindowsFn looks useful for giving
> MapElements, Filter, et al. access to PaneInfo and BoundedWindow. Is there a
> reason why that functionality shouldn't be made into a public PTransform?
> I filed https://issues.apache.org/jira/browse/BEAM-3035, which can be
> resolved as invalid if this is a bad idea.
>
> More generally, it seems like ValueInSingleWindow is hardly used across
> the API.  Is there a reason to avoid it, either in the API or in user code
> or both?
>
> --
> Wesley Tanaka
> https://wtanaka.com/
>
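A simplified model of the risk raised here, assuming (as a simplification, not Beam's exact rule) that a downstream grouping step drops any element whose window end plus allowed lateness is not after the current watermark. `LatenessSketch`, `dropExpired`, `describe`, and `moveTo` are hypothetical. Read-only reification leaves an element in its window, whereas letting user code pick an arbitrary window can move it into one that has already expired, where it is silently dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; not Beam's implementation or its exact lateness rule.
final class LatenessSketch {
  static final class Element {
    final String value;
    final long windowEnd; // end of the element's window, epoch millis

    Element(String value, long windowEnd) {
      this.value = value;
      this.windowEnd = windowEnd;
    }
  }

  // Downstream grouping step: keep only elements whose window is still open,
  // i.e. windowEnd + allowedLateness is after the watermark; drop the rest.
  static List<Element> dropExpired(List<Element> in, long watermark, long allowedLateness) {
    List<Element> kept = new ArrayList<>();
    for (Element e : in) {
      if (e.windowEnd + allowedLateness > watermark) {
        kept.add(e);
      }
    }
    return kept;
  }

  // Read-only reification: reports the window but leaves the element where it was.
  static String describe(Element e) {
    return e.value + " in window ending at " + e.windowEnd;
  }

  // Unrestricted output: user code moves an element into a window of its choosing.
  static Element moveTo(Element e, long newWindowEnd) {
    return new Element(e.value, newWindowEnd);
  }
}
```

With a watermark of 50 and no allowed lateness, an element in a window ending at 100 survives `dropExpired`, but the same element moved into a window ending at 40 does not, and nothing in user code reports the loss.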