Posted to user@beam.apache.org by Ben Chambers <bc...@apache.org> on 2023/06/12 18:50:54 UTC

[Proposal] Kaskada DSL and FnHarness for Temporal Queries

Hello Beam!

Kaskada has created a query language for expressing temporal queries,
making it easy to work with multiple streams and perform temporally
correct joins. We’re looking at taking our native, columnar execution
engine and making it available as a PTransform and FnHarness for use
with Apache Beam.

We’ve drafted a [short document][proposal] outlining our planned
approach and the potential benefits to Kaskada and Beam users. It
would be super helpful to get some feedback on this approach and any
ways that it could be improved / better integrated with Beam to
provide more value!

Could you see yourself using (or contributing to) this work? Let us know!

Thanks!

Ben

[proposal]: https://docs.google.com/document/d/1w6DYpYCi1c521AOh83JN3CB3C9pZwBPruUbawqH-NsA/edit

Re: [Proposal] Kaskada DSL and FnHarness for Temporal Queries

Posted by Ben Chambers <bc...@apache.org>.
Hey Daniel -- Great question!

Kaskada was designed to be similar to SQL but with a few differences.
The most significant is the assumption of both ordering and grouping.
Kaskada uses this to automatically merge multiple input collections,
and to allow data-dependent windows that identify a range of time. For
instance, the query `Purchases.amount | sum(window = since(Login))`
sums the amount spent since the last login. In user studies, we've
heard that these make it much easier to compose queries analyzing the
entire "journey" or "funnel" for each user.
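
To make the `since(Login)` example concrete, here is a rough plain-Python
sketch of what that query computes. It only illustrates the semantics under
some assumptions and is not how the Kaskada engine executes it: the `Event`
type, the `sum_since_login` function, and the sample data are all made up
for illustration, and ties between a login and a purchase at the same time
are simply resolved by input order.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical merged record for the Login and Purchases inputs."""
    time: int            # event time (any monotonically comparable value)
    user: str            # grouping key
    kind: str            # "login" or "purchase"
    amount: float = 0.0  # only meaningful for purchases


def sum_since_login(events):
    """Yield (time, user, running_sum) for every purchase.

    The running sum is kept per user and resets each time that user
    logs in, which mimics a data-dependent window opened by Login.
    """
    totals = defaultdict(float)
    # Ordering assumption: process both inputs as one time-ordered stream.
    for event in sorted(events, key=lambda e: e.time):
        if event.kind == "login":
            totals[event.user] = 0.0  # a login starts a new window
        elif event.kind == "purchase":
            totals[event.user] += event.amount
            yield (event.time, event.user, totals[event.user])


events = [
    Event(1, "alice", "login"),
    Event(2, "alice", "purchase", 5.0),
    Event(3, "bob", "purchase", 7.0),
    Event(4, "alice", "purchase", 2.5),
    Event(5, "alice", "login"),
    Event(6, "alice", "purchase", 1.0),
]
result = list(sum_since_login(events))
```

Here alice's total reaches 7.5 before her second login and restarts at 1.0
afterwards, while bob's total is tracked separately. The grouping and
ordering assumptions are what let the two inputs be merged without an
explicit join.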

There are also cases where the ordering assumption *isn't* a good fit
-- queries that aren't as sensitive to time. Having both options
readily available would allow a user to choose what is most natural to
them and their use case.

-- Ben

On Mon, Jun 12, 2023 at 12:14 PM Daniel Collins <dp...@google.com> wrote:
>
> How does this mechanism differ from beam SQL which already offers windowing via SQL over PCollections?
>
> https://beam.apache.org/documentation/dsls/sql/extensions/windowing-and-triggering/
>
> -Daniel
>
> On Mon, Jun 12, 2023 at 3:11 PM Ryan Michael <ke...@gmail.com> wrote:
>>
>> Hello, Beam (also)!
>>
>> Just introducing myself - I'm Ryan and I've been working with Ben on the Kaskada project for the past few years. As Ben mentioned, I think there's a great opportunity to bring together some of the work we've done to make time-based computation easier to reason about with the Beam community's work on scalable streaming computation.
>>
>> I'll be at the Beam Summit in NYC starting Wednesday and presenting a short overview of how we see Kaskada fitting into the Generative AI world at the "Generative AI Meetup" Wednesday afternoon - if the doc Ben linked to (or GenAI) is interesting to you and you'll be at the conference I'd love to touch base in person!
>>
>> -Ryan
>>
>> On Mon, Jun 12, 2023 at 2:51 PM Ben Chambers <bc...@apache.org> wrote:
>>>
>>> Hello Beam!
>>>
>>> Kaskada has created a query language for expressing temporal queries,
>>> making it easy to work with multiple streams and perform temporally
>>> correct joins. We’re looking at taking our native, columnar execution
>>> engine and making it available as a PTransform and FnHarness for use
>>> with Apache Beam.
>>>
>>> We’ve drafted a [short document][proposal] outlining our planned
>>> approach and the potential benefits to Kaskada and Beam users. It
>>> would be super helpful to get some feedback on this approach and any
>>> ways that it could be improved / better integrated with Beam to
>>> provide more value!
>>>
>>> Could you see yourself using (or contributing) to this work? Let us know!
>>>
>>> Thanks!
>>>
>>> Ben
>>>
>>> [proposal]: https://docs.google.com/document/d/1w6DYpYCi1c521AOh83JN3CB3C9pZwBPruUbawqH-NsA/edit
>>
>>
>>
>> --
>> Ryan Michael
>> kerinin@gmail.com | 512.466.3662 | github | linkedin

Re: [Proposal] Kaskada DSL and FnHarness for Temporal Queries

Posted by Daniel Collins via dev <de...@beam.apache.org>.
How does this mechanism differ from beam SQL which already offers windowing
via SQL over PCollections?

https://beam.apache.org/documentation/dsls/sql/extensions/windowing-and-triggering/

-Daniel

On Mon, Jun 12, 2023 at 3:11 PM Ryan Michael <ke...@gmail.com> wrote:

> Hello, Beam (also)!
>
> Just introducing myself - I'm Ryan and I've been working with Ben on the
> Kaskada project for the past few years. As Ben mentioned, I think there's a
> great opportunity to bring together some of the work we've done to make
> time-based computation easier to reason about with the Beam community's
> work on scalable streaming computation.
>
> I'll be at the Beam Summit in NYC starting Wednesday and presenting a
> short overview of how we see Kaskada fitting into the Generative AI world
> at the "Generative AI Meetup" Wednesday afternoon - if the doc Ben linked
> to (or GenAI) is interesting to you and you'll be at the conference I'd
> love to touch base in person!
>
> -Ryan
>
> On Mon, Jun 12, 2023 at 2:51 PM Ben Chambers <bc...@apache.org> wrote:
>
>> Hello Beam!
>>
>> Kaskada has created a query language for expressing temporal queries,
>> making it easy to work with multiple streams and perform temporally
>> correct joins. We’re looking at taking our native, columnar execution
>> engine and making it available as a PTransform and FnHarness for use
>> with Apache Beam.
>>
>> We’ve drafted a [short document][proposal] outlining our planned
>> approach and the potential benefits to Kaskada and Beam users. It
>> would be super helpful to get some feedback on this approach and any
>> ways that it could be improved / better integrated with Beam to
>> provide more value!
>>
>> Could you see yourself using (or contributing) to this work? Let us know!
>>
>> Thanks!
>>
>> Ben
>>
>> [proposal]:
>> https://docs.google.com/document/d/1w6DYpYCi1c521AOh83JN3CB3C9pZwBPruUbawqH-NsA/edit
>>
>
>
> --
> *Ryan Michael *
> kerinin@gmail.com | 512.466.3662 <(512)%20466-3662> | github
> <https://github.com/kerinin> | linkedin
> <http://www.linkedin.com/pub/ryan-michael/21/41/a29>
>

Re: [Proposal] Kaskada DSL and FnHarness for Temporal Queries

Posted by Ryan Michael <ke...@gmail.com>.
Hello, Beam (also)!

Just introducing myself - I'm Ryan and I've been working with Ben on the
Kaskada project for the past few years. As Ben mentioned, I think there's a
great opportunity to bring together some of the work we've done to make
time-based computation easier to reason about with the Beam community's
work on scalable streaming computation.

I'll be at the Beam Summit in NYC starting Wednesday and presenting a short
overview of how we see Kaskada fitting into the Generative AI world at the
"Generative AI Meetup" Wednesday afternoon - if the doc Ben linked to (or
GenAI) is interesting to you and you'll be at the conference I'd love to
touch base in person!

-Ryan

On Mon, Jun 12, 2023 at 2:51 PM Ben Chambers <bc...@apache.org> wrote:

> Hello Beam!
>
> Kaskada has created a query language for expressing temporal queries,
> making it easy to work with multiple streams and perform temporally
> correct joins. We’re looking at taking our native, columnar execution
> engine and making it available as a PTransform and FnHarness for use
> with Apache Beam.
>
> We’ve drafted a [short document][proposal] outlining our planned
> approach and the potential benefits to Kaskada and Beam users. It
> would be super helpful to get some feedback on this approach and any
> ways that it could be improved / better integrated with Beam to
> provide more value!
>
> Could you see yourself using (or contributing) to this work? Let us know!
>
> Thanks!
>
> Ben
>
> [proposal]:
> https://docs.google.com/document/d/1w6DYpYCi1c521AOh83JN3CB3C9pZwBPruUbawqH-NsA/edit
>


-- 
*Ryan Michael *
kerinin@gmail.com | 512.466.3662 | github <https://github.com/kerinin> |
linkedin <http://www.linkedin.com/pub/ryan-michael/21/41/a29>
