Posted to user@flink.apache.org by HG <ha...@gmail.com> on 2022/01/06 14:24:50 UTC

adding elapsed times to events that form a transaction

Hello all,

My question is basically whether it is possible to group events by a key
(these will belong to a specific transaction) and then calculate the
elapsed times between them based on a timestamp that is present in the
event.
So a transaction may have x events, all timestamped and with the
transaction_id as key.
Is it possible to
1. group them by the key,
2. order them by the timestamp,
3. calculate the elapsed times between the steps/events,
4. add that elapsed time to each step/event,
5. output the modified events to the sink?



Regards Hans

Re: adding elapsed times to events that form a transaction

Posted by David Anderson <da...@apache.org>.
One way to solve this with Flink SQL would be to use MATCH_RECOGNIZE. [1]
is an example illustrating a very similar use case.

[1] https://stackoverflow.com/a/62122751/2000823
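
To give a rough flavor of the approach, here is an untested sketch. It
assumes an `events` table is already registered (e.g. over the Kafka
connector) with a `transaction_id` column and a rowtime attribute `ts`;
all of these names are hypothetical. The pattern matches every pair of
consecutive events per transaction and emits the elapsed time between them:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class MatchRecognizeElapsed {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        tEnv.executeSql(
            "SELECT * FROM events MATCH_RECOGNIZE (" +
            "  PARTITION BY transaction_id " +  // group by the key
            "  ORDER BY ts " +                  // order by the timestamp
            "  MEASURES " +
            "    B.ts AS event_time, " +
            "    TIMESTAMPDIFF(SECOND, A.ts, B.ts) AS elapsed_seconds " +
            "  AFTER MATCH SKIP TO LAST B " +   // reuse B as the next pair's A
            "  PATTERN (A B) " +                // every pair of consecutive events
            "  DEFINE B AS TRUE " +             // no extra condition on B
            ")").print();
    }
}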

Re: adding elapsed times to events that form a transaction

Posted by HG <ha...@gmail.com>.
I am watching a Ververica YouTube playlist first.
Already did the rides-and-fares exercise.

Will certainly look into these.

Thanks Ali


Re: adding elapsed times to events that form a transaction

Posted by Ali Bahadir Zeybek <al...@ververica.com>.
Hello Hans,

If you would like to see some hands-on examples which showcase the
capabilities of Flink, I would suggest you follow the training exercises[1].
To be more specific, the checkpointing[2] exercise implements a similar logic
to what you have described.

Sincerely,

Ali

[1]: https://github.com/ververica/flink-training
[2]:
https://github.com/ververica/flink-training/tree/master/troubleshooting/checkpointing

Re: adding elapsed times to events that form a transaction

Posted by HG <ha...@gmail.com>.
Super!
Then it will not be a waste of time to learn Flink.

Thanks!

Re: adding elapsed times to events that form a transaction

Posted by Francesco Guardiani <fr...@ververica.com>.
In Flink we essentially have two main APIs for defining stream topologies:
one is DataStream and the other is the Table API. My guess is that right
now you're trying to use DataStream with the Kafka connector.

DataStream allows you to statically define a stream topology, with an API
in a similar fashion to Java Streams or RxJava.
The Table API, on the other hand, gives you the ability to define stream
jobs using SQL, where you can easily perform operations such as joins over
windows.

Flink is definitely able to solve your use case with both APIs, and you can
also mix the two in one application to solve it the way you want; a short
sketch of such mixing follows below.
I suggest you start by looking at the Table API documentation:
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/overview/
and then, for your specific use case, check the Window TVF page:
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/queries/window-tvf/
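
To illustrate the mixing, here is an invented, minimal example (the inline
data stands in for a real source such as the Kafka connector): a DataStream
can be registered as a view, queried with SQL, and turned back into a
stream.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class MixedApis {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Inline stand-in for a real source.
        DataStream<Row> events = env
                .fromElements(
                        Row.of("tx-1", 1_000L),
                        Row.of("tx-1", 4_000L))
                .returns(Types.ROW_NAMED(
                        new String[] {"transaction_id", "ts"},
                        Types.STRING, Types.LONG));

        // DataStream -> Table: register the stream and query it with SQL.
        tEnv.createTemporaryView("events", events);
        Table elapsed = tEnv.sqlQuery(
                "SELECT transaction_id, MAX(ts) - MIN(ts) AS elapsed_millis " +
                "FROM events GROUP BY transaction_id");

        // Table -> DataStream: a non-windowed GROUP BY updates its result,
        // so it has to come back as a changelog stream.
        tEnv.toChangelogStream(elapsed).print();
        env.execute();
    }
}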

Hope it helps.
FG

Re: adding elapsed times to events that form a transaction

Posted by HG <ha...@gmail.com>.
Hi Francesco.

I am not using anything right now apart from Kafka.
I just need to know whether Flink is capable of doing this, and I am trying
to understand the documentation, terminology, etc.
I still grapple a bit with the whole picture.

Thanks

Regards Hans

Re: adding elapsed times to events that form a transaction

Posted by Francesco Guardiani <fr...@ververica.com>.
Hi,
Are you using SQL or DataStream? For SQL you can use the Window TVF
<https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/queries/window-tvf/>
feature, where the window size is the "max" elapsed time, and then inside
the window you pick the beginning and end events and join them. A rough
sketch of that idea follows below.
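
As an untested sketch (it assumes an `events` table is registered with `ts`
declared as a rowtime attribute, and it picks 10 minutes as a made-up upper
bound on how long a transaction can take):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class WindowTvfElapsed {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Elapsed time per transaction within each window, taken as the
        // gap between its first and last event.
        tEnv.executeSql(
            "SELECT transaction_id, " +
            "       MIN(ts) AS first_event, " +
            "       MAX(ts) AS last_event, " +
            "       TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts)) AS elapsed_seconds " +
            "FROM TABLE(" +
            "  TUMBLE(TABLE events, DESCRIPTOR(ts), INTERVAL '10' MINUTES)) " +
            "GROUP BY transaction_id, window_start, window_end").print();
    }
}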

Hope it helps,
FG
