You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Mohil Khare <mo...@prosimo.io> on 2020/04/07 00:03:30 UTC

Keeping keys in a state for a very very long time (keys expiry unknown)

Hello all,
We are attempting a implement a use case where beam (java sdk) reads two
kind of records from data stream like Kafka:

1. Records of type A containing key and corresponding metadata.
2. Records of type B containing the same key, but no metadata. Beam then
needs to fill metadata for records of type B  by doing a lookup for
metadata using keys received in records of type A.

Idea is to save metadata or rather state for keys received in records of
type A and then do a lookup when records of type B are received.
I have implemented this using the "@State" construct of beam. However my
problem is that we don't know when keys should expire. I don't think
keeping a global window will be a good idea as there could be many keys
(may be millions over a period of time) to be saved in a state.

What is the best way to achieve this? I was reading about RedisIO, but
found that it is still in the experimental stage. Can someone please
recommend any solution to achieve this.

Thanks and regards
Mohil

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

Posted by Reza Rokni <re...@google.com>.
Nice!

On Mon, 18 May 2020, 05:52 Mohil Khare, <mo...@prosimo.io> wrote:

> Hi Reza and others,
> As suggested, I have opened
> https://issues.apache.org/jira/browse/BEAM-10019 which I think might be a
> good addition to beam pipeline patterns.
>
> Thanks
> Mohil
>
> On Mon, Apr 6, 2020 at 6:28 PM Mohil Khare <mo...@prosimo.io> wrote:
>
>> Sure thing.. I would love to contribute.
>>
>> Thanks
>> Mohil
>>
>>
>>
>> On Mon, Apr 6, 2020 at 6:17 PM Reza Ardeshir Rokni <ra...@gmail.com>
>> wrote:
>>
>>> Great! BTW if you get the time and wanted to contribute back to beam
>>> there is a nice section to record cool patterns:
>>>
>>> https://beam.apache.org/documentation/patterns/overview/
>>>
>>> This would make a great one!
>>>
>>> On Tue, 7 Apr 2020 at 09:12, Mohil Khare <mo...@prosimo.io> wrote:
>>>
>>>> No ... that's a valid answer. Since I wanted to have a long window size
>>>> per key and since we can't use state with session windows, I am using a
>>>> sliding window for let's say 72 hrs which starts every hour.
>>>>
>>>> Thanks a lot Reza for your input.
>>>>
>>>> Regards
>>>> Mohil
>>>>
>>>> On Mon, Apr 6, 2020 at 6:09 PM Reza Ardeshir Rokni <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Depends on the use case, Global state comes with the technical debt of
>>>>> having to do your own GC, but comes with more control. You could
>>>>> implement the pattern above with a long FixedWindow as well, which will
>>>>> take care of the GC within the window  bound.
>>>>>
>>>>> Sorry, its not a yes / no answer :-)
>>>>>
>>>>> On Tue, 7 Apr 2020 at 09:03, Mohil Khare <mo...@prosimo.io> wrote:
>>>>>
>>>>>> Thanks a lot Reza for your quick response. Yeah saving the data in an
>>>>>> external system after timer expiry makes sense.
>>>>>> So do you suggest using a global window for maintaining state ?
>>>>>>
>>>>>> Thanks and regards
>>>>>> Mohil
>>>>>>
>>>>>> On Mon, Apr 6, 2020 at 5:37 PM Reza Ardeshir Rokni <ra...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Are you able to make use of the following pattern?
>>>>>>>
>>>>>>> Store StateA-metadata until no activity for Duration X, you can use
>>>>>>> a Timer to check this, then expire the value, but store in an
>>>>>>> external system. If you get a record that does want this value after
>>>>>>> expiry, call out to the external system and store the value again in key
>>>>>>> StateA-metadata.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Tue, 7 Apr 2020 at 08:03, Mohil Khare <mo...@prosimo.io> wrote:
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>> We are attempting a implement a use case where beam (java sdk)
>>>>>>>> reads two kind of records from data stream like Kafka:
>>>>>>>>
>>>>>>>> 1. Records of type A containing key and corresponding metadata.
>>>>>>>> 2. Records of type B containing the same key, but no metadata. Beam
>>>>>>>> then needs to fill metadata for records of type B  by doing a lookup for
>>>>>>>> metadata using keys received in records of type A.
>>>>>>>>
>>>>>>>> Idea is to save metadata or rather state for keys received in
>>>>>>>> records of type A and then do a lookup when records of type B are received.
>>>>>>>> I have implemented this using the "@State" construct of beam.
>>>>>>>> However my problem is that we don't know when keys should expire. I don't
>>>>>>>> think keeping a global window will be a good idea as there could be many
>>>>>>>> keys (may be millions over a period of time) to be saved in a state.
>>>>>>>>
>>>>>>>> What is the best way to achieve this? I was reading about RedisIO,
>>>>>>>> but found that it is still in the experimental stage. Can someone please
>>>>>>>> recommend any solution to achieve this.
>>>>>>>>
>>>>>>>> Thanks and regards
>>>>>>>> Mohil
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

Posted by Mohil Khare <mo...@prosimo.io>.
Hi Reza and others,
As suggested, I have opened
https://issues.apache.org/jira/browse/BEAM-10019 which
I think might be a good addition to beam pipeline patterns.

Thanks
Mohil

On Mon, Apr 6, 2020 at 6:28 PM Mohil Khare <mo...@prosimo.io> wrote:

> Sure thing.. I would love to contribute.
>
> Thanks
> Mohil
>
>
>
> On Mon, Apr 6, 2020 at 6:17 PM Reza Ardeshir Rokni <ra...@gmail.com>
> wrote:
>
>> Great! BTW if you get the time and wanted to contribute back to beam
>> there is a nice section to record cool patterns:
>>
>> https://beam.apache.org/documentation/patterns/overview/
>>
>> This would make a great one!
>>
>> On Tue, 7 Apr 2020 at 09:12, Mohil Khare <mo...@prosimo.io> wrote:
>>
>>> No ... that's a valid answer. Since I wanted to have a long window size
>>> per key and since we can't use state with session windows, I am using a
>>> sliding window for let's say 72 hrs which starts every hour.
>>>
>>> Thanks a lot Reza for your input.
>>>
>>> Regards
>>> Mohil
>>>
>>> On Mon, Apr 6, 2020 at 6:09 PM Reza Ardeshir Rokni <ra...@gmail.com>
>>> wrote:
>>>
>>>> Depends on the use case, Global state comes with the technical debt of
>>>> having to do your own GC, but comes with more control. You could
>>>> implement the pattern above with a long FixedWindow as well, which will
>>>> take care of the GC within the window  bound.
>>>>
>>>> Sorry, its not a yes / no answer :-)
>>>>
>>>> On Tue, 7 Apr 2020 at 09:03, Mohil Khare <mo...@prosimo.io> wrote:
>>>>
>>>>> Thanks a lot Reza for your quick response. Yeah saving the data in an
>>>>> external system after timer expiry makes sense.
>>>>> So do you suggest using a global window for maintaining state ?
>>>>>
>>>>> Thanks and regards
>>>>> Mohil
>>>>>
>>>>> On Mon, Apr 6, 2020 at 5:37 PM Reza Ardeshir Rokni <ra...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Are you able to make use of the following pattern?
>>>>>>
>>>>>> Store StateA-metadata until no activity for Duration X, you can use a
>>>>>> Timer to check this, then expire the value, but store in an
>>>>>> external system. If you get a record that does want this value after
>>>>>> expiry, call out to the external system and store the value again in key
>>>>>> StateA-metadata.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On Tue, 7 Apr 2020 at 08:03, Mohil Khare <mo...@prosimo.io> wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>> We are attempting a implement a use case where beam (java sdk) reads
>>>>>>> two kind of records from data stream like Kafka:
>>>>>>>
>>>>>>> 1. Records of type A containing key and corresponding metadata.
>>>>>>> 2. Records of type B containing the same key, but no metadata. Beam
>>>>>>> then needs to fill metadata for records of type B  by doing a lookup for
>>>>>>> metadata using keys received in records of type A.
>>>>>>>
>>>>>>> Idea is to save metadata or rather state for keys received in
>>>>>>> records of type A and then do a lookup when records of type B are received.
>>>>>>> I have implemented this using the "@State" construct of beam.
>>>>>>> However my problem is that we don't know when keys should expire. I don't
>>>>>>> think keeping a global window will be a good idea as there could be many
>>>>>>> keys (may be millions over a period of time) to be saved in a state.
>>>>>>>
>>>>>>> What is the best way to achieve this? I was reading about RedisIO,
>>>>>>> but found that it is still in the experimental stage. Can someone please
>>>>>>> recommend any solution to achieve this.
>>>>>>>
>>>>>>> Thanks and regards
>>>>>>> Mohil
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

Posted by Mohil Khare <mo...@prosimo.io>.
Sure thing.. I would love to contribute.

Thanks
Mohil



On Mon, Apr 6, 2020 at 6:17 PM Reza Ardeshir Rokni <ra...@gmail.com>
wrote:

> Great! BTW if you get the time and wanted to contribute back to beam there
> is a nice section to record cool patterns:
>
> https://beam.apache.org/documentation/patterns/overview/
>
> This would make a great one!
>
> On Tue, 7 Apr 2020 at 09:12, Mohil Khare <mo...@prosimo.io> wrote:
>
>> No ... that's a valid answer. Since I wanted to have a long window size
>> per key and since we can't use state with session windows, I am using a
>> sliding window for let's say 72 hrs which starts every hour.
>>
>> Thanks a lot Reza for your input.
>>
>> Regards
>> Mohil
>>
>> On Mon, Apr 6, 2020 at 6:09 PM Reza Ardeshir Rokni <ra...@gmail.com>
>> wrote:
>>
>>> Depends on the use case, Global state comes with the technical debt of
>>> having to do your own GC, but comes with more control. You could
>>> implement the pattern above with a long FixedWindow as well, which will
>>> take care of the GC within the window  bound.
>>>
>>> Sorry, its not a yes / no answer :-)
>>>
>>> On Tue, 7 Apr 2020 at 09:03, Mohil Khare <mo...@prosimo.io> wrote:
>>>
>>>> Thanks a lot Reza for your quick response. Yeah saving the data in an
>>>> external system after timer expiry makes sense.
>>>> So do you suggest using a global window for maintaining state ?
>>>>
>>>> Thanks and regards
>>>> Mohil
>>>>
>>>> On Mon, Apr 6, 2020 at 5:37 PM Reza Ardeshir Rokni <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Are you able to make use of the following pattern?
>>>>>
>>>>> Store StateA-metadata until no activity for Duration X, you can use a
>>>>> Timer to check this, then expire the value, but store in an
>>>>> external system. If you get a record that does want this value after
>>>>> expiry, call out to the external system and store the value again in key
>>>>> StateA-metadata.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, 7 Apr 2020 at 08:03, Mohil Khare <mo...@prosimo.io> wrote:
>>>>>
>>>>>> Hello all,
>>>>>> We are attempting a implement a use case where beam (java sdk) reads
>>>>>> two kind of records from data stream like Kafka:
>>>>>>
>>>>>> 1. Records of type A containing key and corresponding metadata.
>>>>>> 2. Records of type B containing the same key, but no metadata. Beam
>>>>>> then needs to fill metadata for records of type B  by doing a lookup for
>>>>>> metadata using keys received in records of type A.
>>>>>>
>>>>>> Idea is to save metadata or rather state for keys received in records
>>>>>> of type A and then do a lookup when records of type B are received.
>>>>>> I have implemented this using the "@State" construct of beam. However
>>>>>> my problem is that we don't know when keys should expire. I don't think
>>>>>> keeping a global window will be a good idea as there could be many keys
>>>>>> (may be millions over a period of time) to be saved in a state.
>>>>>>
>>>>>> What is the best way to achieve this? I was reading about RedisIO,
>>>>>> but found that it is still in the experimental stage. Can someone please
>>>>>> recommend any solution to achieve this.
>>>>>>
>>>>>> Thanks and regards
>>>>>> Mohil
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

Posted by Reza Ardeshir Rokni <ra...@gmail.com>.
Great! BTW if you get the time and wanted to contribute back to beam there
is a nice section to record cool patterns:

https://beam.apache.org/documentation/patterns/overview/

This would make a great one!

On Tue, 7 Apr 2020 at 09:12, Mohil Khare <mo...@prosimo.io> wrote:

> No ... that's a valid answer. Since I wanted to have a long window size
> per key and since we can't use state with session windows, I am using a
> sliding window for let's say 72 hrs which starts every hour.
>
> Thanks a lot Reza for your input.
>
> Regards
> Mohil
>
> On Mon, Apr 6, 2020 at 6:09 PM Reza Ardeshir Rokni <ra...@gmail.com>
> wrote:
>
>> Depends on the use case, Global state comes with the technical debt of
>> having to do your own GC, but comes with more control. You could
>> implement the pattern above with a long FixedWindow as well, which will
>> take care of the GC within the window  bound.
>>
>> Sorry, its not a yes / no answer :-)
>>
>> On Tue, 7 Apr 2020 at 09:03, Mohil Khare <mo...@prosimo.io> wrote:
>>
>>> Thanks a lot Reza for your quick response. Yeah saving the data in an
>>> external system after timer expiry makes sense.
>>> So do you suggest using a global window for maintaining state ?
>>>
>>> Thanks and regards
>>> Mohil
>>>
>>> On Mon, Apr 6, 2020 at 5:37 PM Reza Ardeshir Rokni <ra...@gmail.com>
>>> wrote:
>>>
>>>> Are you able to make use of the following pattern?
>>>>
>>>> Store StateA-metadata until no activity for Duration X, you can use a
>>>> Timer to check this, then expire the value, but store in an
>>>> external system. If you get a record that does want this value after
>>>> expiry, call out to the external system and store the value again in key
>>>> StateA-metadata.
>>>>
>>>> Cheers
>>>>
>>>> On Tue, 7 Apr 2020 at 08:03, Mohil Khare <mo...@prosimo.io> wrote:
>>>>
>>>>> Hello all,
>>>>> We are attempting a implement a use case where beam (java sdk) reads
>>>>> two kind of records from data stream like Kafka:
>>>>>
>>>>> 1. Records of type A containing key and corresponding metadata.
>>>>> 2. Records of type B containing the same key, but no metadata. Beam
>>>>> then needs to fill metadata for records of type B  by doing a lookup for
>>>>> metadata using keys received in records of type A.
>>>>>
>>>>> Idea is to save metadata or rather state for keys received in records
>>>>> of type A and then do a lookup when records of type B are received.
>>>>> I have implemented this using the "@State" construct of beam. However
>>>>> my problem is that we don't know when keys should expire. I don't think
>>>>> keeping a global window will be a good idea as there could be many keys
>>>>> (may be millions over a period of time) to be saved in a state.
>>>>>
>>>>> What is the best way to achieve this? I was reading about RedisIO, but
>>>>> found that it is still in the experimental stage. Can someone please
>>>>> recommend any solution to achieve this.
>>>>>
>>>>> Thanks and regards
>>>>> Mohil
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

Posted by Mohil Khare <mo...@prosimo.io>.
No ... that's a valid answer. Since I wanted to have a long window size per
key and since we can't use state with session windows, I am using a sliding
window for let's say 72 hrs which starts every hour.

Thanks a lot Reza for your input.

Regards
Mohil

On Mon, Apr 6, 2020 at 6:09 PM Reza Ardeshir Rokni <ra...@gmail.com>
wrote:

> Depends on the use case, Global state comes with the technical debt of
> having to do your own GC, but comes with more control. You could
> implement the pattern above with a long FixedWindow as well, which will
> take care of the GC within the window  bound.
>
> Sorry, its not a yes / no answer :-)
>
> On Tue, 7 Apr 2020 at 09:03, Mohil Khare <mo...@prosimo.io> wrote:
>
>> Thanks a lot Reza for your quick response. Yeah saving the data in an
>> external system after timer expiry makes sense.
>> So do you suggest using a global window for maintaining state ?
>>
>> Thanks and regards
>> Mohil
>>
>> On Mon, Apr 6, 2020 at 5:37 PM Reza Ardeshir Rokni <ra...@gmail.com>
>> wrote:
>>
>>> Are you able to make use of the following pattern?
>>>
>>> Store StateA-metadata until no activity for Duration X, you can use a
>>> Timer to check this, then expire the value, but store in an
>>> external system. If you get a record that does want this value after
>>> expiry, call out to the external system and store the value again in key
>>> StateA-metadata.
>>>
>>> Cheers
>>>
>>> On Tue, 7 Apr 2020 at 08:03, Mohil Khare <mo...@prosimo.io> wrote:
>>>
>>>> Hello all,
>>>> We are attempting a implement a use case where beam (java sdk) reads
>>>> two kind of records from data stream like Kafka:
>>>>
>>>> 1. Records of type A containing key and corresponding metadata.
>>>> 2. Records of type B containing the same key, but no metadata. Beam
>>>> then needs to fill metadata for records of type B  by doing a lookup for
>>>> metadata using keys received in records of type A.
>>>>
>>>> Idea is to save metadata or rather state for keys received in records
>>>> of type A and then do a lookup when records of type B are received.
>>>> I have implemented this using the "@State" construct of beam. However
>>>> my problem is that we don't know when keys should expire. I don't think
>>>> keeping a global window will be a good idea as there could be many keys
>>>> (may be millions over a period of time) to be saved in a state.
>>>>
>>>> What is the best way to achieve this? I was reading about RedisIO, but
>>>> found that it is still in the experimental stage. Can someone please
>>>> recommend any solution to achieve this.
>>>>
>>>> Thanks and regards
>>>> Mohil
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

Posted by Reza Ardeshir Rokni <ra...@gmail.com>.
Depends on the use case, Global state comes with the technical debt of
having to do your own GC, but comes with more control. You could
implement the pattern above with a long FixedWindow as well, which will
take care of the GC within the window  bound.

Sorry, its not a yes / no answer :-)

On Tue, 7 Apr 2020 at 09:03, Mohil Khare <mo...@prosimo.io> wrote:

> Thanks a lot Reza for your quick response. Yeah saving the data in an
> external system after timer expiry makes sense.
> So do you suggest using a global window for maintaining state ?
>
> Thanks and regards
> Mohil
>
> On Mon, Apr 6, 2020 at 5:37 PM Reza Ardeshir Rokni <ra...@gmail.com>
> wrote:
>
>> Are you able to make use of the following pattern?
>>
>> Store StateA-metadata until no activity for Duration X, you can use a
>> Timer to check this, then expire the value, but store in an
>> external system. If you get a record that does want this value after
>> expiry, call out to the external system and store the value again in key
>> StateA-metadata.
>>
>> Cheers
>>
>> On Tue, 7 Apr 2020 at 08:03, Mohil Khare <mo...@prosimo.io> wrote:
>>
>>> Hello all,
>>> We are attempting a implement a use case where beam (java sdk) reads two
>>> kind of records from data stream like Kafka:
>>>
>>> 1. Records of type A containing key and corresponding metadata.
>>> 2. Records of type B containing the same key, but no metadata. Beam then
>>> needs to fill metadata for records of type B  by doing a lookup for
>>> metadata using keys received in records of type A.
>>>
>>> Idea is to save metadata or rather state for keys received in records of
>>> type A and then do a lookup when records of type B are received.
>>> I have implemented this using the "@State" construct of beam. However my
>>> problem is that we don't know when keys should expire. I don't think
>>> keeping a global window will be a good idea as there could be many keys
>>> (may be millions over a period of time) to be saved in a state.
>>>
>>> What is the best way to achieve this? I was reading about RedisIO, but
>>> found that it is still in the experimental stage. Can someone please
>>> recommend any solution to achieve this.
>>>
>>> Thanks and regards
>>> Mohil
>>>
>>>
>>>
>>>
>>>
>>>

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

Posted by Mohil Khare <mo...@prosimo.io>.
Thanks a lot Reza for your quick response. Yeah saving the data in an
external system after timer expiry makes sense.
So do you suggest using a global window for maintaining state ?

Thanks and regards
Mohil

On Mon, Apr 6, 2020 at 5:37 PM Reza Ardeshir Rokni <ra...@gmail.com>
wrote:

> Are you able to make use of the following pattern?
>
> Store StateA-metadata until no activity for Duration X, you can use a
> Timer to check this, then expire the value, but store in an
> external system. If you get a record that does want this value after
> expiry, call out to the external system and store the value again in key
> StateA-metadata.
>
> Cheers
>
> On Tue, 7 Apr 2020 at 08:03, Mohil Khare <mo...@prosimo.io> wrote:
>
>> Hello all,
>> We are attempting a implement a use case where beam (java sdk) reads two
>> kind of records from data stream like Kafka:
>>
>> 1. Records of type A containing key and corresponding metadata.
>> 2. Records of type B containing the same key, but no metadata. Beam then
>> needs to fill metadata for records of type B  by doing a lookup for
>> metadata using keys received in records of type A.
>>
>> Idea is to save metadata or rather state for keys received in records of
>> type A and then do a lookup when records of type B are received.
>> I have implemented this using the "@State" construct of beam. However my
>> problem is that we don't know when keys should expire. I don't think
>> keeping a global window will be a good idea as there could be many keys
>> (may be millions over a period of time) to be saved in a state.
>>
>> What is the best way to achieve this? I was reading about RedisIO, but
>> found that it is still in the experimental stage. Can someone please
>> recommend any solution to achieve this.
>>
>> Thanks and regards
>> Mohil
>>
>>
>>
>>
>>
>>

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

Posted by Reza Ardeshir Rokni <ra...@gmail.com>.
Are you able to make use of the following pattern?

Store StateA-metadata until no activity for Duration X, you can use a Timer
to check this, then expire the value, but store in an external system. If
you get a record that does want this value after expiry, call out to the
external system and store the value again in key StateA-metadata.

Cheers

On Tue, 7 Apr 2020 at 08:03, Mohil Khare <mo...@prosimo.io> wrote:

> Hello all,
> We are attempting a implement a use case where beam (java sdk) reads two
> kind of records from data stream like Kafka:
>
> 1. Records of type A containing key and corresponding metadata.
> 2. Records of type B containing the same key, but no metadata. Beam then
> needs to fill metadata for records of type B  by doing a lookup for
> metadata using keys received in records of type A.
>
> Idea is to save metadata or rather state for keys received in records of
> type A and then do a lookup when records of type B are received.
> I have implemented this using the "@State" construct of beam. However my
> problem is that we don't know when keys should expire. I don't think
> keeping a global window will be a good idea as there could be many keys
> (may be millions over a period of time) to be saved in a state.
>
> What is the best way to achieve this? I was reading about RedisIO, but
> found that it is still in the experimental stage. Can someone please
> recommend any solution to achieve this.
>
> Thanks and regards
> Mohil
>
>
>
>
>
>