You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Rex Fenley <Re...@remind101.com> on 2020/12/08 20:29:43 UTC

How User-Defined AggregateFunctions handle deletes of all aggregated rows.

Hello,

I'd like to better understand delete behavior of AggregateFunctions. Let's
assume there's an aggregate of `user_id` to a set of `group_ids` for groups
belonging to that user.
`user_id_1 -> [group_id_1, group_id_2, etc.]`
Now let's assume sometime later that deletes arrive for all rows which
produce user_id_1's group_id's.

Would the aggregate function completely delete the associated state from
RocksDB or would it leave something like `user_id_1 -> []` sitting in
RocksDB forever?

We have an aggregate similar to this where users could delete themselves
and we want to make sure we're not accumulating data forever for those
users.

Thanks!

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
 |  FOLLOW
US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>

Re: How User-Defined AggregateFunctions handle deletes of all aggregated rows.

Posted by Rex Fenley <Re...@remind101.com>.
Hello,

I still don't have a good understanding of how UDAF in the Table API
handles deletes. If every row aggregated into one groupBy(key) gets a
retract, meaning nothing should be grouped by that key, will the state get
deleted? Is there a way to delete the state for that row i.e. forward a
retract but not an append and remove the state from RocksDB?

Thanks!

On Fri, Dec 11, 2020 at 9:15 AM Rex Fenley <Re...@remind101.com> wrote:

> Hi,
>
> Does this question make sense or am I missing something?
>
> Thanks!
>
> On Thu, Dec 10, 2020 at 10:24 AM Rex Fenley <Re...@remind101.com> wrote:
>
>> Ok, that makes sense.
>>
>> > You just need to correct the acc state to what it expects to be (say
>> re-evaluate the acc without the record that needs retraction) when you
>> received  the retraction message.
>>
>> So for example, if i just remove all items from acc.groupIdSet on
>> retraction it will know to clear out the state entirely from rocks?
>>
>> If a user gets deleted altogether (and my groupby is on user_id) what
>> sort of retraction do I need to evaluate then? Because I'm thinking now it
>> will need to just delete the state entirely and pass a full retraction of
>> the state downstream, but I don't know if deleting state from rocks happens
>> automatically or I need to make it do that in the retract method somehow.
>>
>> On Wed, Dec 9, 2020 at 6:16 PM Danny Chan <da...@apache.org> wrote:
>>
>>> No, the group agg, stream-stream join and rank are all stateful
>>> operators which need a state-backend to bookkeep the acc values.
>>>
>>> But it is only required to emit the retractions when the stateful
>>> operator A has a downstream operator B that is also stateful, because the B
>>> needs the retractions to correct the accs. If B is not stateful, just
>>> emitting the new record to override is enough.
>>>
>>> You just need to correct the acc state to what it expects to be (say
>>> re-evaluate the acc without the record that needs retraction) when you
>>> received  the retraction message.
>>>
>>> Rex Fenley <Re...@remind101.com> 于2020年12月10日周四 上午2:44写道:
>>>
>>>> So from what I'm understanding, the aggregate itself is not a "stateful
>>>> operator" but one may follow it? How does the aggregate accumulator keep
>>>> old values then? It can't all just live in memory, actually, looking at the
>>>> savepoints it looks like there's state associated with our aggregate
>>>> operator.
>>>>
>>>> To clarify my concern too, in my retract function impl in the aggregate
>>>> function class, all I do is remove a value (a group id) from the
>>>> accumulator set (which is an array). For example, if there is only 1
>>>> group_id left for a user and it gets deleted, that group_id will be removed
>>>> from the accumulator set and the set will be empty. I would hope that at
>>>> that point, given that there are no remaining rows for the aggregate, that
>>>> I could or flink will just delete the associated stored accumulator
>>>> altogether i.e. delete `user_id_1 -> []`. Is it possible that both the
>>>> groups and the user need to be deleted for everything to clear from
>>>> storage? That might make more sense actually..
>>>>
>>>> If this doesn't happen, since users delete themselves and their groups
>>>> all the time, we'll be storing all these empty data sets in rocks for no
>>>> reason. To clarify, we're using Debezium as our source and using Flink as a
>>>> materialization engine, so we never want to explicitly set a timeout on any
>>>> of our data, we just want to scale up predictably with our user growth.
>>>>
>>>> Thanks!
>>>>
>>>> On Wed, Dec 9, 2020 at 4:14 AM Danny Chan <da...@apache.org> wrote:
>>>>
>>>>> Hi, Rex Fenley ~
>>>>>
>>>>> If there is stateful operator as the output of the aggregate function.
>>>>> Then each time the function receives an update (or delete) for the key, the
>>>>> agg operator would emit 2 messages, one for retracting the old record, one
>>>>> for the new message. For your case, the new message is the DELETE.
>>>>>
>>>>> If there is no stateful operator, the aggregate operator would just
>>>>> emit the update after (the new) message which is the delete.
>>>>>
>>>>> Rex Fenley <Re...@remind101.com> 于2020年12月9日周三 上午4:30写道:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'd like to better understand delete behavior of AggregateFunctions.
>>>>>> Let's assume there's an aggregate of `user_id` to a set of `group_ids` for
>>>>>> groups belonging to that user.
>>>>>> `user_id_1 -> [group_id_1, group_id_2, etc.]`
>>>>>> Now let's assume sometime later that deletes arrive for all rows
>>>>>> which produce user_id_1's group_id's.
>>>>>>
>>>>>> Would the aggregate function completely delete the associated state
>>>>>> from RocksDB or would it leave something like `user_id_1 -> []` sitting in
>>>>>> RocksDB forever?
>>>>>>
>>>>>> We have an aggregate similar to this where users could delete
>>>>>> themselves and we want to make sure we're not accumulating data forever for
>>>>>> those users.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>>>>
>>>>>>
>>>>>> Remind.com <https://www.remind.com/> |  BLOG
>>>>>> <http://blog.remind.com/>  |  FOLLOW US
>>>>>> <https://twitter.com/remindhq>  |  LIKE US
>>>>>> <https://www.facebook.com/remindhq>
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>>
>>>>
>>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>>> <https://www.facebook.com/remindhq>
>>>>
>>>
>>
>> --
>>
>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>
>>
>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>> <https://www.facebook.com/remindhq>
>>
>
>
> --
>
> Rex Fenley  |  Software Engineer - Mobile and Backend
>
>
> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>  |
>  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
> <https://www.facebook.com/remindhq>
>


-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
 |  FOLLOW
US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>

Re: How User-Defined AggregateFunctions handle deletes of all aggregated rows.

Posted by Rex Fenley <Re...@remind101.com>.
Hi,

Does this question make sense or am I missing something?

Thanks!

On Thu, Dec 10, 2020 at 10:24 AM Rex Fenley <Re...@remind101.com> wrote:

> Ok, that makes sense.
>
> > You just need to correct the acc state to what it expects to be (say
> re-evaluate the acc without the record that needs retraction) when you
> received  the retraction message.
>
> So for example, if i just remove all items from acc.groupIdSet on
> retraction it will know to clear out the state entirely from rocks?
>
> If a user gets deleted altogether (and my groupby is on user_id) what sort
> of retraction do I need to evaluate then? Because I'm thinking now it will
> need to just delete the state entirely and pass a full retraction of the
> state downstream, but I don't know if deleting state from rocks happens
> automatically or I need to make it do that in the retract method somehow.
>
> On Wed, Dec 9, 2020 at 6:16 PM Danny Chan <da...@apache.org> wrote:
>
>> No, the group agg, stream-stream join and rank are all stateful
>> operators which need a state-backend to bookkeep the acc values.
>>
>> But it is only required to emit the retractions when the stateful
>> operator A has a downstream operator B that is also stateful, because the B
>> needs the retractions to correct the accs. If B is not stateful, just
>> emitting the new record to override is enough.
>>
>> You just need to correct the acc state to what it expects to be (say
>> re-evaluate the acc without the record that needs retraction) when you
>> received  the retraction message.
>>
>> Rex Fenley <Re...@remind101.com> 于2020年12月10日周四 上午2:44写道:
>>
>>> So from what I'm understanding, the aggregate itself is not a "stateful
>>> operator" but one may follow it? How does the aggregate accumulator keep
>>> old values then? It can't all just live in memory, actually, looking at the
>>> savepoints it looks like there's state associated with our aggregate
>>> operator.
>>>
>>> To clarify my concern too, in my retract function impl in the aggregate
>>> function class, all I do is remove a value (a group id) from the
>>> accumulator set (which is an array). For example, if there is only 1
>>> group_id left for a user and it gets deleted, that group_id will be removed
>>> from the accumulator set and the set will be empty. I would hope that at
>>> that point, given that there are no remaining rows for the aggregate, that
>>> I could or flink will just delete the associated stored accumulator
>>> altogether i.e. delete `user_id_1 -> []`. Is it possible that both the
>>> groups and the user need to be deleted for everything to clear from
>>> storage? That might make more sense actually..
>>>
>>> If this doesn't happen, since users delete themselves and their groups
>>> all the time, we'll be storing all these empty data sets in rocks for no
>>> reason. To clarify, we're using Debezium as our source and using Flink as a
>>> materialization engine, so we never want to explicitly set a timeout on any
>>> of our data, we just want to scale up predictably with our user growth.
>>>
>>> Thanks!
>>>
>>> On Wed, Dec 9, 2020 at 4:14 AM Danny Chan <da...@apache.org> wrote:
>>>
>>>> Hi, Rex Fenley ~
>>>>
>>>> If there is stateful operator as the output of the aggregate function.
>>>> Then each time the function receives an update (or delete) for the key, the
>>>> agg operator would emit 2 messages, one for retracting the old record, one
>>>> for the new message. For your case, the new message is the DELETE.
>>>>
>>>> If there is no stateful operator, the aggregate operator would just
>>>> emit the update after (the new) message which is the delete.
>>>>
>>>> Rex Fenley <Re...@remind101.com> 于2020年12月9日周三 上午4:30写道:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'd like to better understand delete behavior of AggregateFunctions.
>>>>> Let's assume there's an aggregate of `user_id` to a set of `group_ids` for
>>>>> groups belonging to that user.
>>>>> `user_id_1 -> [group_id_1, group_id_2, etc.]`
>>>>> Now let's assume sometime later that deletes arrive for all rows which
>>>>> produce user_id_1's group_id's.
>>>>>
>>>>> Would the aggregate function completely delete the associated state
>>>>> from RocksDB or would it leave something like `user_id_1 -> []` sitting in
>>>>> RocksDB forever?
>>>>>
>>>>> We have an aggregate similar to this where users could delete
>>>>> themselves and we want to make sure we're not accumulating data forever for
>>>>> those users.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> --
>>>>>
>>>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>>>
>>>>>
>>>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>>>> <https://www.facebook.com/remindhq>
>>>>>
>>>>
>>>
>>> --
>>>
>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>
>>>
>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>> <https://www.facebook.com/remindhq>
>>>
>>
>
> --
>
> Rex Fenley  |  Software Engineer - Mobile and Backend
>
>
> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>  |
>  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
> <https://www.facebook.com/remindhq>
>


-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
 |  FOLLOW
US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>

Re: How User-Defined AggregateFunctions handle deletes of all aggregated rows.

Posted by Rex Fenley <Re...@remind101.com>.
Ok, that makes sense.

> You just need to correct the acc state to what it expects to be (say
re-evaluate the acc without the record that needs retraction) when you
received  the retraction message.

So for example, if i just remove all items from acc.groupIdSet on
retraction it will know to clear out the state entirely from rocks?

If a user gets deleted altogether (and my groupby is on user_id) what sort
of retraction do I need to evaluate then? Because I'm thinking now it will
need to just delete the state entirely and pass a full retraction of the
state downstream, but I don't know if deleting state from rocks happens
automatically or I need to make it do that in the retract method somehow.

On Wed, Dec 9, 2020 at 6:16 PM Danny Chan <da...@apache.org> wrote:

> No, the group agg, stream-stream join and rank are all stateful
> operators which need a state-backend to bookkeep the acc values.
>
> But it is only required to emit the retractions when the stateful operator
> A has a downstream operator B that is also stateful, because the B needs
> the retractions to correct the accs. If B is not stateful, just emitting
> the new record to override is enough.
>
> You just need to correct the acc state to what it expects to be (say
> re-evaluate the acc without the record that needs retraction) when you
> received  the retraction message.
>
> Rex Fenley <Re...@remind101.com> 于2020年12月10日周四 上午2:44写道:
>
>> So from what I'm understanding, the aggregate itself is not a "stateful
>> operator" but one may follow it? How does the aggregate accumulator keep
>> old values then? It can't all just live in memory, actually, looking at the
>> savepoints it looks like there's state associated with our aggregate
>> operator.
>>
>> To clarify my concern too, in my retract function impl in the aggregate
>> function class, all I do is remove a value (a group id) from the
>> accumulator set (which is an array). For example, if there is only 1
>> group_id left for a user and it gets deleted, that group_id will be removed
>> from the accumulator set and the set will be empty. I would hope that at
>> that point, given that there are no remaining rows for the aggregate, that
>> I could or flink will just delete the associated stored accumulator
>> altogether i.e. delete `user_id_1 -> []`. Is it possible that both the
>> groups and the user need to be deleted for everything to clear from
>> storage? That might make more sense actually..
>>
>> If this doesn't happen, since users delete themselves and their groups
>> all the time, we'll be storing all these empty data sets in rocks for no
>> reason. To clarify, we're using Debezium as our source and using Flink as a
>> materialization engine, so we never want to explicitly set a timeout on any
>> of our data, we just want to scale up predictably with our user growth.
>>
>> Thanks!
>>
>> On Wed, Dec 9, 2020 at 4:14 AM Danny Chan <da...@apache.org> wrote:
>>
>>> Hi, Rex Fenley ~
>>>
>>> If there is stateful operator as the output of the aggregate function.
>>> Then each time the function receives an update (or delete) for the key, the
>>> agg operator would emit 2 messages, one for retracting the old record, one
>>> for the new message. For your case, the new message is the DELETE.
>>>
>>> If there is no stateful operator, the aggregate operator would just emit
>>> the update after (the new) message which is the delete.
>>>
>>> Rex Fenley <Re...@remind101.com> 于2020年12月9日周三 上午4:30写道:
>>>
>>>> Hello,
>>>>
>>>> I'd like to better understand delete behavior of AggregateFunctions.
>>>> Let's assume there's an aggregate of `user_id` to a set of `group_ids` for
>>>> groups belonging to that user.
>>>> `user_id_1 -> [group_id_1, group_id_2, etc.]`
>>>> Now let's assume sometime later that deletes arrive for all rows which
>>>> produce user_id_1's group_id's.
>>>>
>>>> Would the aggregate function completely delete the associated state
>>>> from RocksDB or would it leave something like `user_id_1 -> []` sitting in
>>>> RocksDB forever?
>>>>
>>>> We have an aggregate similar to this where users could delete
>>>> themselves and we want to make sure we're not accumulating data forever for
>>>> those users.
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>>
>>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>>
>>>>
>>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>>> <https://www.facebook.com/remindhq>
>>>>
>>>
>>
>> --
>>
>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>
>>
>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>> <https://www.facebook.com/remindhq>
>>
>

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
 |  FOLLOW
US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>

Re: How User-Defined AggregateFunctions handle deletes of all aggregated rows.

Posted by Danny Chan <da...@apache.org>.
No, the group agg, stream-stream join and rank are all stateful
operators which need a state-backend to bookkeep the acc values.

But it is only required to emit the retractions when the stateful operator
A has a downstream operator B that is also stateful, because the B needs
the retractions to correct the accs. If B is not stateful, just emitting
the new record to override is enough.

You just need to correct the acc state to what it expects to be (say
re-evaluate the acc without the record that needs retraction) when you
received  the retraction message.

Rex Fenley <Re...@remind101.com> 于2020年12月10日周四 上午2:44写道:

> So from what I'm understanding, the aggregate itself is not a "stateful
> operator" but one may follow it? How does the aggregate accumulator keep
> old values then? It can't all just live in memory, actually, looking at the
> savepoints it looks like there's state associated with our aggregate
> operator.
>
> To clarify my concern too, in my retract function impl in the aggregate
> function class, all I do is remove a value (a group id) from the
> accumulator set (which is an array). For example, if there is only 1
> group_id left for a user and it gets deleted, that group_id will be removed
> from the accumulator set and the set will be empty. I would hope that at
> that point, given that there are no remaining rows for the aggregate, that
> I could or flink will just delete the associated stored accumulator
> altogether i.e. delete `user_id_1 -> []`. Is it possible that both the
> groups and the user need to be deleted for everything to clear from
> storage? That might make more sense actually..
>
> If this doesn't happen, since users delete themselves and their groups all
> the time, we'll be storing all these empty data sets in rocks for no
> reason. To clarify, we're using Debezium as our source and using Flink as a
> materialization engine, so we never want to explicitly set a timeout on any
> of our data, we just want to scale up predictably with our user growth.
>
> Thanks!
>
> On Wed, Dec 9, 2020 at 4:14 AM Danny Chan <da...@apache.org> wrote:
>
>> Hi, Rex Fenley ~
>>
>> If there is stateful operator as the output of the aggregate function.
>> Then each time the function receives an update (or delete) for the key, the
>> agg operator would emit 2 messages, one for retracting the old record, one
>> for the new message. For your case, the new message is the DELETE.
>>
>> If there is no stateful operator, the aggregate operator would just emit
>> the update after (the new) message which is the delete.
>>
>> Rex Fenley <Re...@remind101.com> 于2020年12月9日周三 上午4:30写道:
>>
>>> Hello,
>>>
>>> I'd like to better understand delete behavior of AggregateFunctions.
>>> Let's assume there's an aggregate of `user_id` to a set of `group_ids` for
>>> groups belonging to that user.
>>> `user_id_1 -> [group_id_1, group_id_2, etc.]`
>>> Now let's assume sometime later that deletes arrive for all rows which
>>> produce user_id_1's group_id's.
>>>
>>> Would the aggregate function completely delete the associated state from
>>> RocksDB or would it leave something like `user_id_1 -> []` sitting in
>>> RocksDB forever?
>>>
>>> We have an aggregate similar to this where users could delete themselves
>>> and we want to make sure we're not accumulating data forever for those
>>> users.
>>>
>>> Thanks!
>>>
>>> --
>>>
>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>
>>>
>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>> <https://www.facebook.com/remindhq>
>>>
>>
>
> --
>
> Rex Fenley  |  Software Engineer - Mobile and Backend
>
>
> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>  |
>  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
> <https://www.facebook.com/remindhq>
>

Re: How User-Defined AggregateFunctions handle deletes of all aggregated rows.

Posted by Rex Fenley <Re...@remind101.com>.
So from what I'm understanding, the aggregate itself is not a "stateful
operator" but one may follow it? How does the aggregate accumulator keep
old values then? It can't all just live in memory, actually, looking at the
savepoints it looks like there's state associated with our aggregate
operator.

To clarify my concern too, in my retract function impl in the aggregate
function class, all I do is remove a value (a group id) from the
accumulator set (which is an array). For example, if there is only 1
group_id left for a user and it gets deleted, that group_id will be removed
from the accumulator set and the set will be empty. I would hope that at
that point, given that there are no remaining rows for the aggregate, that
I could or flink will just delete the associated stored accumulator
altogether i.e. delete `user_id_1 -> []`. Is it possible that both the
groups and the user need to be deleted for everything to clear from
storage? That might make more sense actually..

If this doesn't happen, since users delete themselves and their groups all
the time, we'll be storing all these empty data sets in rocks for no
reason. To clarify, we're using Debezium as our source and using Flink as a
materialization engine, so we never want to explicitly set a timeout on any
of our data, we just want to scale up predictably with our user growth.

Thanks!

On Wed, Dec 9, 2020 at 4:14 AM Danny Chan <da...@apache.org> wrote:

> Hi, Rex Fenley ~
>
> If there is stateful operator as the output of the aggregate function.
> Then each time the function receives an update (or delete) for the key, the
> agg operator would emit 2 messages, one for retracting the old record, one
> for the new message. For your case, the new message is the DELETE.
>
> If there is no stateful operator, the aggregate operator would just emit
> the update after (the new) message which is the delete.
>
> Rex Fenley <Re...@remind101.com> 于2020年12月9日周三 上午4:30写道:
>
>> Hello,
>>
>> I'd like to better understand delete behavior of AggregateFunctions.
>> Let's assume there's an aggregate of `user_id` to a set of `group_ids` for
>> groups belonging to that user.
>> `user_id_1 -> [group_id_1, group_id_2, etc.]`
>> Now let's assume sometime later that deletes arrive for all rows which
>> produce user_id_1's group_id's.
>>
>> Would the aggregate function completely delete the associated state from
>> RocksDB or would it leave something like `user_id_1 -> []` sitting in
>> RocksDB forever?
>>
>> We have an aggregate similar to this where users could delete themselves
>> and we want to make sure we're not accumulating data forever for those
>> users.
>>
>> Thanks!
>>
>> --
>>
>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>
>>
>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>> <https://www.facebook.com/remindhq>
>>
>

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
 |  FOLLOW
US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>

Re: How User-Defined AggregateFunctions handle deletes of all aggregated rows.

Posted by Danny Chan <da...@apache.org>.
Hi, Rex Fenley ~

If there is stateful operator as the output of the aggregate function. Then
each time the function receives an update (or delete) for the key, the agg
operator would emit 2 messages, one for retracting the old record, one for
the new message. For your case, the new message is the DELETE.

If there is no stateful operator, the aggregate operator would just emit
the update after (the new) message which is the delete.

Rex Fenley <Re...@remind101.com> 于2020年12月9日周三 上午4:30写道:

> Hello,
>
> I'd like to better understand delete behavior of AggregateFunctions. Let's
> assume there's an aggregate of `user_id` to a set of `group_ids` for groups
> belonging to that user.
> `user_id_1 -> [group_id_1, group_id_2, etc.]`
> Now let's assume sometime later that deletes arrive for all rows which
> produce user_id_1's group_id's.
>
> Would the aggregate function completely delete the associated state from
> RocksDB or would it leave something like `user_id_1 -> []` sitting in
> RocksDB forever?
>
> We have an aggregate similar to this where users could delete themselves
> and we want to make sure we're not accumulating data forever for those
> users.
>
> Thanks!
>
> --
>
> Rex Fenley  |  Software Engineer - Mobile and Backend
>
>
> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>  |
>  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
> <https://www.facebook.com/remindhq>
>