Posted to dev@pulsar.apache.org by Neng Lu <nl...@apache.org> on 2023/03/01 02:16:52 UTC

[Discussion] Allowing configure if function consumer should skip to latest

Hello Community,

In this [PR](https://github.com/apache/pulsar/pull/17214), we changed the
function protobuf by adding one more field `bool skipToLatest = 14;`.

The change itself is minimal, self-contained, and used only internally by functions.

Given the new community guideline that protobuf changes should be announced
to the community, I'm writing this email to initiate the (late) discussion.

### Motivation
In certain failure cases, the function needs to skip everything between the
last successfully acked message and the latest message in the topic, so that
it can bypass a huge backlog and recover quickly.

### Modifications
1. Provide a boolean config option in the function submission command
2. PulsarSource will call `consumer.seek(MessageId.latest)` if the skip
flag is set
3. The internal function protobuf file will have a new field `skipToLatest`
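
As a rough illustration of modification 2, the sketch below models the intended behavior with a toy in-memory subscription. It is not the Pulsar client API: `ToySubscription`, `openSource`, and the way the `skipToLatest` flag is passed are hypothetical names used only for this example. In the actual change, the skip is done by the `consumer.seek(MessageId.latest)` call mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a subscription cursor; NOT the Pulsar client API. */
public class SkipToLatestSketch {

    /** Hypothetical stand-in for a subscription: a message log plus a read position. */
    static final class ToySubscription {
        private final List<String> log = new ArrayList<>();
        private int cursor = 0; // index of the next unacknowledged message

        void publish(String message) {
            log.add(message);
        }

        /** Number of messages published but not yet read. */
        int backlog() {
            return log.size() - cursor;
        }

        /** Moves the cursor past everything published so far. */
        void seekToLatest() {
            cursor = log.size();
        }

        /** Returns the next message and advances the cursor, or null if none is available. */
        String receive() {
            return cursor < log.size() ? log.get(cursor++) : null;
        }
    }

    /** Hypothetical source start-up hook: applies the flag once, before any message is read. */
    static void openSource(ToySubscription sub, boolean skipToLatest) {
        if (skipToLatest) {
            sub.seekToLatest();
        }
    }

    public static void main(String[] args) {
        ToySubscription sub = new ToySubscription();
        for (int i = 0; i < 1000; i++) {
            sub.publish("old-" + i); // backlog built up while the function was down
        }
        openSource(sub, true);       // restart with the flag set
        sub.publish("new-0");        // traffic arriving after the restart
        System.out.println("backlog after skip: " + sub.backlog());
        System.out.println("first message read: " + sub.receive());
    }
}
```

With the flag set, the 1000 old messages are never delivered and the function resumes from `new-0`. With the flag unset, the cursor is left untouched and the whole backlog is replayed.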

Let me know your thoughts. If there are big concerns regarding this
change, we will revert the above PR first and then make modifications to it.

Best Regards,
Neng Lu

Re: [Discussion] Allowing configure if function consumer should skip to latest

Posted by PengHui Li <pe...@apache.org>.
Got it.

Thanks for the explanation.
LGTM

Penghui

On Thu, Mar 9, 2023 at 3:06 AM Neng Lu <nl...@apache.org> wrote:

> Hi everyone,
>
> This discussion has been open for one week. If there are no objections or
> concerns, I'll move forward and close the discussion with the conclusion
> that we are good with the proposed change.
>
> The PR will therefore stay merged (it had already been merged, so the
> change won't be reverted).
>
> On 2023/03/07 03:45:02 Neng Lu wrote:
> > Hi Penghui,
> >
> > Thanks for your question.
> >
> > One case is failure recovery for a windowing function.
> >
> > A windowing function will not ack messages until its window is emitted.
> > If the window function fails due to issues such as OOM and restarts, it
> > has a massive backlog to catch up on. The function will never be able to
> > recover by itself, since the backlog keeps growing and it keeps running
> > out of memory.
> >
> > Our user prefers an automatic way to recover, given that they are okay
> > with skipping some backlog data (this is acceptable in IoT cases). Also,
> > users may deploy hundreds of functions in their environment. Manually
> > resetting the cursor does not scale and is a heavy burden for the on-call
> > person in such cases.
> >
> > I hope this use case provides more context for the change.
> >
> > On 2023/03/03 03:51:35 PengHui Li wrote:
> > > Hi Neng,
> > >
> > > Thanks for raising the discussion.
> > >
> > > > In certain failure cases, the function needs to skip everything
> > > > between the last successfully acked message and the latest message
> > > > in the topic, so that it can bypass a huge backlog and recover
> > > > quickly.
> > >
> > > Do you have some real cases that can help us understand why it is
> > > necessary to introduce a new flag? Another possibility is that users
> > > can use pulsar admin to reset the cursor to the latest position.
> > > Why would that not work for users?
> > >
> > > Regards,
> > > Penghui
> > >
> > > > On Mar 1, 2023, at 10:16, Neng Lu <nl...@apache.org> wrote:
> > > >
> > > > In certain failure cases, the function needs to skip everything
> > > > between the last successfully acked message and the latest message
> > > > in the topic, so that it can bypass a huge backlog and recover
> > > > quickly.
> > >
> > >
> >
>

Re: [Discussion] Allowing configure if function consumer should skip to latest

Posted by Neng Lu <nl...@apache.org>.
Hi everyone,

This discussion has been open for one week. If there are no objections or concerns, I'll move forward and close the discussion with the conclusion that we are good with the proposed change.

The PR will therefore stay merged (it had already been merged, so the change won't be reverted).

Re: [Discussion] Allowing configure if function consumer should skip to latest

Posted by Neng Lu <nl...@apache.org>.
Hi Penghui,

Thanks for your question.

One case is failure recovery for a windowing function.

A windowing function will not ack messages until its window is emitted. If the window function fails due to issues such as OOM and restarts, it has a massive backlog to catch up on. The function will never be able to recover by itself, since the backlog keeps growing and it keeps running out of memory.

Our user prefers an automatic way to recover, given that they are okay with skipping some backlog data (this is acceptable in IoT cases). Also, users may deploy hundreds of functions in their environment. Manually resetting the cursor does not scale and is a heavy burden for the on-call person in such cases.

I hope this use case provides more context for the change.
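
To make the failure loop above concrete, here is a deliberately simplified simulation. Everything in it is an assumption for illustration: the class and method names are hypothetical, and the crash rule (the function runs out of memory whenever the unacknowledged backlog exceeds a fixed limit) is a stand-in for real memory behavior, not a model of Pulsar internals.

```java
/** Toy simulation of a windowing function stuck in a restart loop; not Pulsar code. */
public class RestartLoopSketch {

    /**
     * Returns the 1-based restart attempt on which the function starts successfully,
     * or -1 if it crashes on every one of maxAttempts attempts.
     */
    static int attemptsUntilRecovery(long backlog, long memoryLimit, long arrivalsPerAttempt,
                                     boolean skipToLatest, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (skipToLatest) {
                backlog = 0; // drop everything unacknowledged, as the flag would
            }
            if (backlog <= memoryLimit) {
                return attempt; // the replayed messages fit in memory
            }
            // Assumed crash: nothing gets acknowledged, and producers keep publishing.
            backlog += arrivalsPerAttempt;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Without the flag the backlog only grows, so the function never recovers.
        System.out.println(attemptsUntilRecovery(1_000_000, 100_000, 5_000, false, 50)); // prints -1
        // With the flag the first restart succeeds.
        System.out.println(attemptsUntilRecovery(1_000_000, 100_000, 5_000, true, 50));  // prints 1
    }
}
```

Under these assumptions, no number of restarts helps without the flag, which is why an operator would otherwise have to reset the cursor by hand for each affected function.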

Re: [Discussion] Allowing configure if function consumer should skip to latest

Posted by PengHui Li <co...@gmail.com>.
Hi Neng,

Thanks for raising the discussion.

> In certain failure cases, the function needs to skip everything between the
> last successfully acked message and the latest message in the topic, so that
> it can bypass a huge backlog and recover quickly.

Do you have some real cases that can help us understand why it is
necessary to introduce a new flag? Another possibility is that users
can use pulsar admin to reset the cursor to the latest position.
Why would that not work for users?

Regards,
Penghui
