You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Qingsheng Ren <re...@gmail.com> on 2022/06/01 03:53:31 UTC

Re: Can we use CheckpointedFunction with the new Source api?

Hi Qing, 

Thanks for the input. I think having a stateful function to accumulate the tree after source is a reasonable solution to me. Under your design a split is mapping to a znode so the state persisted in the source reader would be per-node information, and it’s hard to accumulate them under the current abstraction of source. Also I’m a little bit curious about the use case. If the downstream requires the whole tree, does that means the parallelism of the accumulator has to be 1? Please forgive me if my understanding is incorrect. 

Another idea in my mind is that if you are also providing a reusable *table* source, you can wrap the source and the accumulating function together into a DataStreamScanProvider and provide as one table source to user. This might look a bit neater. 

Cheers, 

Qingsheng

> On May 31, 2022, at 16:04, Qing Lim <q....@mwam.com> wrote:
> 
> Hi Qingsheng, thanks for getting back.
> 
> I manage to find a workaround, but if you can provide other suggestions it'd be great too.
> 
> I followed the documentation here: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/
> 
> I implemented a Custom Source that emits all changes from a Zookeeper node (recursively), I modelled it such that everytime there's a change in the tree, it will produce a split for the specific node, then the reader is responsible to fetch the latest state of the node, and emit.
> This works fine, but we have a use case to emit the whole tree (recursively), conceptually it is quite simple, I just need to accumulate the whole tree as State, and then emit the State whenever it updates.
> 
> I was trying to achieve this inside the Source, because this is meant to be something reusable within our organization, which is why I asked the original question, but I have now solved it by implementing a stateful Map function instead, it is a bit less ergonomic, but acceptable on my end. So if you have an alternative, please share with me, thank you.
> 
> Kind regards
> 
> -----Original Message-----
> From: Qingsheng Ren <re...@gmail.com> 
> Sent: 31 May 2022 03:57
> To: Qing Lim <q....@mwam.com>
> Cc: user@flink.apache.org
> Subject: Re: Can we use CheckpointedFunction with the new Source api?
> 
> Hi Qing,
> 
> I’m afraid CheckpointedFunction cannot be applied to the new source API, but could you share the abstractions of your source implementation, like which component a split maps to etc.? Maybe we can try to do some workarounds. 
> 
> Best, 
> 
> Qingsheng
> 
>> On May 30, 2022, at 20:09, Qing Lim <q....@mwam.com> wrote:
>> 
>> Hi, is it possible to use CheckpointedFunction with the new Source api? (The one in package org.apache.flink.api.connector.source)
>> 
>> My use case:
>> 
>> I have a custom source that emit individual nodes update from a tree, and I wish to create a stream of the whole Tree snapshots, so I will have to accumulate all updates and keep it as state. In addition to this, I wish to expose this functionality as a library to my organization.
>> 
>> The custom source is written using the new Source api, I wonder if we can attach state to it?
>> 
>> Kind regards
>> 
>> This e-mail and any attachments are confidential to the addressee(s) and may contain information that is legally privileged and/or confidential. If you are not the intended recipient of this e-mail you are hereby notified that any dissemination, distribution, or copying of its content is strictly prohibited. If you have received this message in error, please notify the sender by return e-mail and destroy the message and all copies in your possession.
>> 
>> 
>> To find out more details about how we may collect, use and share your personal information, please see https://www.mwam.com/privacy-policy. This includes details of how calls you make to us may be recorded in order for us to comply with our legal and regulatory obligations.
>> 
>> 
>> To the extent that the contents of this email constitutes a financial promotion, please note that it is issued only to and/or directed only at persons who are professional clients or eligible counterparties as defined in the FCA Rules. Any investment products or services described in this email are available only to professional clients and eligible counterparties. Persons who are not professional clients or eligible counterparties should not rely or act on the contents of this email.
>> 
>> 
>> Marshall Wace LLP is authorised and regulated by the Financial Conduct Authority. Marshall Wace LLP is a limited liability partnership registered in England and Wales with registered number OC302228 and registered office at George House, 131 Sloane Street, London, SW1X 9AT. If you are receiving this e-mail as a client, or an investor in an investment vehicle, managed or advised by Marshall Wace North America L.P., the sender of this e-mail is communicating with you in the sender's capacity as an associated or related person of Marshall Wace North America L.P. ("MWNA"), which is registered with the US Securities and Exchange Commission ("SEC") as an investment adviser.  Registration with the SEC does not imply that MWNA or its employees possess a certain level of skill or training.
>> 
>