Posted to user@flink.apache.org by Prasanna kumar <pr...@gmail.com> on 2020/08/21 18:00:51 UTC

SDK vs Connectors

Hi Team,

The pipeline is as follows:
Kafka => Processing => SNS topics.

Flink does not provide an SNS connector out of the box.

a) I implemented the above using the AWS SDK, publishing the messages in
the Map operator itself.
The pipeline is working well. I see messages flowing to the SNS topics.

b) Another approach would be to write a custom sink function and publish
to SNS using the SDK in that stage.

Questions
1) What would be the primary difference between approaches a) and b)? Is
there any significant advantage of one over the other?

2) Would an at-least-once guarantee hold if we follow the above
approach?

3) Would there be any significant disadvantages (or rather, anything we
need to be careful about) in writing our own custom sink functions?

Thanks,
Prasanna.

Re: SDK vs Connectors

Posted by Robert Metzger <rm...@apache.org>.
Hi Prasanna,

(General remark: For such questions, please send the email only to
user@flink.apache.org. There's no need to email to dev@ as well.)

I don't think Flink can do much if the library you are using isn't throwing
exceptions. Maybe the library has other means of error reporting (a
callback to register, or it returns futures) which you can integrate with
Flink.
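
One way to integrate future-based error reporting is sketched below. It is a minimal illustration in plain Java, not a Flink or AWS API: `AsyncPublisher` and `FailureTrackingPublisher` are hypothetical names, and the sketch assumes the SDK exposes an asynchronous call that returns a `CompletableFuture`. The idea is to record the first asynchronous failure in a callback and rethrow it on the operator thread, so that the task fails and Flink can restart from the last checkpoint.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for an async SDK client; not a real AWS or Flink type. */
interface AsyncPublisher {
    CompletableFuture<String> publish(String message);
}

/** Remembers the first asynchronous failure and rethrows it on the caller's thread. */
class FailureTrackingPublisher {
    private final AsyncPublisher client;
    private final AtomicReference<Throwable> asyncError = new AtomicReference<>();

    FailureTrackingPublisher(AsyncPublisher client) {
        this.client = client;
    }

    /** Intended to be called once per record, e.g. from map() or a sink's invoke(). */
    void send(String message) {
        checkError(); // surface failures reported by earlier sends
        client.publish(message).whenComplete((messageId, error) -> {
            if (error != null) {
                // Keep only the first failure; later ones are usually consequences.
                asyncError.compareAndSet(null, error);
            }
        });
    }

    /** Throws if any earlier send failed; an exception here would fail the task. */
    void checkError() {
        Throwable error = asyncError.get();
        if (error != null) {
            throw new RuntimeException("Asynchronous publish failed", error);
        }
    }
}
```

This only covers failures that the library actually reports through the future. A send that is dropped without completing the future exceptionally would still go unnoticed.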



On Sun, Aug 23, 2020 at 4:34 PM Prasanna kumar <
prasannakumarramani@gmail.com> wrote:

> Thanks for the reply, Yun.
>
> When I publish the messages to SNS from the map operator, I see that in
> case of any errors the checkpointing mechanism takes care of "no data
> loss".
>
> One scenario I could not replicate is where the SDK method fails to
> send a message to SNS but stays silent, without throwing any errors or
> exceptions. In that case we cannot confirm an at-least-once guarantee
> of message delivery.
>
> Prasanna.

Re: SDK vs Connectors

Posted by Prasanna kumar <pr...@gmail.com>.
Thanks for the reply, Yun.

When I publish the messages to SNS from the map operator, I see that in case
of any errors the checkpointing mechanism takes care of "no data loss".

One scenario I could not replicate is where the SDK method fails to
send a message to SNS but stays silent, without throwing any errors or
exceptions. In that case we cannot confirm an at-least-once guarantee
of message delivery.

Prasanna.

On Sun 23 Aug, 2020, 07:51 Yun Gao, <yu...@aliyun.com> wrote:

> Hi Prasanna,
>
>    1) Semantically, both a) and b) would be OK. If the custom sink can be
> chained with the map operator (I assume the map operator is the
> "Processing" stage in the graph), there should also be little physical
> difference. If they cannot be chained, then writing a custom sink would
> cause another pass of network transfer, but the custom sink would run
> in a different thread, so more computation resources could be
> exploited.
>    2) To achieve at-least-once, you need to implement the
> "CheckpointedFunction" interface and ensure that all data is flushed to
> the external system when snapshotting state. Once a checkpoint
> succeeds, the data before it will not be replayed after failover, so
> that data must be written out before the checkpoint succeeds.
>    3) From my side, I don't think there are significant disadvantages to
> writing custom sink functions.
>
> Best,
>  Yun


Re: SDK vs Connectors

Posted by Yun Gao <yu...@aliyun.com>.
Hi Prasanna,

   1) Semantically, both a) and b) would be OK. If the custom sink can be chained with the map operator (I assume the map operator is the "Processing" stage in the graph), there should also be little physical difference. If they cannot be chained, then writing a custom sink would cause another pass of network transfer, but the custom sink would run in a different thread, so more computation resources could be exploited.
   2) To achieve at-least-once, you need to implement the "CheckpointedFunction" interface and ensure that all data is flushed to the external system when snapshotting state. Once a checkpoint succeeds, the data before it will not be replayed after failover, so that data must be written out before the checkpoint succeeds.
   3) From my side, I don't think there are significant disadvantages to writing custom sink functions.

Best,
 Yun
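
The flush-on-snapshot idea from point 2 above can be sketched roughly as follows. This is plain Java so that it stays self-contained: `FlushOnCheckpointSink` is a hypothetical class that does not implement Flink's interfaces, its `invoke` and `snapshotState` methods only mirror where `SinkFunction#invoke` and `CheckpointedFunction#snapshotState` would sit in a real sink, and the `publish` function is an assumed asynchronous SDK call that returns a `CompletableFuture`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch: tracks in-flight sends and waits for them at checkpoint time. */
class FlushOnCheckpointSink {
    // Assumed async publish call: takes a record, completes with a message id.
    private final Function<String, CompletableFuture<String>> publish;
    private final List<CompletableFuture<String>> pending = new ArrayList<>();

    FlushOnCheckpointSink(Function<String, CompletableFuture<String>> publish) {
        this.publish = publish;
    }

    /** Counterpart of SinkFunction#invoke: start the send and remember its future. */
    void invoke(String record) {
        pending.add(publish.apply(record));
    }

    /** Counterpart of CheckpointedFunction#snapshotState: block until all sends finish. */
    void snapshotState() {
        // join() throws a CompletionException if any send failed, which in a real
        // sink would fail the checkpoint instead of acknowledging unsent data.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        pending.clear();
    }

    /** Number of sends started since the last successful flush. */
    int pendingCount() {
        return pending.size();
    }
}
```

Because the checkpoint only completes after every pending send has been acknowledged, a failure before that point leads to a replay from the previous checkpoint. Records may then be published more than once, which is what at-least-once allows.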


