Posted to user@flink.apache.org by Tarandeep Singh <ta...@gmail.com> on 2017/03/09 15:56:15 UTC

Flink streaming - call external API "after" sink

Hi,

I am using the flink-1.2 streaming API to process a clickstream and compute
some results per cookie. The computed results are stored in Cassandra using
the flink-cassandra connector. After a result is stored in Cassandra, I want
to notify an external system (using their API or via Kafka) that the result
is available (for one cookie).

Can this be done (with/without modifying sink source code)?

What if I create a JointSink that internally uses the Cassandra sink and the
Kafka sink and writes to both places? I am not worried about the same record
being written multiple times, as the computed result and the external
system's consumption are idempotent.
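A minimal sketch of what such a JointSink could look like. The interface below is a simplified stand-in for Flink's SinkFunction (so the sketch is self-contained); the class name JointSink, the field names, and the delegation order are assumptions for illustration, not existing Flink API:

```java
import java.io.Serializable;

// Simplified stand-in for Flink's SinkFunction interface.
interface SinkFunction<IN> extends Serializable {
    void invoke(IN value) throws Exception;
}

// Hypothetical composite sink: forwards every record to both wrapped sinks.
class JointSink<IN> implements SinkFunction<IN> {
    private final SinkFunction<IN> first;   // e.g. the Cassandra sink
    private final SinkFunction<IN> second;  // e.g. a Kafka producer sink

    JointSink(SinkFunction<IN> first, SinkFunction<IN> second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void invoke(IN value) throws Exception {
        first.invoke(value);   // store the result
        second.invoke(value);  // then notify; duplicates are fine (idempotent)
    }
}
```

Note that this gives no ordering guarantee across failures: after a restart, a record may be re-sent to both sinks, which is acceptable here only because both sides are idempotent.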

Thank you,
Tarandeep

Re: Flink streaming - call external API "after" sink

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

this is the second time that something like this has been requested or
proposed. The first time was [1].


+Seth, who might have an opinion on this.



I'm starting to think that we might need to generalise this pattern.
Right now, the SinkFunction interface is this:
public interface SinkFunction<IN> extends Function, Serializable {

    /**
     * Function for standard sink behaviour. This function is called
     * for every record.
     */
    void invoke(IN value) throws Exception;
}



The interface for FlatMapFunction is this:

public interface FlatMapFunction<T, O> extends Function, Serializable {

    /**
     * The core method of the FlatMapFunction. Takes an element from the
     * input data set and transforms it into zero, one, or more elements.
     */
    void flatMap(T value, Collector<O> out) throws Exception;
}



The only difference is naming and the fact that FlatMapFunction can emit
elements. Every SinkFunction could be written as a
FlatMapFunction<IN, Void>, so we might not even need a special
SinkFunction in the end, and all sinks can become "sinks that can also
forward data".
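To make that concrete, here is a sketch of a sink expressed as a FlatMapFunction<IN, Void>. The two interfaces are simplified stand-ins for the Flink ones quoted above (so the example compiles on its own), and CollectingSink is an invented name:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the two Flink interfaces quoted above.
interface Collector<O> {
    void collect(O record);
}

interface FlatMapFunction<T, O> extends Serializable {
    void flatMap(T value, Collector<O> out) throws Exception;
}

// A "sink" written as FlatMapFunction<IN, Void>: it consumes every
// record and never calls out.collect(), so nothing flows downstream.
class CollectingSink<T> implements FlatMapFunction<T, Void> {
    final List<String> written = new ArrayList<>();

    @Override
    public void flatMap(T value, Collector<Void> out) {
        written.add(String.valueOf(value)); // stand-in for an external write
        // deliberately no out.collect(...) — a sink forwards nothing
    }
}
```

A sink that should also notify a downstream system would simply use a non-Void output type and emit a record after the write succeeds.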


For an example of a system that does it like this, you can look at Apache
Beam, where there are no special sink functions. Everything is just
DoFns (basically a very powerful FlatMapFunction) strung together. For
example, there is a file "sink" that consists of three DoFns: one does
some initialisation and forwards write handles, the second DoFn
does the writing and forwards handles to the written data, and the third
finalises by renaming the written files to their final location.
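The shape of that three-stage file "sink" can be sketched as three plain steps chained by handles. This is an illustrative simplification, not Beam's actual DoFn/FileIO code; every name below is made up:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Illustrative sketch of "file sink as chained steps"; the names do not
// come from Beam, they only mirror the three-stage shape described above.
class ChainedFileSink {

    // Stage 1: initialisation — create a temporary file, forward its handle.
    static Path open(Path dir) throws IOException {
        return Files.createTempFile(dir, "part-", ".tmp");
    }

    // Stage 2: writing — write the records, forward the handle to the data.
    static Path write(Path tmp, List<String> records) throws IOException {
        return Files.write(tmp, records);
    }

    // Stage 3: finalisation — rename the temp file to its final location.
    static Path commit(Path tmp, Path finalPath) throws IOException {
        return Files.move(tmp, finalPath, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because each stage only forwards a handle, any stage can also emit extra records (e.g. a "file committed" notification), which is exactly the property the JointSink question is after.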


Best,

Aljoscha



[1] https://lists.apache.org/thread.html/bfa811892f8bd5d87d47a4597b60ab2f4aee0a8e7d6379b3d6d9d7b3@%3Cdev.flink.apache.org%3E



