Posted to user@flink.apache.org by "Alam, Zeeshan" <Ze...@fmr.com> on 2016/08/03 12:21:46 UTC

Flink Batch Processing with Kafka

Hi,

Flink works very well with Kafka if you wish to stream data. The following is how I am streaming data from Kafka with Flink.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
FlinkKafkaConsumer08<Event> kafkaConsumer = new FlinkKafkaConsumer08<>(KAFKA_AVRO_TOPIC, avroSchema, properties);
DataStream<Event> messageStream = env.addSource(kafkaConsumer);

Is there a way to do a micro-batch operation on the data coming from Kafka? What I want to do is to reduce or aggregate the events coming from Kafka. For instance, I am getting 40,000 events per second from Kafka, and I want to group 2,000 events into one and send it to my microservice for further processing. Can I use the Flink DataSet API for this, or should I go with Spark or some other framework?

Thanks & Regards
Zeeshan Alam

Re: Flink Batch Processing with Kafka

Posted by Prabhu V <vp...@gmail.com>.
If your environment is not kerberized (or if you can afford to restart the
job every seven days), a checkpoint-enabled Flink job with windowing and a
count trigger would be ideal for your requirement.
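Checkpointing is switched on via the execution environment. A minimal sketch (the 10-second interval here is just an example value, not something from the original thread):

    // Checkpointing makes the Kafka source's offsets part of Flink's
    // fault-tolerant state, so the job can resume after a failure.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(10000); // interval in ms; 10s is an arbitrary example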

Check the APIs on Flink windows.

I had something like this that worked

stream.keyBy(0).countWindow(2000).apply(function)
                .setParallelism(conf.getInt("window.parallelism"));

where stream is a DataStream created with the Kafka connector,
"function" is where you would have the "send it to my microservice" part,
and keyBy(0) groups the stream by a key field. A fuller sketch follows below.
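As a rough, self-contained sketch of that pattern: the Tuple2<String, Event> element type, the List<Event> batch output, and the parallelism value are assumptions for illustration; Event is the Avro type from the original question, and the microservice call is left as a comment.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
    import org.apache.flink.util.Collector;

    // stream is assumed to be a DataStream<Tuple2<String, Event>> built on the
    // Kafka source, where field f0 is the key used by keyBy(0).
    DataStream<List<Event>> batches = stream
            .keyBy(0)          // group by the first tuple field
            .countWindow(2000) // fire once 2000 events have arrived per key
            .apply(new WindowFunction<Tuple2<String, Event>, List<Event>, Tuple, GlobalWindow>() {
                @Override
                public void apply(Tuple key, GlobalWindow window,
                                  Iterable<Tuple2<String, Event>> values,
                                  Collector<List<Event>> out) {
                    List<Event> batch = new ArrayList<>(2000);
                    for (Tuple2<String, Event> value : values) {
                        batch.add(value.f1);
                    }
                    // This is where you would call your microservice client,
                    // instead of (or before) emitting the batch downstream.
                    out.collect(batch);
                }
            })
            .setParallelism(4); // the parallelism value is just an example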

You could look up the individual methods in the API docs.

Thanks,
Prabhu
