Posted to user@flink.apache.org by "Alam, Zeeshan" <Ze...@fmr.com> on 2016/08/03 12:21:46 UTC
Flink Batch Processing with Kafka
Hi,
Flink works very well with Kafka if you wish to stream data. Following is how I am streaming data with Kafka and Flink.
FlinkKafkaConsumer08<Event> kafkaConsumer = new FlinkKafkaConsumer08<>(KAFKA_AVRO_TOPIC, avroSchema, properties);
DataStream<Event> messageStream = env.addSource(kafkaConsumer);
Is there a way to do a micro-batch operation on the data coming from Kafka? What I want to do is to reduce or aggregate the events coming from Kafka. For instance, I am getting 40,000 events per second from Kafka, and I want to group 2,000 events into one and send the result to my microservice for further processing. Can I use the Flink DataSet API for this, or should I go with Spark or some other framework?
Thanks & Regards
Zeeshan Alam
Re: Flink Batch Processing with Kafka
Posted by Prabhu V <vp...@gmail.com>.
If your environment is not kerberized (or if you can afford to restart the
job every 7 days), a checkpoint-enabled Flink job with windowing and a
count trigger would be ideal for your requirement.
Check the APIs on Flink windows.
I had something like this that worked:
stream.keyBy(0)                       // keyBy(0) groups the stream by a key field
      .countWindow(2000)              // collect 2000 elements per key before firing
      .apply(function)                // "function" is where you would have the "send it to my microservice" part
      .setParallelism(conf.getInt("window.parallelism"));
where stream is a DataStream created with the Kafka connector.
You could look up the individual methods in the api.
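Independent of Flink, the count-based grouping being asked about can be sketched in plain Java. This is only an illustration of the countWindow semantics (emit a batch every N elements, keep the remainder buffered); the Batcher class, its method names, and the 2000 batch size are hypothetical, not part of the Flink API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects events and hands them off in fixed-size batches (a plain-Java sketch of countWindow semantics). */
class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink; // stand-in for the call to the microservice
    private List<T> buffer = new ArrayList<>();

    Batcher(int batchSize, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void add(T event) {
        buffer.add(event);
        if (buffer.size() == batchSize) { // window is full: emit it and start a new one
            sink.accept(buffer);
            buffer = new ArrayList<>();
        }
    }
}

public class BatcherDemo {
    /** Feeds `total` events through a batcher and returns the size of each emitted batch. */
    static List<Integer> emittedBatchSizes(int total, int batchSize) {
        List<Integer> sizes = new ArrayList<>();
        Batcher<String> batcher = new Batcher<>(batchSize, batch -> sizes.add(batch.size()));
        for (int i = 0; i < total; i++) {
            batcher.add("event-" + i);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 5000 events with a batch size of 2000 -> two full batches; 1000 events stay buffered
        System.out.println(emittedBatchSizes(5000, 2000)); // prints [2000, 2000]
    }
}
```

In the real Flink job this bookkeeping is handled by the window operator, and the buffered partial batch survives failures via checkpointing, which is why the windowed DataStream approach fits better than the DataSet API here.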
Thanks,
Prabhu