Posted to user@flink.apache.org by "Marchant, Hayden " <ha...@citi.com> on 2018/02/01 10:42:17 UTC

Reading bounded data from Kafka in Flink job

I have 2 datasets that I need to join together in a Flink batch job. One of the datasets needs to be created dynamically by completely 'draining' a Kafka topic over an offset range (start and end), creating a file containing all messages in that range. I know that in Flink streaming I can specify the start offset, but not the end offset. In my case, this preparation of the file from the Kafka topic is really working on a finite, bounded set of data, even though it comes from Kafka.

Is there a way that I can do this in Flink (either streaming or batch)?

Thanks,
Hayden



Re: Reading bounded data from Kafka in Flink job

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Hayden,

as far as I know, an end offset is not supported by Flink's Kafka consumer.
You could extend Flink's consumer. As you said, there is already code to
set the starting offset (per partition), so you might be able to just
piggyback on that.
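
A rough sketch of that idea, assuming a Flink 1.4-era setup with
FlinkKafkaConsumer011: the start offsets can be set with
setStartFromSpecificOffsets(), and the end offsets can be approximated with
a custom KeyedDeserializationSchema whose isEndOfStream() returns true once
a record past the configured end offset shows up. The BoundedKafkaSchema
class and the concrete offsets below are made up for illustration, not
existing Flink API.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

// Hypothetical schema that signals end-of-stream once a record beyond the
// requested end offset is seen for its partition.
public class BoundedKafkaSchema implements KeyedDeserializationSchema<String> {

    // partition -> last offset that should still be emitted (inclusive)
    private final Map<Integer, Long> endOffsets;
    private volatile boolean pastEnd = false;

    public BoundedKafkaSchema(Map<Integer, Long> endOffsets) {
        this.endOffsets = endOffsets;
    }

    @Override
    public String deserialize(byte[] messageKey, byte[] message,
                              String topic, int partition, long offset) throws IOException {
        Long end = endOffsets.get(partition);
        if (end != null && offset > end) {
            // first record past the end offset: tell the consumer to stop
            pastEnd = true;
        }
        return new String(message, StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        return pastEnd;
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}

It could then be wired into the job roughly like this (topic name, offsets
and properties are placeholders; FlinkKafkaConsumer011, KafkaTopicPartition
and setStartFromSpecificOffsets() are the consumer APIs from that Flink
version):

// needs java.util.{HashMap, Properties} plus the flink-connector-kafka-0.11 imports
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "bounded-read");

Map<KafkaTopicPartition, Long> startOffsets = new HashMap<>();
startOffsets.put(new KafkaTopicPartition("my-topic", 0), 1000L);

Map<Integer, Long> endOffsets = new HashMap<>();
endOffsets.put(0, 2000L);

FlinkKafkaConsumer011<String> consumer =
        new FlinkKafkaConsumer011<>("my-topic", new BoundedKafkaSchema(endOffsets), props);
consumer.setStartFromSpecificOffsets(startOffsets);

One caveat: if I remember the fetcher behaviour correctly, isEndOfStream()
returning true stops the whole reading subtask, so with several partitions
assigned to one subtask you would need extra bookkeeping to make sure every
partition has passed its end offset before signalling the end.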

Gordon (in CC), who has worked a lot on the Kafka connector, might have a
better idea.

Best, Fabian
