Posted to user@flink.apache.org by "Sandybayev, Turar (CAI - Atlanta)" <Tu...@coxautoinc.com> on 2019/02/08 15:43:13 UTC

Help with a stream processing use case

Hi all,

I wonder whether it’s possible to use Flink for the following requirement. We need to process a Kinesis stream and, based on values in each record, route those records to different S3 buckets and keyspaces, with support for batching files and control over the partitioning scheme (so preferably through Firehose).

I know it’s straightforward to have a Kinesis source and a Kinesis sink, and then hook Firehose up to the sink on the AWS side, but I need a “fan out” to potentially thousands of different buckets, based on the content of each event.

Thanks!
Turar



Re: Help with a stream processing use case

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi,

If Firehose already supports sinking records from a Kinesis stream to an
S3 bucket, then yes, Chesnay's suggestion would work.
You route each record to a specific Kinesis stream, depending on some value
in the record, using the KinesisSerializationSchema, and sink each Kinesis
stream to its target S3 bucket.
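
For illustration, a minimal sketch of such a schema. It assumes string
records where the first comma-separated field picks the target stream;
the record format and "events-" stream naming are made up:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import org.apache.flink.streaming.connectors.kinesis.serialization.KinesisSerializationSchema;

    public class TenantRoutingSchema implements KinesisSerializationSchema<String> {

        @Override
        public ByteBuffer serialize(String record) {
            // Payload bytes that end up in the Kinesis record.
            return ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public String getTargetStream(String record) {
            // Pick the target stream from the record's content; each stream
            // would be wired to its own Firehose -> S3 delivery outside of Flink.
            String tenant = record.split(",")[0];  // made-up routing rule
            return "events-" + tenant;
        }
    }

You'd then hand it to the producer, e.g.
new FlinkKinesisProducer<>(new TenantRoutingSchema(), producerConfig).
Note that, as far as I know, the producer won't create missing streams,
so all target streams would have to exist already.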

Another obvious approach is to use side output tags in the Flink job to
route records to different streaming file sinks that write to their own S3
buckets, but that would require knowing the target S3 buckets upfront.
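
A rough sketch of that variant, using the StreamingFileSink available
since Flink 1.6; the bucket names and the routing predicate here are
made up:

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    // One tag per target bucket; the buckets must be known when the job is built.
    final OutputTag<String> auditTag = new OutputTag<String>("audit") {};

    // "input" is the DataStream<String> coming from the Kinesis consumer.
    SingleOutputStreamOperator<String> mainStream = input.process(
        new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                if (value.startsWith("audit")) {   // made-up routing predicate
                    ctx.output(auditTag, value);   // goes to the side output
                } else {
                    out.collect(value);            // stays on the main output
                }
            }
        });

    mainStream.addSink(StreamingFileSink
        .forRowFormat(new Path("s3://main-bucket/events"), new SimpleStringEncoder<String>())
        .build());

    mainStream.getSideOutput(auditTag).addSink(StreamingFileSink
        .forRowFormat(new Path("s3://audit-bucket/events"), new SimpleStringEncoder<String>())
        .build());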

Cheers,
Gordon


Re: Help with a stream processing use case

Posted by Chesnay Schepler <ch...@apache.org>.
I'll need someone else to chime in here for a definitive answer (cc'd 
Gordon), so I'm really just guessing here.

For the partitioning, it looks like you can use a custom partitioner; see 
FlinkKinesisProducer#setCustomPartitioner.
Have you looked at the KinesisSerializationSchema described in the 
documentation 
<https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/kinesis.html#kinesis-producer>? 
It allows you to write to a specific stream based on incoming events, 
but I'm not sure whether this translates to S3 buckets and keyspaces.
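
For the setCustomPartitioner part, a minimal sketch; deriving the
partition key from the first CSV field of the record is just an
assumption for illustration:

    import org.apache.flink.streaming.connectors.kinesis.KinesisPartitioner;

    public class ContentPartitioner extends KinesisPartitioner<String> {
        @Override
        public String getPartitionId(String element) {
            // The partition key controls shard assignment; records with the
            // same key land in the same shard.
            return element.split(",")[0];  // made-up key choice
        }
    }

    // usage: producer.setCustomPartitioner(new ContentPartitioner());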
