Posted to user@flink.apache.org by Sriram Ganesh <sr...@gmail.com> on 2023/01/24 09:25:10 UTC

Using S3 as stream source in Flink

Hi Everyone,

I am thinking of switching my input source from Kafka to S3. First, I
couldn't find any streaming source connector for S3. I have some basic
questions:

1. How will S3 work as a streaming source with proper checkpointing?
2. How will Flink manage the last offset processed from a file?
3. Is exactly-once processing possible while using S3 as a streaming source?
4. What are the pros and cons of using S3-style storage as a
streaming source?

Any help would be appreciated. Thanks in advance.

Thanks,
-- 
*Sriram G*
*Tech*

Re: Using S3 as stream source in Flink

Posted by Sriram Ganesh <sr...@gmail.com>.

I saw that the aws-samples code at
https://github.com/aws-samples/flink-stream-processing-refarch/blob/master/kinesis-taxi-stream-producer/src/main/java/com/amazonaws/flink/refarch/utils/TaxiEventReader.java
does not use FileSource.

I get it now. Thanks, Martijn.

On Wed, Jan 25, 2023 at 9:07 PM Martijn Visser <ma...@apache.org>
wrote:

> Hi Sriram G,
>
> Both the DataStream and Table API support the filesystem as a source in
> unbounded (streaming) mode with exactly-once guarantees. This is documented
> at
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/
> and
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/
>
> Best regards,
>
> Martijn
>

-- 
*Sriram G*
*Tech*

Re: Using S3 as stream source in Flink

Posted by Martijn Visser <ma...@apache.org>.

Hi Sriram G,

Both the DataStream and Table API support the filesystem as a source in
unbounded (streaming) mode with exactly-once guarantees. This is documented
at
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/
and
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/

Best regards,

Martijn
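
To make the checkpointing and offset questions concrete, here is a toy model
in plain Java. It is not Flink code and not taken from the messages in this
thread: the class names ToyFileSource and ToyJob, the bucket path, and the
record values are all hypothetical. It only sketches the general idea that a
file-based source can snapshot a read position per file alongside operator
state, so that after a failure both are rolled back together and records
read since the last checkpoint are replayed rather than lost or counted
twice. In Flink itself, the DataStream FileSource is switched to continuous
discovery with monitorContinuously(Duration) on its builder; the
documentation linked above describes the actual behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a checkpointed file source. Illustration only, not Flink's implementation. */
final class ToyFileSource {
    /** Immutable copy of the per-file read positions at checkpoint time. */
    static final class Checkpoint {
        final Map<String, Integer> offsets;
        Checkpoint(Map<String, Integer> offsets) {
            this.offsets = new LinkedHashMap<>(offsets);
        }
    }

    private final Map<String, List<String>> bucket;                       // path -> records (pretend S3)
    private final Map<String, Integer> offsets = new LinkedHashMap<>();   // path -> next record index

    ToyFileSource(Map<String, List<String>> bucket) {
        this.bucket = bucket;
    }

    /** Returns up to maxRecords unread records, picking up files added since the last call. */
    List<String> poll(int maxRecords) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : bucket.entrySet()) {
            int pos = offsets.getOrDefault(file.getKey(), 0);
            List<String> records = file.getValue();
            while (pos < records.size() && out.size() < maxRecords) {
                out.add(records.get(pos++));
            }
            offsets.put(file.getKey(), pos);
        }
        return out;
    }

    Checkpoint snapshot() {
        return new Checkpoint(offsets);
    }

    void restore(Checkpoint cp) {
        offsets.clear();
        offsets.putAll(cp.offsets);
    }
}

final class ToyJob {
    /** Processes records, fails after a checkpoint, recovers, and returns the final state. */
    static List<String> runWithFailure() {
        Map<String, List<String>> bucket = new LinkedHashMap<>();
        bucket.put("s3://example-bucket/in/part-0", List.of("a", "b", "c"));
        ToyFileSource source = new ToyFileSource(bucket);
        List<String> state = new ArrayList<>();              // stand-in for operator state

        state.addAll(source.poll(2));                        // reads a, b
        ToyFileSource.Checkpoint cp = source.snapshot();     // checkpoint: offsets and state together
        List<String> stateAtCheckpoint = new ArrayList<>(state);

        state.addAll(source.poll(2));                        // reads c after the checkpoint

        // Simulated failure: roll back both the source offsets and the state.
        source.restore(cp);
        state = new ArrayList<>(stateAtCheckpoint);

        bucket.put("s3://example-bucket/in/part-1", List.of("d"));  // a new file arrives
        state.addAll(source.poll(10));                       // c is replayed, then d is read
        return state;
    }

    public static void main(String[] args) {
        System.out.println(runWithFailure());                // prints [a, b, c, d]
    }
}
```

Each record appears in the final state exactly once even though "c" was read
twice, because the first read was discarded with the rolled-back state. This
covers state inside the job only; whether results written to an external
system are also exactly-once depends on the sink.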
