Posted to user@flink.apache.org by Madhukar Thota <ma...@gmail.com> on 2017/05/31 02:10:43 UTC

Amazon Athena

Has anyone used Amazon Athena with Apache Flink?

I have a use case where I want to write streaming data (which is in Avro
format) from Kafka to S3, converting it into Parquet format along the way,
and then update the S3 location with daily partitions on an Athena table.

Any guidance is appreciated.
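
For reference, a minimal sketch of such a pipeline, written against the
BucketingSink API current at the time of this thread. The topic name, bucket
path, Avro deserialization schema, and Parquet writer are illustrative
assumptions: Flink's filesystem connector did not ship a Parquet writer, so
ParquetSinkWriter stands in for a custom Writer<GenericRecord> (e.g. built on
parquet-avro's AvroParquetWriter), and MyAvroDeserializationSchema for a
DeserializationSchema that decodes the Avro records.

import java.util.Properties;

import org.apache.avro.generic.GenericRecord;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

public class KafkaAvroToParquetS3 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // The BucketingSink finalizes files on checkpoints, so enable them.
        env.enableCheckpointing(60_000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // assumption
        props.setProperty("group.id", "athena-ingest");       // assumption

        DataStream<GenericRecord> events = env.addSource(
                new FlinkKafkaConsumer010<>(
                        "events",                          // topic (assumption)
                        new MyAvroDeserializationSchema(), // hypothetical
                        props));

        BucketingSink<GenericRecord> sink =
                new BucketingSink<>("s3://my-bucket/events"); // assumption
        // One folder per day, Hive/Athena-style, e.g. .../dt=2017-06-06/
        sink.setBucketer(new DateTimeBucketer<GenericRecord>("'dt='yyyy-MM-dd"));
        sink.setWriter(new ParquetSinkWriter());              // hypothetical
        sink.setBatchSize(128L * 1024 * 1024);                // roll at ~128 MB

        events.addSink(sink);
        env.execute("kafka-avro-to-parquet-s3");
    }
}

Athena still needs to learn about each new dt=... folder as it appears; a
sketch of that step follows Aljoscha's reply below.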

Re: Amazon Athena

Posted by Seth Wiesman <sw...@mediamath.com>.
Seems straightforward. The biggest challenge is that you don’t want Athena picking up partially written or otherwise corrupt files. The issue with S3 is that you cannot allow Flink to perform delete, truncate, or rename operations, because Flink moves faster than S3 can become consistent. I think the simplest solution would be to use the bucketing sink to write files out to HDFS, and then add an additional operator or auxiliary process that copies them to S3 once they move from pending to complete. If you do this, you only need at-least-once copies to S3, because overwriting a file with itself is the only consistent overwrite condition.

Seth  
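
To make the staging step concrete, here is a hedged sketch of the HDFS-side
sink configuration. The paths and suffixes are assumptions, and the copy step
is described only in comments, since its shape depends on the surrounding
infrastructure.

import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

// Stage on HDFS so S3 only ever receives finished files. The sink writes an
// "in-progress" file, renames it to "pending" when it rolls, and strips the
// pending marker once the covering checkpoint completes.
BucketingSink<String> staging =
        new BucketingSink<>("hdfs:///staging/events");        // assumption
staging.setBucketer(new DateTimeBucketer<String>("'dt='yyyy-MM-dd"));
staging.setInProgressSuffix(".in-progress"); // being written right now
staging.setPendingSuffix(".pending");        // rolled, checkpoint not confirmed
// An auxiliary process (scheduler job or custom operator) copies files that
// carry neither suffix to s3://my-bucket/events/dt=.../. Re-copying the same
// bytes on retry is safe -- overwriting a file with itself is consistent --
// so at-least-once delivery of the copies is sufficient.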

Re: Amazon Athena

Posted by Aljoscha Krettek <al...@apache.org>.
Hi,

I don’t have any experience with Athena, but this sounds doable. It seems that you only need some way of writing into S3, and Athena will then pick up the data when running queries. Multiple folks have used Flink to write data from Kafka into S3; the most recent case I know of from the mailing lists is probably Seth (in cc). Could you maybe comment if you find some time?

Best,
Aljoscha
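
One step worth spelling out: Athena only sees data in partitions that are
registered in its catalog, so the daily dt=... folders have to be added as
they appear. A sketch, with made-up table and bucket names; the generated DDL
would be submitted through the Athena console, JDBC driver, or AWS SDK:

import java.time.LocalDate;

// Build the daily ALTER TABLE statement for a table partitioned on a string
// column "dt", matching the dt=yyyy-MM-dd folders under the base path.
// "events" and "my-bucket" are illustrative names.
public final class AthenaPartitionDdl {
    static String addPartition(LocalDate day) {
        return String.format(
                "ALTER TABLE events ADD IF NOT EXISTS PARTITION (dt = '%1$s') "
                        + "LOCATION 's3://my-bucket/events/dt=%1$s/'", day);
    }

    public static void main(String[] args) {
        // Run once per day from a scheduler. MSCK REPAIR TABLE events is the
        // blunter alternative that scans the whole prefix for new partitions.
        System.out.println(addPartition(LocalDate.now()));
    }
}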
