Posted to dev@flink.apache.org by Kirill Polunin <kp...@griddynamics.com> on 2020/05/13 10:04:53 UTC

Reading parquet from S3

Hi,
I'm writing an article about Apache Flink. Does Apache Flink support
reading Parquet files in batch mode? Our team is going to read Parquet
files from S3, but I couldn't find a page covering this in Flink's tutorials.
I would be grateful for any help. Thank you.


-- 

Kirill Polunin, BigData Engineer

Grid Dynamics

Szczytnicka 11, Wroclaw, Poland

Dir: (+48) 535-073-641 | skype: imbamrgrey


Re: Reading parquet from S3

Posted by Jingsong Li <ji...@gmail.com>.
Hi Kirill,

For the DataSet API, yes: the flink-parquet module provides
"ParquetRowInputFormat", "ParquetPojoInputFormat" and
"ParquetMapInputFormat" [1].

For Parquet files on a file system, read through the Table API:
- Before 1.11, you can register a ParquetTableSource with a batch table
environment that uses the legacy planner.
- In 1.11, you can create a file system table with the parquet format and
read from it with SQL.
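
As a rough sketch of the 1.11 option (not part of the original reply), the table definition might look like the statement built below. The table name, column list, and bucket path are hypothetical placeholders. The sketch also assumes that the flink-parquet dependency and one of Flink's S3 file system plugins are available to the job. The helper only assembles the DDL text, so it runs without Flink on the classpath.

```java
public class ParquetTableDdl {

    // Builds a CREATE TABLE statement for a file system table stored as Parquet.
    // The two columns are hypothetical; replace them with the real schema.
    static String build(String tableName, String path) {
        return String.join("\n",
            "CREATE TABLE " + tableName + " (",
            "  user_id BIGINT,",
            "  event_name STRING",
            ") WITH (",
            "  'connector' = 'filesystem',",
            "  'path' = '" + path + "',",
            "  'format' = 'parquet'",
            ")");
    }

    public static void main(String[] args) {
        // Hypothetical bucket and prefix.
        String ddl = build("events", "s3://my-bucket/events/");
        System.out.println(ddl);
        // With Flink 1.11 available, this string would be passed to
        // TableEnvironment.executeSql(ddl), and the table could then be
        // queried with a regular SELECT statement.
    }
}
```

In a real job, the printed statement would be handed to a batch TableEnvironment rather than written to standard output.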

For Hive tables, we also support reading tables stored in Parquet format.

[1]
https://github.com/apache/flink/tree/38e5e8161a9c763cf7df3b642830b5a97371bb00/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet

Best,
Jingsong Lee


