Posted to dev@impala.apache.org by sky <x_...@163.com> on 2017/12/04 07:58:14 UTC

Parquet Data File Name

Hi all,
    What is the relationship between the name of a Parquet data file in HDFS and each INSERT? What format is used for the data file names? Can the name of the data file be customized for each INSERT?

Re: Parquet Data File Name

Posted by Sailesh Mukil <sa...@cloudera.com>.
Hi Sky,

Currently I don't think it's possible to customize file names automatically
with each insert (someone can correct me if I'm wrong). As for the filename
convention, it's basically:
<fragment instance id>_<unique_number>_data.<file_number_written_by_the_same_sink>.parq
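To make the convention concrete, here is a minimal Python sketch that builds a name from the three parts in the pattern above and splits an existing name back into them. It is illustrative only, not Impala code. The helper names (ParquetFileName, parse_file_name) are hypothetical. The regex also rests on two assumptions: the fragment instance id contains no underscore, and the unique number and file number are plain decimal integers. Check the linked hdfs-table-sink.cc for the exact rendering.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for: <fragment instance id>_<unique_number>_data.<file_number>.parq
# Assumptions: the instance id has no underscore; both numbers are decimal integers.
_NAME_RE = re.compile(
    r"^(?P<instance_id>[^_]+)_(?P<unique_number>\d+)_data\.(?P<file_number>\d+)\.parq$"
)


@dataclass(frozen=True)
class ParquetFileName:
    instance_id: str    # fragment instance id, as it appears in the name
    unique_number: int  # the <unique_number> component
    file_number: int    # counter of files written by the same sink

    def render(self) -> str:
        """Build a file name following the convention described in the reply."""
        return f"{self.instance_id}_{self.unique_number}_data.{self.file_number}.parq"


def parse_file_name(name: str) -> ParquetFileName:
    """Split a data file name into its components; raise ValueError if it does not match."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a data file name in the expected format: {name!r}")
    return ParquetFileName(
        instance_id=m.group("instance_id"),
        unique_number=int(m.group("unique_number")),
        file_number=int(m.group("file_number")),
    )


if __name__ == "__main__":
    # Made-up example values, not taken from a real cluster.
    name = ParquetFileName("1e4a3c0f7d3e8b2a-5f6d4c3b00000000", 1876543210, 0).render()
    print(name)
    print(parse_file_name(name))
```

Because the name is derived from the fragment instance id and a per-sink counter rather than from anything in the INSERT statement, a sketch like this can only recover those components. It cannot recover a user-chosen name, which is consistent with the point above that names are not customizable per insert.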

Code references:
https://github.com/apache/impala/blob/master/be/src/exec/hdfs-table-sink.cc#L229-L245

https://github.com/apache/impala/blob/master/be/src/exec/hdfs-table-sink.cc#L346-L348

- Sailesh
