Posted to dev@impala.apache.org by sky <x_...@163.com> on 2017/12/04 07:58:14 UTC
Parquet Data File Name
Hi all,
How does the name of a Parquet data file in HDFS relate to each INSERT? What format does the data file name follow? Is it possible to customize the name of the data file written by each insert?
Re: Parquet Data File Name
Posted by Sailesh Mukil <sa...@cloudera.com>.
Hi Sky,
Currently I don't think it's possible to customize file names automatically
with each insert (someone can correct me if I'm wrong). As for the filename
convention, it's basically:
<fragment instance id>_<unique_number>_data.<file_number_written_by_the_same_sink>.parq
Code references:
https://github.com/apache/impala/blob/master/be/src/exec/hdfs-table-sink.cc#L229-L245
https://github.com/apache/impala/blob/master/be/src/exec/hdfs-table-sink.cc#L346-L348
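To make the convention concrete, here is a rough Python sketch that splits a file name into the three parts described above. It is only an illustration of the pattern as written in this thread, not code taken from Impala. The names parse_file_name and ParquetFileName are hypothetical, the sample file name is made up, and the sketch assumes that the unique number and the file number are plain decimal digits.

```python
import re
from typing import NamedTuple, Optional

# Pattern from this thread:
#   <fragment instance id>_<unique_number>_data.<file_number>.parq
# Assumption: unique_number and file_number are decimal digits. The fragment
# instance id is treated as an opaque string.
_FILE_NAME_RE = re.compile(
    r"^(?P<fragment_instance_id>.+)_(?P<unique_number>\d+)_data\."
    r"(?P<file_number>\d+)\.parq$"
)


class ParquetFileName(NamedTuple):
    """Hypothetical container for the parts of a data file name."""
    fragment_instance_id: str
    unique_number: int
    file_number: int


def parse_file_name(name: str) -> Optional[ParquetFileName]:
    """Return the parts of a data file name, or None if it does not match."""
    match = _FILE_NAME_RE.match(name)
    if match is None:
        return None
    return ParquetFileName(
        fragment_instance_id=match.group("fragment_instance_id"),
        unique_number=int(match.group("unique_number")),
        file_number=int(match.group("file_number")),
    )


if __name__ == "__main__":
    # Made-up file name that follows the pattern; not taken from a real cluster.
    print(parse_file_name("abc123-def456_42_data.0.parq"))
```

Since the names are generated by the sink, a parser like this can help group files by the fragment instance that wrote them, but it cannot tell you which INSERT statement produced a file unless you record that mapping yourself.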
- Sailesh