Posted to dev@parquet.apache.org by Cheng Lian <li...@databricks.com> on 2015/12/07 06:52:11 UTC

Re: parquet file doubts

cc parquet-dev list (it would be nice to always do so for these general 
questions.)

Cheng

On 12/6/15 3:10 PM, Shushant Arora wrote:
> Hi
>
> I have a few questions about the Parquet file format.
>
> 1. Does Parquet keep min/max statistics like ORC does? How can I see
> the Parquet version (whether it is 1.1, 1.2, or 1.3) of a Parquet file
> generated using Hive, custom MR, or AvroParquetOutputFormat?

Yes, Parquet also keeps min/max statistics, stored per column chunk in 
each row group. To find out which writer produced a file, dump its 
metadata with the parquet-meta CLI tool in parquet-tools (see 
https://github.com/Parquet/parquet-mr/issues/321 for details), then look 
for the "creator" field of the file. For programmatic access, check 
org.apache.parquet.hadoop.metadata.FileMetaData.getCreatedBy().
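
Once you have the creator string, the writer version still has to be 
pulled out of it. The Parquet format description recommends the layout 
"<application> version <version> (build <hash>)" for this field, so a 
minimal sketch of extracting the version could look like the following. 
The function name, the CreatedBy tuple, and the sample strings are 
illustrative assumptions, not part of any Parquet library, and writers 
are not guaranteed to follow the recommended layout.

```python
import re
from typing import NamedTuple, Optional


class CreatedBy(NamedTuple):
    """Hypothetical container for the parts of a creator string."""
    application: str
    version: str
    build: Optional[str]


# Matches "<application> version <version> (build <hash>)".
# The "(build ...)" suffix is treated as optional in this sketch.
_CREATED_BY = re.compile(
    r"^(?P<app>.+?)\s+version\s+(?P<version>\S+)"
    r"(?:\s+\(build\s+(?P<build>[^)]*)\))?\s*$"
)


def parse_created_by(created_by: str) -> Optional[CreatedBy]:
    """Split a creator string into its parts, or return None if it does not match."""
    match = _CREATED_BY.match(created_by.strip())
    if match is None:
        return None
    return CreatedBy(match["app"], match["version"], match["build"])


if __name__ == "__main__":
    # Made-up sample value; the build hash is a placeholder.
    print(parse_created_by("parquet-mr version 1.6.0 (build abc123)"))
```

A string that does not follow the layout yields None, so callers can 
fall back to showing the raw creator value.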

>
> 2. How can I sort Parquet records while generating a Parquet file
> using AvroParquetOutputFormat?

AvroParquetOutputFormat is not a file format itself. It's just 
responsible for converting Avro records to Parquet records. How are you 
using AvroParquetOutputFormat? Any example snippets?

>
> Thanks


Re: parquet file doubts

Posted by Julien Le Dem <ju...@dremio.com>.
Thanks Cheng!
Regarding question 2, here is a useful blog post:
http://grepalex.com/2014/05/13/parquet-file-format-and-object-model/


-- 
Julien