Posted to user@hawq.apache.org by Guo Kai <gu...@gmail.com> on 2017/02/10 14:37:01 UTC

AO and Parquet Format

Hi, guys!

I would like to ask for more detail about the AO and Parquet formats in HAWQ.

As we know, in PostgreSQL, tuples are stored one after another in fixed-size
blocks when the table format is heap. What about AO and Parquet?

Thanks for any advice!:)

Re: AO and Parquet Format

Posted by Lili Ma <li...@apache.org>.
The AO (append-only) format is row-oriented. The data is organized into
blocks: each block has a block header describing its metadata and block
content storing the actual data. Most of the data is represented as
MemTuples. You can specify the blocksize when defining an AO table.
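
As a rough illustration of the "block header plus block content" idea, here is
a minimal Python sketch. It is not HAWQ's actual on-disk format: the header
fields (row count and content length), the 4-byte length prefix per row, and
the names pack_blocks and read_block are all assumptions made up for this
example. The real layout is in appendonlyam.c.

```python
import struct

# Hypothetical block header: row count and content length (two uint32 values).
HEADER = struct.Struct("<II")
ROW_PREFIX = struct.Struct("<I")  # hypothetical per-row length prefix


def _seal(rows):
    """Build one block: header followed by length-prefixed rows."""
    content = b"".join(ROW_PREFIX.pack(len(r)) + r for r in rows)
    return HEADER.pack(len(rows), len(content)) + content


def pack_blocks(rows, blocksize):
    """Append already-encoded rows (bytes) into blocks of at most blocksize bytes."""
    blocks, current = [], []
    used = HEADER.size
    for row in rows:
        needed = ROW_PREFIX.size + len(row)
        if HEADER.size + needed > blocksize:
            raise ValueError("row does not fit in a single block")
        if used + needed > blocksize:
            # Current block is full: close it and start a new one.
            blocks.append(_seal(current))
            current, used = [], HEADER.size
        current.append(row)
        used += needed
    if current:
        blocks.append(_seal(current))
    return blocks


def read_block(block):
    """Use the header metadata to recover the rows stored in one block."""
    count, length = HEADER.unpack_from(block, 0)
    content = block[HEADER.size:HEADER.size + length]
    rows, pos = [], 0
    for _ in range(count):
        (n,) = ROW_PREFIX.unpack_from(content, pos)
        pos += ROW_PREFIX.size
        rows.append(content[pos:pos + n])
        pos += n
    return rows
```

In this sketch whole rows are stored back to back inside a block, so reading
one row also reads all of its columns, which is the row-oriented behaviour
described above.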

The Parquet format is organized at the row-column level, and the Parquet table
format in HAWQ is compatible with open source Parquet. The hierarchy is
rowgroup->columnChunk->columnPage. Each rowgroup stores multiple rows.
Inside each rowgroup, the data is organized by column, and each column maps to
a columnChunk. A columnChunk consists of one or more columnPages.
You can specify RowGroupSize and PageSize, which set the maximum sizes of a
RowGroup and a ColumnPage, when defining a Parquet table.
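
The rowgroup->columnChunk->columnPage hierarchy can be sketched in a few lines
of Python. This is only a toy model under simplifying assumptions: the limits
here count rows and values, whereas RowGroupSize and PageSize are size limits,
and the names to_row_groups and read_column are hypothetical, not taken from
parquetam.c or from the Parquet libraries.

```python
def to_row_groups(rows, columns, rows_per_group, values_per_page):
    """Split rows (tuples) into row groups; inside each group, store data by column.

    Returns a list of row groups. Each row group maps a column name to its
    column chunk, and a column chunk is a list of pages (lists of values).
    """
    groups = []
    for start in range(0, len(rows), rows_per_group):
        group_rows = rows[start:start + rows_per_group]
        group = {}
        for i, name in enumerate(columns):
            values = [row[i] for row in group_rows]
            # One column chunk per column, cut into one or more pages.
            group[name] = [
                values[p:p + values_per_page]
                for p in range(0, len(values), values_per_page)
            ]
        groups.append(group)
    return groups


def read_column(groups, name):
    """Read a single column by visiting only its chunk in every row group."""
    return [value for group in groups for page in group[name] for value in page]


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
groups = to_row_groups(rows, ("id", "name"), rows_per_group=3, values_per_page=2)
ids = read_column(groups, "id")
```

Here read_column touches only the pages of the requested column, which is the
practical difference from the row-oriented AO layout.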

If you are interested, you can refer to these two files for detailed
information:
1. AO table: src/backend/access/appendonly/appendonlyam.c
2. Parquet table: src/backend/access/parquet/parquetam.c


Best Regards,
Lili

2017-02-13 14:17 GMT+08:00 Ma Hongxu <in...@outlook.com>:

> Briefly, the Parquet format is organized into row groups, and each row
> group is column-oriented. AO is row-oriented; I guess it's very similar to
> the PG heap format.
>
> Do you want a detailed introduction to the formats?
> It seems HAWQ doesn't have a detailed wiki/doc on the formats; maybe Lili Ma
> has some resources.
>
> I am also very interested in this, so let's discuss it on this mailing list
> in the future.
> Thank you!
>
> --
> Regards,
> Hongxu.
>
>

Re: AO and Parquet Format

Posted by Ma Hongxu <in...@outlook.com>.
Briefly, the Parquet format is organized into row groups, and each row group is column-oriented. AO is row-oriented; I guess it's very similar to the PG heap format.

Do you want a detailed introduction to the formats?
It seems HAWQ doesn't have a detailed wiki/doc on the formats; maybe Lili Ma has some resources.

I am also very interested in this, so let's discuss it on this mailing list in the future.
Thank you!

--
Regards,
Hongxu.