Posted to user@hawq.apache.org by "yin.zhb@163.com" <yi...@163.com> on 2016/02/28 15:39:59 UTC

Why does HAWQ turn off column-oriented tables by default?

Hi all,
    These days I am testing HAWQ (1.3.1) and I have a question:
why does HAWQ turn off column-oriented tables by default?

[gpadmin@stars1 test]$ 
[gpadmin@stars1 test]$ psql -U gpadmin -d hawq -f create_table.sql 
psql:create_table.sql:48: ERROR:  Column oriented tables are deprecated. To enable it, set GUC gp_enable_column_oriented_table on.
[gpadmin@stars1 test]$ gpconfig -s gp_enabled_column_orientied_table
20160228:21:45:40:026806 gpconfig:stars1:gpadmin-[ERROR]:-Failed to retrieve GUC information, guc does not exist: gp_enabled_column_orientied_table
[gpadmin@stars1 test]$ gpconfig -s gp_enable_column_oriented_table
Values on all segments are consistent
GUC          : gp_enable_column_oriented_table
Master  value: off
Segment value: off
[gpadmin@stars1 test]$ 



yin.zhb@163.com

Re: Why does HAWQ turn off column-oriented tables by default?

Posted by Qi Shao <qs...@pivotal.io>.
Parquet is available as a storage option for HAWQ internal tables.

HAWQ implements column-oriented storage with one file per column.

E.g., if you store a table with orientation=column in HAWQ, and there are 20
segments, 1000 columns, and 500 partitions, it will generate about
20*1000*500 files in HDFS in total. With orientation=parquet, you
only have 20*1000 files. HDFS is not good at handling a huge number of
small files.
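
To make the column-oriented arithmetic above concrete, here is a minimal Python sketch. The function name `column_oriented_file_count` is hypothetical, and the model (one file per column, per partition, on each segment) is a simplification of the description above, not HAWQ's actual accounting.

```python
def column_oriented_file_count(segments: int, columns: int, partitions: int = 1) -> int:
    """Rough HDFS file count, assuming one file per column per partition on each segment."""
    if min(segments, columns, partitions) < 1:
        raise ValueError("all counts must be positive")
    return segments * columns * partitions


# Figures from the example above: 20 segments, 1000 columns, 500 partitions.
partitioned = column_oriented_file_count(20, 1000, 500)
# Same table without partitioning, for comparison.
unpartitioned = column_oriented_file_count(20, 1000)

print(f"{partitioned:,} files with 500 partitions")     # 10,000,000 files
print(f"{unpartitioned:,} files without partitioning")  # 20,000 files
```

The count grows multiplicatively with each factor, so under this model it is partitioning a wide table that pushes the number of files into the millions.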



Re: Why does HAWQ turn off column-oriented tables by default?

Posted by Michael André Pearce <mi...@me.com>.
Hi Lei,

How come the latest versions of Hive achieve and advocate using column-oriented tables with ORC or Parquet, and don't suffer from this as much? Isn’t that how some of the more recent performance improvements in Hive have been achieved, by using such formats?

Surely columnar tables are more efficient and would bring performance benefits to HAWQ for analytics workloads, which in my experience are the key workload of SQL users on Hadoop.

Using something like ORC files with compactions would also enable HAWQ to support transactions, e.g. delete and update operations, as are now available in Hive.

Cheers
Mike





Re: Why does HAWQ turn off column-oriented tables by default?

Posted by Lei Chang <le...@apache.org>.
Hi, if column-oriented tables are not used properly, they may overwhelm HDFS,
since they can lead to too many files. So the feature is disabled by default.

Cheers
Lei


