Posted to dev@carbondata.apache.org by a <ww...@163.com> on 2017/04/01 09:06:06 UTC

Re:[DISCUSSION]support new feature: Partition Table

Additional suggestions:
1. Support at least two levels of partitioning.
2. Building the B+Tree by partition column should split the segment into smaller pieces, which may speed up data loading in CarbonData.
3. Support deleting data by partition column.



best regards
fish

At 2017-03-31 23:42:07, "QiangCai" <qi...@qq.com> wrote:
>Hi all, 
>
>  Let's start the discussion regarding the partition table.
>
>  To support partition tables, what should we do?
>
>  1. Create table with partition: support Range Partitioning, Hash
>Partitioning, List Partitioning and Composite Partitioning, and write the
>partition info to the schema.
>
>  2. During data loading, re-partition the input data, start one task to
>process each partition, and write partition information to the footer and
>index file.
>
>  3. During data query, prune the B+Tree by partition if the filter contains
>the partition column, or prune data blocks by partition when there is only
>a partition column predicate.
>
>  4. Optimize the join performance of two partition tables if the partition
>column is the join column.
>
>   Any thoughts, comments and questions?
>
>   Thanks!
>
>Best Regards
>David
>
>
>
>--
>View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-support-new-feature-Partition-Table-tp9935.html
>Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
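
To make the partitioning types in item 1 of the proposal concrete, here is a minimal Python sketch of how a value could be mapped to a partition id under range, hash and list partitioning. The function names, the bounds and the default-partition convention are all assumptions made for illustration; none of this is CarbonData's actual API or on-disk behaviour.

```python
import bisect
import zlib


def range_partition(value, bounds):
    """Map value to a range partition.

    bounds is a sorted list of exclusive upper bounds (hypothetical layout):
    values below bounds[0] go to partition 0, and values at or above the
    last bound go to the final partition, len(bounds).
    """
    return bisect.bisect_right(bounds, value)


def hash_partition(value, num_partitions):
    """Map value to a hash partition.

    crc32 is used instead of the built-in hash() so that the result is
    stable across Python processes.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_partitions


def list_partition(value, groups):
    """Map value to a list partition.

    groups is a list of sets of explicit values. Values found in no group
    fall into an assumed default partition with id len(groups).
    """
    for partition_id, members in enumerate(groups):
        if value in members:
            return partition_id
    return len(groups)


if __name__ == "__main__":
    print(range_partition(15, [10, 20]))               # falls in [10, 20)
    print(hash_partition("user-42", 8))                # some id in 0..7
    print(list_partition("US", [{"CN", "JP"}, {"US"}]))
```

A composite scheme could combine two of these functions, for example a range id on one column and a hash id on another, and use the pair as the partition key.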

Re: [DISCUSSION]support new feature: Partition Table

Posted by Jacky Li <ja...@qq.com>.

comments inline

> On April 1, 2017, at 5:06 PM, a <ww...@163.com> wrote:
> 
> Additional suggestions:
> 1. Support at least two levels of partitioning.

I think we can let the user specify the partition columns; multiple columns can together form a partition key. Is this what you mean by two-level partitioning? Generally speaking, partitioning on multiple columns tends to cause small-file issues, which we may want to avoid.
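
A rough back-of-the-envelope sketch of the small-file concern, in Python. It assumes the worst case, where every combination of partition column values occurs and each non-empty partition produces at least one file per load; the column cardinalities and row count are made-up numbers.

```python
from math import prod


def max_partitions(distinct_counts):
    """Upper bound on partitions: product of per-column distinct value counts."""
    return prod(distinct_counts)


def avg_rows_per_partition(total_rows, distinct_counts):
    """Average rows per partition if rows spread evenly over all combinations."""
    return total_rows / max_partitions(distinct_counts)


if __name__ == "__main__":
    # Hypothetical table: partitioned by day (365 values) and city (200 values).
    cardinalities = [365, 200]
    print(max_partitions(cardinalities))                        # 73000
    print(round(avg_rows_per_partition(10_000_000, cardinalities)))  # 137
```

With these assumed numbers, a 10-million-row load spreads over as many as 73,000 partitions at roughly 137 rows each, which is far too little data per file for a columnar format to work well.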

> 2. Building the B+Tree by partition column should split the segment into smaller pieces, which may speed up data loading in CarbonData.

Partitioning will slow down the loading process because it requires a shuffle. The benefit is that queries with a filter on the partition key will be faster.
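
The trade-off can be sketched with a toy in-memory model in Python: loading pays for an extra pass that groups rows by partition (standing in for the shuffle), while an equality filter on the partition key only scans one partition. The `load` and `query_eq` helpers and the modulo partitioning on an integer key are assumptions for illustration, not how CarbonData implements either step.

```python
from collections import defaultdict


def load(rows, key, num_partitions):
    """Group rows by partition id of the key column (the 'shuffle' step)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key] % num_partitions].append(row)
    return partitions


def query_eq(partitions, key, value, num_partitions):
    """Answer key == value by scanning only the partition that can hold it.

    Returns the matching rows and the number of rows actually scanned.
    """
    candidates = partitions.get(value % num_partitions, [])
    matches = [row for row in candidates if row[key] == value]
    return matches, len(candidates)


if __name__ == "__main__":
    rows = [{"id": i} for i in range(100)]
    parts = load(rows, "id", 4)
    matches, scanned = query_eq(parts, "id", 42, 4)
    print(matches, scanned)  # [{'id': 42}] 25
```

Here the query touches 25 of the 100 rows instead of all of them; a filter on a non-partition column would still have to scan every partition.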

> 3. Support deleting data by partition column.
> 

This could be a future feature on our roadmap once the partition feature is supported.
