You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by "Cao, Lionel (JIRA)" <ji...@apache.org> on 2017/04/12 07:13:41 UTC
[jira] [Commented] (CARBONDATA-910) Implement Partition feature
[ https://issues.apache.org/jira/browse/CARBONDATA-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965484#comment-15965484 ]
Cao, Lionel commented on CARBONDATA-910:
----------------------------------------
2017-04-12 notes:
1. list partitioning should support value group; partition by list area((China, India), (England, France), (America, Canada))
2. support add and delete, maybe rebuild in future, delete partition will delete data also;
3. data store prefer option 2, and use partitionId as taskId;
4. single level partition for first version, no composite partitioning
> Implement Partition feature
> ---------------------------
>
> Key: CARBONDATA-910
> URL: https://issues.apache.org/jira/browse/CARBONDATA-910
> Project: CarbonData
> Issue Type: New Feature
> Components: core, data-load, data-query
> Reporter: Cao, Lionel
> Assignee: Cao, Lionel
>
> Why need partition table
> Partition table provide an option to divide table into some smaller pieces.
> With partition table:
> 1. Data could be better managed, organized and stored.
> 2. We can avoid full table scan in some scenario and improve query performance. (partition column in filter,
> multiple partition tables join in the same partition column etc.)
> Partitioning design
> Range Partitioning
> range partitioning maps data to partitions according to the range of partition column values, operator '<' defines non-inclusive upper bound of current partition.
> List Partitioning
> list partitioning allows you map data to partitions with specific value list
> Hash Partitioning
> hash partitioning maps data to partitions with hash algorithm and put them to the given number of partitions
> Composite Partitioning(2 levels at most for now)
> Range-Range, Range-List, Range-Hash, List-Range, List-List, List-Hash, Hash-Range, Hash-List, Hash-Hash
> DDL-Create
> Create table sales(
> itemid long,
> logdate datetime,
> customerid int
> ...
> ...)
> [partition by range logdate(...)]
> [subpartition by list area(...)]
> Stored By 'carbondata'
> [tblproperties(...)];
> range partition:
> partition by range logdate(< '2016-01-01', < '2017-01-01', < '2017-02-01', < '2017-03-01', < '2099-01-01')
> list partition:
> partition by list area('Asia', 'Europe', 'North America', 'Africa', 'Oceania')
> hash partition:
> partition by hash(itemid, 9)
> composite partition:
> partition by range logdate(< '2016- -01', < '2017-01-01', < '2017-02-01', < '2017-03-01', < '2099-01-01')
> subpartition by list area('Asia', 'Europe', 'North America', 'Africa', 'Oceania')
> DDL-Rebuild, Add
> Alter table sales rebuild partition by (range|list|hash)(...);
> Alter table salse add partition (< '2018-01-01'); #only support range partitioning, list partitioning
> Alter table salse add partition ('South America');
> #Note: No delete operation for partition, please use rebuild.
> If need delete data, use delete statement, but the definition of partition will not be deleted.
> Partition Table Data Store
> [Option One]
> Use the current design, keep partition folder out of segments
> Fact
> |___Part0
> | |___Segment_0
> | |___ *******-[bucketId]-.carbondata
> | |___ *******-[bucketId]-.carbondata
> | |___Segment_1
> | ...
> |___Part1
> | |___Segment_0
> | |___Segment_1
> |...
> [Option Two]
> remove partition folder, add partition id into file name and build btree in driver side.
> Fact
> |___Segment_0
> | |___ *******-[bucketId]-[partitionId].carbondata
> | |___ *******-[bucketId]-[partitionId].carbondata
> |___Segment_1
> |___Segment_2
> ...
> Pros & Cons:
> Option one would be faster to locate target files
> Option two need to store more metadata of folders
> Partition Table MetaData Store
> partitioni info should be stored in file footer/index file and load into memory before user query.
> Relationship with Bucket
> Bucket should be lower level of partition.
> Partition Table Query
> Example:
> Select * from sales
> where logdate <= date '2016-12-01';
> User should remember to add a partition filter when write SQL on a partition table.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)