Posted to dev@carbondata.apache.org by "Ravindra Pesala (JIRA)" <ji...@apache.org> on 2016/08/17 10:58:20 UTC
[jira] [Updated] (CARBONDATA-159) carbon should support primary key & keep mapping table table_property

     [ https://issues.apache.org/jira/browse/CARBONDATA-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravindra Pesala updated CARBONDATA-159:
---------------------------------------
    Fix Version/s:     (was: 0.1.0-incubating)
                   0.2.0-incubating

> carbon should support primary key & keep mapping table table_property
> ---------------------------------------------------------------------
>
>                 Key: CARBONDATA-159
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-159
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: core, data-load, data-query, format
>    Affects Versions: 0.1.0-incubating
>            Reporter: qiuheng
>              Labels: features
>             Fix For: 0.2.0-incubating
>
>   Original Estimate: 720h
>  Remaining Estimate: 720h
>
> As we know, carbon supports the MDK index. By design, a filter or filter combination on the left-side (leading) columns gets good performance.
> But if the leading key is a high-cardinality column (for example, a cardinality above 100 million), only a filter on the leading key gets good performance. Filters on the following columns and on other high-cardinality columns do not, because those columns are close to unsorted.
> I suggest we add a key mapping function. The table property would look like:
> create table (low cardinality column to high cardinality column)
> table_property(
> primary_key h_col3,
> index_key_mapping(h_col1,h_col2)
> )
>                 low cardinality-> high cardinality
> col1,col2,col3,col4.....col10,h_col1,h_col2,h_col3
> During data loading, carbon will create an internal index table A that records every (value --> position) pair of the primary_key. It looks like:
>   h_col3          list of blocklets
>   18682114091     [blockid1+blockletid1],[blockid4+blockletid10]....
>   18683343442     [blockid2+blockletid4],[blockid23+blockletid5]....
>   ...             .....
> It will also create two key mapping tables:
> table 1:
> ---------------------------------------
>   h_col2      h_col3
>   jarray      18682114091
>   ramana      18683343442
>   ......      .......
> table 2:
> -----------------------------------------
>   h_col1      h_col3
>   77647       18682114091
>   99899       18683343442
>   ......      .......
> 1) If the filter is on col1-col10, the system uses the original MDK capability.
> 2) If the filter is on h_col3 (the primary key), the system scans the index table to get the blocklet positions, then uses them to fetch the data directly.
> 3) If the filter is on h_col1 or h_col2, the system first scans the key mapping table to get the primary key list, then proceeds as in 2).
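
The lookup flow proposed above can be sketched in a few lines of Python. This is only an illustration, not CarbonData code. It assumes, following the table_property in the proposal, that h_col3 is the primary key held in the index table and that h_col1 and h_col2 are resolved through the key mapping tables. The names build_tables and locate_blocklets, the ROWS sample, and the in-memory dict layout are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (h_col1, h_col2, h_col3, block id, blocklet id).
# The values reuse the examples from the proposal; the layout is an assumption.
ROWS = [
    ("77647", "jarray", "18682114091", "blockid1", "blockletid1"),
    ("77647", "jarray", "18682114091", "blockid4", "blockletid10"),
    ("99899", "ramana", "18683343442", "blockid2", "blockletid4"),
    ("99899", "ramana", "18683343442", "blockid23", "blockletid5"),
]


def build_tables(rows):
    """Build the index table and the two key mapping tables during 'load'."""
    index_table = defaultdict(list)  # h_col3 -> [(block id, blocklet id), ...]
    h_col1_map = defaultdict(set)    # h_col1 -> {h_col3, ...}
    h_col2_map = defaultdict(set)    # h_col2 -> {h_col3, ...}
    for h_col1, h_col2, h_col3, block_id, blocklet_id in rows:
        position = (block_id, blocklet_id)
        if position not in index_table[h_col3]:
            index_table[h_col3].append(position)
        h_col1_map[h_col1].add(h_col3)
        h_col2_map[h_col2].add(h_col3)
    return index_table, {"h_col1": h_col1_map, "h_col2": h_col2_map}


def locate_blocklets(column, value, index_table, key_maps):
    """Return blocklet positions for an equality filter, or None when the
    column is not covered and the regular MDK index should be used instead."""
    if column == "h_col3":
        # Case 2: the filter is on the primary key itself.
        primary_keys = [value]
    elif column in key_maps:
        # Case 3: resolve the value to primary keys first.
        primary_keys = sorted(key_maps[column].get(value, ()))
    else:
        # Case 1: col1..col10 stay on the MDK path.
        return None
    positions = []
    for primary_key in primary_keys:
        for position in index_table.get(primary_key, []):
            if position not in positions:
                positions.append(position)
    return positions


index_table, key_maps = build_tables(ROWS)
print(locate_blocklets("h_col2", "ramana", index_table, key_maps))
```

A filter on h_col1 or h_col2 costs one extra lookup compared with a filter on h_col3, but both end with direct blocklet positions rather than a scan over data that is nearly unsorted on those columns.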



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)