Posted to dev@carbondata.apache.org by Sanoj MG <sa...@gmail.com> on 2017/04/03 06:39:49 UTC

Dimension column of integer type - to exclude from dictionary

Hi All,

I have a dimension column of integer type. Since the cardinality of this
column is relatively high, I want to exclude it from the dictionary for
faster loading. Is there any way to do this in CarbonData DDL?

When I use TBLPROPERTIES ('DICTIONARY_INCLUDE'='Account'), Account will be
defined as a dimension, but it will also be included in the dictionary.
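For illustration, a table definition of the kind described above might look like the sketch below. The table name sales_fact and the Region and Amount columns are hypothetical; only Account and the DICTIONARY_INCLUDE property come from the question, and the STORED BY 'carbondata' form assumes the CarbonData 1.x DDL syntax.

```sql
-- Hypothetical fact table; only the Account column comes from the question.
CREATE TABLE IF NOT EXISTS sales_fact (
  Account INT,      -- integer surrogate key that should be a dimension
  Region STRING,
  Amount DOUBLE
)
STORED BY 'carbondata'
-- Makes Account a dimension, but it is also dictionary-encoded during load.
TBLPROPERTIES ('DICTIONARY_INCLUDE'='Account');
```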


Thanks,
Sanoj

Re: Dimension column of integer type - to exclude from dictionary

Posted by QiangCai <qi...@qq.com>.
SORT_COLUMNS can make a numeric column a dimension without dictionary
encoding. The SORT_COLUMNS feature has been implemented in the 12-dev branch.
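A minimal sketch of how this might look, assuming a build from the 12-dev branch that supports the SORT_COLUMNS table property. The table sales_fact and its Region and Amount columns are hypothetical. Account is listed in SORT_COLUMNS but not in DICTIONARY_INCLUDE, which, per the description above, should make it an indexed (sort) dimension without dictionary encoding.

```sql
-- Assumes SORT_COLUMNS support (12-dev branch).
-- Table name and the Region/Amount columns are hypothetical.
CREATE TABLE IF NOT EXISTS sales_fact (
  Account INT,      -- stays an integer column
  Region STRING,
  Amount DOUBLE
)
STORED BY 'carbondata'
-- Account is a sort column but is not in DICTIONARY_INCLUDE, so it is
-- expected to be a dimension without dictionary encoding.
TBLPROPERTIES ('SORT_COLUMNS'='Account');
```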

Best Regards
David QiangCai 



--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dimension-column-of-integer-type-to-exclude-from-dictionary-tp9961p9977.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.

Re: Dimension column of integer type - to exclude from dictionary

Posted by Sanoj MG <sa...@gmail.com>.
Hi Liang,

On Tue, Apr 4, 2017 at 2:55 PM, Liang Chen <ch...@gmail.com> wrote:

> Hi Sanoj
>
> First, to check that I understand your requirement: you only want to build an
> index for column "Account", but don't want to build a dictionary for it.
> Is that right?
>

Yes, that's right. In our ETL pipeline we have many dimension columns /
surrogate keys of integer type. I want to build an index for these columns,
and will try what David suggested.


> If my understanding is right, then the "SORT_COLUMNS" feature that David
> mentioned will satisfy your requirements.
>
> Currently, the only way is to first change column "Account" from Integer
> to String type, then use
> TBLPROPERTIES ('DICTIONARY_EXCLUDE'='Account')
>

I thought of doing this, but I don't really like it, since I would have to pad
the values with leading 0's for comparison operators to work (strings compare
lexicographically, so '9' sorts after '10'). I would also have to cast it back
if I need to load it into another system.

Another point: in our star schema there are also many low-cardinality
surrogate keys of int type. These are indeed dimension columns that
need an index, but dictionary encoding may not give any benefit.

Thanks,
Sanoj




Re: Dimension column of integer type - to exclude from dictionary

Posted by Liang Chen <ch...@gmail.com>.
Hi Sanoj

First, to check that I understand your requirement: you only want to build an
index for column "Account", but don't want to build a dictionary for it.
Is that right?
If my understanding is right, then the "SORT_COLUMNS" feature that David
mentioned will satisfy your requirements.

Currently, the only way is to first change column "Account" from Integer to
String type, then use
TBLPROPERTIES ('DICTIONARY_EXCLUDE'='Account')
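A sketch of this workaround, using a hypothetical sales_fact table (only Account and the DICTIONARY_EXCLUDE property come from the message; the other columns are illustrative). Account is declared as STRING rather than INT so that DICTIONARY_EXCLUDE can be applied to it, assuming the CarbonData 1.x STORED BY 'carbondata' syntax.

```sql
-- Hypothetical table: Account is declared as STRING instead of INT
-- so that it can be excluded from dictionary encoding.
CREATE TABLE IF NOT EXISTS sales_fact (
  Account STRING,   -- was INT; values are now stored as strings
  Region STRING,
  Amount DOUBLE
)
STORED BY 'carbondata'
-- Account remains a dimension but is not dictionary-encoded.
TBLPROPERTIES ('DICTIONARY_EXCLUDE'='Account');
```

The trade-off is that Account values are then compared as strings, so range filters only follow numeric order if the values are zero-padded to a fixed width.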

Regards
Liang

