Posted to dev@carbondata.apache.org by manish gupta <to...@gmail.com> on 2018/11/05 06:14:33 UTC

[Feature Proposal] Proposal for offline and DDL local dictionary support

Hi Dev

Currently, we support the LOCAL DICTIONARY feature during the data load
operation. The feature is very helpful because it reduces the store size,
which reduces IO and thereby improves query performance.
*This proposal is to extend the LOCAL DICTIONARY feature with separate DDL
and offline support, which will make the feature more flexible to use. The
reasons for this proposal are*:

1. DDL support would let stores that were written without local dictionary
apply it to data that is already loaded. This helps customers leverage the
LOCAL DICTIONARY feature for existing data written in CarbonData format
without local dictionary.
2. When local dictionary is enabled, there is a small degradation in data
load performance. Some applications/customers may therefore prefer to
optimize the already loaded data during off-peak hours. This feature helps
in such scenarios.
3. Offline support is proposed for SDK-like use cases, where there is no
Spark driver/executor model and only a single thread may be used for
loading data. For this scenario, offline support lets local dictionary be
applied without impacting the existing data load performance.
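As a rough illustration of where the store-size reduction comes from: repeated string values in a column page are replaced by small integer keys plus one per-page dictionary, and if a page has too many distinct values the writer falls back to plain storage. The Java sketch below is a simplified model under that assumption, not CarbonData's actual encoder. The names `LocalDictionarySketch`, `encode`, `decode` and `EncodedPage` are hypothetical, and `threshold` only plays a role similar to the SDK's `localDictionaryThreshold`. Conceptually, an offline or DDL-driven pass as proposed here would run this kind of encoding over pages that were originally written plain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of per-page local dictionary encoding.
// Not CarbonData's real encoder; all names are illustrative only.
class LocalDictionarySketch {

    /** One encoded column page: either dictionary keys or the original plain values. */
    static final class EncodedPage {
        final List<String> dictionary;  // distinct values; list index is the surrogate key
        final int[] keys;               // one surrogate key per row, null after fallback
        final List<String> plainValues; // original values, set only after fallback

        EncodedPage(List<String> dictionary, int[] keys, List<String> plainValues) {
            this.dictionary = dictionary;
            this.keys = keys;
            this.plainValues = plainValues;
        }

        boolean isDictionaryEncoded() {
            return keys != null;
        }
    }

    /**
     * Dictionary-encodes a page of string values. If the number of distinct
     * values would exceed {@code threshold}, gives up and keeps the page plain
     * (assumed fallback behaviour).
     */
    static EncodedPage encode(List<String> values, int threshold) {
        Map<String, Integer> dict = new LinkedHashMap<>();
        int[] keys = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            String value = values.get(i);
            Integer key = dict.get(value);
            if (key == null) {
                if (dict.size() >= threshold) {
                    // Adding this value would exceed the threshold: fall back to plain storage.
                    return new EncodedPage(null, null, values);
                }
                key = dict.size();
                dict.put(value, key);
            }
            keys[i] = key;
        }
        return new EncodedPage(new ArrayList<>(dict.keySet()), keys, null);
    }

    /** Restores the original row values from an encoded page. */
    static List<String> decode(EncodedPage page) {
        if (!page.isDictionaryEncoded()) {
            return page.plainValues;
        }
        List<String> rows = new ArrayList<>(page.keys.length);
        for (int key : page.keys) {
            rows.add(page.dictionary.get(key));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> city = List.of("Shenzhen", "Beijing", "Shenzhen", "Shenzhen", "Beijing");

        EncodedPage page = encode(city, 10_000);
        System.out.println(page.dictionary);            // [Shenzhen, Beijing]
        System.out.println(Arrays.toString(page.keys)); // [0, 1, 0, 0, 1]

        EncodedPage fallback = encode(city, 1);         // 2 distinct values exceed threshold 1
        System.out.println(fallback.isDictionaryEncoded()); // false
    }
}
```

Five city strings shrink to two dictionary entries plus five small integers; a low-cardinality column benefits most, while a high-cardinality column hits the fallback and gains nothing, which is why a threshold is useful.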

Please share your suggestions on this proposal. If most community members
agree that the idea is good and will make this feature more flexible to use,
I will come up with a design and discuss it further on this list.

Regards
Manish Gupta

RE: [Feature Proposal] Proposal for offline and DDL local dictionary support

Posted by xuchuanyin <xu...@hust.edu.cn>.
Does local dictionary harm performance enough that users want to disable it for specific columns?


From: xubo245
Sent: Monday, November 12, 2018 10:05 AM
To: dev@carbondata.apache.org
Subject: Re: [Feature Proposal] Proposal for offline and DDL local dictionary support



Re: [Feature Proposal] Proposal for offline and DDL local dictionary support

Posted by xubo245 <60...@qq.com>.
The SDK already supports local dictionary:
org.apache.carbondata.sdk.file.CarbonWriterBuilder#localDictionaryThreshold
org.apache.carbondata.sdk.file.CarbonWriterBuilder#enableLocalDictionary

However, it does not support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE.
I think we should support them, since some users want to use
LOCAL_DICTIONARY_EXCLUDE.
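The include/exclude behaviour requested above could be resolved roughly as follows in an SDK writer. This is a hypothetical sketch: `CarbonWriterBuilder` has no such methods according to the message above, and `LocalDictionaryColumnSelection.select` is an invented helper, not a CarbonData API. It assumes table-level semantics where, with local dictionary enabled, all string columns are candidates, a non-empty INCLUDE list narrows that set, EXCLUDE removes columns from it, and a column listed in both is rejected.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: decides which string columns get a local dictionary,
// given INCLUDE/EXCLUDE lists. Not an existing CarbonData or SDK API.
class LocalDictionaryColumnSelection {

    static Set<String> select(List<String> stringColumns, Set<String> include, Set<String> exclude) {
        // Assumed validation: a column may not be both included and excluded.
        for (String column : include) {
            if (exclude.contains(column)) {
                throw new IllegalArgumentException("Column in both INCLUDE and EXCLUDE: " + column);
            }
        }
        // Assumed validation: INCLUDE/EXCLUDE may only name known string columns.
        for (Set<String> listed : List.of(include, exclude)) {
            for (String column : listed) {
                if (!stringColumns.contains(column)) {
                    throw new IllegalArgumentException("Unknown string column: " + column);
                }
            }
        }
        Set<String> selected = new LinkedHashSet<>();
        for (String column : stringColumns) {
            // An empty INCLUDE list means "all string columns are candidates".
            boolean candidate = include.isEmpty() || include.contains(column);
            if (candidate && !exclude.contains(column)) {
                selected.add(column);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("name", "city", "remarks");
        System.out.println(select(columns, Set.of(), Set.of("remarks"))); // [name, city]
        System.out.println(select(columns, Set.of("city"), Set.of()));    // [city]
    }
}
```

Rejecting a column that appears in both lists keeps the configuration unambiguous, rather than silently letting one list win.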

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [Feature Proposal] Proposal for offline and DDL local dictionary support

Posted by Jacky Li <ja...@qq.com>.
+1
Yes, I think the SDK should also provide local dictionary support.

Regards,
Jacky

> On Nov 5, 2018, at 2:14 PM, manish gupta <to...@gmail.com> wrote: