Posted to dev@carbondata.apache.org by Hao Jiang <ha...@uchicago.edu> on 2016/12/14 19:00:02 UTC

Question about RLE and DELTA encoding

Dear Dev team,

I asked a question several days ago about RLE and DELTA encoding in 
Carbon. Thank you for pointing me to the source code of the implementation.

I have read through the code and have the following understanding. 
Could you please confirm whether it is correct? Thanks!

1. RLE encoding applies only to columns that have Encoding.DICTIONARY 
enabled and whose cardinality is less than the parameter 
CarbonCommonConstants.HIGH_CARDINALITY_VALUE.

I saw that the RLE encoding is applied to data in the function 
BlockIndexerStorageForInt.compressDataMyOwnWay, and is controlled by 
aggKeyBlock, whose value is set by arrangeUniqueBlockType.

If my understanding is correct, could you please share the reasons for 
designing the logic this way?

2. DELTA encoding is implemented in 
ValueCompressionUtil.getCompressedValues. It doesn't do a sequential 
DELTA encoding, i.e., for a list of numbers a, b, c, ..., encoding them as 
a, b-a, c-b, ... Instead, it does a max-delta encoding: for a, b, c, ..., 
with max value M, it encodes them as M-a, M-b, M-c, ...

Could you please also share why you chose to use this 
encoding?
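
To make the two schemes in point 2 concrete, here is a small Python sketch of both. It is only an illustration of the arithmetic described above, not the code in ValueCompressionUtil; all function names are hypothetical.

```python
def sequential_delta_encode(values):
    """Keep the first value, then store each value minus its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def sequential_delta_decode(encoded):
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for delta in encoded:
        total += delta
        out.append(total)
    return out


def max_delta_encode(values):
    """Store the maximum M once, then M - v for every value v."""
    m = max(values)
    return m, [m - v for v in values]


def max_delta_decode(m, deltas):
    """Each value is recovered independently as M - delta."""
    return [m - d for d in deltas]


values = [100, 103, 101, 107]
print(sequential_delta_encode(values))  # [100, 3, -2, 6]
print(max_delta_encode(values))         # (107, [7, 4, 6, 0])
```

One visible difference: a max-delta entry can be decoded from M and that entry alone, while a sequential-delta entry needs the running sum of everything before it.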

Thanks!

Regards,

Hao Jiang



Re: Question about RLE and DELTA encoding

Posted by "k.ashok" <k....@huawei.com>.
Hi Hao Jiang,
Regarding your first question, why RLE is controlled by aggKeyBlock: 
there are dictionary and no-dictionary column types in Carbon. 
Carbon sorts the column data and then stores it. Because of the sorting, the
index gets shuffled. Hence,
for no-dictionary data, RLE is applied on the index and not on the data.
Thus, in BlockIndexerStorageForInt@compressMyOwnWay, RLE happens on the index;
compressDataMyOwnWay
is done only for dictionary data.
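
As a rough illustration of the dictionary case (in Python, not the actual Java code), sorting the dictionary keys produces long runs that RLE can collapse, at the cost of having to remember the original row positions. The function names are made up for this example and are not CarbonData APIs; how the index itself is compressed is not shown.

```python
def sort_with_row_ids(column):
    """Sort a column and return (sorted_values, original_row_ids)."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    return [column[i] for i in order], order


def run_length_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


dictionary_keys = [3, 1, 3, 2, 1, 3]
sorted_keys, row_ids = sort_with_row_ids(dictionary_keys)
print(sorted_keys)                     # [1, 1, 2, 3, 3, 3]
print(row_ids)                         # [1, 4, 3, 0, 2, 5]
print(run_length_encode(sorted_keys))  # [[1, 2], [2, 1], [3, 3]]
```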

Regarding your second question:
measure data is not sorted, and hence a sequential delta may be either big or
small.
For example,
if the data is 2, -3, 4, -6, then the sequential delta encoding will be 2, -5, 7, -10.
Other than max-min delta, we also do type conversion to reduce storage space.
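
A small Python sketch of how max delta and type conversion can work together, assuming "type conversion" here means storing the deltas in the narrowest integer type that can hold them. The helper names are hypothetical and this is not the CarbonData implementation.

```python
def smallest_signed_width(lo, hi):
    """Smallest width in bytes (1, 2, 4 or 8) whose signed range holds [lo, hi]."""
    for width in (1, 2, 4, 8):
        bound = 1 << (8 * width - 1)
        if -bound <= lo and hi < bound:
            return width
    raise OverflowError("range does not fit in 64 bits")


def max_delta_width(values):
    """Width needed once every value v is replaced by max(values) - v."""
    m = max(values)
    deltas = [m - v for v in values]
    return smallest_signed_width(min(deltas), max(deltas))


values = [1_000_000, 1_000_050, 1_000_020, 1_000_100]
print(smallest_signed_width(min(values), max(values)))  # 4 bytes per raw value
print(max_delta_width(values))                          # 1 byte per max-delta value
```

The max deltas always lie between 0 and max - min, regardless of the order of the values, so the required width depends only on the value range.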


