Posted to issues@kylin.apache.org by "Zhong Yanghong (JIRA)" <ji...@apache.org> on 2018/08/10 09:05:00 UTC
[jira] [Created] (KYLIN-3491) Improve the cube building process when using global dictionary
Zhong Yanghong created KYLIN-3491:
-------------------------------------
Summary: Improve the cube building process when using global dictionary
Key: KYLIN-3491
URL: https://issues.apache.org/jira/browse/KYLIN-3491
Project: Kylin
Issue Type: Improvement
Reporter: Zhong Yanghong
Assignee: Zhong Yanghong
In the current cubing process, the raw data records are unsorted. When the global dictionary is very large, this makes it hard to encode raw values into ids as input for the bitmap, because the dictionary slices are swapped in and out frequently. We need a refined process. The idea is as follows:
# For each source data block, a mapper generates the distinct values and sorts them.
# Encode the sorted distinct values with the global dictionary and generate a shrunken dict for each source data block.
# When building the base cuboid, encode each source data block with its own shrunken dict.
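
The three steps above can be illustrated with a minimal Python sketch. It is not Kylin code: SlicedGlobalDict, build_shrunken_dict and encode_block are hypothetical names, and the slice layout (fixed-size slices over sorted value ranges, one slice in memory at a time) is an assumption made only to show why looking up sorted distinct values should cause fewer slice swaps than looking up unsorted raw records.

```python
from bisect import bisect_right


class SlicedGlobalDict:
    """Toy stand-in for a large global dictionary (hypothetical, not Kylin's API).

    Values are split into slices by sorted value range, and only one slice is
    considered "in memory" at a time. `loads` counts how often a slice is swapped in.
    """

    def __init__(self, values, slice_size):
        unique = list(dict.fromkeys(values))          # dedupe, keep insertion order
        self.ids = {v: i for i, v in enumerate(unique)}
        self.slice_starts = sorted(unique)[::slice_size]
        self.current = None                            # index of the loaded slice
        self.loads = 0

    def encode(self, value):
        # Find the slice whose value range contains `value`.
        idx = bisect_right(self.slice_starts, value) - 1
        if idx != self.current:                        # slice swap
            self.current = idx
            self.loads += 1
        return self.ids[value]


def build_shrunken_dict(block, global_dict):
    """Steps 1 and 2: sort the block's distinct values, then encode them in order."""
    distinct_sorted = sorted(set(block))
    return {v: global_dict.encode(v) for v in distinct_sorted}


def encode_block(block, shrunken_dict):
    """Step 3: encode the raw records using only the small per-block dict."""
    return [shrunken_dict[v] for v in block]


if __name__ == "__main__":
    values = ["u%03d" % i for i in range(100)]
    block = ["u095", "u003", "u051", "u003", "u097", "u007", "u051"]

    # Current process: encode unsorted raw records directly.
    naive = SlicedGlobalDict(values, slice_size=10)
    for v in block:
        naive.encode(v)

    # Refined process: build a shrunken dict first, then encode with it.
    refined = SlicedGlobalDict(values, slice_size=10)
    shrunken = build_shrunken_dict(block, refined)
    ids = encode_block(block, shrunken)

    print("slice loads, unsorted lookups:", naive.loads)
    print("slice loads, sorted distinct lookups:", refined.loads)
    print("encoded block:", ids)
```

Because the distinct values are looked up in sorted order, each slice in this toy model is loaded at most once per block, and the base cuboid step never touches the global dictionary at all.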