Posted to issues@kylin.apache.org by "Xiaoxiang Yu (Jira)" <ji...@apache.org> on 2021/04/02 03:41:00 UTC

[jira] [Updated] (KYLIN-4941) Support encoding raw data to base cuboid column-by-column

     [ https://issues.apache.org/jira/browse/KYLIN-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoxiang Yu updated KYLIN-4941:
--------------------------------
    Fix Version/s:     (was: v3.1.2)
                   v3.2.0

> Support encoding raw data to base cuboid column-by-column
> ---------------------------------------------------------
>
>                 Key: KYLIN-4941
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4941
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v3.1.1
>            Reporter: ShengJun Zheng
>            Priority: Major
>             Fix For: v3.2.0
>
>
> When building with the Spark engine, the first step is to encode each row of the Hive table into base cuboid data.
> The existing implementation encodes row by row. If the cube has several dictionary-encoded measures, it has to use all dictionaries at the same time to encode a single row. This causes heavy memory usage and a low hit ratio in the dictionary cache.
> We optimized this case by encoding column by column, which brought a significant improvement for cubes with several high-cardinality dictionary-encoded measures.
> We will refine the implementation based on Kylin 3.x and share it.
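
The difference between the two encoding orders described above can be illustrated with a small sketch. This is not Kylin's implementation. The names DictionaryCache, encode_row_by_row, encode_column_by_column and build_dictionaries are hypothetical, and the cache is a toy LRU that only counts how often a dictionary has to be (re)loaded. It assumes a cache too small to hold every dictionary at once, which is the situation the issue describes.

```python
from collections import OrderedDict


class DictionaryCache:
    """Toy LRU cache of per-column dictionaries (hypothetical, not Kylin's API)."""

    def __init__(self, dictionaries, capacity):
        self._source = dictionaries   # column name -> {raw value: encoded id}
        self._capacity = capacity     # max dictionaries held in memory at once
        self._cache = OrderedDict()
        self.loads = 0                # number of cache misses (dictionary loads)

    def get(self, column):
        if column in self._cache:
            self._cache.move_to_end(column)
            return self._cache[column]
        self.loads += 1
        self._cache[column] = self._source[column]
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return self._cache[column]


def build_dictionaries(rows, columns):
    """Assign each distinct value of a column an integer id (sorted order)."""
    return {
        col: {v: idx for idx, v in enumerate(sorted({row[i] for row in rows}))}
        for i, col in enumerate(columns)
    }


def encode_row_by_row(rows, columns, cache):
    """Every row touches every dictionary, so a small cache keeps evicting."""
    return [
        [cache.get(col)[row[i]] for i, col in enumerate(columns)]
        for row in rows
    ]


def encode_column_by_column(rows, columns, cache):
    """Each dictionary is fetched once and applied to all rows of its column."""
    encoded = [[None] * len(columns) for _ in rows]
    for i, col in enumerate(columns):
        dictionary = cache.get(col)
        for r, row in enumerate(rows):
            encoded[r][i] = dictionary[row[i]]
    return encoded


if __name__ == "__main__":
    columns = ["user_id", "device_id", "session_id"]  # made-up column names
    rows = [("u1", "d1", "s1"), ("u2", "d1", "s2"),
            ("u1", "d2", "s3"), ("u3", "d2", "s1")]
    dicts = build_dictionaries(rows, columns)

    row_cache = DictionaryCache(dicts, capacity=1)
    col_cache = DictionaryCache(dicts, capacity=1)
    assert encode_row_by_row(rows, columns, row_cache) == \
        encode_column_by_column(rows, columns, col_cache)
    print("row-by-row loads:", row_cache.loads)
    print("column-by-column loads:", col_cache.loads)
```

Both functions produce the same encoded rows. With a cache that holds a single dictionary, the row-by-row order reloads a dictionary for every cell, while the column-by-column order loads each dictionary once. The sketch keeps all rows in memory for simplicity, so it does not model how a real build would partition or buffer the data.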



--
This message was sent by Atlassian Jira
(v8.3.4#803005)