Posted to issues@kylin.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/05/18 02:41:00 UTC

[jira] [Commented] (KYLIN-4941) Support encoding raw data to base cuboid column-by-column

    [ https://issues.apache.org/jira/browse/KYLIN-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346544#comment-17346544 ] 

ASF GitHub Bot commented on KYLIN-4941:
---------------------------------------

hit-lacus commented on a change in pull request #1631:
URL: https://github.com/apache/kylin/pull/1631#discussion_r633995459



##########
File path: core-common/src/main/java/org/apache/kylin/common/KylinConfigBase.java
##########
@@ -2709,7 +2709,11 @@ public int getDistCPMaxMapNum(){
         return Integer.valueOf(getOptional("kylin.storage.distcp-max-map-num", "50"));
     }
 
-    public String getKylinDictCacheStrength(){
+    public String getKylinDictCacheStrength() {
         return getOptional("kylin.dict.cache.strength", "soft");
-    };
+    }
+
+    public boolean encodeBaseCuboidColumnByColumn() {

Review comment:
       Could you please add some explanation of this `property`? Maybe a suggestion about the cases in which we should enable this feature, or an explanation of its main principle.
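Following the `getOptional(key, default)` pattern visible in the diff above, the new getter would presumably parse a boolean property. The sketch below is a standalone illustration of that pattern, not Kylin's actual `KylinConfigBase`, and the property key name is hypothetical (the real key is truncated out of the diff):

```java
import java.util.Properties;

// Standalone sketch of the KylinConfigBase-style getOptional pattern.
// This is NOT Kylin's actual class; the property key below is a guess.
public class ConfigSketch {
    private final Properties props = new Properties();

    // Mirrors the lookup-with-default style used by getOptional in the diff.
    String getOptional(String key, String defaultValue) {
        String v = props.getProperty(key);
        return v != null ? v : defaultValue;
    }

    // Test helper so callers can set a property value.
    void set(String key, String value) {
        props.setProperty(key, value);
    }

    // A boolean getter in the same style; key name is hypothetical.
    public boolean encodeBaseCuboidColumnByColumn() {
        return Boolean.parseBoolean(
                getOptional("kylin.engine.spark.base-cuboid-encode-column-by-column", "false"));
    }

    public static void main(String[] args) {
        ConfigSketch cfg = new ConfigSketch();
        System.out.println(cfg.encodeBaseCuboidColumnByColumn()); // false: default is off
    }
}
```

Defaulting the flag to "false" keeps the existing row-by-row behavior unless a user opts in, which matches how optional performance features are typically wired through this config class.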




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Support encoding raw data to base cuboid column-by-column
> ---------------------------------------------------------
>
>                 Key: KYLIN-4941
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4941
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v3.1.1
>            Reporter: ShengJun Zheng
>            Assignee: ShengJun Zheng
>            Priority: Major
>             Fix For: v3.1.3
>
>
> When building with the Spark engine, the first step encodes the Hive table's rows into base cuboid data.
> The existing implementation encodes row by row. If the cube has several dictionary-encoded measures, encoding a single row requires all of those dictionaries to be loaded at the same time, which causes heavy memory usage and a low hit ratio in the dictionary cache.
> We optimized this case by encoding column by column, which brought a significant improvement for cubes with several high-cardinality dictionary-encoded measures.
> We will refine the implementation based on Kylin 3.x and share it out.
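The difference the description is driving at can be sketched as a loop-order change: row-by-row touches every dictionary once per row, while column-by-column keeps a single dictionary "hot" for a whole pass. The classes below are illustrative stand-ins, not Kylin's actual encoder or dictionary types:

```java
import java.util.*;

// Illustrative sketch (not Kylin code): row-by-row vs column-by-column
// dictionary encoding. The column-by-column order uses one dictionary at a
// time, which is what improves cache locality for high-cardinality dicts.
public class ColumnByColumnEncodingSketch {

    // Minimal stand-in for a per-column dictionary mapping values to ids.
    static final class Dict {
        private final Map<String, Integer> ids = new HashMap<>();
        int idOf(String value) {
            return ids.computeIfAbsent(value, v -> ids.size());
        }
    }

    // Row-by-row: every row cycles through all dictionaries.
    static int[][] encodeRowByRow(String[][] rows, Dict[] dicts) {
        int[][] encoded = new int[rows.length][dicts.length];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < dicts.length; c++) {
                encoded[r][c] = dicts[c].idOf(rows[r][c]);
            }
        }
        return encoded;
    }

    // Column-by-column: only one dictionary is in active use per pass.
    static int[][] encodeColumnByColumn(String[][] rows, Dict[] dicts) {
        int[][] encoded = new int[rows.length][dicts.length];
        for (int c = 0; c < dicts.length; c++) {
            Dict dict = dicts[c]; // the single dictionary stays cached
            for (int r = 0; r < rows.length; r++) {
                encoded[r][c] = dict.idOf(rows[r][c]);
            }
        }
        return encoded;
    }

    public static void main(String[] args) {
        String[][] rows = { {"a", "x"}, {"b", "x"}, {"a", "y"} };
        int[][] byRow = encodeRowByRow(rows, new Dict[]{ new Dict(), new Dict() });
        int[][] byCol = encodeColumnByColumn(rows, new Dict[]{ new Dict(), new Dict() });
        // Both loop orders assign the same ids per column; only memory
        // access locality differs.
        System.out.println(Arrays.deepEquals(byRow, byCol)); // true
    }
}
```

Because ids are assigned per column in row order either way, the two orders produce identical output; the win described in the issue is purely in working-set size and dictionary-cache hit ratio, not in the encoded result.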



--
This message was sent by Atlassian Jira
(v8.3.4#803005)