Posted to issues@kylin.apache.org by GitBox <gi...@apache.org> on 2019/12/18 13:19:50 UTC

[GitHub] [kylin] zhoukangcn opened a new pull request #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments

zhoukangcn opened a new pull request #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments
URL: https://github.com/apache/kylin/pull/1005
 
 
   see: https://issues.apache.org/jira/browse/kylin-4185

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [kylin] zhoukangcn commented on issue #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments

Posted by GitBox <gi...@apache.org>.
zhoukangcn commented on issue #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments
URL: https://github.com/apache/kylin/pull/1005#issuecomment-570980659
 
 
   @nichunen Could you help review this? Thank you.


[GitHub] [kylin] zhoukangcn edited a comment on issue #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments

Posted by GitBox <gi...@apache.org>.
zhoukangcn edited a comment on issue #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments
URL: https://github.com/apache/kylin/pull/1005#issuecomment-575977670
 
 
   @nichunen 
   The doc for KYLIN-4185 is here: https://github.com/apache/kylin/pull/1071


[GitHub] [kylin] nichunen merged pull request #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments

Posted by GitBox <gi...@apache.org>.
nichunen merged pull request #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments
URL: https://github.com/apache/kylin/pull/1005
 
 
   


[GitHub] [kylin] nichunen commented on a change in pull request #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments

Posted by GitBox <gi...@apache.org>.
nichunen commented on a change in pull request #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments
URL: https://github.com/apache/kylin/pull/1005#discussion_r367896842
 
 

 ##########
 File path: engine-mr/src/main/java/org/apache/kylin/engine/mr/CubingJob.java
 ##########
 @@ -362,4 +369,63 @@ public long findCubeSizeBytes() {
         return Long.parseLong(findExtraInfoBackward(CUBE_SIZE_BYTES, "0"));
     }
 
+    public List<Double> findEstimateRatio(CubeSegment seg, KylinConfig config) {
+        CubeInstance cubeInstance = seg.getCubeInstance();
+        CuboidScheduler cuboidScheduler = cubeInstance.getCuboidScheduler();
+        List<List<Long>> layeredCuboids = cuboidScheduler.getCuboidsByLayer();
+        int totalLevels = cuboidScheduler.getBuildLevel();
+
+        List<Double> result = Lists.newArrayList();
+
+        Map<Long, Double> estimatedSizeMap;
+
+        String cuboidRootPath = getCuboidRootPath(seg, config);
+
+        try {
+            estimatedSizeMap = new CubeStatsReader(seg, config).getCuboidSizeMap(true);
+        } catch (IOException e) {
+            logger.warn("Cannot get segment {} estimated size map", seg.getName());
+
+            return null;
+        }
+
+        for (int level = 0; level <= totalLevels; level++) {
+            double levelEstimatedSize = 0;
+            for (Long cuboidId : layeredCuboids.get(level)) {
+                levelEstimatedSize += estimatedSizeMap.get(cuboidId) == null ? 0.0 : estimatedSizeMap.get(cuboidId);
+            }
+
+            double levelRealSize = getRealSizeByLevel(cuboidRootPath, level);
+
+            if (levelEstimatedSize == 0.0 || levelRealSize == 0.0){
+                result.add(level, -1.0);
+            } else {
+                result.add(level, levelRealSize / levelEstimatedSize);
+            }
+        }
+
+        return result;
+    }
+
+
+    private double getRealSizeByLevel(String rootPath, int level) {
+        try {
+            String levelPath = JobBuilderSupport.getCuboidOutputPathsByLevel(rootPath, level);
+            FileSystem fs = HadoopUtil.getFileSystem(levelPath);
+            return fs.getContentSummary(new Path(levelPath)).getLength() / (1024L * 1024L);
 
 Review comment:
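   long / long is integer division in Java, so the quotient is truncated before it is widened to double; this expression cannot return a fractional value.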
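
   A minimal sketch of the point made in the review comment, assuming a standalone helper class (SizeInMb and its method names are hypothetical and not part of CubingJob). It contrasts the reviewed expression with one possible fix, casting one operand to double before dividing:

```java
// Hypothetical helper, not part of Kylin: it only illustrates the division issue.
final class SizeInMb {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Mirrors the reviewed expression: long / long is evaluated first and truncates,
    // then the truncated long is widened to double on return.
    static double truncated(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    // Casting one operand to double forces floating-point division.
    static double fractional(long bytes) {
        return (double) bytes / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        long bytes = 1_572_864L; // 1.5 * 1024 * 1024
        System.out.println(truncated(bytes));  // prints 1.0
        System.out.println(fractional(bytes)); // prints 1.5
    }
}
```

   With the truncating form, any level whose output is smaller than 1 MB reports a size of 0, which findEstimateRatio in the diff above then treats as missing and records as -1.0.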


[GitHub] [kylin] zhoukangcn commented on issue #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments

Posted by GitBox <gi...@apache.org>.
zhoukangcn commented on issue #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments
URL: https://github.com/apache/kylin/pull/1005#issuecomment-575977670
 
 
   @nichunen Please see: https://github.com/apache/kylin/pull/1071


[GitHub] [kylin] nichunen commented on issue #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments

Posted by GitBox <gi...@apache.org>.
nichunen commented on issue #1005: KYLIN-4185: optimize CuboidSizeMap by using historical segments
URL: https://github.com/apache/kylin/pull/1005#issuecomment-575605550
 
 
   @zhoukangcn Hi, would you please open another PR for the documentation of this config?
