Posted to issues@kylin.apache.org by GitBox <gi...@apache.org> on 2022/12/27 07:21:12 UTC

[GitHub] [kylin] liuzhao-lz commented on a diff in pull request #2049: KYLIN-5371 fix segment prune

liuzhao-lz commented on code in PR #2049:
URL: https://github.com/apache/kylin/pull/2049#discussion_r1057488331


##########
core-common/src/main/java/org/apache/kylin/common/util/DateFormat.java:
##########
@@ -194,4 +202,18 @@ public static boolean isDatePattern(String ptn) {
         return COMPACT_DATE_PATTERN.equals(ptn) || YYYYMMDDHH.equals(ptn) || YYYYMMDDHHMM.equals(ptn)
                 || YYYYMMDDHHMMSS.equals(ptn);
     }
+
+    public static Long getFormatTimeStamp(long time, String pattern) {
+        try {
+            if (StringUtils.isNotBlank(pattern)) {
+                SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.getDefault(Locale.Category.FORMAT));
+                sdf.setTimeZone(TimeZone.getTimeZone("GMT"));

Review Comment:
   ![b834587586ad4ff0235117f3887cbb5](https://user-images.githubusercontent.com/49258176/209627680-9689cecb-9e23-4e96-bb41-14c8053dbd34.png)
   



##########
kylin-spark-project/kylin-spark-common/src/main/scala/org/apache/spark/sql/execution/datasource/FilePruner.scala:
##########
@@ -366,7 +366,9 @@ class FilePruner(cubeInstance: CubeInstance,
         val pruned = segDirs.filter {
           e => {
             val tsRange = cubeInstance.getSegment(e.segmentName, SegmentStatusEnum.READY).getTSRange
-            SegFilters(tsRange.startValue, tsRange.endValue, pattern)
+            // tsRange: 20221219000000_20221219010000、20221219010000_20221219020000, pattern: yyyy-MM-dd
+            val start = DateFormat.getFormatTimeStamp(tsRange.startValue, pattern)
+            SegFilters(start, tsRange.endValue, pattern)

Review Comment:
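   tsRange.endValue does not need to be considered. DateFormat.getFormatTimeStamp(tsRange.startValue, pattern) formats startValue into a day-level timestamp (ignoring the hour). The reason is that foldFilter in SegFilters formats the where-clause partition column value (call it ts) by day, and the subsequent comparison is also "ts >= start && ts < end". If end carries a time-of-day component, its timestamp is necessarily greater than the day-level timestamp, so it needs no handling.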
   
   ![2e8dbc69f57fa37e8734d713d2317f2](https://user-images.githubusercontent.com/49258176/209627641-fa8e0f21-3294-4d33-a319-64cdd4a2cc25.png)
   ![3b63ed8ab495cedb6de7628b9c27d0e](https://user-images.githubusercontent.com/49258176/209627653-53c349bb-baf3-4417-a170-07548d7fdf02.png)
   ![f2fa18fd622f6a54a4eeaa3f8501557](https://user-images.githubusercontent.com/49258176/209627662-dcb51cbe-ab73-40e3-a399-7644353ff710.png)
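   To make the reasoning above concrete, here is a minimal, self-contained sketch. It is not the PR's code: the class `SegmentPruneSketch` and the helpers `truncateToPattern`, `keep`, and `utcMillis` are hypothetical names, and `truncateToPattern` assumes that getFormatTimeStamp works by formatting the timestamp with the pattern in GMT and parsing it back (the diff above shows only the first lines of that method). The segment boundaries are taken from the example in the code comment (hourly segments on 2022-12-19, pattern yyyy-MM-dd).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class SegmentPruneSketch {

    // Hypothetical stand-in for DateFormat.getFormatTimeStamp: format the
    // timestamp with the partition pattern in GMT, then parse it back, which
    // drops every field the pattern does not contain (here: the time of day).
    static long truncateToPattern(long millis, String pattern) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("GMT"));
        try {
            return sdf.parse(sdf.format(new Date(millis))).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot round-trip with pattern " + pattern, e);
        }
    }

    // The comparison quoted in the review comment: ts >= start && ts < end.
    static boolean keep(long ts, long start, long end) {
        return ts >= start && ts < end;
    }

    // Helper for building example timestamps in GMT.
    static long utcMillis(String text) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("GMT"));
        try {
            return sdf.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad timestamp: " + text, e);
        }
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd";

        // Filter value after day-level formatting: 2022-12-19 00:00:00 GMT.
        long ts = utcMillis("2022-12-19 00:00:00");

        // Second hourly segment: 20221219010000_20221219020000.
        long start = utcMillis("2022-12-19 01:00:00");
        long end = utcMillis("2022-12-19 02:00:00");

        // Raw start is later than the day-level ts, so the segment is dropped.
        System.out.println("raw start kept:       " + keep(ts, start, end));

        // Truncated start equals the day-level ts, so the segment is kept.
        long truncated = truncateToPattern(start, pattern);
        System.out.println("truncated start kept: " + keep(ts, truncated, end));
    }
}
```

   Under these assumptions the raw start excludes the 01:00 to 02:00 segment, while the truncated start keeps it. The end value is left unchanged because any end with a time-of-day component is already greater than the day-level ts.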
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@kylin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org