You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/12/06 12:37:00 UTC

[jira] [Commented] (KYLIN-4010) Auto adjust offset according to query server's timezone for time derived column

    [ https://issues.apache.org/jira/browse/KYLIN-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989714#comment-16989714 ] 

ASF subversion and git services commented on KYLIN-4010:
--------------------------------------------------------

Commit f2bea280d12061b2fb5f39f5b301579178d70e7a in kylin's branch refs/heads/master from Xiaoxiang Yu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=f2bea28 ]

KYLIN-4167 Phase2 (#961)

* KYLIN-4167 Phase2

- Fix bugs when cubing job been dropped
- Drop former segment info when submit new one
- Add new logfile for streaming coordinator more log to let diagnostic easier.

* KYLIN-4010 Auto ajust timezone

* KYLIN-4010 Fix segment purge bugs


> Auto adjust offset according to query server's timezone for time derived column
> -------------------------------------------------------------------------------
>
>                 Key: KYLIN-4010
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4010
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Others
>    Affects Versions: v3.0.0-alpha
>            Reporter: zengrui
>            Assignee: Xiaoxiang Yu
>            Priority: Minor
>             Fix For: v3.0.0
>
>         Attachments: image-2019-07-15-17-15-31-209.png, image-2019-07-15-17-17-04-029.png, image-2019-07-15-17-17-39-568.png, image-2019-09-22-16-35-23-663.png, image-2019-09-22-16-43-19-248.png, image-2019-09-22-20-28-02-402.png, image-2019-09-22-20-35-19-038.png, image-2019-09-22-20-53-34-720.png, image-2019-09-24-17-23-42-451.png, image-2019-09-24-17-33-29-824.png, image-2019-09-24-17-35-47-589.png, image-2019-09-24-17-36-39-611.png
>
>
> h2. Backgroud
> In realtime OLAP, we index real-time event in streaming receiver. We know that each event must contains a timestamp column (we often call it event time), that value should represent when this event was produced. Because event maybe come from different timezone and use local timezone is always *error-prone*, so we recommend to use a {color:#DE350B}GMT+0{color} timestamp(System.currentTimeMillis()) to avoid such issue.
> I think this is good by design, it is easy to understand and always correct. But the *side effect* is that, the end user(business manager behind a BI tools) are unhappy because he have to use GMT+0 with date/time related filter in SQL and should understand the result should be *shifted* with his local timezone. It is not user-firendly and inconvenient for normal user. Because user may compare query result from different data source and compare them and summarize, use GMT+0 may trouble them.
> h2. Example
> For example, kylin user work in *GMT+8* (maybe in Shanghai) want to know some metrics which occured from {color:#DE350B}2019-09-01 12:00:00{color} to {color:#DE350B}2019-09-01 14:00:00{color} in his {color:#DE350B}local timezone{color}, so he has to {color:#DE350B}rewrite{color} his query (with eight hour offset) to following: 
> {code:sql}
> select hour_start, count(*)
> from realtime_table
> where hour_start >= "2019-09-01 04:00:00" and hour_start < "2019-09-01 06:00:00"  
> group by hour_start
> {code}
> And he will get result like :
> ||hour_start ||count||
> |2019-09-01 04:00:00	|139202|
> |2019-09-01 05:00:00	|89398|
> And he must convert to a more meaningful result in his mind, it is realy annoying!
> ||hour_start ||count||
> |2019-09-01 12:00:00	|139202|
> |2019-09-01 13:00:00	|89398|
> h2. Desgin
> We should not change the way receiver index event, event time should be stored in UTC timestamp. We should auto rewrite sql's event time related filter. 
> In kylin, filter condition in where clause will be convert to a *TupleFilter*, and it looks like *RelNode* in Apache Calicate.
> For where hour_start >= "2019-09-01 12:00:00" and hour_start < "2019-09-01 14:00:00", we will send TupleFilter to streaming receiver or region server which looks like this:
> {noformat}
> AND
>   GreatThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 12:00:00"
>       timestamp
>   LessThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 14:00:00"
>       timestamp
> {noformat}
> But for streaming query, we want to change each ConstantTupleFilter and minus value for that timestamp. So the TupleFilter which be sent will be following:
> {noformat}
> AND
>   GreatThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 04:00:00"
>       timestamp
>   LessThanOrEqual
>     hout_start
>     CAST
>       "2019-09-01 06:00:00"
>       timestamp
> {noformat}
> Before query result processed by *OLAPEnumerator*,  kylin will plus each value of time derived column, thus protect row from be filtered by calcite generated code.
> So, user will get what he want in his timezone without any burden.
> h2. How to use
> To enable auto shift by time zone, please set {color:#DE350B}kylin.stream.auto.just.by.timezone{color} to true.
> You can specific time zone by {color:#DE350B}kylin.web.timezone{color}, otherwise, time zone will be auto detected.
> Only *time derived column* will be affected.
> h2. Related Issue
> Originally, the event time can only in the format of a long value (UTC timestamp). But in some case, the event time is in a format of "yyyy-MM-dd HH:mm:ss", we use a new class DateTimeParser(introduced in KYLIN-4001) to convert such format into a UTC timestamp.
> h3. Old Describe
> In Real-Time Streaming Cube when I send some records to kafka topic, the tmestamp for the record is 2019-01-01 00:00:00.000, but kylin create a segment named 20181231160000_20181231170000.
> Then I found that TimeZone is hard-coded to "GMT" in function makeSegmentName for class CubeSegment. I think that it should be config in kylin.properties.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)