You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kylin.apache.org by "Nick Muerdter (JIRA)" <ji...@apache.org> on 2016/03/21 21:47:25 UTC

[jira] [Created] (KYLIN-1513) Time partitioning doesn't work across multiple days

Nick Muerdter created KYLIN-1513:
------------------------------------

             Summary: Time partitioning doesn't work across multiple days
                 Key: KYLIN-1513
                 URL: https://issues.apache.org/jira/browse/KYLIN-1513
             Project: Kylin
          Issue Type: Bug
    Affects Versions: v1.5.0
            Reporter: Nick Muerdter


I was attempting to use the new time partition functionality added in v1.5.0 (https://issues.apache.org/jira/browse/KYLIN-1427). I realize this hasn't been added to the interface yet (https://issues.apache.org/jira/browse/KYLIN-1441), so I'm not sure if this is feature is completely ready yet, but I thought I'd note the issue I ran into:

- Using the API, I defined the `partition_time_column` and `partition_time_format` attributes on my model:
    {code}
{
  "partition_desc": {
    "partition_date_column": "DEFAULT.LOGS.CAL_DATE",
    "partition_date_format": "yyyy-MM-dd",
    "partition_date_column": "DEFAULT.LOGS.CAL_HOUR",
    "partition_date_format": "H",
    "partition_date_start": null,
    "partition_type": "APPEND"
  }
}
    {code}

- I then attempted to build the cube for the first time over a multi-day duration (2010-08-16 to 2010-11-01).
- The cube reported successfully building, but the resulting cube contained no data.

When I looked at the Kylin logs, it looks like the SQL generated by the time partition won't work if the query spans multiple days. In my case, the WHERE clause used for 2010-08-16 to 2010-11-01 was:

{code}
WHERE (LOGS.CAL_DATE >= '2010-08-16' AND LOGS.CAL_DATE < '2010-11-01' AND LOGS.CAL_HOUR >= '0' AND LOGS.CAL_HOUR < '0')
{code}

The issues seems to be that the hour condition is ANDed to the end of the existing date range query: https://github.com/apache/kylin/blob/kylin-1.5.0/core-metadata/src/main/java/org/apache/kylin/metadata/model/PartitionDesc.java#L170-L174 However, this logic doesn't work when the date range spans multiple days. In this case, it's trying to match where the hour column is both >= 0 and < 0 (since I was processing midnight to midnight), which will never match any results. However, if I switched my cube build to end at 02:00, I believe this would lead to some results being built in the cube, but not what's expected (it would only process 00:00-02:00 on each day.

So while I think the current implementation will work as long as you're building less than 24 hours of data at a time, it would be nice if this could still support multiple-day builds when this time partition is also present.

I think when a separate time column is present, the SQL generated would need to be something more like:

{code}
WHERE
  (LOGS.CAL_DATE = '2010-08-16' AND LOGS.CAL_HOUR >= '0') OR
  (LOGS.CAL_DATE > '2010-08-16' AND LOGS.CAL_DATE < '2010-11-01') OR
  (LOGS.CAL_DATE = '2010-11-01' AND LOGS.CAL_HOUR < '0')
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)