You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/08/12 23:53:44 UTC

[GitHub] [druid] paul-rogers opened a new issue #11587: TIME_FLOOR does not work, documentation is incomplete

paul-rogers opened a new issue #11587:
URL: https://github.com/apache/druid/issues/11587


   `TIME_FLOOR` does not work, documentation is incomplete
   
   ### Affected Version
   
   Version: 0.21.
   
   ### Description
   
   Consider the "stock" Wikipedia data source and stock Docker Druid cluster. The `TIME_FLOOR` function (and documentation) exhibit a number of problems.
   
   My goal is to round the `__time` column to one hour, eventually to `n` hours (or minutes, etc.)
   
   ### Periods in Documentation Should Be Quoted
   
   The [documentation](https://druid.apache.org/docs/latest/querying/sql.html#time-functions) describes the `period` argument as:
   
   > Period can be any ISO8601 period, like P3M (quarters) or PT12H (half-days). 
   
   Note that the period examples are unquoted. By contrast, `DATE_TRUNC` says:
   
   > Unit can be 'milliseconds', 'second', 'minute', 'hour', 'day', 'week', 'month', 'quarter', 'year', 'decade', 'century', or 'millennium'.
   
   That is, the examples are quoted. So, it is not clear if the `P1H` should be unquoted (as with `FLOOR`) or quoted. Let's try.
   
   ```sql
   SELECT
     TIME_FLOOR(__time, 'P1H') AS hr,
     channel,
     page
   FROM "wikiticker-2015-09-12-sampled"
   ```
   
   Error:
   
   ```text
   Error: Unknown exception
   
   Error while applying rule DruidQueryRule(SELECT_PROJECT), args [rel#10765:LogicalProject.NONE.[](input=RelSubset#10764,hr=TIME_FLOOR($0, 'P1H'),channel=$2,page=$16), rel#10775:DruidQueryRel.NONE.[](query={"queryType":"scan","dataSource":{"type":"table","name":"wikiticker-2015-09-12-sampled"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"resultFormat":"compactedList","batchSize":20480,"order":"none","filter":null,"columns":["__time","added","channel","cityName","comment","countryIsoCode","countryName","deleted","delta","isAnonymous","isMinor","isNew","isRobot","isUnpatrolled","metroCode","namespace","page","regionIsoCode","regionName","user"],"legacy":false,"context":{"sqlOuterLimit":100,"sqlQueryId":"eefb68de-7cd7-47e8-8056-bd28c8d2296c"},"descending":false,"granularity":{"type":"all"}},signature={__time:LONG, added:LONG, channel:STRING, cityName:STRING, comment:STRING, countryIsoCode:STRING, coun
 tryName:STRING, deleted:LONG, delta:LONG, isAnonymous:STRING, isMinor:STRING, isNew:STRING, isRobot:STRING, isUnpatrolled:STRING, metroCode:STRING, namespace:STRING, page:STRING, regionIsoCode:STRING, regionName:STRING, user:STRING})]
   
   java.lang.RuntimeException
   ```
   
   Maybe unquoted?
   
   ```sql
   SELECT
     TIME_FLOOR(__time, P1H) AS hr,
     channel,
     page
   FROM "wikiticker-2015-09-12-sampled"
   ```
   
   Definitely not. Error:
   
   ```text
   Error: Plan validation failed
   
   org.apache.calcite.runtime.CalciteContextException: From line 2, column 22 to line 2, column 24: Column 'P1H' not found in any table
   
   org.apache.calcite.tools.ValidationException
   ```
   
   ### Role of Time Zone is Unclear
   
   The documentation states:
   
   > Rounds down a timestamp, returning it as a new timestamp. ... The time zone, if provided, should be a time zone name like "America/Los_Angeles" or offset like "-08:00". This function is similar to CEIL but is more flexible.
   
   There is no description of what role the timezone plays. Here is one interpretation:
   
   > The input and output values are UTC-based time zones. The timestamp is converted to the given timezone when computing periods of a day or greater: the local timezone determines when midnight occurs.
   
   It might be worth adding an example, since time conversions are quite confusing.
   
   Suppose I have a PST time of `2012-08-12T17:00:00 -08:00`. Convert this to UTC and get `2012-08-12T01:00:00Z` in the data. Give `TIME_FLOOR` a time zone of `-08:00` and we get our PST time back. We use `P1D` to truncate this to `2012-08-12T17:00:00 -08:00`. Now, convert it back to UTC and we get `2012-08-11T16:00:00Z`.
   
   Thus, though we've rounded to one day in the local time zone, the hours will be non-zero for the UTC time zone (unless the local time zone is UTC.)
   
   ### `TIME_FLOOR` Fails for Two Arguments
   
   This still leaves no explanation for why the first example failed.
   
   The [documentation](https://druid.apache.org/docs/latest/querying/sql.html#time-functions) describes `TIME_FLOOR` as:
   
   > `TIME_FLOOR(<timestamp_expr>, <period>, [<origin>, [<timezone>]])`
   
   Yet, our example with two arguments fails.
   
   ### Internal Function Signature Does Not Match Documentation
   
   Let's try something else, give the function a timezone of `'UTC'` to see if that makes it happy:
   
   ```sql
   SELECT
     TIME_FLOOR(__time, 'P1H', 'UTC') AS hr,
     channel,
     page
   FROM "wikiticker-2015-09-12-sampled"
   ```
   
   Error:
   
   ```text
   Error: Plan validation failed
   
   org.apache.calcite.runtime.CalciteContextException: From line 2, column 3 to line 2, column 34: 
   Cannot apply 'TIME_FLOOR' to arguments of type 
   'TIME_FLOOR(<TIMESTAMP(3)>, <CHAR(3)>, <CHAR(3)>)'. 
   Supported form(s): 'TIME_FLOOR(<TIMESTAMP>, <CHARACTER>, <TIMESTAMP>, <CHARACTER>)'
   
   org.apache.calcite.tools.ValidationException
   ```
   
   Notice that that the third and fourth arguments do not match the documentation:
   
   > `TIME_FLOOR(<timestamp_expr>, <period>, [<origin>, [<timezone>]])`
   
   ### Testing Suggestion
   
   This appears to be a case in which the function would benefit from a robust set of unit tests which exercise the options given in the documentation.
   
   As it is, it seems that `TIME_FLOOR` is basically unusable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on issue #11587: TIME_FLOOR documentation is incomplete

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #11587:
URL: https://github.com/apache/druid/issues/11587#issuecomment-898170486


   @jihoonson, thanks for pointing out my error. I've updated the description to be just a documentation request.
   
   Error handling in Java is a struggle for all projects. But, yes, a good "invalid period string" error would probably have set me straight.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cpriest commented on issue #11587: TIME_FLOOR documentation is incomplete

Posted by GitBox <gi...@apache.org>.
cpriest commented on issue #11587:
URL: https://github.com/apache/druid/issues/11587#issuecomment-914546985


   I was just looking for information about this undocumented origin parameter.  I know it's a ticket to update documentation, but what does this origin parameter mean or do?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gitmstoute commented on issue #11587: TIME_FLOOR documentation is incomplete

Posted by GitBox <gi...@apache.org>.
gitmstoute commented on issue #11587:
URL: https://github.com/apache/druid/issues/11587#issuecomment-1024666557


   I came across this page also looking for the documentation for `origin` and `timezone` in the `TIME_FLOOR`.
   The druid native query documentation covers some similar terminology in the [Query Granularities](https://druid.apache.org/docs/latest/querying/granularities.html) section:
   
   > Time zone is optional (defaults to UTC). Origin is optional (defaults to 1970-01-01T00:00:00 in the given time zone).
   
   I suspect (haven't confirmed) these definitions could be true for the TIME_FLOOR params.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #11587: TIME_FLOOR does not work, documentation is incomplete

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #11587:
URL: https://github.com/apache/druid/issues/11587#issuecomment-898054910


   I assume you wanted `PT1H` instead of `P1H`. The query should work with `PT1H`. But, we should improve our error handling to show something meaningful instead of unknown exception.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org