You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@calcite.apache.org by GitBox <gi...@apache.org> on 2022/11/17 00:09:53 UTC

[GitHub] [calcite] wnob opened a new pull request, #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

wnob opened a new pull request, #2973:
URL: https://github.com/apache/calcite/pull/2973

   There is an existing implementation for a BQ `DATE` function but it wrongly thinks this is simply a typecast. See documentation [here][1].
   
   This PR implements 2 of the 4 possible overloads for `DATE` (considering the optional time zone in the second form) as well as 1 of the possible overloads for `TIMESTAMP`, and was able to get a quidem test to pass utilizing both functions. I intend to implement the rest of the overloads in a follow-up commit but was hoping for some feedback now and thought this was a decent-sized commit to review at once.
   
   [1]: https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#date


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] olivrlee commented on a diff in pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
olivrlee commented on code in PR #2973:
URL: https://github.com/apache/calcite/pull/2973#discussion_r1024660113


##########
babel/src/main/codegen/includes/parserImpls.ftl:
##########
@@ -42,6 +42,26 @@ SqlNode DateFunctionCall() :
     }
 }
 
+SqlNode TimestampFunctionCall() :

Review Comment:
   Can you explain a bit on the whole purpose of the `/babel` directory? What I'm understanding here is that its used to be able to parse new SQL statements coming in, the `date` function existed already above this block so it doesn't need to be added 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] wnob commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
wnob commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1322705468

   > BQ `DATETIME` = Calcite `TIMESTAMP`
   
   I don't see the connection. In Calcite, a timestamp is represented internally as an offset in milliseconds since epoch. In BigQuery, a timestamp is represented internally as an offset in microseconds since epoch. Apart from the difference in resolution, they're essentially the same.
   
   Of course, this can easily be re-interpreted as a `DATETIME` by assuming a default time zone, but I struggle to see how a Calcite `TIMESTAMP` is better mapped to a BQ `DATETIME` than to a BQ `TIMESTAMP`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] wnob commented on a diff in pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
wnob commented on code in PR #2973:
URL: https://github.com/apache/calcite/pull/2973#discussion_r1026992374


##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -149,6 +159,40 @@ public class SqlFunctions {
 
   private static final Pattern PATTERN_0_STAR_E = Pattern.compile("0*E");
 
+  // Date formatter for BigQuery's timestamp literals:
+  // https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#timestamp_literals
+  private static final DateTimeFormatter BIG_QUERY_TIMESTAMP_LITERAL_FORMATTER =
+      new DateTimeFormatterBuilder()
+          // Unlike ISO 8601, BQ only supports years between 1 - 9999,
+          // but can support single-digit month and day parts.
+          .appendValue(ChronoField.YEAR, 4)
+          .appendLiteral('-')
+          .appendValue(ChronoField.MONTH_OF_YEAR, 1, 2, SignStyle.NOT_NEGATIVE)
+          .appendLiteral('-')
+          .appendValue(ChronoField.DAY_OF_MONTH, 1, 2, SignStyle.NOT_NEGATIVE)
+          // Everything after the date is optional. Optional sections can be nested.
+          .optionalStart()
+          // BQ accepts either a literal 'T' or a space to separate the date from the time,
+          // so make the 'T' optional but pad with 1 space if it's omitted.
+          .padNext(1, ' ')
+          .optionalStart()
+          .appendLiteral('T')
+          .optionalEnd()
+          // Unlike ISO 8601, BQ can support single-digit hour, minute, and second parts.
+          .appendValue(ChronoField.HOUR_OF_DAY, 1, 2, SignStyle.NOT_NEGATIVE)
+          .appendLiteral(':')
+          .appendValue(ChronoField.MINUTE_OF_HOUR, 1, 2, SignStyle.NOT_NEGATIVE)
+          .appendLiteral(':')
+          .appendValue(ChronoField.SECOND_OF_MINUTE, 1, 2, SignStyle.NOT_NEGATIVE)
+          // ISO 8601 supports up to nanosecond precision, but BQ only up to microsecond.
+          .optionalStart()
+          .appendFraction(ChronoField.MICRO_OF_SECOND, 0, 6, true)
+          .optionalEnd()
+          .optionalStart()
+          .parseLenient()
+          .appendOffsetId()
+          .toFormatter(Locale.ROOT);
+

Review Comment:
   Yes, it would appear that JDK8 had a different implementation for `DateTimeFormatter` that cannot handle offsets (or at least not in the same way as later versions). I'm inclined to switch to regex-based conversion, which will require more logic but be more widely compatible. Thoughts?



##########
site/_docs/reference.md:
##########
@@ -2596,7 +2596,16 @@ semantics.
 | p | CONVERT_TIMEZONE(tz1, tz2, datetime)           | Converts the timezone of *datetime* from *tz1* to *tz2*
 | b | CURRENT_DATETIME([timezone])                   | Returns the current time as a TIMESTAMP from *timezone*
 | m | DAYNAME(datetime)                              | Returns the name, in the connection's locale, of the weekday in *datetime*; for example, it returns '星期日' for both DATE '2020-02-10' and TIMESTAMP '2020-02-10 10:10:10'
-| b | DATE(string)                                   | Equivalent to `CAST(string AS DATE)`
+| b | DATE(integer, integer, integer)                | Returns a date object given year, month, and day
+| b | DATE(timestamp)                                | Extracts the UTC date from a timestamp
+| b | DATE(timestamp, string)                        | Extracts the date from a timestamp, in a given time zone
+| b | TIMESTAMP(string)                              | Equivalent to `CAST(string AS TIMESTAMP)`
+| b | TIMESTAMP(string, string)                      | Equivalent to `CAST(string AS TIMESTAMP)`, converted to a given time zone
+| b | TIMESTAMP(date)                                | Convert a UTC date to a timestamp (at midnight)
+| b | TIMESTAMP(date, string)                        | Convert a date to a timestamp (at midnight), in a given time zone
+| b | TIME(integer, integer, integer)                | Returns a time object given hour, minute, and second
+| b | TIME(timestamp)                                | Extracts the UTC time from a timestamp

Review Comment:
   Reworded for clarity.



##########
babel/src/main/codegen/includes/parserImpls.ftl:
##########
@@ -42,6 +42,46 @@ SqlNode DateFunctionCall() :
     }
 }
 
+SqlNode TimestampFunctionCall() :
+{
+    final SqlFunctionCategory funcType = SqlFunctionCategory.USER_DEFINED_FUNCTION;

Review Comment:
   Ya this was just copied from the existing `DateFunctionCall`. I assumed this vaguely meant library function since there is no obvious "library function" value in the `SqlFunctionCategory` enum. Perhaps these should all be in the `TIMEDATE` category. I suppose since this is the babel parser, there's no real concept of a library function.
   
   BTW, is my assessment of babel in my above comment with Oliver correct? Should we be putting these productions in the core parser instead (or in addition to here)?



##########
babel/src/test/resources/sql/big-query.iq:
##########
@@ -1051,29 +1051,98 @@ select unix_date(timestamp '2008-12-25') as d;
 
 !ok
 
-# DATE
-# 'date(x) is shorthand for 'cast(x as date)'
-select date('1970-01-01') as d;
+# DATE(year, month, day)
+select date(2022, 11, 15) as d;
 +------------+
 | d          |
 +------------+
-| 1970-01-01 |
+| 2022-11-15 |
 +------------+
 (1 row)
 
 !ok
 
-!if (false) {
-select date(cast(null as varchar(10))) as d;
-+---+
-| D |
-+---+
-|   |
-+---+
+# TIMESTAMP(string)

Review Comment:
   Cool, I think the most recent commit has what you asked for. The BigQuery docs aren't completely self-consistent style-wise but I copied verbatim. Also, the 3 latter tests involve a combination of different functions, so just kept the original descriptive comments there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] julianhyde commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
julianhyde commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1322750893

   The precision issue really is orthogonal, so at best it muddies the waters. See https://issues.apache.org/jira/browse/CALCITE-5266.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] wnob commented on a diff in pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
wnob commented on code in PR #2973:
URL: https://github.com/apache/calcite/pull/2973#discussion_r1025778875


##########
babel/src/main/codegen/includes/parserImpls.ftl:
##########
@@ -42,6 +42,26 @@ SqlNode DateFunctionCall() :
     }
 }
 
+SqlNode TimestampFunctionCall() :

Review Comment:
   Here's my understanding: Babel is the "universal" parser ¹ that does not understand the semantics of individual dialects (hence the name). Dialect-specific semantic validation is a separate step that occurs after dialect-agnostic parsing.
   
   This is a [FreeMarker template file][1]. It does not contain any interpolation or tags, so running this through FreeMarker would simply remove the license comment at the top of the file. It's processed by [FMPP][2] in `config.fmpp` [here][3], which is in-turn referenced in the [JavaCC][4] grammar [here][5], as well as [here][6] via the `builtinFunctionCallMethods` list where the function name was added above in `config.fmpp`.
   
   Together, these 2 additions to the grammar indicate that there is a *production* called `TimestampFunctionCall` which consists of the [`<TIMESTAMP>` token][7] followed by a [`FunctionParameterList`][8] production, which consists of left-parenthesis, comma-separated arguments, right-parenthesis. The parser does nothing to validate how many arguments there are or what types they are.
   
   So yes, long story short, you're right that we didn't have to modify the `DateFunctionCall` production because it's just a function call either way, and I just added this new production with a different name token but which is otherwise identical to the `DATE` one.
   
   ¹ I've heard Julian make reference to other parsers being potentially usable with Calcite but I don't know of any examples.
   
   [1]: https://freemarker.apache.org/docs/dgui_template_overallstructure.html
   [2]: https://fmpp.sourceforge.net/index.html
   [3]: https://github.com/apache/calcite/blob/a505b25eacc473c6ec0ef8abd40c1ccae86297b6/babel/src/main/codegen/config.fmpp#L566
   [4]: https://javacc.github.io/javacc/
   [5]: https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/codegen/templates/Parser.jj#L1163
   [6]: https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/codegen/templates/Parser.jj#L5992
   [7]: https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/codegen/templates/Parser.jj#L7922
   [8]: https://github.com/apache/calcite/blob/a0e119ea42def418957f214f539469f1aba76c18/core/src/main/codegen/templates/Parser.jj#L948
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] wnob commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
wnob commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1320707981

   @tjbanghart
   
   > Wouldn't `DATETIME` map to Calcite's interpretation of a `TIMESTAMP` due to the lack of TZ info?
   
   Neither has tz info. I included a little table in the design doc that shows similarities between types in BQ and Calcite, and BQ `TIMESTAMP` maps pretty well to Calcite's `TIMESTAMP`. There isn't really a good existing type for `DATETIME` in Calcite. I think we'll have to implement one.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] mkou commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
mkou commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1355746946

   > 
   
   I see this was last month, I just want to make sure that there is better visibility on that and that indeed we agree that 
   BQ TIMESTAMP => Calcite TIMESTAMP WITH LOCAL TIMEZONW
   BQ DATETIME => Calcite DATETIME


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] wnob commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
wnob commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1319428833

   I've now implemented all the overloads that do not involve `DATETIME` objects. I'm not sure what is the best way to proceed with these because there doesn't seem to be a good fit in standard SQL to act as an alias. `TIMESTAMP_WITH_LOCAL_TIME_ZONE` is not a great fit; it has an implied time zone, whereas a `DATETIME` has no time zone at all.
   
   It's not necessary for MVP since we don't yet support `DATETIME` at all, so I'm inclined to punt on this for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] wnob commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
wnob commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1322739188

   Of course, we could simply re-interpret those bits as `DATETIME`, as I mentioned. We don't need a new internal representation; we could just make a new type that happens to have the same internal representation.
   
   UTC is not exactly relevant for a BQ timestamp. Sure, the epoch is 1970-01-01 00:00:00 UTC, but you could also think of it as 1969-01-01 16:00:00 PST. It's just a point in time.
   
   On the other hand, a time zone would be very relevant if we decided to represent BQ `DATETIME` as a Calcite `TIMESTAMP`, since we would have to interpret Calcite's offset-from-epoch in a particular time zone, UTC being the obvious choice.
   
   Alternatively, we could pick a new internal representation for BQ `DATETIME` -- e.g. as a Java `LocalDateTime`, which is internally represented as clock/calendar parameters with no time zone -- and avoid the implementation detail of converting from offset to clock/calendar parameters in a particular time zone. It doesn't really matter from an engineering standpoint.
   
   Basically, we have 2 equally possible options:
   1. Map BQ `TIMESTAMP` to Calcite `TIMESTAMP`, and map BQ `DATETIME` to something else (probably a whole new type that's BQ-specific, call it `BIGQUERY_DATETIME`)
   2. Map BQ `DATETIME` to Calcite `TIMESTAMP`, and map BQ `TIMESTAMP` to something else (also probably a whole new type that's BQ-specific, call it `BIGQUERY_TIMESTAMP`).
   
   I just don't see any advantages to option 1 over 2. Option 1 would cause confusion, as well as requiring a slightly awkward internal representation for `DATETIME` because it requires a time zone to convert from internal representation to semantic representation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] julianhyde commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
julianhyde commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1322748744

   Let's move design discussions to the Jira case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] julianhyde commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
julianhyde commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1322709533

   No, BigQuery, a timestamp is represented internally as an offset in microseconds since epoch *UTC*.
   
   The "UTC" is a crucial difference.
   
   Sure, they're both a bunch of bits. But so is a 64 bit IEEE floating point number. The interpretation of those bits is crucial.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] mkou commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
mkou commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1356124354

   Hey Julian, 
   
   Yes we agree this is what I meant 
   
   > 
   > Le 16 déc. 2022 à 18:52, Julian Hyde ***@***.***> a écrit :
   > 
   > 
   > @mkou, In SqlLibraryOperators, the TIMESTAMP function should return a Calcite TIMESTAMP, DATE will return DATE, and there is no DATETIME function.
   > 
   > Elsewhere (or in non-open source code), there can be a function called DATETIME that returns a Calcite TIMESTAMP (what BQ calls a DATETIME), and a function called TIMESTAMP that returns a Calcite TIMESTAMP WITH LOCAL TIME ZONE (what BQ calls a TIMESTAMP). The DATETIME function would in fact be SqlLibraryOperators.TIMESTAMP.withName("DATETIME"). And so forth.
   > 
   > Sorry it's complicated. But we need to keep the function library orthogonal to any type aliasing.
   > 
   > —
   > Reply to this email directly, view it on GitHub, or unsubscribe.
   > You are receiving this because you were mentioned.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] julianhyde commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
julianhyde commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1355981493

   @mkou, In `SqlLibraryOperators`, the `TIMESTAMP` function should return a Calcite `TIMESTAMP`, `DATE` will return `DATE`, and there is no `DATETIME` function.
   
   Elsewhere (or in non-open source code), there can be a function called `DATETIME` that returns a Calcite `TIMESTAMP` (what BQ calls a `DATETIME`), and a function called `TIMESTAMP` that returns a Calcite `TIMESTAMP WITH LOCAL TIME ZONE` (what BQ calls a `TIMESTAMP`). The `DATETIME` function would in fact be `SqlLibraryOperators.TIMESTAMP.withName("DATETIME")`. And so forth.
   
   Sorry it's complicated. But we need to keep the function library orthogonal to any type aliasing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] olivrlee commented on a diff in pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
olivrlee commented on code in PR #2973:
URL: https://github.com/apache/calcite/pull/2973#discussion_r1024660113


##########
babel/src/main/codegen/includes/parserImpls.ftl:
##########
@@ -42,6 +42,26 @@ SqlNode DateFunctionCall() :
     }
 }
 
+SqlNode TimestampFunctionCall() :

Review Comment:
   Can you explain a bit (to me, not needing comments on the file )on the whole purpose of the `/babel` directory? What I'm understanding here is that its used to be able to parse new SQL statements coming in, the `date` function existed already above this block so it doesn't need to be added 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] julianhyde commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
julianhyde commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1320951570

   BQ `DATETIME` = Calcite `TIMESTAMP`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] julianhyde commented on a diff in pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
julianhyde commented on code in PR #2973:
URL: https://github.com/apache/calcite/pull/2973#discussion_r1026775499


##########
site/_docs/reference.md:
##########
@@ -2596,7 +2596,16 @@ semantics.
 | p | CONVERT_TIMEZONE(tz1, tz2, datetime)           | Converts the timezone of *datetime* from *tz1* to *tz2*
 | b | CURRENT_DATETIME([timezone])                   | Returns the current time as a TIMESTAMP from *timezone*
 | m | DAYNAME(datetime)                              | Returns the name, in the connection's locale, of the weekday in *datetime*; for example, it returns '星期日' for both DATE '2020-02-10' and TIMESTAMP '2020-02-10 10:10:10'
-| b | DATE(string)                                   | Equivalent to `CAST(string AS DATE)`
+| b | DATE(integer, integer, integer)                | Returns a date object given year, month, and day
+| b | DATE(timestamp)                                | Extracts the UTC date from a timestamp
+| b | DATE(timestamp, string)                        | Extracts the date from a timestamp, in a given time zone
+| b | TIMESTAMP(string)                              | Equivalent to `CAST(string AS TIMESTAMP)`
+| b | TIMESTAMP(string, string)                      | Equivalent to `CAST(string AS TIMESTAMP)`, converted to a given time zone
+| b | TIMESTAMP(date)                                | Convert a UTC date to a timestamp (at midnight)
+| b | TIMESTAMP(date, string)                        | Convert a date to a timestamp (at midnight), in a given time zone
+| b | TIME(integer, integer, integer)                | Returns a time object given hour, minute, and second
+| b | TIME(timestamp)                                | Extracts the UTC time from a timestamp

Review Comment:
   A Calcite TIMESTAMP value does not have a UTC time. Even though you're implementing for BQ, you must use Calcite terminology here.



##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -149,6 +159,40 @@ public class SqlFunctions {
 
   private static final Pattern PATTERN_0_STAR_E = Pattern.compile("0*E");
 
+  // Date formatter for BigQuery's timestamp literals:
+  // https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#timestamp_literals
+  private static final DateTimeFormatter BIG_QUERY_TIMESTAMP_LITERAL_FORMATTER =
+      new DateTimeFormatterBuilder()
+          // Unlike ISO 8601, BQ only supports years between 1 - 9999,
+          // but can support single-digit month and day parts.
+          .appendValue(ChronoField.YEAR, 4)
+          .appendLiteral('-')
+          .appendValue(ChronoField.MONTH_OF_YEAR, 1, 2, SignStyle.NOT_NEGATIVE)
+          .appendLiteral('-')
+          .appendValue(ChronoField.DAY_OF_MONTH, 1, 2, SignStyle.NOT_NEGATIVE)
+          // Everything after the date is optional. Optional sections can be nested.
+          .optionalStart()
+          // BQ accepts either a literal 'T' or a space to separate the date from the time,
+          // so make the 'T' optional but pad with 1 space if it's omitted.
+          .padNext(1, ' ')
+          .optionalStart()
+          .appendLiteral('T')
+          .optionalEnd()
+          // Unlike ISO 8601, BQ can support single-digit hour, minute, and second parts.
+          .appendValue(ChronoField.HOUR_OF_DAY, 1, 2, SignStyle.NOT_NEGATIVE)
+          .appendLiteral(':')
+          .appendValue(ChronoField.MINUTE_OF_HOUR, 1, 2, SignStyle.NOT_NEGATIVE)
+          .appendLiteral(':')
+          .appendValue(ChronoField.SECOND_OF_MINUTE, 1, 2, SignStyle.NOT_NEGATIVE)
+          // ISO 8601 supports up to nanosecond precision, but BQ only up to microsecond.
+          .optionalStart()
+          .appendFraction(ChronoField.MICRO_OF_SECOND, 0, 6, true)
+          .optionalEnd()
+          .optionalStart()
+          .parseLenient()
+          .appendOffsetId()
+          .toFormatter(Locale.ROOT);
+

Review Comment:
   is this the code that Will said was non-JDK8 compliant? 



##########
site/_docs/reference.md:
##########
@@ -2596,7 +2596,16 @@ semantics.
 | p | CONVERT_TIMEZONE(tz1, tz2, datetime)           | Converts the timezone of *datetime* from *tz1* to *tz2*
 | b | CURRENT_DATETIME([timezone])                   | Returns the current time as a TIMESTAMP from *timezone*
 | m | DAYNAME(datetime)                              | Returns the name, in the connection's locale, of the weekday in *datetime*; for example, it returns '星期日' for both DATE '2020-02-10' and TIMESTAMP '2020-02-10 10:10:10'
-| b | DATE(string)                                   | Equivalent to `CAST(string AS DATE)`
+| b | DATE(integer, integer, integer)                | Returns a date object given year, month, and day

Review Comment:
   this list should be alphabetically sorted



##########
core/src/main/java/org/apache/calcite/sql/fun/SqlDateFunction.java:
##########
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to you under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.calcite.sql.fun;
+
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.sql.SqlCallBinding;
+import org.apache.calcite.sql.SqlFunction;
+import org.apache.calcite.sql.SqlFunctionCategory;
+import org.apache.calcite.sql.SqlKind;
+import org.apache.calcite.sql.SqlOperandCountRange;
+import org.apache.calcite.sql.SqlOperator;
+import org.apache.calcite.sql.SqlOperatorBinding;
+import org.apache.calcite.sql.type.SqlOperandCountRanges;
+import org.apache.calcite.sql.type.SqlOperandTypeChecker;
+import org.apache.calcite.sql.type.SqlTypeName;
+
+import java.util.Locale;
+
+/**
+ * The Google BigQuery {@code DATE} function returns a date object and can be invoked in 3 ways:

Review Comment:
   javadoc is HTML. you need &lt;ul&gt; and &lt;p&gt; tags to achieve formatting



##########
babel/src/main/codegen/includes/parserImpls.ftl:
##########
@@ -42,6 +42,46 @@ SqlNode DateFunctionCall() :
     }
 }
 
+SqlNode TimestampFunctionCall() :
+{
+    final SqlFunctionCategory funcType = SqlFunctionCategory.USER_DEFINED_FUNCTION;

Review Comment:
   not sure it should be user-defined function. a library function, maybe



##########
babel/src/test/resources/sql/big-query.iq:
##########
@@ -1051,29 +1051,98 @@ select unix_date(timestamp '2008-12-25') as d;
 
 !ok
 
-# DATE
-# 'date(x) is shorthand for 'cast(x as date)'
-select date('1970-01-01') as d;
+# DATE(year, month, day)
+select date(2022, 11, 15) as d;
 +------------+
 | d          |
 +------------+
-| 1970-01-01 |
+| 2022-11-15 |
 +------------+
 (1 row)
 
 !ok
 
-!if (false) {
-select date(cast(null as varchar(10))) as d;
-+---+
-| D |
-+---+
-|   |
-+---+
+# TIMESTAMP(string)

Review Comment:
   big-query.iq was created by pasting in doc from google's site. please continue the tradition.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] tjbanghart commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
tjbanghart commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1319499323

   Wouldn't `DATETIME` map to Calcite's interpretation of a `TIMESTAMP` due to the lack of TZ info?
   
   Also, should we move this into the core parser? I don't feel strongly either way (or if it matters all that much) but the BQ style literals and some functions live in core already.
   
   Otherwise, LGTM and a great example for adding other functions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [calcite] wnob commented on pull request #2973: [CALCITE-5180] Implement some of the overloads for BigQuery's DATE and TIMESTAMP

Posted by GitBox <gi...@apache.org>.
wnob commented on PR #2973:
URL: https://github.com/apache/calcite/pull/2973#issuecomment-1322745325

   I would also note that both BQ types have microsecond precision, so either way the existing Calcite `TIMESTAMP` is not a perfect match.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@calcite.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org