You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/08/07 02:56:30 UTC

[GitHub] [beam] ZijieSong946 opened a new pull request #12493: [BEAM-10607] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

ZijieSong946 opened a new pull request #12493:
URL: https://github.com/apache/beam/pull/12493


   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark | Twister2
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) | ---
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/i
 con)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](htt
 ps://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Twister2/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python38/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_
 Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_P
 ostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/) | ---
   XLang | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Direct/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/) | ---
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonDocker_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/b
 eam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   ![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robinyqiu commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
robinyqiu commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r470902732



##########
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##########
@@ -232,7 +233,11 @@ private Schema getKeySchema(Schema inputSchema, List<Integer> keys) {
           upstream
               .apply(
                   "assignEventTimestamp",
-                  WithTimestamps.<Row>of(row -> row.getDateTime(windowFieldIndex).toInstant())
+                  WithTimestamps.<Row>of(
+                          row ->
+                              Instant.ofEpochMilli(

Review comment:
       We plan to support windowing only up to millisecond precision, and the timestamp here seems only used by windowing. So I guess it is fine?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r472501717



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/Timestamp.java
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.logicaltypes;
+
+import java.time.Instant;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * A timestamp without a time-zone.
+ *
+ * <p>It represents an instant on the time-line.
+ *
+ * <p>Its input type is a {@link Instant}, and base type is a {@link Row} containing EpochSeconds
+ * field and Nanos field. EpochSeconds field is a Long that represents incrementing count of seconds
+ * from 1970-01-01T00:00:00Z. Nanos field is an Int that represents nanoseconds of second.
+ */
+public class Timestamp implements Schema.LogicalType<Instant, Row> {

Review comment:
       This is the same as [NanosInstant](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/NanosInstant.java), can we re-use that instead?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r470940637



##########
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##########
@@ -232,7 +233,11 @@ private Schema getKeySchema(Schema inputSchema, List<Integer> keys) {
           upstream
               .apply(
                   "assignEventTimestamp",
-                  WithTimestamps.<Row>of(row -> row.getDateTime(windowFieldIndex).toInstant())
+                  WithTimestamps.<Row>of(
+                          row ->
+                              Instant.ofEpochMilli(

Review comment:
       One option would be if one is forced to coerce to a millisecond-granularity one before using it in windowing (or other Beam-timestamp-using) operations. Silently mutating the data seems scary. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] stale[bot] closed pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
stale[bot] closed pull request #12493:
URL: https://github.com/apache/beam/pull/12493


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r472529903



##########
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##########
@@ -232,7 +233,11 @@ private Schema getKeySchema(Schema inputSchema, List<Integer> keys) {
           upstream
               .apply(
                   "assignEventTimestamp",
-                  WithTimestamps.<Row>of(row -> row.getDateTime(windowFieldIndex).toInstant())
+                  WithTimestamps.<Row>of(
+                          row ->
+                              Instant.ofEpochMilli(

Review comment:
       That would be great. We could put this as an option on the SQLTransform itself. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r472505810



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/Timestamp.java
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.logicaltypes;
+
+import java.time.Instant;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * A timestamp without a time-zone.
+ *
+ * <p>It represents an instant on the time-line.
+ *
+ * <p>Its input type is a {@link Instant}, and base type is a {@link Row} containing EpochSeconds
+ * field and Nanos field. EpochSeconds field is a Long that represents incrementing count of seconds
+ * from 1970-01-01T00:00:00Z. Nanos field is an Int that represents nanoseconds of second.
+ */
+public class Timestamp implements Schema.LogicalType<Instant, Row> {

Review comment:
       Got it, thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robinyqiu commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
robinyqiu commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r472504592



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/Timestamp.java
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.schemas.logicaltypes;
+
+import java.time.Instant;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.values.Row;
+
+/**
+ * A timestamp without a time-zone.
+ *
+ * <p>It represents an instant on the time-line.
+ *
+ * <p>Its input type is a {@link Instant}, and base type is a {@link Row} containing EpochSeconds
+ * field and Nanos field. EpochSeconds field is a Long that represents incrementing count of seconds
+ * from 1970-01-01T00:00:00Z. Nanos field is an Int that represents nanoseconds of second.
+ */
+public class Timestamp implements Schema.LogicalType<Instant, Row> {

Review comment:
       +1. I think we should. (This is PR is a WIP prototype. @ZijieSong946 you can continue with this class. Once we merge it I will remember to change to NanoInstant)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ZijieSong946 commented on pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
ZijieSong946 commented on pull request #12493:
URL: https://github.com/apache/beam/pull/12493#issuecomment-673263406


   cc: @robinyqiu Would you please take a look at test cases `BeamSqlDslAggregationTest.testSessionWindowWithBounded()` and `BeamSqlDslStdOperatorTest.testAvg()` if possible. Thanks in advance.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r474108519



##########
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##########
@@ -232,7 +233,11 @@ private Schema getKeySchema(Schema inputSchema, List<Integer> keys) {
           upstream
               .apply(
                   "assignEventTimestamp",
-                  WithTimestamps.<Row>of(row -> row.getDateTime(windowFieldIndex).toInstant())
+                  WithTimestamps.<Row>of(
+                          row ->
+                              Instant.ofEpochMilli(

Review comment:
       +1 for an option on the SQLTransform itself, that will be easier to use from Python




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robinyqiu commented on pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
robinyqiu commented on pull request #12493:
URL: https://github.com/apache/beam/pull/12493#issuecomment-685149004


   Thanks Zijie for this great prototype. I will continue this work based on this PR.
   
   There are four things that we need to finish before this feature can go into production:
   1. Fix the current tests breaking on CalciteSQL dialect side
   2. Map to a new logical type that represents microsecond instant (cc: @TheNeuralBit )
   3. Add an option to choose either throw an error or truncate as discussed in the comment above
   4. Add support to reading and writing the microsecond instant type in IOs (BigQuery/PubSub)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robinyqiu commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
robinyqiu commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r472526820



##########
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##########
@@ -232,7 +233,11 @@ private Schema getKeySchema(Schema inputSchema, List<Integer> keys) {
           upstream
               .apply(
                   "assignEventTimestamp",
-                  WithTimestamps.<Row>of(row -> row.getDateTime(windowFieldIndex).toInstant())
+                  WithTimestamps.<Row>of(
+                          row ->
+                              Instant.ofEpochMilli(

Review comment:
       > I think there'll be surprising loss of precision if this field is used for windowing (say, with TimestampCombiner.EARLIEST) and then the timestamp is inspected (is even available from SQL, right?) and has been truncated.
   
   I see. From Java user can definitely inspect the timestamp, so that is a problem. From pure SQL (e.g. from Dataflow SQL UI) I think for TUMBLE and HOP this will be fine, because the window start/end timestamp will not have a microsecond component if we only allow windowing up to millisecond. For SESSION I think there will be a problem, because the window starts/ends at a timestamp of an element.
   
   I will make sure we don't silently truncate precision before I merge the PR. One option is to use a PipelineOptions for user to choose truncate or fail over timestamp with microsecond component during windowing.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] ZijieSong946 commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
ZijieSong946 commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r470929991



##########
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##########
@@ -232,7 +233,11 @@ private Schema getKeySchema(Schema inputSchema, List<Integer> keys) {
           upstream
               .apply(
                   "assignEventTimestamp",
-                  WithTimestamps.<Row>of(row -> row.getDateTime(windowFieldIndex).toInstant())
+                  WithTimestamps.<Row>of(
+                          row ->
+                              Instant.ofEpochMilli(

Review comment:
       Yeah. We do not support timestamp microsecond precision for windowing in this PR. Windowing for Streaming tests is still millisecond precision. So this conversion is fine, because `Instant.toEpochMilli()` is not lossy for an millisecond precision `Instant`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] stale[bot] commented on pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on pull request #12493:
URL: https://github.com/apache/beam/pull/12493#issuecomment-723500379


   This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] stale[bot] commented on pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on pull request #12493:
URL: https://github.com/apache/beam/pull/12493#issuecomment-727715874


   This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r470899495



##########
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##########
@@ -232,7 +233,11 @@ private Schema getKeySchema(Schema inputSchema, List<Integer> keys) {
           upstream
               .apply(
                   "assignEventTimestamp",
-                  WithTimestamps.<Row>of(row -> row.getDateTime(windowFieldIndex).toInstant())
+                  WithTimestamps.<Row>of(
+                          row ->
+                              Instant.ofEpochMilli(

Review comment:
       It seems concerning that this is silently lossy...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r470940637



##########
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##########
@@ -232,7 +233,11 @@ private Schema getKeySchema(Schema inputSchema, List<Integer> keys) {
           upstream
               .apply(
                   "assignEventTimestamp",
-                  WithTimestamps.<Row>of(row -> row.getDateTime(windowFieldIndex).toInstant())
+                  WithTimestamps.<Row>of(
+                          row ->
+                              Instant.ofEpochMilli(

Review comment:
       One option would be if one is forced to coerce to a millisecond-granularity one before using it in windowing (or other Beam-timestamp-using) operations. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] TheNeuralBit commented on pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on pull request #12493:
URL: https://github.com/apache/beam/pull/12493#issuecomment-686582828


   Thank you Zijie! and thanks for the plan @robinyqiu!
   
   I have a PR up for item (2): #12764 - it adds a micros_instant type in Java and Python and verifies compatibility.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] robertwb commented on a change in pull request #12493: [BEAM-10607] [WIP] Support micro-second ZetaSQL TIMESTAMP in BeamSQL

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #12493:
URL: https://github.com/apache/beam/pull/12493#discussion_r470940566



##########
File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamTableFunctionScanRel.java
##########
@@ -232,7 +233,11 @@ private Schema getKeySchema(Schema inputSchema, List<Integer> keys) {
           upstream
               .apply(
                   "assignEventTimestamp",
-                  WithTimestamps.<Row>of(row -> row.getDateTime(windowFieldIndex).toInstant())
+                  WithTimestamps.<Row>of(
+                          row ->
+                              Instant.ofEpochMilli(

Review comment:
       I think there'll be surprising loss of precision if this field is used for windowing (say, with TimestampCombiner.EARLIEST) and then the timestamp is inspected (is even available from SQL, right?) and has been truncated. (Or any other conversion from Row to Beam and back again being lossy.)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org