You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by "伍翀(云邪)" <wu...@alibaba-inc.com> on 2017/07/27 06:39:52 UTC

[DISCUSS] Temporal Feature in Calcite

Hi all,

We are looking for the temporal support in Calcite, or so-called system-versioned temporal table. The temporal database feature was introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2]. A system-versioned temporal table is designed to keep a full history of data changes and allow easy point in time analysis. It is very useful for scenarios that require tracking history of data changes.

Because system-versioned tables are intended primarily for tracking historical data changes, queries on system-versioned tables often tend to be concerned with retrieving the table content as of a given point in time or between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS OF syntactic extension for this specific purpose. For example, the following query retrieves the rows of Emp that were current as of Jan. 2, 2011:
    SELECT ENo,EName,Sys_Start,Sys_End
    FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'

In addition, we need the time expression can be a relational expression whose value is from another table. For example, the following query joins the Orders to the Prices as the price was at the order time:
    SELECT STREAM *
    FROM Orders AS o
    JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
      ON o.productId = p.productId

So I would like to introduce the syntactic extension in Calcite. What do you think about this? Any comments or suggestions are welcome!

[1] https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
[2] https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
[3] https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
[4] http://sqlhints.com/tag/for-system_time-as-of/

Bests,
Jark Wu

Re: [DISCUSS] Temporal Feature in Calcite

Posted by Julian Hyde <jh...@apache.org>.
Oops, I saw that you have addressed that comment in https://issues.apache.org/jira/browse/CALCITE-1912 <https://issues.apache.org/jira/browse/CALCITE-1912>. Let’s discuss there.

> On Aug 2, 2017, at 11:34 AM, Julian Hyde <jh...@apache.org> wrote:
> 
> I do think that the best way to write the query is
> 
> SELECT  *
>   FROM Orders AS o
>   JOIN ProductPrices FOR SYSTEM_TIME AS OF o.OrderType AS p
>     ON o.productId = p.productId
> 
> But do we agree that this goes beyond the standard? It breaches "c) QSTPS shall not contain a <column reference> or an <SQL parameter reference>.” because “o.OrderType” is a column reference.
> 
> Julian
> 
>> On Aug 1, 2017, at 1:33 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>> 
>> Hi Julian,
>> 
>> We are interested in both and I have logged CALCITE-1912 for supporting “FOR SYSTEM_TIME AS OF” in regular queries.
>> 
>> Regarding to (b), you mean it can be covered by standard. I agree, the query I posted in the previous mail can be rewrote as a subquery:
>> 
>> SELECT  *
>>   FROM Orders AS o
>>   JOIN LATERAL (SELECT * FROM ProductPrices WHERE sysStart <= O.orderTime AND sysEnd > O.orderTime) AS P
>>     ON o.productId = p.productId
>> 
>> But subquery is too complex for users, the period condition is trivial for users. Using the “FOR SYSTEM_TIME AS OF”  is much simpler. Yes, you are right that the standard says it can’t contain a column reference. We initialize this discuss as we would like to "extend" the standard to simplify such query. What do you think?
>> 
>> - Jark Wu 
>> 
>>> 在 2017年8月1日,上午2:58,Julian Hyde <jh...@apache.org> 写道:
>>> 
>>> Can you clarify whether you are interested in (a) streaming queries, (b) temporal non-streaming queries, or both? If (b), it is covered by the standard, and we should follow the standard, period. If (a), syntax and semantics are still under discussion. 
>>> 
>>> If you want both, could we perhaps separate the streaming and non-streaming cases? Could you log a jira case for supporting "FOR SYSTEM_TIME AS OF” in regular (non-streaming) queries?
>>> 
>>>> The SQL standard doesn’t explicitly say the <point in time> must be a constant, just all the databases only support constant.
>>> 
>>> I disagree. The standard says that it cannot contain a column reference:
>>> 
>>>> If <query system time period specification> QSTPS is specified, then:
>>>> a) The table identified by the <table or query name> shall be a system-versioned table.
>>>> b) If BETWEEN is speci ed and neither SYMMETRIC nor ASYMMETRIC is specified, then ASYMMETRIC is implicit.
>>>> c) QSTPS shall not contain a <column reference> or an <SQL parameter reference>.
>>> 
>>> We can go beyond the standard, but let’s be clear what it says.
>>> 
>>> Julian
>>> 
>>> 
>>> 
>>> 
>>>> On Jul 31, 2017, at 5:15 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>>>> 
>>>> Hi Julian,
>>>> 
>>>> The syntax you posted looks very good to me. Besides of that, we hope to extend the <point in time> to support column reference(i.e. LATERAL + temporal  syntax). 
>>>> 
>>>> The ‘@ADayAgo’ bind variable is not a good example for our requirement. It is still a query constant, but what we want is a variable timestamp changing with every record. The example in “Streams, joins and temporal tables” [1] fits our requirement much more. It uses the table-value function to express and not introduce new syntax. But it  it is not a standard to manipulate temporal table. And it can only express querying the table content as of a specified point in time, but can not express a period time (can it?). That’s why we prefer the standard temporal syntax.
>>>> 
>>>> The SQL standard doesn’t explicitly say the <point in time> must be a constant, just all the databases only support constant. But I think, it is very useful to support column reference. For example: 
>>>> 
>>>> SELECT  *
>>>> FROM Orders AS o
>>>> JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>>   ON o.productId = p.productId
>>>> 
>>>> This is actually what we want. 
>>>> 
>>>> What do you think?
>>>> 
>>>> [1] https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit# <https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#>
>>>> - Jark Wu 
>>>> 
>>>>> 在 2017年7月29日,上午2:16,Julian Hyde <jh...@apache.org> 写道:
>>>>> 
>>>>> '@' is specific to SQL Server, and a form of bind variable, so I don't
>>>>> think we should go with that approach.
>>>>> 
>>>>> The standard explicitly says "QSTPS shall not contain a <column
>>>>> reference> or an <SQL parameter reference>." We may choose to ignore
>>>>> that, but I think we can do what you want without bind variables:
>>>>> 
>>>>> ... FROM dept FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP - INTERVAL '1' DAY
>>>>> 
>>>>> will suffice. (Recall that CURRENT_TIMESTAMP is a function, but its
>>>>> value is constant throughout the execution of the query. So it behaves
>>>>> somewhat like a bind variable.)
>>>>> 
>>>>> Can we go with the syntax in the standard? (The Microsoft syntax
>>>>> diagram has a bug in it - the '| ...' line should be '[ ... ]', I
>>>>> think - and Oracle's syntax throws in TIMESTAMP as a keyword, which
>>>>> would create "AS OF TIMESTAMP TIMESTAMP" if the expression is a
>>>>> timestamp literal.)
>>>>> 
>>>>> The standard has this:
>>>>> 
>>>>> <table primary> ::=
>>>>>  <table or query name>
>>>>>  [ <query system time period specification> ]
>>>>>  [ <correlation or recognition> ]
>>>>> | ...
>>>>> 
>>>>> <correlation or recognition> ::=
>>>>>  [ AS ] <correlation name>
>>>>>  [ <parenthesized derived column list> ]
>>>>> | ...
>>>>> 
>>>>> <query system time period specification> ::=
>>>>>  FOR SYSTEM_TIME AS OF <point in time 1>
>>>>> | FOR SYSTEM_TIME BETWEEN [ ASYMMETRIC | SYMMETRIC ]
>>>>>  <point in time 1> AND <point in time 2>
>>>>> | FOR SYSTEM_TIME FROM <point in time 1> TO <point in time 2>
>>>>> 
>>>>> <point in time 1> ::=
>>>>> <point in time>
>>>>> 
>>>>> <point in time 2> ::=
>>>>> <point in time>
>>>>> 
>>>>> <point in time> ::=
>>>>> <datetime value expression>
>>>>> 
>>>>> Julian
>>>>> 
>>>>> On Fri, Jul 28, 2017 at 2:49 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>>>>>> Hi Julian,
>>>>>> 
>>>>>> Thanks for the valuable input. We have read  “Streams, joins and temporal
>>>>>> tables” you proposed before I started the discussion. The “temporal table
>>>>>> view” proposed in the doc is very interesting and well-defined. But we hope
>>>>>> to support this feature based on the standard SQL semantics.
>>>>>> 
>>>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>>>> constant, whereas we require the timestamp to be automatically
>>>>>> varying.
>>>>>> 
>>>>>> 
>>>>>> Regarding to “WHOLE QUERY” or “Table”, I didn't find any declaration in
>>>>>> SQL:2011, but I think it effects the Table not the query. In SQL Server’s
>>>>>> “FROM” syntax[1], it is a subset of table syntax. In Oracle’s example[2], it
>>>>>> follows after table not the whole query. And I also tried in SQL Server, the
>>>>>> following SQL works well. The TemporalRates and TemporalRates2 are both
>>>>>> temporal table with the same schema. The query chooses different timestamp
>>>>>> of table TemporalRates and TemporalRates2:
>>>>>> 
>>>>>> SELECT * FROM TemporalRates2 FOR SYSTEM_TIME AS OF '2017-07-28
>>>>>> 07:49:07.5649523' AS T2
>>>>>> JOIN TemporalRates FOR SYSTEM_TIME AS OF '2017-07-28 04:26:19.5126849' AS
>>>>>> T1
>>>>>> ON T1.id = T2.id;
>>>>>> 
>>>>>> So I think the only controversial point is it only allows constant
>>>>>> timestamp. Then I find the following example from SQL Server doc [3] :
>>>>>> 
>>>>>> DECLARE @ADayAgo datetime2
>>>>>> SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())
>>>>>> /*Comparison between two points in time for subset of rows*/
>>>>>> SELECT D_1_Ago.[DeptID], D.[DeptID],
>>>>>> D_1_Ago.[DeptName], D.[DeptName],
>>>>>> D_1_Ago.[SysStartTime], D.[SysStartTime],
>>>>>> D_1_Ago.[SysEndTime], D.[SysEndTime]
>>>>>> FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago
>>>>>> JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]
>>>>>> AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;
>>>>>> 
>>>>>> The time @ADayAgo is a variable not a constant here, so I’m wondering
>>>>>> whether it is acceptable to support LATERAL temporal to make the timestamp
>>>>>> automatically varying. Which is very useful in many scenarios.
>>>>>> 
>>>>>> [1] https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql
>>>>>> [2] https://oracle-base.com/articles/12c/temporal-validity-12cr1
>>>>>> [3]
>>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>>>> 
>>>>>> - Jark Wu
>>>>>> 
>>>>>> 在 2017年7月28日,上午9:03,Julian Hyde <jh...@apache.org> 写道:
>>>>>> 
>>>>>> Jark,
>>>>>> 
>>>>>> One more thing.
>>>>>> 
>>>>>> I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam,
>>>>>> Calcite, & Flink”[1] now. It will be a couple of days before I have concrete
>>>>>> feedback but I will say now that Tyler has been giving temporal joins a lot
>>>>>> of thought, seems to have incorporated my thinking, and generally does
>>>>>> excellent work.
>>>>>> 
>>>>>> So, definitely give his document serious consideration, as I am.
>>>>>> 
>>>>>> When we all agree that we have the concepts right, I think it likely that we
>>>>>> can embrace the syntax and semantics of temporal support that were
>>>>>> introduced in SQL:2011. It’s important that we stay within the SQL standard
>>>>>> for areas that it already covers. And by the way, I added PERIOD support to
>>>>>> Calcite a while ago because that looked useful and un-contraversial.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> [1] http://s.apache.org/streaming-sql-spec
>>>>>> 
>>>>>> [2] https://issues.apache.org/jira/browse/CALCITE-715
>>>>>> 
>>>>>> 
>>>>>> On Jul 27, 2017, at 10:17 AM, Julian Hyde <jh...@apache.org> wrote:
>>>>>> 
>>>>>> This looks very interesting.
>>>>>> 
>>>>>> Are you familiar with the proposal I created in September, "Streams,
>>>>>> joins and temporal tables"[1]? I started thinking about
>>>>>> stream-to-table joins, where the tables where time-varying, and ended
>>>>>> up with temporal database semantics.
>>>>>> 
>>>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>>>> constant, whereas we require the timestamp to be automatically
>>>>>> varying.
>>>>>> 
>>>>>> I think you have come to similar conclusions. I would like to hear how
>>>>>> your proposal fits with mine.
>>>>>> 
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> [1]
>>>>>> https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E
>>>>>> 
>>>>>> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> We are looking for the temporal support in Calcite, or so-called
>>>>>> system-versioned temporal table. The temporal database feature was
>>>>>> introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2]. A
>>>>>> system-versioned temporal table is designed to keep a full history of data
>>>>>> changes and allow easy point in time analysis. It is very useful for
>>>>>> scenarios that require tracking history of data changes.
>>>>>> 
>>>>>> Because system-versioned tables are intended primarily for tracking
>>>>>> historical data changes, queries on system-versioned tables often tend to be
>>>>>> concerned with retrieving the table content as of a given point in time or
>>>>>> between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS
>>>>>> OF syntactic extension for this specific purpose. For example, the following
>>>>>> query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>>>>>> SELECT ENo,EName,Sys_Start,Sys_End
>>>>>> FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>>>>>> 
>>>>>> In addition, we need the time expression can be a relational expression
>>>>>> whose value is from another table. For example, the following query joins
>>>>>> the Orders to the Prices as the price was at the order time:
>>>>>> SELECT STREAM *
>>>>>> FROM Orders AS o
>>>>>> JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>>>> ON o.productId = p.productId
>>>>>> 
>>>>>> So I would like to introduce the syntactic extension in Calcite. What do you
>>>>>> think about this? Any comments or suggestions are welcome!
>>>>>> 
>>>>>> [1]
>>>>>> https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
>>>>>> [2]
>>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
>>>>>> [3]
>>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>>>> [4] http://sqlhints.com/tag/for-system_time-as-of/
>>>>>> 
>>>>>> Bests,
>>>>>> Jark Wu
>>>>>> 
>>>>>> 
>>>> 
>> 
> 


Re: [DISCUSS] Temporal Feature in Calcite

Posted by Julian Hyde <jh...@apache.org>.
I do think that the best way to write the query is

SELECT  *
   FROM Orders AS o
   JOIN ProductPrices FOR SYSTEM_TIME AS OF o.OrderType AS p
     ON o.productId = p.productId

But do we agree that this goes beyond the standard? It breaches "c) QSTPS shall not contain a <column reference> or an <SQL parameter reference>.” because “o.OrderType” is a column reference.

Julian

> On Aug 1, 2017, at 1:33 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
> 
> Hi Julian,
> 
> We are interested in both and I have logged CALCITE-1912 for supporting “FOR SYSTEM_TIME AS OF” in regular queries.
> 
> Regarding to (b), you mean it can be covered by standard. I agree, the query I posted in the previous mail can be rewrote as a subquery:
> 
> SELECT  *
>    FROM Orders AS o
>    JOIN LATERAL (SELECT * FROM ProductPrices WHERE sysStart <= O.orderTime AND sysEnd > O.orderTime) AS P
>      ON o.productId = p.productId
> 
> But subquery is too complex for users, the period condition is trivial for users. Using the “FOR SYSTEM_TIME AS OF”  is much simpler. Yes, you are right that the standard says it can’t contain a column reference. We initialize this discuss as we would like to "extend" the standard to simplify such query. What do you think?
> 
> - Jark Wu 
> 
>> 在 2017年8月1日,上午2:58,Julian Hyde <jh...@apache.org> 写道:
>> 
>> Can you clarify whether you are interested in (a) streaming queries, (b) temporal non-streaming queries, or both? If (b), it is covered by the standard, and we should follow the standard, period. If (a), syntax and semantics are still under discussion. 
>> 
>> If you want both, could we perhaps separate the streaming and non-streaming cases? Could you log a jira case for supporting "FOR SYSTEM_TIME AS OF” in regular (non-streaming) queries?
>> 
>>> The SQL standard doesn’t explicitly say the <point in time> must be a constant, just all the databases only support constant.
>> 
>> I disagree. The standard says that it cannot contain a column reference:
>> 
>>> If <query system time period specification> QSTPS is specified, then:
>>> a) The table identified by the <table or query name> shall be a system-versioned table.
>>> b) If BETWEEN is speci ed and neither SYMMETRIC nor ASYMMETRIC is specified, then ASYMMETRIC is implicit.
>>> c) QSTPS shall not contain a <column reference> or an <SQL parameter reference>.
>> 
>> We can go beyond the standard, but let’s be clear what it says.
>> 
>> Julian
>> 
>> 
>> 
>> 
>>> On Jul 31, 2017, at 5:15 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>>> 
>>> Hi Julian,
>>> 
>>> The syntax you posted looks very good to me. Besides of that, we hope to extend the <point in time> to support column reference(i.e. LATERAL + temporal  syntax). 
>>> 
>>> The ‘@ADayAgo’ bind variable is not a good example for our requirement. It is still a query constant, but what we want is a variable timestamp changing with every record. The example in “Streams, joins and temporal tables” [1] fits our requirement much more. It uses the table-value function to express and not introduce new syntax. But it  it is not a standard to manipulate temporal table. And it can only express querying the table content as of a specified point in time, but can not express a period time (can it?). That’s why we prefer the standard temporal syntax.
>>> 
>>> The SQL standard doesn’t explicitly say the <point in time> must be a constant, just all the databases only support constant. But I think, it is very useful to support column reference. For example: 
>>> 
>>> SELECT  *
>>>  FROM Orders AS o
>>>  JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>    ON o.productId = p.productId
>>> 
>>> This is actually what we want. 
>>> 
>>> What do you think?
>>> 
>>> [1] https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit# <https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#>
>>> - Jark Wu 
>>> 
>>>> 在 2017年7月29日,上午2:16,Julian Hyde <jh...@apache.org> 写道:
>>>> 
>>>> '@' is specific to SQL Server, and a form of bind variable, so I don't
>>>> think we should go with that approach.
>>>> 
>>>> The standard explicitly says "QSTPS shall not contain a <column
>>>> reference> or an <SQL parameter reference>." We may choose to ignore
>>>> that, but I think we can do what you want without bind variables:
>>>> 
>>>> ... FROM dept FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP - INTERVAL '1' DAY
>>>> 
>>>> will suffice. (Recall that CURRENT_TIMESTAMP is a function, but its
>>>> value is constant throughout the execution of the query. So it behaves
>>>> somewhat like a bind variable.)
>>>> 
>>>> Can we go with the syntax in the standard? (The Microsoft syntax
>>>> diagram has a bug in it - the '| ...' line should be '[ ... ]', I
>>>> think - and Oracle's syntax throws in TIMESTAMP as a keyword, which
>>>> would create "AS OF TIMESTAMP TIMESTAMP" if the expression is a
>>>> timestamp literal.)
>>>> 
>>>> The standard has this:
>>>> 
>>>> <table primary> ::=
>>>>   <table or query name>
>>>>   [ <query system time period specification> ]
>>>>   [ <correlation or recognition> ]
>>>> | ...
>>>> 
>>>> <correlation or recognition> ::=
>>>>   [ AS ] <correlation name>
>>>>   [ <parenthesized derived column list> ]
>>>> | ...
>>>> 
>>>> <query system time period specification> ::=
>>>>   FOR SYSTEM_TIME AS OF <point in time 1>
>>>> | FOR SYSTEM_TIME BETWEEN [ ASYMMETRIC | SYMMETRIC ]
>>>>   <point in time 1> AND <point in time 2>
>>>> | FOR SYSTEM_TIME FROM <point in time 1> TO <point in time 2>
>>>> 
>>>> <point in time 1> ::=
>>>> <point in time>
>>>> 
>>>> <point in time 2> ::=
>>>> <point in time>
>>>> 
>>>> <point in time> ::=
>>>> <datetime value expression>
>>>> 
>>>> Julian
>>>> 
>>>> On Fri, Jul 28, 2017 at 2:49 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>>>>> Hi Julian,
>>>>> 
>>>>> Thanks for the valuable input. We have read  “Streams, joins and temporal
>>>>> tables” you proposed before I started the discussion. The “temporal table
>>>>> view” proposed in the doc is very interesting and well-defined. But we hope
>>>>> to support this feature based on the standard SQL semantics.
>>>>> 
>>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>>> constant, whereas we require the timestamp to be automatically
>>>>> varying.
>>>>> 
>>>>> 
>>>>> Regarding to “WHOLE QUERY” or “Table”, I didn't find any declaration in
>>>>> SQL:2011, but I think it effects the Table not the query. In SQL Server’s
>>>>> “FROM” syntax[1], it is a subset of table syntax. In Oracle’s example[2], it
>>>>> follows after table not the whole query. And I also tried in SQL Server, the
>>>>> following SQL works well. The TemporalRates and TemporalRates2 are both
>>>>> temporal table with the same schema. The query chooses different timestamp
>>>>> of table TemporalRates and TemporalRates2:
>>>>> 
>>>>> SELECT * FROM TemporalRates2 FOR SYSTEM_TIME AS OF '2017-07-28
>>>>> 07:49:07.5649523' AS T2
>>>>> JOIN TemporalRates FOR SYSTEM_TIME AS OF '2017-07-28 04:26:19.5126849' AS
>>>>> T1
>>>>> ON T1.id = T2.id;
>>>>> 
>>>>> So I think the only controversial point is it only allows constant
>>>>> timestamp. Then I find the following example from SQL Server doc [3] :
>>>>> 
>>>>> DECLARE @ADayAgo datetime2
>>>>> SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())
>>>>> /*Comparison between two points in time for subset of rows*/
>>>>> SELECT D_1_Ago.[DeptID], D.[DeptID],
>>>>> D_1_Ago.[DeptName], D.[DeptName],
>>>>> D_1_Ago.[SysStartTime], D.[SysStartTime],
>>>>> D_1_Ago.[SysEndTime], D.[SysEndTime]
>>>>> FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago
>>>>> JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]
>>>>> AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;
>>>>> 
>>>>> The time @ADayAgo is a variable not a constant here, so I’m wondering
>>>>> whether it is acceptable to support LATERAL temporal to make the timestamp
>>>>> automatically varying. Which is very useful in many scenarios.
>>>>> 
>>>>> [1] https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql
>>>>> [2] https://oracle-base.com/articles/12c/temporal-validity-12cr1
>>>>> [3]
>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>>> 
>>>>> - Jark Wu
>>>>> 
>>>>> 在 2017年7月28日,上午9:03,Julian Hyde <jh...@apache.org> 写道:
>>>>> 
>>>>> Jark,
>>>>> 
>>>>> One more thing.
>>>>> 
>>>>> I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam,
>>>>> Calcite, & Flink”[1] now. It will be a couple of days before I have concrete
>>>>> feedback but I will say now that Tyler has been giving temporal joins a lot
>>>>> of thought, seems to have incorporated my thinking, and generally does
>>>>> excellent work.
>>>>> 
>>>>> So, definitely give his document serious consideration, as I am.
>>>>> 
>>>>> When we all agree that we have the concepts right, I think it likely that we
>>>>> can embrace the syntax and semantics of temporal support that were
>>>>> introduced in SQL:2011. It’s important that we stay within the SQL standard
>>>>> for areas that it already covers. And by the way, I added PERIOD support to
>>>>> Calcite a while ago because that looked useful and un-contraversial.
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1] http://s.apache.org/streaming-sql-spec
>>>>> 
>>>>> [2] https://issues.apache.org/jira/browse/CALCITE-715
>>>>> 
>>>>> 
>>>>> On Jul 27, 2017, at 10:17 AM, Julian Hyde <jh...@apache.org> wrote:
>>>>> 
>>>>> This looks very interesting.
>>>>> 
>>>>> Are you familiar with the proposal I created in September, "Streams,
>>>>> joins and temporal tables"[1]? I started thinking about
>>>>> stream-to-table joins, where the tables where time-varying, and ended
>>>>> up with temporal database semantics.
>>>>> 
>>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>>> constant, whereas we require the timestamp to be automatically
>>>>> varying.
>>>>> 
>>>>> I think you have come to similar conclusions. I would like to hear how
>>>>> your proposal fits with mine.
>>>>> 
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1]
>>>>> https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E
>>>>> 
>>>>> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> We are looking for the temporal support in Calcite, or so-called
>>>>> system-versioned temporal table. The temporal database feature was
>>>>> introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2]. A
>>>>> system-versioned temporal table is designed to keep a full history of data
>>>>> changes and allow easy point in time analysis. It is very useful for
>>>>> scenarios that require tracking history of data changes.
>>>>> 
>>>>> Because system-versioned tables are intended primarily for tracking
>>>>> historical data changes, queries on system-versioned tables often tend to be
>>>>> concerned with retrieving the table content as of a given point in time or
>>>>> between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS
>>>>> OF syntactic extension for this specific purpose. For example, the following
>>>>> query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>>>>> SELECT ENo,EName,Sys_Start,Sys_End
>>>>> FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>>>>> 
>>>>> In addition, we need the time expression can be a relational expression
>>>>> whose value is from another table. For example, the following query joins
>>>>> the Orders to the Prices as the price was at the order time:
>>>>> SELECT STREAM *
>>>>> FROM Orders AS o
>>>>> JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>>>  ON o.productId = p.productId
>>>>> 
>>>>> So I would like to introduce the syntactic extension in Calcite. What do you
>>>>> think about this? Any comments or suggestions are welcome!
>>>>> 
>>>>> [1]
>>>>> https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
>>>>> [2]
>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
>>>>> [3]
>>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>>> [4] http://sqlhints.com/tag/for-system_time-as-of/
>>>>> 
>>>>> Bests,
>>>>> Jark Wu
>>>>> 
>>>>> 
>>> 
> 


Re: [DISCUSS] Temporal Feature in Calcite

Posted by "伍翀(云邪)" <wu...@alibaba-inc.com>.
Hi Julian,

We are interested in both and I have logged CALCITE-1912 for supporting “FOR SYSTEM_TIME AS OF” in regular queries.

Regarding to (b), you mean it can be covered by standard. I agree, the query I posted in the previous mail can be rewrote as a subquery:

 SELECT  *
    FROM Orders AS o
    JOIN LATERAL (SELECT * FROM ProductPrices WHERE sysStart <= O.orderTime AND sysEnd > O.orderTime) AS P
      ON o.productId = p.productId

But subquery is too complex for users, the period condition is trivial for users. Using the “FOR SYSTEM_TIME AS OF”  is much simpler. Yes, you are right that the standard says it can’t contain a column reference. We initialize this discuss as we would like to "extend" the standard to simplify such query. What do you think?

- Jark Wu 

> 在 2017年8月1日,上午2:58,Julian Hyde <jh...@apache.org> 写道:
> 
> Can you clarify whether you are interested in (a) streaming queries, (b) temporal non-streaming queries, or both? If (b), it is covered by the standard, and we should follow the standard, period. If (a), syntax and semantics are still under discussion. 
> 
> If you want both, could we perhaps separate the streaming and non-streaming cases? Could you log a jira case for supporting "FOR SYSTEM_TIME AS OF” in regular (non-streaming) queries?
> 
>> The SQL standard doesn’t explicitly say the <point in time> must be a constant, just all the databases only support constant.
> 
> I disagree. The standard says that it cannot contain a column reference:
> 
>> If <query system time period specification> QSTPS is specified, then:
>> a) The table identified by the <table or query name> shall be a system-versioned table.
>> b) If BETWEEN is speci ed and neither SYMMETRIC nor ASYMMETRIC is specified, then ASYMMETRIC is implicit.
>> c) QSTPS shall not contain a <column reference> or an <SQL parameter reference>.
> 
> We can go beyond the standard, but let’s be clear what it says.
> 
> Julian
> 
> 
> 
> 
>> On Jul 31, 2017, at 5:15 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>> 
>> Hi Julian,
>> 
>> The syntax you posted looks very good to me. Besides of that, we hope to extend the <point in time> to support column reference(i.e. LATERAL + temporal  syntax). 
>> 
>> The ‘@ADayAgo’ bind variable is not a good example for our requirement. It is still a query constant, but what we want is a variable timestamp changing with every record. The example in “Streams, joins and temporal tables” [1] fits our requirement much more. It uses the table-value function to express and not introduce new syntax. But it  it is not a standard to manipulate temporal table. And it can only express querying the table content as of a specified point in time, but can not express a period time (can it?). That’s why we prefer the standard temporal syntax.
>> 
>> The SQL standard doesn’t explicitly say the <point in time> must be a constant, just all the databases only support constant. But I think, it is very useful to support column reference. For example: 
>> 
>> SELECT  *
>>   FROM Orders AS o
>>   JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>     ON o.productId = p.productId
>> 
>> This is actually what we want. 
>> 
>> What do you think?
>> 
>> [1] https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit# <https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#>
>> - Jark Wu 
>> 
>>> 在 2017年7月29日,上午2:16,Julian Hyde <jh...@apache.org> 写道:
>>> 
>>> '@' is specific to SQL Server, and a form of bind variable, so I don't
>>> think we should go with that approach.
>>> 
>>> The standard explicitly says "QSTPS shall not contain a <column
>>> reference> or an <SQL parameter reference>." We may choose to ignore
>>> that, but I think we can do what you want without bind variables:
>>> 
>>> ... FROM dept FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP - INTERVAL '1' DAY
>>> 
>>> will suffice. (Recall that CURRENT_TIMESTAMP is a function, but its
>>> value is constant throughout the execution of the query. So it behaves
>>> somewhat like a bind variable.)
>>> 
>>> Can we go with the syntax in the standard? (The Microsoft syntax
>>> diagram has a bug in it - the '| ...' line should be '[ ... ]', I
>>> think - and Oracle's syntax throws in TIMESTAMP as a keyword, which
>>> would create "AS OF TIMESTAMP TIMESTAMP" if the expression is a
>>> timestamp literal.)
>>> 
>>> The standard has this:
>>> 
>>> <table primary> ::=
>>>    <table or query name>
>>>    [ <query system time period specification> ]
>>>    [ <correlation or recognition> ]
>>>  | ...
>>> 
>>> <correlation or recognition> ::=
>>>    [ AS ] <correlation name>
>>>    [ <parenthesized derived column list> ]
>>>  | ...
>>> 
>>> <query system time period specification> ::=
>>>    FOR SYSTEM_TIME AS OF <point in time 1>
>>>  | FOR SYSTEM_TIME BETWEEN [ ASYMMETRIC | SYMMETRIC ]
>>>    <point in time 1> AND <point in time 2>
>>>  | FOR SYSTEM_TIME FROM <point in time 1> TO <point in time 2>
>>> 
>>> <point in time 1> ::=
>>>  <point in time>
>>> 
>>> <point in time 2> ::=
>>>  <point in time>
>>> 
>>> <point in time> ::=
>>>  <datetime value expression>
>>> 
>>> Julian
>>> 
>>> On Fri, Jul 28, 2017 at 2:49 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>>>> Hi Julian,
>>>> 
>>>> Thanks for the valuable input. We have read  “Streams, joins and temporal
>>>> tables” you proposed before I started the discussion. The “temporal table
>>>> view” proposed in the doc is very interesting and well-defined. But we hope
>>>> to support this feature based on the standard SQL semantics.
>>>> 
>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>> constant, whereas we require the timestamp to be automatically
>>>> varying.
>>>> 
>>>> 
>>>> Regarding to “WHOLE QUERY” or “Table”, I didn't find any declaration in
>>>> SQL:2011, but I think it effects the Table not the query. In SQL Server’s
>>>> “FROM” syntax[1], it is a subset of table syntax. In Oracle’s example[2], it
>>>> follows after table not the whole query. And I also tried in SQL Server, the
>>>> following SQL works well. The TemporalRates and TemporalRates2 are both
>>>> temporal table with the same schema. The query chooses different timestamp
>>>> of table TemporalRates and TemporalRates2:
>>>> 
>>>> SELECT * FROM TemporalRates2 FOR SYSTEM_TIME AS OF '2017-07-28
>>>> 07:49:07.5649523' AS T2
>>>> JOIN TemporalRates FOR SYSTEM_TIME AS OF '2017-07-28 04:26:19.5126849' AS
>>>> T1
>>>> ON T1.id = T2.id;
>>>> 
>>>> So I think the only controversial point is it only allows constant
>>>> timestamp. Then I find the following example from SQL Server doc [3] :
>>>> 
>>>> DECLARE @ADayAgo datetime2
>>>> SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())
>>>> /*Comparison between two points in time for subset of rows*/
>>>> SELECT D_1_Ago.[DeptID], D.[DeptID],
>>>> D_1_Ago.[DeptName], D.[DeptName],
>>>> D_1_Ago.[SysStartTime], D.[SysStartTime],
>>>> D_1_Ago.[SysEndTime], D.[SysEndTime]
>>>> FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago
>>>> JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]
>>>> AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;
>>>> 
>>>> The time @ADayAgo is a variable not a constant here, so I’m wondering
>>>> whether it is acceptable to support LATERAL temporal to make the timestamp
>>>> automatically varying. Which is very useful in many scenarios.
>>>> 
>>>> [1] https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql
>>>> [2] https://oracle-base.com/articles/12c/temporal-validity-12cr1
>>>> [3]
>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>> 
>>>> - Jark Wu
>>>> 
>>>> 在 2017年7月28日,上午9:03,Julian Hyde <jh...@apache.org> 写道:
>>>> 
>>>> Jark,
>>>> 
>>>> One more thing.
>>>> 
>>>> I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam,
>>>> Calcite, & Flink”[1] now. It will be a couple of days before I have concrete
>>>> feedback but I will say now that Tyler has been giving temporal joins a lot
>>>> of thought, seems to have incorporated my thinking, and generally does
>>>> excellent work.
>>>> 
>>>> So, definitely give his document serious consideration, as I am.
>>>> 
>>>> When we all agree that we have the concepts right, I think it likely that we
>>>> can embrace the syntax and semantics of temporal support that were
>>>> introduced in SQL:2011. It’s important that we stay within the SQL standard
>>>> for areas that it already covers. And by the way, I added PERIOD support to
>>>> Calcite a while ago because that looked useful and un-contraversial.
>>>> 
>>>> Julian
>>>> 
>>>> [1] http://s.apache.org/streaming-sql-spec
>>>> 
>>>> [2] https://issues.apache.org/jira/browse/CALCITE-715
>>>> 
>>>> 
>>>> On Jul 27, 2017, at 10:17 AM, Julian Hyde <jh...@apache.org> wrote:
>>>> 
>>>> This looks very interesting.
>>>> 
>>>> Are you familiar with the proposal I created in September, "Streams,
>>>> joins and temporal tables"[1]? I started thinking about
>>>> stream-to-table joins, where the tables where time-varying, and ended
>>>> up with temporal database semantics.
>>>> 
>>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>>> you couldn't choose for table A to be at timestamp X and table B to be
>>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>>> constant, whereas we require the timestamp to be automatically
>>>> varying.
>>>> 
>>>> I think you have come to similar conclusions. I would like to hear how
>>>> your proposal fits with mine.
>>>> 
>>>> 
>>>> Julian
>>>> 
>>>> [1]
>>>> https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E
>>>> 
>>>> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> We are looking for the temporal support in Calcite, or so-called
>>>> system-versioned temporal table. The temporal database feature was
>>>> introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2]. A
>>>> system-versioned temporal table is designed to keep a full history of data
>>>> changes and allow easy point in time analysis. It is very useful for
>>>> scenarios that require tracking history of data changes.
>>>> 
>>>> Because system-versioned tables are intended primarily for tracking
>>>> historical data changes, queries on system-versioned tables often tend to be
>>>> concerned with retrieving the table content as of a given point in time or
>>>> between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS
>>>> OF syntactic extension for this specific purpose. For example, the following
>>>> query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>>>> SELECT ENo,EName,Sys_Start,Sys_End
>>>> FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>>>> 
>>>> In addition, we need the time expression can be a relational expression
>>>> whose value is from another table. For example, the following query joins
>>>> the Orders to the Prices as the price was at the order time:
>>>> SELECT STREAM *
>>>> FROM Orders AS o
>>>> JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>>   ON o.productId = p.productId
>>>> 
>>>> So I would like to introduce the syntactic extension in Calcite. What do you
>>>> think about this? Any comments or suggestions are welcome!
>>>> 
>>>> [1]
>>>> https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
>>>> [2]
>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
>>>> [3]
>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>>> [4] http://sqlhints.com/tag/for-system_time-as-of/
>>>> 
>>>> Bests,
>>>> Jark Wu
>>>> 
>>>> 
>> 


Re: [DISCUSS] Temporal Feature in Calcite

Posted by Julian Hyde <jh...@apache.org>.
Can you clarify whether you are interested in (a) streaming queries, (b) temporal non-streaming queries, or both? If (b), it is covered by the standard, and we should follow the standard, period. If (a), syntax and semantics are still under discussion. 

If you want both, could we perhaps separate the streaming and non-streaming cases? Could you log a jira case for supporting "FOR SYSTEM_TIME AS OF” in regular (non-streaming) queries?

> The SQL standard doesn’t explicitly say the <point in time> must be a constant, just all the databases only support constant.

I disagree. The standard says that it cannot contain a column reference:

> If <query system time period specification> QSTPS is specified, then:
> a) The table identified by the <table or query name> shall be a system-versioned table.
> b) If BETWEEN is speci ed and neither SYMMETRIC nor ASYMMETRIC is specified, then ASYMMETRIC is implicit.
> c) QSTPS shall not contain a <column reference> or an <SQL parameter reference>.

We can go beyond the standard, but let’s be clear what it says.

Julian




> On Jul 31, 2017, at 5:15 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
> 
> Hi Julian,
> 
> The syntax you posted looks very good to me. Besides of that, we hope to extend the <point in time> to support column reference(i.e. LATERAL + temporal  syntax). 
> 
> The ‘@ADayAgo’ bind variable is not a good example for our requirement. It is still a query constant, but what we want is a variable timestamp changing with every record. The example in “Streams, joins and temporal tables” [1] fits our requirement much more. It uses the table-value function to express and not introduce new syntax. But it  it is not a standard to manipulate temporal table. And it can only express querying the table content as of a specified point in time, but can not express a period time (can it?). That’s why we prefer the standard temporal syntax.
> 
> The SQL standard doesn’t explicitly say the <point in time> must be a constant, just all the databases only support constant. But I think, it is very useful to support column reference. For example: 
> 
>  SELECT  *
>    FROM Orders AS o
>    JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>      ON o.productId = p.productId
> 
> This is actually what we want. 
> 
> What do you think?
> 
> [1] https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit# <https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#>
> - Jark Wu 
> 
>> 在 2017年7月29日,上午2:16,Julian Hyde <jh...@apache.org> 写道:
>> 
>> '@' is specific to SQL Server, and a form of bind variable, so I don't
>> think we should go with that approach.
>> 
>> The standard explicitly says "QSTPS shall not contain a <column
>> reference> or an <SQL parameter reference>." We may choose to ignore
>> that, but I think we can do what you want without bind variables:
>> 
>> ... FROM dept FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP - INTERVAL '1' DAY
>> 
>> will suffice. (Recall that CURRENT_TIMESTAMP is a function, but its
>> value is constant throughout the execution of the query. So it behaves
>> somewhat like a bind variable.)
>> 
>> Can we go with the syntax in the standard? (The Microsoft syntax
>> diagram has a bug in it - the '| ...' line should be '[ ... ]', I
>> think - and Oracle's syntax throws in TIMESTAMP as a keyword, which
>> would create "AS OF TIMESTAMP TIMESTAMP" if the expression is a
>> timestamp literal.)
>> 
>> The standard has this:
>> 
>> <table primary> ::=
>>     <table or query name>
>>     [ <query system time period specification> ]
>>     [ <correlation or recognition> ]
>>   | ...
>> 
>> <correlation or recognition> ::=
>>     [ AS ] <correlation name>
>>     [ <parenthesized derived column list> ]
>>   | ...
>> 
>> <query system time period specification> ::=
>>     FOR SYSTEM_TIME AS OF <point in time 1>
>>   | FOR SYSTEM_TIME BETWEEN [ ASYMMETRIC | SYMMETRIC ]
>>     <point in time 1> AND <point in time 2>
>>   | FOR SYSTEM_TIME FROM <point in time 1> TO <point in time 2>
>> 
>> <point in time 1> ::=
>>   <point in time>
>> 
>> <point in time 2> ::=
>>   <point in time>
>> 
>> <point in time> ::=
>>   <datetime value expression>
>> 
>> Julian
>> 
>> On Fri, Jul 28, 2017 at 2:49 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>>> Hi Julian,
>>> 
>>> Thanks for the valuable input. We have read  “Streams, joins and temporal
>>> tables” you proposed before I started the discussion. The “temporal table
>>> view” proposed in the doc is very interesting and well-defined. But we hope
>>> to support this feature based on the standard SQL semantics.
>>> 
>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>> you couldn't choose for table A to be at timestamp X and table B to be
>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>> constant, whereas we require the timestamp to be automatically
>>> varying.
>>> 
>>> 
>>> Regarding to “WHOLE QUERY” or “Table”, I didn't find any declaration in
>>> SQL:2011, but I think it effects the Table not the query. In SQL Server’s
>>> “FROM” syntax[1], it is a subset of table syntax. In Oracle’s example[2], it
>>> follows after table not the whole query. And I also tried in SQL Server, the
>>> following SQL works well. The TemporalRates and TemporalRates2 are both
>>> temporal table with the same schema. The query chooses different timestamp
>>> of table TemporalRates and TemporalRates2:
>>> 
>>> SELECT * FROM TemporalRates2 FOR SYSTEM_TIME AS OF '2017-07-28
>>> 07:49:07.5649523' AS T2
>>> JOIN TemporalRates FOR SYSTEM_TIME AS OF '2017-07-28 04:26:19.5126849' AS
>>> T1
>>> ON T1.id = T2.id;
>>> 
>>> So I think the only controversial point is it only allows constant
>>> timestamp. Then I find the following example from SQL Server doc [3] :
>>> 
>>> DECLARE @ADayAgo datetime2
>>> SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())
>>> /*Comparison between two points in time for subset of rows*/
>>> SELECT D_1_Ago.[DeptID], D.[DeptID],
>>> D_1_Ago.[DeptName], D.[DeptName],
>>> D_1_Ago.[SysStartTime], D.[SysStartTime],
>>> D_1_Ago.[SysEndTime], D.[SysEndTime]
>>> FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago
>>> JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]
>>> AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;
>>> 
>>> The time @ADayAgo is a variable not a constant here, so I’m wondering
>>> whether it is acceptable to support LATERAL temporal to make the timestamp
>>> automatically varying. Which is very useful in many scenarios.
>>> 
>>> [1] https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql
>>> [2] https://oracle-base.com/articles/12c/temporal-validity-12cr1
>>> [3]
>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>> 
>>> - Jark Wu
>>> 
>>> 在 2017年7月28日,上午9:03,Julian Hyde <jh...@apache.org> 写道:
>>> 
>>> Jark,
>>> 
>>> One more thing.
>>> 
>>> I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam,
>>> Calcite, & Flink”[1] now. It will be a couple of days before I have concrete
>>> feedback but I will say now that Tyler has been giving temporal joins a lot
>>> of thought, seems to have incorporated my thinking, and generally does
>>> excellent work.
>>> 
>>> So, definitely give his document serious consideration, as I am.
>>> 
>>> When we all agree that we have the concepts right, I think it likely that we
>>> can embrace the syntax and semantics of temporal support that were
>>> introduced in SQL:2011. It’s important that we stay within the SQL standard
>>> for areas that it already covers. And by the way, I added PERIOD support to
>>> Calcite a while ago because that looked useful and un-contraversial.
>>> 
>>> Julian
>>> 
>>> [1] http://s.apache.org/streaming-sql-spec
>>> 
>>> [2] https://issues.apache.org/jira/browse/CALCITE-715
>>> 
>>> 
>>> On Jul 27, 2017, at 10:17 AM, Julian Hyde <jh...@apache.org> wrote:
>>> 
>>> This looks very interesting.
>>> 
>>> Are you familiar with the proposal I created in September, "Streams,
>>> joins and temporal tables"[1]? I started thinking about
>>> stream-to-table joins, where the tables where time-varying, and ended
>>> up with temporal database semantics.
>>> 
>>> But my impression of SQL:2011 (based on what is in Oracle) was that
>>> you could execute the WHOLE QUERY as of a particular timestamp, but
>>> you couldn't choose for table A to be at timestamp X and table B to be
>>> at timestamp Y. Furthermore, it only allows the timestamp to be
>>> constant, whereas we require the timestamp to be automatically
>>> varying.
>>> 
>>> I think you have come to similar conclusions. I would like to hear how
>>> your proposal fits with mine.
>>> 
>>> 
>>> Julian
>>> 
>>> [1]
>>> https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E
>>> 
>>> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> We are looking for the temporal support in Calcite, or so-called
>>> system-versioned temporal table. The temporal database feature was
>>> introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2]. A
>>> system-versioned temporal table is designed to keep a full history of data
>>> changes and allow easy point in time analysis. It is very useful for
>>> scenarios that require tracking history of data changes.
>>> 
>>> Because system-versioned tables are intended primarily for tracking
>>> historical data changes, queries on system-versioned tables often tend to be
>>> concerned with retrieving the table content as of a given point in time or
>>> between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS
>>> OF syntactic extension for this specific purpose. For example, the following
>>> query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>>>  SELECT ENo,EName,Sys_Start,Sys_End
>>>  FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>>> 
>>> In addition, we need the time expression can be a relational expression
>>> whose value is from another table. For example, the following query joins
>>> the Orders to the Prices as the price was at the order time:
>>>  SELECT STREAM *
>>>  FROM Orders AS o
>>>  JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>    ON o.productId = p.productId
>>> 
>>> So I would like to introduce the syntactic extension in Calcite. What do you
>>> think about this? Any comments or suggestions are welcome!
>>> 
>>> [1]
>>> https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
>>> [2]
>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
>>> [3]
>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>>> [4] http://sqlhints.com/tag/for-system_time-as-of/
>>> 
>>> Bests,
>>> Jark Wu
>>> 
>>> 
> 


Re: [DISCUSS] Temporal Feature in Calcite

Posted by "伍翀(云邪)" <wu...@alibaba-inc.com>.
Hi Julian,

The syntax you posted looks very good to me. Besides of that, we hope to extend the <point in time> to support column reference(i.e. LATERAL + temporal  syntax). 

The ‘@ADayAgo’ bind variable is not a good example for our requirement. It is still a query constant, but what we want is a variable timestamp changing with every record. The example in “Streams, joins and temporal tables” [1] fits our requirement much more. It uses the table-value function to express and not introduce new syntax. But it  it is not a standard to manipulate temporal table. And it can only express querying the table content as of a specified point in time, but can not express a period time (can it?). That’s why we prefer the standard temporal syntax.

The SQL standard doesn’t explicitly say the <point in time> must be a constant, just all the databases only support constant. But I think, it is very useful to support column reference. For example: 

  SELECT  *
    FROM Orders AS o
    JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
      ON o.productId = p.productId

This is actually what we want. 

What do you think?

[1] https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit# <https://docs.google.com/document/d/1RvnLEEQK92axdAaZ9XIU5szpkbGqFMBtzYiIY4dHe0Q/edit#>
- Jark Wu 

> 在 2017年7月29日,上午2:16,Julian Hyde <jh...@apache.org> 写道:
> 
> '@' is specific to SQL Server, and a form of bind variable, so I don't
> think we should go with that approach.
> 
> The standard explicitly says "QSTPS shall not contain a <column
> reference> or an <SQL parameter reference>." We may choose to ignore
> that, but I think we can do what you want without bind variables:
> 
> ... FROM dept FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP - INTERVAL '1' DAY
> 
> will suffice. (Recall that CURRENT_TIMESTAMP is a function, but its
> value is constant throughout the execution of the query. So it behaves
> somewhat like a bind variable.)
> 
> Can we go with the syntax in the standard? (The Microsoft syntax
> diagram has a bug in it - the '| ...' line should be '[ ... ]', I
> think - and Oracle's syntax throws in TIMESTAMP as a keyword, which
> would create "AS OF TIMESTAMP TIMESTAMP" if the expression is a
> timestamp literal.)
> 
> The standard has this:
> 
>  <table primary> ::=
>      <table or query name>
>      [ <query system time period specification> ]
>      [ <correlation or recognition> ]
>    | ...
> 
>  <correlation or recognition> ::=
>      [ AS ] <correlation name>
>      [ <parenthesized derived column list> ]
>    | ...
> 
>  <query system time period specification> ::=
>      FOR SYSTEM_TIME AS OF <point in time 1>
>    | FOR SYSTEM_TIME BETWEEN [ ASYMMETRIC | SYMMETRIC ]
>      <point in time 1> AND <point in time 2>
>    | FOR SYSTEM_TIME FROM <point in time 1> TO <point in time 2>
> 
>  <point in time 1> ::=
>    <point in time>
> 
>  <point in time 2> ::=
>    <point in time>
> 
>  <point in time> ::=
>    <datetime value expression>
> 
> Julian
> 
> On Fri, Jul 28, 2017 at 2:49 AM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>> Hi Julian,
>> 
>> Thanks for the valuable input. We have read  “Streams, joins and temporal
>> tables” you proposed before I started the discussion. The “temporal table
>> view” proposed in the doc is very interesting and well-defined. But we hope
>> to support this feature based on the standard SQL semantics.
>> 
>> But my impression of SQL:2011 (based on what is in Oracle) was that
>> you could execute the WHOLE QUERY as of a particular timestamp, but
>> you couldn't choose for table A to be at timestamp X and table B to be
>> at timestamp Y. Furthermore, it only allows the timestamp to be
>> constant, whereas we require the timestamp to be automatically
>> varying.
>> 
>> 
>> Regarding to “WHOLE QUERY” or “Table”, I didn't find any declaration in
>> SQL:2011, but I think it effects the Table not the query. In SQL Server’s
>> “FROM” syntax[1], it is a subset of table syntax. In Oracle’s example[2], it
>> follows after table not the whole query. And I also tried in SQL Server, the
>> following SQL works well. The TemporalRates and TemporalRates2 are both
>> temporal table with the same schema. The query chooses different timestamp
>> of table TemporalRates and TemporalRates2:
>> 
>> SELECT * FROM TemporalRates2 FOR SYSTEM_TIME AS OF '2017-07-28
>> 07:49:07.5649523' AS T2
>>  JOIN TemporalRates FOR SYSTEM_TIME AS OF '2017-07-28 04:26:19.5126849' AS
>> T1
>>  ON T1.id = T2.id;
>> 
>> So I think the only controversial point is it only allows constant
>> timestamp. Then I find the following example from SQL Server doc [3] :
>> 
>> DECLARE @ADayAgo datetime2
>> SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())
>> /*Comparison between two points in time for subset of rows*/
>> SELECT D_1_Ago.[DeptID], D.[DeptID],
>> D_1_Ago.[DeptName], D.[DeptName],
>> D_1_Ago.[SysStartTime], D.[SysStartTime],
>> D_1_Ago.[SysEndTime], D.[SysEndTime]
>> FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago
>> JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]
>> AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;
>> 
>> The time @ADayAgo is a variable not a constant here, so I’m wondering
>> whether it is acceptable to support LATERAL temporal to make the timestamp
>> automatically varying. Which is very useful in many scenarios.
>> 
>> [1] https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql
>> [2] https://oracle-base.com/articles/12c/temporal-validity-12cr1
>> [3]
>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>> 
>> - Jark Wu
>> 
>> 在 2017年7月28日,上午9:03,Julian Hyde <jh...@apache.org> 写道:
>> 
>> Jark,
>> 
>> One more thing.
>> 
>> I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam,
>> Calcite, & Flink”[1] now. It will be a couple of days before I have concrete
>> feedback but I will say now that Tyler has been giving temporal joins a lot
>> of thought, seems to have incorporated my thinking, and generally does
>> excellent work.
>> 
>> So, definitely give his document serious consideration, as I am.
>> 
>> When we all agree that we have the concepts right, I think it likely that we
>> can embrace the syntax and semantics of temporal support that were
>> introduced in SQL:2011. It’s important that we stay within the SQL standard
>> for areas that it already covers. And by the way, I added PERIOD support to
>> Calcite a while ago because that looked useful and un-contraversial.
>> 
>> Julian
>> 
>> [1] http://s.apache.org/streaming-sql-spec
>> 
>> [2] https://issues.apache.org/jira/browse/CALCITE-715
>> 
>> 
>> On Jul 27, 2017, at 10:17 AM, Julian Hyde <jh...@apache.org> wrote:
>> 
>> This looks very interesting.
>> 
>> Are you familiar with the proposal I created in September, "Streams,
>> joins and temporal tables"[1]? I started thinking about
>> stream-to-table joins, where the tables where time-varying, and ended
>> up with temporal database semantics.
>> 
>> But my impression of SQL:2011 (based on what is in Oracle) was that
>> you could execute the WHOLE QUERY as of a particular timestamp, but
>> you couldn't choose for table A to be at timestamp X and table B to be
>> at timestamp Y. Furthermore, it only allows the timestamp to be
>> constant, whereas we require the timestamp to be automatically
>> varying.
>> 
>> I think you have come to similar conclusions. I would like to hear how
>> your proposal fits with mine.
>> 
>> 
>> Julian
>> 
>> [1]
>> https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E
>> 
>> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>> 
>> Hi all,
>> 
>> We are looking for the temporal support in Calcite, or so-called
>> system-versioned temporal table. The temporal database feature was
>> introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2]. A
>> system-versioned temporal table is designed to keep a full history of data
>> changes and allow easy point in time analysis. It is very useful for
>> scenarios that require tracking history of data changes.
>> 
>> Because system-versioned tables are intended primarily for tracking
>> historical data changes, queries on system-versioned tables often tend to be
>> concerned with retrieving the table content as of a given point in time or
>> between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS
>> OF syntactic extension for this specific purpose. For example, the following
>> query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>>   SELECT ENo,EName,Sys_Start,Sys_End
>>   FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>> 
>> In addition, we need the time expression can be a relational expression
>> whose value is from another table. For example, the following query joins
>> the Orders to the Prices as the price was at the order time:
>>   SELECT STREAM *
>>   FROM Orders AS o
>>   JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>     ON o.productId = p.productId
>> 
>> So I would like to introduce the syntactic extension in Calcite. What do you
>> think about this? Any comments or suggestions are welcome!
>> 
>> [1]
>> https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
>> [2]
>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
>> [3]
>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>> [4] http://sqlhints.com/tag/for-system_time-as-of/
>> 
>> Bests,
>> Jark Wu
>> 
>> 


Re: [DISCUSS] Temporal Feature in Calcite

Posted by "伍翀(云邪)" <wu...@alibaba-inc.com>.
Hi Julian,

Thanks for the valuable input. We have read  “Streams, joins and temporal tables” you proposed before I started the discussion. The “temporal table view” proposed in the doc is very interesting and well-defined. But we hope to support this feature based on the standard SQL semantics. 

>> But my impression of SQL:2011 (based on what is in Oracle) was that
>> you could execute the WHOLE QUERY as of a particular timestamp, but
>> you couldn't choose for table A to be at timestamp X and table B to be
>> at timestamp Y. Furthermore, it only allows the timestamp to be
>> constant, whereas we require the timestamp to be automatically
>> varying.


Regarding to “WHOLE QUERY” or “Table”, I didn't find any declaration in SQL:2011, but I think it effects the Table not the query. In SQL Server’s “FROM” syntax[1], it is a subset of table syntax. In Oracle’s example[2], it follows after table not the whole query. And I also tried in SQL Server, the following SQL works well. The TemporalRates and TemporalRates2 are both temporal table with the same schema. The query chooses different timestamp of table TemporalRates and TemporalRates2:

SELECT * FROM TemporalRates2 FOR SYSTEM_TIME AS OF '2017-07-28 07:49:07.5649523' AS T2
  JOIN TemporalRates FOR SYSTEM_TIME AS OF '2017-07-28 04:26:19.5126849' AS T1
  ON T1.id = T2.id;

So I think the only controversial point is it only allows constant timestamp. Then I find the following example from SQL Server doc [3] :

DECLARE @ADayAgo datetime2   
SET @ADayAgo = DATEADD (day, -1, sysutcdatetime())   
/*Comparison between two points in time for subset of rows*/   
SELECT D_1_Ago.[DeptID], D.[DeptID],   
D_1_Ago.[DeptName], D.[DeptName],   
D_1_Ago.[SysStartTime], D.[SysStartTime],   
D_1_Ago.[SysEndTime], D.[SysEndTime]   
FROM [dbo].[Department] FOR SYSTEM_TIME AS OF @ADayAgo AS D_1_Ago   
JOIN [Department] AS D ON  D_1_Ago.[DeptID] = [D].[DeptID]    
AND D_1_Ago.[DeptID] BETWEEN 1 and 5 ;

The time @ADayAgo is a variable not a constant here, so I’m wondering whether it is acceptable to support LATERAL temporal to make the timestamp automatically varying. Which is very useful in many scenarios.

[1] https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql <https://docs.microsoft.com/en-us/sql/t-sql/queries/from-transact-sql>
[2] https://oracle-base.com/articles/12c/temporal-validity-12cr1 <https://oracle-base.com/articles/12c/temporal-validity-12cr1>
[3] https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table <https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table>

- Jark Wu 

> 在 2017年7月28日,上午9:03,Julian Hyde <jh...@apache.org> 写道:
> 
> Jark,
> 
> One more thing.
> 
> I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam, Calcite, & Flink”[1] now. It will be a couple of days before I have concrete feedback but I will say now that Tyler has been giving temporal joins a lot of thought, seems to have incorporated my thinking, and generally does excellent work.
> 
> So, definitely give his document serious consideration, as I am.
> 
> When we all agree that we have the concepts right, I think it likely that we can embrace the syntax and semantics of temporal support that were introduced in SQL:2011. It’s important that we stay within the SQL standard for areas that it already covers. And by the way, I added PERIOD support to Calcite a while ago because that looked useful and un-contraversial.
> 
> Julian
> 
> [1] http://s.apache.org/streaming-sql-spec <http://s.apache.org/streaming-sql-spec> 
> 
> [2] https://issues.apache.org/jira/browse/CALCITE-715 <https://issues.apache.org/jira/browse/CALCITE-715>
>  
> 
>> On Jul 27, 2017, at 10:17 AM, Julian Hyde <jhyde@apache.org <ma...@apache.org>> wrote:
>> 
>> This looks very interesting.
>> 
>> Are you familiar with the proposal I created in September, "Streams,
>> joins and temporal tables"[1]? I started thinking about
>> stream-to-table joins, where the tables where time-varying, and ended
>> up with temporal database semantics.
>> 
>> But my impression of SQL:2011 (based on what is in Oracle) was that
>> you could execute the WHOLE QUERY as of a particular timestamp, but
>> you couldn't choose for table A to be at timestamp X and table B to be
>> at timestamp Y. Furthermore, it only allows the timestamp to be
>> constant, whereas we require the timestamp to be automatically
>> varying.
>> 
>> I think you have come to similar conclusions. I would like to hear how
>> your proposal fits with mine.
>> 
>> 
>> Julian
>> 
>> [1] https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E <https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E>
>> 
>> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <wuchong.wc@alibaba-inc.com <ma...@alibaba-inc.com>> wrote:
>>> Hi all,
>>> 
>>> We are looking for the temporal support in Calcite, or so-called system-versioned temporal table. The temporal database feature was introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2]. A system-versioned temporal table is designed to keep a full history of data changes and allow easy point in time analysis. It is very useful for scenarios that require tracking history of data changes.
>>> 
>>> Because system-versioned tables are intended primarily for tracking historical data changes, queries on system-versioned tables often tend to be concerned with retrieving the table content as of a given point in time or between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS OF syntactic extension for this specific purpose. For example, the following query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>>>    SELECT ENo,EName,Sys_Start,Sys_End
>>>    FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>>> 
>>> In addition, we need the time expression can be a relational expression whose value is from another table. For example, the following query joins the Orders to the Prices as the price was at the order time:
>>>    SELECT STREAM *
>>>    FROM Orders AS o
>>>    JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>>      ON o.productId = p.productId
>>> 
>>> So I would like to introduce the syntactic extension in Calcite. What do you think about this? Any comments or suggestions are welcome!
>>> 
>>> [1] https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf <https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf>
>>> [2] https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables <https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables>
>>> [3] https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table <https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table>
>>> [4] http://sqlhints.com/tag/for-system_time-as-of/ <http://sqlhints.com/tag/for-system_time-as-of/>
>>> 
>>> Bests,
>>> Jark Wu
> 


Re: [DISCUSS] Temporal Feature in Calcite

Posted by Julian Hyde <jh...@apache.org>.
Jark,

One more thing.

I am reading Tyler Akidau’s "Robust Streaming SQL in Apache Apex, Beam, Calcite, & Flink”[1] now. It will be a couple of days before I have concrete feedback but I will say now that Tyler has been giving temporal joins a lot of thought, seems to have incorporated my thinking, and generally does excellent work.

So, definitely give his document serious consideration, as I am.

When we all agree that we have the concepts right, I think it likely that we can embrace the syntax and semantics of temporal support that were introduced in SQL:2011. It’s important that we stay within the SQL standard for areas that it already covers. And by the way, I added PERIOD support to Calcite a while ago because that looked useful and un-contraversial.

Julian

[1] http://s.apache.org/streaming-sql-spec <http://s.apache.org/streaming-sql-spec> 

[2] https://issues.apache.org/jira/browse/CALCITE-715 <https://issues.apache.org/jira/browse/CALCITE-715>
 

> On Jul 27, 2017, at 10:17 AM, Julian Hyde <jh...@apache.org> wrote:
> 
> This looks very interesting.
> 
> Are you familiar with the proposal I created in September, "Streams,
> joins and temporal tables"[1]? I started thinking about
> stream-to-table joins, where the tables where time-varying, and ended
> up with temporal database semantics.
> 
> But my impression of SQL:2011 (based on what is in Oracle) was that
> you could execute the WHOLE QUERY as of a particular timestamp, but
> you couldn't choose for table A to be at timestamp X and table B to be
> at timestamp Y. Furthermore, it only allows the timestamp to be
> constant, whereas we require the timestamp to be automatically
> varying.
> 
> I think you have come to similar conclusions. I would like to hear how
> your proposal fits with mine.
> 
> 
> Julian
> 
> [1] https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E
> 
> On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
>> Hi all,
>> 
>> We are looking for the temporal support in Calcite, or so-called system-versioned temporal table. The temporal database feature was introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2]. A system-versioned temporal table is designed to keep a full history of data changes and allow easy point in time analysis. It is very useful for scenarios that require tracking history of data changes.
>> 
>> Because system-versioned tables are intended primarily for tracking historical data changes, queries on system-versioned tables often tend to be concerned with retrieving the table content as of a given point in time or between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS OF syntactic extension for this specific purpose. For example, the following query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>>    SELECT ENo,EName,Sys_Start,Sys_End
>>    FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>> 
>> In addition, we need the time expression can be a relational expression whose value is from another table. For example, the following query joins the Orders to the Prices as the price was at the order time:
>>    SELECT STREAM *
>>    FROM Orders AS o
>>    JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>>      ON o.productId = p.productId
>> 
>> So I would like to introduce the syntactic extension in Calcite. What do you think about this? Any comments or suggestions are welcome!
>> 
>> [1] https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
>> [2] https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
>> [3] https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
>> [4] http://sqlhints.com/tag/for-system_time-as-of/
>> 
>> Bests,
>> Jark Wu


Re: [DISCUSS] Temporal Feature in Calcite

Posted by Julian Hyde <jh...@apache.org>.
This looks very interesting.

Are you familiar with the proposal I created in September, "Streams,
joins and temporal tables"[1]? I started thinking about
stream-to-table joins, where the tables where time-varying, and ended
up with temporal database semantics.

But my impression of SQL:2011 (based on what is in Oracle) was that
you could execute the WHOLE QUERY as of a particular timestamp, but
you couldn't choose for table A to be at timestamp X and table B to be
at timestamp Y. Furthermore, it only allows the timestamp to be
constant, whereas we require the timestamp to be automatically
varying.

I think you have come to similar conclusions. I would like to hear how
your proposal fits with mine.


Julian

[1] https://lists.apache.org/thread.html/4914256ad347e1d2a72506e2773e59590312d820f04cfcddaffaef1e@%3Cdev.calcite.apache.org%3E

On Wed, Jul 26, 2017 at 11:39 PM, 伍翀(云邪) <wu...@alibaba-inc.com> wrote:
> Hi all,
>
> We are looking for the temporal support in Calcite, or so-called system-versioned temporal table. The temporal database feature was introduced in ANSI SQL 2011 [1] and was supported in SQL Server 2017 [2]. A system-versioned temporal table is designed to keep a full history of data changes and allow easy point in time analysis. It is very useful for scenarios that require tracking history of data changes.
>
> Because system-versioned tables are intended primarily for tracking historical data changes, queries on system-versioned tables often tend to be concerned with retrieving the table content as of a given point in time or between any two given points in time. SQL:2011 provides FOR SYSTEM_TIME AS OF syntactic extension for this specific purpose. For example, the following query retrieves the rows of Emp that were current as of Jan. 2, 2011:
>     SELECT ENo,EName,Sys_Start,Sys_End
>     FROM Emp FOR SYSTEM_TIME AS OF TIMESTAMP '2011-01-02 00:00:00'
>
> In addition, we need the time expression can be a relational expression whose value is from another table. For example, the following query joins the Orders to the Prices as the price was at the order time:
>     SELECT STREAM *
>     FROM Orders AS o
>     JOIN LATERAL ProductPrices FOR SYSTEM_TIME AS OF o.orderTime AS p
>       ON o.productId = p.productId
>
> So I would like to introduce the syntactic extension in Calcite. What do you think about this? Any comments or suggestions are welcome!
>
> [1] https://sigmodrecord.org/publications/sigmodRecord/1209/pdfs/07.industry.kulkarni.pdf
> [2] https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables
> [3] https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table
> [4] http://sqlhints.com/tag/for-system_time-as-of/
>
> Bests,
> Jark Wu