You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Leonard Xu <xb...@gmail.com> on 2020/07/28 07:56:22 UTC

[DISCUSS] FLIP-132: Temporal Table DDL

Hello  everyone,

Currently, user can correlate temporal table by temporal table join to enrich their fact table, a temporal table can be a changing history table which tracks the changes(e.g. database changelog) or a changing dimensioned table which materializes the changes(e.g. database table). For changing dimensioned table, Flink uses DDL to define a temporal table and visits the temporal table data by looking up the external system’s table. For changing history table, Flink uses temporal table function to define a parameterized view of changing history table and then accesses the data of view, but Temporal Table Function can only be called via Table API or YAML which is pretty inconvenient for FLINK SQL users. If we can support temporal table DDL, user no longer needs temporal table function, and they can visit temporal table easily in pure SQL world. 

Flink SQL obtains the ability to interpret changelog after FLIP-95[1], changelog is natural temporal table which contains all versioned data of the the origin database table. Support temporal table on changelog would help user visit specific version of the original database table, this will enrich Flink temporal table join scenario much.

The community have already had a related discussion thread[2] and reach some consensus. Based on that, I prepare the FLIP-132[3] to support Temporal Table DDL, please see the FLIP doc for more details.

Any comment is welcome, I'm looking  forward to your feedback.
 
Best
Leonard
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces <https://cwiki.apache.org/confluence/display/FLINK/FLIP-95:+New+TableSource+and+TableSink+interfaces>
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLINK-16824-Creating-Temporal-Table-Function-via-DDL-td40333.html <http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLINK-16824-Creating-Temporal-Table-Function-via-DDL-td40333.html>
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Leonard Xu <xb...@gmail.com>.
Thanks Fabian, Rui and Jark for the nice discussion!

It seems everyone involved in this discussion has reached a consensus.

I will start another vote thread  later.

Best,
Leonard

> 在 2020年8月24日,20:54,Fabian Hueske <fh...@gmail.com> 写道:
> 
> Hi everyone,
> 
> Thanks for the good discussion!
> 
> I'm fine keeping the names "event-time temporal join" and "processing-time temporal join".
> Also +1 for Rui's proposal using "versioned table" for versioned dynamic table and "regular table" for regular dynamic table.
> 
> Thanks,
> Fabian
> 
> 
> 
> Am Mo., 24. Aug. 2020 um 04:24 Uhr schrieb Jark Wu <imjark@gmail.com <ma...@gmail.com>>:
> I think we have to make some compromise here. Either updating the definition of "temporal table", or extending the definition of "temporal join".
> I'm also fine with Rui's proposal that "temporal join" can also work with a regular table.
> 
> Best,
> Jark
> 
> On Fri, 21 Aug 2020 at 23:49, Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> wrote:
> Thanks @Fabian @Jark @Rui for sharing your opinions.
> 
> For the the small divergence about choose a temporal join name or temporal table name,
> I don't have strong inclination.
> 
> Regarding to choose a different name for this join:
> I agree with Jark and Rui to keep the existing  "event-time temporal join" and "processing-time temporal join".
> 
> Regarding to name the "temporal table without versions/ latest-only temporal table":
> Although Flink has imported Temporal Table concept for users[1], I think users are more familiar with dynamic table concept.
> If we wouldn't like to import more temporal table concepts, Rui's proposal using "versioned table" for versioned dynamic table and "regular table" for regular dynamic table is fine to me.
> 
> HDYT? @Fabian  @Jark
> 
> 
> Best
> Leonard 
> [1]https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/temporal_tables.html <https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/temporal_tables.html> 
> 
>> 在 2020年8月21日,14:18,Rui Li <lirui.fudan@gmail.com <ma...@gmail.com>> 写道:
>> 
>> Hi guys,
>> 
>> Just my two cents.
>> 
>> I agree with Jark that we should use "event-time/processing-time temporal join" as the name for this join.
>> 
>> But I'm not sure about the definition of "temporal table".
>> 
>> Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a non-temporal table, which also sounds like contradictions.
>> 
>> Why would we require "temporal join" only joins a "temporal table", if the standard doesn't even define what is a "temporal table"?
>> 
>> What about defining the term "temporal table" to be "a table is changing over time", the same as the "dynamic table".
>>  
>> I'd prefer not to introduce another term if it has the same meaning as "dynamic table".
>> 
>> I wonder whether it's possible to consider "temporal join" as a specific kind of join that may work with regular or versioned tables, instead
>> of a join that requires a specific kind of table.
>> 
>> On Fri, Aug 21, 2020 at 1:45 PM Jark Wu <imjark@gmail.com <ma...@gmail.com>> wrote:
>> Hi everyone,
>> 
>> Thank you for the great discussion @Leonard and @Fabian.
>> 
>> Regarding to choose a different name for this join:
>> 
>> From my point of view, I don't agree to introduce a new grammar called whatever "lookup join" or "version join", because:
>> 1. "lookup" is a physical behavior not a logical concept. 
>> 2. The SQL standard has proposed temporal join and it fits Flink well with the event-time and processing-time attributes. 
>> 3. We already have so many different join grammer, e.g. "regular join", "interval join", "temporal join", and maybe "window join" in the future.
>>     It may confuse users when having more join concepts. 
>> 
>> So I think the existing "event-time temporal join" and "processing-time temporal join" work well and we should still use them. 
>> 
>> Regarding to the "temporal table without versions":
>> 
>> I agree there are contradictions between "temporal table" and "temporal table without versions" if we think "temporal table" tracks full history.
>> However, if we call "temporal table without versions" as a "regular table", not a kind of "temporal table". 
>> Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a non-temporal table, which also sounds like contradictions.
>> 
>> I also notice that in SQL Server, the "system-versioned temporal table" is a combination of "Temporal Table" (current data) and "History Table" [1]. 
>> The "Temporal Table" is the current data of the database which is the same as our definition of "temporal table without version". The documentation says:
>> 
>> > This additional table is referred to as the history table, while the main table that stores current (actual) row versions is referred to as the current table or simply as the temporal table.
>> 
>> Besides, SQL:2011 doesn't define the meaning of "temporal table" [2], so I think we can define the meaning ourselves.
>> 
>> > Interestingly, SQL:2011 manages to provide this (system-versioned table) support without actually defining or using the terms “temporal data” or “temporal table”.
>> 
>> What about defining the term "temporal table" to be "a table is changing over time", the same as the "dynamic table".
>> A "versioned temporal table" is a special temporal table which tracks the full history of the changing table and supports point-in-time access. 
>> A regular temporal table only supports access to the current data (no earlier versions).
>> 
>> In this way, the "temporal join" still makes sense, because the joined table is a "temporal table", and all the tables in Flink is "temporal table" (or dynamic table).
>> 
>> Best,
>> Jark
>> 
>> [1]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15#what-is-a-system-versioned-temporal-table <https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15#what-is-a-system-versioned-temporal-table>
>> [2]: https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf <https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf>
>> 
>> 
>> 
>> 
>> 
>> 
>> On Thu, 20 Aug 2020 at 22:55, Fabian Hueske <fhueske@gmail.com <ma...@gmail.com>> wrote:
>> Hi everyone,
>> 
>> Yes, just to help user distinguish the difference between versioned temporal table and  latest-only temporal table.
>> 
>> 
>> I don't think we help users to understand the differences if we invent new (IMO confusing) terms ("temporal table without version" or "latest-only temporal table") instead of using known terminology ("table").
>>  
>>> As I suggested before, it might make sense to choose a different name for this join instead of using a fancy name for the tables such that the name fits the name of the join.
>> I think lookup join is a physical implementation of Processing-Time temporal table join, we can even offer a lookup join implementation for Event-time temporal table join if the external system support lookup history data.
>> 
>> Versioned temporal table can be used in Event-time temporal table join, all temporal table(including versioned temporal tale and regular table) can be used in Processing-time temporal table join, 
>> lookup join is a kind of physical implementation of  Processing-time temporal table join which lookups the external system’s data.
>> 
>> 
>> Lookup Join was just a quick shot proposal without putting much thought into it, but a different name might help to solve the naming issue with "temporal table without versions".
>> Another proposal would be "Version Join" or "Table Version Join".
>>  
>> If we agree regular tables are temporal table, I tend to keep latest-only temporal table to clarify the temporal and version concept in regular table.
>> 
>> 
>> In my understanding access to a table's history is the core property of temporal tables (that's also how temporal tables are defined for SQL Server [1] or PostgreSQL [2]).
>> A "temporal table without versions"or "latest-only temporal table" is just a table. It does not have any other property than all the tables we've always dealt with.
>> 
>> So I would say that there are temporal tables (for which earlier versions can be accessed) and just tables like those that our users have always used so far (for which there are no earlier versions).
>> 
>> Maybe it's just me, but "temporal table without versions" or "latest-only temporal table" sound like contradictions to me, similar to "tiny skyscraper".
>> 
>> How about we try to get a few more opinions on this?
>> What do others think about this issue?
>> 
>> 
>> Best
>> Leonard
>> 
>> 
>> Best, 
>> Fabian
>> 
>> [1] https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15 <https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15>
>> [2] https://pgxn.org/dist/temporal_tables/ <https://pgxn.org/dist/temporal_tables/>
>> 
>> 
>>> Cheers, 
>>> Fabian
>>> 
>>> Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>>:
>>> Thanks Fabian and Seth for the feedback
>>> 
>>> I agree the name “temporal table without version” is less accurate because this kind of temporal table has a latest version rather than has no version.
>>>  
>>> How about “Latest-only temporal table” ? The related concept section updated as following:
>>> 
>>> Temporal Table: Temporal table is a table that evolves over time, rows in temporal table are associated with one or more temporal periods.
>>>     Version: A temporal table can split into a set of versioned table snapshots, the version in table snapshots represents the valid life circle of rows, the start time and the end time of the valid period can be  assigned by users.
>>>     Versioned temporal table: If the row in temporal table can track its history changes and visit its history versions, we called this kind of temporal table as versioned temporal table.
>>>     Latest-only temporal table: If the row in temporal table can only track its latest version, we called this kind of temporal table as latest-only temporal table. The temporal table in lookup join can only track its latest version and thus it's also a latest-only temporal table.
>>> 
>>> Best
>>> Leonard
>>> 
>>>> 在 2020年8月19日,04:46,Seth Wiesman <sjwiesman@gmail.com <ma...@gmail.com>> 写道:
>>>> 
>>>> +1 to the updated design.
>>>> 
>>>> I agree with Fabian that the naming of "temporal table without version" is
>>>> a bit confusing but the actual semantics make sense to me. I think just
>>>> saying its a Flink managed lookup join makes sense.
>>>> 
>>>> Seth
>>>> 
>>>> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fhueske@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>>> Thanks for the updated FLIP Leonard!
>>>>> In my opinion this was an improvement.
>>>>> So +1 for this design.
>>>>> 
>>>>> I have just one remark regarding the terminology.
>>>>> I find the term "Temporal Table without Version" somewhat confusing.
>>>>> IMO, versions are the core principle of temporal tables and temporal
>>>>> tables without versions don't make much sense to me.
>>>>> 
>>>>> What makes such a table a "Temporal" table? Isn't it just a regular table?
>>>>> If I understand the proposal correctly, "Temporal Tables without Version"
>>>>> can only be used in processing time temporal table joins, because this join
>>>>> only requests the current version.
>>>>> But all regular tables can be used in processing time (temporal) table
>>>>> joins as well.
>>>>> It's basically the same as a lookup join, with the only difference that
>>>>> the table is maintained in Flink and not accessed in an external system
>>>>> (for example via JDBC).
>>>>> 
>>>>> Are "Temporal Tables without Version" called "Temporal" because they can
>>>>> be used in "processing time temporal table joins" and due to its name this
>>>>> join needs to join something that's called "Temporal"?
>>>>> In that case, we might want to rename "processing time temporal table
>>>>> joins" into something else that does not imply a versioning.
>>>>> Maybe we can call them just lookup joins to avoid introducing another term?
>>>>> 
>>>>> Thanks, Fabian
>>>>> 
>>>>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <lirui.fudan@gmail.com <ma...@gmail.com>>:
>>>>> 
>>>>>> Thanks Leonard for the clarifications!
>>>>>> 
>>>>>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> wrote:
>>>>>> 
>>>>>>> 
>>>>>>>> But are we still able to track different views of such a
>>>>>>>> table through time, as rows are added/deleted to/from the table?
>>>>>>> 
>>>>>>> Yes, in fact we support temporal table from changlog which contains all
>>>>>>> possible message types(INSERT/UPDATE/DELETE).
>>>>>>> 
>>>>>>>> For
>>>>>>>> example, suppose I have an append-only table source with event-time
>>>>>>> and PK,
>>>>>>>> will I be allowed to do an event-time temporal join with this table?
>>>>>>> Yes, I list some examples in the doc, the example versioned_rates3  is
>>>>>>> this case exactly.
>>>>>>> 
>>>>>>> Best
>>>>>>> Leonard
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:
>>>>>>> xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi, all
>>>>>>>>> 
>>>>>>>>> After a detailed offline discussion about the temporal table related
>>>>>>>>> concept and behavior, we had a reliable solution and rejected several
>>>>>>>>> alternatives.
>>>>>>>>> Compared to rejected alternatives, the proposed approach is a more
>>>>>>> unified
>>>>>>>>> story and also friendly to user and current Flink framework.
>>>>>>>>> I improved the FLIP[1] with the proposed approach and refactored the
>>>>>>>>> document organization to make it clear enough.
>>>>>>>>> 
>>>>>>>>> Please let me know if you have any concerns, I’m looking forward your
>>>>>>>>> comments.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best
>>>>>>>>> Leonard
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>>>> 
>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>>>> <
>>>>>>>>> 
>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>> <
>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:
>>>>>>> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>>>>>>>>> 
>>>>>>>>>> Hi, all
>>>>>>>>>> 
>>>>>>>>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>>>>>>>>> 
>>>>>>>>>> Best
>>>>>>>>>> Leonard
>>>>>>>>>> [1]
>>>>>>>>> 
>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>> <
>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>>> 
>>>>>>>>> <
>>>>>>>>> 
>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>> <
>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:
>>>>>>> xbjtdcq@gmail.com <ma...@gmail.com>> <mailto:
>>>>>>>>> xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>>> 写道:
>>>>>>>>>>> 
>>>>>>>>>>> Hi, Timo
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for you response.
>>>>>>>>>>> 
>>>>>>>>>>>> 1) Naming: Is operation time a good term for this concept? If I
>>>>>>> read
>>>>>>>>> "The operation time is the time when the changes happened in system."
>>>>>>> or
>>>>>>>>> "The system time of DML execution in database", why don't we call it
>>>>>>>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>>>>>> time in
>>>>>>>>> Flink should be thought through.
>>>>>>>>>>> 
>>>>>>>>>>> I agree that we should thought through. I have considered the name
>>>>>>>>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>>>>>> the
>>>>>>>>> name.
>>>>>>>>>>> 
>>>>>>>>>>> I proposed `operationTime` because most changelog comes from
>>>>>>> Database
>>>>>>>>> and we always called an action as `operation` rather than `change` in
>>>>>>>>> Database, the operation time is  easier to understand  for database
>>>>>>> users,
>>>>>>>>> but it's more like a database terminology.
>>>>>>>>>>> 
>>>>>>>>>>> For `SystemTime`, user may confuse which one does the system in
>>>>>>>>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>>>>>> not a
>>>>>>>>> good name.
>>>>>>>>>>> 
>>>>>>>>>>> `ChangelogTime` is a pretty choice which is more unified with
>>>>>>> existed
>>>>>>>>> terminology `Changelog` and `ChangelogMode`, so let me use
>>>>>>> `ChangelogTime`
>>>>>>>>> and I’ll update the FLIP.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>>>>>>>> expose the concept of time through the user-level `Row` type? The
>>>>>>> FLIP does
>>>>>>>>> not mention this explictly. I think we can keep it as an internal
>>>>>>> concept
>>>>>>>>> but I just wanted to ask for clarification.
>>>>>>>>>>> 
>>>>>>>>>>> Yes, I want to keep it as an internal concept, we have discussed
>>>>>>> that
>>>>>>>>> changelog time concept should be the third time concept(the other two
>>>>>>> are
>>>>>>>>> event-time and processing-time). It’s not easy for normal users(or to
>>>>>>> help
>>>>>>>>> normal users) understand the three concepts accurately, and I did not
>>>>>>> find
>>>>>>>>> a big enough scenario that user need to touch the changelog time for
>>>>>>> now,
>>>>>>>>> so I tend to do not expose the concept to users.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Leonard
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>>>>>>>>>>> Thanks Konstantin,
>>>>>>>>>>>>> Regarding your questions, hope my comments has address your
>>>>>>> questions
>>>>>>>>> and I also add a few explanation in the FLIP.
>>>>>>>>>>>>> Thank you all for the feedback,
>>>>>>>>>>>>> It seems everyone involved  in this thread has reached a
>>>>>>> consensus.
>>>>>>>>>>>>> I will start a vote thread  later.
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Leonard
>>>>>>>>>>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <ma...@gmail.com> <mailto:
>>>>>>> godfreyhe@gmail.com <ma...@gmail.com>> <mailto:
>>>>>>>>> godfreyhe@gmail.com <ma...@gmail.com> <mailto:godfreyhe@gmail.com <ma...@gmail.com>>>> 写道:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks Lennard for driving this FLIP.
>>>>>>>>>>>>>> Looks good to me.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Godfrey
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:imjark@gmail.com <ma...@gmail.com>> <mailto:
>>>>>>> imjark@gmail.com <ma...@gmail.com> <mailto:imjark@gmail.com <ma...@gmail.com>>>> 于2020年8月3日周一
>>>>>>>>> 下午12:04写道:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks Leonard for the great FLIP. I think it is in very good
>>>>>>> shape.
>>>>>>>>>>>>>>> +1 to start a vote.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com <ma...@gmail.com>
>>>>>>> <mailto:fhueske@gmail.com <ma...@gmail.com>>
>>>>>>>>> <mailto:fhueske@gmail.com <ma...@gmail.com> <mailto:fhueske@gmail.com <ma...@gmail.com>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Leonard,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for this FLIP!
>>>>>>>>>>>>>>>> Looks good from my side.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Cheers, Fabian
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>>>>>>>>>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:sjwiesman@gmail.com <ma...@gmail.com>> <mailto:
>>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:sjwiesman@gmail.com <ma...@gmail.com>>>
>>>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Leondard,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thank you for pushing this, I think the updated syntax looks
>>>>>>>>> really
>>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>>>> and the semantics make sense to me.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Seth
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>>>>>> xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>
>>>>>>>>> <mailto:xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi, Konstantin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>>>>>>>> joined
>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> PRIMARY KEY attribute, correct?
>>>>>>>>>>>>>>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>>>>>>>> VIEW
>>>>>>>>>>>>>>>>>> definition that defines
>>>>>>>>>>>>>>>>>>> whether a event-time or processing time temporal table join
>>>>>>> is
>>>>>>>>>>>>>>> used?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I think event-time or processing-time temporal table join
>>>>>>>>> depends on
>>>>>>>>>>>>>>>> fact
>>>>>>>>>>>>>>>>>> table’s time attribute in temporal join rather than from
>>>>>>> temporal
>>>>>>>>>>>>>>> table
>>>>>>>>>>>>>>>>>> side, the event-time or processing time in temporal table is
>>>>>>> just
>>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> split the validity period of versioned snapshot of temporal
>>>>>>>>> table.
>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>> processing time attribute is not  necessary for temporal
>>>>>>> table
>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>> version, only the primary key is required, the following
>>>>>>> VIEW is
>>>>>>>>> also
>>>>>>>>>>>>>>>>> valid
>>>>>>>>>>>>>>>>>> for temporal table without version.
>>>>>>>>>>>>>>>>>> CREATE VIEW latest_rates AS
>>>>>>>>>>>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>>>>>>>> latest
>>>>>>>>>>>>>>>>>> version
>>>>>>>>>>>>>>>>>> FROM rates
>>>>>>>>>>>>>>>>>> GROUP BY currency;                           -- inferred
>>>>>>> primary
>>>>>>>>> key
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always
>>>>>>>>> versioned
>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>> operation_time regardless of the lookup table attribute
>>>>>>>>> (event-time
>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>> processing time attribute), correct?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>>>>>> using the
>>>>>>>>>>>>>>>> o.time
>>>>>>>>>>>>>>>>>> value to lookup the version of the temporal table.
>>>>>>>>>>>>>>>>>> For fact table has the processing time attribute, it means
>>>>>>> only
>>>>>>>>>>>>>>> lookup
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> latest version of temporal table and we can do some
>>>>>>> optimization
>>>>>>>>> in
>>>>>>>>>>>>>>>>>> implementation like only keep the latest version.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>> Leonard
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Best regards!
>>>>>>>> Rui Li
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Best regards!
>>>>>> Rui Li
>>>>>> 
>>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Best regards!
>> Rui Li
> 


Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Fabian Hueske <fh...@gmail.com>.
Hi everyone,

Thanks for the good discussion!

I'm fine keeping the names "event-time temporal join" and "processing-time
temporal join".
Also +1 for Rui's proposal using "versioned table" for versioned dynamic
table and "regular table" for regular dynamic table.

Thanks,
Fabian



Am Mo., 24. Aug. 2020 um 04:24 Uhr schrieb Jark Wu <im...@gmail.com>:

> I think we have to make some compromise here. Either updating the
> definition of "temporal table", or extending the definition of "temporal
> join".
> I'm also fine with Rui's proposal that "temporal join" can also work with
> a regular table.
>
> Best,
> Jark
>
> On Fri, 21 Aug 2020 at 23:49, Leonard Xu <xb...@gmail.com> wrote:
>
>> Thanks @Fabian @Jark @Rui for sharing your opinions.
>>
>> For the the small divergence about choose a temporal join name or
>> temporal table name,
>> I don't have strong inclination.
>>
>> *Regarding to choose a different name for this join:*
>> I agree with Jark and Rui to keep the existing  "event-time temporal
>> join" and "processing-time temporal join".
>>
>> *Regarding to name the "temporal table without versions/ latest-only
>> temporal table":*
>> Although Flink has imported Temporal Table concept for users[1], I think
>> users are more familiar with dynamic table concept.
>> If we wouldn't like to import more temporal table concepts, Rui's
>> proposal using "versioned table" for versioned dynamic table and "regular
>> table" for regular dynamic table is fine to me.
>>
>> HDYT? @Fabian  @Jark
>>
>>
>> Best
>> Leonard
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/temporal_tables.html
>>
>>
>> 在 2020年8月21日,14:18,Rui Li <li...@gmail.com> 写道:
>>
>> Hi guys,
>>
>> Just my two cents.
>>
>> I agree with Jark that we should use "event-time/processing-time temporal
>> join" as the name for this join.
>>
>> But I'm not sure about the definition of "temporal table".
>>
>> Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a
>>> non-temporal table, which also sounds like contradictions.
>>>
>>
>> Why would we require "temporal join" only joins a "temporal table", if
>> the standard doesn't even define what is a "temporal table"?
>>
>> What about defining the term "temporal table" to be "a table is
>>> changing over time", the same as the "dynamic table".
>>>
>>
>> I'd prefer not to introduce another term if it has the same meaning as
>> "dynamic table".
>>
>> I wonder whether it's possible to consider "temporal join" as a specific
>> kind of join that may work with regular or versioned tables, instead
>> of a join that requires a specific kind of table.
>>
>> On Fri, Aug 21, 2020 at 1:45 PM Jark Wu <im...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> Thank you for the great discussion @Leonard and @Fabian.
>>>
>>> *Regarding to choose a different name for this join:*
>>>
>>> From my point of view, I don't agree to introduce a new grammar called
>>> whatever "lookup join" or "version join", because:
>>> 1. "lookup" is a physical behavior not a logical concept.
>>> 2. The SQL standard has proposed temporal join and it fits Flink well
>>> with the event-time and processing-time attributes.
>>> 3. We already have so many different join grammer, e.g. "regular join",
>>> "interval join", "temporal join", and maybe "window join" in the future.
>>>     It may confuse users when having more join concepts.
>>>
>>> So I think the existing "event-time temporal join" and "processing-time
>>> temporal join" work well and we should still use them.
>>>
>>> *Regarding to the "temporal table without versions":*
>>>
>>> I agree there are contradictions between "temporal table" and "temporal
>>> table without versions" if we think "temporal table" tracks full history.
>>> However, if we call "temporal table without versions" as a "regular
>>> table", not a kind of "temporal table".
>>> Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a
>>> non-temporal table, which also sounds like contradictions.
>>>
>>> I also notice that in SQL Server, the "system-versioned temporal table"
>>> is a combination of "Temporal Table" (current data) and "History Table"
>>> [1].
>>> The "Temporal Table" is the current data of the database which is the
>>> same as our definition of "temporal table without version". The
>>> documentation says:
>>>
>>> > *This additional table is referred to as the history table, while the
>>> main table that stores current (actual) row versions is referred to as the
>>> current table or simply as the temporal table. *
>>>
>>> Besides, SQL:2011 doesn't define the meaning of "temporal table" [2], so
>>> I think we can define the meaning ourselves.
>>>
>>> *> Interestingly, SQL:2011 manages to provide this (system-versioned
>>> table) support without actually defining or using the terms “temporal data”
>>> or “temporal table”.*
>>>
>>> What about defining the term "temporal table" to be "a table is
>>> changing over time", the same as the "dynamic table".
>>> A "versioned temporal table" is a special temporal table which tracks
>>> the full history of the changing table and supports point-in-time access.
>>> A regular temporal table only supports access to the current data (no
>>> earlier versions).
>>>
>>> In this way, the "temporal join" still makes sense, because the joined
>>> table is a "temporal table", and all the tables in Flink is "temporal
>>> table" (or dynamic table).
>>>
>>> Best,
>>> Jark
>>>
>>> [1]:
>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15#what-is-a-system-versioned-temporal-table
>>> [2]:
>>> https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, 20 Aug 2020 at 22:55, Fabian Hueske <fh...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Yes, just to help user distinguish the difference between versioned
>>>>> temporal table and  latest-only temporal table.
>>>>>
>>>>>
>>>> I don't think we help users to understand the differences if we invent
>>>> new (IMO confusing) terms ("temporal table without version" or "latest-only
>>>> temporal table") instead of using known terminology ("table").
>>>>
>>>>
>>>>> As I suggested before, it might make sense to choose a different name
>>>>> for this join instead of using a fancy name for the tables such that the
>>>>> name fits the name of the join.
>>>>>
>>>>> I think lookup join is a physical implementation of Processing-Time
>>>>> temporal table join, we can even offer a lookup join implementation for
>>>>> Event-time temporal table join if the external system support lookup
>>>>> history data.
>>>>>
>>>>> Versioned temporal table can be used in Event-time temporal table
>>>>> join, all temporal table(including versioned temporal tale and regular
>>>>> table) can be used in Processing-time temporal table join,
>>>>> lookup join is a kind of physical implementation of  Processing-time
>>>>> temporal table join which lookups the external system’s data.
>>>>>
>>>>>
>>>> Lookup Join was just a quick shot proposal without putting much thought
>>>> into it, but a different name might help to solve the naming issue with
>>>> "temporal table without versions".
>>>> Another proposal would be "Version Join" or "Table Version Join".
>>>>
>>>>
>>>>> If we agree regular tables are temporal table, I tend to keep
>>>>> latest-only temporal table to clarify the temporal and version concept in
>>>>> regular table.
>>>>>
>>>>>
>>>> In my understanding access to a table's history is the core property of
>>>> temporal tables (that's also how temporal tables are defined for SQL Server
>>>> [1] or PostgreSQL [2]).
>>>> A "temporal table without versions"or "latest-only temporal table" is
>>>> just a table. It does not have any other property than all the tables we've
>>>> always dealt with.
>>>>
>>>> So I would say that there are temporal tables (for which earlier
>>>> versions can be accessed) and just tables like those that our users have
>>>> always used so far (for which there are no earlier versions).
>>>>
>>>> Maybe it's just me, but "temporal table without versions" or
>>>> "latest-only temporal table" sound like contradictions to me, similar to
>>>> "tiny skyscraper".
>>>>
>>>> How about we try to get a few more opinions on this?
>>>> What do others think about this issue?
>>>>
>>>>
>>>>> Best
>>>>> Leonard
>>>>>
>>>>>
>>>> Best,
>>>> Fabian
>>>>
>>>> [1]
>>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15
>>>> [2] https://pgxn.org/dist/temporal_tables/
>>>>
>>>>
>>>> Cheers,
>>>>> Fabian
>>>>>
>>>>> Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <
>>>>> xbjtdcq@gmail.com>:
>>>>>
>>>>>> Thanks Fabian and Seth for the feedback
>>>>>>
>>>>>> I agree the name “temporal table without version” is less accurate
>>>>>> because this kind of temporal table has a latest version rather than has no
>>>>>> version.
>>>>>>
>>>>>> How about “Latest-only temporal table” ? The related concept section
>>>>>> updated as following:
>>>>>>
>>>>>> *Temporal Table:* Temporal table is a table that evolves over time, rows
>>>>>> in temporal table are associated with one or more temporal periods.
>>>>>>
>>>>>> *    Version:* A temporal table can split into a set of versioned
>>>>>> table snapshots, the *version* in table snapshots represents the
>>>>>> valid life circle of rows, the start time and the end time of the valid
>>>>>> period can be  assigned by users.
>>>>>>
>>>>>> *    Versioned temporal table*: If the row in temporal table can
>>>>>> track its history changes and visit its history versions, we called this
>>>>>> kind of temporal table as versioned temporal table.
>>>>>>
>>>>>> *    Latest-only temporal table**:* If the row in temporal table can
>>>>>> only track its latest version, we called this kind of temporal table as
>>>>>> latest-only temporal table. The temporal table in lookup join can only
>>>>>> track its latest version and thus it's also a latest-only temporal table.
>>>>>>
>>>>>> Best
>>>>>> Leonard
>>>>>>
>>>>>> 在 2020年8月19日,04:46,Seth Wiesman <sj...@gmail.com> 写道:
>>>>>>
>>>>>> +1 to the updated design.
>>>>>>
>>>>>> I agree with Fabian that the naming of "temporal table without
>>>>>> version" is
>>>>>> a bit confusing but the actual semantics make sense to me. I think
>>>>>> just
>>>>>> saying its a Flink managed lookup join makes sense.
>>>>>>
>>>>>> Seth
>>>>>>
>>>>>> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fh...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Thanks for the updated FLIP Leonard!
>>>>>> In my opinion this was an improvement.
>>>>>> So +1 for this design.
>>>>>>
>>>>>> I have just one remark regarding the terminology.
>>>>>> I find the term "Temporal Table without Version" somewhat confusing.
>>>>>> IMO, versions are the core principle of temporal tables and temporal
>>>>>> tables without versions don't make much sense to me.
>>>>>>
>>>>>> What makes such a table a "Temporal" table? Isn't it just a regular
>>>>>> table?
>>>>>> If I understand the proposal correctly, "Temporal Tables without
>>>>>> Version"
>>>>>> can only be used in processing time temporal table joins, because
>>>>>> this join
>>>>>> only requests the current version.
>>>>>> But all regular tables can be used in processing time (temporal) table
>>>>>> joins as well.
>>>>>> It's basically the same as a lookup join, with the only difference
>>>>>> that
>>>>>> the table is maintained in Flink and not accessed in an external
>>>>>> system
>>>>>> (for example via JDBC).
>>>>>>
>>>>>> Are "Temporal Tables without Version" called "Temporal" because they
>>>>>> can
>>>>>> be used in "processing time temporal table joins" and due to its name
>>>>>> this
>>>>>> join needs to join something that's called "Temporal"?
>>>>>> In that case, we might want to rename "processing time temporal table
>>>>>> joins" into something else that does not imply a versioning.
>>>>>> Maybe we can call them just lookup joins to avoid introducing another
>>>>>> term?
>>>>>>
>>>>>> Thanks, Fabian
>>>>>>
>>>>>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <
>>>>>> lirui.fudan@gmail.com>:
>>>>>>
>>>>>> Thanks Leonard for the clarifications!
>>>>>>
>>>>>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xb...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> But are we still able to track different views of such a
>>>>>> table through time, as rows are added/deleted to/from the table?
>>>>>>
>>>>>>
>>>>>> Yes, in fact we support temporal table from changlog which contains
>>>>>> all
>>>>>> possible message types(INSERT/UPDATE/DELETE).
>>>>>>
>>>>>> For
>>>>>> example, suppose I have an append-only table source with event-time
>>>>>>
>>>>>> and PK,
>>>>>>
>>>>>> will I be allowed to do an event-time temporal join with this table?
>>>>>>
>>>>>> Yes, I list some examples in the doc, the example versioned_rates3  is
>>>>>> this case exactly.
>>>>>>
>>>>>> Best
>>>>>> Leonard
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com
>>>>>> <mailto:
>>>>>>
>>>>>> xbjtdcq@gmail.com>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi, all
>>>>>>
>>>>>> After a detailed offline discussion about the temporal table related
>>>>>> concept and behavior, we had a reliable solution and rejected several
>>>>>> alternatives.
>>>>>> Compared to rejected alternatives, the proposed approach is a more
>>>>>>
>>>>>> unified
>>>>>>
>>>>>> story and also friendly to user and current Flink framework.
>>>>>> I improved the FLIP[1] with the proposed approach and refactored the
>>>>>> document organization to make it clear enough.
>>>>>>
>>>>>> Please let me know if you have any concerns, I’m looking forward your
>>>>>> comments.
>>>>>>
>>>>>>
>>>>>> Best
>>>>>> Leonard
>>>>>>
>>>>>> [1]
>>>>>>
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>>
>>>>>> <
>>>>>>
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>> <
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>>>>
>>>>>> xbjtdcq@gmail.com>> 写道:
>>>>>>
>>>>>>
>>>>>> Hi, all
>>>>>>
>>>>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>>>>>
>>>>>> Best
>>>>>> Leonard
>>>>>> [1]
>>>>>>
>>>>>>
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>> <
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>>
>>>>>>
>>>>>> <
>>>>>>
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>> <
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>>>>
>>>>>> xbjtdcq@gmail.com> <mailto:
>>>>>>
>>>>>> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>>>>>
>>>>>>
>>>>>> Hi, Timo
>>>>>>
>>>>>> Thanks for you response.
>>>>>>
>>>>>> 1) Naming: Is operation time a good term for this concept? If I
>>>>>>
>>>>>> read
>>>>>>
>>>>>> "The operation time is the time when the changes happened in system."
>>>>>>
>>>>>> or
>>>>>>
>>>>>> "The system time of DML execution in database", why don't we call it
>>>>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>>>>>
>>>>>> time in
>>>>>>
>>>>>> Flink should be thought through.
>>>>>>
>>>>>>
>>>>>> I agree that we should thought through. I have considered the name
>>>>>>
>>>>>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>>>>>
>>>>>> the
>>>>>>
>>>>>> name.
>>>>>>
>>>>>>
>>>>>> I proposed `operationTime` because most changelog comes from
>>>>>>
>>>>>> Database
>>>>>>
>>>>>> and we always called an action as `operation` rather than `change` in
>>>>>> Database, the operation time is  easier to understand  for database
>>>>>>
>>>>>> users,
>>>>>>
>>>>>> but it's more like a database terminology.
>>>>>>
>>>>>>
>>>>>> For `SystemTime`, user may confuse which one does the system in
>>>>>>
>>>>>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>>>>>
>>>>>> not a
>>>>>>
>>>>>> good name.
>>>>>>
>>>>>>
>>>>>> `ChangelogTime` is a pretty choice which is more unified with
>>>>>>
>>>>>> existed
>>>>>>
>>>>>> terminology `Changelog` and `ChangelogMode`, so let me use
>>>>>>
>>>>>> `ChangelogTime`
>>>>>>
>>>>>> and I’ll update the FLIP.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>>>>>
>>>>>> expose the concept of time through the user-level `Row` type? The
>>>>>>
>>>>>> FLIP does
>>>>>>
>>>>>> not mention this explictly. I think we can keep it as an internal
>>>>>>
>>>>>> concept
>>>>>>
>>>>>> but I just wanted to ask for clarification.
>>>>>>
>>>>>>
>>>>>> Yes, I want to keep it as an internal concept, we have discussed
>>>>>>
>>>>>> that
>>>>>>
>>>>>> changelog time concept should be the third time concept(the other two
>>>>>>
>>>>>> are
>>>>>>
>>>>>> event-time and processing-time). It’s not easy for normal users(or to
>>>>>>
>>>>>> help
>>>>>>
>>>>>> normal users) understand the three concepts accurately, and I did not
>>>>>>
>>>>>> find
>>>>>>
>>>>>> a big enough scenario that user need to touch the changelog time for
>>>>>>
>>>>>> now,
>>>>>>
>>>>>> so I tend to do not expose the concept to users.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Leonard
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>>>>
>>>>>> Thanks Konstantin,
>>>>>> Regarding your questions, hope my comments has address your
>>>>>>
>>>>>> questions
>>>>>>
>>>>>> and I also add a few explanation in the FLIP.
>>>>>>
>>>>>> Thank you all for the feedback,
>>>>>> It seems everyone involved  in this thread has reached a
>>>>>>
>>>>>> consensus.
>>>>>>
>>>>>> I will start a vote thread  later.
>>>>>> Best,
>>>>>> Leonard
>>>>>>
>>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
>>>>>>
>>>>>> godfreyhe@gmail.com> <mailto:
>>>>>>
>>>>>> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
>>>>>>
>>>>>>
>>>>>> Thanks Lennard for driving this FLIP.
>>>>>> Looks good to me.
>>>>>>
>>>>>> Best,
>>>>>> Godfrey
>>>>>>
>>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:
>>>>>>
>>>>>> imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
>>>>>>
>>>>>> 下午12:04写道:
>>>>>>
>>>>>>
>>>>>> Thanks Leonard for the great FLIP. I think it is in very good
>>>>>>
>>>>>> shape.
>>>>>>
>>>>>> +1 to start a vote.
>>>>>>
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
>>>>>>
>>>>>> <ma...@gmail.com>
>>>>>>
>>>>>> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Leonard,
>>>>>>
>>>>>> Thanks for this FLIP!
>>>>>> Looks good from my side.
>>>>>>
>>>>>> Cheers, Fabian
>>>>>>
>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:
>>>>>>
>>>>>> sjwiesman@gmail.com <ma...@gmail.com>>
>>>>>>
>>>>>> :
>>>>>>
>>>>>>
>>>>>> Hi Leondard,
>>>>>>
>>>>>> Thank you for pushing this, I think the updated syntax looks
>>>>>>
>>>>>> really
>>>>>>
>>>>>> good
>>>>>>
>>>>>> and the semantics make sense to me.
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> Seth
>>>>>>
>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>>>>>
>>>>>> xbjtdcq@gmail.com <ma...@gmail.com>
>>>>>>
>>>>>> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi, Konstantin
>>>>>>
>>>>>>
>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>>>>>
>>>>>> joined
>>>>>>
>>>>>> on
>>>>>>
>>>>>> the
>>>>>>
>>>>>> PRIMARY KEY attribute, correct?
>>>>>>
>>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>>
>>>>>>
>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>>>>>
>>>>>> VIEW
>>>>>>
>>>>>> definition that defines
>>>>>>
>>>>>> whether a event-time or processing time temporal table join
>>>>>>
>>>>>> is
>>>>>>
>>>>>> used?
>>>>>>
>>>>>>
>>>>>> I think event-time or processing-time temporal table join
>>>>>>
>>>>>> depends on
>>>>>>
>>>>>> fact
>>>>>>
>>>>>> table’s time attribute in temporal join rather than from
>>>>>>
>>>>>> temporal
>>>>>>
>>>>>> table
>>>>>>
>>>>>> side, the event-time or processing time in temporal table is
>>>>>>
>>>>>> just
>>>>>>
>>>>>> used
>>>>>>
>>>>>> to
>>>>>>
>>>>>> split the validity period of versioned snapshot of temporal
>>>>>>
>>>>>> table.
>>>>>>
>>>>>> The
>>>>>>
>>>>>> processing time attribute is not  necessary for temporal
>>>>>>
>>>>>> table
>>>>>>
>>>>>> without
>>>>>>
>>>>>> version, only the primary key is required, the following
>>>>>>
>>>>>> VIEW is
>>>>>>
>>>>>> also
>>>>>>
>>>>>> valid
>>>>>>
>>>>>> for temporal table without version.
>>>>>> CREATE VIEW latest_rates AS
>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>>>>>
>>>>>> latest
>>>>>>
>>>>>> version
>>>>>> FROM rates
>>>>>> GROUP BY currency;                           -- inferred
>>>>>>
>>>>>> primary
>>>>>>
>>>>>> key
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 3) A "Versioned Temporal Table DDL on source" is always
>>>>>>
>>>>>> versioned
>>>>>>
>>>>>> on
>>>>>>
>>>>>> operation_time regardless of the lookup table attribute
>>>>>>
>>>>>> (event-time
>>>>>>
>>>>>> or
>>>>>>
>>>>>> processing time attribute), correct?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>>>>>
>>>>>> using the
>>>>>>
>>>>>> o.time
>>>>>>
>>>>>> value to lookup the version of the temporal table.
>>>>>> For fact table has the processing time attribute, it means
>>>>>>
>>>>>> only
>>>>>>
>>>>>> lookup
>>>>>>
>>>>>> the
>>>>>>
>>>>>> latest version of temporal table and we can do some
>>>>>>
>>>>>> optimization
>>>>>>
>>>>>> in
>>>>>>
>>>>>> implementation like only keep the latest version.
>>>>>>
>>>>>>
>>>>>> Best
>>>>>> Leonard
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards!
>>>>>> Rui Li
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards!
>>>>>> Rui Li
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>
>> --
>> Best regards!
>> Rui Li
>>
>>
>>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Jark Wu <im...@gmail.com>.
I think we have to make some compromise here. Either updating the
definition of "temporal table", or extending the definition of "temporal
join".
I'm also fine with Rui's proposal that "temporal join" can also work with a
regular table.

Best,
Jark

On Fri, 21 Aug 2020 at 23:49, Leonard Xu <xb...@gmail.com> wrote:

> Thanks @Fabian @Jark @Rui for sharing your opinions.
>
> For the the small divergence about choose a temporal join name or temporal
> table name,
> I don't have strong inclination.
>
> *Regarding to choose a different name for this join:*
> I agree with Jark and Rui to keep the existing  "event-time temporal join"
> and "processing-time temporal join".
>
> *Regarding to name the "temporal table without versions/ latest-only
> temporal table":*
> Although Flink has imported Temporal Table concept for users[1], I think
> users are more familiar with dynamic table concept.
> If we wouldn't like to import more temporal table concepts, Rui's proposal
> using "versioned table" for versioned dynamic table and "regular table" for
> regular dynamic table is fine to me.
>
> HDYT? @Fabian  @Jark
>
>
> Best
> Leonard
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/temporal_tables.html
>
>
> 在 2020年8月21日,14:18,Rui Li <li...@gmail.com> 写道:
>
> Hi guys,
>
> Just my two cents.
>
> I agree with Jark that we should use "event-time/processing-time temporal
> join" as the name for this join.
>
> But I'm not sure about the definition of "temporal table".
>
> Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a
>> non-temporal table, which also sounds like contradictions.
>>
>
> Why would we require "temporal join" only joins a "temporal table", if the
> standard doesn't even define what is a "temporal table"?
>
> What about defining the term "temporal table" to be "a table is
>> changing over time", the same as the "dynamic table".
>>
>
> I'd prefer not to introduce another term if it has the same meaning as
> "dynamic table".
>
> I wonder whether it's possible to consider "temporal join" as a specific
> kind of join that may work with regular or versioned tables, instead
> of a join that requires a specific kind of table.
>
> On Fri, Aug 21, 2020 at 1:45 PM Jark Wu <im...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> Thank you for the great discussion @Leonard and @Fabian.
>>
>> *Regarding to choose a different name for this join:*
>>
>> From my point of view, I don't agree to introduce a new grammar called
>> whatever "lookup join" or "version join", because:
>> 1. "lookup" is a physical behavior not a logical concept.
>> 2. The SQL standard has proposed temporal join and it fits Flink well
>> with the event-time and processing-time attributes.
>> 3. We already have so many different join grammer, e.g. "regular join",
>> "interval join", "temporal join", and maybe "window join" in the future.
>>     It may confuse users when having more join concepts.
>>
>> So I think the existing "event-time temporal join" and "processing-time
>> temporal join" work well and we should still use them.
>>
>> *Regarding to the "temporal table without versions":*
>>
>> I agree there are contradictions between "temporal table" and "temporal
>> table without versions" if we think "temporal table" tracks full history.
>> However, if we call "temporal table without versions" as a "regular
>> table", not a kind of "temporal table".
>> Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a
>> non-temporal table, which also sounds like contradictions.
>>
>> I also notice that in SQL Server, the "system-versioned temporal table"
>> is a combination of "Temporal Table" (current data) and "History Table"
>> [1].
>> The "Temporal Table" is the current data of the database which is the
>> same as our definition of "temporal table without version". The
>> documentation says:
>>
>> > *This additional table is referred to as the history table, while the
>> main table that stores current (actual) row versions is referred to as the
>> current table or simply as the temporal table. *
>>
>> Besides, SQL:2011 doesn't define the meaning of "temporal table" [2], so
>> I think we can define the meaning ourselves.
>>
>> *> Interestingly, SQL:2011 manages to provide this (system-versioned
>> table) support without actually defining or using the terms “temporal data”
>> or “temporal table”.*
>>
>> What about defining the term "temporal table" to be "a table is
>> changing over time", the same as the "dynamic table".
>> A "versioned temporal table" is a special temporal table which tracks the
>> full history of the changing table and supports point-in-time access.
>> A regular temporal table only supports access to the current data (no
>> earlier versions).
>>
>> In this way, the "temporal join" still makes sense, because the joined
>> table is a "temporal table", and all the tables in Flink is "temporal
>> table" (or dynamic table).
>>
>> Best,
>> Jark
>>
>> [1]:
>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15#what-is-a-system-versioned-temporal-table
>> [2]:
>> https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf
>>
>>
>>
>>
>>
>>
>> On Thu, 20 Aug 2020 at 22:55, Fabian Hueske <fh...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> Yes, just to help user distinguish the difference between versioned
>>>> temporal table and  latest-only temporal table.
>>>>
>>>>
>>> I don't think we help users to understand the differences if we invent
>>> new (IMO confusing) terms ("temporal table without version" or "latest-only
>>> temporal table") instead of using known terminology ("table").
>>>
>>>
>>>> As I suggested before, it might make sense to choose a different name
>>>> for this join instead of using a fancy name for the tables such that the
>>>> name fits the name of the join.
>>>>
>>>> I think lookup join is a physical implementation of Processing-Time
>>>> temporal table join, we can even offer a lookup join implementation for
>>>> Event-time temporal table join if the external system support lookup
>>>> history data.
>>>>
>>>> Versioned temporal table can be used in Event-time temporal table join,
>>>> all temporal table(including versioned temporal tale and regular table) can
>>>> be used in Processing-time temporal table join,
>>>> lookup join is a kind of physical implementation of  Processing-time
>>>> temporal table join which lookups the external system’s data.
>>>>
>>>>
>>> Lookup Join was just a quick shot proposal without putting much thought
>>> into it, but a different name might help to solve the naming issue with
>>> "temporal table without versions".
>>> Another proposal would be "Version Join" or "Table Version Join".
>>>
>>>
>>>> If we agree regular tables are temporal table, I tend to keep
>>>> latest-only temporal table to clarify the temporal and version concept in
>>>> regular table.
>>>>
>>>>
>>> In my understanding access to a table's history is the core property of
>>> temporal tables (that's also how temporal tables are defined for SQL Server
>>> [1] or PostgreSQL [2]).
>>> A "temporal table without versions"or "latest-only temporal table" is
>>> just a table. It does not have any other property than all the tables we've
>>> always dealt with.
>>>
>>> So I would say that there are temporal tables (for which earlier
>>> versions can be accessed) and just tables like those that our users have
>>> always used so far (for which there are no earlier versions).
>>>
>>> Maybe it's just me, but "temporal table without versions" or
>>> "latest-only temporal table" sound like contradictions to me, similar to
>>> "tiny skyscraper".
>>>
>>> How about we try to get a few more opinions on this?
>>> What do others think about this issue?
>>>
>>>
>>>> Best
>>>> Leonard
>>>>
>>>>
>>> Best,
>>> Fabian
>>>
>>> [1]
>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15
>>> [2] https://pgxn.org/dist/temporal_tables/
>>>
>>>
>>> Cheers,
>>>> Fabian
>>>>
>>>> Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <
>>>> xbjtdcq@gmail.com>:
>>>>
>>>>> Thanks Fabian and Seth for the feedback
>>>>>
>>>>> I agree the name “temporal table without version” is less accurate
>>>>> because this kind of temporal table has a latest version rather than has no
>>>>> version.
>>>>>
>>>>> How about “Latest-only temporal table” ? The related concept section
>>>>> updated as following:
>>>>>
>>>>> *Temporal Table:* Temporal table is a table that evolves over time, rows
>>>>> in temporal table are associated with one or more temporal periods.
>>>>>
>>>>> *    Version:* A temporal table can split into a set of versioned
>>>>> table snapshots, the *version* in table snapshots represents the
>>>>> valid life circle of rows, the start time and the end time of the valid
>>>>> period can be  assigned by users.
>>>>>
>>>>> *    Versioned temporal table*: If the row in temporal table can
>>>>> track its history changes and visit its history versions, we called this
>>>>> kind of temporal table as versioned temporal table.
>>>>>
>>>>> *    Latest-only temporal table**:* If the row in temporal table can
>>>>> only track its latest version, we called this kind of temporal table as
>>>>> latest-only temporal table. The temporal table in lookup join can only
>>>>> track its latest version and thus it's also a latest-only temporal table.
>>>>>
>>>>> Best
>>>>> Leonard
>>>>>
>>>>> 在 2020年8月19日,04:46,Seth Wiesman <sj...@gmail.com> 写道:
>>>>>
>>>>> +1 to the updated design.
>>>>>
>>>>> I agree with Fabian that the naming of "temporal table without
>>>>> version" is
>>>>> a bit confusing but the actual semantics make sense to me. I think just
>>>>> saying its a Flink managed lookup join makes sense.
>>>>>
>>>>> Seth
>>>>>
>>>>> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fh...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Thanks for the updated FLIP Leonard!
>>>>> In my opinion this was an improvement.
>>>>> So +1 for this design.
>>>>>
>>>>> I have just one remark regarding the terminology.
>>>>> I find the term "Temporal Table without Version" somewhat confusing.
>>>>> IMO, versions are the core principle of temporal tables and temporal
>>>>> tables without versions don't make much sense to me.
>>>>>
>>>>> What makes such a table a "Temporal" table? Isn't it just a regular
>>>>> table?
>>>>> If I understand the proposal correctly, "Temporal Tables without
>>>>> Version"
>>>>> can only be used in processing time temporal table joins, because this
>>>>> join
>>>>> only requests the current version.
>>>>> But all regular tables can be used in processing time (temporal) table
>>>>> joins as well.
>>>>> It's basically the same as a lookup join, with the only difference that
>>>>> the table is maintained in Flink and not accessed in an external system
>>>>> (for example via JDBC).
>>>>>
>>>>> Are "Temporal Tables without Version" called "Temporal" because they
>>>>> can
>>>>> be used in "processing time temporal table joins" and due to its name
>>>>> this
>>>>> join needs to join something that's called "Temporal"?
>>>>> In that case, we might want to rename "processing time temporal table
>>>>> joins" into something else that does not imply a versioning.
>>>>> Maybe we can call them just lookup joins to avoid introducing another
>>>>> term?
>>>>>
>>>>> Thanks, Fabian
>>>>>
>>>>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <
>>>>> lirui.fudan@gmail.com>:
>>>>>
>>>>> Thanks Leonard for the clarifications!
>>>>>
>>>>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xb...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> But are we still able to track different views of such a
>>>>> table through time, as rows are added/deleted to/from the table?
>>>>>
>>>>>
>>>>> Yes, in fact we support temporal table from changlog which contains all
>>>>> possible message types(INSERT/UPDATE/DELETE).
>>>>>
>>>>> For
>>>>> example, suppose I have an append-only table source with event-time
>>>>>
>>>>> and PK,
>>>>>
>>>>> will I be allowed to do an event-time temporal join with this table?
>>>>>
>>>>> Yes, I list some examples in the doc, the example versioned_rates3  is
>>>>> this case exactly.
>>>>>
>>>>> Best
>>>>> Leonard
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>>>
>>>>> xbjtdcq@gmail.com>> wrote:
>>>>>
>>>>>
>>>>> Hi, all
>>>>>
>>>>> After a detailed offline discussion about the temporal table related
>>>>> concept and behavior, we had a reliable solution and rejected several
>>>>> alternatives.
>>>>> Compared to rejected alternatives, the proposed approach is a more
>>>>>
>>>>> unified
>>>>>
>>>>> story and also friendly to user and current Flink framework.
>>>>> I improved the FLIP[1] with the proposed approach and refactored the
>>>>> document organization to make it clear enough.
>>>>>
>>>>> Please let me know if you have any concerns, I’m looking forward your
>>>>> comments.
>>>>>
>>>>>
>>>>> Best
>>>>> Leonard
>>>>>
>>>>> [1]
>>>>>
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>
>>>>> <
>>>>>
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>> <
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>>>
>>>>> xbjtdcq@gmail.com>> 写道:
>>>>>
>>>>>
>>>>> Hi, all
>>>>>
>>>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>>>>
>>>>> Best
>>>>> Leonard
>>>>> [1]
>>>>>
>>>>>
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>> <
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>
>>>>>
>>>>> <
>>>>>
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>> <
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>>>
>>>>> xbjtdcq@gmail.com> <mailto:
>>>>>
>>>>> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>>>>
>>>>>
>>>>> Hi, Timo
>>>>>
>>>>> Thanks for you response.
>>>>>
>>>>> 1) Naming: Is operation time a good term for this concept? If I
>>>>>
>>>>> read
>>>>>
>>>>> "The operation time is the time when the changes happened in system."
>>>>>
>>>>> or
>>>>>
>>>>> "The system time of DML execution in database", why don't we call it
>>>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>>>>
>>>>> time in
>>>>>
>>>>> Flink should be thought through.
>>>>>
>>>>>
>>>>> I agree that we should thought through. I have considered the name
>>>>>
>>>>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>>>>
>>>>> the
>>>>>
>>>>> name.
>>>>>
>>>>>
>>>>> I proposed `operationTime` because most changelog comes from
>>>>>
>>>>> Database
>>>>>
>>>>> and we always called an action as `operation` rather than `change` in
>>>>> Database, the operation time is  easier to understand  for database
>>>>>
>>>>> users,
>>>>>
>>>>> but it's more like a database terminology.
>>>>>
>>>>>
>>>>> For `SystemTime`, user may confuse which one does the system in
>>>>>
>>>>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>>>>
>>>>> not a
>>>>>
>>>>> good name.
>>>>>
>>>>>
>>>>> `ChangelogTime` is a pretty choice which is more unified with
>>>>>
>>>>> existed
>>>>>
>>>>> terminology `Changelog` and `ChangelogMode`, so let me use
>>>>>
>>>>> `ChangelogTime`
>>>>>
>>>>> and I’ll update the FLIP.
>>>>>
>>>>>
>>>>>
>>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>>>>
>>>>> expose the concept of time through the user-level `Row` type? The
>>>>>
>>>>> FLIP does
>>>>>
>>>>> not mention this explictly. I think we can keep it as an internal
>>>>>
>>>>> concept
>>>>>
>>>>> but I just wanted to ask for clarification.
>>>>>
>>>>>
>>>>> Yes, I want to keep it as an internal concept, we have discussed
>>>>>
>>>>> that
>>>>>
>>>>> changelog time concept should be the third time concept(the other two
>>>>>
>>>>> are
>>>>>
>>>>> event-time and processing-time). It’s not easy for normal users(or to
>>>>>
>>>>> help
>>>>>
>>>>> normal users) understand the three concepts accurately, and I did not
>>>>>
>>>>> find
>>>>>
>>>>> a big enough scenario that user need to touch the changelog time for
>>>>>
>>>>> now,
>>>>>
>>>>> so I tend to do not expose the concept to users.
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>> Leonard
>>>>>
>>>>>
>>>>>
>>>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>>>
>>>>> Thanks Konstantin,
>>>>> Regarding your questions, hope my comments has address your
>>>>>
>>>>> questions
>>>>>
>>>>> and I also add a few explanation in the FLIP.
>>>>>
>>>>> Thank you all for the feedback,
>>>>> It seems everyone involved  in this thread has reached a
>>>>>
>>>>> consensus.
>>>>>
>>>>> I will start a vote thread  later.
>>>>> Best,
>>>>> Leonard
>>>>>
>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
>>>>>
>>>>> godfreyhe@gmail.com> <mailto:
>>>>>
>>>>> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
>>>>>
>>>>>
>>>>> Thanks Lennard for driving this FLIP.
>>>>> Looks good to me.
>>>>>
>>>>> Best,
>>>>> Godfrey
>>>>>
>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:
>>>>>
>>>>> imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
>>>>>
>>>>> 下午12:04写道:
>>>>>
>>>>>
>>>>> Thanks Leonard for the great FLIP. I think it is in very good
>>>>>
>>>>> shape.
>>>>>
>>>>> +1 to start a vote.
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
>>>>>
>>>>> <ma...@gmail.com>
>>>>>
>>>>> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
>>>>>
>>>>>
>>>>> Hi Leonard,
>>>>>
>>>>> Thanks for this FLIP!
>>>>> Looks good from my side.
>>>>>
>>>>> Cheers, Fabian
>>>>>
>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:
>>>>>
>>>>> sjwiesman@gmail.com <ma...@gmail.com>>
>>>>>
>>>>> :
>>>>>
>>>>>
>>>>> Hi Leondard,
>>>>>
>>>>> Thank you for pushing this, I think the updated syntax looks
>>>>>
>>>>> really
>>>>>
>>>>> good
>>>>>
>>>>> and the semantics make sense to me.
>>>>>
>>>>> +1
>>>>>
>>>>> Seth
>>>>>
>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>>>>
>>>>> xbjtdcq@gmail.com <ma...@gmail.com>
>>>>>
>>>>> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>>>>
>>>>>
>>>>> Hi, Konstantin
>>>>>
>>>>>
>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>>>>
>>>>> joined
>>>>>
>>>>> on
>>>>>
>>>>> the
>>>>>
>>>>> PRIMARY KEY attribute, correct?
>>>>>
>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>
>>>>>
>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>>>>
>>>>> VIEW
>>>>>
>>>>> definition that defines
>>>>>
>>>>> whether a event-time or processing time temporal table join
>>>>>
>>>>> is
>>>>>
>>>>> used?
>>>>>
>>>>>
>>>>> I think event-time or processing-time temporal table join
>>>>>
>>>>> depends on
>>>>>
>>>>> fact
>>>>>
>>>>> table’s time attribute in temporal join rather than from
>>>>>
>>>>> temporal
>>>>>
>>>>> table
>>>>>
>>>>> side, the event-time or processing time in temporal table is
>>>>>
>>>>> just
>>>>>
>>>>> used
>>>>>
>>>>> to
>>>>>
>>>>> split the validity period of versioned snapshot of temporal
>>>>>
>>>>> table.
>>>>>
>>>>> The
>>>>>
>>>>> processing time attribute is not  necessary for temporal
>>>>>
>>>>> table
>>>>>
>>>>> without
>>>>>
>>>>> version, only the primary key is required, the following
>>>>>
>>>>> VIEW is
>>>>>
>>>>> also
>>>>>
>>>>> valid
>>>>>
>>>>> for temporal table without version.
>>>>> CREATE VIEW latest_rates AS
>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>>>>
>>>>> latest
>>>>>
>>>>> version
>>>>> FROM rates
>>>>> GROUP BY currency;                           -- inferred
>>>>>
>>>>> primary
>>>>>
>>>>> key
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 3) A "Versioned Temporal Table DDL on source" is always
>>>>>
>>>>> versioned
>>>>>
>>>>> on
>>>>>
>>>>> operation_time regardless of the lookup table attribute
>>>>>
>>>>> (event-time
>>>>>
>>>>> or
>>>>>
>>>>> processing time attribute), correct?
>>>>>
>>>>>
>>>>>
>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>>>>
>>>>> using the
>>>>>
>>>>> o.time
>>>>>
>>>>> value to lookup the version of the temporal table.
>>>>> For fact table has the processing time attribute, it means
>>>>>
>>>>> only
>>>>>
>>>>> lookup
>>>>>
>>>>> the
>>>>>
>>>>> latest version of temporal table and we can do some
>>>>>
>>>>> optimization
>>>>>
>>>>> in
>>>>>
>>>>> implementation like only keep the latest version.
>>>>>
>>>>>
>>>>> Best
>>>>> Leonard
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards!
>>>>> Rui Li
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards!
>>>>> Rui Li
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>
> --
> Best regards!
> Rui Li
>
>
>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Leonard Xu <xb...@gmail.com>.
Thanks @Fabian @Jark @Rui for sharing your opinions.

For the the small divergence about choose a temporal join name or temporal table name,
I don't have strong inclination.

Regarding to choose a different name for this join:
I agree with Jark and Rui to keep the existing  "event-time temporal join" and "processing-time temporal join".

Regarding to name the "temporal table without versions/ latest-only temporal table":
Although Flink has imported Temporal Table concept for users[1], I think users are more familiar with dynamic table concept.
If we wouldn't like to import more temporal table concepts, Rui's proposal using "versioned table" for versioned dynamic table and "regular table" for regular dynamic table is fine to me.

HDYT? @Fabian  @Jark


Best
Leonard 
[1]https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/temporal_tables.html <https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/temporal_tables.html> 

> 在 2020年8月21日,14:18,Rui Li <li...@gmail.com> 写道:
> 
> Hi guys,
> 
> Just my two cents.
> 
> I agree with Jark that we should use "event-time/processing-time temporal join" as the name for this join.
> 
> But I'm not sure about the definition of "temporal table".
> 
> Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a non-temporal table, which also sounds like contradictions.
> 
> Why would we require "temporal join" only joins a "temporal table", if the standard doesn't even define what is a "temporal table"?
> 
> What about defining the term "temporal table" to be "a table is changing over time", the same as the "dynamic table".
>  
> I'd prefer not to introduce another term if it has the same meaning as "dynamic table".
> 
> I wonder whether it's possible to consider "temporal join" as a specific kind of join that may work with regular or versioned tables, instead
> of a join that requires a specific kind of table.
> 
> On Fri, Aug 21, 2020 at 1:45 PM Jark Wu <imjark@gmail.com <ma...@gmail.com>> wrote:
> Hi everyone,
> 
> Thank you for the great discussion @Leonard and @Fabian.
> 
> Regarding to choose a different name for this join:
> 
> From my point of view, I don't agree to introduce a new grammar called whatever "lookup join" or "version join", because:
> 1. "lookup" is a physical behavior not a logical concept. 
> 2. The SQL standard has proposed temporal join and it fits Flink well with the event-time and processing-time attributes. 
> 3. We already have so many different join grammer, e.g. "regular join", "interval join", "temporal join", and maybe "window join" in the future.
>     It may confuse users when having more join concepts. 
> 
> So I think the existing "event-time temporal join" and "processing-time temporal join" work well and we should still use them. 
> 
> Regarding to the "temporal table without versions":
> 
> I agree there are contradictions between "temporal table" and "temporal table without versions" if we think "temporal table" tracks full history.
> However, if we call "temporal table without versions" as a "regular table", not a kind of "temporal table". 
> Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a non-temporal table, which also sounds like contradictions.
> 
> I also notice that in SQL Server, the "system-versioned temporal table" is a combination of "Temporal Table" (current data) and "History Table" [1]. 
> The "Temporal Table" is the current data of the database which is the same as our definition of "temporal table without version". The documentation says:
> 
> > This additional table is referred to as the history table, while the main table that stores current (actual) row versions is referred to as the current table or simply as the temporal table.
> 
> Besides, SQL:2011 doesn't define the meaning of "temporal table" [2], so I think we can define the meaning ourselves.
> 
> > Interestingly, SQL:2011 manages to provide this (system-versioned table) support without actually defining or using the terms “temporal data” or “temporal table”.
> 
> What about defining the term "temporal table" to be "a table is changing over time", the same as the "dynamic table".
> A "versioned temporal table" is a special temporal table which tracks the full history of the changing table and supports point-in-time access. 
> A regular temporal table only supports access to the current data (no earlier versions).
> 
> In this way, the "temporal join" still makes sense, because the joined table is a "temporal table", and all the tables in Flink is "temporal table" (or dynamic table).
> 
> Best,
> Jark
> 
> [1]: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15#what-is-a-system-versioned-temporal-table <https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15#what-is-a-system-versioned-temporal-table>
> [2]: https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf <https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf>
> 
> 
> 
> 
> 
> 
> On Thu, 20 Aug 2020 at 22:55, Fabian Hueske <fhueske@gmail.com <ma...@gmail.com>> wrote:
> Hi everyone,
> 
> Yes, just to help user distinguish the difference between versioned temporal table and  latest-only temporal table.
> 
> 
> I don't think we help users to understand the differences if we invent new (IMO confusing) terms ("temporal table without version" or "latest-only temporal table") instead of using known terminology ("table").
>  
>> As I suggested before, it might make sense to choose a different name for this join instead of using a fancy name for the tables such that the name fits the name of the join.
> I think lookup join is a physical implementation of Processing-Time temporal table join, we can even offer a lookup join implementation for Event-time temporal table join if the external system support lookup history data.
> 
> Versioned temporal table can be used in Event-time temporal table join, all temporal table(including versioned temporal tale and regular table) can be used in Processing-time temporal table join, 
> lookup join is a kind of physical implementation of  Processing-time temporal table join which lookups the external system’s data.
> 
> 
> Lookup Join was just a quick shot proposal without putting much thought into it, but a different name might help to solve the naming issue with "temporal table without versions".
> Another proposal would be "Version Join" or "Table Version Join".
>  
> If we agree regular tables are temporal table, I tend to keep latest-only temporal table to clarify the temporal and version concept in regular table.
> 
> 
> In my understanding access to a table's history is the core property of temporal tables (that's also how temporal tables are defined for SQL Server [1] or PostgreSQL [2]).
> A "temporal table without versions"or "latest-only temporal table" is just a table. It does not have any other property than all the tables we've always dealt with.
> 
> So I would say that there are temporal tables (for which earlier versions can be accessed) and just tables like those that our users have always used so far (for which there are no earlier versions).
> 
> Maybe it's just me, but "temporal table without versions" or "latest-only temporal table" sound like contradictions to me, similar to "tiny skyscraper".
> 
> How about we try to get a few more opinions on this?
> What do others think about this issue?
> 
> 
> Best
> Leonard
> 
> 
> Best, 
> Fabian
> 
> [1] https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15 <https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15>
> [2] https://pgxn.org/dist/temporal_tables/ <https://pgxn.org/dist/temporal_tables/>
> 
> 
>> Cheers, 
>> Fabian
>> 
>> Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>>:
>> Thanks Fabian and Seth for the feedback
>> 
>> I agree the name “temporal table without version” is less accurate because this kind of temporal table has a latest version rather than has no version.
>>  
>> How about “Latest-only temporal table” ? The related concept section updated as following:
>> 
>> Temporal Table: Temporal table is a table that evolves over time, rows in temporal table are associated with one or more temporal periods.
>>     Version: A temporal table can split into a set of versioned table snapshots, the version in table snapshots represents the valid life circle of rows, the start time and the end time of the valid period can be  assigned by users.
>>     Versioned temporal table: If the row in temporal table can track its history changes and visit its history versions, we called this kind of temporal table as versioned temporal table.
>>     Latest-only temporal table: If the row in temporal table can only track its latest version, we called this kind of temporal table as latest-only temporal table. The temporal table in lookup join can only track its latest version and thus it's also a latest-only temporal table.
>> 
>> Best
>> Leonard
>> 
>>> 在 2020年8月19日,04:46,Seth Wiesman <sjwiesman@gmail.com <ma...@gmail.com>> 写道:
>>> 
>>> +1 to the updated design.
>>> 
>>> I agree with Fabian that the naming of "temporal table without version" is
>>> a bit confusing but the actual semantics make sense to me. I think just
>>> saying its a Flink managed lookup join makes sense.
>>> 
>>> Seth
>>> 
>>> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fhueske@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>>> Thanks for the updated FLIP Leonard!
>>>> In my opinion this was an improvement.
>>>> So +1 for this design.
>>>> 
>>>> I have just one remark regarding the terminology.
>>>> I find the term "Temporal Table without Version" somewhat confusing.
>>>> IMO, versions are the core principle of temporal tables and temporal
>>>> tables without versions don't make much sense to me.
>>>> 
>>>> What makes such a table a "Temporal" table? Isn't it just a regular table?
>>>> If I understand the proposal correctly, "Temporal Tables without Version"
>>>> can only be used in processing time temporal table joins, because this join
>>>> only requests the current version.
>>>> But all regular tables can be used in processing time (temporal) table
>>>> joins as well.
>>>> It's basically the same as a lookup join, with the only difference that
>>>> the table is maintained in Flink and not accessed in an external system
>>>> (for example via JDBC).
>>>> 
>>>> Are "Temporal Tables without Version" called "Temporal" because they can
>>>> be used in "processing time temporal table joins" and due to its name this
>>>> join needs to join something that's called "Temporal"?
>>>> In that case, we might want to rename "processing time temporal table
>>>> joins" into something else that does not imply a versioning.
>>>> Maybe we can call them just lookup joins to avoid introducing another term?
>>>> 
>>>> Thanks, Fabian
>>>> 
>>>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <lirui.fudan@gmail.com <ma...@gmail.com>>:
>>>> 
>>>>> Thanks Leonard for the clarifications!
>>>>> 
>>>>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> wrote:
>>>>> 
>>>>>> 
>>>>>>> But are we still able to track different views of such a
>>>>>>> table through time, as rows are added/deleted to/from the table?
>>>>>> 
>>>>>> Yes, in fact we support temporal table from changlog which contains all
>>>>>> possible message types(INSERT/UPDATE/DELETE).
>>>>>> 
>>>>>>> For
>>>>>>> example, suppose I have an append-only table source with event-time
>>>>>> and PK,
>>>>>>> will I be allowed to do an event-time temporal join with this table?
>>>>>> Yes, I list some examples in the doc, the example versioned_rates3  is
>>>>>> this case exactly.
>>>>>> 
>>>>>> Best
>>>>>> Leonard
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:
>>>>>> xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>>>>>> 
>>>>>>>> Hi, all
>>>>>>>> 
>>>>>>>> After a detailed offline discussion about the temporal table related
>>>>>>>> concept and behavior, we had a reliable solution and rejected several
>>>>>>>> alternatives.
>>>>>>>> Compared to rejected alternatives, the proposed approach is a more
>>>>>> unified
>>>>>>>> story and also friendly to user and current Flink framework.
>>>>>>>> I improved the FLIP[1] with the proposed approach and refactored the
>>>>>>>> document organization to make it clear enough.
>>>>>>>> 
>>>>>>>> Please let me know if you have any concerns, I’m looking forward your
>>>>>>>> comments.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best
>>>>>>>> Leonard
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> 
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>>> <
>>>>>>>> 
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>> <
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:
>>>>>> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>>>>>>>> 
>>>>>>>>> Hi, all
>>>>>>>>> 
>>>>>>>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>>>>>>>> 
>>>>>>>>> Best
>>>>>>>>> Leonard
>>>>>>>>> [1]
>>>>>>>> 
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>> <
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>> 
>>>>>>>> <
>>>>>>>> 
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>> <
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:
>>>>>> xbjtdcq@gmail.com <ma...@gmail.com>> <mailto:
>>>>>>>> xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>>> 写道:
>>>>>>>>>> 
>>>>>>>>>> Hi, Timo
>>>>>>>>>> 
>>>>>>>>>> Thanks for you response.
>>>>>>>>>> 
>>>>>>>>>>> 1) Naming: Is operation time a good term for this concept? If I
>>>>>> read
>>>>>>>> "The operation time is the time when the changes happened in system."
>>>>>> or
>>>>>>>> "The system time of DML execution in database", why don't we call it
>>>>>>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>>>>> time in
>>>>>>>> Flink should be thought through.
>>>>>>>>>> 
>>>>>>>>>> I agree that we should thought through. I have considered the name
>>>>>>>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>>>>> the
>>>>>>>> name.
>>>>>>>>>> 
>>>>>>>>>> I proposed `operationTime` because most changelog comes from
>>>>>> Database
>>>>>>>> and we always called an action as `operation` rather than `change` in
>>>>>>>> Database, the operation time is  easier to understand  for database
>>>>>> users,
>>>>>>>> but it's more like a database terminology.
>>>>>>>>>> 
>>>>>>>>>> For `SystemTime`, user may confuse which one does the system in
>>>>>>>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>>>>> not a
>>>>>>>> good name.
>>>>>>>>>> 
>>>>>>>>>> `ChangelogTime` is a pretty choice which is more unified with
>>>>>> existed
>>>>>>>> terminology `Changelog` and `ChangelogMode`, so let me use
>>>>>> `ChangelogTime`
>>>>>>>> and I’ll update the FLIP.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>>>>>>> expose the concept of time through the user-level `Row` type? The
>>>>>> FLIP does
>>>>>>>> not mention this explictly. I think we can keep it as an internal
>>>>>> concept
>>>>>>>> but I just wanted to ask for clarification.
>>>>>>>>>> 
>>>>>>>>>> Yes, I want to keep it as an internal concept, we have discussed
>>>>>> that
>>>>>>>> changelog time concept should be the third time concept(the other two
>>>>>> are
>>>>>>>> event-time and processing-time). It’s not easy for normal users(or to
>>>>>> help
>>>>>>>> normal users) understand the three concepts accurately, and I did not
>>>>>> find
>>>>>>>> a big enough scenario that user need to touch the changelog time for
>>>>>> now,
>>>>>>>> so I tend to do not expose the concept to users.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Leonard
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>>>>>>>>>> Thanks Konstantin,
>>>>>>>>>>>> Regarding your questions, hope my comments has address your
>>>>>> questions
>>>>>>>> and I also add a few explanation in the FLIP.
>>>>>>>>>>>> Thank you all for the feedback,
>>>>>>>>>>>> It seems everyone involved  in this thread has reached a
>>>>>> consensus.
>>>>>>>>>>>> I will start a vote thread  later.
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Leonard
>>>>>>>>>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <ma...@gmail.com> <mailto:
>>>>>> godfreyhe@gmail.com <ma...@gmail.com>> <mailto:
>>>>>>>> godfreyhe@gmail.com <ma...@gmail.com> <mailto:godfreyhe@gmail.com <ma...@gmail.com>>>> 写道:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks Lennard for driving this FLIP.
>>>>>>>>>>>>> Looks good to me.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Godfrey
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:imjark@gmail.com <ma...@gmail.com>> <mailto:
>>>>>> imjark@gmail.com <ma...@gmail.com> <mailto:imjark@gmail.com <ma...@gmail.com>>>> 于2020年8月3日周一
>>>>>>>> 下午12:04写道:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks Leonard for the great FLIP. I think it is in very good
>>>>>> shape.
>>>>>>>>>>>>>> +1 to start a vote.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jark
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com <ma...@gmail.com>
>>>>>> <mailto:fhueske@gmail.com <ma...@gmail.com>>
>>>>>>>> <mailto:fhueske@gmail.com <ma...@gmail.com> <mailto:fhueske@gmail.com <ma...@gmail.com>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Leonard,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for this FLIP!
>>>>>>>>>>>>>>> Looks good from my side.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Cheers, Fabian
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>>>>>>>>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:sjwiesman@gmail.com <ma...@gmail.com>> <mailto:
>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:sjwiesman@gmail.com <ma...@gmail.com>>>
>>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Leondard,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thank you for pushing this, I think the updated syntax looks
>>>>>>>> really
>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>>> and the semantics make sense to me.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Seth
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>>>>> xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>
>>>>>>>> <mailto:xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi, Konstantin
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>>>>>>> joined
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> PRIMARY KEY attribute, correct?
>>>>>>>>>>>>>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>>>>>>> VIEW
>>>>>>>>>>>>>>>>> definition that defines
>>>>>>>>>>>>>>>>>> whether a event-time or processing time temporal table join
>>>>>> is
>>>>>>>>>>>>>> used?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I think event-time or processing-time temporal table join
>>>>>>>> depends on
>>>>>>>>>>>>>>> fact
>>>>>>>>>>>>>>>>> table’s time attribute in temporal join rather than from
>>>>>> temporal
>>>>>>>>>>>>>> table
>>>>>>>>>>>>>>>>> side, the event-time or processing time in temporal table is
>>>>>> just
>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> split the validity period of versioned snapshot of temporal
>>>>>>>> table.
>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>> processing time attribute is not  necessary for temporal
>>>>>> table
>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>> version, only the primary key is required, the following
>>>>>> VIEW is
>>>>>>>> also
>>>>>>>>>>>>>>>> valid
>>>>>>>>>>>>>>>>> for temporal table without version.
>>>>>>>>>>>>>>>>> CREATE VIEW latest_rates AS
>>>>>>>>>>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>>>>>>> latest
>>>>>>>>>>>>>>>>> version
>>>>>>>>>>>>>>>>> FROM rates
>>>>>>>>>>>>>>>>> GROUP BY currency;                           -- inferred
>>>>>> primary
>>>>>>>> key
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always
>>>>>>>> versioned
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>> operation_time regardless of the lookup table attribute
>>>>>>>> (event-time
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>> processing time attribute), correct?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>>>>> using the
>>>>>>>>>>>>>>> o.time
>>>>>>>>>>>>>>>>> value to lookup the version of the temporal table.
>>>>>>>>>>>>>>>>> For fact table has the processing time attribute, it means
>>>>>> only
>>>>>>>>>>>>>> lookup
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> latest version of temporal table and we can do some
>>>>>> optimization
>>>>>>>> in
>>>>>>>>>>>>>>>>> implementation like only keep the latest version.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>> Leonard
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best regards!
>>>>>>> Rui Li
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Best regards!
>>>>> Rui Li
>>>>> 
>>>> 
>> 
> 
> 
> 
> -- 
> Best regards!
> Rui Li


Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Rui Li <li...@gmail.com>.
Hi guys,

Just my two cents.

I agree with Jark that we should use "event-time/processing-time temporal
join" as the name for this join.

But I'm not sure about the definition of "temporal table".

Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a
> non-temporal table, which also sounds like contradictions.
>

Why would we require "temporal join" only joins a "temporal table", if the
standard doesn't even define what is a "temporal table"?

What about defining the term "temporal table" to be "a table is
> changing over time", the same as the "dynamic table".
>

I'd prefer not to introduce another term if it has the same meaning as
"dynamic table".

I wonder whether it's possible to consider "temporal join" as a specific
kind of join that may work with regular or versioned tables, instead
of a join that requires a specific kind of table.

On Fri, Aug 21, 2020 at 1:45 PM Jark Wu <im...@gmail.com> wrote:

> Hi everyone,
>
> Thank you for the great discussion @Leonard and @Fabian.
>
> *Regarding to choose a different name for this join:*
>
> From my point of view, I don't agree to introduce a new grammar called
> whatever "lookup join" or "version join", because:
> 1. "lookup" is a physical behavior not a logical concept.
> 2. The SQL standard has proposed temporal join and it fits Flink well with
> the event-time and processing-time attributes.
> 3. We already have so many different join grammer, e.g. "regular join",
> "interval join", "temporal join", and maybe "window join" in the future.
>     It may confuse users when having more join concepts.
>
> So I think the existing "event-time temporal join" and "processing-time
> temporal join" work well and we should still use them.
>
> *Regarding to the "temporal table without versions":*
>
> I agree there are contradictions between "temporal table" and "temporal
> table without versions" if we think "temporal table" tracks full history.
> However, if we call "temporal table without versions" as a "regular
> table", not a kind of "temporal table".
> Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a
> non-temporal table, which also sounds like contradictions.
>
> I also notice that in SQL Server, the "system-versioned temporal table" is
> a combination of "Temporal Table" (current data) and "History Table" [1].
> The "Temporal Table" is the current data of the database which is the same
> as our definition of "temporal table without version". The documentation
> says:
>
> > *This additional table is referred to as the history table, while the
> main table that stores current (actual) row versions is referred to as the
> current table or simply as the temporal table. *
>
> Besides, SQL:2011 doesn't define the meaning of "temporal table" [2], so I
> think we can define the meaning ourselves.
>
> *> Interestingly, SQL:2011 manages to provide this (system-versioned
> table) support without actually defining or using the terms “temporal data”
> or “temporal table”.*
>
> What about defining the term "temporal table" to be "a table is
> changing over time", the same as the "dynamic table".
> A "versioned temporal table" is a special temporal table which tracks the
> full history of the changing table and supports point-in-time access.
> A regular temporal table only supports access to the current data (no
> earlier versions).
>
> In this way, the "temporal join" still makes sense, because the joined
> table is a "temporal table", and all the tables in Flink is "temporal
> table" (or dynamic table).
>
> Best,
> Jark
>
> [1]:
> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15#what-is-a-system-versioned-temporal-table
> [2]:
> https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf
>
>
>
>
>
>
> On Thu, 20 Aug 2020 at 22:55, Fabian Hueske <fh...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> Yes, just to help user distinguish the difference between versioned
>>> temporal table and  latest-only temporal table.
>>>
>>>
>> I don't think we help users to understand the differences if we invent
>> new (IMO confusing) terms ("temporal table without version" or "latest-only
>> temporal table") instead of using known terminology ("table").
>>
>>
>>> As I suggested before, it might make sense to choose a different name
>>> for this join instead of using a fancy name for the tables such that the
>>> name fits the name of the join.
>>>
>>> I think lookup join is a physical implementation of Processing-Time
>>> temporal table join, we can even offer a lookup join implementation for
>>> Event-time temporal table join if the external system support lookup
>>> history data.
>>>
>>> Versioned temporal table can be used in Event-time temporal table join,
>>> all temporal table(including versioned temporal tale and regular table) can
>>> be used in Processing-time temporal table join,
>>> lookup join is a kind of physical implementation of  Processing-time
>>> temporal table join which lookups the external system’s data.
>>>
>>>
>> Lookup Join was just a quick shot proposal without putting much thought
>> into it, but a different name might help to solve the naming issue with
>> "temporal table without versions".
>> Another proposal would be "Version Join" or "Table Version Join".
>>
>>
>>> If we agree regular tables are temporal table, I tend to keep
>>> latest-only temporal table to clarify the temporal and version concept in
>>> regular table.
>>>
>>>
>> In my understanding access to a table's history is the core property of
>> temporal tables (that's also how temporal tables are defined for SQL Server
>> [1] or PostgreSQL [2]).
>> A "temporal table without versions"or "latest-only temporal table" is
>> just a table. It does not have any other property than all the tables we've
>> always dealt with.
>>
>> So I would say that there are temporal tables (for which earlier versions
>> can be accessed) and just tables like those that our users have always used
>> so far (for which there are no earlier versions).
>>
>> Maybe it's just me, but "temporal table without versions" or "latest-only
>> temporal table" sound like contradictions to me, similar to "tiny
>> skyscraper".
>>
>> How about we try to get a few more opinions on this?
>> What do others think about this issue?
>>
>>
>>> Best
>>> Leonard
>>>
>>>
>> Best,
>> Fabian
>>
>> [1]
>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15
>> [2] https://pgxn.org/dist/temporal_tables/
>>
>>
>> Cheers,
>>> Fabian
>>>
>>> Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <xbjtdcq@gmail.com
>>> >:
>>>
>>>> Thanks Fabian and Seth for the feedback
>>>>
>>>> I agree the name “temporal table without version” is less accurate
>>>> because this kind of temporal table has a latest version rather than has no
>>>> version.
>>>>
>>>> How about “Latest-only temporal table” ? The related concept section
>>>> updated as following:
>>>>
>>>> *Temporal Table:* Temporal table is a table that evolves over time, rows
>>>> in temporal table are associated with one or more temporal periods.
>>>>
>>>> *    Version:* A temporal table can split into a set of versioned
>>>> table snapshots, the *version* in table snapshots represents the valid
>>>> life circle of rows, the start time and the end time of the valid period
>>>> can be  assigned by users.
>>>>
>>>> *    Versioned temporal table*: If the row in temporal table can track
>>>> its history changes and visit its history versions, we called this kind of
>>>> temporal table as versioned temporal table.
>>>>
>>>> *    Latest-only temporal table**:* If the row in temporal table can
>>>> only track its latest version, we called this kind of temporal table as
>>>> latest-only temporal table. The temporal table in lookup join can only
>>>> track its latest version and thus it's also a latest-only temporal table.
>>>>
>>>> Best
>>>> Leonard
>>>>
>>>> 在 2020年8月19日,04:46,Seth Wiesman <sj...@gmail.com> 写道:
>>>>
>>>> +1 to the updated design.
>>>>
>>>> I agree with Fabian that the naming of "temporal table without version"
>>>> is
>>>> a bit confusing but the actual semantics make sense to me. I think just
>>>> saying its a Flink managed lookup join makes sense.
>>>>
>>>> Seth
>>>>
>>>> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fh...@gmail.com>
>>>> wrote:
>>>>
>>>> Thanks for the updated FLIP Leonard!
>>>> In my opinion this was an improvement.
>>>> So +1 for this design.
>>>>
>>>> I have just one remark regarding the terminology.
>>>> I find the term "Temporal Table without Version" somewhat confusing.
>>>> IMO, versions are the core principle of temporal tables and temporal
>>>> tables without versions don't make much sense to me.
>>>>
>>>> What makes such a table a "Temporal" table? Isn't it just a regular
>>>> table?
>>>> If I understand the proposal correctly, "Temporal Tables without
>>>> Version"
>>>> can only be used in processing time temporal table joins, because this
>>>> join
>>>> only requests the current version.
>>>> But all regular tables can be used in processing time (temporal) table
>>>> joins as well.
>>>> It's basically the same as a lookup join, with the only difference that
>>>> the table is maintained in Flink and not accessed in an external system
>>>> (for example via JDBC).
>>>>
>>>> Are "Temporal Tables without Version" called "Temporal" because they can
>>>> be used in "processing time temporal table joins" and due to its name
>>>> this
>>>> join needs to join something that's called "Temporal"?
>>>> In that case, we might want to rename "processing time temporal table
>>>> joins" into something else that does not imply a versioning.
>>>> Maybe we can call them just lookup joins to avoid introducing another
>>>> term?
>>>>
>>>> Thanks, Fabian
>>>>
>>>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <
>>>> lirui.fudan@gmail.com>:
>>>>
>>>> Thanks Leonard for the clarifications!
>>>>
>>>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xb...@gmail.com> wrote:
>>>>
>>>>
>>>> But are we still able to track different views of such a
>>>> table through time, as rows are added/deleted to/from the table?
>>>>
>>>>
>>>> Yes, in fact we support temporal table from changlog which contains all
>>>> possible message types(INSERT/UPDATE/DELETE).
>>>>
>>>> For
>>>> example, suppose I have an append-only table source with event-time
>>>>
>>>> and PK,
>>>>
>>>> will I be allowed to do an event-time temporal join with this table?
>>>>
>>>> Yes, I list some examples in the doc, the example versioned_rates3  is
>>>> this case exactly.
>>>>
>>>> Best
>>>> Leonard
>>>>
>>>>
>>>>
>>>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>>
>>>> xbjtdcq@gmail.com>> wrote:
>>>>
>>>>
>>>> Hi, all
>>>>
>>>> After a detailed offline discussion about the temporal table related
>>>> concept and behavior, we had a reliable solution and rejected several
>>>> alternatives.
>>>> Compared to rejected alternatives, the proposed approach is a more
>>>>
>>>> unified
>>>>
>>>> story and also friendly to user and current Flink framework.
>>>> I improved the FLIP[1] with the proposed approach and refactored the
>>>> document organization to make it clear enough.
>>>>
>>>> Please let me know if you have any concerns, I’m looking forward your
>>>> comments.
>>>>
>>>>
>>>> Best
>>>> Leonard
>>>>
>>>> [1]
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>
>>>> <
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>> <
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>>
>>>> xbjtdcq@gmail.com>> 写道:
>>>>
>>>>
>>>> Hi, all
>>>>
>>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>>>
>>>> Best
>>>> Leonard
>>>> [1]
>>>>
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>> <
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>
>>>>
>>>> <
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>> <
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>
>>>>
>>>>
>>>>
>>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>>
>>>> xbjtdcq@gmail.com> <mailto:
>>>>
>>>> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>>>
>>>>
>>>> Hi, Timo
>>>>
>>>> Thanks for you response.
>>>>
>>>> 1) Naming: Is operation time a good term for this concept? If I
>>>>
>>>> read
>>>>
>>>> "The operation time is the time when the changes happened in system."
>>>>
>>>> or
>>>>
>>>> "The system time of DML execution in database", why don't we call it
>>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>>>
>>>> time in
>>>>
>>>> Flink should be thought through.
>>>>
>>>>
>>>> I agree that we should thought through. I have considered the name
>>>>
>>>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>>>
>>>> the
>>>>
>>>> name.
>>>>
>>>>
>>>> I proposed `operationTime` because most changelog comes from
>>>>
>>>> Database
>>>>
>>>> and we always called an action as `operation` rather than `change` in
>>>> Database, the operation time is  easier to understand  for database
>>>>
>>>> users,
>>>>
>>>> but it's more like a database terminology.
>>>>
>>>>
>>>> For `SystemTime`, user may confuse which one does the system in
>>>>
>>>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>>>
>>>> not a
>>>>
>>>> good name.
>>>>
>>>>
>>>> `ChangelogTime` is a pretty choice which is more unified with
>>>>
>>>> existed
>>>>
>>>> terminology `Changelog` and `ChangelogMode`, so let me use
>>>>
>>>> `ChangelogTime`
>>>>
>>>> and I’ll update the FLIP.
>>>>
>>>>
>>>>
>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>>>
>>>> expose the concept of time through the user-level `Row` type? The
>>>>
>>>> FLIP does
>>>>
>>>> not mention this explictly. I think we can keep it as an internal
>>>>
>>>> concept
>>>>
>>>> but I just wanted to ask for clarification.
>>>>
>>>>
>>>> Yes, I want to keep it as an internal concept, we have discussed
>>>>
>>>> that
>>>>
>>>> changelog time concept should be the third time concept(the other two
>>>>
>>>> are
>>>>
>>>> event-time and processing-time). It’s not easy for normal users(or to
>>>>
>>>> help
>>>>
>>>> normal users) understand the three concepts accurately, and I did not
>>>>
>>>> find
>>>>
>>>> a big enough scenario that user need to touch the changelog time for
>>>>
>>>> now,
>>>>
>>>> so I tend to do not expose the concept to users.
>>>>
>>>>
>>>>
>>>> Best,
>>>> Leonard
>>>>
>>>>
>>>>
>>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>>
>>>> Thanks Konstantin,
>>>> Regarding your questions, hope my comments has address your
>>>>
>>>> questions
>>>>
>>>> and I also add a few explanation in the FLIP.
>>>>
>>>> Thank you all for the feedback,
>>>> It seems everyone involved  in this thread has reached a
>>>>
>>>> consensus.
>>>>
>>>> I will start a vote thread  later.
>>>> Best,
>>>> Leonard
>>>>
>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
>>>>
>>>> godfreyhe@gmail.com> <mailto:
>>>>
>>>> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
>>>>
>>>>
>>>> Thanks Lennard for driving this FLIP.
>>>> Looks good to me.
>>>>
>>>> Best,
>>>> Godfrey
>>>>
>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:
>>>>
>>>> imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
>>>>
>>>> 下午12:04写道:
>>>>
>>>>
>>>> Thanks Leonard for the great FLIP. I think it is in very good
>>>>
>>>> shape.
>>>>
>>>> +1 to start a vote.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
>>>>
>>>> <ma...@gmail.com>
>>>>
>>>> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
>>>>
>>>>
>>>> Hi Leonard,
>>>>
>>>> Thanks for this FLIP!
>>>> Looks good from my side.
>>>>
>>>> Cheers, Fabian
>>>>
>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:
>>>>
>>>> sjwiesman@gmail.com <ma...@gmail.com>>
>>>>
>>>> :
>>>>
>>>>
>>>> Hi Leondard,
>>>>
>>>> Thank you for pushing this, I think the updated syntax looks
>>>>
>>>> really
>>>>
>>>> good
>>>>
>>>> and the semantics make sense to me.
>>>>
>>>> +1
>>>>
>>>> Seth
>>>>
>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>>>
>>>> xbjtdcq@gmail.com <ma...@gmail.com>
>>>>
>>>> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>>>
>>>>
>>>> Hi, Konstantin
>>>>
>>>>
>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>>>
>>>> joined
>>>>
>>>> on
>>>>
>>>> the
>>>>
>>>> PRIMARY KEY attribute, correct?
>>>>
>>>> Yes, the PRIMARY KEY would be join key.
>>>>
>>>>
>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>>>
>>>> VIEW
>>>>
>>>> definition that defines
>>>>
>>>> whether a event-time or processing time temporal table join
>>>>
>>>> is
>>>>
>>>> used?
>>>>
>>>>
>>>> I think event-time or processing-time temporal table join
>>>>
>>>> depends on
>>>>
>>>> fact
>>>>
>>>> table’s time attribute in temporal join rather than from
>>>>
>>>> temporal
>>>>
>>>> table
>>>>
>>>> side, the event-time or processing time in temporal table is
>>>>
>>>> just
>>>>
>>>> used
>>>>
>>>> to
>>>>
>>>> split the validity period of versioned snapshot of temporal
>>>>
>>>> table.
>>>>
>>>> The
>>>>
>>>> processing time attribute is not  necessary for temporal
>>>>
>>>> table
>>>>
>>>> without
>>>>
>>>> version, only the primary key is required, the following
>>>>
>>>> VIEW is
>>>>
>>>> also
>>>>
>>>> valid
>>>>
>>>> for temporal table without version.
>>>> CREATE VIEW latest_rates AS
>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>>>
>>>> latest
>>>>
>>>> version
>>>> FROM rates
>>>> GROUP BY currency;                           -- inferred
>>>>
>>>> primary
>>>>
>>>> key
>>>>
>>>>
>>>>
>>>>
>>>> 3) A "Versioned Temporal Table DDL on source" is always
>>>>
>>>> versioned
>>>>
>>>> on
>>>>
>>>> operation_time regardless of the lookup table attribute
>>>>
>>>> (event-time
>>>>
>>>> or
>>>>
>>>> processing time attribute), correct?
>>>>
>>>>
>>>>
>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>>>
>>>> using the
>>>>
>>>> o.time
>>>>
>>>> value to lookup the version of the temporal table.
>>>> For fact table has the processing time attribute, it means
>>>>
>>>> only
>>>>
>>>> lookup
>>>>
>>>> the
>>>>
>>>> latest version of temporal table and we can do some
>>>>
>>>> optimization
>>>>
>>>> in
>>>>
>>>> implementation like only keep the latest version.
>>>>
>>>>
>>>> Best
>>>> Leonard
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards!
>>>> Rui Li
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards!
>>>> Rui Li
>>>>
>>>>
>>>>
>>>>
>>>

-- 
Best regards!
Rui Li

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Jark Wu <im...@gmail.com>.
Hi everyone,

Thank you for the great discussion @Leonard and @Fabian.

*Regarding to choose a different name for this join:*

From my point of view, I don't agree to introduce a new grammar called
whatever "lookup join" or "version join", because:
1. "lookup" is a physical behavior not a logical concept.
2. The SQL standard has proposed temporal join and it fits Flink well with
the event-time and processing-time attributes.
3. We already have so many different join grammer, e.g. "regular join",
"interval join", "temporal join", and maybe "window join" in the future.
    It may confuse users when having more join concepts.

So I think the existing "event-time temporal join" and "processing-time
temporal join" work well and we should still use them.

*Regarding to the "temporal table without versions":*

I agree there are contradictions between "temporal table" and "temporal
table without versions" if we think "temporal table" tracks full history.
However, if we call "temporal table without versions" as a "regular table",
not a kind of "temporal table".
Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a
non-temporal table, which also sounds like contradictions.

I also notice that in SQL Server, the "system-versioned temporal table" is
a combination of "Temporal Table" (current data) and "History Table" [1].
The "Temporal Table" is the current data of the database which is the same
as our definition of "temporal table without version". The documentation
says:

> *This additional table is referred to as the history table, while the
main table that stores current (actual) row versions is referred to as the
current table or simply as the temporal table. *

Besides, SQL:2011 doesn't define the meaning of "temporal table" [2], so I
think we can define the meaning ourselves.

*> Interestingly, SQL:2011 manages to provide this (system-versioned table)
support without actually defining or using the terms “temporal data” or
“temporal table”.*

What about defining the term "temporal table" to be "a table is
changing over time", the same as the "dynamic table".
A "versioned temporal table" is a special temporal table which tracks the
full history of the changing table and supports point-in-time access.
A regular temporal table only supports access to the current data (no
earlier versions).

In this way, the "temporal join" still makes sense, because the joined
table is a "temporal table", and all the tables in Flink is "temporal
table" (or dynamic table).

Best,
Jark

[1]:
https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15#what-is-a-system-versioned-temporal-table
[2]:
https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf






On Thu, 20 Aug 2020 at 22:55, Fabian Hueske <fh...@gmail.com> wrote:

> Hi everyone,
>
> Yes, just to help user distinguish the difference between versioned
>> temporal table and  latest-only temporal table.
>>
>>
> I don't think we help users to understand the differences if we invent new
> (IMO confusing) terms ("temporal table without version" or "latest-only
> temporal table") instead of using known terminology ("table").
>
>
>> As I suggested before, it might make sense to choose a different name for
>> this join instead of using a fancy name for the tables such that the name
>> fits the name of the join.
>>
>> I think lookup join is a physical implementation of Processing-Time
>> temporal table join, we can even offer a lookup join implementation for
>> Event-time temporal table join if the external system support lookup
>> history data.
>>
>> Versioned temporal table can be used in Event-time temporal table join,
>> all temporal table(including versioned temporal tale and regular table) can
>> be used in Processing-time temporal table join,
>> lookup join is a kind of physical implementation of  Processing-time
>> temporal table join which lookups the external system’s data.
>>
>>
> Lookup Join was just a quick shot proposal without putting much thought
> into it, but a different name might help to solve the naming issue with
> "temporal table without versions".
> Another proposal would be "Version Join" or "Table Version Join".
>
>
>> If we agree regular tables are temporal table, I tend to keep latest-only
>> temporal table to clarify the temporal and version concept in regular table.
>>
>>
> In my understanding access to a table's history is the core property of
> temporal tables (that's also how temporal tables are defined for SQL Server
> [1] or PostgreSQL [2]).
> A "temporal table without versions"or "latest-only temporal table" is just
> a table. It does not have any other property than all the tables we've
> always dealt with.
>
> So I would say that there are temporal tables (for which earlier versions
> can be accessed) and just tables like those that our users have always used
> so far (for which there are no earlier versions).
>
> Maybe it's just me, but "temporal table without versions" or "latest-only
> temporal table" sound like contradictions to me, similar to "tiny
> skyscraper".
>
> How about we try to get a few more opinions on this?
> What do others think about this issue?
>
>
>> Best
>> Leonard
>>
>>
> Best,
> Fabian
>
> [1]
> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15
> [2] https://pgxn.org/dist/temporal_tables/
>
>
> Cheers,
>> Fabian
>>
>> Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <xbjtdcq@gmail.com
>> >:
>>
>>> Thanks Fabian and Seth for the feedback
>>>
>>> I agree the name “temporal table without version” is less accurate
>>> because this kind of temporal table has a latest version rather than has no
>>> version.
>>>
>>> How about “Latest-only temporal table” ? The related concept section
>>> updated as following:
>>>
>>> *Temporal Table:* Temporal table is a table that evolves over time, rows
>>> in temporal table are associated with one or more temporal periods.
>>>
>>> *    Version:* A temporal table can split into a set of versioned table
>>> snapshots, the *version* in table snapshots represents the valid life
>>> circle of rows, the start time and the end time of the valid period can be
>>>  assigned by users.
>>>
>>> *    Versioned temporal table*: If the row in temporal table can track
>>> its history changes and visit its history versions, we called this kind of
>>> temporal table as versioned temporal table.
>>>
>>> *    Latest-only temporal table**:* If the row in temporal table can
>>> only track its latest version, we called this kind of temporal table as
>>> latest-only temporal table. The temporal table in lookup join can only
>>> track its latest version and thus it's also a latest-only temporal table.
>>>
>>> Best
>>> Leonard
>>>
>>> 在 2020年8月19日,04:46,Seth Wiesman <sj...@gmail.com> 写道:
>>>
>>> +1 to the updated design.
>>>
>>> I agree with Fabian that the naming of "temporal table without version"
>>> is
>>> a bit confusing but the actual semantics make sense to me. I think just
>>> saying its a Flink managed lookup join makes sense.
>>>
>>> Seth
>>>
>>> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fh...@gmail.com> wrote:
>>>
>>> Thanks for the updated FLIP Leonard!
>>> In my opinion this was an improvement.
>>> So +1 for this design.
>>>
>>> I have just one remark regarding the terminology.
>>> I find the term "Temporal Table without Version" somewhat confusing.
>>> IMO, versions are the core principle of temporal tables and temporal
>>> tables without versions don't make much sense to me.
>>>
>>> What makes such a table a "Temporal" table? Isn't it just a regular
>>> table?
>>> If I understand the proposal correctly, "Temporal Tables without Version"
>>> can only be used in processing time temporal table joins, because this
>>> join
>>> only requests the current version.
>>> But all regular tables can be used in processing time (temporal) table
>>> joins as well.
>>> It's basically the same as a lookup join, with the only difference that
>>> the table is maintained in Flink and not accessed in an external system
>>> (for example via JDBC).
>>>
>>> Are "Temporal Tables without Version" called "Temporal" because they can
>>> be used in "processing time temporal table joins" and due to its name
>>> this
>>> join needs to join something that's called "Temporal"?
>>> In that case, we might want to rename "processing time temporal table
>>> joins" into something else that does not imply a versioning.
>>> Maybe we can call them just lookup joins to avoid introducing another
>>> term?
>>>
>>> Thanks, Fabian
>>>
>>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <lirui.fudan@gmail.com
>>> >:
>>>
>>> Thanks Leonard for the clarifications!
>>>
>>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xb...@gmail.com> wrote:
>>>
>>>
>>> But are we still able to track different views of such a
>>> table through time, as rows are added/deleted to/from the table?
>>>
>>>
>>> Yes, in fact we support temporal table from changlog which contains all
>>> possible message types(INSERT/UPDATE/DELETE).
>>>
>>> For
>>> example, suppose I have an append-only table source with event-time
>>>
>>> and PK,
>>>
>>> will I be allowed to do an event-time temporal join with this table?
>>>
>>> Yes, I list some examples in the doc, the example versioned_rates3  is
>>> this case exactly.
>>>
>>> Best
>>> Leonard
>>>
>>>
>>>
>>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>
>>> xbjtdcq@gmail.com>> wrote:
>>>
>>>
>>> Hi, all
>>>
>>> After a detailed offline discussion about the temporal table related
>>> concept and behavior, we had a reliable solution and rejected several
>>> alternatives.
>>> Compared to rejected alternatives, the proposed approach is a more
>>>
>>> unified
>>>
>>> story and also friendly to user and current Flink framework.
>>> I improved the FLIP[1] with the proposed approach and refactored the
>>> document organization to make it clear enough.
>>>
>>> Please let me know if you have any concerns, I’m looking forward your
>>> comments.
>>>
>>>
>>> Best
>>> Leonard
>>>
>>> [1]
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>
>>> <
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>> <
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>
>>>
>>>
>>>
>>>
>>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>
>>> xbjtdcq@gmail.com>> 写道:
>>>
>>>
>>> Hi, all
>>>
>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>>
>>> Best
>>> Leonard
>>> [1]
>>>
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>> <
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>
>>>
>>> <
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>> <
>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>
>>>
>>>
>>>
>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>
>>> xbjtdcq@gmail.com> <mailto:
>>>
>>> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>>
>>>
>>> Hi, Timo
>>>
>>> Thanks for you response.
>>>
>>> 1) Naming: Is operation time a good term for this concept? If I
>>>
>>> read
>>>
>>> "The operation time is the time when the changes happened in system."
>>>
>>> or
>>>
>>> "The system time of DML execution in database", why don't we call it
>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>>
>>> time in
>>>
>>> Flink should be thought through.
>>>
>>>
>>> I agree that we should thought through. I have considered the name
>>>
>>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>>
>>> the
>>>
>>> name.
>>>
>>>
>>> I proposed `operationTime` because most changelog comes from
>>>
>>> Database
>>>
>>> and we always called an action as `operation` rather than `change` in
>>> Database, the operation time is  easier to understand  for database
>>>
>>> users,
>>>
>>> but it's more like a database terminology.
>>>
>>>
>>> For `SystemTime`, user may confuse which one does the system in
>>>
>>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>>
>>> not a
>>>
>>> good name.
>>>
>>>
>>> `ChangelogTime` is a pretty choice which is more unified with
>>>
>>> existed
>>>
>>> terminology `Changelog` and `ChangelogMode`, so let me use
>>>
>>> `ChangelogTime`
>>>
>>> and I’ll update the FLIP.
>>>
>>>
>>>
>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>>
>>> expose the concept of time through the user-level `Row` type? The
>>>
>>> FLIP does
>>>
>>> not mention this explictly. I think we can keep it as an internal
>>>
>>> concept
>>>
>>> but I just wanted to ask for clarification.
>>>
>>>
>>> Yes, I want to keep it as an internal concept, we have discussed
>>>
>>> that
>>>
>>> changelog time concept should be the third time concept(the other two
>>>
>>> are
>>>
>>> event-time and processing-time). It’s not easy for normal users(or to
>>>
>>> help
>>>
>>> normal users) understand the three concepts accurately, and I did not
>>>
>>> find
>>>
>>> a big enough scenario that user need to touch the changelog time for
>>>
>>> now,
>>>
>>> so I tend to do not expose the concept to users.
>>>
>>>
>>>
>>> Best,
>>> Leonard
>>>
>>>
>>>
>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>
>>> Thanks Konstantin,
>>> Regarding your questions, hope my comments has address your
>>>
>>> questions
>>>
>>> and I also add a few explanation in the FLIP.
>>>
>>> Thank you all for the feedback,
>>> It seems everyone involved  in this thread has reached a
>>>
>>> consensus.
>>>
>>> I will start a vote thread  later.
>>> Best,
>>> Leonard
>>>
>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
>>>
>>> godfreyhe@gmail.com> <mailto:
>>>
>>> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
>>>
>>>
>>> Thanks Lennard for driving this FLIP.
>>> Looks good to me.
>>>
>>> Best,
>>> Godfrey
>>>
>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:
>>>
>>> imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
>>>
>>> 下午12:04写道:
>>>
>>>
>>> Thanks Leonard for the great FLIP. I think it is in very good
>>>
>>> shape.
>>>
>>> +1 to start a vote.
>>>
>>> Best,
>>> Jark
>>>
>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
>>>
>>> <ma...@gmail.com>
>>>
>>> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
>>>
>>>
>>> Hi Leonard,
>>>
>>> Thanks for this FLIP!
>>> Looks good from my side.
>>>
>>> Cheers, Fabian
>>>
>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:
>>>
>>> sjwiesman@gmail.com <ma...@gmail.com>>
>>>
>>> :
>>>
>>>
>>> Hi Leondard,
>>>
>>> Thank you for pushing this, I think the updated syntax looks
>>>
>>> really
>>>
>>> good
>>>
>>> and the semantics make sense to me.
>>>
>>> +1
>>>
>>> Seth
>>>
>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>>
>>> xbjtdcq@gmail.com <ma...@gmail.com>
>>>
>>> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>>
>>>
>>> Hi, Konstantin
>>>
>>>
>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>>
>>> joined
>>>
>>> on
>>>
>>> the
>>>
>>> PRIMARY KEY attribute, correct?
>>>
>>> Yes, the PRIMARY KEY would be join key.
>>>
>>>
>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>>
>>> VIEW
>>>
>>> definition that defines
>>>
>>> whether a event-time or processing time temporal table join
>>>
>>> is
>>>
>>> used?
>>>
>>>
>>> I think event-time or processing-time temporal table join
>>>
>>> depends on
>>>
>>> fact
>>>
>>> table’s time attribute in temporal join rather than from
>>>
>>> temporal
>>>
>>> table
>>>
>>> side, the event-time or processing time in temporal table is
>>>
>>> just
>>>
>>> used
>>>
>>> to
>>>
>>> split the validity period of versioned snapshot of temporal
>>>
>>> table.
>>>
>>> The
>>>
>>> processing time attribute is not  necessary for temporal
>>>
>>> table
>>>
>>> without
>>>
>>> version, only the primary key is required, the following
>>>
>>> VIEW is
>>>
>>> also
>>>
>>> valid
>>>
>>> for temporal table without version.
>>> CREATE VIEW latest_rates AS
>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>>
>>> latest
>>>
>>> version
>>> FROM rates
>>> GROUP BY currency;                           -- inferred
>>>
>>> primary
>>>
>>> key
>>>
>>>
>>>
>>>
>>> 3) A "Versioned Temporal Table DDL on source" is always
>>>
>>> versioned
>>>
>>> on
>>>
>>> operation_time regardless of the lookup table attribute
>>>
>>> (event-time
>>>
>>> or
>>>
>>> processing time attribute), correct?
>>>
>>>
>>>
>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>>
>>> using the
>>>
>>> o.time
>>>
>>> value to lookup the version of the temporal table.
>>> For fact table has the processing time attribute, it means
>>>
>>> only
>>>
>>> lookup
>>>
>>> the
>>>
>>> latest version of temporal table and we can do some
>>>
>>> optimization
>>>
>>> in
>>>
>>> implementation like only keep the latest version.
>>>
>>>
>>> Best
>>> Leonard
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Best regards!
>>> Rui Li
>>>
>>>
>>>
>>>
>>> --
>>> Best regards!
>>> Rui Li
>>>
>>>
>>>
>>>
>>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Fabian Hueske <fh...@gmail.com>.
Hi everyone,

Yes, just to help user distinguish the difference between versioned
> temporal table and  latest-only temporal table.
>
>
I don't think we help users to understand the differences if we invent new
(IMO confusing) terms ("temporal table without version" or "latest-only
temporal table") instead of using known terminology ("table").


> As I suggested before, it might make sense to choose a different name for
> this join instead of using a fancy name for the tables such that the name
> fits the name of the join.
>
> I think lookup join is a physical implementation of Processing-Time
> temporal table join, we can even offer a lookup join implementation for
> Event-time temporal table join if the external system support lookup
> history data.
>
> Versioned temporal table can be used in Event-time temporal table join,
> all temporal table(including versioned temporal tale and regular table) can
> be used in Processing-time temporal table join,
> lookup join is a kind of physical implementation of  Processing-time
> temporal table join which lookups the external system’s data.
>
>
Lookup Join was just a quick shot proposal without putting much thought
into it, but a different name might help to solve the naming issue with
"temporal table without versions".
Another proposal would be "Version Join" or "Table Version Join".


> If we agree regular tables are temporal table, I tend to keep latest-only
> temporal table to clarify the temporal and version concept in regular table.
>
>
In my understanding access to a table's history is the core property of
temporal tables (that's also how temporal tables are defined for SQL Server
[1] or PostgreSQL [2]).
A "temporal table without versions"or "latest-only temporal table" is just
a table. It does not have any other property than all the tables we've
always dealt with.

So I would say that there are temporal tables (for which earlier versions
can be accessed) and just tables like those that our users have always used
so far (for which there are no earlier versions).

Maybe it's just me, but "temporal table without versions" or "latest-only
temporal table" sound like contradictions to me, similar to "tiny
skyscraper".

How about we try to get a few more opinions on this?
What do others think about this issue?


> Best
> Leonard
>
>
Best,
Fabian

[1]
https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15
[2] https://pgxn.org/dist/temporal_tables/


Cheers,
> Fabian
>
> Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <xb...@gmail.com>:
>
>> Thanks Fabian and Seth for the feedback
>>
>> I agree the name “temporal table without version” is less accurate
>> because this kind of temporal table has a latest version rather than has no
>> version.
>>
>> How about “Latest-only temporal table” ? The related concept section
>> updated as following:
>>
>> *Temporal Table:* Temporal table is a table that evolves over time, rows
>> in temporal table are associated with one or more temporal periods.
>>
>> *    Version:* A temporal table can split into a set of versioned table
>> snapshots, the *version* in table snapshots represents the valid life
>> circle of rows, the start time and the end time of the valid period can be
>>  assigned by users.
>>
>> *    Versioned temporal table*: If the row in temporal table can track
>> its history changes and visit its history versions, we called this kind of
>> temporal table as versioned temporal table.
>>
>> *    Latest-only temporal table**:* If the row in temporal table can
>> only track its latest version, we called this kind of temporal table as
>> latest-only temporal table. The temporal table in lookup join can only
>> track its latest version and thus it's also a latest-only temporal table.
>>
>> Best
>> Leonard
>>
>> 在 2020年8月19日,04:46,Seth Wiesman <sj...@gmail.com> 写道:
>>
>> +1 to the updated design.
>>
>> I agree with Fabian that the naming of "temporal table without version" is
>> a bit confusing but the actual semantics make sense to me. I think just
>> saying its a Flink managed lookup join makes sense.
>>
>> Seth
>>
>> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fh...@gmail.com> wrote:
>>
>> Thanks for the updated FLIP Leonard!
>> In my opinion this was an improvement.
>> So +1 for this design.
>>
>> I have just one remark regarding the terminology.
>> I find the term "Temporal Table without Version" somewhat confusing.
>> IMO, versions are the core principle of temporal tables and temporal
>> tables without versions don't make much sense to me.
>>
>> What makes such a table a "Temporal" table? Isn't it just a regular table?
>> If I understand the proposal correctly, "Temporal Tables without Version"
>> can only be used in processing time temporal table joins, because this
>> join
>> only requests the current version.
>> But all regular tables can be used in processing time (temporal) table
>> joins as well.
>> It's basically the same as a lookup join, with the only difference that
>> the table is maintained in Flink and not accessed in an external system
>> (for example via JDBC).
>>
>> Are "Temporal Tables without Version" called "Temporal" because they can
>> be used in "processing time temporal table joins" and due to its name this
>> join needs to join something that's called "Temporal"?
>> In that case, we might want to rename "processing time temporal table
>> joins" into something else that does not imply a versioning.
>> Maybe we can call them just lookup joins to avoid introducing another
>> term?
>>
>> Thanks, Fabian
>>
>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <lirui.fudan@gmail.com
>> >:
>>
>> Thanks Leonard for the clarifications!
>>
>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xb...@gmail.com> wrote:
>>
>>
>> But are we still able to track different views of such a
>> table through time, as rows are added/deleted to/from the table?
>>
>>
>> Yes, in fact we support temporal table from changlog which contains all
>> possible message types(INSERT/UPDATE/DELETE).
>>
>> For
>> example, suppose I have an append-only table source with event-time
>>
>> and PK,
>>
>> will I be allowed to do an event-time temporal join with this table?
>>
>> Yes, I list some examples in the doc, the example versioned_rates3  is
>> this case exactly.
>>
>> Best
>> Leonard
>>
>>
>>
>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <mailto:
>>
>> xbjtdcq@gmail.com>> wrote:
>>
>>
>> Hi, all
>>
>> After a detailed offline discussion about the temporal table related
>> concept and behavior, we had a reliable solution and rejected several
>> alternatives.
>> Compared to rejected alternatives, the proposed approach is a more
>>
>> unified
>>
>> story and also friendly to user and current Flink framework.
>> I improved the FLIP[1] with the proposed approach and refactored the
>> document organization to make it clear enough.
>>
>> Please let me know if you have any concerns, I’m looking forward your
>> comments.
>>
>>
>> Best
>> Leonard
>>
>> [1]
>>
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>
>> <
>>
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> <
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>
>>
>>
>>
>>
>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>
>> xbjtdcq@gmail.com>> 写道:
>>
>>
>> Hi, all
>>
>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>
>> Best
>> Leonard
>> [1]
>>
>>
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> <
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>
>>
>> <
>>
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> <
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>
>>
>>
>>
>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>
>> xbjtdcq@gmail.com> <mailto:
>>
>> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>
>>
>> Hi, Timo
>>
>> Thanks for you response.
>>
>> 1) Naming: Is operation time a good term for this concept? If I
>>
>> read
>>
>> "The operation time is the time when the changes happened in system."
>>
>> or
>>
>> "The system time of DML execution in database", why don't we call it
>> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>
>> time in
>>
>> Flink should be thought through.
>>
>>
>> I agree that we should thought through. I have considered the name
>>
>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>
>> the
>>
>> name.
>>
>>
>> I proposed `operationTime` because most changelog comes from
>>
>> Database
>>
>> and we always called an action as `operation` rather than `change` in
>> Database, the operation time is  easier to understand  for database
>>
>> users,
>>
>> but it's more like a database terminology.
>>
>>
>> For `SystemTime`, user may confuse which one does the system in
>>
>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>
>> not a
>>
>> good name.
>>
>>
>> `ChangelogTime` is a pretty choice which is more unified with
>>
>> existed
>>
>> terminology `Changelog` and `ChangelogMode`, so let me use
>>
>> `ChangelogTime`
>>
>> and I’ll update the FLIP.
>>
>>
>>
>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>
>> expose the concept of time through the user-level `Row` type? The
>>
>> FLIP does
>>
>> not mention this explictly. I think we can keep it as an internal
>>
>> concept
>>
>> but I just wanted to ask for clarification.
>>
>>
>> Yes, I want to keep it as an internal concept, we have discussed
>>
>> that
>>
>> changelog time concept should be the third time concept(the other two
>>
>> are
>>
>> event-time and processing-time). It’s not easy for normal users(or to
>>
>> help
>>
>> normal users) understand the three concepts accurately, and I did not
>>
>> find
>>
>> a big enough scenario that user need to touch the changelog time for
>>
>> now,
>>
>> so I tend to do not expose the concept to users.
>>
>>
>>
>> Best,
>> Leonard
>>
>>
>>
>> On 04.08.20 04:58, Leonard Xu wrote:
>>
>> Thanks Konstantin,
>> Regarding your questions, hope my comments has address your
>>
>> questions
>>
>> and I also add a few explanation in the FLIP.
>>
>> Thank you all for the feedback,
>> It seems everyone involved  in this thread has reached a
>>
>> consensus.
>>
>> I will start a vote thread  later.
>> Best,
>> Leonard
>>
>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
>>
>> godfreyhe@gmail.com> <mailto:
>>
>> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
>>
>>
>> Thanks Lennard for driving this FLIP.
>> Looks good to me.
>>
>> Best,
>> Godfrey
>>
>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:
>>
>> imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
>>
>> 下午12:04写道:
>>
>>
>> Thanks Leonard for the great FLIP. I think it is in very good
>>
>> shape.
>>
>> +1 to start a vote.
>>
>> Best,
>> Jark
>>
>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
>>
>> <ma...@gmail.com>
>>
>> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
>>
>>
>> Hi Leonard,
>>
>> Thanks for this FLIP!
>> Looks good from my side.
>>
>> Cheers, Fabian
>>
>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:
>>
>> sjwiesman@gmail.com <ma...@gmail.com>>
>>
>> :
>>
>>
>> Hi Leondard,
>>
>> Thank you for pushing this, I think the updated syntax looks
>>
>> really
>>
>> good
>>
>> and the semantics make sense to me.
>>
>> +1
>>
>> Seth
>>
>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>
>> xbjtdcq@gmail.com <ma...@gmail.com>
>>
>> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>
>>
>> Hi, Konstantin
>>
>>
>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>
>> joined
>>
>> on
>>
>> the
>>
>> PRIMARY KEY attribute, correct?
>>
>> Yes, the PRIMARY KEY would be join key.
>>
>>
>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>
>> VIEW
>>
>> definition that defines
>>
>> whether a event-time or processing time temporal table join
>>
>> is
>>
>> used?
>>
>>
>> I think event-time or processing-time temporal table join
>>
>> depends on
>>
>> fact
>>
>> table’s time attribute in temporal join rather than from
>>
>> temporal
>>
>> table
>>
>> side, the event-time or processing time in temporal table is
>>
>> just
>>
>> used
>>
>> to
>>
>> split the validity period of versioned snapshot of temporal
>>
>> table.
>>
>> The
>>
>> processing time attribute is not  necessary for temporal
>>
>> table
>>
>> without
>>
>> version, only the primary key is required, the following
>>
>> VIEW is
>>
>> also
>>
>> valid
>>
>> for temporal table without version.
>> CREATE VIEW latest_rates AS
>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>
>> latest
>>
>> version
>> FROM rates
>> GROUP BY currency;                           -- inferred
>>
>> primary
>>
>> key
>>
>>
>>
>>
>> 3) A "Versioned Temporal Table DDL on source" is always
>>
>> versioned
>>
>> on
>>
>> operation_time regardless of the lookup table attribute
>>
>> (event-time
>>
>> or
>>
>> processing time attribute), correct?
>>
>>
>>
>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>
>> using the
>>
>> o.time
>>
>> value to lookup the version of the temporal table.
>> For fact table has the processing time attribute, it means
>>
>> only
>>
>> lookup
>>
>> the
>>
>> latest version of temporal table and we can do some
>>
>> optimization
>>
>> in
>>
>> implementation like only keep the latest version.
>>
>>
>> Best
>> Leonard
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Best regards!
>> Rui Li
>>
>>
>>
>>
>> --
>> Best regards!
>> Rui Li
>>
>>
>>
>>
>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Leonard Xu <xb...@gmail.com>.
Thanks Fabian


> Isn't a table that has only the latest version just a regular table?
> Why invent a fancy name for something that is the standard behavior of (non-temporal) tables in all database systems?
Yes, just to help user distinguish the difference between versioned temporal table and  latest-only temporal table.

(1)The table after `FOR SYSTEM_TIME AS OF` must be a temporal table, 
(2) All Flink table can be placed after `FOR SYSTEM_TIME AS OF` in processing-time temporal join

IMO, all Flink tables are temporal table, versioned temporal table is a kind of temporal table which can visit its history versions,
and the other temporal table which can only track its current version(you called regular table). 

So the other regular table is temporal table, the version is the most important concept in temporal table, I’d like to rename this kind of temporal table to help user understand it. 
Database system has a similar “current table” in system-versioned temporal table. 


> Couldn't we just say that a table with a processing time attribute can be joined with the current version of any table (not calling it a "Latest-only temporal table") using `FOR SYSTEM_TIME AS OF`?
The "any table" is temporal table, and the current version is equal to latest version.


> As I suggested before, it might make sense to choose a different name for this join instead of using a fancy name for the tables such that the name fits the name of the join.
I think lookup join is a physical implementation of Processing-Time temporal table join, we can even offer a lookup join implementation for Event-time temporal table join if the external system support lookup history data.

Versioned temporal table can be used in Event-time temporal table join, all temporal table(including versioned temporal tale and regular table) can be used in Processing-time temporal table join, 
lookup join is a kind of physical implementation of  Processing-time temporal table join which lookups the external system’s data.

If we agree regular tables are temporal table, I tend to keep latest-only temporal table to clarify the temporal and version concept in regular table.


Best
Leonard

> Cheers, 
> Fabian
> 
> Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>>:
> Thanks Fabian and Seth for the feedback
> 
> I agree the name “temporal table without version” is less accurate because this kind of temporal table has a latest version rather than has no version.
>  
> How about “Latest-only temporal table” ? The related concept section updated as following:
> 
> Temporal Table: Temporal table is a table that evolves over time, rows in temporal table are associated with one or more temporal periods.
>     Version: A temporal table can split into a set of versioned table snapshots, the version in table snapshots represents the valid life circle of rows, the start time and the end time of the valid period can be  assigned by users.
>     Versioned temporal table: If the row in temporal table can track its history changes and visit its history versions, we called this kind of temporal table as versioned temporal table.
>     Latest-only temporal table: If the row in temporal table can only track its latest version, we called this kind of temporal table as latest-only temporal table. The temporal table in lookup join can only track its latest version and thus it's also a latest-only temporal table.
> 
> Best
> Leonard
> 
>> 在 2020年8月19日,04:46,Seth Wiesman <sjwiesman@gmail.com <ma...@gmail.com>> 写道:
>> 
>> +1 to the updated design.
>> 
>> I agree with Fabian that the naming of "temporal table without version" is
>> a bit confusing but the actual semantics make sense to me. I think just
>> saying its a Flink managed lookup join makes sense.
>> 
>> Seth
>> 
>> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fhueske@gmail.com <ma...@gmail.com>> wrote:
>> 
>>> Thanks for the updated FLIP Leonard!
>>> In my opinion this was an improvement.
>>> So +1 for this design.
>>> 
>>> I have just one remark regarding the terminology.
>>> I find the term "Temporal Table without Version" somewhat confusing.
>>> IMO, versions are the core principle of temporal tables and temporal
>>> tables without versions don't make much sense to me.
>>> 
>>> What makes such a table a "Temporal" table? Isn't it just a regular table?
>>> If I understand the proposal correctly, "Temporal Tables without Version"
>>> can only be used in processing time temporal table joins, because this join
>>> only requests the current version.
>>> But all regular tables can be used in processing time (temporal) table
>>> joins as well.
>>> It's basically the same as a lookup join, with the only difference that
>>> the table is maintained in Flink and not accessed in an external system
>>> (for example via JDBC).
>>> 
>>> Are "Temporal Tables without Version" called "Temporal" because they can
>>> be used in "processing time temporal table joins" and due to its name this
>>> join needs to join something that's called "Temporal"?
>>> In that case, we might want to rename "processing time temporal table
>>> joins" into something else that does not imply a versioning.
>>> Maybe we can call them just lookup joins to avoid introducing another term?
>>> 
>>> Thanks, Fabian
>>> 
>>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <lirui.fudan@gmail.com <ma...@gmail.com>>:
>>> 
>>>> Thanks Leonard for the clarifications!
>>>> 
>>>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>>> 
>>>>>> But are we still able to track different views of such a
>>>>>> table through time, as rows are added/deleted to/from the table?
>>>>> 
>>>>> Yes, in fact we support temporal table from changlog which contains all
>>>>> possible message types(INSERT/UPDATE/DELETE).
>>>>> 
>>>>>> For
>>>>>> example, suppose I have an append-only table source with event-time
>>>>> and PK,
>>>>>> will I be allowed to do an event-time temporal join with this table?
>>>>> Yes, I list some examples in the doc, the example versioned_rates3  is
>>>>> this case exactly.
>>>>> 
>>>>> Best
>>>>> Leonard
>>>>> 
>>>>> 
>>>>>> 
>>>>>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:
>>>>> xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>>>>> 
>>>>>>> Hi, all
>>>>>>> 
>>>>>>> After a detailed offline discussion about the temporal table related
>>>>>>> concept and behavior, we had a reliable solution and rejected several
>>>>>>> alternatives.
>>>>>>> Compared to rejected alternatives, the proposed approach is a more
>>>>> unified
>>>>>>> story and also friendly to user and current Flink framework.
>>>>>>> I improved the FLIP[1] with the proposed approach and refactored the
>>>>>>> document organization to make it clear enough.
>>>>>>> 
>>>>>>> Please let me know if you have any concerns, I’m looking forward your
>>>>>>> comments.
>>>>>>> 
>>>>>>> 
>>>>>>> Best
>>>>>>> Leonard
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>>> <
>>>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>> <
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:
>>>>> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>>>>>>> 
>>>>>>>> Hi, all
>>>>>>>> 
>>>>>>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>>>>>>> 
>>>>>>>> Best
>>>>>>>> Leonard
>>>>>>>> [1]
>>>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>> <
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>> 
>>>>>>> <
>>>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>> <
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:
>>>>> xbjtdcq@gmail.com <ma...@gmail.com>> <mailto:
>>>>>>> xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>>> 写道:
>>>>>>>>> 
>>>>>>>>> Hi, Timo
>>>>>>>>> 
>>>>>>>>> Thanks for you response.
>>>>>>>>> 
>>>>>>>>>> 1) Naming: Is operation time a good term for this concept? If I
>>>>> read
>>>>>>> "The operation time is the time when the changes happened in system."
>>>>> or
>>>>>>> "The system time of DML execution in database", why don't we call it
>>>>>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>>>> time in
>>>>>>> Flink should be thought through.
>>>>>>>>> 
>>>>>>>>> I agree that we should thought through. I have considered the name
>>>>>>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>>>> the
>>>>>>> name.
>>>>>>>>> 
>>>>>>>>> I proposed `operationTime` because most changelog comes from
>>>>> Database
>>>>>>> and we always called an action as `operation` rather than `change` in
>>>>>>> Database, the operation time is  easier to understand  for database
>>>>> users,
>>>>>>> but it's more like a database terminology.
>>>>>>>>> 
>>>>>>>>> For `SystemTime`, user may confuse which one does the system in
>>>>>>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>>>> not a
>>>>>>> good name.
>>>>>>>>> 
>>>>>>>>> `ChangelogTime` is a pretty choice which is more unified with
>>>>> existed
>>>>>>> terminology `Changelog` and `ChangelogMode`, so let me use
>>>>> `ChangelogTime`
>>>>>>> and I’ll update the FLIP.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>>>>>> expose the concept of time through the user-level `Row` type? The
>>>>> FLIP does
>>>>>>> not mention this explictly. I think we can keep it as an internal
>>>>> concept
>>>>>>> but I just wanted to ask for clarification.
>>>>>>>>> 
>>>>>>>>> Yes, I want to keep it as an internal concept, we have discussed
>>>>> that
>>>>>>> changelog time concept should be the third time concept(the other two
>>>>> are
>>>>>>> event-time and processing-time). It’s not easy for normal users(or to
>>>>> help
>>>>>>> normal users) understand the three concepts accurately, and I did not
>>>>> find
>>>>>>> a big enough scenario that user need to touch the changelog time for
>>>>> now,
>>>>>>> so I tend to do not expose the concept to users.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Leonard
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>>>>>>>>> Thanks Konstantin,
>>>>>>>>>>> Regarding your questions, hope my comments has address your
>>>>> questions
>>>>>>> and I also add a few explanation in the FLIP.
>>>>>>>>>>> Thank you all for the feedback,
>>>>>>>>>>> It seems everyone involved  in this thread has reached a
>>>>> consensus.
>>>>>>>>>>> I will start a vote thread  later.
>>>>>>>>>>> Best,
>>>>>>>>>>> Leonard
>>>>>>>>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <ma...@gmail.com> <mailto:
>>>>> godfreyhe@gmail.com <ma...@gmail.com>> <mailto:
>>>>>>> godfreyhe@gmail.com <ma...@gmail.com> <mailto:godfreyhe@gmail.com <ma...@gmail.com>>>> 写道:
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Lennard for driving this FLIP.
>>>>>>>>>>>> Looks good to me.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Godfrey
>>>>>>>>>>>> 
>>>>>>>>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:imjark@gmail.com <ma...@gmail.com>> <mailto:
>>>>> imjark@gmail.com <ma...@gmail.com> <mailto:imjark@gmail.com <ma...@gmail.com>>>> 于2020年8月3日周一
>>>>>>> 下午12:04写道:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks Leonard for the great FLIP. I think it is in very good
>>>>> shape.
>>>>>>>>>>>>> +1 to start a vote.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jark
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com <ma...@gmail.com>
>>>>> <mailto:fhueske@gmail.com <ma...@gmail.com>>
>>>>>>> <mailto:fhueske@gmail.com <ma...@gmail.com> <mailto:fhueske@gmail.com <ma...@gmail.com>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Leonard,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for this FLIP!
>>>>>>>>>>>>>> Looks good from my side.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Cheers, Fabian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>>>>>>>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:sjwiesman@gmail.com <ma...@gmail.com>> <mailto:
>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:sjwiesman@gmail.com <ma...@gmail.com>>>
>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Leondard,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thank you for pushing this, I think the updated syntax looks
>>>>>>> really
>>>>>>>>>>>>> good
>>>>>>>>>>>>>>> and the semantics make sense to me.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Seth
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>>>> xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>
>>>>>>> <mailto:xbjtdcq@gmail.com <ma...@gmail.com> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi, Konstantin
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>>>>>> joined
>>>>>>>>>>>>> on
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> PRIMARY KEY attribute, correct?
>>>>>>>>>>>>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>>>>>> VIEW
>>>>>>>>>>>>>>>> definition that defines
>>>>>>>>>>>>>>>>> whether a event-time or processing time temporal table join
>>>>> is
>>>>>>>>>>>>> used?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think event-time or processing-time temporal table join
>>>>>>> depends on
>>>>>>>>>>>>>> fact
>>>>>>>>>>>>>>>> table’s time attribute in temporal join rather than from
>>>>> temporal
>>>>>>>>>>>>> table
>>>>>>>>>>>>>>>> side, the event-time or processing time in temporal table is
>>>>> just
>>>>>>>>>>>>> used
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> split the validity period of versioned snapshot of temporal
>>>>>>> table.
>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>> processing time attribute is not  necessary for temporal
>>>>> table
>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>> version, only the primary key is required, the following
>>>>> VIEW is
>>>>>>> also
>>>>>>>>>>>>>>> valid
>>>>>>>>>>>>>>>> for temporal table without version.
>>>>>>>>>>>>>>>> CREATE VIEW latest_rates AS
>>>>>>>>>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>>>>>> latest
>>>>>>>>>>>>>>>> version
>>>>>>>>>>>>>>>> FROM rates
>>>>>>>>>>>>>>>> GROUP BY currency;                           -- inferred
>>>>> primary
>>>>>>> key
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always
>>>>>>> versioned
>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> operation_time regardless of the lookup table attribute
>>>>>>> (event-time
>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>> processing time attribute), correct?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>>>> using the
>>>>>>>>>>>>>> o.time
>>>>>>>>>>>>>>>> value to lookup the version of the temporal table.
>>>>>>>>>>>>>>>> For fact table has the processing time attribute, it means
>>>>> only
>>>>>>>>>>>>> lookup
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> latest version of temporal table and we can do some
>>>>> optimization
>>>>>>> in
>>>>>>>>>>>>>>>> implementation like only keep the latest version.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>> Leonard
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Best regards!
>>>>>> Rui Li
>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> Best regards!
>>>> Rui Li
>>>> 
>>> 
> 


Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Leonard,

Not sure if "Latest-only temporal table" is much better than "temporal
table without version".
Isn't a table that has only the latest version just a regular table?
Why invent a fancy name for something that is the standard behavior of
(non-temporal) tables in all database systems?

Couldn't we just say that a table with a processing time attribute can be
joined with the current version of any table (not calling it a "Latest-only
temporal table") using `FOR SYSTEM_TIME AS OF`?
As I suggested before, it might make sense to choose a different name for
this join instead of using a fancy name for the tables such that the name
fits the name of the join.

Cheers,
Fabian

Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <xb...@gmail.com>:

> Thanks Fabian and Seth for the feedback
>
> I agree the name “temporal table without version” is less accurate because
> this kind of temporal table has a latest version rather than has no version.
>
> How about “Latest-only temporal table” ? The related concept section
> updated as following:
>
> *Temporal Table:* Temporal table is a table that evolves over time, rows
> in temporal table are associated with one or more temporal periods.
>
> *    Version:* A temporal table can split into a set of versioned table
> snapshots, the *version* in table snapshots represents the valid life
> circle of rows, the start time and the end time of the valid period can be
>  assigned by users.
>
> *    Versioned temporal table*: If the row in temporal table can track
> its history changes and visit its history versions, we called this kind of
> temporal table as versioned temporal table.
>
> *    Latest-only temporal table**:* If the row in temporal table can only
> track its latest version, we called this kind of temporal table as
> latest-only temporal table. The temporal table in lookup join can only
> track its latest version and thus it's also a latest-only temporal table.
>
> Best
> Leonard
>
> 在 2020年8月19日,04:46,Seth Wiesman <sj...@gmail.com> 写道:
>
> +1 to the updated design.
>
> I agree with Fabian that the naming of "temporal table without version" is
> a bit confusing but the actual semantics make sense to me. I think just
> saying its a Flink managed lookup join makes sense.
>
> Seth
>
> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fh...@gmail.com> wrote:
>
> Thanks for the updated FLIP Leonard!
> In my opinion this was an improvement.
> So +1 for this design.
>
> I have just one remark regarding the terminology.
> I find the term "Temporal Table without Version" somewhat confusing.
> IMO, versions are the core principle of temporal tables and temporal
> tables without versions don't make much sense to me.
>
> What makes such a table a "Temporal" table? Isn't it just a regular table?
> If I understand the proposal correctly, "Temporal Tables without Version"
> can only be used in processing time temporal table joins, because this join
> only requests the current version.
> But all regular tables can be used in processing time (temporal) table
> joins as well.
> It's basically the same as a lookup join, with the only difference that
> the table is maintained in Flink and not accessed in an external system
> (for example via JDBC).
>
> Are "Temporal Tables without Version" called "Temporal" because they can
> be used in "processing time temporal table joins" and due to its name this
> join needs to join something that's called "Temporal"?
> In that case, we might want to rename "processing time temporal table
> joins" into something else that does not imply a versioning.
> Maybe we can call them just lookup joins to avoid introducing another term?
>
> Thanks, Fabian
>
> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <li...@gmail.com>:
>
> Thanks Leonard for the clarifications!
>
> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xb...@gmail.com> wrote:
>
>
> But are we still able to track different views of such a
> table through time, as rows are added/deleted to/from the table?
>
>
> Yes, in fact we support temporal table from changlog which contains all
> possible message types(INSERT/UPDATE/DELETE).
>
> For
> example, suppose I have an append-only table source with event-time
>
> and PK,
>
> will I be allowed to do an event-time temporal join with this table?
>
> Yes, I list some examples in the doc, the example versioned_rates3  is
> this case exactly.
>
> Best
> Leonard
>
>
>
> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <mailto:
>
> xbjtdcq@gmail.com>> wrote:
>
>
> Hi, all
>
> After a detailed offline discussion about the temporal table related
> concept and behavior, we had a reliable solution and rejected several
> alternatives.
> Compared to rejected alternatives, the proposed approach is a more
>
> unified
>
> story and also friendly to user and current Flink framework.
> I improved the FLIP[1] with the proposed approach and refactored the
> document organization to make it clear enough.
>
> Please let me know if you have any concerns, I’m looking forward your
> comments.
>
>
> Best
> Leonard
>
> [1]
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>
> <
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> <
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>
>
>
>
>
> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <mailto:
>
> xbjtdcq@gmail.com>> 写道:
>
>
> Hi, all
>
> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>
> Best
> Leonard
> [1]
>
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> <
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>
>
> <
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> <
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>
>
>
>
> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
>
> xbjtdcq@gmail.com> <mailto:
>
> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>
>
> Hi, Timo
>
> Thanks for you response.
>
> 1) Naming: Is operation time a good term for this concept? If I
>
> read
>
> "The operation time is the time when the changes happened in system."
>
> or
>
> "The system time of DML execution in database", why don't we call it
> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>
> time in
>
> Flink should be thought through.
>
>
> I agree that we should thought through. I have considered the name
>
> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>
> the
>
> name.
>
>
> I proposed `operationTime` because most changelog comes from
>
> Database
>
> and we always called an action as `operation` rather than `change` in
> Database, the operation time is  easier to understand  for database
>
> users,
>
> but it's more like a database terminology.
>
>
> For `SystemTime`, user may confuse which one does the system in
>
> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>
> not a
>
> good name.
>
>
> `ChangelogTime` is a pretty choice which is more unified with
>
> existed
>
> terminology `Changelog` and `ChangelogMode`, so let me use
>
> `ChangelogTime`
>
> and I’ll update the FLIP.
>
>
>
> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>
> expose the concept of time through the user-level `Row` type? The
>
> FLIP does
>
> not mention this explictly. I think we can keep it as an internal
>
> concept
>
> but I just wanted to ask for clarification.
>
>
> Yes, I want to keep it as an internal concept, we have discussed
>
> that
>
> changelog time concept should be the third time concept(the other two
>
> are
>
> event-time and processing-time). It’s not easy for normal users(or to
>
> help
>
> normal users) understand the three concepts accurately, and I did not
>
> find
>
> a big enough scenario that user need to touch the changelog time for
>
> now,
>
> so I tend to do not expose the concept to users.
>
>
>
> Best,
> Leonard
>
>
>
> On 04.08.20 04:58, Leonard Xu wrote:
>
> Thanks Konstantin,
> Regarding your questions, hope my comments has address your
>
> questions
>
> and I also add a few explanation in the FLIP.
>
> Thank you all for the feedback,
> It seems everyone involved  in this thread has reached a
>
> consensus.
>
> I will start a vote thread  later.
> Best,
> Leonard
>
> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
>
> godfreyhe@gmail.com> <mailto:
>
> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
>
>
> Thanks Lennard for driving this FLIP.
> Looks good to me.
>
> Best,
> Godfrey
>
> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:
>
> imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
>
> 下午12:04写道:
>
>
> Thanks Leonard for the great FLIP. I think it is in very good
>
> shape.
>
> +1 to start a vote.
>
> Best,
> Jark
>
> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
>
> <ma...@gmail.com>
>
> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
>
>
> Hi Leonard,
>
> Thanks for this FLIP!
> Looks good from my side.
>
> Cheers, Fabian
>
> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
> sjwiesman@gmail.com <ma...@gmail.com> <mailto:
>
> sjwiesman@gmail.com <ma...@gmail.com>>
>
> :
>
>
> Hi Leondard,
>
> Thank you for pushing this, I think the updated syntax looks
>
> really
>
> good
>
> and the semantics make sense to me.
>
> +1
>
> Seth
>
> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>
> xbjtdcq@gmail.com <ma...@gmail.com>
>
> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>
>
> Hi, Konstantin
>
>
> 1) A  "Versioned Temporal Table DDL on source" can only be
>
> joined
>
> on
>
> the
>
> PRIMARY KEY attribute, correct?
>
> Yes, the PRIMARY KEY would be join key.
>
>
> 2) Isn't it the time attribute in the ORDER BY clause of the
>
> VIEW
>
> definition that defines
>
> whether a event-time or processing time temporal table join
>
> is
>
> used?
>
>
> I think event-time or processing-time temporal table join
>
> depends on
>
> fact
>
> table’s time attribute in temporal join rather than from
>
> temporal
>
> table
>
> side, the event-time or processing time in temporal table is
>
> just
>
> used
>
> to
>
> split the validity period of versioned snapshot of temporal
>
> table.
>
> The
>
> processing time attribute is not  necessary for temporal
>
> table
>
> without
>
> version, only the primary key is required, the following
>
> VIEW is
>
> also
>
> valid
>
> for temporal table without version.
> CREATE VIEW latest_rates AS
> SELECT currency, LAST_VALUE(rate)            -- only keep the
>
> latest
>
> version
> FROM rates
> GROUP BY currency;                           -- inferred
>
> primary
>
> key
>
>
>
>
> 3) A "Versioned Temporal Table DDL on source" is always
>
> versioned
>
> on
>
> operation_time regardless of the lookup table attribute
>
> (event-time
>
> or
>
> processing time attribute), correct?
>
>
>
> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>
> using the
>
> o.time
>
> value to lookup the version of the temporal table.
> For fact table has the processing time attribute, it means
>
> only
>
> lookup
>
> the
>
> latest version of temporal table and we can do some
>
> optimization
>
> in
>
> implementation like only keep the latest version.
>
>
> Best
> Leonard
>
>
>
>
>
>
>
>
>
>
> --
> Best regards!
> Rui Li
>
>
>
>
> --
> Best regards!
> Rui Li
>
>
>
>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Leonard Xu <xb...@gmail.com>.
Thanks Fabian and Seth for the feedback

I agree the name “temporal table without version” is less accurate because this kind of temporal table has a latest version rather than has no version.
 
How about “Latest-only temporal table” ? The related concept section updated as following:

Temporal Table: Temporal table is a table that evolves over time, rows in temporal table are associated with one or more temporal periods.
    Version: A temporal table can split into a set of versioned table snapshots, the version in table snapshots represents the valid life circle of rows, the start time and the end time of the valid period can be  assigned by users.
    Versioned temporal table: If the row in temporal table can track its history changes and visit its history versions, we called this kind of temporal table as versioned temporal table.
    Latest-only temporal table: If the row in temporal table can only track its latest version, we called this kind of temporal table as latest-only temporal table. The temporal table in lookup join can only track its latest version and thus it's also a latest-only temporal table.

Best
Leonard

> 在 2020年8月19日,04:46,Seth Wiesman <sj...@gmail.com> 写道:
> 
> +1 to the updated design.
> 
> I agree with Fabian that the naming of "temporal table without version" is
> a bit confusing but the actual semantics make sense to me. I think just
> saying its a Flink managed lookup join makes sense.
> 
> Seth
> 
> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fh...@gmail.com> wrote:
> 
>> Thanks for the updated FLIP Leonard!
>> In my opinion this was an improvement.
>> So +1 for this design.
>> 
>> I have just one remark regarding the terminology.
>> I find the term "Temporal Table without Version" somewhat confusing.
>> IMO, versions are the core principle of temporal tables and temporal
>> tables without versions don't make much sense to me.
>> 
>> What makes such a table a "Temporal" table? Isn't it just a regular table?
>> If I understand the proposal correctly, "Temporal Tables without Version"
>> can only be used in processing time temporal table joins, because this join
>> only requests the current version.
>> But all regular tables can be used in processing time (temporal) table
>> joins as well.
>> It's basically the same as a lookup join, with the only difference that
>> the table is maintained in Flink and not accessed in an external system
>> (for example via JDBC).
>> 
>> Are "Temporal Tables without Version" called "Temporal" because they can
>> be used in "processing time temporal table joins" and due to its name this
>> join needs to join something that's called "Temporal"?
>> In that case, we might want to rename "processing time temporal table
>> joins" into something else that does not imply a versioning.
>> Maybe we can call them just lookup joins to avoid introducing another term?
>> 
>> Thanks, Fabian
>> 
>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <li...@gmail.com>:
>> 
>>> Thanks Leonard for the clarifications!
>>> 
>>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xb...@gmail.com> wrote:
>>> 
>>>> 
>>>>> But are we still able to track different views of such a
>>>>> table through time, as rows are added/deleted to/from the table?
>>>> 
>>>> Yes, in fact we support temporal table from changlog which contains all
>>>> possible message types(INSERT/UPDATE/DELETE).
>>>> 
>>>>> For
>>>>> example, suppose I have an append-only table source with event-time
>>>> and PK,
>>>>> will I be allowed to do an event-time temporal join with this table?
>>>> Yes, I list some examples in the doc, the example versioned_rates3  is
>>>> this case exactly.
>>>> 
>>>> Best
>>>> Leonard
>>>> 
>>>> 
>>>>> 
>>>>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>> xbjtdcq@gmail.com>> wrote:
>>>>> 
>>>>>> Hi, all
>>>>>> 
>>>>>> After a detailed offline discussion about the temporal table related
>>>>>> concept and behavior, we had a reliable solution and rejected several
>>>>>> alternatives.
>>>>>> Compared to rejected alternatives, the proposed approach is a more
>>>> unified
>>>>>> story and also friendly to user and current Flink framework.
>>>>>> I improved the FLIP[1] with the proposed approach and refactored the
>>>>>> document organization to make it clear enough.
>>>>>> 
>>>>>> Please let me know if you have any concerns, I’m looking forward your
>>>>>> comments.
>>>>>> 
>>>>>> 
>>>>>> Best
>>>>>> Leonard
>>>>>> 
>>>>>> [1]
>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>>> <
>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>> <
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>> xbjtdcq@gmail.com>> 写道:
>>>>>>> 
>>>>>>> Hi, all
>>>>>>> 
>>>>>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>>>>>> 
>>>>>>> Best
>>>>>>> Leonard
>>>>>>> [1]
>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>> <
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>> 
>>>>>> <
>>>>>> 
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>> <
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>>> xbjtdcq@gmail.com> <mailto:
>>>>>> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>>>>>>> 
>>>>>>>> Hi, Timo
>>>>>>>> 
>>>>>>>> Thanks for you response.
>>>>>>>> 
>>>>>>>>> 1) Naming: Is operation time a good term for this concept? If I
>>>> read
>>>>>> "The operation time is the time when the changes happened in system."
>>>> or
>>>>>> "The system time of DML execution in database", why don't we call it
>>>>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>>> time in
>>>>>> Flink should be thought through.
>>>>>>>> 
>>>>>>>> I agree that we should thought through. I have considered the name
>>>>>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>>> the
>>>>>> name.
>>>>>>>> 
>>>>>>>> I proposed `operationTime` because most changelog comes from
>>>> Database
>>>>>> and we always called an action as `operation` rather than `change` in
>>>>>> Database, the operation time is  easier to understand  for database
>>>> users,
>>>>>> but it's more like a database terminology.
>>>>>>>> 
>>>>>>>> For `SystemTime`, user may confuse which one does the system in
>>>>>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>>> not a
>>>>>> good name.
>>>>>>>> 
>>>>>>>> `ChangelogTime` is a pretty choice which is more unified with
>>>> existed
>>>>>> terminology `Changelog` and `ChangelogMode`, so let me use
>>>> `ChangelogTime`
>>>>>> and I’ll update the FLIP.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>>>>> expose the concept of time through the user-level `Row` type? The
>>>> FLIP does
>>>>>> not mention this explictly. I think we can keep it as an internal
>>>> concept
>>>>>> but I just wanted to ask for clarification.
>>>>>>>> 
>>>>>>>> Yes, I want to keep it as an internal concept, we have discussed
>>>> that
>>>>>> changelog time concept should be the third time concept(the other two
>>>> are
>>>>>> event-time and processing-time). It’s not easy for normal users(or to
>>>> help
>>>>>> normal users) understand the three concepts accurately, and I did not
>>>> find
>>>>>> a big enough scenario that user need to touch the changelog time for
>>>> now,
>>>>>> so I tend to do not expose the concept to users.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Leonard
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>>>>>>>> Thanks Konstantin,
>>>>>>>>>> Regarding your questions, hope my comments has address your
>>>> questions
>>>>>> and I also add a few explanation in the FLIP.
>>>>>>>>>> Thank you all for the feedback,
>>>>>>>>>> It seems everyone involved  in this thread has reached a
>>>> consensus.
>>>>>>>>>> I will start a vote thread  later.
>>>>>>>>>> Best,
>>>>>>>>>> Leonard
>>>>>>>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
>>>> godfreyhe@gmail.com> <mailto:
>>>>>> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks Lennard for driving this FLIP.
>>>>>>>>>>> Looks good to me.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Godfrey
>>>>>>>>>>> 
>>>>>>>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:
>>>> imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
>>>>>> 下午12:04写道:
>>>>>>>>>>> 
>>>>>>>>>>>> Thanks Leonard for the great FLIP. I think it is in very good
>>>> shape.
>>>>>>>>>>>> +1 to start a vote.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jark
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
>>>> <ma...@gmail.com>
>>>>>> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Leonard,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for this FLIP!
>>>>>>>>>>>>> Looks good from my side.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers, Fabian
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>>>>>>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:
>>>> sjwiesman@gmail.com <ma...@gmail.com>>
>>>>>>>>>>>>>> :
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Leondard,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thank you for pushing this, I think the updated syntax looks
>>>>>> really
>>>>>>>>>>>> good
>>>>>>>>>>>>>> and the semantics make sense to me.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Seth
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>>> xbjtdcq@gmail.com <ma...@gmail.com>
>>>>>> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi, Konstantin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>>>>> joined
>>>>>>>>>>>> on
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> PRIMARY KEY attribute, correct?
>>>>>>>>>>>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>>>>> VIEW
>>>>>>>>>>>>>>> definition that defines
>>>>>>>>>>>>>>>> whether a event-time or processing time temporal table join
>>>> is
>>>>>>>>>>>> used?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I think event-time or processing-time temporal table join
>>>>>> depends on
>>>>>>>>>>>>> fact
>>>>>>>>>>>>>>> table’s time attribute in temporal join rather than from
>>>> temporal
>>>>>>>>>>>> table
>>>>>>>>>>>>>>> side, the event-time or processing time in temporal table is
>>>> just
>>>>>>>>>>>> used
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> split the validity period of versioned snapshot of temporal
>>>>>> table.
>>>>>>>>>>>> The
>>>>>>>>>>>>>>> processing time attribute is not  necessary for temporal
>>>> table
>>>>>>>>>>>> without
>>>>>>>>>>>>>>> version, only the primary key is required, the following
>>>> VIEW is
>>>>>> also
>>>>>>>>>>>>>> valid
>>>>>>>>>>>>>>> for temporal table without version.
>>>>>>>>>>>>>>> CREATE VIEW latest_rates AS
>>>>>>>>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>>>>> latest
>>>>>>>>>>>>>>> version
>>>>>>>>>>>>>>> FROM rates
>>>>>>>>>>>>>>> GROUP BY currency;                           -- inferred
>>>> primary
>>>>>> key
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always
>>>>>> versioned
>>>>>>>>>>>> on
>>>>>>>>>>>>>>>> operation_time regardless of the lookup table attribute
>>>>>> (event-time
>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>> processing time attribute), correct?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>>> using the
>>>>>>>>>>>>> o.time
>>>>>>>>>>>>>>> value to lookup the version of the temporal table.
>>>>>>>>>>>>>>> For fact table has the processing time attribute, it means
>>>> only
>>>>>>>>>>>> lookup
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> latest version of temporal table and we can do some
>>>> optimization
>>>>>> in
>>>>>>>>>>>>>>> implementation like only keep the latest version.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>> Leonard
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Best regards!
>>>>> Rui Li
>>>> 
>>>> 
>>> 
>>> --
>>> Best regards!
>>> Rui Li
>>> 
>> 


Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Seth Wiesman <sj...@gmail.com>.
+1 to the updated design.

I agree with Fabian that the naming of "temporal table without version" is
a bit confusing but the actual semantics make sense to me. I think just
saying its a Flink managed lookup join makes sense.

Seth

On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fh...@gmail.com> wrote:

> Thanks for the updated FLIP Leonard!
> In my opinion this was an improvement.
> So +1 for this design.
>
> I have just one remark regarding the terminology.
> I find the term "Temporal Table without Version" somewhat confusing.
> IMO, versions are the core principle of temporal tables and temporal
> tables without versions don't make much sense to me.
>
> What makes such a table a "Temporal" table? Isn't it just a regular table?
> If I understand the proposal correctly, "Temporal Tables without Version"
> can only be used in processing time temporal table joins, because this join
> only requests the current version.
> But all regular tables can be used in processing time (temporal) table
> joins as well.
> It's basically the same as a lookup join, with the only difference that
> the table is maintained in Flink and not accessed in an external system
> (for example via JDBC).
>
> Are "Temporal Tables without Version" called "Temporal" because they can
> be used in "processing time temporal table joins" and due to its name this
> join needs to join something that's called "Temporal"?
> In that case, we might want to rename "processing time temporal table
> joins" into something else that does not imply a versioning.
> Maybe we can call them just lookup joins to avoid introducing another term?
>
> Thanks, Fabian
>
> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <li...@gmail.com>:
>
>> Thanks Leonard for the clarifications!
>>
>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xb...@gmail.com> wrote:
>>
>>>
>>> > But are we still able to track different views of such a
>>> > table through time, as rows are added/deleted to/from the table?
>>>
>>> Yes, in fact we support temporal table from changlog which contains all
>>> possible message types(INSERT/UPDATE/DELETE).
>>>
>>> > For
>>> > example, suppose I have an append-only table source with event-time
>>> and PK,
>>> > will I be allowed to do an event-time temporal join with this table?
>>> Yes, I list some examples in the doc, the example versioned_rates3  is
>>> this case exactly.
>>>
>>> Best
>>> Leonard
>>>
>>>
>>> >
>>> > On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <mailto:
>>> xbjtdcq@gmail.com>> wrote:
>>> >
>>> >> Hi, all
>>> >>
>>> >> After a detailed offline discussion about the temporal table related
>>> >> concept and behavior, we had a reliable solution and rejected several
>>> >> alternatives.
>>> >> Compared to rejected alternatives, the proposed approach is a more
>>> unified
>>> >> story and also friendly to user and current Flink framework.
>>> >> I improved the FLIP[1] with the proposed approach and refactored the
>>> >> document organization to make it clear enough.
>>> >>
>>> >> Please let me know if you have any concerns, I’m looking forward your
>>> >> comments.
>>> >>
>>> >>
>>> >> Best
>>> >> Leonard
>>> >>
>>> >> [1]
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>> >> <
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>> <
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>> >
>>> >>>
>>> >>
>>> >>
>>> >>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>> xbjtdcq@gmail.com>> 写道:
>>> >>>
>>> >>> Hi, all
>>> >>>
>>> >>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>> >>>
>>> >>> Best
>>> >>> Leonard
>>> >>> [1]
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>> <
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>> >
>>> >> <
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>> <
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>> >
>>> >>>
>>> >>>
>>> >>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
>>> xbjtdcq@gmail.com> <mailto:
>>> >> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>> >>>>
>>> >>>> Hi, Timo
>>> >>>>
>>> >>>> Thanks for you response.
>>> >>>>
>>> >>>>> 1) Naming: Is operation time a good term for this concept? If I
>>> read
>>> >> "The operation time is the time when the changes happened in system."
>>> or
>>> >> "The system time of DML execution in database", why don't we call it
>>> >> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>> time in
>>> >> Flink should be thought through.
>>> >>>>
>>> >>>> I agree that we should thought through. I have considered the name
>>> >> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>> the
>>> >> name.
>>> >>>>
>>> >>>> I proposed `operationTime` because most changelog comes from
>>> Database
>>> >> and we always called an action as `operation` rather than `change` in
>>> >> Database, the operation time is  easier to understand  for database
>>> users,
>>> >> but it's more like a database terminology.
>>> >>>>
>>> >>>> For `SystemTime`, user may confuse which one does the system in
>>> >> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>> not a
>>> >> good name.
>>> >>>>
>>> >>>> `ChangelogTime` is a pretty choice which is more unified with
>>> existed
>>> >> terminology `Changelog` and `ChangelogMode`, so let me use
>>> `ChangelogTime`
>>> >> and I’ll update the FLIP.
>>> >>>>
>>> >>>>
>>> >>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>> >> expose the concept of time through the user-level `Row` type? The
>>> FLIP does
>>> >> not mention this explictly. I think we can keep it as an internal
>>> concept
>>> >> but I just wanted to ask for clarification.
>>> >>>>
>>> >>>> Yes, I want to keep it as an internal concept, we have discussed
>>> that
>>> >> changelog time concept should be the third time concept(the other two
>>> are
>>> >> event-time and processing-time). It’s not easy for normal users(or to
>>> help
>>> >> normal users) understand the three concepts accurately, and I did not
>>> find
>>> >> a big enough scenario that user need to touch the changelog time for
>>> now,
>>> >> so I tend to do not expose the concept to users.
>>> >>>>
>>> >>>>
>>> >>>> Best,
>>> >>>> Leonard
>>> >>>>
>>> >>>>
>>> >>>>>
>>> >>>>> On 04.08.20 04:58, Leonard Xu wrote:
>>> >>>>>> Thanks Konstantin,
>>> >>>>>> Regarding your questions, hope my comments has address your
>>> questions
>>> >> and I also add a few explanation in the FLIP.
>>> >>>>>> Thank you all for the feedback,
>>> >>>>>> It seems everyone involved  in this thread has reached a
>>> consensus.
>>> >>>>>> I will start a vote thread  later.
>>> >>>>>> Best,
>>> >>>>>> Leonard
>>> >>>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
>>> godfreyhe@gmail.com> <mailto:
>>> >> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
>>> >>>>>>>
>>> >>>>>>> Thanks Lennard for driving this FLIP.
>>> >>>>>>> Looks good to me.
>>> >>>>>>>
>>> >>>>>>> Best,
>>> >>>>>>> Godfrey
>>> >>>>>>>
>>> >>>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:
>>> imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
>>> >> 下午12:04写道:
>>> >>>>>>>
>>> >>>>>>>> Thanks Leonard for the great FLIP. I think it is in very good
>>> shape.
>>> >>>>>>>> +1 to start a vote.
>>> >>>>>>>>
>>> >>>>>>>> Best,
>>> >>>>>>>> Jark
>>> >>>>>>>>
>>> >>>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
>>> <ma...@gmail.com>
>>> >> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> Hi Leonard,
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks for this FLIP!
>>> >>>>>>>>> Looks good from my side.
>>> >>>>>>>>>
>>> >>>>>>>>> Cheers, Fabian
>>> >>>>>>>>>
>>> >>>>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>> >>>>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:
>>> sjwiesman@gmail.com <ma...@gmail.com>>
>>> >>>>>>>>>> :
>>> >>>>>>>>>
>>> >>>>>>>>>> Hi Leondard,
>>> >>>>>>>>>>
>>> >>>>>>>>>> Thank you for pushing this, I think the updated syntax looks
>>> >> really
>>> >>>>>>>> good
>>> >>>>>>>>>> and the semantics make sense to me.
>>> >>>>>>>>>>
>>> >>>>>>>>>> +1
>>> >>>>>>>>>>
>>> >>>>>>>>>> Seth
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>> xbjtdcq@gmail.com <ma...@gmail.com>
>>> >> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>> Hi, Konstantin
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>> >> joined
>>> >>>>>>>> on
>>> >>>>>>>>>> the
>>> >>>>>>>>>>>> PRIMARY KEY attribute, correct?
>>> >>>>>>>>>>> Yes, the PRIMARY KEY would be join key.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>> >> VIEW
>>> >>>>>>>>>>> definition that defines
>>> >>>>>>>>>>>> whether a event-time or processing time temporal table join
>>> is
>>> >>>>>>>> used?
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> I think event-time or processing-time temporal table join
>>> >> depends on
>>> >>>>>>>>> fact
>>> >>>>>>>>>>> table’s time attribute in temporal join rather than from
>>> temporal
>>> >>>>>>>> table
>>> >>>>>>>>>>> side, the event-time or processing time in temporal table is
>>> just
>>> >>>>>>>> used
>>> >>>>>>>>> to
>>> >>>>>>>>>>> split the validity period of versioned snapshot of temporal
>>> >> table.
>>> >>>>>>>> The
>>> >>>>>>>>>>> processing time attribute is not  necessary for temporal
>>> table
>>> >>>>>>>> without
>>> >>>>>>>>>>> version, only the primary key is required, the following
>>> VIEW is
>>> >> also
>>> >>>>>>>>>> valid
>>> >>>>>>>>>>> for temporal table without version.
>>> >>>>>>>>>>> CREATE VIEW latest_rates AS
>>> >>>>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>> >> latest
>>> >>>>>>>>>>> version
>>> >>>>>>>>>>> FROM rates
>>> >>>>>>>>>>> GROUP BY currency;                           -- inferred
>>> primary
>>> >> key
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always
>>> >> versioned
>>> >>>>>>>> on
>>> >>>>>>>>>>>> operation_time regardless of the lookup table attribute
>>> >> (event-time
>>> >>>>>>>>> or
>>> >>>>>>>>>>>> processing time attribute), correct?
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>> using the
>>> >>>>>>>>> o.time
>>> >>>>>>>>>>> value to lookup the version of the temporal table.
>>> >>>>>>>>>>> For fact table has the processing time attribute, it means
>>> only
>>> >>>>>>>> lookup
>>> >>>>>>>>>> the
>>> >>>>>>>>>>> latest version of temporal table and we can do some
>>> optimization
>>> >> in
>>> >>>>>>>>>>> implementation like only keep the latest version.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Best
>>> >>>>>>>>>>> Leonard
>>> >>>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >>
>>> >
>>> > --
>>> > Best regards!
>>> > Rui Li
>>>
>>>
>>
>> --
>> Best regards!
>> Rui Li
>>
>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Fabian Hueske <fh...@gmail.com>.
Thanks for the updated FLIP Leonard!
In my opinion this was an improvement.
So +1 for this design.

I have just one remark regarding the terminology.
I find the term "Temporal Table without Version" somewhat confusing.
IMO, versions are the core principle of temporal tables and temporal tables
without versions don't make much sense to me.

What makes such a table a "Temporal" table? Isn't it just a regular table?
If I understand the proposal correctly, "Temporal Tables without Version"
can only be used in processing time temporal table joins, because this join
only requests the current version.
But all regular tables can be used in processing time (temporal) table
joins as well.
It's basically the same as a lookup join, with the only difference that the
table is maintained in Flink and not accessed in an external system (for
example via JDBC).

Are "Temporal Tables without Version" called "Temporal" because they can be
used in "processing time temporal table joins" and due to its name this
join needs to join something that's called "Temporal"?
In that case, we might want to rename "processing time temporal table
joins" into something else that does not imply a versioning.
Maybe we can call them just lookup joins to avoid introducing another term?

Thanks, Fabian

Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <li...@gmail.com>:

> Thanks Leonard for the clarifications!
>
> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xb...@gmail.com> wrote:
>
>>
>> > But are we still able to track different views of such a
>> > table through time, as rows are added/deleted to/from the table?
>>
>> Yes, in fact we support temporal table from changlog which contains all
>> possible message types(INSERT/UPDATE/DELETE).
>>
>> > For
>> > example, suppose I have an append-only table source with event-time and
>> PK,
>> > will I be allowed to do an event-time temporal join with this table?
>> Yes, I list some examples in the doc, the example versioned_rates3  is
>> this case exactly.
>>
>> Best
>> Leonard
>>
>>
>> >
>> > On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <mailto:
>> xbjtdcq@gmail.com>> wrote:
>> >
>> >> Hi, all
>> >>
>> >> After a detailed offline discussion about the temporal table related
>> >> concept and behavior, we had a reliable solution and rejected several
>> >> alternatives.
>> >> Compared to rejected alternatives, the proposed approach is a more
>> unified
>> >> story and also friendly to user and current Flink framework.
>> >> I improved the FLIP[1] with the proposed approach and refactored the
>> >> document organization to make it clear enough.
>> >>
>> >> Please let me know if you have any concerns, I’m looking forward your
>> >> comments.
>> >>
>> >>
>> >> Best
>> >> Leonard
>> >>
>> >> [1]
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> >> <
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> >
>> >>>
>> >>
>> >>
>> >>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <mailto:
>> xbjtdcq@gmail.com>> 写道:
>> >>>
>> >>> Hi, all
>> >>>
>> >>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>> >>>
>> >>> Best
>> >>> Leonard
>> >>> [1]
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> >
>> >> <
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> >
>> >>>
>> >>>
>> >>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
>> xbjtdcq@gmail.com> <mailto:
>> >> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>> >>>>
>> >>>> Hi, Timo
>> >>>>
>> >>>> Thanks for you response.
>> >>>>
>> >>>>> 1) Naming: Is operation time a good term for this concept? If I read
>> >> "The operation time is the time when the changes happened in system."
>> or
>> >> "The system time of DML execution in database", why don't we call it
>> >> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>> time in
>> >> Flink should be thought through.
>> >>>>
>> >>>> I agree that we should thought through. I have considered the name
>> >> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>> the
>> >> name.
>> >>>>
>> >>>> I proposed `operationTime` because most changelog comes from Database
>> >> and we always called an action as `operation` rather than `change` in
>> >> Database, the operation time is  easier to understand  for database
>> users,
>> >> but it's more like a database terminology.
>> >>>>
>> >>>> For `SystemTime`, user may confuse which one does the system in
>> >> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s not
>> a
>> >> good name.
>> >>>>
>> >>>> `ChangelogTime` is a pretty choice which is more unified with existed
>> >> terminology `Changelog` and `ChangelogMode`, so let me use
>> `ChangelogTime`
>> >> and I’ll update the FLIP.
>> >>>>
>> >>>>
>> >>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>> >> expose the concept of time through the user-level `Row` type? The FLIP
>> does
>> >> not mention this explictly. I think we can keep it as an internal
>> concept
>> >> but I just wanted to ask for clarification.
>> >>>>
>> >>>> Yes, I want to keep it as an internal concept, we have discussed that
>> >> changelog time concept should be the third time concept(the other two
>> are
>> >> event-time and processing-time). It’s not easy for normal users(or to
>> help
>> >> normal users) understand the three concepts accurately, and I did not
>> find
>> >> a big enough scenario that user need to touch the changelog time for
>> now,
>> >> so I tend to do not expose the concept to users.
>> >>>>
>> >>>>
>> >>>> Best,
>> >>>> Leonard
>> >>>>
>> >>>>
>> >>>>>
>> >>>>> On 04.08.20 04:58, Leonard Xu wrote:
>> >>>>>> Thanks Konstantin,
>> >>>>>> Regarding your questions, hope my comments has address your
>> questions
>> >> and I also add a few explanation in the FLIP.
>> >>>>>> Thank you all for the feedback,
>> >>>>>> It seems everyone involved  in this thread has reached a consensus.
>> >>>>>> I will start a vote thread  later.
>> >>>>>> Best,
>> >>>>>> Leonard
>> >>>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
>> godfreyhe@gmail.com> <mailto:
>> >> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
>> >>>>>>>
>> >>>>>>> Thanks Lennard for driving this FLIP.
>> >>>>>>> Looks good to me.
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Godfrey
>> >>>>>>>
>> >>>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:
>> imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
>> >> 下午12:04写道:
>> >>>>>>>
>> >>>>>>>> Thanks Leonard for the great FLIP. I think it is in very good
>> shape.
>> >>>>>>>> +1 to start a vote.
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Jark
>> >>>>>>>>
>> >>>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
>> <ma...@gmail.com>
>> >> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi Leonard,
>> >>>>>>>>>
>> >>>>>>>>> Thanks for this FLIP!
>> >>>>>>>>> Looks good from my side.
>> >>>>>>>>>
>> >>>>>>>>> Cheers, Fabian
>> >>>>>>>>>
>> >>>>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>> >>>>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:
>> sjwiesman@gmail.com <ma...@gmail.com>>
>> >>>>>>>>>> :
>> >>>>>>>>>
>> >>>>>>>>>> Hi Leondard,
>> >>>>>>>>>>
>> >>>>>>>>>> Thank you for pushing this, I think the updated syntax looks
>> >> really
>> >>>>>>>> good
>> >>>>>>>>>> and the semantics make sense to me.
>> >>>>>>>>>>
>> >>>>>>>>>> +1
>> >>>>>>>>>>
>> >>>>>>>>>> Seth
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xbjtdcq@gmail.com
>> <ma...@gmail.com>
>> >> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Hi, Konstantin
>> >>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>> >> joined
>> >>>>>>>> on
>> >>>>>>>>>> the
>> >>>>>>>>>>>> PRIMARY KEY attribute, correct?
>> >>>>>>>>>>> Yes, the PRIMARY KEY would be join key.
>> >>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>> >> VIEW
>> >>>>>>>>>>> definition that defines
>> >>>>>>>>>>>> whether a event-time or processing time temporal table join
>> is
>> >>>>>>>> used?
>> >>>>>>>>>>>
>> >>>>>>>>>>> I think event-time or processing-time temporal table join
>> >> depends on
>> >>>>>>>>> fact
>> >>>>>>>>>>> table’s time attribute in temporal join rather than from
>> temporal
>> >>>>>>>> table
>> >>>>>>>>>>> side, the event-time or processing time in temporal table is
>> just
>> >>>>>>>> used
>> >>>>>>>>> to
>> >>>>>>>>>>> split the validity period of versioned snapshot of temporal
>> >> table.
>> >>>>>>>> The
>> >>>>>>>>>>> processing time attribute is not  necessary for temporal table
>> >>>>>>>> without
>> >>>>>>>>>>> version, only the primary key is required, the following VIEW
>> is
>> >> also
>> >>>>>>>>>> valid
>> >>>>>>>>>>> for temporal table without version.
>> >>>>>>>>>>> CREATE VIEW latest_rates AS
>> >>>>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>> >> latest
>> >>>>>>>>>>> version
>> >>>>>>>>>>> FROM rates
>> >>>>>>>>>>> GROUP BY currency;                           -- inferred
>> primary
>> >> key
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always
>> >> versioned
>> >>>>>>>> on
>> >>>>>>>>>>>> operation_time regardless of the lookup table attribute
>> >> (event-time
>> >>>>>>>>> or
>> >>>>>>>>>>>> processing time attribute), correct?
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using
>> the
>> >>>>>>>>> o.time
>> >>>>>>>>>>> value to lookup the version of the temporal table.
>> >>>>>>>>>>> For fact table has the processing time attribute, it means
>> only
>> >>>>>>>> lookup
>> >>>>>>>>>> the
>> >>>>>>>>>>> latest version of temporal table and we can do some
>> optimization
>> >> in
>> >>>>>>>>>>> implementation like only keep the latest version.
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best
>> >>>>>>>>>>> Leonard
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>> > --
>> > Best regards!
>> > Rui Li
>>
>>
>
> --
> Best regards!
> Rui Li
>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Rui Li <li...@gmail.com>.
Thanks Leonard for the clarifications!

On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xb...@gmail.com> wrote:

>
> > But are we still able to track different views of such a
> > table through time, as rows are added/deleted to/from the table?
>
> Yes, in fact we support temporal table from changlog which contains all
> possible message types(INSERT/UPDATE/DELETE).
>
> > For
> > example, suppose I have an append-only table source with event-time and
> PK,
> > will I be allowed to do an event-time temporal join with this table?
> Yes, I list some examples in the doc, the example versioned_rates3  is
> this case exactly.
>
> Best
> Leonard
>
>
> >
> > On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <mailto:
> xbjtdcq@gmail.com>> wrote:
> >
> >> Hi, all
> >>
> >> After a detailed offline discussion about the temporal table related
> >> concept and behavior, we had a reliable solution and rejected several
> >> alternatives.
> >> Compared to rejected alternatives, the proposed approach is a more
> unified
> >> story and also friendly to user and current Flink framework.
> >> I improved the FLIP[1] with the proposed approach and refactored the
> >> document organization to make it clear enough.
> >>
> >> Please let me know if you have any concerns, I’m looking forward your
> >> comments.
> >>
> >>
> >> Best
> >> Leonard
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> >> <
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> >
> >>>
> >>
> >>
> >>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <mailto:
> xbjtdcq@gmail.com>> 写道:
> >>>
> >>> Hi, all
> >>>
> >>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
> >>>
> >>> Best
> >>> Leonard
> >>> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> >
> >> <
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> >
> >>>
> >>>
> >>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
> xbjtdcq@gmail.com> <mailto:
> >> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
> >>>>
> >>>> Hi, Timo
> >>>>
> >>>> Thanks for you response.
> >>>>
> >>>>> 1) Naming: Is operation time a good term for this concept? If I read
> >> "The operation time is the time when the changes happened in system." or
> >> "The system time of DML execution in database", why don't we call it
> >> `ChangelogTime` or `SystemTime`? Introducing another terminology of
> time in
> >> Flink should be thought through.
> >>>>
> >>>> I agree that we should thought through. I have considered the name
> >> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on the
> >> name.
> >>>>
> >>>> I proposed `operationTime` because most changelog comes from Database
> >> and we always called an action as `operation` rather than `change` in
> >> Database, the operation time is  easier to understand  for database
> users,
> >> but it's more like a database terminology.
> >>>>
> >>>> For `SystemTime`, user may confuse which one does the system in
> >> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s not a
> >> good name.
> >>>>
> >>>> `ChangelogTime` is a pretty choice which is more unified with existed
> >> terminology `Changelog` and `ChangelogMode`, so let me use
> `ChangelogTime`
> >> and I’ll update the FLIP.
> >>>>
> >>>>
> >>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
> >> expose the concept of time through the user-level `Row` type? The FLIP
> does
> >> not mention this explictly. I think we can keep it as an internal
> concept
> >> but I just wanted to ask for clarification.
> >>>>
> >>>> Yes, I want to keep it as an internal concept, we have discussed that
> >> changelog time concept should be the third time concept(the other two
> are
> >> event-time and processing-time). It’s not easy for normal users(or to
> help
> >> normal users) understand the three concepts accurately, and I did not
> find
> >> a big enough scenario that user need to touch the changelog time for
> now,
> >> so I tend to do not expose the concept to users.
> >>>>
> >>>>
> >>>> Best,
> >>>> Leonard
> >>>>
> >>>>
> >>>>>
> >>>>> On 04.08.20 04:58, Leonard Xu wrote:
> >>>>>> Thanks Konstantin,
> >>>>>> Regarding your questions, hope my comments has address your
> questions
> >> and I also add a few explanation in the FLIP.
> >>>>>> Thank you all for the feedback,
> >>>>>> It seems everyone involved  in this thread has reached a consensus.
> >>>>>> I will start a vote thread  later.
> >>>>>> Best,
> >>>>>> Leonard
> >>>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
> godfreyhe@gmail.com> <mailto:
> >> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
> >>>>>>>
> >>>>>>> Thanks Lennard for driving this FLIP.
> >>>>>>> Looks good to me.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Godfrey
> >>>>>>>
> >>>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:
> imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
> >> 下午12:04写道:
> >>>>>>>
> >>>>>>>> Thanks Leonard for the great FLIP. I think it is in very good
> shape.
> >>>>>>>> +1 to start a vote.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
> <ma...@gmail.com>
> >> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Leonard,
> >>>>>>>>>
> >>>>>>>>> Thanks for this FLIP!
> >>>>>>>>> Looks good from my side.
> >>>>>>>>>
> >>>>>>>>> Cheers, Fabian
> >>>>>>>>>
> >>>>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
> >>>>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:
> sjwiesman@gmail.com <ma...@gmail.com>>
> >>>>>>>>>> :
> >>>>>>>>>
> >>>>>>>>>> Hi Leondard,
> >>>>>>>>>>
> >>>>>>>>>> Thank you for pushing this, I think the updated syntax looks
> >> really
> >>>>>>>> good
> >>>>>>>>>> and the semantics make sense to me.
> >>>>>>>>>>
> >>>>>>>>>> +1
> >>>>>>>>>>
> >>>>>>>>>> Seth
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xbjtdcq@gmail.com
> <ma...@gmail.com>
> >> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi, Konstantin
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
> >> joined
> >>>>>>>> on
> >>>>>>>>>> the
> >>>>>>>>>>>> PRIMARY KEY attribute, correct?
> >>>>>>>>>>> Yes, the PRIMARY KEY would be join key.
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
> >> VIEW
> >>>>>>>>>>> definition that defines
> >>>>>>>>>>>> whether a event-time or processing time temporal table join is
> >>>>>>>> used?
> >>>>>>>>>>>
> >>>>>>>>>>> I think event-time or processing-time temporal table join
> >> depends on
> >>>>>>>>> fact
> >>>>>>>>>>> table’s time attribute in temporal join rather than from
> temporal
> >>>>>>>> table
> >>>>>>>>>>> side, the event-time or processing time in temporal table is
> just
> >>>>>>>> used
> >>>>>>>>> to
> >>>>>>>>>>> split the validity period of versioned snapshot of temporal
> >> table.
> >>>>>>>> The
> >>>>>>>>>>> processing time attribute is not  necessary for temporal table
> >>>>>>>> without
> >>>>>>>>>>> version, only the primary key is required, the following VIEW
> is
> >> also
> >>>>>>>>>> valid
> >>>>>>>>>>> for temporal table without version.
> >>>>>>>>>>> CREATE VIEW latest_rates AS
> >>>>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
> >> latest
> >>>>>>>>>>> version
> >>>>>>>>>>> FROM rates
> >>>>>>>>>>> GROUP BY currency;                           -- inferred
> primary
> >> key
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always
> >> versioned
> >>>>>>>> on
> >>>>>>>>>>>> operation_time regardless of the lookup table attribute
> >> (event-time
> >>>>>>>>> or
> >>>>>>>>>>>> processing time attribute), correct?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using
> the
> >>>>>>>>> o.time
> >>>>>>>>>>> value to lookup the version of the temporal table.
> >>>>>>>>>>> For fact table has the processing time attribute, it means only
> >>>>>>>> lookup
> >>>>>>>>>> the
> >>>>>>>>>>> latest version of temporal table and we can do some
> optimization
> >> in
> >>>>>>>>>>> implementation like only keep the latest version.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Best
> >>>>>>>>>>> Leonard
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
> > --
> > Best regards!
> > Rui Li
>
>

-- 
Best regards!
Rui Li

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Leonard Xu <xb...@gmail.com>.
> But are we still able to track different views of such a
> table through time, as rows are added/deleted to/from the table?

Yes, in fact we support temporal table from changlog which contains all possible message types(INSERT/UPDATE/DELETE).

> For
> example, suppose I have an append-only table source with event-time and PK,
> will I be allowed to do an event-time temporal join with this table?
Yes, I list some examples in the doc, the example versioned_rates3  is this case exactly.

Best
Leonard


> 
> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> wrote:
> 
>> Hi, all
>> 
>> After a detailed offline discussion about the temporal table related
>> concept and behavior, we had a reliable solution and rejected several
>> alternatives.
>> Compared to rejected alternatives, the proposed approach is a more unified
>> story and also friendly to user and current Flink framework.
>> I improved the FLIP[1] with the proposed approach and refactored the
>> document organization to make it clear enough.
>> 
>> Please let me know if you have any concerns, I’m looking forward your
>> comments.
>> 
>> 
>> Best
>> Leonard
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>> <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>> 
>> 
>> 
>>> 在 2020年8月4日,21:25,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> 写道:
>>> 
>>> Hi, all
>>> 
>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>> 
>>> Best
>>> Leonard
>>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>> <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
>>> 
>>> 
>>>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com> <mailto:
>> xbjtdcq@gmail.com <ma...@gmail.com>>> 写道:
>>>> 
>>>> Hi, Timo
>>>> 
>>>> Thanks for you response.
>>>> 
>>>>> 1) Naming: Is operation time a good term for this concept? If I read
>> "The operation time is the time when the changes happened in system." or
>> "The system time of DML execution in database", why don't we call it
>> `ChangelogTime` or `SystemTime`? Introducing another terminology of time in
>> Flink should be thought through.
>>>> 
>>>> I agree that we should thought through. I have considered the name
>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on the
>> name.
>>>> 
>>>> I proposed `operationTime` because most changelog comes from Database
>> and we always called an action as `operation` rather than `change` in
>> Database, the operation time is  easier to understand  for database users,
>> but it's more like a database terminology.
>>>> 
>>>> For `SystemTime`, user may confuse which one does the system in
>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s not a
>> good name.
>>>> 
>>>> `ChangelogTime` is a pretty choice which is more unified with existed
>> terminology `Changelog` and `ChangelogMode`, so let me use `ChangelogTime`
>> and I’ll update the FLIP.
>>>> 
>>>> 
>>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>> expose the concept of time through the user-level `Row` type? The FLIP does
>> not mention this explictly. I think we can keep it as an internal concept
>> but I just wanted to ask for clarification.
>>>> 
>>>> Yes, I want to keep it as an internal concept, we have discussed that
>> changelog time concept should be the third time concept(the other two are
>> event-time and processing-time). It’s not easy for normal users(or to help
>> normal users) understand the three concepts accurately, and I did not find
>> a big enough scenario that user need to touch the changelog time for now,
>> so I tend to do not expose the concept to users.
>>>> 
>>>> 
>>>> Best,
>>>> Leonard
>>>> 
>>>> 
>>>>> 
>>>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>>>> Thanks Konstantin,
>>>>>> Regarding your questions, hope my comments has address your questions
>> and I also add a few explanation in the FLIP.
>>>>>> Thank you all for the feedback,
>>>>>> It seems everyone involved  in this thread has reached a consensus.
>>>>>> I will start a vote thread  later.
>>>>>> Best,
>>>>>> Leonard
>>>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <ma...@gmail.com> <mailto:
>> godfreyhe@gmail.com <ma...@gmail.com>>> 写道:
>>>>>>> 
>>>>>>> Thanks Lennard for driving this FLIP.
>>>>>>> Looks good to me.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Godfrey
>>>>>>> 
>>>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com> <mailto:imjark@gmail.com <ma...@gmail.com>>> 于2020年8月3日周一
>> 下午12:04写道:
>>>>>>> 
>>>>>>>> Thanks Leonard for the great FLIP. I think it is in very good shape.
>>>>>>>> +1 to start a vote.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Jark
>>>>>>>> 
>>>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com <ma...@gmail.com>
>> <mailto:fhueske@gmail.com <ma...@gmail.com>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Leonard,
>>>>>>>>> 
>>>>>>>>> Thanks for this FLIP!
>>>>>>>>> Looks good from my side.
>>>>>>>>> 
>>>>>>>>> Cheers, Fabian
>>>>>>>>> 
>>>>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>>>>>>> sjwiesman@gmail.com <ma...@gmail.com> <mailto:sjwiesman@gmail.com <ma...@gmail.com>>
>>>>>>>>>> :
>>>>>>>>> 
>>>>>>>>>> Hi Leondard,
>>>>>>>>>> 
>>>>>>>>>> Thank you for pushing this, I think the updated syntax looks
>> really
>>>>>>>> good
>>>>>>>>>> and the semantics make sense to me.
>>>>>>>>>> 
>>>>>>>>>> +1
>>>>>>>>>> 
>>>>>>>>>> Seth
>>>>>>>>>> 
>>>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>
>> <mailto:xbjtdcq@gmail.com <ma...@gmail.com>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi, Konstantin
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>> joined
>>>>>>>> on
>>>>>>>>>> the
>>>>>>>>>>>> PRIMARY KEY attribute, correct?
>>>>>>>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>> VIEW
>>>>>>>>>>> definition that defines
>>>>>>>>>>>> whether a event-time or processing time temporal table join is
>>>>>>>> used?
>>>>>>>>>>> 
>>>>>>>>>>> I think event-time or processing-time temporal table join
>> depends on
>>>>>>>>> fact
>>>>>>>>>>> table’s time attribute in temporal join rather than from temporal
>>>>>>>> table
>>>>>>>>>>> side, the event-time or processing time in temporal table is just
>>>>>>>> used
>>>>>>>>> to
>>>>>>>>>>> split the validity period of versioned snapshot of temporal
>> table.
>>>>>>>> The
>>>>>>>>>>> processing time attribute is not  necessary for temporal table
>>>>>>>> without
>>>>>>>>>>> version, only the primary key is required, the following VIEW is
>> also
>>>>>>>>>> valid
>>>>>>>>>>> for temporal table without version.
>>>>>>>>>>> CREATE VIEW latest_rates AS
>>>>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>> latest
>>>>>>>>>>> version
>>>>>>>>>>> FROM rates
>>>>>>>>>>> GROUP BY currency;                           -- inferred primary
>> key
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always
>> versioned
>>>>>>>> on
>>>>>>>>>>>> operation_time regardless of the lookup table attribute
>> (event-time
>>>>>>>>> or
>>>>>>>>>>>> processing time attribute), correct?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the
>>>>>>>>> o.time
>>>>>>>>>>> value to lookup the version of the temporal table.
>>>>>>>>>>> For fact table has the processing time attribute, it means only
>>>>>>>> lookup
>>>>>>>>>> the
>>>>>>>>>>> latest version of temporal table and we can do some optimization
>> in
>>>>>>>>>>> implementation like only keep the latest version.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best
>>>>>>>>>>> Leonard
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
> 
> -- 
> Best regards!
> Rui Li


Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Rui Li <li...@gmail.com>.
Hey Leonard,

Thanks for summarizing the document. I have one quick question. I
understand a temporal table w/o version means each row in the table only
has one version. But are we still able to track different views of such a
table through time, as rows are added/deleted to/from the table? For
example, suppose I have an append-only table source with event-time and PK,
will I be allowed to do an event-time temporal join with this table?

On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xb...@gmail.com> wrote:

> Hi, all
>
> After a detailed offline discussion about the temporal table related
> concept and behavior, we had a reliable solution and rejected several
> alternatives.
> Compared to rejected alternatives, the proposed approach is a more unified
> story and also friendly to user and current Flink framework.
> I improved the FLIP[1] with the proposed approach and refactored the
> document organization to make it clear enough.
>
> Please let me know if you have any concerns, I’m looking forward your
> comments.
>
>
> Best
> Leonard
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> >
>
>
> > 在 2020年8月4日,21:25,Leonard Xu <xb...@gmail.com> 写道:
> >
> > Hi, all
> >
> > I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
> >
> > Best
> > Leonard
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> >
> >
> >> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <mailto:
> xbjtdcq@gmail.com>> 写道:
> >>
> >> Hi, Timo
> >>
> >> Thanks for you response.
> >>
> >>> 1) Naming: Is operation time a good term for this concept? If I read
> "The operation time is the time when the changes happened in system." or
> "The system time of DML execution in database", why don't we call it
> `ChangelogTime` or `SystemTime`? Introducing another terminology of time in
> Flink should be thought through.
> >>
> >> I agree that we should thought through. I have considered the name
> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on the
> name.
> >>
> >> I proposed `operationTime` because most changelog comes from Database
> and we always called an action as `operation` rather than `change` in
> Database, the operation time is  easier to understand  for database users,
> but it's more like a database terminology.
> >>
> >> For `SystemTime`, user may confuse which one does the system in
> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s not a
> good name.
> >>
> >> `ChangelogTime` is a pretty choice which is more unified with existed
> terminology `Changelog` and `ChangelogMode`, so let me use `ChangelogTime`
> and I’ll update the FLIP.
> >>
> >>
> >>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
> expose the concept of time through the user-level `Row` type? The FLIP does
> not mention this explictly. I think we can keep it as an internal concept
> but I just wanted to ask for clarification.
> >>
> >> Yes, I want to keep it as an internal concept, we have discussed that
> changelog time concept should be the third time concept(the other two are
> event-time and processing-time). It’s not easy for normal users(or to help
> normal users) understand the three concepts accurately, and I did not find
> a big enough scenario that user need to touch the changelog time for now,
> so I tend to do not expose the concept to users.
> >>
> >>
> >> Best,
> >> Leonard
> >>
> >>
> >>>
> >>> On 04.08.20 04:58, Leonard Xu wrote:
> >>>> Thanks Konstantin,
> >>>> Regarding your questions, hope my comments has address your questions
> and I also add a few explanation in the FLIP.
> >>>> Thank you all for the feedback,
> >>>> It seems everyone involved  in this thread has reached a consensus.
> >>>> I will start a vote thread  later.
> >>>> Best,
> >>>> Leonard
> >>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <mailto:
> godfreyhe@gmail.com>> 写道:
> >>>>>
> >>>>> Thanks Lennard for driving this FLIP.
> >>>>> Looks good to me.
> >>>>>
> >>>>> Best,
> >>>>> Godfrey
> >>>>>
> >>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com>> 于2020年8月3日周一
> 下午12:04写道:
> >>>>>
> >>>>>> Thanks Leonard for the great FLIP. I think it is in very good shape.
> >>>>>> +1 to start a vote.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com
> <ma...@gmail.com>> wrote:
> >>>>>>
> >>>>>>> Hi Leonard,
> >>>>>>>
> >>>>>>> Thanks for this FLIP!
> >>>>>>> Looks good from my side.
> >>>>>>>
> >>>>>>> Cheers, Fabian
> >>>>>>>
> >>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
> >>>>>>> sjwiesman@gmail.com <ma...@gmail.com>
> >>>>>>>> :
> >>>>>>>
> >>>>>>>> Hi Leondard,
> >>>>>>>>
> >>>>>>>> Thank you for pushing this, I think the updated syntax looks
> really
> >>>>>> good
> >>>>>>>> and the semantics make sense to me.
> >>>>>>>>
> >>>>>>>> +1
> >>>>>>>>
> >>>>>>>> Seth
> >>>>>>>>
> >>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xbjtdcq@gmail.com
> <ma...@gmail.com>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi, Konstantin
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
> joined
> >>>>>> on
> >>>>>>>> the
> >>>>>>>>>> PRIMARY KEY attribute, correct?
> >>>>>>>>> Yes, the PRIMARY KEY would be join key.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
> VIEW
> >>>>>>>>> definition that defines
> >>>>>>>>>> whether a event-time or processing time temporal table join is
> >>>>>> used?
> >>>>>>>>>
> >>>>>>>>> I think event-time or processing-time temporal table join
> depends on
> >>>>>>> fact
> >>>>>>>>> table’s time attribute in temporal join rather than from temporal
> >>>>>> table
> >>>>>>>>> side, the event-time or processing time in temporal table is just
> >>>>>> used
> >>>>>>> to
> >>>>>>>>> split the validity period of versioned snapshot of temporal
> table.
> >>>>>> The
> >>>>>>>>> processing time attribute is not  necessary for temporal table
> >>>>>> without
> >>>>>>>>> version, only the primary key is required, the following VIEW is
> also
> >>>>>>>> valid
> >>>>>>>>> for temporal table without version.
> >>>>>>>>> CREATE VIEW latest_rates AS
> >>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
> latest
> >>>>>>>>> version
> >>>>>>>>> FROM rates
> >>>>>>>>> GROUP BY currency;                           -- inferred primary
> key
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always
> versioned
> >>>>>> on
> >>>>>>>>>> operation_time regardless of the lookup table attribute
> (event-time
> >>>>>>> or
> >>>>>>>>>> processing time attribute), correct?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the
> >>>>>>> o.time
> >>>>>>>>> value to lookup the version of the temporal table.
> >>>>>>>>> For fact table has the processing time attribute, it means only
> >>>>>> lookup
> >>>>>>>> the
> >>>>>>>>> latest version of temporal table and we can do some optimization
> in
> >>>>>>>>> implementation like only keep the latest version.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Best
> >>>>>>>>> Leonard
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> >>
> >
>
>

-- 
Best regards!
Rui Li

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Leonard Xu <xb...@gmail.com>.
Hi, all

After a detailed offline discussion about the temporal table related concept and behavior, we had a reliable solution and rejected several alternatives.
Compared to rejected alternatives, the proposed approach is a more unified story and also friendly to user and current Flink framework.
I improved the FLIP[1] with the proposed approach and refactored the document organization to make it clear enough.

Please let me know if you have any concerns, I’m looking forward your comments.


Best
Leonard

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>


> 在 2020年8月4日,21:25,Leonard Xu <xb...@gmail.com> 写道:
> 
> Hi, all
> 
> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
> 
> Best
> Leonard
> [1]  https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>
> 
>> 在 2020年8月4日,20:58,Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> 写道:
>> 
>> Hi, Timo
>> 
>> Thanks for you response.
>> 
>>> 1) Naming: Is operation time a good term for this concept? If I read "The operation time is the time when the changes happened in system." or "The system time of DML execution in database", why don't we call it `ChangelogTime` or `SystemTime`? Introducing another terminology of time in Flink should be thought through.
>> 
>> I agree that we should thought through. I have considered the name `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on the name.
>> 
>> I proposed `operationTime` because most changelog comes from Database and we always called an action as `operation` rather than `change` in Database, the operation time is  easier to understand  for database users, but it's more like a database terminology. 
>> 
>> For `SystemTime`, user may confuse which one does the system in `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s not a good name.
>> 
>> `ChangelogTime` is a pretty choice which is more unified with existed terminology `Changelog` and `ChangelogMode`, so let me use `ChangelogTime` and I’ll update the FLIP.
>> 
>> 
>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also expose the concept of time through the user-level `Row` type? The FLIP does not mention this explictly. I think we can keep it as an internal concept but I just wanted to ask for clarification.
>> 
>> Yes, I want to keep it as an internal concept, we have discussed that changelog time concept should be the third time concept(the other two are event-time and processing-time). It’s not easy for normal users(or to help normal users) understand the three concepts accurately, and I did not find a big enough scenario that user need to touch the changelog time for now, so I tend to do not expose the concept to users.
>> 
>> 
>> Best,
>> Leonard
>> 
>> 
>>> 
>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>> Thanks Konstantin,
>>>> Regarding your questions, hope my comments has address your questions and I also add a few explanation in the FLIP.
>>>> Thank you all for the feedback,
>>>> It seems everyone involved  in this thread has reached a consensus.
>>>> I will start a vote thread  later.
>>>> Best,
>>>> Leonard
>>>>> 在 2020年8月3日,19:35,godfrey he <godfreyhe@gmail.com <ma...@gmail.com>> 写道:
>>>>> 
>>>>> Thanks Lennard for driving this FLIP.
>>>>> Looks good to me.
>>>>> 
>>>>> Best,
>>>>> Godfrey
>>>>> 
>>>>> Jark Wu <imjark@gmail.com <ma...@gmail.com>> 于2020年8月3日周一 下午12:04写道:
>>>>> 
>>>>>> Thanks Leonard for the great FLIP. I think it is in very good shape.
>>>>>> +1 to start a vote.
>>>>>> 
>>>>>> Best,
>>>>>> Jark
>>>>>> 
>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhueske@gmail.com <ma...@gmail.com>> wrote:
>>>>>> 
>>>>>>> Hi Leonard,
>>>>>>> 
>>>>>>> Thanks for this FLIP!
>>>>>>> Looks good from my side.
>>>>>>> 
>>>>>>> Cheers, Fabian
>>>>>>> 
>>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>>>>> sjwiesman@gmail.com <ma...@gmail.com>
>>>>>>>> :
>>>>>>> 
>>>>>>>> Hi Leondard,
>>>>>>>> 
>>>>>>>> Thank you for pushing this, I think the updated syntax looks really
>>>>>> good
>>>>>>>> and the semantics make sense to me.
>>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> Seth
>>>>>>>> 
>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xbjtdcq@gmail.com <ma...@gmail.com>> wrote:
>>>>>>>> 
>>>>>>>>> Hi, Konstantin
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be joined
>>>>>> on
>>>>>>>> the
>>>>>>>>>> PRIMARY KEY attribute, correct?
>>>>>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
>>>>>>>>> definition that defines
>>>>>>>>>> whether a event-time or processing time temporal table join is
>>>>>> used?
>>>>>>>>> 
>>>>>>>>> I think event-time or processing-time temporal table join depends on
>>>>>>> fact
>>>>>>>>> table’s time attribute in temporal join rather than from temporal
>>>>>> table
>>>>>>>>> side, the event-time or processing time in temporal table is just
>>>>>> used
>>>>>>> to
>>>>>>>>> split the validity period of versioned snapshot of temporal table.
>>>>>> The
>>>>>>>>> processing time attribute is not  necessary for temporal table
>>>>>> without
>>>>>>>>> version, only the primary key is required, the following VIEW is also
>>>>>>>> valid
>>>>>>>>> for temporal table without version.
>>>>>>>>> CREATE VIEW latest_rates AS
>>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the latest
>>>>>>>>> version
>>>>>>>>> FROM rates
>>>>>>>>> GROUP BY currency;                           -- inferred primary key
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always versioned
>>>>>> on
>>>>>>>>>> operation_time regardless of the lookup table attribute (event-time
>>>>>>> or
>>>>>>>>>> processing time attribute), correct?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the
>>>>>>> o.time
>>>>>>>>> value to lookup the version of the temporal table.
>>>>>>>>> For fact table has the processing time attribute, it means only
>>>>>> lookup
>>>>>>>> the
>>>>>>>>> latest version of temporal table and we can do some optimization in
>>>>>>>>> implementation like only keep the latest version.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best
>>>>>>>>> Leonard
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>> 
>> 
> 


Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Leonard Xu <xb...@gmail.com>.
Hi, all

I’ve updated the FLIP[1] with the terminology `ChangelogTime`.

Best
Leonard
[1]  https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL <https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL>

> 在 2020年8月4日,20:58,Leonard Xu <xb...@gmail.com> 写道:
> 
> Hi, Timo
> 
> Thanks for you response.
> 
>> 1) Naming: Is operation time a good term for this concept? If I read "The operation time is the time when the changes happened in system." or "The system time of DML execution in database", why don't we call it `ChangelogTime` or `SystemTime`? Introducing another terminology of time in Flink should be thought through.
> 
> I agree that we should thought through. I have considered the name `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on the name.
> 
> I proposed `operationTime` because most changelog comes from Database and we always called an action as `operation` rather than `change` in Database, the operation time is  easier to understand  for database users, but it's more like a database terminology. 
> 
> For `SystemTime`, user may confuse which one does the system in `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s not a good name.
> 
> `ChangelogTime` is a pretty choice which is more unified with existed terminology `Changelog` and `ChangelogMode`, so let me use `ChangelogTime` and I’ll update the FLIP.
> 
> 
>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also expose the concept of time through the user-level `Row` type? The FLIP does not mention this explictly. I think we can keep it as an internal concept but I just wanted to ask for clarification.
> 
> Yes, I want to keep it as an internal concept, we have discussed that changelog time concept should be the third time concept(the other two are event-time and processing-time). It’s not easy for normal users(or to help normal users) understand the three concepts accurately, and I did not find a big enough scenario that user need to touch the changelog time for now, so I tend to do not expose the concept to users.
> 
> 
> Best,
> Leonard
> 
> 
>> 
>> On 04.08.20 04:58, Leonard Xu wrote:
>>> Thanks Konstantin,
>>> Regarding your questions, hope my comments has address your questions and I also add a few explanation in the FLIP.
>>> Thank you all for the feedback,
>>> It seems everyone involved  in this thread has reached a consensus.
>>> I will start a vote thread  later.
>>> Best,
>>> Leonard
>>>> 在 2020年8月3日,19:35,godfrey he <go...@gmail.com> 写道:
>>>> 
>>>> Thanks Lennard for driving this FLIP.
>>>> Looks good to me.
>>>> 
>>>> Best,
>>>> Godfrey
>>>> 
>>>> Jark Wu <im...@gmail.com> 于2020年8月3日周一 下午12:04写道:
>>>> 
>>>>> Thanks Leonard for the great FLIP. I think it is in very good shape.
>>>>> +1 to start a vote.
>>>>> 
>>>>> Best,
>>>>> Jark
>>>>> 
>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fh...@gmail.com> wrote:
>>>>> 
>>>>>> Hi Leonard,
>>>>>> 
>>>>>> Thanks for this FLIP!
>>>>>> Looks good from my side.
>>>>>> 
>>>>>> Cheers, Fabian
>>>>>> 
>>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>>>> sjwiesman@gmail.com
>>>>>>> :
>>>>>> 
>>>>>>> Hi Leondard,
>>>>>>> 
>>>>>>> Thank you for pushing this, I think the updated syntax looks really
>>>>> good
>>>>>>> and the semantics make sense to me.
>>>>>>> 
>>>>>>> +1
>>>>>>> 
>>>>>>> Seth
>>>>>>> 
>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xb...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi, Konstantin
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be joined
>>>>> on
>>>>>>> the
>>>>>>>>> PRIMARY KEY attribute, correct?
>>>>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
>>>>>>>> definition that defines
>>>>>>>>> whether a event-time or processing time temporal table join is
>>>>> used?
>>>>>>>> 
>>>>>>>> I think event-time or processing-time temporal table join depends on
>>>>>> fact
>>>>>>>> table’s time attribute in temporal join rather than from temporal
>>>>> table
>>>>>>>> side, the event-time or processing time in temporal table is just
>>>>> used
>>>>>> to
>>>>>>>> split the validity period of versioned snapshot of temporal table.
>>>>> The
>>>>>>>> processing time attribute is not  necessary for temporal table
>>>>> without
>>>>>>>> version, only the primary key is required, the following VIEW is also
>>>>>>> valid
>>>>>>>> for temporal table without version.
>>>>>>>> CREATE VIEW latest_rates AS
>>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the latest
>>>>>>>> version
>>>>>>>> FROM rates
>>>>>>>> GROUP BY currency;                           -- inferred primary key
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always versioned
>>>>> on
>>>>>>>>> operation_time regardless of the lookup table attribute (event-time
>>>>>> or
>>>>>>>>> processing time attribute), correct?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the
>>>>>> o.time
>>>>>>>> value to lookup the version of the temporal table.
>>>>>>>> For fact table has the processing time attribute, it means only
>>>>> lookup
>>>>>>> the
>>>>>>>> latest version of temporal table and we can do some optimization in
>>>>>>>> implementation like only keep the latest version.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best
>>>>>>>> Leonard
>>>>>>> 
>>>>>> 
>>>>> 
>> 
> 


Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Leonard Xu <xb...@gmail.com>.
Hi, Timo

Thanks for you response.

> 1) Naming: Is operation time a good term for this concept? If I read "The operation time is the time when the changes happened in system." or "The system time of DML execution in database", why don't we call it `ChangelogTime` or `SystemTime`? Introducing another terminology of time in Flink should be thought through.

I agree that we should thought through. I have considered the name `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on the name.

I proposed `operationTime` because most changelog comes from Database and we always called an action as `operation` rather than `change` in Database, the operation time is  easier to understand  for database users, but it's more like a database terminology. 

For `SystemTime`, user may confuse which one does the system in `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s not a good name.

`ChangelogTime` is a pretty choice which is more unified with existed terminology `Changelog` and `ChangelogMode`, so let me use `ChangelogTime` and I’ll update the FLIP.


> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also expose the concept of time through the user-level `Row` type? The FLIP does not mention this explictly. I think we can keep it as an internal concept but I just wanted to ask for clarification.

Yes, I want to keep it as an internal concept, we have discussed that changelog time concept should be the third time concept(the other two are event-time and processing-time). It’s not easy for normal users(or to help normal users) understand the three concepts accurately, and I did not find a big enough scenario that user need to touch the changelog time for now, so I tend to do not expose the concept to users.


Best,
Leonard


> 
> On 04.08.20 04:58, Leonard Xu wrote:
>> Thanks Konstantin,
>> Regarding your questions, hope my comments has address your questions and I also add a few explanation in the FLIP.
>> Thank you all for the feedback,
>> It seems everyone involved  in this thread has reached a consensus.
>> I will start a vote thread  later.
>> Best,
>> Leonard
>>> 在 2020年8月3日,19:35,godfrey he <go...@gmail.com> 写道:
>>> 
>>> Thanks Lennard for driving this FLIP.
>>> Looks good to me.
>>> 
>>> Best,
>>> Godfrey
>>> 
>>> Jark Wu <im...@gmail.com> 于2020年8月3日周一 下午12:04写道:
>>> 
>>>> Thanks Leonard for the great FLIP. I think it is in very good shape.
>>>> +1 to start a vote.
>>>> 
>>>> Best,
>>>> Jark
>>>> 
>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fh...@gmail.com> wrote:
>>>> 
>>>>> Hi Leonard,
>>>>> 
>>>>> Thanks for this FLIP!
>>>>> Looks good from my side.
>>>>> 
>>>>> Cheers, Fabian
>>>>> 
>>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>>> sjwiesman@gmail.com
>>>>>> :
>>>>> 
>>>>>> Hi Leondard,
>>>>>> 
>>>>>> Thank you for pushing this, I think the updated syntax looks really
>>>> good
>>>>>> and the semantics make sense to me.
>>>>>> 
>>>>>> +1
>>>>>> 
>>>>>> Seth
>>>>>> 
>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xb...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi, Konstantin
>>>>>>> 
>>>>>>>> 
>>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be joined
>>>> on
>>>>>> the
>>>>>>>> PRIMARY KEY attribute, correct?
>>>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>>> 
>>>>>>>> 
>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
>>>>>>> definition that defines
>>>>>>>> whether a event-time or processing time temporal table join is
>>>> used?
>>>>>>> 
>>>>>>> I think event-time or processing-time temporal table join depends on
>>>>> fact
>>>>>>> table’s time attribute in temporal join rather than from temporal
>>>> table
>>>>>>> side, the event-time or processing time in temporal table is just
>>>> used
>>>>> to
>>>>>>> split the validity period of versioned snapshot of temporal table.
>>>> The
>>>>>>> processing time attribute is not  necessary for temporal table
>>>> without
>>>>>>> version, only the primary key is required, the following VIEW is also
>>>>>> valid
>>>>>>> for temporal table without version.
>>>>>>> CREATE VIEW latest_rates AS
>>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the latest
>>>>>>> version
>>>>>>> FROM rates
>>>>>>> GROUP BY currency;                           -- inferred primary key
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always versioned
>>>> on
>>>>>>>> operation_time regardless of the lookup table attribute (event-time
>>>>> or
>>>>>>>> processing time attribute), correct?
>>>>>>> 
>>>>>>> 
>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the
>>>>> o.time
>>>>>>> value to lookup the version of the temporal table.
>>>>>>> For fact table has the processing time attribute, it means only
>>>> lookup
>>>>>> the
>>>>>>> latest version of temporal table and we can do some optimization in
>>>>>>> implementation like only keep the latest version.
>>>>>>> 
>>>>>>> 
>>>>>>> Best
>>>>>>> Leonard
>>>>>> 
>>>>> 
>>>> 
> 


Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Timo Walther <tw...@apache.org>.
Hi Leonard,

sorry for jumping into the discussion so late. But I have two questions:

1) Naming: Is operation time a good term for this concept? If I read 
"The operation time is the time when the changes happened in system." or 
"The system time of DML execution in database", why don't we call it 
`ChangelogTime` or `SystemTime`? Introducing another terminology of time 
in Flink should be thought through.

2) Exposing it through `org.apache.flink.types.Row`: Shall we also 
expose the concept of time through the user-level `Row` type? The FLIP 
does not mention this explictly. I think we can keep it as an internal 
concept but I just wanted to ask for clarification.

Thanks,
Timo


On 04.08.20 04:58, Leonard Xu wrote:
> Thanks Konstantin,
> 
> Regarding your questions, hope my comments has address your questions and I also add a few explanation in the FLIP.
> 
> Thank you all for the feedback,
> 
> It seems everyone involved  in this thread has reached a consensus.
> I will start a vote thread  later.
> 
> 
> Best,
> Leonard
> 
> 
>> 在 2020年8月3日,19:35,godfrey he <go...@gmail.com> 写道:
>>
>> Thanks Lennard for driving this FLIP.
>> Looks good to me.
>>
>> Best,
>> Godfrey
>>
>> Jark Wu <im...@gmail.com> 于2020年8月3日周一 下午12:04写道:
>>
>>> Thanks Leonard for the great FLIP. I think it is in very good shape.
>>> +1 to start a vote.
>>>
>>> Best,
>>> Jark
>>>
>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fh...@gmail.com> wrote:
>>>
>>>> Hi Leonard,
>>>>
>>>> Thanks for this FLIP!
>>>> Looks good from my side.
>>>>
>>>> Cheers, Fabian
>>>>
>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>> sjwiesman@gmail.com
>>>>> :
>>>>
>>>>> Hi Leondard,
>>>>>
>>>>> Thank you for pushing this, I think the updated syntax looks really
>>> good
>>>>> and the semantics make sense to me.
>>>>>
>>>>> +1
>>>>>
>>>>> Seth
>>>>>
>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xb...@gmail.com> wrote:
>>>>>
>>>>>> Hi, Konstantin
>>>>>>
>>>>>>>
>>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be joined
>>> on
>>>>> the
>>>>>>> PRIMARY KEY attribute, correct?
>>>>>> Yes, the PRIMARY KEY would be join key.
>>>>>>
>>>>>>>
>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
>>>>>> definition that defines
>>>>>>> whether a event-time or processing time temporal table join is
>>> used?
>>>>>>
>>>>>> I think event-time or processing-time temporal table join depends on
>>>> fact
>>>>>> table’s time attribute in temporal join rather than from temporal
>>> table
>>>>>> side, the event-time or processing time in temporal table is just
>>> used
>>>> to
>>>>>> split the validity period of versioned snapshot of temporal table.
>>> The
>>>>>> processing time attribute is not  necessary for temporal table
>>> without
>>>>>> version, only the primary key is required, the following VIEW is also
>>>>> valid
>>>>>> for temporal table without version.
>>>>>> CREATE VIEW latest_rates AS
>>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the latest
>>>>>> version
>>>>>> FROM rates
>>>>>> GROUP BY currency;                           -- inferred primary key
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always versioned
>>> on
>>>>>>> operation_time regardless of the lookup table attribute (event-time
>>>> or
>>>>>>> processing time attribute), correct?
>>>>>>
>>>>>>
>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the
>>>> o.time
>>>>>> value to lookup the version of the temporal table.
>>>>>> For fact table has the processing time attribute, it means only
>>> lookup
>>>>> the
>>>>>> latest version of temporal table and we can do some optimization in
>>>>>> implementation like only keep the latest version.
>>>>>>
>>>>>>
>>>>>> Best
>>>>>> Leonard
>>>>>
>>>>
>>>
> 


Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Leonard Xu <xb...@gmail.com>.
Thanks Konstantin, 

Regarding your questions, hope my comments has address your questions and I also add a few explanation in the FLIP.

Thank you all for the feedback,

It seems everyone involved  in this thread has reached a consensus.
I will start a vote thread  later.


Best,
Leonard


> 在 2020年8月3日,19:35,godfrey he <go...@gmail.com> 写道:
> 
> Thanks Lennard for driving this FLIP.
> Looks good to me.
> 
> Best,
> Godfrey
> 
> Jark Wu <im...@gmail.com> 于2020年8月3日周一 下午12:04写道:
> 
>> Thanks Leonard for the great FLIP. I think it is in very good shape.
>> +1 to start a vote.
>> 
>> Best,
>> Jark
>> 
>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fh...@gmail.com> wrote:
>> 
>>> Hi Leonard,
>>> 
>>> Thanks for this FLIP!
>>> Looks good from my side.
>>> 
>>> Cheers, Fabian
>>> 
>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>> sjwiesman@gmail.com
>>>> :
>>> 
>>>> Hi Leondard,
>>>> 
>>>> Thank you for pushing this, I think the updated syntax looks really
>> good
>>>> and the semantics make sense to me.
>>>> 
>>>> +1
>>>> 
>>>> Seth
>>>> 
>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xb...@gmail.com> wrote:
>>>> 
>>>>> Hi, Konstantin
>>>>> 
>>>>>> 
>>>>>> 1) A  "Versioned Temporal Table DDL on source" can only be joined
>> on
>>>> the
>>>>>> PRIMARY KEY attribute, correct?
>>>>> Yes, the PRIMARY KEY would be join key.
>>>>> 
>>>>>> 
>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
>>>>> definition that defines
>>>>>> whether a event-time or processing time temporal table join is
>> used?
>>>>> 
>>>>> I think event-time or processing-time temporal table join depends on
>>> fact
>>>>> table’s time attribute in temporal join rather than from temporal
>> table
>>>>> side, the event-time or processing time in temporal table is just
>> used
>>> to
>>>>> split the validity period of versioned snapshot of temporal table.
>> The
>>>>> processing time attribute is not  necessary for temporal table
>> without
>>>>> version, only the primary key is required, the following VIEW is also
>>>> valid
>>>>> for temporal table without version.
>>>>> CREATE VIEW latest_rates AS
>>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the latest
>>>>> version
>>>>> FROM rates
>>>>> GROUP BY currency;                           -- inferred primary key
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 3) A "Versioned Temporal Table DDL on source" is always versioned
>> on
>>>>>> operation_time regardless of the lookup table attribute (event-time
>>> or
>>>>>> processing time attribute), correct?
>>>>> 
>>>>> 
>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the
>>> o.time
>>>>> value to lookup the version of the temporal table.
>>>>> For fact table has the processing time attribute, it means only
>> lookup
>>>> the
>>>>> latest version of temporal table and we can do some optimization in
>>>>> implementation like only keep the latest version.
>>>>> 
>>>>> 
>>>>> Best
>>>>> Leonard
>>>> 
>>> 
>> 


Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by godfrey he <go...@gmail.com>.
Thanks Lennard for driving this FLIP.
Looks good to me.

Best,
Godfrey

Jark Wu <im...@gmail.com> 于2020年8月3日周一 下午12:04写道:

> Thanks Leonard for the great FLIP. I think it is in very good shape.
> +1 to start a vote.
>
> Best,
> Jark
>
> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fh...@gmail.com> wrote:
>
> > Hi Leonard,
> >
> > Thanks for this FLIP!
> > Looks good from my side.
> >
> > Cheers, Fabian
> >
> > Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
> > sjwiesman@gmail.com
> > >:
> >
> > > Hi Leondard,
> > >
> > > Thank you for pushing this, I think the updated syntax looks really
> good
> > > and the semantics make sense to me.
> > >
> > > +1
> > >
> > > Seth
> > >
> > > On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xb...@gmail.com> wrote:
> > >
> > > > Hi, Konstantin
> > > >
> > > > >
> > > > > 1) A  "Versioned Temporal Table DDL on source" can only be joined
> on
> > > the
> > > > > PRIMARY KEY attribute, correct?
> > > > Yes, the PRIMARY KEY would be join key.
> > > >
> > > > >
> > > > > 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
> > > > definition that defines
> > > > > whether a event-time or processing time temporal table join is
> used?
> > > >
> > > > I think event-time or processing-time temporal table join depends on
> > fact
> > > > table’s time attribute in temporal join rather than from temporal
> table
> > > > side, the event-time or processing time in temporal table is just
> used
> > to
> > > > split the validity period of versioned snapshot of temporal table.
> The
> > > > processing time attribute is not  necessary for temporal table
> without
> > > > version, only the primary key is required, the following VIEW is also
> > > valid
> > > > for temporal table without version.
> > > > CREATE VIEW latest_rates AS
> > > > SELECT currency, LAST_VALUE(rate)            -- only keep the latest
> > > > version
> > > > FROM rates
> > > > GROUP BY currency;                           -- inferred primary key
> > > >
> > > >
> > > > >
> > > > > 3) A "Versioned Temporal Table DDL on source" is always versioned
> on
> > > > > operation_time regardless of the lookup table attribute (event-time
> > or
> > > > > processing time attribute), correct?
> > > >
> > > >
> > > > Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the
> > o.time
> > > > value to lookup the version of the temporal table.
> > > > For fact table has the processing time attribute, it means only
> lookup
> > > the
> > > > latest version of temporal table and we can do some optimization in
> > > > implementation like only keep the latest version.
> > > >
> > > >
> > > > Best
> > > > Leonard
> > >
> >
>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Jark Wu <im...@gmail.com>.
Thanks Leonard for the great FLIP. I think it is in very good shape.
+1 to start a vote.

Best,
Jark

On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fh...@gmail.com> wrote:

> Hi Leonard,
>
> Thanks for this FLIP!
> Looks good from my side.
>
> Cheers, Fabian
>
> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
> sjwiesman@gmail.com
> >:
>
> > Hi Leondard,
> >
> > Thank you for pushing this, I think the updated syntax looks really good
> > and the semantics make sense to me.
> >
> > +1
> >
> > Seth
> >
> > On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xb...@gmail.com> wrote:
> >
> > > Hi, Konstantin
> > >
> > > >
> > > > 1) A  "Versioned Temporal Table DDL on source" can only be joined on
> > the
> > > > PRIMARY KEY attribute, correct?
> > > Yes, the PRIMARY KEY would be join key.
> > >
> > > >
> > > > 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
> > > definition that defines
> > > > whether a event-time or processing time temporal table join is used?
> > >
> > > I think event-time or processing-time temporal table join depends on
> fact
> > > table’s time attribute in temporal join rather than from temporal table
> > > side, the event-time or processing time in temporal table is just used
> to
> > > split the validity period of versioned snapshot of temporal table. The
> > > processing time attribute is not  necessary for temporal table without
> > > version, only the primary key is required, the following VIEW is also
> > valid
> > > for temporal table without version.
> > > CREATE VIEW latest_rates AS
> > > SELECT currency, LAST_VALUE(rate)            -- only keep the latest
> > > version
> > > FROM rates
> > > GROUP BY currency;                           -- inferred primary key
> > >
> > >
> > > >
> > > > 3) A "Versioned Temporal Table DDL on source" is always versioned on
> > > > operation_time regardless of the lookup table attribute (event-time
> or
> > > > processing time attribute), correct?
> > >
> > >
> > > Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the
> o.time
> > > value to lookup the version of the temporal table.
> > > For fact table has the processing time attribute, it means only lookup
> > the
> > > latest version of temporal table and we can do some optimization in
> > > implementation like only keep the latest version.
> > >
> > >
> > > Best
> > > Leonard
> >
>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Leonard,

Thanks for this FLIP!
Looks good from my side.

Cheers, Fabian

Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <sjwiesman@gmail.com
>:

> Hi Leondard,
>
> Thank you for pushing this, I think the updated syntax looks really good
> and the semantics make sense to me.
>
> +1
>
> Seth
>
> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xb...@gmail.com> wrote:
>
> > Hi, Konstantin
> >
> > >
> > > 1) A  "Versioned Temporal Table DDL on source" can only be joined on
> the
> > > PRIMARY KEY attribute, correct?
> > Yes, the PRIMARY KEY would be join key.
> >
> > >
> > > 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
> > definition that defines
> > > whether a event-time or processing time temporal table join is used?
> >
> > I think event-time or processing-time temporal table join depends on fact
> > table’s time attribute in temporal join rather than from temporal table
> > side, the event-time or processing time in temporal table is just used to
> > split the validity period of versioned snapshot of temporal table. The
> > processing time attribute is not  necessary for temporal table without
> > version, only the primary key is required, the following VIEW is also
> valid
> > for temporal table without version.
> > CREATE VIEW latest_rates AS
> > SELECT currency, LAST_VALUE(rate)            -- only keep the latest
> > version
> > FROM rates
> > GROUP BY currency;                           -- inferred primary key
> >
> >
> > >
> > > 3) A "Versioned Temporal Table DDL on source" is always versioned on
> > > operation_time regardless of the lookup table attribute (event-time or
> > > processing time attribute), correct?
> >
> >
> > Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the o.time
> > value to lookup the version of the temporal table.
> > For fact table has the processing time attribute, it means only lookup
> the
> > latest version of temporal table and we can do some optimization in
> > implementation like only keep the latest version.
> >
> >
> > Best
> > Leonard
>

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Seth Wiesman <sj...@gmail.com>.
Hi Leondard,

Thank you for pushing this, I think the updated syntax looks really good
and the semantics make sense to me.

+1

Seth

On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xb...@gmail.com> wrote:

> Hi, Konstantin
>
> >
> > 1) A  "Versioned Temporal Table DDL on source" can only be joined on the
> > PRIMARY KEY attribute, correct?
> Yes, the PRIMARY KEY would be join key.
>
> >
> > 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
> definition that defines
> > whether a event-time or processing time temporal table join is used?
>
> I think event-time or processing-time temporal table join depends on fact
> table’s time attribute in temporal join rather than from temporal table
> side, the event-time or processing time in temporal table is just used to
> split the validity period of versioned snapshot of temporal table. The
> processing time attribute is not  necessary for temporal table without
> version, only the primary key is required, the following VIEW is also valid
> for temporal table without version.
> CREATE VIEW latest_rates AS
> SELECT currency, LAST_VALUE(rate)            -- only keep the latest
> version
> FROM rates
> GROUP BY currency;                           -- inferred primary key
>
>
> >
> > 3) A "Versioned Temporal Table DDL on source" is always versioned on
> > operation_time regardless of the lookup table attribute (event-time or
> > processing time attribute), correct?
>
>
> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the o.time
> value to lookup the version of the temporal table.
> For fact table has the processing time attribute, it means only lookup the
> latest version of temporal table and we can do some optimization in
> implementation like only keep the latest version.
>
>
> Best
> Leonard

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Leonard Xu <xb...@gmail.com>.
Hi, Konstantin

> 
> 1) A  "Versioned Temporal Table DDL on source" can only be joined on the
> PRIMARY KEY attribute, correct?
Yes, the PRIMARY KEY would be join key.

> 
> 2) Isn't it the time attribute in the ORDER BY clause of the VIEW definition that defines
> whether a event-time or processing time temporal table join is used?

I think event-time or processing-time temporal table join depends on fact table’s time attribute in temporal join rather than from temporal table side, the event-time or processing time in temporal table is just used to split the validity period of versioned snapshot of temporal table. The  processing time attribute is not  necessary for temporal table without version, only the primary key is required, the following VIEW is also valid for temporal table without version.
CREATE VIEW latest_rates AS
SELECT currency, LAST_VALUE(rate)            -- only keep the latest version
FROM rates 
GROUP BY currency;                           -- inferred primary key


> 
> 3) A "Versioned Temporal Table DDL on source" is always versioned on
> operation_time regardless of the lookup table attribute (event-time or
> processing time attribute), correct? 


Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is using the o.time value to lookup the version of the temporal table.
For fact table has the processing time attribute, it means only lookup the latest version of temporal table and we can do some optimization in implementation like only keep the latest version.


Best
Leonard

Re: [DISCUSS] FLIP-132: Temporal Table DDL

Posted by Konstantin Knauf <kn...@apache.org>.
Hi Leonard,

Thank you for preparing this FLIP. I have a few questions.

1) A  "Versioned Temporal Table DDL on source" can only be joined on the
PRIMARY KEY attribute, correct?

2) The "processing time temporal table join" example under "how to
correlate a temporal table" does not make sense to me. Isn't it the time
attribute in the ORDER BY clause of the VIEW definition that defines
whether a event-time or processing time temporal table join is used? Of
course, the time characteristic of the lookup time attribute should match
the time characteristics of the versioning attribute.

3) A "Versioned Temporal Table DDL on source" is always versioned on
operation_time regardless of the lookup table attribute (event-time or
processing time attribute), correct? Or does the lookup table attribute
define the versioning?

Thanks,

Konstantin


On Tue, Jul 28, 2020 at 9:56 AM Leonard Xu <xb...@gmail.com> wrote:

> Hello  everyone,
>
> Currently, user can correlate temporal table by temporal table join to
> enrich their fact table, a temporal table can be a changing history table
> which tracks the changes(e.g. database changelog) or a changing dimensioned
> table which materializes the changes(e.g. database table). For changing
> dimensioned table, Flink uses DDL to define a temporal table and visits the
> temporal table data by looking up the external system’s table. For changing
> history table, Flink uses temporal table function to define a parameterized
> view of changing history table and then accesses the data of view, but
> Temporal Table Function can only be called via Table API or YAML which is
> pretty inconvenient for FLINK SQL users. If we can support temporal table
> DDL, user no longer needs temporal table function, and they can visit
> temporal table easily in pure SQL world.
>
> Flink SQL obtains the ability to interpret changelog after FLIP-95[1],
> changelog is natural temporal table which contains all versioned data of
> the the origin database table. Support temporal table on changelog would
> help user visit specific version of the original database table, this will
> enrich Flink temporal table join scenario much.
>
> The community have already had a related discussion thread[2] and reach
> some consensus. Based on that, I prepare the FLIP-132[3] to support
> Temporal Table DDL, please see the FLIP doc for more details.
>
> Any comment is welcome, I'm looking  forward to your feedback.
>
> Best
> Leonard
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95:+New+TableSource+and+TableSink+interfaces
> >
> [2]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLINK-16824-Creating-Temporal-Table-Function-via-DDL-td40333.html
> <
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLINK-16824-Creating-Temporal-Table-Function-via-DDL-td40333.html
> >
> [3]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
> >



-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk