You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Shammon FY <zj...@gmail.com> on 2023/06/01 09:57:35 UTC

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Hi Feng,

I have one minor comment about the public interface `Optional<Long>
getSnapshot()` in the `CatalogTable`.

As we can get tables from the new method `Catalog.getTable(ObjectPath
tablePath, long timestamp)`, I think the returned `CatalogBaseTable` will
have the information of timestamp. Flink or connector such as
iceberg/paimon can create sources from the `CatalogBaseTable` directly
without the need to get the snapshot ID from `CatalogTable.getSnapshot()`.
What do you think of it?

Best,
Shammon FY


On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <ji...@ververica.com.invalid> wrote:

> Hi Feng,
>
> Thanks for the proposal! Very interesting feature. Would you like to update
> your thoughts described in your previous email about why SupportsTimeTravel
> has been rejected into the FLIP? This will help readers understand the
> context (in the future).
>
> Since we always directly add overload methods into Catalog according to new
> requirements, which makes the interface bloated. Just out of curiosity,
> does it make sense to introduce some DSL design? Like
> Catalog.getTable(tablePath).on(timeStamp),
> Catalog.getTable(tablePath).current() for the most current version, and
> more room for further extension like timestamp range, etc. I haven't read
> all the source code yet and I'm not sure if it is possible. But a
> design like this will keep the Catalog API lean and the API/DSL will be
> self described and easier to use.
>
> Best regards,
> Jing
>
>
> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
> krzysiek.chmielewski@gmail.com> wrote:
>
> > Ok after second though I'm retracting my previous statement about Catalog
> > changes you proposed.
> > I do see a benefit for Delta connector actually with this change and see
> > why this could be coupled with Catalog.
> >
> > Delta Connector SQL support, also ships a Delta Catalog implementation
> for
> > Flink.
> > For Delta Catalog, table schema information is fetched from underlying
> > _delta_log and not stored in metastore. For time travel we actually had a
> > problem, that if we would like to timetravel back to some old version,
> > where schema was slightly different, then we would have a conflict since
> > Catalog would return current schema and not how it was for version X.
> >
> > With your change, our Delta Catalog can actually fetch schema for
> version X
> > and send it to DeltaTableFactory. Currency, Catalog can fetch only
> current
> > version. What we would also need however is version (number/timestamp)
> for
> > this table passed to DynamicTableFactory so we could properly set Delta
> > standalone library.
> >
> > Regards,
> > Krzysztof
> >
> > śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
> > krzysiek.chmielewski@gmail.com> napisał(a):
> >
> > > Hi,
> > > happy to see such a feature.
> > > Small note from my end regarding Catalog changes.
> > >
> > > TL;DR
> > > I don't think it is necessary to delegate this feature to the catalog.
> I
> > > think that since "timetravel" is per job/query property, its should not
> > be
> > > coupled with the Catalog or table definition. In my opinion this is
> > > something that DynamicTableFactory only has to know about. I would
> rather
> > > see this feature as it is - SQL syntax enhancement but delegate clearly
> > to
> > > DynamicTableFactory.
> > >
> > > I've implemented timetravel feature for Delta Connector  [1]  using
> > > current Flink API.
> > > Docs are pending code review, but you can find them here [2] and
> examples
> > > are available here [3]
> > >
> > > The timetravel feature that I've implemented is based on Flink Query
> > > hints.
> > > "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
> > >
> > > The " versionAsOf" (we also have 'timestampAsOf') parameter is handled
> > not
> > > by Catalog but by DyntamicTableFactory implementation for Delta
> > connector.
> > > The value of this property is passed to Delta standalone lib API that
> > > returns table view for given version.
> > >
> > > I'm not sure how/if proposed change could benefit Delta connector
> > > implementation for this feature.
> > >
> > > Thanks,
> > > Krzysztof
> > >
> > > [1]
> > >
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> > > [2] https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> > > [3]
> > >
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> > >
> > > śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com> napisał(a):
> > >
> > >> Hi, Feng
> > >>
> > >> Thanks for driving this FLIP, Time travel is very useful for Flink
> > >> integrate with data lake system. I have one question why the
> > >> implementation
> > >> of TimeTravel is delegated to Catalog? Assuming that we use Flink to
> > query
> > >> Hudi table with the time travel syntax, but we don't use the
> > HudiCatalog,
> > >> instead, we register the hudi table to InMemoryCatalog,  can we
> support
> > >> time travel for Hudi table in this case?
> > >> In contrast, I think time travel should bind to connector instead of
> > >> Catalog, so the rejected alternative should be considered.
> > >>
> > >> Best,
> > >> Ron
> > >>
> > >> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
> > >>
> > >> > Hi, Feng.
> > >> > Notice this FLIP only support batch mode for time travel.  Would it
> > also
> > >> > make sense to support stream mode to a read a snapshot of the table
> > as a
> > >> > bounded stream?
> > >> >
> > >> > Best regards,
> > >> > Yuxia
> > >> >
> > >> > ----- 原始邮件 -----
> > >> > 发件人: "Benchao Li" <li...@apache.org>
> > >> > 收件人: "dev" <de...@flink.apache.org>
> > >> > 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> > >> > 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> > >> >
> > >> > # Can Calcite support this syntax ` VERSION AS OF`  ?
> > >> >
> > >> > This also depends on whether this is defined in standard or any
> known
> > >> > databases that have implemented this. If not, it would be hard to
> push
> > >> it
> > >> > to Calcite.
> > >> >
> > >> > # getTable(ObjectPath object, long timestamp)
> > >> >
> > >> > Then we again come to the problem of "casting between timestamp and
> > >> > numeric", which has been disabled in FLINK-21978[1]. If you're gonna
> > use
> > >> > this, then we need to clarify that problem first.
> > >> >
> > >> > [1] https://issues.apache.org/jira/browse/FLINK-21978
> > >> >
> > >> >
> > >> > Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
> > >> >
> > >> > > hi, thanks for your reply.
> > >> > >
> > >> > > @Benchao
> > >> > > > did you consider the pushdown abilities compatible
> > >> > >
> > >> > > In the current design, the implementation of TimeTravel is
> delegated
> > >> to
> > >> > > Catalog. We have added a function called getTable(ObjectPath
> > >> tablePath,
> > >> > > long timestamp) to obtain the corresponding CatalogBaseTable at a
> > >> > specific
> > >> > > time.  Therefore, I think it will not have any impact on the
> > original
> > >> > > pushdown abilities.
> > >> > >
> > >> > >
> > >> > > >   I see there is a rejected  design for adding
> SupportsTimeTravel,
> > >> but
> > >> > I
> > >> > > didn't see the alternative in  the FLIP doc
> > >> > >
> > >> > > Sorry, the document description is not very clear.  Regarding
> > whether
> > >> to
> > >> > > support SupportTimeTravel, I have discussed it with yuxia. Since
> we
> > >> have
> > >> > > already passed the corresponding time in getTable(ObjectPath, long
> > >> > > timestamp) of Catalog, SupportTimeTravel may not be necessary.
> > >> > >
> > >> > > In getTable(ObjectPath object, long timestamp), we can obtain the
> > >> schema
> > >> > of
> > >> > > the corresponding time point and put the SNAPSHOT that needs to be
> > >> > consumed
> > >> > > into options.
> > >> > >
> > >> > >
> > >> > > @Shammon
> > >> > > > Could we support this in Flink too?
> > >> > >
> > >> > > I personally think it's possible, but limited by Calcite's syntax
> > >> > > restrictions. I believe we should first support this syntax in
> > >> Calcite.
> > >> > > Currently, I think it may not be easy  to support this syntax in
> > >> Flink's
> > >> > > parser. @Benchao, what do you think? Can Calcite support this
> syntax
> > >> > > ` VERSION AS OF`  ?
> > >> > >
> > >> > >
> > >> > > Best,
> > >> > > Feng.
> > >> > >
> > >> > >
> > >> > > On Fri, May 26, 2023 at 2:55 PM Shammon FY <zj...@gmail.com>
> > wrote:
> > >> > >
> > >> > > > Thanks Feng, the feature of time travel sounds great!
> > >> > > >
> > >> > > > In addition to SYSTEM_TIME, lake houses such as paimon and
> iceberg
> > >> > > support
> > >> > > > snapshot or version. For example, users can query snapshot 1 for
> > >> paimon
> > >> > > by
> > >> > > > the following statement
> > >> > > > SELECT * FROM t VERSION AS OF 1
> > >> > > >
> > >> > > > Could we support this in Flink too?
> > >> > > >
> > >> > > > Best,
> > >> > > > Shammon FY
> > >> > > >
> > >> > > > On Fri, May 26, 2023 at 1:20 PM Benchao Li <
> libenchao@apache.org>
> > >> > wrote:
> > >> > > >
> > >> > > > > Regarding the implementation, did you consider the pushdown
> > >> abilities
> > >> > > > > compatible, e.g., projection pushdown, filter pushdown,
> > partition
> > >> > > > pushdown.
> > >> > > > > Since `Snapshot` is not handled much in existing rules, I
> have a
> > >> > > concern
> > >> > > > > about this. Of course, it depends on your implementation
> detail,
> > >> what
> > >> > > is
> > >> > > > > important is that we'd better add some cross tests for these.
> > >> > > > >
> > >> > > > > Regarding the interface exposed to Connector, I see there is a
> > >> > rejected
> > >> > > > > design for adding SupportsTimeTravel, but I didn't see the
> > >> > alternative
> > >> > > in
> > >> > > > > the FLIP doc. IMO, this is an important thing we need to
> clarify
> > >> > > because
> > >> > > > we
> > >> > > > > need to know whether the Connector supports this, and what
> > >> > > > column/metadata
> > >> > > > > corresponds to 'system_time'.
> > >> > > > >
> > >> > > > > Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
> > >> > > > >
> > >> > > > > > Thanks for your reply
> > >> > > > > >
> > >> > > > > > @Timo @BenChao @yuxia
> > >> > > > > >
> > >> > > > > > Sorry for the mistake,  Currently , calcite only supports
> > `FOR
> > >> > > > > SYSTEM_TIME
> > >> > > > > > AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
> OF`
> > .
> > >> > I've
> > >> > > > > > updated the syntax part of the FLIP.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > @Timo
> > >> > > > > >
> > >> > > > > > > We will convert it to TIMESTAMP_LTZ?
> > >> > > > > >
> > >> > > > > > Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ
> and
> > >> then
> > >> > > > > convert
> > >> > > > > > it into a long value.
> > >> > > > > >
> > >> > > > > > > How do we want to query the most recent version of a table
> > >> > > > > >
> > >> > > > > > I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
> > cause
> > >> > > > > > inconsistency with the real-time concept.
> > >> > > > > > However, from my personal understanding, the scope of  `AS
> OF
> > >> > > > > > CURRENT_TIMESTAMP` is the table itself, not the table
> record.
> > >> So,
> > >> > I
> > >> > > > > think
> > >> > > > > > using CURRENT_TIMESTAMP should also be reasonable?.
> > >> > > > > > Additionally, if no version is specified, the latest version
> > >> should
> > >> > > be
> > >> > > > > used
> > >> > > > > > by default.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Best,
> > >> > > > > > Feng
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Thu, May 25, 2023 at 7:47 PM yuxia <
> > >> luoyuxia@alumni.sjtu.edu.cn
> > >> > >
> > >> > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Thanks Feng for bringing this up. It'll be great to
> > introduce
> > >> > time
> > >> > > > > travel
> > >> > > > > > > to Flink to have a better integration with external data
> > >> soruces.
> > >> > > > > > >
> > >> > > > > > > I also share same concern about the syntax.
> > >> > > > > > > I see in the part of `Whether to support other syntax
> > >> > > > implementations`
> > >> > > > > in
> > >> > > > > > > this FLIP, seems the syntax in Calcite should be `FOR
> > >> SYSTEM_TIME
> > >> > > AS
> > >> > > > > OF`,
> > >> > > > > > > right?
> > >> > > > > > > But the the syntax part in this FLIP, it seems to be `AS
> OF
> > >> > > > TIMESTAMP`
> > >> > > > > > > instead of  `FOR SYSTEM_TIME AS OF`. Is it just a mistake
> or
> > >> by
> > >> > > > design?
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Best regards,
> > >> > > > > > > Yuxia
> > >> > > > > > >
> > >> > > > > > > ----- 原始邮件 -----
> > >> > > > > > > 发件人: "Benchao Li" <li...@apache.org>
> > >> > > > > > > 收件人: "dev" <de...@flink.apache.org>
> > >> > > > > > > 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
> > >> > > > > > > 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
> > Mode
> > >> > > > > > >
> > >> > > > > > > Thanks Feng, it's exciting to have this ability.
> > >> > > > > > >
> > >> > > > > > > Regarding the syntax section, are you proposing `AS OF`
> > >> instead
> > >> > of
> > >> > > > `FOR
> > >> > > > > > > SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in
> > the
> > >> SQL
> > >> > > > > > standard
> > >> > > > > > > and has been supported in some database vendors such as
> SQL
> > >> > Server.
> > >> > > > > About
> > >> > > > > > > `AS OF`, is it in the standard or any database vendor
> > supports
> > >> > > this,
> > >> > > > if
> > >> > > > > > > yes, I think it's worth to add this support to Calcite,
> and
> > I
> > >> > would
> > >> > > > > give
> > >> > > > > > a
> > >> > > > > > > hand in Calcite side. Otherwise, I think we'd better to
> use
> > >> `FOR
> > >> > > > SYSTEM
> > >> > > > > > AS
> > >> > > > > > > OF`.
> > >> > > > > > >
> > >> > > > > > > Timo Walther <tw...@apache.org> 于2023年5月25日周四 19:02写道:
> > >> > > > > > >
> > >> > > > > > > > Also: How do we want to query the most recent version
> of a
> > >> > table?
> > >> > > > > > > >
> > >> > > > > > > > `AS OF CURRENT_TIMESTAMP` would be ideal, but according
> to
> > >> the
> > >> > > docs
> > >> > > > > > both
> > >> > > > > > > > the type is TIMESTAMP_LTZ and what is even more
> concerning
> > >> is
> > >> > the
> > >> > > > it
> > >> > > > > > > > actually is evalated row-based:
> > >> > > > > > > >
> > >> > > > > > > >  > Returns the current SQL timestamp in the local time
> > zone,
> > >> > the
> > >> > > > > return
> > >> > > > > > > > type is TIMESTAMP_LTZ(3). It is evaluated for each
> record
> > in
> > >> > > > > streaming
> > >> > > > > > > > mode. But in batch mode, it is evaluated once as the
> query
> > >> > starts
> > >> > > > and
> > >> > > > > > > > uses the same result for every row.
> > >> > > > > > > >
> > >> > > > > > > > This could make it difficult to explain in a join
> scenario
> > >> of
> > >> > > > > multiple
> > >> > > > > > > > snapshotted tables.
> > >> > > > > > > >
> > >> > > > > > > > Regards,
> > >> > > > > > > > Timo
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On 25.05.23 12:29, Timo Walther wrote:
> > >> > > > > > > > > Hi Feng,
> > >> > > > > > > > >
> > >> > > > > > > > > thanks for proposing this FLIP. It makes a lot of
> sense
> > to
> > >> > > > finally
> > >> > > > > > > > > support querying tables at a specific point in time or
> > >> > > hopefully
> > >> > > > > also
> > >> > > > > > > > > ranges soon. Following time-versioned tables.
> > >> > > > > > > > >
> > >> > > > > > > > > Here is some feedback from my side:
> > >> > > > > > > > >
> > >> > > > > > > > > 1. Syntax
> > >> > > > > > > > >
> > >> > > > > > > > > Can you elaborate a bit on the Calcite restrictions?
> > >> > > > > > > > >
> > >> > > > > > > > > Does Calcite currently support `AS OF` syntax for this
> > but
> > >> > not
> > >> > > > `FOR
> > >> > > > > > > > > SYSTEM_TIME AS OF`?
> > >> > > > > > > > >
> > >> > > > > > > > > It would be great to support `AS OF` also for
> > >> time-versioned
> > >> > > > joins
> > >> > > > > > and
> > >> > > > > > > > > have a unified and short syntax.
> > >> > > > > > > > >
> > >> > > > > > > > > Once a fix is merged in Calcite for this, we can make
> > this
> > >> > > > > available
> > >> > > > > > in
> > >> > > > > > > > > Flink earlier by copying the corresponding classes
> until
> > >> the
> > >> > > next
> > >> > > > > > > > > Calcite upgrade is performed.
> > >> > > > > > > > >
> > >> > > > > > > > > 2. Semantics
> > >> > > > > > > > >
> > >> > > > > > > > > How do we interpret the timestamp? In Flink we have 2
> > >> > timestamp
> > >> > > > > types
> > >> > > > > > > > > (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
> > >> > TIMESTAMP
> > >> > > > > > > > > '2023-04-27 00:00:00', in which timezone will the
> > >> timestamp
> > >> > be?
> > >> > > > We
> > >> > > > > > will
> > >> > > > > > > > > convert it to TIMESTAMP_LTZ?
> > >> > > > > > > > >
> > >> > > > > > > > > We definely need to clarify this because the past has
> > >> shown
> > >> > > that
> > >> > > > > > > > > daylight saving times make our lives hard.
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks,
> > >> > > > > > > > > Timo
> > >> > > > > > > > >
> > >> > > > > > > > > On 25.05.23 10:57, Feng Jin wrote:
> > >> > > > > > > > >> Hi, everyone.
> > >> > > > > > > > >>
> > >> > > > > > > > >> I’d like to start a discussion about FLIP-308:
> Support
> > >> Time
> > >> > > > Travel
> > >> > > > > > In
> > >> > > > > > > > >> Batch
> > >> > > > > > > > >> Mode [1]
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> Time travel is a SQL syntax used to query historical
> > >> > versions
> > >> > > of
> > >> > > > > > data.
> > >> > > > > > > > It
> > >> > > > > > > > >> allows users to specify a point in time and retrieve
> > the
> > >> > data
> > >> > > > and
> > >> > > > > > > > >> schema of
> > >> > > > > > > > >> a table as it appeared at that time. With time
> travel,
> > >> users
> > >> > > can
> > >> > > > > > > easily
> > >> > > > > > > > >> analyze and compare historical versions of data.
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> With the widespread use of data lake systems such as
> > >> Paimon,
> > >> > > > > > Iceberg,
> > >> > > > > > > > and
> > >> > > > > > > > >> Hudi, time travel can provide more convenience for
> > users'
> > >> > data
> > >> > > > > > > analysis.
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> Looking forward to your opinions, any suggestions are
> > >> > > welcomed.
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> 1.
> > >> > > > > > > > >>
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> Best.
> > >> > > > > > > > >>
> > >> > > > > > > > >> Feng
> > >> > > > > > > > >>
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > >
> > >> > > > > > > Best,
> > >> > > > > > > Benchao Li
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Benchao Li
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Best,
> > >> > Benchao Li
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Feng Jin <ji...@gmail.com>.
Hi everyone,

Thank you all for your valuable input. If there are no further questions or
concerns regarding FLIP-308[1], I would like to start voting on Monday,
June 19th.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel


Best,

Feng

On Mon, Jun 12, 2023 at 10:57 AM Feng Jin <ji...@gmail.com> wrote:

> Thanks Benchao and Leonard.
>
> 'Implicitly type conversion' makes sense to me.   I will emphasize the
> 'Implicitly type conversion' in the document.
>
>
> Best,
> Feng
>
> On Sat, Jun 10, 2023 at 10:11 AM Benchao Li <li...@apache.org> wrote:
>
>> Thanks Leonard for the input, "Implicitly type conversion" way sounds good
>> to me.
>> I also agree that this should be done in planner instead of connector,
>> it'll be a lot easier for connector development.
>>
>> Leonard Xu <xb...@gmail.com> 于2023年6月9日周五 20:11写道:
>>
>> > About the semantics consideration, I have some new input after rethink.
>> >
>> > 1. We can support both TIMESTAMP and TIMESTAMP_LTZ expression following
>> > the syntax  `SELECT [column_name(s)] FROM [table_name] FOR SYSTEM_TIME
>> AS
>> > OF `
>> >
>> > 2. For TIMESTAMP_LTZ type, give a long instant value to CatalogTable is
>> > pretty intuitive, for TIMESTAMP_type, it will be implied cast to
>> > TIMESTAMP_LTZ type by planner using session timezone and then pass to
>> > CatalogTable. This case can be considered as a Function
>> AsOfSnapshot(Table
>> > t, TIMESTAMP_LTZ arg), which can pass arg with TIMESTAMP_LTZ type, but
>> our
>> > framework supports implicit type conversion thus users can also pass arg
>> > with TIMESTAMP type. Hint, Spark[1] did the  implicit type conversion
>> too.
>> >
>> > 3.I also considered handing over the implicit type conversion to the
>> > connector instead of planner, such as passing a TIMESTAMP literal, and
>> the
>> > connector using the session timezone to perform type conversion, but
>> this
>> > is more complicated than previous planner handling, and it’s not
>> friendly
>> > to the connector developers.
>> >
>> > 4. The last point,  TIMESTAMP_LTZ  '1970-01-01 00:00:04.001’ should be
>> an
>> > invalid expression as if you can not define a instant point (i.e
>> > TIMSTAMP_LTZ semantics in SQL) from a timestamp literal without
>> timezone.
>> > You can use explicit type conversion like `cast(ts_ntz as
>> TIMESTAMP_LTZ)`
>> > after `FOR SYSTEM_TIME AS OF ` if you want to use
>> > Timestamp type/expression/literal without timezone.
>> >
>> > 5. The last last point, the TIMESTAMP_LTZ type of Flink SQL supports DST
>> > time[2] well that will help user avoid many corner case.
>> >
>> >
>> > Best,
>> > Leonard
>> >
>> > [1]
>> >
>> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
>> > [2]
>> >
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/timezone/#daylight-saving-time-support
>> >
>> >
>> >
>> >
>> > > On Jun 9, 2023, at 1:13 PM, Benchao Li <li...@apache.org> wrote:
>> > >
>> > > As you can see that you must use `UNIX_TIMESTAMP` to do this work,
>> that's
>> > > where the time zone happens.
>> > >
>> > > What I'm talking about is casting timestamp/timestamp_ltz to long
>> > directly,
>> > > that's why the semantic is tricky when you are casting timestamp to
>> long
>> > > using time zone.
>> > >
>> > > For other systems, such as SQL server[1], they actually uses a string
>> > > instead of timestamp literal `FOR SYSTEM_TIME AS OF '2021-01-01
>> > > 00:00:00.0000000'`, I'm not sure whether they convert the string
>> > implicitly
>> > > to TIMESTAMP_LTZ, or they just have a different definition of the
>> syntax.
>> > >
>> > > But for us, we are definitely using timestamp/timestmap_ltz literal
>> here,
>> > > that's why it is special, and we must highlight this behavior that we
>> are
>> > > converting a timestamp without time zone literal to long using the
>> > session
>> > > time zone.
>> > >
>> > > [1]
>> > >
>> >
>> https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios?view=sql-server-ver16
>> > >
>> > > Feng Jin <ji...@gmail.com> 于2023年6月8日周四 11:35写道:
>> > >
>> > >> Hi all,
>> > >>
>> > >> thanks for your input
>> > >>
>> > >>
>> > >> @Benchao
>> > >>
>> > >>> The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
>> > >> WITHOUT TIME ZONE", converting it to unix timestamp would use UTC
>> > timezone,
>> > >> which is not usually expected by users.
>> > >>
>> > >> It was indeed the case before Flink 1.13, but now my understanding is
>> > that
>> > >> there have been some slight changes in the definition of TIMESTAMP.
>> > >>
>> > >> TIMESTAMP is currently used to specify the year, month, day, hour,
>> > minute
>> > >> and second. We recommend that users use
>> > *UNIX_TIMESTAMP(CAST(timestamp_col
>> > >> AS STRING))* to convert *TIMESTAMP values* and *long values*. The
>> > >> *UNIX_TIMESTAMP* function will use the *LOCAL TIME ZONE*. Therefore,
>> > >> whether converting TIMESTAMP or TIMESTAMP_LTZ to Long values will
>> > involve
>> > >> using the *LOCAL TIME ZONE*.
>> > >>
>> > >>
>> > >> Here is an test:
>> > >>
>> > >> Flink SQL> SET 'table.local-time-zone' = 'UTC';
>> > >> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01
>> 00:00:00' as
>> > >> STRING)) as `timestamp`;
>> > >> ---------------
>> > >> timestamp
>> > >> --------------
>> > >> 0
>> > >>
>> > >> Flink SQL> SET 'table.local-time-zone' = 'Asia/Shanghai';
>> > >> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01
>> 00:00:00' as
>> > >> STRING)) as `timestamp`;
>> > >> ---------------
>> > >> timestamp
>> > >> --------------
>> > >> -28800
>> > >>
>> > >> Therefore, the current conversion method exposed to users is also
>> using
>> > >> LOCAL TIME ZONE.
>> > >>
>> > >>
>> > >> @yuxia
>> > >>
>> > >> Thank you very much for providing the list of behaviors of TIMESTAMP
>> in
>> > >> other systems.
>> > >>
>> > >>> I think we can align them to avoid the inconsistency to other
>> engines
>> > and
>> > >> provide convenience for the external connectors while integrating
>> > Flink's
>> > >> time travel API.
>> > >>
>> > >> +1 for this.
>> > >>
>> > >>> Regarding the inconsistency, I think we can consider time-travel as
>> a
>> > >> specical case, and we do needs to highlight this in this FLIP.
>> > >> As for "violate the restriction outlined in FLINK-21978[1]", since we
>> > cast
>> > >> timestamp to epochMillis only for the internal use, and won't expose
>> it
>> > to
>> > >> users, I don't think it will violate the restriction.
>> > >> Btw, please add a brief desc to explain the meaning of the parameter
>> > >> `timestamp` in method `CatalogBaseTable getTable(ObjectPath
>> tablePath,
>> > long
>> > >> timestamp)`. Maybe something like "timestamp of the table snapt,
>> which
>> > is
>> > >> millseconds since 1970-01-01 00:00:00 UTC".
>> > >>
>> > >> Thank you for the suggestions regarding the document. I will add
>> them to
>> > >> FLIP.
>> > >>
>> > >>
>> > >> Best,
>> > >> Feng
>> > >>
>> > >>
>> > >> On Wed, Jun 7, 2023 at 12:18 PM Benchao Li <li...@apache.org>
>> > wrote:
>> > >>
>> > >>> I also share the concern about the timezone problem.
>> > >>>
>> > >>> The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
>> > >> WITHOUT
>> > >>> TIME ZONE", converting it to unix timestamp would use UTC timezone,
>> > which
>> > >>> is not usually expected by users.
>> > >>>
>> > >>> If we want to keep consistent with the standard, we probably should
>> use
>> > >>> "TIMESTAMP WITH LOCAL ZONE '2023-04-27 00:00:00'", which type is
>> > >> "TIMESTAMP
>> > >>> WITH LOCAL TIME ZONE", and converting it to unix timestamp will
>> > consider
>> > >>> the session timezone, which is the expected result. But it's
>> > inconvenient
>> > >>> for users.
>> > >>>
>> > >>> Taking this a special case, and converting "TIMESTAMP '2023-04-27
>> > >>> 00:00:00'" to a unix timestamp with session timezone, will be
>> > convenient
>> > >>> for users, but will break the standard. I will +0.5 for this choice.
>> > >>>
>> > >>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年6月7日周三 12:06写道:
>> > >>>
>> > >>>> Hi, Feng Jin.
>> > >>>> I think the concern of Leonard may be the inconsistency of the
>> > behavior
>> > >>> of
>> > >>>> TIMESTAMP '2023-04-27 00:00:00' beween timetravel and other sql
>> > >>> statement.
>> > >>>>
>> > >>>> For the normal sql:
>> > >>>> `SELECT TIMESTAMP '2023-04-27 00:00:00'`, we won't consider
>> timezone.
>> > >>>> But for the sql for timetravl:
>> > >>>> `SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP
>> '2023-04-27
>> > >>>> 00:00:00'`, we will consider the timezone and convert to UTC
>> > timestamp.
>> > >>>>
>> > >>>> The concern is valid. But for time travel, most style of engines,
>> > >>>> Spark[1], Hive[2], Trino[3] also do the time conversion with
>> > >> considering
>> > >>>> the seesion time zone. I think we can align them to avoid the
>> > >>> inconsistency
>> > >>>> to other engines and provide convenience for the external
>> connectors
>> > >>> while
>> > >>>> integrating Flink's time travel API.
>> > >>>>
>> > >>>> Regarding the inconsistency, I think we can consider time-travel
>> as a
>> > >>>> specical case, and we do needs to highlight this in this FLIP.
>> > >>>> As for "violate the restriction outlined in FLINK-21978[1]", since
>> we
>> > >>> cast
>> > >>>> timestamp to epochMillis only for the internal use, and won't
>> expose
>> > it
>> > >>> to
>> > >>>> users, I don't think it will violate the restriction.
>> > >>>> Btw, please add a brief desc to explain the meaning of the
>> parameter
>> > >>>> `timestamp` in method `CatalogBaseTable getTable(ObjectPath
>> tablePath,
>> > >>> long
>> > >>>> timestamp)`. Maybe something like "timestamp of the table snapt,
>> which
>> > >> is
>> > >>>> millseconds since 1970-01-01 00:00:00 UTC".
>> > >>>>
>> > >>>> [1]
>> > >>>>
>> > >>>
>> > >>
>> >
>> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
>> > >>>> [2]
>> > >>>>
>> > >>>
>> > >>
>> >
>> https://github.com/apache/hive/blob/f5e69dc38d7ff26c70be19adc9d1a3ae90dc4cf2/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L989
>> > >>>> [3]
>> > >>>>
>> > >>>
>> > >>
>> >
>> https://github.com/trinodb/trino/blob/2433d9e60f1abb0d85c32374c1758525560e1a86/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L443
>> > >>>>
>> > >>>>
>> > >>>> Best regards,
>> > >>>> Yuxia
>> > >>>>
>> > >>>> ----- 原始邮件 -----
>> > >>>> 发件人: "Feng Jin" <ji...@gmail.com>
>> > >>>> 收件人: "dev" <de...@flink.apache.org>
>> > >>>> 发送时间: 星期二, 2023年 6 月 06日 下午 10:15:47
>> > >>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>> > >>>>
>> > >>>> Hi everyone
>> > >>>>
>> > >>>> Thanks everyone for your input.
>> > >>>>
>> > >>>>
>> > >>>> @Yun
>> > >>>>
>> > >>>>> I think you could add descriptions of how to align backfill time
>> > >>> travel
>> > >>>> with querying the latest data. And I think you should also update
>> the
>> > >>>> "Discussion thread" in the original FLIP.
>> > >>>>
>> > >>>> Thank you for the suggestion, I will update it in the document.
>> > >>>>
>> > >>>>> I have a question about getting the table schema from the catalog.
>> > >> I'm
>> > >>>> not sure whether the Catalog#getTable(tablePath, timestamp) will be
>> > >>> called
>> > >>>> only once.
>> > >>>>
>> > >>>> I understand that in a query, the schema of the table is determined
>> > >>> before
>> > >>>> execution. The schema used will be based on the latest schema
>> within
>> > >> the
>> > >>>> TimeTravel period.
>> > >>>>
>> > >>>> In addition, due to current syntax limitations, we are unable to
>> > >> support
>> > >>>> the use of BETWEEN AND.
>> > >>>>
>> > >>>>
>> > >>>> @Jing
>> > >>>>
>> > >>>>> Would you like to update your thoughts described in your previous
>> > >>> email
>> > >>>> about why SupportsTimeTravel has been rejected into the FLIP?
>> > >>>>
>> > >>>> Sure,  I updated the doc.
>> > >>>>
>> > >>>>
>> > >>>>>   Since we always directly add overload methods into Catalog
>> > >> according
>> > >>>> to new requirements, which makes the interface bloated
>> > >>>>
>> > >>>> Your concern is valid. If we need to support the long type version
>> in
>> > >> the
>> > >>>> future, we may have to add another method "getTable(ObjectPath,
>> long
>> > >>>> version)". However, I understand that
>> > >>>> "Catalog.getTable(tablePath).on(timeStamp)" may not meet the
>> > >>> requirements.
>> > >>>> The timestamp is for Catalog's use, and Catalog obtains the
>> > >> corresponding
>> > >>>> schema based on this time.
>> > >>>>
>> > >>>>
>> > >>>> @liu @Regards
>> > >>>>
>> > >>>> I am very sorry for the unclear description in the document. I have
>> > >>> updated
>> > >>>> relevant descriptions regarding why it needs to be implemented in
>> > >>> Catalog.
>> > >>>>
>> > >>>> Travel not only requires obtaining data at the corresponding time
>> > >> point,
>> > >>>> but also requires the corresponding Schema at that time point
>> > >>>>
>> > >>>>
>> > >>>> @Shammon
>> > >>>>
>> > >>>>> Flink or connector such as  iceberg/paimon can create sources from
>> > >> the
>> > >>>> `CatalogBaseTable` directly without the need to get the snapshot ID
>> > >> from
>> > >>>> `CatalogTable.getSnapshot()`.  What do you think of it?
>> > >>>>
>> > >>>> You are right, we don't need the getSnapshot interface for
>> > >> PaimonCatalog
>> > >>> or
>> > >>>> IcebergCatalog tables, but we may need it for temporary tables.
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> Best,
>> > >>>> Feng
>> > >>>>
>> > >>>>
>> > >>>> On Tue, Jun 6, 2023 at 9:32 PM Feng Jin <ji...@gmail.com>
>> > wrote:
>> > >>>>
>> > >>>>> Sorry I replied to the wrong mail. Please ignore the last email.
>> > >>>>>
>> > >>>>>
>> > >>>>> Hi Leonard
>> > >>>>>
>> > >>>>>> 1. Unification SQL
>> > >>>>>
>> > >>>>> I agree that it is crucial for us to support both batch and
>> streaming
>> > >>>>> processing.  The current design allows for the support of both
>> batch
>> > >>> and
>> > >>>>> streaming processing. I'll update the FLIP later.
>> > >>>>>
>> > >>>>>
>> > >>>>>> 2.Semantics
>> > >>>>>
>> > >>>>> In my opinion, it would be feasible to perform the conversion
>> based
>> > >> on
>> > >>>> the
>> > >>>>> current session time, regardless of whether it is TIMESTAMP or
>> > >>>>> TIMESTAMP_LTZ.
>> > >>>>>
>> > >>>>> However, this may indeed violate the restriction outlined in
>> > >>>>> FLINK-21978[1]  as Benchao mentioned, and I am uncertain as to
>> > >> whether
>> > >>> it
>> > >>>>> is reasonable.
>> > >>>>>
>> > >>>>>
>> > >>>>>>  3.  Some external systems may use timestamp value to mark a
>> > >>> version,
>> > >>>>> but others may use version number、file position、log offset.
>> > >>>>>
>> > >>>>> It is true that most systems support time-related operations, and
>> I
>> > >>>>> believe that the current design is compatible with most systems.
>> > >>> However,
>> > >>>>> if we want to support long data type, it may require Calcite to
>> > >> support
>> > >>>> the
>> > >>>>> VERSION AS OF syntax. I understand that this is something that we
>> may
>> > >>>> need
>> > >>>>> to consider in the future.
>> > >>>>>
>> > >>>>>
>> > >>>>> Best,
>> > >>>>> Feng
>> > >>>>>
>> > >>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
>> > >>>>>
>> > >>>>> On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com>
>> wrote:
>> > >>>>>
>> > >>>>>> Hi, Feng
>> > >>>>>>
>> > >>>>>> Thanks for driving this FLIP, very impressive feature that users
>> > >> want,
>> > >>>>>> I’ve some quick questions here.
>> > >>>>>>
>> > >>>>>> 1.Unification SQL:
>> > >>>>>>        The snapshot  concept exists both in Batch mode and
>> > >> Streaming
>> > >>>>>> mode,  could we consider a unified proposal? I think users won’t
>> > >>> another
>> > >>>>>> SQL syntax named
>> > >>>>>> Time travel for Streaming mode.
>> > >>>>>>
>> > >>>>>> 2.Semantics:
>> > >>>>>>        Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a
>> > >>> long
>> > >>>>>> timestamp value (getTable(ObjectPath tablePath, long timestamp))
>> we
>> > >>> need
>> > >>>>>> two information i.e. a TIMESTAMP value and current session
>> timezone,
>> > >>>> how
>> > >>>>>> we deal the value with current proposed SQL syntax.
>> > >>>>>>
>> > >>>>>> 3. Is it enough using sinlge timestamp to track a
>> snapshot(version)
>> > >> of
>> > >>>>>> external table?   Some external systems may use timestamp value
>> to
>> > >>> mark
>> > >>>> a
>> > >>>>>> version, but others may use version number、file position、log
>> offset.
>> > >>>>>>
>> > >>>>>> Best,
>> > >>>>>> Leonard
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>> On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
>> > >>>>>>>
>> > >>>>>>> Hi Feng,
>> > >>>>>>>
>> > >>>>>>> I think this FLIP would provide one important feature to unify
>> the
>> > >>>>>> stream-SQL and batch-SQL when we backfill the historical data in
>> > >> batch
>> > >>>> mode.
>> > >>>>>>>
>> > >>>>>>> For the "Syntax" session, I think you could add descriptions of
>> > >> how
>> > >>> to
>> > >>>>>> align backfill time travel with querying the latest data. And I
>> > >> think
>> > >>>> you
>> > >>>>>> should also update the "Discussion thread" in the original FLIP.
>> > >>>>>>>
>> > >>>>>>> Moreover, I have a question about getting the table schema from
>> > >> the
>> > >>>>>> catalog. I'm not sure whether the Catalog#getTable(tablePath,
>> > >>> timestamp)
>> > >>>>>> will be called only once. If we have a backfill query between
>> > >>> 2023-05-29
>> > >>>>>> and 2023-06-04 in the past week, and the table schema changed on
>> > >>>>>> 2023-06-01, will the query below detect the schema changes during
>> > >>>> backfill
>> > >>>>>> the whole week?
>> > >>>>>>>
>> > >>>>>>> SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
>> > >>>>>> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
>> > >>>>>>>
>> > >>>>>>> Best
>> > >>>>>>> Yun Tang
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> ________________________________
>> > >>>>>>> From: Shammon FY <zj...@gmail.com>
>> > >>>>>>> Sent: Thursday, June 1, 2023 17:57
>> > >>>>>>> To: dev@flink.apache.org <de...@flink.apache.org>
>> > >>>>>>> Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
>> Mode
>> > >>>>>>>
>> > >>>>>>> Hi Feng,
>> > >>>>>>>
>> > >>>>>>> I have one minor comment about the public interface
>> > >> `Optional<Long>
>> > >>>>>>> getSnapshot()` in the `CatalogTable`.
>> > >>>>>>>
>> > >>>>>>> As we can get tables from the new method
>> > >>> `Catalog.getTable(ObjectPath
>> > >>>>>>> tablePath, long timestamp)`, I think the returned
>> > >> `CatalogBaseTable`
>> > >>>>>> will
>> > >>>>>>> have the information of timestamp. Flink or connector such as
>> > >>>>>>> iceberg/paimon can create sources from the `CatalogBaseTable`
>> > >>> directly
>> > >>>>>>> without the need to get the snapshot ID from
>> > >>>>>> `CatalogTable.getSnapshot()`.
>> > >>>>>>> What do you think of it?
>> > >>>>>>>
>> > >>>>>>> Best,
>> > >>>>>>> Shammon FY
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> On Thu, Jun 1, 2023 at 7:22 AM Jing Ge
>> <jing@ververica.com.invalid
>> > >>>
>> > >>>>>> wrote:
>> > >>>>>>>
>> > >>>>>>>> Hi Feng,
>> > >>>>>>>>
>> > >>>>>>>> Thanks for the proposal! Very interesting feature. Would you
>> like
>> > >>> to
>> > >>>>>> update
>> > >>>>>>>> your thoughts described in your previous email about why
>> > >>>>>> SupportsTimeTravel
>> > >>>>>>>> has been rejected into the FLIP? This will help readers
>> > >> understand
>> > >>>> the
>> > >>>>>>>> context (in the future).
>> > >>>>>>>>
>> > >>>>>>>> Since we always directly add overload methods into Catalog
>> > >>> according
>> > >>>>>> to new
>> > >>>>>>>> requirements, which makes the interface bloated. Just out of
>> > >>>> curiosity,
>> > >>>>>>>> does it make sense to introduce some DSL design? Like
>> > >>>>>>>> Catalog.getTable(tablePath).on(timeStamp),
>> > >>>>>>>> Catalog.getTable(tablePath).current() for the most current
>> > >> version,
>> > >>>> and
>> > >>>>>>>> more room for further extension like timestamp range, etc. I
>> > >>> haven't
>> > >>>>>> read
>> > >>>>>>>> all the source code yet and I'm not sure if it is possible.
>> But a
>> > >>>>>>>> design like this will keep the Catalog API lean and the API/DSL
>> > >>> will
>> > >>>> be
>> > >>>>>>>> self described and easier to use.
>> > >>>>>>>>
>> > >>>>>>>> Best regards,
>> > >>>>>>>> Jing
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
>> > >>>>>>>> krzysiek.chmielewski@gmail.com> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Ok after second though I'm retracting my previous statement
>> > >> about
>> > >>>>>> Catalog
>> > >>>>>>>>> changes you proposed.
>> > >>>>>>>>> I do see a benefit for Delta connector actually with this
>> change
>> > >>> and
>> > >>>>>> see
>> > >>>>>>>>> why this could be coupled with Catalog.
>> > >>>>>>>>>
>> > >>>>>>>>> Delta Connector SQL support, also ships a Delta Catalog
>> > >>>> implementation
>> > >>>>>>>> for
>> > >>>>>>>>> Flink.
>> > >>>>>>>>> For Delta Catalog, table schema information is fetched from
>> > >>>> underlying
>> > >>>>>>>>> _delta_log and not stored in metastore. For time travel we
>> > >>> actually
>> > >>>>>> had a
>> > >>>>>>>>> problem, that if we would like to timetravel back to some old
>> > >>>> version,
>> > >>>>>>>>> where schema was slightly different, then we would have a
>> > >> conflict
>> > >>>>>> since
>> > >>>>>>>>> Catalog would return current schema and not how it was for
>> > >> version
>> > >>>> X.
>> > >>>>>>>>>
>> > >>>>>>>>> With your change, our Delta Catalog can actually fetch schema
>> > >> for
>> > >>>>>>>> version X
>> > >>>>>>>>> and send it to DeltaTableFactory. Currency, Catalog can fetch
>> > >> only
>> > >>>>>>>> current
>> > >>>>>>>>> version. What we would also need however is version
>> > >>>> (number/timestamp)
>> > >>>>>>>> for
>> > >>>>>>>>> this table passed to DynamicTableFactory so we could properly
>> > >> set
>> > >>>>>> Delta
>> > >>>>>>>>> standalone library.
>> > >>>>>>>>>
>> > >>>>>>>>> Regards,
>> > >>>>>>>>> Krzysztof
>> > >>>>>>>>>
>> > >>>>>>>>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
>> > >>>>>>>>> krzysiek.chmielewski@gmail.com> napisał(a):
>> > >>>>>>>>>
>> > >>>>>>>>>> Hi,
>> > >>>>>>>>>> happy to see such a feature.
>> > >>>>>>>>>> Small note from my end regarding Catalog changes.
>> > >>>>>>>>>>
>> > >>>>>>>>>> TL;DR
>> > >>>>>>>>>> I don't think it is necessary to delegate this feature to the
>> > >>>>>> catalog.
>> > >>>>>>>> I
>> > >>>>>>>>>> think that since "timetravel" is per job/query property, its
>> > >>> should
>> > >>>>>> not
>> > >>>>>>>>> be
>> > >>>>>>>>>> coupled with the Catalog or table definition. In my opinion
>> > >> this
>> > >>> is
>> > >>>>>>>>>> something that DynamicTableFactory only has to know about. I
>> > >>> would
>> > >>>>>>>> rather
>> > >>>>>>>>>> see this feature as it is - SQL syntax enhancement but
>> delegate
>> > >>>>>> clearly
>> > >>>>>>>>> to
>> > >>>>>>>>>> DynamicTableFactory.
>> > >>>>>>>>>>
>> > >>>>>>>>>> I've implemented timetravel feature for Delta Connector  [1]
>> > >>> using
>> > >>>>>>>>>> current Flink API.
>> > >>>>>>>>>> Docs are pending code review, but you can find them here [2]
>> > >> and
>> > >>>>>>>> examples
>> > >>>>>>>>>> are available here [3]
>> > >>>>>>>>>>
>> > >>>>>>>>>> The timetravel feature that I've implemented is based on
>> Flink
>> > >>>> Query
>> > >>>>>>>>>> hints.
>> > >>>>>>>>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1')
>> */"
>> > >>>>>>>>>>
>> > >>>>>>>>>> The " versionAsOf" (we also have 'timestampAsOf') parameter
>> is
>> > >>>>>> handled
>> > >>>>>>>>> not
>> > >>>>>>>>>> by Catalog but by DyntamicTableFactory implementation for
>> Delta
>> > >>>>>>>>> connector.
>> > >>>>>>>>>> The value of this property is passed to Delta standalone lib
>> > >> API
>> > >>>> that
>> > >>>>>>>>>> returns table view for given version.
>> > >>>>>>>>>>
>> > >>>>>>>>>> I'm not sure how/if proposed change could benefit Delta
>> > >> connector
>> > >>>>>>>>>> implementation for this feature.
>> > >>>>>>>>>>
>> > >>>>>>>>>> Thanks,
>> > >>>>>>>>>> Krzysztof
>> > >>>>>>>>>>
>> > >>>>>>>>>> [1]
>> > >>>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>
>> > >>>>>>
>> > >>>>
>> > >>>
>> > >>
>> >
>> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
>> > >>>>>>>>>> [2]
>> > >>>>>>
>> https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
>> > >>>>>>>>>> [3]
>> > >>>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>
>> > >>>>>>
>> > >>>>
>> > >>>
>> > >>
>> >
>> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
>> > >>>>>>>>>>
>> > >>>>>>>>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com>
>> > >>> napisał(a):
>> > >>>>>>>>>>
>> > >>>>>>>>>>> Hi, Feng
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Thanks for driving this FLIP, Time travel is very useful for
>> > >>> Flink
>> > >>>>>>>>>>> integrate with data lake system. I have one question why the
>> > >>>>>>>>>>> implementation
>> > >>>>>>>>>>> of TimeTravel is delegated to Catalog? Assuming that we use
>> > >>> Flink
>> > >>>> to
>> > >>>>>>>>> query
>> > >>>>>>>>>>> Hudi table with the time travel syntax, but we don't use the
>> > >>>>>>>>> HudiCatalog,
>> > >>>>>>>>>>> instead, we register the hudi table to InMemoryCatalog,  can
>> > >> we
>> > >>>>>>>> support
>> > >>>>>>>>>>> time travel for Hudi table in this case?
>> > >>>>>>>>>>> In contrast, I think time travel should bind to connector
>> > >>> instead
>> > >>>> of
>> > >>>>>>>>>>> Catalog, so the rejected alternative should be considered.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Best,
>> > >>>>>>>>>>> Ron
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>> Hi, Feng.
>> > >>>>>>>>>>>> Notice this FLIP only support batch mode for time travel.
>> > >>> Would
>> > >>>> it
>> > >>>>>>>>> also
>> > >>>>>>>>>>>> make sense to support stream mode to a read a snapshot of
>> the
>> > >>>> table
>> > >>>>>>>>> as a
>> > >>>>>>>>>>>> bounded stream?
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Best regards,
>> > >>>>>>>>>>>> Yuxia
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> ----- 原始邮件 -----
>> > >>>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
>> > >>>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
>> > >>>>>>>>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
>> > >>>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
>> Mode
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> This also depends on whether this is defined in standard or
>> > >> any
>> > >>>>>>>> known
>> > >>>>>>>>>>>> databases that have implemented this. If not, it would be
>> > >> hard
>> > >>> to
>> > >>>>>>>> push
>> > >>>>>>>>>>> it
>> > >>>>>>>>>>>> to Calcite.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> # getTable(ObjectPath object, long timestamp)
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Then we again come to the problem of "casting between
>> > >> timestamp
>> > >>>> and
>> > >>>>>>>>>>>> numeric", which has been disabled in FLINK-21978[1]. If
>> > >> you're
>> > >>>>>> gonna
>> > >>>>>>>>> use
>> > >>>>>>>>>>>> this, then we need to clarify that problem first.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>> hi, thanks for your reply.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> @Benchao
>> > >>>>>>>>>>>>>> did you consider the pushdown abilities compatible
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> In the current design, the implementation of TimeTravel is
>> > >>>>>>>> delegated
>> > >>>>>>>>>>> to
>> > >>>>>>>>>>>>> Catalog. We have added a function called
>> getTable(ObjectPath
>> > >>>>>>>>>>> tablePath,
>> > >>>>>>>>>>>>> long timestamp) to obtain the corresponding
>> CatalogBaseTable
>> > >>> at
>> > >>>> a
>> > >>>>>>>>>>>> specific
>> > >>>>>>>>>>>>> time.  Therefore, I think it will not have any impact on
>> the
>> > >>>>>>>>> original
>> > >>>>>>>>>>>>> pushdown abilities.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> I see there is a rejected  design for adding
>> > >>>>>>>> SupportsTimeTravel,
>> > >>>>>>>>>>> but
>> > >>>>>>>>>>>> I
>> > >>>>>>>>>>>>> didn't see the alternative in  the FLIP doc
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> Sorry, the document description is not very clear.
>> > >> Regarding
>> > >>>>>>>>> whether
>> > >>>>>>>>>>> to
>> > >>>>>>>>>>>>> support SupportTimeTravel, I have discussed it with yuxia.
>> > >>> Since
>> > >>>>>>>> we
>> > >>>>>>>>>>> have
>> > >>>>>>>>>>>>> already passed the corresponding time in
>> > >> getTable(ObjectPath,
>> > >>>> long
>> > >>>>>>>>>>>>> timestamp) of Catalog, SupportTimeTravel may not be
>> > >> necessary.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> In getTable(ObjectPath object, long timestamp), we can
>> > >> obtain
>> > >>>> the
>> > >>>>>>>>>>> schema
>> > >>>>>>>>>>>> of
>> > >>>>>>>>>>>>> the corresponding time point and put the SNAPSHOT that
>> needs
>> > >>> to
>> > >>>> be
>> > >>>>>>>>>>>> consumed
>> > >>>>>>>>>>>>> into options.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> @Shammon
>> > >>>>>>>>>>>>>> Could we support this in Flink too?
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> I personally think it's possible, but limited by Calcite's
>> > >>>> syntax
>> > >>>>>>>>>>>>> restrictions. I believe we should first support this
>> syntax
>> > >> in
>> > >>>>>>>>>>> Calcite.
>> > >>>>>>>>>>>>> Currently, I think it may not be easy  to support this
>> > >> syntax
>> > >>> in
>> > >>>>>>>>>>> Flink's
>> > >>>>>>>>>>>>> parser. @Benchao, what do you think? Can Calcite support
>> > >> this
>> > >>>>>>>> syntax
>> > >>>>>>>>>>>>> ` VERSION AS OF`  ?
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> Best,
>> > >>>>>>>>>>>>> Feng.
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <
>> > >> zjureel@gmail.com
>> > >>>>
>> > >>>>>>>>> wrote:
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Thanks Feng, the feature of time travel sounds great!
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon
>> and
>> > >>>>>>>> iceberg
>> > >>>>>>>>>>>>> support
>> > >>>>>>>>>>>>>> snapshot or version. For example, users can query
>> snapshot
>> > >> 1
>> > >>>> for
>> > >>>>>>>>>>> paimon
>> > >>>>>>>>>>>>> by
>> > >>>>>>>>>>>>>> the following statement
>> > >>>>>>>>>>>>>> SELECT * FROM t VERSION AS OF 1
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Could we support this in Flink too?
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Best,
>> > >>>>>>>>>>>>>> Shammon FY
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
>> > >>>>>>>> libenchao@apache.org>
>> > >>>>>>>>>>>> wrote:
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> Regarding the implementation, did you consider the
>> > >> pushdown
>> > >>>>>>>>>>> abilities
>> > >>>>>>>>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
>> > >>>>>>>>> partition
>> > >>>>>>>>>>>>>> pushdown.
>> > >>>>>>>>>>>>>>> Since `Snapshot` is not handled much in existing rules,
>> I
>> > >>>>>>>> have a
>> > >>>>>>>>>>>>> concern
>> > >>>>>>>>>>>>>>> about this. Of course, it depends on your implementation
>> > >>>>>>>> detail,
>> > >>>>>>>>>>> what
>> > >>>>>>>>>>>>> is
>> > >>>>>>>>>>>>>>> important is that we'd better add some cross tests for
>> > >>> these.
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> Regarding the interface exposed to Connector, I see
>> there
>> > >>> is a
>> > >>>>>>>>>>>> rejected
>> > >>>>>>>>>>>>>>> design for adding SupportsTimeTravel, but I didn't see
>> the
>> > >>>>>>>>>>>> alternative
>> > >>>>>>>>>>>>> in
>> > >>>>>>>>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
>> > >>>>>>>> clarify
>> > >>>>>>>>>>>>> because
>> > >>>>>>>>>>>>>> we
>> > >>>>>>>>>>>>>>> need to know whether the Connector supports this, and
>> what
>> > >>>>>>>>>>>>>> column/metadata
>> > >>>>>>>>>>>>>>> corresponds to 'system_time'.
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> Thanks for your reply
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> @Timo @BenChao @yuxia
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> Sorry for the mistake,  Currently , calcite only
>> supports
>> > >>>>>>>>> `FOR
>> > >>>>>>>>>>>>>>> SYSTEM_TIME
>> > >>>>>>>>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME
>> AS
>> > >>>>>>>> OF`
>> > >>>>>>>>> .
>> > >>>>>>>>>>>> I've
>> > >>>>>>>>>>>>>>>> updated the syntax part of the FLIP.
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> @Timo
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> Yes, I think we need to convert TIMESTAMP to
>> > >> TIMESTAMP_LTZ
>> > >>>>>>>> and
>> > >>>>>>>>>>> then
>> > >>>>>>>>>>>>>>> convert
>> > >>>>>>>>>>>>>>>> it into a long value.
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> How do we want to query the most recent version of a
>> > >> table
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it
>> does
>> > >>>>>>>>> cause
>> > >>>>>>>>>>>>>>>> inconsistency with the real-time concept.
>> > >>>>>>>>>>>>>>>> However, from my personal understanding, the scope of
>> > >> `AS
>> > >>>>>>>> OF
>> > >>>>>>>>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
>> > >>>>>>>> record.
>> > >>>>>>>>>>> So,
>> > >>>>>>>>>>>> I
>> > >>>>>>>>>>>>>>> think
>> > >>>>>>>>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
>> > >>>>>>>>>>>>>>>> Additionally, if no version is specified, the latest
>> > >>> version
>> > >>>>>>>>>>> should
>> > >>>>>>>>>>>>> be
>> > >>>>>>>>>>>>>>> used
>> > >>>>>>>>>>>>>>>> by default.
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> Best,
>> > >>>>>>>>>>>>>>>> Feng
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
>> > >>>>>>>>>>> luoyuxia@alumni.sjtu.edu.cn
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> wrote:
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
>> > >>>>>>>>> introduce
>> > >>>>>>>>>>>> time
>> > >>>>>>>>>>>>>>> travel
>> > >>>>>>>>>>>>>>>>> to Flink to have a better integration with external
>> data
>> > >>>>>>>>>>> soruces.
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> I also share same concern about the syntax.
>> > >>>>>>>>>>>>>>>>> I see in the part of `Whether to support other syntax
>> > >>>>>>>>>>>>>> implementations`
>> > >>>>>>>>>>>>>>> in
>> > >>>>>>>>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
>> > >>>>>>>>>>> SYSTEM_TIME
>> > >>>>>>>>>>>>> AS
>> > >>>>>>>>>>>>>>> OF`,
>> > >>>>>>>>>>>>>>>>> right?
>> > >>>>>>>>>>>>>>>>> But the the syntax part in this FLIP, it seems to be
>> `AS
>> > >>>>>>>> OF
>> > >>>>>>>>>>>>>> TIMESTAMP`
>> > >>>>>>>>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a
>> > >> mistake
>> > >>>>>>>> or
>> > >>>>>>>>>>> by
>> > >>>>>>>>>>>>>> design?
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> Best regards,
>> > >>>>>>>>>>>>>>>>> Yuxia
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> ----- 原始邮件 -----
>> > >>>>>>>>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
>> > >>>>>>>>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
>> > >>>>>>>>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
>> > >>>>>>>>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In
>> Batch
>> > >>>>>>>>> Mode
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> Regarding the syntax section, are you proposing `AS
>> OF`
>> > >>>>>>>>>>> instead
>> > >>>>>>>>>>>> of
>> > >>>>>>>>>>>>>> `FOR
>> > >>>>>>>>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is
>> > >> in
>> > >>>>>>>>> the
>> > >>>>>>>>>>> SQL
>> > >>>>>>>>>>>>>>>> standard
>> > >>>>>>>>>>>>>>>>> and has been supported in some database vendors such
>> as
>> > >>>>>>>> SQL
>> > >>>>>>>>>>>> Server.
>> > >>>>>>>>>>>>>>> About
>> > >>>>>>>>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
>> > >>>>>>>>> supports
>> > >>>>>>>>>>>>> this,
>> > >>>>>>>>>>>>>> if
>> > >>>>>>>>>>>>>>>>> yes, I think it's worth to add this support to
>> Calcite,
>> > >>>>>>>> and
>> > >>>>>>>>> I
>> > >>>>>>>>>>>> would
>> > >>>>>>>>>>>>>>> give
>> > >>>>>>>>>>>>>>>> a
>> > >>>>>>>>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better
>> to
>> > >>>>>>>> use
>> > >>>>>>>>>>> `FOR
>> > >>>>>>>>>>>>>> SYSTEM
>> > >>>>>>>>>>>>>>>> AS
>> > >>>>>>>>>>>>>>>>> OF`.
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四
>> > >> 19:02写道:
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> Also: How do we want to query the most recent version
>> > >>>>>>>> of a
>> > >>>>>>>>>>>> table?
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but
>> according
>> > >>>>>>>> to
>> > >>>>>>>>>>> the
>> > >>>>>>>>>>>>> docs
>> > >>>>>>>>>>>>>>>> both
>> > >>>>>>>>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
>> > >>>>>>>> concerning
>> > >>>>>>>>>>> is
>> > >>>>>>>>>>>> the
>> > >>>>>>>>>>>>>> it
>> > >>>>>>>>>>>>>>>>>> actually is evalated row-based:
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> Returns the current SQL timestamp in the local time
>> > >>>>>>>>> zone,
>> > >>>>>>>>>>>> the
>> > >>>>>>>>>>>>>>> return
>> > >>>>>>>>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
>> > >>>>>>>> record
>> > >>>>>>>>> in
>> > >>>>>>>>>>>>>>> streaming
>> > >>>>>>>>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
>> > >>>>>>>> query
>> > >>>>>>>>>>>> starts
>> > >>>>>>>>>>>>>> and
>> > >>>>>>>>>>>>>>>>>> uses the same result for every row.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> This could make it difficult to explain in a join
>> > >>>>>>>> scenario
>> > >>>>>>>>>>> of
>> > >>>>>>>>>>>>>>> multiple
>> > >>>>>>>>>>>>>>>>>> snapshotted tables.
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> Regards,
>> > >>>>>>>>>>>>>>>>>> Timo
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
>> > >>>>>>>>>>>>>>>>>>> Hi Feng,
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
>> > >>>>>>>> sense
>> > >>>>>>>>> to
>> > >>>>>>>>>>>>>> finally
>> > >>>>>>>>>>>>>>>>>>> support querying tables at a specific point in time
>> or
>> > >>>>>>>>>>>>> hopefully
>> > >>>>>>>>>>>>>>> also
>> > >>>>>>>>>>>>>>>>>>> ranges soon. Following time-versioned tables.
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> Here is some feedback from my side:
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> 1. Syntax
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for
>> this
>> > >>>>>>>>> but
>> > >>>>>>>>>>>> not
>> > >>>>>>>>>>>>>> `FOR
>> > >>>>>>>>>>>>>>>>>>> SYSTEM_TIME AS OF`?
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> It would be great to support `AS OF` also for
>> > >>>>>>>>>>> time-versioned
>> > >>>>>>>>>>>>>> joins
>> > >>>>>>>>>>>>>>>> and
>> > >>>>>>>>>>>>>>>>>>> have a unified and short syntax.
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can
>> make
>> > >>>>>>>>> this
>> > >>>>>>>>>>>>>>> available
>> > >>>>>>>>>>>>>>>> in
>> > >>>>>>>>>>>>>>>>>>> Flink earlier by copying the corresponding classes
>> > >>>>>>>> until
>> > >>>>>>>>>>> the
>> > >>>>>>>>>>>>> next
>> > >>>>>>>>>>>>>>>>>>> Calcite upgrade is performed.
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> 2. Semantics
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have
>> 2
>> > >>>>>>>>>>>> timestamp
>> > >>>>>>>>>>>>>>> types
>> > >>>>>>>>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS
>> OF
>> > >>>>>>>>>>>> TIMESTAMP
>> > >>>>>>>>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
>> > >>>>>>>>>>> timestamp
>> > >>>>>>>>>>>> be?
>> > >>>>>>>>>>>>>> We
>> > >>>>>>>>>>>>>>>> will
>> > >>>>>>>>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> We definely need to clarify this because the past
>> has
>> > >>>>>>>>>>> shown
>> > >>>>>>>>>>>>> that
>> > >>>>>>>>>>>>>>>>>>> daylight saving times make our lives hard.
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> Thanks,
>> > >>>>>>>>>>>>>>>>>>> Timo
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
>> > >>>>>>>>>>>>>>>>>>>> Hi, everyone.
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
>> > >>>>>>>> Support
>> > >>>>>>>>>>> Time
>> > >>>>>>>>>>>>>> Travel
>> > >>>>>>>>>>>>>>>> In
>> > >>>>>>>>>>>>>>>>>>>> Batch
>> > >>>>>>>>>>>>>>>>>>>> Mode [1]
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>> Time travel is a SQL syntax used to query
>> historical
>> > >>>>>>>>>>>> versions
>> > >>>>>>>>>>>>> of
>> > >>>>>>>>>>>>>>>> data.
>> > >>>>>>>>>>>>>>>>>> It
>> > >>>>>>>>>>>>>>>>>>>> allows users to specify a point in time and
>> retrieve
>> > >>>>>>>>> the
>> > >>>>>>>>>>>> data
>> > >>>>>>>>>>>>>> and
>> > >>>>>>>>>>>>>>>>>>>> schema of
>> > >>>>>>>>>>>>>>>>>>>> a table as it appeared at that time. With time
>> > >>>>>>>> travel,
>> > >>>>>>>>>>> users
>> > >>>>>>>>>>>>> can
>> > >>>>>>>>>>>>>>>>> easily
>> > >>>>>>>>>>>>>>>>>>>> analyze and compare historical versions of data.
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>> With the widespread use of data lake systems such
>> as
>> > >>>>>>>>>>> Paimon,
>> > >>>>>>>>>>>>>>>> Iceberg,
>> > >>>>>>>>>>>>>>>>>> and
>> > >>>>>>>>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
>> > >>>>>>>>> users'
>> > >>>>>>>>>>>> data
>> > >>>>>>>>>>>>>>>>> analysis.
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>> Looking forward to your opinions, any suggestions
>> are
>> > >>>>>>>>>>>>> welcomed.
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>> 1.
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>
>> > >>>>>>
>> > >>>>
>> > >>>
>> > >>
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>> Best.
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>> Feng
>> > >>>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> --
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>> Best,
>> > >>>>>>>>>>>>>>>>> Benchao Li
>> > >>>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> --
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> Best,
>> > >>>>>>>>>>>>>>> Benchao Li
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> --
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Best,
>> > >>>>>>>>>>>> Benchao Li
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>>
>> > >>> Best,
>> > >>> Benchao Li
>> > >>>
>> > >>
>> > >
>> > >
>> > > --
>> > >
>> > > Best,
>> > > Benchao Li
>> >
>> >
>>
>> --
>>
>> Best,
>> Benchao Li
>>
>

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Feng Jin <ji...@gmail.com>.
Thanks Benchao and Leonard.

'Implicitly type conversion' makes sense to me.   I will emphasize the
'Implicitly type conversion' in the document.


Best,
Feng

On Sat, Jun 10, 2023 at 10:11 AM Benchao Li <li...@apache.org> wrote:

> Thanks Leonard for the input, "Implicitly type conversion" way sounds good
> to me.
> I also agree that this should be done in planner instead of connector,
> it'll be a lot easier for connector development.
>
> Leonard Xu <xb...@gmail.com> 于2023年6月9日周五 20:11写道:
>
> > About the semantics consideration, I have some new input after rethink.
> >
> > 1. We can support both TIMESTAMP and TIMESTAMP_LTZ expression following
> > the syntax  `SELECT [column_name(s)] FROM [table_name] FOR SYSTEM_TIME AS
> > OF `
> >
> > 2. For TIMESTAMP_LTZ type, give a long instant value to CatalogTable is
> > pretty intuitive, for TIMESTAMP_type, it will be implied cast to
> > TIMESTAMP_LTZ type by planner using session timezone and then pass to
> > CatalogTable. This case can be considered as a Function
> AsOfSnapshot(Table
> > t, TIMESTAMP_LTZ arg), which can pass arg with TIMESTAMP_LTZ type, but
> our
> > framework supports implicit type conversion thus users can also pass arg
> > with TIMESTAMP type. Hint, Spark[1] did the  implicit type conversion
> too.
> >
> > 3.I also considered handing over the implicit type conversion to the
> > connector instead of planner, such as passing a TIMESTAMP literal, and
> the
> > connector using the session timezone to perform type conversion, but this
> > is more complicated than previous planner handling, and it’s not friendly
> > to the connector developers.
> >
> > 4. The last point,  TIMESTAMP_LTZ  '1970-01-01 00:00:04.001’ should be an
> > invalid expression as if you can not define a instant point (i.e
> > TIMSTAMP_LTZ semantics in SQL) from a timestamp literal without timezone.
> > You can use explicit type conversion like `cast(ts_ntz as TIMESTAMP_LTZ)`
> > after `FOR SYSTEM_TIME AS OF ` if you want to use
> > Timestamp type/expression/literal without timezone.
> >
> > 5. The last last point, the TIMESTAMP_LTZ type of Flink SQL supports DST
> > time[2] well that will help user avoid many corner case.
> >
> >
> > Best,
> > Leonard
> >
> > [1]
> >
> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
> > [2]
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/timezone/#daylight-saving-time-support
> >
> >
> >
> >
> > > On Jun 9, 2023, at 1:13 PM, Benchao Li <li...@apache.org> wrote:
> > >
> > > As you can see that you must use `UNIX_TIMESTAMP` to do this work,
> that's
> > > where the time zone happens.
> > >
> > > What I'm talking about is casting timestamp/timestamp_ltz to long
> > directly,
> > > that's why the semantic is tricky when you are casting timestamp to
> long
> > > using time zone.
> > >
> > > For other systems, such as SQL server[1], they actually uses a string
> > > instead of timestamp literal `FOR SYSTEM_TIME AS OF '2021-01-01
> > > 00:00:00.0000000'`, I'm not sure whether they convert the string
> > implicitly
> > > to TIMESTAMP_LTZ, or they just have a different definition of the
> syntax.
> > >
> > > But for us, we are definitely using timestamp/timestmap_ltz literal
> here,
> > > that's why it is special, and we must highlight this behavior that we
> are
> > > converting a timestamp without time zone literal to long using the
> > session
> > > time zone.
> > >
> > > [1]
> > >
> >
> https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios?view=sql-server-ver16
> > >
> > > Feng Jin <ji...@gmail.com> 于2023年6月8日周四 11:35写道:
> > >
> > >> Hi all,
> > >>
> > >> thanks for your input
> > >>
> > >>
> > >> @Benchao
> > >>
> > >>> The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
> > >> WITHOUT TIME ZONE", converting it to unix timestamp would use UTC
> > timezone,
> > >> which is not usually expected by users.
> > >>
> > >> It was indeed the case before Flink 1.13, but now my understanding is
> > that
> > >> there have been some slight changes in the definition of TIMESTAMP.
> > >>
> > >> TIMESTAMP is currently used to specify the year, month, day, hour,
> > minute
> > >> and second. We recommend that users use
> > *UNIX_TIMESTAMP(CAST(timestamp_col
> > >> AS STRING))* to convert *TIMESTAMP values* and *long values*. The
> > >> *UNIX_TIMESTAMP* function will use the *LOCAL TIME ZONE*. Therefore,
> > >> whether converting TIMESTAMP or TIMESTAMP_LTZ to Long values will
> > involve
> > >> using the *LOCAL TIME ZONE*.
> > >>
> > >>
> > >> Here is an test:
> > >>
> > >> Flink SQL> SET 'table.local-time-zone' = 'UTC';
> > >> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00'
> as
> > >> STRING)) as `timestamp`;
> > >> ---------------
> > >> timestamp
> > >> --------------
> > >> 0
> > >>
> > >> Flink SQL> SET 'table.local-time-zone' = 'Asia/Shanghai';
> > >> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00'
> as
> > >> STRING)) as `timestamp`;
> > >> ---------------
> > >> timestamp
> > >> --------------
> > >> -28800
> > >>
> > >> Therefore, the current conversion method exposed to users is also
> using
> > >> LOCAL TIME ZONE.
> > >>
> > >>
> > >> @yuxia
> > >>
> > >> Thank you very much for providing the list of behaviors of TIMESTAMP
> in
> > >> other systems.
> > >>
> > >>> I think we can align them to avoid the inconsistency to other engines
> > and
> > >> provide convenience for the external connectors while integrating
> > Flink's
> > >> time travel API.
> > >>
> > >> +1 for this.
> > >>
> > >>> Regarding the inconsistency, I think we can consider time-travel as a
> > >> specical case, and we do needs to highlight this in this FLIP.
> > >> As for "violate the restriction outlined in FLINK-21978[1]", since we
> > cast
> > >> timestamp to epochMillis only for the internal use, and won't expose
> it
> > to
> > >> users, I don't think it will violate the restriction.
> > >> Btw, please add a brief desc to explain the meaning of the parameter
> > >> `timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath,
> > long
> > >> timestamp)`. Maybe something like "timestamp of the table snapt, which
> > is
> > >> millseconds since 1970-01-01 00:00:00 UTC".
> > >>
> > >> Thank you for the suggestions regarding the document. I will add them
> to
> > >> FLIP.
> > >>
> > >>
> > >> Best,
> > >> Feng
> > >>
> > >>
> > >> On Wed, Jun 7, 2023 at 12:18 PM Benchao Li <li...@apache.org>
> > wrote:
> > >>
> > >>> I also share the concern about the timezone problem.
> > >>>
> > >>> The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
> > >> WITHOUT
> > >>> TIME ZONE", converting it to unix timestamp would use UTC timezone,
> > which
> > >>> is not usually expected by users.
> > >>>
> > >>> If we want to keep consistent with the standard, we probably should
> use
> > >>> "TIMESTAMP WITH LOCAL ZONE '2023-04-27 00:00:00'", which type is
> > >> "TIMESTAMP
> > >>> WITH LOCAL TIME ZONE", and converting it to unix timestamp will
> > consider
> > >>> the session timezone, which is the expected result. But it's
> > inconvenient
> > >>> for users.
> > >>>
> > >>> Taking this a special case, and converting "TIMESTAMP '2023-04-27
> > >>> 00:00:00'" to a unix timestamp with session timezone, will be
> > convenient
> > >>> for users, but will break the standard. I will +0.5 for this choice.
> > >>>
> > >>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年6月7日周三 12:06写道:
> > >>>
> > >>>> Hi, Feng Jin.
> > >>>> I think the concern of Leonard may be the inconsistency of the
> > behavior
> > >>> of
> > >>>> TIMESTAMP '2023-04-27 00:00:00' beween timetravel and other sql
> > >>> statement.
> > >>>>
> > >>>> For the normal sql:
> > >>>> `SELECT TIMESTAMP '2023-04-27 00:00:00'`, we won't consider
> timezone.
> > >>>> But for the sql for timetravl:
> > >>>> `SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP '2023-04-27
> > >>>> 00:00:00'`, we will consider the timezone and convert to UTC
> > timestamp.
> > >>>>
> > >>>> The concern is valid. But for time travel, most style of engines,
> > >>>> Spark[1], Hive[2], Trino[3] also do the time conversion with
> > >> considering
> > >>>> the seesion time zone. I think we can align them to avoid the
> > >>> inconsistency
> > >>>> to other engines and provide convenience for the external connectors
> > >>> while
> > >>>> integrating Flink's time travel API.
> > >>>>
> > >>>> Regarding the inconsistency, I think we can consider time-travel as
> a
> > >>>> specical case, and we do needs to highlight this in this FLIP.
> > >>>> As for "violate the restriction outlined in FLINK-21978[1]", since
> we
> > >>> cast
> > >>>> timestamp to epochMillis only for the internal use, and won't expose
> > it
> > >>> to
> > >>>> users, I don't think it will violate the restriction.
> > >>>> Btw, please add a brief desc to explain the meaning of the parameter
> > >>>> `timestamp` in method `CatalogBaseTable getTable(ObjectPath
> tablePath,
> > >>> long
> > >>>> timestamp)`. Maybe something like "timestamp of the table snapt,
> which
> > >> is
> > >>>> millseconds since 1970-01-01 00:00:00 UTC".
> > >>>>
> > >>>> [1]
> > >>>>
> > >>>
> > >>
> >
> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
> > >>>> [2]
> > >>>>
> > >>>
> > >>
> >
> https://github.com/apache/hive/blob/f5e69dc38d7ff26c70be19adc9d1a3ae90dc4cf2/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L989
> > >>>> [3]
> > >>>>
> > >>>
> > >>
> >
> https://github.com/trinodb/trino/blob/2433d9e60f1abb0d85c32374c1758525560e1a86/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L443
> > >>>>
> > >>>>
> > >>>> Best regards,
> > >>>> Yuxia
> > >>>>
> > >>>> ----- 原始邮件 -----
> > >>>> 发件人: "Feng Jin" <ji...@gmail.com>
> > >>>> 收件人: "dev" <de...@flink.apache.org>
> > >>>> 发送时间: 星期二, 2023年 6 月 06日 下午 10:15:47
> > >>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> > >>>>
> > >>>> Hi everyone
> > >>>>
> > >>>> Thanks everyone for your input.
> > >>>>
> > >>>>
> > >>>> @Yun
> > >>>>
> > >>>>> I think you could add descriptions of how to align backfill time
> > >>> travel
> > >>>> with querying the latest data. And I think you should also update
> the
> > >>>> "Discussion thread" in the original FLIP.
> > >>>>
> > >>>> Thank you for the suggestion, I will update it in the document.
> > >>>>
> > >>>>> I have a question about getting the table schema from the catalog.
> > >> I'm
> > >>>> not sure whether the Catalog#getTable(tablePath, timestamp) will be
> > >>> called
> > >>>> only once.
> > >>>>
> > >>>> I understand that in a query, the schema of the table is determined
> > >>> before
> > >>>> execution. The schema used will be based on the latest schema within
> > >> the
> > >>>> TimeTravel period.
> > >>>>
> > >>>> In addition, due to current syntax limitations, we are unable to
> > >> support
> > >>>> the use of BETWEEN AND.
> > >>>>
> > >>>>
> > >>>> @Jing
> > >>>>
> > >>>>> Would you like to update your thoughts described in your previous
> > >>> email
> > >>>> about why SupportsTimeTravel has been rejected into the FLIP?
> > >>>>
> > >>>> Sure,  I updated the doc.
> > >>>>
> > >>>>
> > >>>>>   Since we always directly add overload methods into Catalog
> > >> according
> > >>>> to new requirements, which makes the interface bloated
> > >>>>
> > >>>> Your concern is valid. If we need to support the long type version
> in
> > >> the
> > >>>> future, we may have to add another method "getTable(ObjectPath, long
> > >>>> version)". However, I understand that
> > >>>> "Catalog.getTable(tablePath).on(timeStamp)" may not meet the
> > >>> requirements.
> > >>>> The timestamp is for Catalog's use, and Catalog obtains the
> > >> corresponding
> > >>>> schema based on this time.
> > >>>>
> > >>>>
> > >>>> @liu @Regards
> > >>>>
> > >>>> I am very sorry for the unclear description in the document. I have
> > >>> updated
> > >>>> relevant descriptions regarding why it needs to be implemented in
> > >>> Catalog.
> > >>>>
> > >>>> Travel not only requires obtaining data at the corresponding time
> > >> point,
> > >>>> but also requires the corresponding Schema at that time point
> > >>>>
> > >>>>
> > >>>> @Shammon
> > >>>>
> > >>>>> Flink or connector such as  iceberg/paimon can create sources from
> > >> the
> > >>>> `CatalogBaseTable` directly without the need to get the snapshot ID
> > >> from
> > >>>> `CatalogTable.getSnapshot()`.  What do you think of it?
> > >>>>
> > >>>> You are right, we don't need the getSnapshot interface for
> > >> PaimonCatalog
> > >>> or
> > >>>> IcebergCatalog tables, but we may need it for temporary tables.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Best,
> > >>>> Feng
> > >>>>
> > >>>>
> > >>>> On Tue, Jun 6, 2023 at 9:32 PM Feng Jin <ji...@gmail.com>
> > wrote:
> > >>>>
> > >>>>> Sorry I replied to the wrong mail. Please ignore the last email.
> > >>>>>
> > >>>>>
> > >>>>> Hi Leonard
> > >>>>>
> > >>>>>> 1. Unification SQL
> > >>>>>
> > >>>>> I agree that it is crucial for us to support both batch and
> streaming
> > >>>>> processing.  The current design allows for the support of both
> batch
> > >>> and
> > >>>>> streaming processing. I'll update the FLIP later.
> > >>>>>
> > >>>>>
> > >>>>>> 2.Semantics
> > >>>>>
> > >>>>> In my opinion, it would be feasible to perform the conversion based
> > >> on
> > >>>> the
> > >>>>> current session time, regardless of whether it is TIMESTAMP or
> > >>>>> TIMESTAMP_LTZ.
> > >>>>>
> > >>>>> However, this may indeed violate the restriction outlined in
> > >>>>> FLINK-21978[1]  as Benchao mentioned, and I am uncertain as to
> > >> whether
> > >>> it
> > >>>>> is reasonable.
> > >>>>>
> > >>>>>
> > >>>>>>  3.  Some external systems may use timestamp value to mark a
> > >>> version,
> > >>>>> but others may use version number、file position、log offset.
> > >>>>>
> > >>>>> It is true that most systems support time-related operations, and I
> > >>>>> believe that the current design is compatible with most systems.
> > >>> However,
> > >>>>> if we want to support long data type, it may require Calcite to
> > >> support
> > >>>> the
> > >>>>> VERSION AS OF syntax. I understand that this is something that we
> may
> > >>>> need
> > >>>>> to consider in the future.
> > >>>>>
> > >>>>>
> > >>>>> Best,
> > >>>>> Feng
> > >>>>>
> > >>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
> > >>>>>
> > >>>>> On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com>
> wrote:
> > >>>>>
> > >>>>>> Hi, Feng
> > >>>>>>
> > >>>>>> Thanks for driving this FLIP, very impressive feature that users
> > >> want,
> > >>>>>> I’ve some quick questions here.
> > >>>>>>
> > >>>>>> 1.Unification SQL:
> > >>>>>>        The snapshot  concept exists both in Batch mode and
> > >> Streaming
> > >>>>>> mode,  could we consider a unified proposal? I think users won’t
> > >>> another
> > >>>>>> SQL syntax named
> > >>>>>> Time travel for Streaming mode.
> > >>>>>>
> > >>>>>> 2.Semantics:
> > >>>>>>        Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a
> > >>> long
> > >>>>>> timestamp value (getTable(ObjectPath tablePath, long timestamp))
> we
> > >>> need
> > >>>>>> two information i.e. a TIMESTAMP value and current session
> timezone,
> > >>>> how
> > >>>>>> we deal the value with current proposed SQL syntax.
> > >>>>>>
> > >>>>>> 3. Is it enough using sinlge timestamp to track a
> snapshot(version)
> > >> of
> > >>>>>> external table?   Some external systems may use timestamp value to
> > >>> mark
> > >>>> a
> > >>>>>> version, but others may use version number、file position、log
> offset.
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Leonard
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
> > >>>>>>>
> > >>>>>>> Hi Feng,
> > >>>>>>>
> > >>>>>>> I think this FLIP would provide one important feature to unify
> the
> > >>>>>> stream-SQL and batch-SQL when we backfill the historical data in
> > >> batch
> > >>>> mode.
> > >>>>>>>
> > >>>>>>> For the "Syntax" session, I think you could add descriptions of
> > >> how
> > >>> to
> > >>>>>> align backfill time travel with querying the latest data. And I
> > >> think
> > >>>> you
> > >>>>>> should also update the "Discussion thread" in the original FLIP.
> > >>>>>>>
> > >>>>>>> Moreover, I have a question about getting the table schema from
> > >> the
> > >>>>>> catalog. I'm not sure whether the Catalog#getTable(tablePath,
> > >>> timestamp)
> > >>>>>> will be called only once. If we have a backfill query between
> > >>> 2023-05-29
> > >>>>>> and 2023-06-04 in the past week, and the table schema changed on
> > >>>>>> 2023-06-01, will the query below detect the schema changes during
> > >>>> backfill
> > >>>>>> the whole week?
> > >>>>>>>
> > >>>>>>> SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
> > >>>>>> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
> > >>>>>>>
> > >>>>>>> Best
> > >>>>>>> Yun Tang
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> ________________________________
> > >>>>>>> From: Shammon FY <zj...@gmail.com>
> > >>>>>>> Sent: Thursday, June 1, 2023 17:57
> > >>>>>>> To: dev@flink.apache.org <de...@flink.apache.org>
> > >>>>>>> Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
> Mode
> > >>>>>>>
> > >>>>>>> Hi Feng,
> > >>>>>>>
> > >>>>>>> I have one minor comment about the public interface
> > >> `Optional<Long>
> > >>>>>>> getSnapshot()` in the `CatalogTable`.
> > >>>>>>>
> > >>>>>>> As we can get tables from the new method
> > >>> `Catalog.getTable(ObjectPath
> > >>>>>>> tablePath, long timestamp)`, I think the returned
> > >> `CatalogBaseTable`
> > >>>>>> will
> > >>>>>>> have the information of timestamp. Flink or connector such as
> > >>>>>>> iceberg/paimon can create sources from the `CatalogBaseTable`
> > >>> directly
> > >>>>>>> without the need to get the snapshot ID from
> > >>>>>> `CatalogTable.getSnapshot()`.
> > >>>>>>> What do you think of it?
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Shammon FY
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Thu, Jun 1, 2023 at 7:22 AM Jing Ge
> <jing@ververica.com.invalid
> > >>>
> > >>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hi Feng,
> > >>>>>>>>
> > >>>>>>>> Thanks for the proposal! Very interesting feature. Would you
> like
> > >>> to
> > >>>>>> update
> > >>>>>>>> your thoughts described in your previous email about why
> > >>>>>> SupportsTimeTravel
> > >>>>>>>> has been rejected into the FLIP? This will help readers
> > >> understand
> > >>>> the
> > >>>>>>>> context (in the future).
> > >>>>>>>>
> > >>>>>>>> Since we always directly add overload methods into Catalog
> > >>> according
> > >>>>>> to new
> > >>>>>>>> requirements, which makes the interface bloated. Just out of
> > >>>> curiosity,
> > >>>>>>>> does it make sense to introduce some DSL design? Like
> > >>>>>>>> Catalog.getTable(tablePath).on(timeStamp),
> > >>>>>>>> Catalog.getTable(tablePath).current() for the most current
> > >> version,
> > >>>> and
> > >>>>>>>> more room for further extension like timestamp range, etc. I
> > >>> haven't
> > >>>>>> read
> > >>>>>>>> all the source code yet and I'm not sure if it is possible. But
> a
> > >>>>>>>> design like this will keep the Catalog API lean and the API/DSL
> > >>> will
> > >>>> be
> > >>>>>>>> self described and easier to use.
> > >>>>>>>>
> > >>>>>>>> Best regards,
> > >>>>>>>> Jing
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
> > >>>>>>>> krzysiek.chmielewski@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> Ok after second though I'm retracting my previous statement
> > >> about
> > >>>>>> Catalog
> > >>>>>>>>> changes you proposed.
> > >>>>>>>>> I do see a benefit for Delta connector actually with this
> change
> > >>> and
> > >>>>>> see
> > >>>>>>>>> why this could be coupled with Catalog.
> > >>>>>>>>>
> > >>>>>>>>> Delta Connector SQL support, also ships a Delta Catalog
> > >>>> implementation
> > >>>>>>>> for
> > >>>>>>>>> Flink.
> > >>>>>>>>> For Delta Catalog, table schema information is fetched from
> > >>>> underlying
> > >>>>>>>>> _delta_log and not stored in metastore. For time travel we
> > >>> actually
> > >>>>>> had a
> > >>>>>>>>> problem, that if we would like to timetravel back to some old
> > >>>> version,
> > >>>>>>>>> where schema was slightly different, then we would have a
> > >> conflict
> > >>>>>> since
> > >>>>>>>>> Catalog would return current schema and not how it was for
> > >> version
> > >>>> X.
> > >>>>>>>>>
> > >>>>>>>>> With your change, our Delta Catalog can actually fetch schema
> > >> for
> > >>>>>>>> version X
> > >>>>>>>>> and send it to DeltaTableFactory. Currency, Catalog can fetch
> > >> only
> > >>>>>>>> current
> > >>>>>>>>> version. What we would also need however is version
> > >>>> (number/timestamp)
> > >>>>>>>> for
> > >>>>>>>>> this table passed to DynamicTableFactory so we could properly
> > >> set
> > >>>>>> Delta
> > >>>>>>>>> standalone library.
> > >>>>>>>>>
> > >>>>>>>>> Regards,
> > >>>>>>>>> Krzysztof
> > >>>>>>>>>
> > >>>>>>>>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
> > >>>>>>>>> krzysiek.chmielewski@gmail.com> napisał(a):
> > >>>>>>>>>
> > >>>>>>>>>> Hi,
> > >>>>>>>>>> happy to see such a feature.
> > >>>>>>>>>> Small note from my end regarding Catalog changes.
> > >>>>>>>>>>
> > >>>>>>>>>> TL;DR
> > >>>>>>>>>> I don't think it is necessary to delegate this feature to the
> > >>>>>> catalog.
> > >>>>>>>> I
> > >>>>>>>>>> think that since "timetravel" is per job/query property, its
> > >>> should
> > >>>>>> not
> > >>>>>>>>> be
> > >>>>>>>>>> coupled with the Catalog or table definition. In my opinion
> > >> this
> > >>> is
> > >>>>>>>>>> something that DynamicTableFactory only has to know about. I
> > >>> would
> > >>>>>>>> rather
> > >>>>>>>>>> see this feature as it is - SQL syntax enhancement but
> delegate
> > >>>>>> clearly
> > >>>>>>>>> to
> > >>>>>>>>>> DynamicTableFactory.
> > >>>>>>>>>>
> > >>>>>>>>>> I've implemented timetravel feature for Delta Connector  [1]
> > >>> using
> > >>>>>>>>>> current Flink API.
> > >>>>>>>>>> Docs are pending code review, but you can find them here [2]
> > >> and
> > >>>>>>>> examples
> > >>>>>>>>>> are available here [3]
> > >>>>>>>>>>
> > >>>>>>>>>> The timetravel feature that I've implemented is based on Flink
> > >>>> Query
> > >>>>>>>>>> hints.
> > >>>>>>>>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1')
> */"
> > >>>>>>>>>>
> > >>>>>>>>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is
> > >>>>>> handled
> > >>>>>>>>> not
> > >>>>>>>>>> by Catalog but by DyntamicTableFactory implementation for
> Delta
> > >>>>>>>>> connector.
> > >>>>>>>>>> The value of this property is passed to Delta standalone lib
> > >> API
> > >>>> that
> > >>>>>>>>>> returns table view for given version.
> > >>>>>>>>>>
> > >>>>>>>>>> I'm not sure how/if proposed change could benefit Delta
> > >> connector
> > >>>>>>>>>> implementation for this feature.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> Krzysztof
> > >>>>>>>>>>
> > >>>>>>>>>> [1]
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>
> > >>>
> > >>
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> > >>>>>>>>>> [2]
> > >>>>>> https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> > >>>>>>>>>> [3]
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>
> > >>>
> > >>
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> > >>>>>>>>>>
> > >>>>>>>>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com>
> > >>> napisał(a):
> > >>>>>>>>>>
> > >>>>>>>>>>> Hi, Feng
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks for driving this FLIP, Time travel is very useful for
> > >>> Flink
> > >>>>>>>>>>> integrate with data lake system. I have one question why the
> > >>>>>>>>>>> implementation
> > >>>>>>>>>>> of TimeTravel is delegated to Catalog? Assuming that we use
> > >>> Flink
> > >>>> to
> > >>>>>>>>> query
> > >>>>>>>>>>> Hudi table with the time travel syntax, but we don't use the
> > >>>>>>>>> HudiCatalog,
> > >>>>>>>>>>> instead, we register the hudi table to InMemoryCatalog,  can
> > >> we
> > >>>>>>>> support
> > >>>>>>>>>>> time travel for Hudi table in this case?
> > >>>>>>>>>>> In contrast, I think time travel should bind to connector
> > >>> instead
> > >>>> of
> > >>>>>>>>>>> Catalog, so the rejected alternative should be considered.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Best,
> > >>>>>>>>>>> Ron
> > >>>>>>>>>>>
> > >>>>>>>>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi, Feng.
> > >>>>>>>>>>>> Notice this FLIP only support batch mode for time travel.
> > >>> Would
> > >>>> it
> > >>>>>>>>> also
> > >>>>>>>>>>>> make sense to support stream mode to a read a snapshot of
> the
> > >>>> table
> > >>>>>>>>> as a
> > >>>>>>>>>>>> bounded stream?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best regards,
> > >>>>>>>>>>>> Yuxia
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> ----- 原始邮件 -----
> > >>>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
> > >>>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
> > >>>>>>>>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> > >>>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
> Mode
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> This also depends on whether this is defined in standard or
> > >> any
> > >>>>>>>> known
> > >>>>>>>>>>>> databases that have implemented this. If not, it would be
> > >> hard
> > >>> to
> > >>>>>>>> push
> > >>>>>>>>>>> it
> > >>>>>>>>>>>> to Calcite.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> # getTable(ObjectPath object, long timestamp)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Then we again come to the problem of "casting between
> > >> timestamp
> > >>>> and
> > >>>>>>>>>>>> numeric", which has been disabled in FLINK-21978[1]. If
> > >> you're
> > >>>>>> gonna
> > >>>>>>>>> use
> > >>>>>>>>>>>> this, then we need to clarify that problem first.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> hi, thanks for your reply.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> @Benchao
> > >>>>>>>>>>>>>> did you consider the pushdown abilities compatible
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> In the current design, the implementation of TimeTravel is
> > >>>>>>>> delegated
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>> Catalog. We have added a function called
> getTable(ObjectPath
> > >>>>>>>>>>> tablePath,
> > >>>>>>>>>>>>> long timestamp) to obtain the corresponding
> CatalogBaseTable
> > >>> at
> > >>>> a
> > >>>>>>>>>>>> specific
> > >>>>>>>>>>>>> time.  Therefore, I think it will not have any impact on
> the
> > >>>>>>>>> original
> > >>>>>>>>>>>>> pushdown abilities.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I see there is a rejected  design for adding
> > >>>>>>>> SupportsTimeTravel,
> > >>>>>>>>>>> but
> > >>>>>>>>>>>> I
> > >>>>>>>>>>>>> didn't see the alternative in  the FLIP doc
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Sorry, the document description is not very clear.
> > >> Regarding
> > >>>>>>>>> whether
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>> support SupportTimeTravel, I have discussed it with yuxia.
> > >>> Since
> > >>>>>>>> we
> > >>>>>>>>>>> have
> > >>>>>>>>>>>>> already passed the corresponding time in
> > >> getTable(ObjectPath,
> > >>>> long
> > >>>>>>>>>>>>> timestamp) of Catalog, SupportTimeTravel may not be
> > >> necessary.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> In getTable(ObjectPath object, long timestamp), we can
> > >> obtain
> > >>>> the
> > >>>>>>>>>>> schema
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>> the corresponding time point and put the SNAPSHOT that
> needs
> > >>> to
> > >>>> be
> > >>>>>>>>>>>> consumed
> > >>>>>>>>>>>>> into options.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> @Shammon
> > >>>>>>>>>>>>>> Could we support this in Flink too?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I personally think it's possible, but limited by Calcite's
> > >>>> syntax
> > >>>>>>>>>>>>> restrictions. I believe we should first support this syntax
> > >> in
> > >>>>>>>>>>> Calcite.
> > >>>>>>>>>>>>> Currently, I think it may not be easy  to support this
> > >> syntax
> > >>> in
> > >>>>>>>>>>> Flink's
> > >>>>>>>>>>>>> parser. @Benchao, what do you think? Can Calcite support
> > >> this
> > >>>>>>>> syntax
> > >>>>>>>>>>>>> ` VERSION AS OF`  ?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>> Feng.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <
> > >> zjureel@gmail.com
> > >>>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thanks Feng, the feature of time travel sounds great!
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
> > >>>>>>>> iceberg
> > >>>>>>>>>>>>> support
> > >>>>>>>>>>>>>> snapshot or version. For example, users can query snapshot
> > >> 1
> > >>>> for
> > >>>>>>>>>>> paimon
> > >>>>>>>>>>>>> by
> > >>>>>>>>>>>>>> the following statement
> > >>>>>>>>>>>>>> SELECT * FROM t VERSION AS OF 1
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Could we support this in Flink too?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>> Shammon FY
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
> > >>>>>>>> libenchao@apache.org>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Regarding the implementation, did you consider the
> > >> pushdown
> > >>>>>>>>>>> abilities
> > >>>>>>>>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
> > >>>>>>>>> partition
> > >>>>>>>>>>>>>> pushdown.
> > >>>>>>>>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
> > >>>>>>>> have a
> > >>>>>>>>>>>>> concern
> > >>>>>>>>>>>>>>> about this. Of course, it depends on your implementation
> > >>>>>>>> detail,
> > >>>>>>>>>>> what
> > >>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>> important is that we'd better add some cross tests for
> > >>> these.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Regarding the interface exposed to Connector, I see there
> > >>> is a
> > >>>>>>>>>>>> rejected
> > >>>>>>>>>>>>>>> design for adding SupportsTimeTravel, but I didn't see
> the
> > >>>>>>>>>>>> alternative
> > >>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
> > >>>>>>>> clarify
> > >>>>>>>>>>>>> because
> > >>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>> need to know whether the Connector supports this, and
> what
> > >>>>>>>>>>>>>> column/metadata
> > >>>>>>>>>>>>>>> corresponds to 'system_time'.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Thanks for your reply
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> @Timo @BenChao @yuxia
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Sorry for the mistake,  Currently , calcite only
> supports
> > >>>>>>>>> `FOR
> > >>>>>>>>>>>>>>> SYSTEM_TIME
> > >>>>>>>>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME
> AS
> > >>>>>>>> OF`
> > >>>>>>>>> .
> > >>>>>>>>>>>> I've
> > >>>>>>>>>>>>>>>> updated the syntax part of the FLIP.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> @Timo
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Yes, I think we need to convert TIMESTAMP to
> > >> TIMESTAMP_LTZ
> > >>>>>>>> and
> > >>>>>>>>>>> then
> > >>>>>>>>>>>>>>> convert
> > >>>>>>>>>>>>>>>> it into a long value.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> How do we want to query the most recent version of a
> > >> table
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it
> does
> > >>>>>>>>> cause
> > >>>>>>>>>>>>>>>> inconsistency with the real-time concept.
> > >>>>>>>>>>>>>>>> However, from my personal understanding, the scope of
> > >> `AS
> > >>>>>>>> OF
> > >>>>>>>>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
> > >>>>>>>> record.
> > >>>>>>>>>>> So,
> > >>>>>>>>>>>> I
> > >>>>>>>>>>>>>>> think
> > >>>>>>>>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
> > >>>>>>>>>>>>>>>> Additionally, if no version is specified, the latest
> > >>> version
> > >>>>>>>>>>> should
> > >>>>>>>>>>>>> be
> > >>>>>>>>>>>>>>> used
> > >>>>>>>>>>>>>>>> by default.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>> Feng
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
> > >>>>>>>>>>> luoyuxia@alumni.sjtu.edu.cn
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
> > >>>>>>>>> introduce
> > >>>>>>>>>>>> time
> > >>>>>>>>>>>>>>> travel
> > >>>>>>>>>>>>>>>>> to Flink to have a better integration with external
> data
> > >>>>>>>>>>> soruces.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I also share same concern about the syntax.
> > >>>>>>>>>>>>>>>>> I see in the part of `Whether to support other syntax
> > >>>>>>>>>>>>>> implementations`
> > >>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
> > >>>>>>>>>>> SYSTEM_TIME
> > >>>>>>>>>>>>> AS
> > >>>>>>>>>>>>>>> OF`,
> > >>>>>>>>>>>>>>>>> right?
> > >>>>>>>>>>>>>>>>> But the the syntax part in this FLIP, it seems to be
> `AS
> > >>>>>>>> OF
> > >>>>>>>>>>>>>> TIMESTAMP`
> > >>>>>>>>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a
> > >> mistake
> > >>>>>>>> or
> > >>>>>>>>>>> by
> > >>>>>>>>>>>>>> design?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Best regards,
> > >>>>>>>>>>>>>>>>> Yuxia
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> ----- 原始邮件 -----
> > >>>>>>>>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
> > >>>>>>>>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
> > >>>>>>>>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
> > >>>>>>>>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In
> Batch
> > >>>>>>>>> Mode
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
> > >>>>>>>>>>> instead
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>>> `FOR
> > >>>>>>>>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is
> > >> in
> > >>>>>>>>> the
> > >>>>>>>>>>> SQL
> > >>>>>>>>>>>>>>>> standard
> > >>>>>>>>>>>>>>>>> and has been supported in some database vendors such as
> > >>>>>>>> SQL
> > >>>>>>>>>>>> Server.
> > >>>>>>>>>>>>>>> About
> > >>>>>>>>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
> > >>>>>>>>> supports
> > >>>>>>>>>>>>> this,
> > >>>>>>>>>>>>>> if
> > >>>>>>>>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
> > >>>>>>>> and
> > >>>>>>>>> I
> > >>>>>>>>>>>> would
> > >>>>>>>>>>>>>>> give
> > >>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
> > >>>>>>>> use
> > >>>>>>>>>>> `FOR
> > >>>>>>>>>>>>>> SYSTEM
> > >>>>>>>>>>>>>>>> AS
> > >>>>>>>>>>>>>>>>> OF`.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四
> > >> 19:02写道:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Also: How do we want to query the most recent version
> > >>>>>>>> of a
> > >>>>>>>>>>>> table?
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but
> according
> > >>>>>>>> to
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>> docs
> > >>>>>>>>>>>>>>>> both
> > >>>>>>>>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
> > >>>>>>>> concerning
> > >>>>>>>>>>> is
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>>> actually is evalated row-based:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Returns the current SQL timestamp in the local time
> > >>>>>>>>> zone,
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>> return
> > >>>>>>>>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
> > >>>>>>>> record
> > >>>>>>>>> in
> > >>>>>>>>>>>>>>> streaming
> > >>>>>>>>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
> > >>>>>>>> query
> > >>>>>>>>>>>> starts
> > >>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>> uses the same result for every row.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> This could make it difficult to explain in a join
> > >>>>>>>> scenario
> > >>>>>>>>>>> of
> > >>>>>>>>>>>>>>> multiple
> > >>>>>>>>>>>>>>>>>> snapshotted tables.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Regards,
> > >>>>>>>>>>>>>>>>>> Timo
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
> > >>>>>>>>>>>>>>>>>>> Hi Feng,
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
> > >>>>>>>> sense
> > >>>>>>>>> to
> > >>>>>>>>>>>>>> finally
> > >>>>>>>>>>>>>>>>>>> support querying tables at a specific point in time
> or
> > >>>>>>>>>>>>> hopefully
> > >>>>>>>>>>>>>>> also
> > >>>>>>>>>>>>>>>>>>> ranges soon. Following time-versioned tables.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Here is some feedback from my side:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> 1. Syntax
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for
> this
> > >>>>>>>>> but
> > >>>>>>>>>>>> not
> > >>>>>>>>>>>>>> `FOR
> > >>>>>>>>>>>>>>>>>>> SYSTEM_TIME AS OF`?
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> It would be great to support `AS OF` also for
> > >>>>>>>>>>> time-versioned
> > >>>>>>>>>>>>>> joins
> > >>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>> have a unified and short syntax.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
> > >>>>>>>>> this
> > >>>>>>>>>>>>>>> available
> > >>>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>>>> Flink earlier by copying the corresponding classes
> > >>>>>>>> until
> > >>>>>>>>>>> the
> > >>>>>>>>>>>>> next
> > >>>>>>>>>>>>>>>>>>> Calcite upgrade is performed.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> 2. Semantics
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
> > >>>>>>>>>>>> timestamp
> > >>>>>>>>>>>>>>> types
> > >>>>>>>>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
> > >>>>>>>>>>>> TIMESTAMP
> > >>>>>>>>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
> > >>>>>>>>>>> timestamp
> > >>>>>>>>>>>> be?
> > >>>>>>>>>>>>>> We
> > >>>>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> We definely need to clarify this because the past has
> > >>>>>>>>>>> shown
> > >>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>> daylight saving times make our lives hard.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>> Timo
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
> > >>>>>>>>>>>>>>>>>>>> Hi, everyone.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
> > >>>>>>>> Support
> > >>>>>>>>>>> Time
> > >>>>>>>>>>>>>> Travel
> > >>>>>>>>>>>>>>>> In
> > >>>>>>>>>>>>>>>>>>>> Batch
> > >>>>>>>>>>>>>>>>>>>> Mode [1]
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
> > >>>>>>>>>>>> versions
> > >>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>> data.
> > >>>>>>>>>>>>>>>>>> It
> > >>>>>>>>>>>>>>>>>>>> allows users to specify a point in time and retrieve
> > >>>>>>>>> the
> > >>>>>>>>>>>> data
> > >>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>> schema of
> > >>>>>>>>>>>>>>>>>>>> a table as it appeared at that time. With time
> > >>>>>>>> travel,
> > >>>>>>>>>>> users
> > >>>>>>>>>>>>> can
> > >>>>>>>>>>>>>>>>> easily
> > >>>>>>>>>>>>>>>>>>>> analyze and compare historical versions of data.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> With the widespread use of data lake systems such as
> > >>>>>>>>>>> Paimon,
> > >>>>>>>>>>>>>>>> Iceberg,
> > >>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
> > >>>>>>>>> users'
> > >>>>>>>>>>>> data
> > >>>>>>>>>>>>>>>>> analysis.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Looking forward to your opinions, any suggestions
> are
> > >>>>>>>>>>>>> welcomed.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> 1.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Best.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Feng
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>>>> Benchao Li
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>>>> Benchao Li
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> --
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Best,
> > >>>>>>>>>>>> Benchao Li
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>>
> > >>> Best,
> > >>> Benchao Li
> > >>>
> > >>
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> >
> >
>
> --
>
> Best,
> Benchao Li
>

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Benchao Li <li...@apache.org>.
Thanks Leonard for the input, "Implicitly type conversion" way sounds good
to me.
I also agree that this should be done in planner instead of connector,
it'll be a lot easier for connector development.

Leonard Xu <xb...@gmail.com> 于2023年6月9日周五 20:11写道:

> About the semantics consideration, I have some new input after rethink.
>
> 1. We can support both TIMESTAMP and TIMESTAMP_LTZ expression following
> the syntax  `SELECT [column_name(s)] FROM [table_name] FOR SYSTEM_TIME AS
> OF `
>
> 2. For TIMESTAMP_LTZ type, give a long instant value to CatalogTable is
> pretty intuitive, for TIMESTAMP_type, it will be implied cast to
> TIMESTAMP_LTZ type by planner using session timezone and then pass to
> CatalogTable. This case can be considered as a Function AsOfSnapshot(Table
> t, TIMESTAMP_LTZ arg), which can pass arg with TIMESTAMP_LTZ type, but our
> framework supports implicit type conversion thus users can also pass arg
> with TIMESTAMP type. Hint, Spark[1] did the  implicit type conversion too.
>
> 3.I also considered handing over the implicit type conversion to the
> connector instead of planner, such as passing a TIMESTAMP literal, and the
> connector using the session timezone to perform type conversion, but this
> is more complicated than previous planner handling, and it’s not friendly
> to the connector developers.
>
> 4. The last point,  TIMESTAMP_LTZ  '1970-01-01 00:00:04.001’ should be an
> invalid expression as if you can not define a instant point (i.e
> TIMSTAMP_LTZ semantics in SQL) from a timestamp literal without timezone.
> You can use explicit type conversion like `cast(ts_ntz as TIMESTAMP_LTZ)`
> after `FOR SYSTEM_TIME AS OF ` if you want to use
> Timestamp type/expression/literal without timezone.
>
> 5. The last last point, the TIMESTAMP_LTZ type of Flink SQL supports DST
> time[2] well that will help user avoid many corner case.
>
>
> Best,
> Leonard
>
> [1]
> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
> [2]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/timezone/#daylight-saving-time-support
>
>
>
>
> > On Jun 9, 2023, at 1:13 PM, Benchao Li <li...@apache.org> wrote:
> >
> > As you can see that you must use `UNIX_TIMESTAMP` to do this work, that's
> > where the time zone happens.
> >
> > What I'm talking about is casting timestamp/timestamp_ltz to long
> directly,
> > that's why the semantic is tricky when you are casting timestamp to long
> > using time zone.
> >
> > For other systems, such as SQL server[1], they actually uses a string
> > instead of timestamp literal `FOR SYSTEM_TIME AS OF '2021-01-01
> > 00:00:00.0000000'`, I'm not sure whether they convert the string
> implicitly
> > to TIMESTAMP_LTZ, or they just have a different definition of the syntax.
> >
> > But for us, we are definitely using timestamp/timestmap_ltz literal here,
> > that's why it is special, and we must highlight this behavior that we are
> > converting a timestamp without time zone literal to long using the
> session
> > time zone.
> >
> > [1]
> >
> https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios?view=sql-server-ver16
> >
> > Feng Jin <ji...@gmail.com> 于2023年6月8日周四 11:35写道:
> >
> >> Hi all,
> >>
> >> thanks for your input
> >>
> >>
> >> @Benchao
> >>
> >>> The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
> >> WITHOUT TIME ZONE", converting it to unix timestamp would use UTC
> timezone,
> >> which is not usually expected by users.
> >>
> >> It was indeed the case before Flink 1.13, but now my understanding is
> that
> >> there have been some slight changes in the definition of TIMESTAMP.
> >>
> >> TIMESTAMP is currently used to specify the year, month, day, hour,
> minute
> >> and second. We recommend that users use
> *UNIX_TIMESTAMP(CAST(timestamp_col
> >> AS STRING))* to convert *TIMESTAMP values* and *long values*. The
> >> *UNIX_TIMESTAMP* function will use the *LOCAL TIME ZONE*. Therefore,
> >> whether converting TIMESTAMP or TIMESTAMP_LTZ to Long values will
> involve
> >> using the *LOCAL TIME ZONE*.
> >>
> >>
> >> Here is an test:
> >>
> >> Flink SQL> SET 'table.local-time-zone' = 'UTC';
> >> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
> >> STRING)) as `timestamp`;
> >> ---------------
> >> timestamp
> >> --------------
> >> 0
> >>
> >> Flink SQL> SET 'table.local-time-zone' = 'Asia/Shanghai';
> >> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
> >> STRING)) as `timestamp`;
> >> ---------------
> >> timestamp
> >> --------------
> >> -28800
> >>
> >> Therefore, the current conversion method exposed to users is also using
> >> LOCAL TIME ZONE.
> >>
> >>
> >> @yuxia
> >>
> >> Thank you very much for providing the list of behaviors of TIMESTAMP in
> >> other systems.
> >>
> >>> I think we can align them to avoid the inconsistency to other engines
> and
> >> provide convenience for the external connectors while integrating
> Flink's
> >> time travel API.
> >>
> >> +1 for this.
> >>
> >>> Regarding the inconsistency, I think we can consider time-travel as a
> >> specical case, and we do needs to highlight this in this FLIP.
> >> As for "violate the restriction outlined in FLINK-21978[1]", since we
> cast
> >> timestamp to epochMillis only for the internal use, and won't expose it
> to
> >> users, I don't think it will violate the restriction.
> >> Btw, please add a brief desc to explain the meaning of the parameter
> >> `timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath,
> long
> >> timestamp)`. Maybe something like "timestamp of the table snapt, which
> is
> >> millseconds since 1970-01-01 00:00:00 UTC".
> >>
> >> Thank you for the suggestions regarding the document. I will add them to
> >> FLIP.
> >>
> >>
> >> Best,
> >> Feng
> >>
> >>
> >> On Wed, Jun 7, 2023 at 12:18 PM Benchao Li <li...@apache.org>
> wrote:
> >>
> >>> I also share the concern about the timezone problem.
> >>>
> >>> The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
> >> WITHOUT
> >>> TIME ZONE", converting it to unix timestamp would use UTC timezone,
> which
> >>> is not usually expected by users.
> >>>
> >>> If we want to keep consistent with the standard, we probably should use
> >>> "TIMESTAMP WITH LOCAL ZONE '2023-04-27 00:00:00'", which type is
> >> "TIMESTAMP
> >>> WITH LOCAL TIME ZONE", and converting it to unix timestamp will
> consider
> >>> the session timezone, which is the expected result. But it's
> inconvenient
> >>> for users.
> >>>
> >>> Taking this a special case, and converting "TIMESTAMP '2023-04-27
> >>> 00:00:00'" to a unix timestamp with session timezone, will be
> convenient
> >>> for users, but will break the standard. I will +0.5 for this choice.
> >>>
> >>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年6月7日周三 12:06写道:
> >>>
> >>>> Hi, Feng Jin.
> >>>> I think the concern of Leonard may be the inconsistency of the
> behavior
> >>> of
> >>>> TIMESTAMP '2023-04-27 00:00:00' beween timetravel and other sql
> >>> statement.
> >>>>
> >>>> For the normal sql:
> >>>> `SELECT TIMESTAMP '2023-04-27 00:00:00'`, we won't consider timezone.
> >>>> But for the sql for timetravl:
> >>>> `SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP '2023-04-27
> >>>> 00:00:00'`, we will consider the timezone and convert to UTC
> timestamp.
> >>>>
> >>>> The concern is valid. But for time travel, most style of engines,
> >>>> Spark[1], Hive[2], Trino[3] also do the time conversion with
> >> considering
> >>>> the seesion time zone. I think we can align them to avoid the
> >>> inconsistency
> >>>> to other engines and provide convenience for the external connectors
> >>> while
> >>>> integrating Flink's time travel API.
> >>>>
> >>>> Regarding the inconsistency, I think we can consider time-travel as a
> >>>> specical case, and we do needs to highlight this in this FLIP.
> >>>> As for "violate the restriction outlined in FLINK-21978[1]", since we
> >>> cast
> >>>> timestamp to epochMillis only for the internal use, and won't expose
> it
> >>> to
> >>>> users, I don't think it will violate the restriction.
> >>>> Btw, please add a brief desc to explain the meaning of the parameter
> >>>> `timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath,
> >>> long
> >>>> timestamp)`. Maybe something like "timestamp of the table snapt, which
> >> is
> >>>> millseconds since 1970-01-01 00:00:00 UTC".
> >>>>
> >>>> [1]
> >>>>
> >>>
> >>
> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
> >>>> [2]
> >>>>
> >>>
> >>
> https://github.com/apache/hive/blob/f5e69dc38d7ff26c70be19adc9d1a3ae90dc4cf2/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L989
> >>>> [3]
> >>>>
> >>>
> >>
> https://github.com/trinodb/trino/blob/2433d9e60f1abb0d85c32374c1758525560e1a86/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L443
> >>>>
> >>>>
> >>>> Best regards,
> >>>> Yuxia
> >>>>
> >>>> ----- 原始邮件 -----
> >>>> 发件人: "Feng Jin" <ji...@gmail.com>
> >>>> 收件人: "dev" <de...@flink.apache.org>
> >>>> 发送时间: 星期二, 2023年 6 月 06日 下午 10:15:47
> >>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >>>>
> >>>> Hi everyone
> >>>>
> >>>> Thanks everyone for your input.
> >>>>
> >>>>
> >>>> @Yun
> >>>>
> >>>>> I think you could add descriptions of how to align backfill time
> >>> travel
> >>>> with querying the latest data. And I think you should also update the
> >>>> "Discussion thread" in the original FLIP.
> >>>>
> >>>> Thank you for the suggestion, I will update it in the document.
> >>>>
> >>>>> I have a question about getting the table schema from the catalog.
> >> I'm
> >>>> not sure whether the Catalog#getTable(tablePath, timestamp) will be
> >>> called
> >>>> only once.
> >>>>
> >>>> I understand that in a query, the schema of the table is determined
> >>> before
> >>>> execution. The schema used will be based on the latest schema within
> >> the
> >>>> TimeTravel period.
> >>>>
> >>>> In addition, due to current syntax limitations, we are unable to
> >> support
> >>>> the use of BETWEEN AND.
> >>>>
> >>>>
> >>>> @Jing
> >>>>
> >>>>> Would you like to update your thoughts described in your previous
> >>> email
> >>>> about why SupportsTimeTravel has been rejected into the FLIP?
> >>>>
> >>>> Sure,  I updated the doc.
> >>>>
> >>>>
> >>>>>   Since we always directly add overload methods into Catalog
> >> according
> >>>> to new requirements, which makes the interface bloated
> >>>>
> >>>> Your concern is valid. If we need to support the long type version in
> >> the
> >>>> future, we may have to add another method "getTable(ObjectPath, long
> >>>> version)". However, I understand that
> >>>> "Catalog.getTable(tablePath).on(timeStamp)" may not meet the
> >>> requirements.
> >>>> The timestamp is for Catalog's use, and Catalog obtains the
> >> corresponding
> >>>> schema based on this time.
> >>>>
> >>>>
> >>>> @liu @Regards
> >>>>
> >>>> I am very sorry for the unclear description in the document. I have
> >>> updated
> >>>> relevant descriptions regarding why it needs to be implemented in
> >>> Catalog.
> >>>>
> >>>> Travel not only requires obtaining data at the corresponding time
> >> point,
> >>>> but also requires the corresponding Schema at that time point
> >>>>
> >>>>
> >>>> @Shammon
> >>>>
> >>>>> Flink or connector such as  iceberg/paimon can create sources from
> >> the
> >>>> `CatalogBaseTable` directly without the need to get the snapshot ID
> >> from
> >>>> `CatalogTable.getSnapshot()`.  What do you think of it?
> >>>>
> >>>> You are right, we don't need the getSnapshot interface for
> >> PaimonCatalog
> >>> or
> >>>> IcebergCatalog tables, but we may need it for temporary tables.
> >>>>
> >>>>
> >>>>
> >>>> Best,
> >>>> Feng
> >>>>
> >>>>
> >>>> On Tue, Jun 6, 2023 at 9:32 PM Feng Jin <ji...@gmail.com>
> wrote:
> >>>>
> >>>>> Sorry I replied to the wrong mail. Please ignore the last email.
> >>>>>
> >>>>>
> >>>>> Hi Leonard
> >>>>>
> >>>>>> 1. Unification SQL
> >>>>>
> >>>>> I agree that it is crucial for us to support both batch and streaming
> >>>>> processing.  The current design allows for the support of both batch
> >>> and
> >>>>> streaming processing. I'll update the FLIP later.
> >>>>>
> >>>>>
> >>>>>> 2.Semantics
> >>>>>
> >>>>> In my opinion, it would be feasible to perform the conversion based
> >> on
> >>>> the
> >>>>> current session time, regardless of whether it is TIMESTAMP or
> >>>>> TIMESTAMP_LTZ.
> >>>>>
> >>>>> However, this may indeed violate the restriction outlined in
> >>>>> FLINK-21978[1]  as Benchao mentioned, and I am uncertain as to
> >> whether
> >>> it
> >>>>> is reasonable.
> >>>>>
> >>>>>
> >>>>>>  3.  Some external systems may use timestamp value to mark a
> >>> version,
> >>>>> but others may use version number、file position、log offset.
> >>>>>
> >>>>> It is true that most systems support time-related operations, and I
> >>>>> believe that the current design is compatible with most systems.
> >>> However,
> >>>>> if we want to support long data type, it may require Calcite to
> >> support
> >>>> the
> >>>>> VERSION AS OF syntax. I understand that this is something that we may
> >>>> need
> >>>>> to consider in the future.
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>> Feng
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
> >>>>>
> >>>>> On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi, Feng
> >>>>>>
> >>>>>> Thanks for driving this FLIP, very impressive feature that users
> >> want,
> >>>>>> I’ve some quick questions here.
> >>>>>>
> >>>>>> 1.Unification SQL:
> >>>>>>        The snapshot  concept exists both in Batch mode and
> >> Streaming
> >>>>>> mode,  could we consider a unified proposal? I think users won’t
> >>> another
> >>>>>> SQL syntax named
> >>>>>> Time travel for Streaming mode.
> >>>>>>
> >>>>>> 2.Semantics:
> >>>>>>        Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a
> >>> long
> >>>>>> timestamp value (getTable(ObjectPath tablePath, long timestamp)) we
> >>> need
> >>>>>> two information i.e. a TIMESTAMP value and current session timezone,
> >>>> how
> >>>>>> we deal the value with current proposed SQL syntax.
> >>>>>>
> >>>>>> 3. Is it enough using sinlge timestamp to track a snapshot(version)
> >> of
> >>>>>> external table?   Some external systems may use timestamp value to
> >>> mark
> >>>> a
> >>>>>> version, but others may use version number、file position、log offset.
> >>>>>>
> >>>>>> Best,
> >>>>>> Leonard
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
> >>>>>>>
> >>>>>>> Hi Feng,
> >>>>>>>
> >>>>>>> I think this FLIP would provide one important feature to unify the
> >>>>>> stream-SQL and batch-SQL when we backfill the historical data in
> >> batch
> >>>> mode.
> >>>>>>>
> >>>>>>> For the "Syntax" session, I think you could add descriptions of
> >> how
> >>> to
> >>>>>> align backfill time travel with querying the latest data. And I
> >> think
> >>>> you
> >>>>>> should also update the "Discussion thread" in the original FLIP.
> >>>>>>>
> >>>>>>> Moreover, I have a question about getting the table schema from
> >> the
> >>>>>> catalog. I'm not sure whether the Catalog#getTable(tablePath,
> >>> timestamp)
> >>>>>> will be called only once. If we have a backfill query between
> >>> 2023-05-29
> >>>>>> and 2023-06-04 in the past week, and the table schema changed on
> >>>>>> 2023-06-01, will the query below detect the schema changes during
> >>>> backfill
> >>>>>> the whole week?
> >>>>>>>
> >>>>>>> SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
> >>>>>> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
> >>>>>>>
> >>>>>>> Best
> >>>>>>> Yun Tang
> >>>>>>>
> >>>>>>>
> >>>>>>> ________________________________
> >>>>>>> From: Shammon FY <zj...@gmail.com>
> >>>>>>> Sent: Thursday, June 1, 2023 17:57
> >>>>>>> To: dev@flink.apache.org <de...@flink.apache.org>
> >>>>>>> Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >>>>>>>
> >>>>>>> Hi Feng,
> >>>>>>>
> >>>>>>> I have one minor comment about the public interface
> >> `Optional<Long>
> >>>>>>> getSnapshot()` in the `CatalogTable`.
> >>>>>>>
> >>>>>>> As we can get tables from the new method
> >>> `Catalog.getTable(ObjectPath
> >>>>>>> tablePath, long timestamp)`, I think the returned
> >> `CatalogBaseTable`
> >>>>>> will
> >>>>>>> have the information of timestamp. Flink or connector such as
> >>>>>>> iceberg/paimon can create sources from the `CatalogBaseTable`
> >>> directly
> >>>>>>> without the need to get the snapshot ID from
> >>>>>> `CatalogTable.getSnapshot()`.
> >>>>>>> What do you think of it?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Shammon FY
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <jing@ververica.com.invalid
> >>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Feng,
> >>>>>>>>
> >>>>>>>> Thanks for the proposal! Very interesting feature. Would you like
> >>> to
> >>>>>> update
> >>>>>>>> your thoughts described in your previous email about why
> >>>>>> SupportsTimeTravel
> >>>>>>>> has been rejected into the FLIP? This will help readers
> >> understand
> >>>> the
> >>>>>>>> context (in the future).
> >>>>>>>>
> >>>>>>>> Since we always directly add overload methods into Catalog
> >>> according
> >>>>>> to new
> >>>>>>>> requirements, which makes the interface bloated. Just out of
> >>>> curiosity,
> >>>>>>>> does it make sense to introduce some DSL design? Like
> >>>>>>>> Catalog.getTable(tablePath).on(timeStamp),
> >>>>>>>> Catalog.getTable(tablePath).current() for the most current
> >> version,
> >>>> and
> >>>>>>>> more room for further extension like timestamp range, etc. I
> >>> haven't
> >>>>>> read
> >>>>>>>> all the source code yet and I'm not sure if it is possible. But a
> >>>>>>>> design like this will keep the Catalog API lean and the API/DSL
> >>> will
> >>>> be
> >>>>>>>> self described and easier to use.
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Jing
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
> >>>>>>>> krzysiek.chmielewski@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Ok after second though I'm retracting my previous statement
> >> about
> >>>>>> Catalog
> >>>>>>>>> changes you proposed.
> >>>>>>>>> I do see a benefit for Delta connector actually with this change
> >>> and
> >>>>>> see
> >>>>>>>>> why this could be coupled with Catalog.
> >>>>>>>>>
> >>>>>>>>> Delta Connector SQL support, also ships a Delta Catalog
> >>>> implementation
> >>>>>>>> for
> >>>>>>>>> Flink.
> >>>>>>>>> For Delta Catalog, table schema information is fetched from
> >>>> underlying
> >>>>>>>>> _delta_log and not stored in metastore. For time travel we
> >>> actually
> >>>>>> had a
> >>>>>>>>> problem, that if we would like to timetravel back to some old
> >>>> version,
> >>>>>>>>> where schema was slightly different, then we would have a
> >> conflict
> >>>>>> since
> >>>>>>>>> Catalog would return current schema and not how it was for
> >> version
> >>>> X.
> >>>>>>>>>
> >>>>>>>>> With your change, our Delta Catalog can actually fetch schema
> >> for
> >>>>>>>> version X
> >>>>>>>>> and send it to DeltaTableFactory. Currency, Catalog can fetch
> >> only
> >>>>>>>> current
> >>>>>>>>> version. What we would also need however is version
> >>>> (number/timestamp)
> >>>>>>>> for
> >>>>>>>>> this table passed to DynamicTableFactory so we could properly
> >> set
> >>>>>> Delta
> >>>>>>>>> standalone library.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Krzysztof
> >>>>>>>>>
> >>>>>>>>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
> >>>>>>>>> krzysiek.chmielewski@gmail.com> napisał(a):
> >>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> happy to see such a feature.
> >>>>>>>>>> Small note from my end regarding Catalog changes.
> >>>>>>>>>>
> >>>>>>>>>> TL;DR
> >>>>>>>>>> I don't think it is necessary to delegate this feature to the
> >>>>>> catalog.
> >>>>>>>> I
> >>>>>>>>>> think that since "timetravel" is per job/query property, its
> >>> should
> >>>>>> not
> >>>>>>>>> be
> >>>>>>>>>> coupled with the Catalog or table definition. In my opinion
> >> this
> >>> is
> >>>>>>>>>> something that DynamicTableFactory only has to know about. I
> >>> would
> >>>>>>>> rather
> >>>>>>>>>> see this feature as it is - SQL syntax enhancement but delegate
> >>>>>> clearly
> >>>>>>>>> to
> >>>>>>>>>> DynamicTableFactory.
> >>>>>>>>>>
> >>>>>>>>>> I've implemented timetravel feature for Delta Connector  [1]
> >>> using
> >>>>>>>>>> current Flink API.
> >>>>>>>>>> Docs are pending code review, but you can find them here [2]
> >> and
> >>>>>>>> examples
> >>>>>>>>>> are available here [3]
> >>>>>>>>>>
> >>>>>>>>>> The timetravel feature that I've implemented is based on Flink
> >>>> Query
> >>>>>>>>>> hints.
> >>>>>>>>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
> >>>>>>>>>>
> >>>>>>>>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is
> >>>>>> handled
> >>>>>>>>> not
> >>>>>>>>>> by Catalog but by DyntamicTableFactory implementation for Delta
> >>>>>>>>> connector.
> >>>>>>>>>> The value of this property is passed to Delta standalone lib
> >> API
> >>>> that
> >>>>>>>>>> returns table view for given version.
> >>>>>>>>>>
> >>>>>>>>>> I'm not sure how/if proposed change could benefit Delta
> >> connector
> >>>>>>>>>> implementation for this feature.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Krzysztof
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> >>>>>>>>>> [2]
> >>>>>> https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> >>>>>>>>>> [3]
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> >>>>>>>>>>
> >>>>>>>>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com>
> >>> napisał(a):
> >>>>>>>>>>
> >>>>>>>>>>> Hi, Feng
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for driving this FLIP, Time travel is very useful for
> >>> Flink
> >>>>>>>>>>> integrate with data lake system. I have one question why the
> >>>>>>>>>>> implementation
> >>>>>>>>>>> of TimeTravel is delegated to Catalog? Assuming that we use
> >>> Flink
> >>>> to
> >>>>>>>>> query
> >>>>>>>>>>> Hudi table with the time travel syntax, but we don't use the
> >>>>>>>>> HudiCatalog,
> >>>>>>>>>>> instead, we register the hudi table to InMemoryCatalog,  can
> >> we
> >>>>>>>> support
> >>>>>>>>>>> time travel for Hudi table in this case?
> >>>>>>>>>>> In contrast, I think time travel should bind to connector
> >>> instead
> >>>> of
> >>>>>>>>>>> Catalog, so the rejected alternative should be considered.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Ron
> >>>>>>>>>>>
> >>>>>>>>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi, Feng.
> >>>>>>>>>>>> Notice this FLIP only support batch mode for time travel.
> >>> Would
> >>>> it
> >>>>>>>>> also
> >>>>>>>>>>>> make sense to support stream mode to a read a snapshot of the
> >>>> table
> >>>>>>>>> as a
> >>>>>>>>>>>> bounded stream?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>> Yuxia
> >>>>>>>>>>>>
> >>>>>>>>>>>> ----- 原始邮件 -----
> >>>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
> >>>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
> >>>>>>>>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> >>>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >>>>>>>>>>>>
> >>>>>>>>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
> >>>>>>>>>>>>
> >>>>>>>>>>>> This also depends on whether this is defined in standard or
> >> any
> >>>>>>>> known
> >>>>>>>>>>>> databases that have implemented this. If not, it would be
> >> hard
> >>> to
> >>>>>>>> push
> >>>>>>>>>>> it
> >>>>>>>>>>>> to Calcite.
> >>>>>>>>>>>>
> >>>>>>>>>>>> # getTable(ObjectPath object, long timestamp)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Then we again come to the problem of "casting between
> >> timestamp
> >>>> and
> >>>>>>>>>>>> numeric", which has been disabled in FLINK-21978[1]. If
> >> you're
> >>>>>> gonna
> >>>>>>>>> use
> >>>>>>>>>>>> this, then we need to clarify that problem first.
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> hi, thanks for your reply.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Benchao
> >>>>>>>>>>>>>> did you consider the pushdown abilities compatible
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In the current design, the implementation of TimeTravel is
> >>>>>>>> delegated
> >>>>>>>>>>> to
> >>>>>>>>>>>>> Catalog. We have added a function called getTable(ObjectPath
> >>>>>>>>>>> tablePath,
> >>>>>>>>>>>>> long timestamp) to obtain the corresponding CatalogBaseTable
> >>> at
> >>>> a
> >>>>>>>>>>>> specific
> >>>>>>>>>>>>> time.  Therefore, I think it will not have any impact on the
> >>>>>>>>> original
> >>>>>>>>>>>>> pushdown abilities.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I see there is a rejected  design for adding
> >>>>>>>> SupportsTimeTravel,
> >>>>>>>>>>> but
> >>>>>>>>>>>> I
> >>>>>>>>>>>>> didn't see the alternative in  the FLIP doc
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sorry, the document description is not very clear.
> >> Regarding
> >>>>>>>>> whether
> >>>>>>>>>>> to
> >>>>>>>>>>>>> support SupportTimeTravel, I have discussed it with yuxia.
> >>> Since
> >>>>>>>> we
> >>>>>>>>>>> have
> >>>>>>>>>>>>> already passed the corresponding time in
> >> getTable(ObjectPath,
> >>>> long
> >>>>>>>>>>>>> timestamp) of Catalog, SupportTimeTravel may not be
> >> necessary.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In getTable(ObjectPath object, long timestamp), we can
> >> obtain
> >>>> the
> >>>>>>>>>>> schema
> >>>>>>>>>>>> of
> >>>>>>>>>>>>> the corresponding time point and put the SNAPSHOT that needs
> >>> to
> >>>> be
> >>>>>>>>>>>> consumed
> >>>>>>>>>>>>> into options.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Shammon
> >>>>>>>>>>>>>> Could we support this in Flink too?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I personally think it's possible, but limited by Calcite's
> >>>> syntax
> >>>>>>>>>>>>> restrictions. I believe we should first support this syntax
> >> in
> >>>>>>>>>>> Calcite.
> >>>>>>>>>>>>> Currently, I think it may not be easy  to support this
> >> syntax
> >>> in
> >>>>>>>>>>> Flink's
> >>>>>>>>>>>>> parser. @Benchao, what do you think? Can Calcite support
> >> this
> >>>>>>>> syntax
> >>>>>>>>>>>>> ` VERSION AS OF`  ?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Feng.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <
> >> zjureel@gmail.com
> >>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks Feng, the feature of time travel sounds great!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
> >>>>>>>> iceberg
> >>>>>>>>>>>>> support
> >>>>>>>>>>>>>> snapshot or version. For example, users can query snapshot
> >> 1
> >>>> for
> >>>>>>>>>>> paimon
> >>>>>>>>>>>>> by
> >>>>>>>>>>>>>> the following statement
> >>>>>>>>>>>>>> SELECT * FROM t VERSION AS OF 1
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Could we support this in Flink too?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Shammon FY
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
> >>>>>>>> libenchao@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Regarding the implementation, did you consider the
> >> pushdown
> >>>>>>>>>>> abilities
> >>>>>>>>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
> >>>>>>>>> partition
> >>>>>>>>>>>>>> pushdown.
> >>>>>>>>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
> >>>>>>>> have a
> >>>>>>>>>>>>> concern
> >>>>>>>>>>>>>>> about this. Of course, it depends on your implementation
> >>>>>>>> detail,
> >>>>>>>>>>> what
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>>>> important is that we'd better add some cross tests for
> >>> these.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Regarding the interface exposed to Connector, I see there
> >>> is a
> >>>>>>>>>>>> rejected
> >>>>>>>>>>>>>>> design for adding SupportsTimeTravel, but I didn't see the
> >>>>>>>>>>>> alternative
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
> >>>>>>>> clarify
> >>>>>>>>>>>>> because
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>> need to know whether the Connector supports this, and what
> >>>>>>>>>>>>>> column/metadata
> >>>>>>>>>>>>>>> corresponds to 'system_time'.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for your reply
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> @Timo @BenChao @yuxia
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Sorry for the mistake,  Currently , calcite only supports
> >>>>>>>>> `FOR
> >>>>>>>>>>>>>>> SYSTEM_TIME
> >>>>>>>>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
> >>>>>>>> OF`
> >>>>>>>>> .
> >>>>>>>>>>>> I've
> >>>>>>>>>>>>>>>> updated the syntax part of the FLIP.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> @Timo
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Yes, I think we need to convert TIMESTAMP to
> >> TIMESTAMP_LTZ
> >>>>>>>> and
> >>>>>>>>>>> then
> >>>>>>>>>>>>>>> convert
> >>>>>>>>>>>>>>>> it into a long value.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> How do we want to query the most recent version of a
> >> table
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
> >>>>>>>>> cause
> >>>>>>>>>>>>>>>> inconsistency with the real-time concept.
> >>>>>>>>>>>>>>>> However, from my personal understanding, the scope of
> >> `AS
> >>>>>>>> OF
> >>>>>>>>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
> >>>>>>>> record.
> >>>>>>>>>>> So,
> >>>>>>>>>>>> I
> >>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
> >>>>>>>>>>>>>>>> Additionally, if no version is specified, the latest
> >>> version
> >>>>>>>>>>> should
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>> used
> >>>>>>>>>>>>>>>> by default.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Feng
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
> >>>>>>>>>>> luoyuxia@alumni.sjtu.edu.cn
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
> >>>>>>>>> introduce
> >>>>>>>>>>>> time
> >>>>>>>>>>>>>>> travel
> >>>>>>>>>>>>>>>>> to Flink to have a better integration with external data
> >>>>>>>>>>> soruces.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I also share same concern about the syntax.
> >>>>>>>>>>>>>>>>> I see in the part of `Whether to support other syntax
> >>>>>>>>>>>>>> implementations`
> >>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
> >>>>>>>>>>> SYSTEM_TIME
> >>>>>>>>>>>>> AS
> >>>>>>>>>>>>>>> OF`,
> >>>>>>>>>>>>>>>>> right?
> >>>>>>>>>>>>>>>>> But the the syntax part in this FLIP, it seems to be `AS
> >>>>>>>> OF
> >>>>>>>>>>>>>> TIMESTAMP`
> >>>>>>>>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a
> >> mistake
> >>>>>>>> or
> >>>>>>>>>>> by
> >>>>>>>>>>>>>> design?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>>>>> Yuxia
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> ----- 原始邮件 -----
> >>>>>>>>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
> >>>>>>>>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
> >>>>>>>>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
> >>>>>>>>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
> >>>>>>>>> Mode
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
> >>>>>>>>>>> instead
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>> `FOR
> >>>>>>>>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is
> >> in
> >>>>>>>>> the
> >>>>>>>>>>> SQL
> >>>>>>>>>>>>>>>> standard
> >>>>>>>>>>>>>>>>> and has been supported in some database vendors such as
> >>>>>>>> SQL
> >>>>>>>>>>>> Server.
> >>>>>>>>>>>>>>> About
> >>>>>>>>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
> >>>>>>>>> supports
> >>>>>>>>>>>>> this,
> >>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
> >>>>>>>> and
> >>>>>>>>> I
> >>>>>>>>>>>> would
> >>>>>>>>>>>>>>> give
> >>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
> >>>>>>>> use
> >>>>>>>>>>> `FOR
> >>>>>>>>>>>>>> SYSTEM
> >>>>>>>>>>>>>>>> AS
> >>>>>>>>>>>>>>>>> OF`.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四
> >> 19:02写道:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Also: How do we want to query the most recent version
> >>>>>>>> of a
> >>>>>>>>>>>> table?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but according
> >>>>>>>> to
> >>>>>>>>>>> the
> >>>>>>>>>>>>> docs
> >>>>>>>>>>>>>>>> both
> >>>>>>>>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
> >>>>>>>> concerning
> >>>>>>>>>>> is
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>> actually is evalated row-based:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Returns the current SQL timestamp in the local time
> >>>>>>>>> zone,
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>> return
> >>>>>>>>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
> >>>>>>>> record
> >>>>>>>>> in
> >>>>>>>>>>>>>>> streaming
> >>>>>>>>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
> >>>>>>>> query
> >>>>>>>>>>>> starts
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> uses the same result for every row.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> This could make it difficult to explain in a join
> >>>>>>>> scenario
> >>>>>>>>>>> of
> >>>>>>>>>>>>>>> multiple
> >>>>>>>>>>>>>>>>>> snapshotted tables.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
> >>>>>>>>>>>>>>>>>>> Hi Feng,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
> >>>>>>>> sense
> >>>>>>>>> to
> >>>>>>>>>>>>>> finally
> >>>>>>>>>>>>>>>>>>> support querying tables at a specific point in time or
> >>>>>>>>>>>>> hopefully
> >>>>>>>>>>>>>>> also
> >>>>>>>>>>>>>>>>>>> ranges soon. Following time-versioned tables.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Here is some feedback from my side:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1. Syntax
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for this
> >>>>>>>>> but
> >>>>>>>>>>>> not
> >>>>>>>>>>>>>> `FOR
> >>>>>>>>>>>>>>>>>>> SYSTEM_TIME AS OF`?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> It would be great to support `AS OF` also for
> >>>>>>>>>>> time-versioned
> >>>>>>>>>>>>>> joins
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> have a unified and short syntax.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
> >>>>>>>>> this
> >>>>>>>>>>>>>>> available
> >>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> Flink earlier by copying the corresponding classes
> >>>>>>>> until
> >>>>>>>>>>> the
> >>>>>>>>>>>>> next
> >>>>>>>>>>>>>>>>>>> Calcite upgrade is performed.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 2. Semantics
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
> >>>>>>>>>>>> timestamp
> >>>>>>>>>>>>>>> types
> >>>>>>>>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
> >>>>>>>>>>>> TIMESTAMP
> >>>>>>>>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
> >>>>>>>>>>> timestamp
> >>>>>>>>>>>> be?
> >>>>>>>>>>>>>> We
> >>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> We definely need to clarify this because the past has
> >>>>>>>>>>> shown
> >>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> daylight saving times make our lives hard.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
> >>>>>>>>>>>>>>>>>>>> Hi, everyone.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
> >>>>>>>> Support
> >>>>>>>>>>> Time
> >>>>>>>>>>>>>> Travel
> >>>>>>>>>>>>>>>> In
> >>>>>>>>>>>>>>>>>>>> Batch
> >>>>>>>>>>>>>>>>>>>> Mode [1]
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
> >>>>>>>>>>>> versions
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>> data.
> >>>>>>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>>>>>>> allows users to specify a point in time and retrieve
> >>>>>>>>> the
> >>>>>>>>>>>> data
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> schema of
> >>>>>>>>>>>>>>>>>>>> a table as it appeared at that time. With time
> >>>>>>>> travel,
> >>>>>>>>>>> users
> >>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>> easily
> >>>>>>>>>>>>>>>>>>>> analyze and compare historical versions of data.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> With the widespread use of data lake systems such as
> >>>>>>>>>>> Paimon,
> >>>>>>>>>>>>>>>> Iceberg,
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
> >>>>>>>>> users'
> >>>>>>>>>>>> data
> >>>>>>>>>>>>>>>>> analysis.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Looking forward to your opinions, any suggestions are
> >>>>>>>>>>>>> welcomed.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> 1.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Feng
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Benchao Li
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Benchao Li
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Benchao Li
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>>
> >>> Best,
> >>> Benchao Li
> >>>
> >>
> >
> >
> > --
> >
> > Best,
> > Benchao Li
>
>

-- 

Best,
Benchao Li

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Leonard Xu <xb...@gmail.com>.
About the semantics consideration, I have some new input after rethink.

1. We can support both TIMESTAMP and TIMESTAMP_LTZ expression following the syntax  `SELECT [column_name(s)] FROM [table_name] FOR SYSTEM_TIME AS OF `

2. For TIMESTAMP_LTZ type, give a long instant value to CatalogTable is pretty intuitive, for TIMESTAMP_type, it will be implied cast to TIMESTAMP_LTZ type by planner using session timezone and then pass to CatalogTable. This case can be considered as a Function AsOfSnapshot(Table t, TIMESTAMP_LTZ arg), which can pass arg with TIMESTAMP_LTZ type, but our framework supports implicit type conversion thus users can also pass arg with TIMESTAMP type. Hint, Spark[1] did the  implicit type conversion too.

3.I also considered handing over the implicit type conversion to the connector instead of planner, such as passing a TIMESTAMP literal, and the connector using the session timezone to perform type conversion, but this is more complicated than previous planner handling, and it’s not friendly to the connector developers.

4. The last point,  TIMESTAMP_LTZ  '1970-01-01 00:00:04.001’ should be an invalid expression as if you can not define a instant point (i.e TIMSTAMP_LTZ semantics in SQL) from a timestamp literal without timezone. You can use explicit type conversion like `cast(ts_ntz as TIMESTAMP_LTZ)` after `FOR SYSTEM_TIME AS OF ` if you want to use
Timestamp type/expression/literal without timezone.

5. The last last point, the TIMESTAMP_LTZ type of Flink SQL supports DST time[2] well that will help user avoid many corner case.


Best,
Leonard

[1]https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
[2]https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/timezone/#daylight-saving-time-support




> On Jun 9, 2023, at 1:13 PM, Benchao Li <li...@apache.org> wrote:
> 
> As you can see that you must use `UNIX_TIMESTAMP` to do this work, that's
> where the time zone happens.
> 
> What I'm talking about is casting timestamp/timestamp_ltz to long directly,
> that's why the semantic is tricky when you are casting timestamp to long
> using time zone.
> 
> For other systems, such as SQL server[1], they actually uses a string
> instead of timestamp literal `FOR SYSTEM_TIME AS OF '2021-01-01
> 00:00:00.0000000'`, I'm not sure whether they convert the string implicitly
> to TIMESTAMP_LTZ, or they just have a different definition of the syntax.
> 
> But for us, we are definitely using timestamp/timestmap_ltz literal here,
> that's why it is special, and we must highlight this behavior that we are
> converting a timestamp without time zone literal to long using the session
> time zone.
> 
> [1]
> https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios?view=sql-server-ver16
> 
> Feng Jin <ji...@gmail.com> 于2023年6月8日周四 11:35写道:
> 
>> Hi all,
>> 
>> thanks for your input
>> 
>> 
>> @Benchao
>> 
>>> The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
>> WITHOUT TIME ZONE", converting it to unix timestamp would use UTC timezone,
>> which is not usually expected by users.
>> 
>> It was indeed the case before Flink 1.13, but now my understanding is that
>> there have been some slight changes in the definition of TIMESTAMP.
>> 
>> TIMESTAMP is currently used to specify the year, month, day, hour, minute
>> and second. We recommend that users use *UNIX_TIMESTAMP(CAST(timestamp_col
>> AS STRING))* to convert *TIMESTAMP values* and *long values*. The
>> *UNIX_TIMESTAMP* function will use the *LOCAL TIME ZONE*. Therefore,
>> whether converting TIMESTAMP or TIMESTAMP_LTZ to Long values will involve
>> using the *LOCAL TIME ZONE*.
>> 
>> 
>> Here is an test:
>> 
>> Flink SQL> SET 'table.local-time-zone' = 'UTC';
>> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
>> STRING)) as `timestamp`;
>> ---------------
>> timestamp
>> --------------
>> 0
>> 
>> Flink SQL> SET 'table.local-time-zone' = 'Asia/Shanghai';
>> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
>> STRING)) as `timestamp`;
>> ---------------
>> timestamp
>> --------------
>> -28800
>> 
>> Therefore, the current conversion method exposed to users is also using
>> LOCAL TIME ZONE.
>> 
>> 
>> @yuxia
>> 
>> Thank you very much for providing the list of behaviors of TIMESTAMP in
>> other systems.
>> 
>>> I think we can align them to avoid the inconsistency to other engines and
>> provide convenience for the external connectors while integrating Flink's
>> time travel API.
>> 
>> +1 for this.
>> 
>>> Regarding the inconsistency, I think we can consider time-travel as a
>> specical case, and we do needs to highlight this in this FLIP.
>> As for "violate the restriction outlined in FLINK-21978[1]", since we cast
>> timestamp to epochMillis only for the internal use, and won't expose it to
>> users, I don't think it will violate the restriction.
>> Btw, please add a brief desc to explain the meaning of the parameter
>> `timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath, long
>> timestamp)`. Maybe something like "timestamp of the table snapt, which is
>> millseconds since 1970-01-01 00:00:00 UTC".
>> 
>> Thank you for the suggestions regarding the document. I will add them to
>> FLIP.
>> 
>> 
>> Best,
>> Feng
>> 
>> 
>> On Wed, Jun 7, 2023 at 12:18 PM Benchao Li <li...@apache.org> wrote:
>> 
>>> I also share the concern about the timezone problem.
>>> 
>>> The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
>> WITHOUT
>>> TIME ZONE", converting it to unix timestamp would use UTC timezone, which
>>> is not usually expected by users.
>>> 
>>> If we want to keep consistent with the standard, we probably should use
>>> "TIMESTAMP WITH LOCAL ZONE '2023-04-27 00:00:00'", which type is
>> "TIMESTAMP
>>> WITH LOCAL TIME ZONE", and converting it to unix timestamp will consider
>>> the session timezone, which is the expected result. But it's inconvenient
>>> for users.
>>> 
>>> Taking this a special case, and converting "TIMESTAMP '2023-04-27
>>> 00:00:00'" to a unix timestamp with session timezone, will be convenient
>>> for users, but will break the standard. I will +0.5 for this choice.
>>> 
>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年6月7日周三 12:06写道:
>>> 
>>>> Hi, Feng Jin.
>>>> I think the concern of Leonard may be the inconsistency of the behavior
>>> of
>>>> TIMESTAMP '2023-04-27 00:00:00' beween timetravel and other sql
>>> statement.
>>>> 
>>>> For the normal sql:
>>>> `SELECT TIMESTAMP '2023-04-27 00:00:00'`, we won't consider timezone.
>>>> But for the sql for timetravl:
>>>> `SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP '2023-04-27
>>>> 00:00:00'`, we will consider the timezone and convert to UTC timestamp.
>>>> 
>>>> The concern is valid. But for time travel, most style of engines,
>>>> Spark[1], Hive[2], Trino[3] also do the time conversion with
>> considering
>>>> the seesion time zone. I think we can align them to avoid the
>>> inconsistency
>>>> to other engines and provide convenience for the external connectors
>>> while
>>>> integrating Flink's time travel API.
>>>> 
>>>> Regarding the inconsistency, I think we can consider time-travel as a
>>>> specical case, and we do needs to highlight this in this FLIP.
>>>> As for "violate the restriction outlined in FLINK-21978[1]", since we
>>> cast
>>>> timestamp to epochMillis only for the internal use, and won't expose it
>>> to
>>>> users, I don't think it will violate the restriction.
>>>> Btw, please add a brief desc to explain the meaning of the parameter
>>>> `timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath,
>>> long
>>>> timestamp)`. Maybe something like "timestamp of the table snapt, which
>> is
>>>> millseconds since 1970-01-01 00:00:00 UTC".
>>>> 
>>>> [1]
>>>> 
>>> 
>> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
>>>> [2]
>>>> 
>>> 
>> https://github.com/apache/hive/blob/f5e69dc38d7ff26c70be19adc9d1a3ae90dc4cf2/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L989
>>>> [3]
>>>> 
>>> 
>> https://github.com/trinodb/trino/blob/2433d9e60f1abb0d85c32374c1758525560e1a86/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L443
>>>> 
>>>> 
>>>> Best regards,
>>>> Yuxia
>>>> 
>>>> ----- 原始邮件 -----
>>>> 发件人: "Feng Jin" <ji...@gmail.com>
>>>> 收件人: "dev" <de...@flink.apache.org>
>>>> 发送时间: 星期二, 2023年 6 月 06日 下午 10:15:47
>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>>>> 
>>>> Hi everyone
>>>> 
>>>> Thanks everyone for your input.
>>>> 
>>>> 
>>>> @Yun
>>>> 
>>>>> I think you could add descriptions of how to align backfill time
>>> travel
>>>> with querying the latest data. And I think you should also update the
>>>> "Discussion thread" in the original FLIP.
>>>> 
>>>> Thank you for the suggestion, I will update it in the document.
>>>> 
>>>>> I have a question about getting the table schema from the catalog.
>> I'm
>>>> not sure whether the Catalog#getTable(tablePath, timestamp) will be
>>> called
>>>> only once.
>>>> 
>>>> I understand that in a query, the schema of the table is determined
>>> before
>>>> execution. The schema used will be based on the latest schema within
>> the
>>>> TimeTravel period.
>>>> 
>>>> In addition, due to current syntax limitations, we are unable to
>> support
>>>> the use of BETWEEN AND.
>>>> 
>>>> 
>>>> @Jing
>>>> 
>>>>> Would you like to update your thoughts described in your previous
>>> email
>>>> about why SupportsTimeTravel has been rejected into the FLIP?
>>>> 
>>>> Sure,  I updated the doc.
>>>> 
>>>> 
>>>>>   Since we always directly add overload methods into Catalog
>> according
>>>> to new requirements, which makes the interface bloated
>>>> 
>>>> Your concern is valid. If we need to support the long type version in
>> the
>>>> future, we may have to add another method "getTable(ObjectPath, long
>>>> version)". However, I understand that
>>>> "Catalog.getTable(tablePath).on(timeStamp)" may not meet the
>>> requirements.
>>>> The timestamp is for Catalog's use, and Catalog obtains the
>> corresponding
>>>> schema based on this time.
>>>> 
>>>> 
>>>> @liu @Regards
>>>> 
>>>> I am very sorry for the unclear description in the document. I have
>>> updated
>>>> relevant descriptions regarding why it needs to be implemented in
>>> Catalog.
>>>> 
>>>> Travel not only requires obtaining data at the corresponding time
>> point,
>>>> but also requires the corresponding Schema at that time point
>>>> 
>>>> 
>>>> @Shammon
>>>> 
>>>>> Flink or connector such as  iceberg/paimon can create sources from
>> the
>>>> `CatalogBaseTable` directly without the need to get the snapshot ID
>> from
>>>> `CatalogTable.getSnapshot()`.  What do you think of it?
>>>> 
>>>> You are right, we don't need the getSnapshot interface for
>> PaimonCatalog
>>> or
>>>> IcebergCatalog tables, but we may need it for temporary tables.
>>>> 
>>>> 
>>>> 
>>>> Best,
>>>> Feng
>>>> 
>>>> 
>>>> On Tue, Jun 6, 2023 at 9:32 PM Feng Jin <ji...@gmail.com> wrote:
>>>> 
>>>>> Sorry I replied to the wrong mail. Please ignore the last email.
>>>>> 
>>>>> 
>>>>> Hi Leonard
>>>>> 
>>>>>> 1. Unification SQL
>>>>> 
>>>>> I agree that it is crucial for us to support both batch and streaming
>>>>> processing.  The current design allows for the support of both batch
>>> and
>>>>> streaming processing. I'll update the FLIP later.
>>>>> 
>>>>> 
>>>>>> 2.Semantics
>>>>> 
>>>>> In my opinion, it would be feasible to perform the conversion based
>> on
>>>> the
>>>>> current session time, regardless of whether it is TIMESTAMP or
>>>>> TIMESTAMP_LTZ.
>>>>> 
>>>>> However, this may indeed violate the restriction outlined in
>>>>> FLINK-21978[1]  as Benchao mentioned, and I am uncertain as to
>> whether
>>> it
>>>>> is reasonable.
>>>>> 
>>>>> 
>>>>>>  3.  Some external systems may use timestamp value to mark a
>>> version,
>>>>> but others may use version number、file position、log offset.
>>>>> 
>>>>> It is true that most systems support time-related operations, and I
>>>>> believe that the current design is compatible with most systems.
>>> However,
>>>>> if we want to support long data type, it may require Calcite to
>> support
>>>> the
>>>>> VERSION AS OF syntax. I understand that this is something that we may
>>>> need
>>>>> to consider in the future.
>>>>> 
>>>>> 
>>>>> Best,
>>>>> Feng
>>>>> 
>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
>>>>> 
>>>>> On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com> wrote:
>>>>> 
>>>>>> Hi, Feng
>>>>>> 
>>>>>> Thanks for driving this FLIP, very impressive feature that users
>> want,
>>>>>> I’ve some quick questions here.
>>>>>> 
>>>>>> 1.Unification SQL:
>>>>>>        The snapshot  concept exists both in Batch mode and
>> Streaming
>>>>>> mode,  could we consider a unified proposal? I think users won’t
>>> another
>>>>>> SQL syntax named
>>>>>> Time travel for Streaming mode.
>>>>>> 
>>>>>> 2.Semantics:
>>>>>>        Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a
>>> long
>>>>>> timestamp value (getTable(ObjectPath tablePath, long timestamp)) we
>>> need
>>>>>> two information i.e. a TIMESTAMP value and current session timezone,
>>>> how
>>>>>> we deal the value with current proposed SQL syntax.
>>>>>> 
>>>>>> 3. Is it enough using sinlge timestamp to track a snapshot(version)
>> of
>>>>>> external table?   Some external systems may use timestamp value to
>>> mark
>>>> a
>>>>>> version, but others may use version number、file position、log offset.
>>>>>> 
>>>>>> Best,
>>>>>> Leonard
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
>>>>>>> 
>>>>>>> Hi Feng,
>>>>>>> 
>>>>>>> I think this FLIP would provide one important feature to unify the
>>>>>> stream-SQL and batch-SQL when we backfill the historical data in
>> batch
>>>> mode.
>>>>>>> 
>>>>>>> For the "Syntax" session, I think you could add descriptions of
>> how
>>> to
>>>>>> align backfill time travel with querying the latest data. And I
>> think
>>>> you
>>>>>> should also update the "Discussion thread" in the original FLIP.
>>>>>>> 
>>>>>>> Moreover, I have a question about getting the table schema from
>> the
>>>>>> catalog. I'm not sure whether the Catalog#getTable(tablePath,
>>> timestamp)
>>>>>> will be called only once. If we have a backfill query between
>>> 2023-05-29
>>>>>> and 2023-06-04 in the past week, and the table schema changed on
>>>>>> 2023-06-01, will the query below detect the schema changes during
>>>> backfill
>>>>>> the whole week?
>>>>>>> 
>>>>>>> SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
>>>>>> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
>>>>>>> 
>>>>>>> Best
>>>>>>> Yun Tang
>>>>>>> 
>>>>>>> 
>>>>>>> ________________________________
>>>>>>> From: Shammon FY <zj...@gmail.com>
>>>>>>> Sent: Thursday, June 1, 2023 17:57
>>>>>>> To: dev@flink.apache.org <de...@flink.apache.org>
>>>>>>> Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>>>>>>> 
>>>>>>> Hi Feng,
>>>>>>> 
>>>>>>> I have one minor comment about the public interface
>> `Optional<Long>
>>>>>>> getSnapshot()` in the `CatalogTable`.
>>>>>>> 
>>>>>>> As we can get tables from the new method
>>> `Catalog.getTable(ObjectPath
>>>>>>> tablePath, long timestamp)`, I think the returned
>> `CatalogBaseTable`
>>>>>> will
>>>>>>> have the information of timestamp. Flink or connector such as
>>>>>>> iceberg/paimon can create sources from the `CatalogBaseTable`
>>> directly
>>>>>>> without the need to get the snapshot ID from
>>>>>> `CatalogTable.getSnapshot()`.
>>>>>>> What do you think of it?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Shammon FY
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <jing@ververica.com.invalid
>>> 
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Feng,
>>>>>>>> 
>>>>>>>> Thanks for the proposal! Very interesting feature. Would you like
>>> to
>>>>>> update
>>>>>>>> your thoughts described in your previous email about why
>>>>>> SupportsTimeTravel
>>>>>>>> has been rejected into the FLIP? This will help readers
>> understand
>>>> the
>>>>>>>> context (in the future).
>>>>>>>> 
>>>>>>>> Since we always directly add overload methods into Catalog
>>> according
>>>>>> to new
>>>>>>>> requirements, which makes the interface bloated. Just out of
>>>> curiosity,
>>>>>>>> does it make sense to introduce some DSL design? Like
>>>>>>>> Catalog.getTable(tablePath).on(timeStamp),
>>>>>>>> Catalog.getTable(tablePath).current() for the most current
>> version,
>>>> and
>>>>>>>> more room for further extension like timestamp range, etc. I
>>> haven't
>>>>>> read
>>>>>>>> all the source code yet and I'm not sure if it is possible. But a
>>>>>>>> design like this will keep the Catalog API lean and the API/DSL
>>> will
>>>> be
>>>>>>>> self described and easier to use.
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> Jing
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
>>>>>>>> krzysiek.chmielewski@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Ok after second though I'm retracting my previous statement
>> about
>>>>>> Catalog
>>>>>>>>> changes you proposed.
>>>>>>>>> I do see a benefit for Delta connector actually with this change
>>> and
>>>>>> see
>>>>>>>>> why this could be coupled with Catalog.
>>>>>>>>> 
>>>>>>>>> Delta Connector SQL support, also ships a Delta Catalog
>>>> implementation
>>>>>>>> for
>>>>>>>>> Flink.
>>>>>>>>> For Delta Catalog, table schema information is fetched from
>>>> underlying
>>>>>>>>> _delta_log and not stored in metastore. For time travel we
>>> actually
>>>>>> had a
>>>>>>>>> problem, that if we would like to timetravel back to some old
>>>> version,
>>>>>>>>> where schema was slightly different, then we would have a
>> conflict
>>>>>> since
>>>>>>>>> Catalog would return current schema and not how it was for
>> version
>>>> X.
>>>>>>>>> 
>>>>>>>>> With your change, our Delta Catalog can actually fetch schema
>> for
>>>>>>>> version X
>>>>>>>>> and send it to DeltaTableFactory. Currency, Catalog can fetch
>> only
>>>>>>>> current
>>>>>>>>> version. What we would also need however is version
>>>> (number/timestamp)
>>>>>>>> for
>>>>>>>>> this table passed to DynamicTableFactory so we could properly
>> set
>>>>>> Delta
>>>>>>>>> standalone library.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Krzysztof
>>>>>>>>> 
>>>>>>>>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
>>>>>>>>> krzysiek.chmielewski@gmail.com> napisał(a):
>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> happy to see such a feature.
>>>>>>>>>> Small note from my end regarding Catalog changes.
>>>>>>>>>> 
>>>>>>>>>> TL;DR
>>>>>>>>>> I don't think it is necessary to delegate this feature to the
>>>>>> catalog.
>>>>>>>> I
>>>>>>>>>> think that since "timetravel" is per job/query property, its
>>> should
>>>>>> not
>>>>>>>>> be
>>>>>>>>>> coupled with the Catalog or table definition. In my opinion
>> this
>>> is
>>>>>>>>>> something that DynamicTableFactory only has to know about. I
>>> would
>>>>>>>> rather
>>>>>>>>>> see this feature as it is - SQL syntax enhancement but delegate
>>>>>> clearly
>>>>>>>>> to
>>>>>>>>>> DynamicTableFactory.
>>>>>>>>>> 
>>>>>>>>>> I've implemented timetravel feature for Delta Connector  [1]
>>> using
>>>>>>>>>> current Flink API.
>>>>>>>>>> Docs are pending code review, but you can find them here [2]
>> and
>>>>>>>> examples
>>>>>>>>>> are available here [3]
>>>>>>>>>> 
>>>>>>>>>> The timetravel feature that I've implemented is based on Flink
>>>> Query
>>>>>>>>>> hints.
>>>>>>>>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
>>>>>>>>>> 
>>>>>>>>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is
>>>>>> handled
>>>>>>>>> not
>>>>>>>>>> by Catalog but by DyntamicTableFactory implementation for Delta
>>>>>>>>> connector.
>>>>>>>>>> The value of this property is passed to Delta standalone lib
>> API
>>>> that
>>>>>>>>>> returns table view for given version.
>>>>>>>>>> 
>>>>>>>>>> I'm not sure how/if proposed change could benefit Delta
>> connector
>>>>>>>>>> implementation for this feature.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Krzysztof
>>>>>>>>>> 
>>>>>>>>>> [1]
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
>>>>>>>>>> [2]
>>>>>> https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
>>>>>>>>>> [3]
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
>>>>>>>>>> 
>>>>>>>>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com>
>>> napisał(a):
>>>>>>>>>> 
>>>>>>>>>>> Hi, Feng
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for driving this FLIP, Time travel is very useful for
>>> Flink
>>>>>>>>>>> integrate with data lake system. I have one question why the
>>>>>>>>>>> implementation
>>>>>>>>>>> of TimeTravel is delegated to Catalog? Assuming that we use
>>> Flink
>>>> to
>>>>>>>>> query
>>>>>>>>>>> Hudi table with the time travel syntax, but we don't use the
>>>>>>>>> HudiCatalog,
>>>>>>>>>>> instead, we register the hudi table to InMemoryCatalog,  can
>> we
>>>>>>>> support
>>>>>>>>>>> time travel for Hudi table in this case?
>>>>>>>>>>> In contrast, I think time travel should bind to connector
>>> instead
>>>> of
>>>>>>>>>>> Catalog, so the rejected alternative should be considered.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Ron
>>>>>>>>>>> 
>>>>>>>>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi, Feng.
>>>>>>>>>>>> Notice this FLIP only support batch mode for time travel.
>>> Would
>>>> it
>>>>>>>>> also
>>>>>>>>>>>> make sense to support stream mode to a read a snapshot of the
>>>> table
>>>>>>>>> as a
>>>>>>>>>>>> bounded stream?
>>>>>>>>>>>> 
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Yuxia
>>>>>>>>>>>> 
>>>>>>>>>>>> ----- 原始邮件 -----
>>>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
>>>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
>>>>>>>>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
>>>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>>>>>>>>>>>> 
>>>>>>>>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
>>>>>>>>>>>> 
>>>>>>>>>>>> This also depends on whether this is defined in standard or
>> any
>>>>>>>> known
>>>>>>>>>>>> databases that have implemented this. If not, it would be
>> hard
>>> to
>>>>>>>> push
>>>>>>>>>>> it
>>>>>>>>>>>> to Calcite.
>>>>>>>>>>>> 
>>>>>>>>>>>> # getTable(ObjectPath object, long timestamp)
>>>>>>>>>>>> 
>>>>>>>>>>>> Then we again come to the problem of "casting between
>> timestamp
>>>> and
>>>>>>>>>>>> numeric", which has been disabled in FLINK-21978[1]. If
>> you're
>>>>>> gonna
>>>>>>>>> use
>>>>>>>>>>>> this, then we need to clarify that problem first.
>>>>>>>>>>>> 
>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
>>>>>>>>>>>> 
>>>>>>>>>>>>> hi, thanks for your reply.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> @Benchao
>>>>>>>>>>>>>> did you consider the pushdown abilities compatible
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In the current design, the implementation of TimeTravel is
>>>>>>>> delegated
>>>>>>>>>>> to
>>>>>>>>>>>>> Catalog. We have added a function called getTable(ObjectPath
>>>>>>>>>>> tablePath,
>>>>>>>>>>>>> long timestamp) to obtain the corresponding CatalogBaseTable
>>> at
>>>> a
>>>>>>>>>>>> specific
>>>>>>>>>>>>> time.  Therefore, I think it will not have any impact on the
>>>>>>>>> original
>>>>>>>>>>>>> pushdown abilities.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I see there is a rejected  design for adding
>>>>>>>> SupportsTimeTravel,
>>>>>>>>>>> but
>>>>>>>>>>>> I
>>>>>>>>>>>>> didn't see the alternative in  the FLIP doc
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Sorry, the document description is not very clear.
>> Regarding
>>>>>>>>> whether
>>>>>>>>>>> to
>>>>>>>>>>>>> support SupportTimeTravel, I have discussed it with yuxia.
>>> Since
>>>>>>>> we
>>>>>>>>>>> have
>>>>>>>>>>>>> already passed the corresponding time in
>> getTable(ObjectPath,
>>>> long
>>>>>>>>>>>>> timestamp) of Catalog, SupportTimeTravel may not be
>> necessary.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In getTable(ObjectPath object, long timestamp), we can
>> obtain
>>>> the
>>>>>>>>>>> schema
>>>>>>>>>>>> of
>>>>>>>>>>>>> the corresponding time point and put the SNAPSHOT that needs
>>> to
>>>> be
>>>>>>>>>>>> consumed
>>>>>>>>>>>>> into options.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> @Shammon
>>>>>>>>>>>>>> Could we support this in Flink too?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I personally think it's possible, but limited by Calcite's
>>>> syntax
>>>>>>>>>>>>> restrictions. I believe we should first support this syntax
>> in
>>>>>>>>>>> Calcite.
>>>>>>>>>>>>> Currently, I think it may not be easy  to support this
>> syntax
>>> in
>>>>>>>>>>> Flink's
>>>>>>>>>>>>> parser. @Benchao, what do you think? Can Calcite support
>> this
>>>>>>>> syntax
>>>>>>>>>>>>> ` VERSION AS OF`  ?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Feng.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <
>> zjureel@gmail.com
>>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks Feng, the feature of time travel sounds great!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
>>>>>>>> iceberg
>>>>>>>>>>>>> support
>>>>>>>>>>>>>> snapshot or version. For example, users can query snapshot
>> 1
>>>> for
>>>>>>>>>>> paimon
>>>>>>>>>>>>> by
>>>>>>>>>>>>>> the following statement
>>>>>>>>>>>>>> SELECT * FROM t VERSION AS OF 1
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Could we support this in Flink too?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Shammon FY
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
>>>>>>>> libenchao@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Regarding the implementation, did you consider the
>> pushdown
>>>>>>>>>>> abilities
>>>>>>>>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
>>>>>>>>> partition
>>>>>>>>>>>>>> pushdown.
>>>>>>>>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
>>>>>>>> have a
>>>>>>>>>>>>> concern
>>>>>>>>>>>>>>> about this. Of course, it depends on your implementation
>>>>>>>> detail,
>>>>>>>>>>> what
>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> important is that we'd better add some cross tests for
>>> these.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Regarding the interface exposed to Connector, I see there
>>> is a
>>>>>>>>>>>> rejected
>>>>>>>>>>>>>>> design for adding SupportsTimeTravel, but I didn't see the
>>>>>>>>>>>> alternative
>>>>>>>>>>>>> in
>>>>>>>>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
>>>>>>>> clarify
>>>>>>>>>>>>> because
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>> need to know whether the Connector supports this, and what
>>>>>>>>>>>>>> column/metadata
>>>>>>>>>>>>>>> corresponds to 'system_time'.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for your reply
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> @Timo @BenChao @yuxia
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Sorry for the mistake,  Currently , calcite only supports
>>>>>>>>> `FOR
>>>>>>>>>>>>>>> SYSTEM_TIME
>>>>>>>>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
>>>>>>>> OF`
>>>>>>>>> .
>>>>>>>>>>>> I've
>>>>>>>>>>>>>>>> updated the syntax part of the FLIP.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> @Timo
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes, I think we need to convert TIMESTAMP to
>> TIMESTAMP_LTZ
>>>>>>>> and
>>>>>>>>>>> then
>>>>>>>>>>>>>>> convert
>>>>>>>>>>>>>>>> it into a long value.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> How do we want to query the most recent version of a
>> table
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
>>>>>>>>> cause
>>>>>>>>>>>>>>>> inconsistency with the real-time concept.
>>>>>>>>>>>>>>>> However, from my personal understanding, the scope of
>> `AS
>>>>>>>> OF
>>>>>>>>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
>>>>>>>> record.
>>>>>>>>>>> So,
>>>>>>>>>>>> I
>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
>>>>>>>>>>>>>>>> Additionally, if no version is specified, the latest
>>> version
>>>>>>>>>>> should
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>>> by default.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Feng
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
>>>>>>>>>>> luoyuxia@alumni.sjtu.edu.cn
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
>>>>>>>>> introduce
>>>>>>>>>>>> time
>>>>>>>>>>>>>>> travel
>>>>>>>>>>>>>>>>> to Flink to have a better integration with external data
>>>>>>>>>>> soruces.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I also share same concern about the syntax.
>>>>>>>>>>>>>>>>> I see in the part of `Whether to support other syntax
>>>>>>>>>>>>>> implementations`
>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
>>>>>>>>>>> SYSTEM_TIME
>>>>>>>>>>>>> AS
>>>>>>>>>>>>>>> OF`,
>>>>>>>>>>>>>>>>> right?
>>>>>>>>>>>>>>>>> But the the syntax part in this FLIP, it seems to be `AS
>>>>>>>> OF
>>>>>>>>>>>>>> TIMESTAMP`
>>>>>>>>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a
>> mistake
>>>>>>>> or
>>>>>>>>>>> by
>>>>>>>>>>>>>> design?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>>> Yuxia
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> ----- 原始邮件 -----
>>>>>>>>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
>>>>>>>>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
>>>>>>>>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
>>>>>>>>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
>>>>>>>>> Mode
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
>>>>>>>>>>> instead
>>>>>>>>>>>> of
>>>>>>>>>>>>>> `FOR
>>>>>>>>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is
>> in
>>>>>>>>> the
>>>>>>>>>>> SQL
>>>>>>>>>>>>>>>> standard
>>>>>>>>>>>>>>>>> and has been supported in some database vendors such as
>>>>>>>> SQL
>>>>>>>>>>>> Server.
>>>>>>>>>>>>>>> About
>>>>>>>>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
>>>>>>>>> supports
>>>>>>>>>>>>> this,
>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
>>>>>>>> and
>>>>>>>>> I
>>>>>>>>>>>> would
>>>>>>>>>>>>>>> give
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
>>>>>>>> use
>>>>>>>>>>> `FOR
>>>>>>>>>>>>>> SYSTEM
>>>>>>>>>>>>>>>> AS
>>>>>>>>>>>>>>>>> OF`.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四
>> 19:02写道:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Also: How do we want to query the most recent version
>>>>>>>> of a
>>>>>>>>>>>> table?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but according
>>>>>>>> to
>>>>>>>>>>> the
>>>>>>>>>>>>> docs
>>>>>>>>>>>>>>>> both
>>>>>>>>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
>>>>>>>> concerning
>>>>>>>>>>> is
>>>>>>>>>>>> the
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>> actually is evalated row-based:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Returns the current SQL timestamp in the local time
>>>>>>>>> zone,
>>>>>>>>>>>> the
>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
>>>>>>>> record
>>>>>>>>> in
>>>>>>>>>>>>>>> streaming
>>>>>>>>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
>>>>>>>> query
>>>>>>>>>>>> starts
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> uses the same result for every row.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> This could make it difficult to explain in a join
>>>>>>>> scenario
>>>>>>>>>>> of
>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>>> snapshotted tables.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
>>>>>>>>>>>>>>>>>>> Hi Feng,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
>>>>>>>> sense
>>>>>>>>> to
>>>>>>>>>>>>>> finally
>>>>>>>>>>>>>>>>>>> support querying tables at a specific point in time or
>>>>>>>>>>>>> hopefully
>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>>>>> ranges soon. Following time-versioned tables.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Here is some feedback from my side:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 1. Syntax
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for this
>>>>>>>>> but
>>>>>>>>>>>> not
>>>>>>>>>>>>>> `FOR
>>>>>>>>>>>>>>>>>>> SYSTEM_TIME AS OF`?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> It would be great to support `AS OF` also for
>>>>>>>>>>> time-versioned
>>>>>>>>>>>>>> joins
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> have a unified and short syntax.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
>>>>>>>>> this
>>>>>>>>>>>>>>> available
>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>> Flink earlier by copying the corresponding classes
>>>>>>>> until
>>>>>>>>>>> the
>>>>>>>>>>>>> next
>>>>>>>>>>>>>>>>>>> Calcite upgrade is performed.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 2. Semantics
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
>>>>>>>>>>>> timestamp
>>>>>>>>>>>>>>> types
>>>>>>>>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
>>>>>>>>>>>> TIMESTAMP
>>>>>>>>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
>>>>>>>>>>> timestamp
>>>>>>>>>>>> be?
>>>>>>>>>>>>>> We
>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> We definely need to clarify this because the past has
>>>>>>>>>>> shown
>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> daylight saving times make our lives hard.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
>>>>>>>>>>>>>>>>>>>> Hi, everyone.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
>>>>>>>> Support
>>>>>>>>>>> Time
>>>>>>>>>>>>>> Travel
>>>>>>>>>>>>>>>> In
>>>>>>>>>>>>>>>>>>>> Batch
>>>>>>>>>>>>>>>>>>>> Mode [1]
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
>>>>>>>>>>>> versions
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>> data.
>>>>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>>>> allows users to specify a point in time and retrieve
>>>>>>>>> the
>>>>>>>>>>>> data
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> schema of
>>>>>>>>>>>>>>>>>>>> a table as it appeared at that time. With time
>>>>>>>> travel,
>>>>>>>>>>> users
>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> easily
>>>>>>>>>>>>>>>>>>>> analyze and compare historical versions of data.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> With the widespread use of data lake systems such as
>>>>>>>>>>> Paimon,
>>>>>>>>>>>>>>>> Iceberg,
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
>>>>>>>>> users'
>>>>>>>>>>>> data
>>>>>>>>>>>>>>>>> analysis.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Looking forward to your opinions, any suggestions are
>>>>>>>>>>>>> welcomed.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 1.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Feng
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Benchao Li
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Benchao Li
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Benchao Li
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> Best,
>>> Benchao Li
>>> 
>> 
> 
> 
> -- 
> 
> Best,
> Benchao Li


Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Benchao Li <li...@apache.org>.
As you can see that you must use `UNIX_TIMESTAMP` to do this work, that's
where the time zone happens.

What I'm talking about is casting timestamp/timestamp_ltz to long directly,
that's why the semantic is tricky when you are casting timestamp to long
using time zone.

For other systems, such as SQL server[1], they actually uses a string
instead of timestamp literal `FOR SYSTEM_TIME AS OF '2021-01-01
00:00:00.0000000'`, I'm not sure whether they convert the string implicitly
to TIMESTAMP_LTZ, or they just have a different definition of the syntax.

But for us, we are definitely using timestamp/timestmap_ltz literal here,
that's why it is special, and we must highlight this behavior that we are
converting a timestamp without time zone literal to long using the session
time zone.

[1]
https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios?view=sql-server-ver16

Feng Jin <ji...@gmail.com> 于2023年6月8日周四 11:35写道:

> Hi all,
>
> thanks for your input
>
>
> @Benchao
>
> >  The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
> WITHOUT TIME ZONE", converting it to unix timestamp would use UTC timezone,
> which is not usually expected by users.
>
> It was indeed the case before Flink 1.13, but now my understanding is that
> there have been some slight changes in the definition of TIMESTAMP.
>
> TIMESTAMP is currently used to specify the year, month, day, hour, minute
> and second. We recommend that users use *UNIX_TIMESTAMP(CAST(timestamp_col
> AS STRING))* to convert *TIMESTAMP values* and *long values*. The
> *UNIX_TIMESTAMP* function will use the *LOCAL TIME ZONE*. Therefore,
> whether converting TIMESTAMP or TIMESTAMP_LTZ to Long values will involve
> using the *LOCAL TIME ZONE*.
>
>
> Here is an test:
>
> Flink SQL> SET 'table.local-time-zone' = 'UTC';
> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
> STRING)) as `timestamp`;
> ---------------
>  timestamp
>  --------------
>  0
>
> Flink SQL> SET 'table.local-time-zone' = 'Asia/Shanghai';
> Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
> STRING)) as `timestamp`;
> ---------------
>  timestamp
>  --------------
>  -28800
>
> Therefore, the current conversion method exposed to users is also using
> LOCAL TIME ZONE.
>
>
> @yuxia
>
> Thank you very much for providing the list of behaviors of TIMESTAMP in
> other systems.
>
> > I think we can align them to avoid the inconsistency to other engines and
> provide convenience for the external connectors while integrating Flink's
> time travel API.
>
> +1 for this.
>
> > Regarding the inconsistency, I think we can consider time-travel as a
> specical case, and we do needs to highlight this in this FLIP.
> As for "violate the restriction outlined in FLINK-21978[1]", since we cast
> timestamp to epochMillis only for the internal use, and won't expose it to
> users, I don't think it will violate the restriction.
> Btw, please add a brief desc to explain the meaning of the parameter
> `timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath, long
> timestamp)`. Maybe something like "timestamp of the table snapt, which is
> millseconds since 1970-01-01 00:00:00 UTC".
>
> Thank you for the suggestions regarding the document. I will add them to
> FLIP.
>
>
> Best,
> Feng
>
>
> On Wed, Jun 7, 2023 at 12:18 PM Benchao Li <li...@apache.org> wrote:
>
> > I also share the concern about the timezone problem.
> >
> > The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
> WITHOUT
> > TIME ZONE", converting it to unix timestamp would use UTC timezone, which
> > is not usually expected by users.
> >
> > If we want to keep consistent with the standard, we probably should use
> > "TIMESTAMP WITH LOCAL ZONE '2023-04-27 00:00:00'", which type is
> "TIMESTAMP
> > WITH LOCAL TIME ZONE", and converting it to unix timestamp will consider
> > the session timezone, which is the expected result. But it's inconvenient
> > for users.
> >
> > Taking this a special case, and converting "TIMESTAMP '2023-04-27
> > 00:00:00'" to a unix timestamp with session timezone, will be convenient
> > for users, but will break the standard. I will +0.5 for this choice.
> >
> > yuxia <lu...@alumni.sjtu.edu.cn> 于2023年6月7日周三 12:06写道:
> >
> > > Hi, Feng Jin.
> > > I think the concern of Leonard may be the inconsistency of the behavior
> > of
> > > TIMESTAMP '2023-04-27 00:00:00' beween timetravel and other sql
> > statement.
> > >
> > > For the normal sql:
> > > `SELECT TIMESTAMP '2023-04-27 00:00:00'`, we won't consider timezone.
> > > But for the sql for timetravl:
> > > `SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP '2023-04-27
> > > 00:00:00'`, we will consider the timezone and convert to UTC timestamp.
> > >
> > > The concern is valid. But for time travel, most style of engines,
> > > Spark[1], Hive[2], Trino[3] also do the time conversion with
> considering
> > > the seesion time zone. I think we can align them to avoid the
> > inconsistency
> > > to other engines and provide convenience for the external connectors
> > while
> > > integrating Flink's time travel API.
> > >
> > > Regarding the inconsistency, I think we can consider time-travel as a
> > > specical case, and we do needs to highlight this in this FLIP.
> > > As for "violate the restriction outlined in FLINK-21978[1]", since we
> > cast
> > > timestamp to epochMillis only for the internal use, and won't expose it
> > to
> > > users, I don't think it will violate the restriction.
> > > Btw, please add a brief desc to explain the meaning of the parameter
> > > `timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath,
> > long
> > > timestamp)`. Maybe something like "timestamp of the table snapt, which
> is
> > > millseconds since 1970-01-01 00:00:00 UTC".
> > >
> > > [1]
> > >
> >
> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
> > > [2]
> > >
> >
> https://github.com/apache/hive/blob/f5e69dc38d7ff26c70be19adc9d1a3ae90dc4cf2/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L989
> > > [3]
> > >
> >
> https://github.com/trinodb/trino/blob/2433d9e60f1abb0d85c32374c1758525560e1a86/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L443
> > >
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > ----- 原始邮件 -----
> > > 发件人: "Feng Jin" <ji...@gmail.com>
> > > 收件人: "dev" <de...@flink.apache.org>
> > > 发送时间: 星期二, 2023年 6 月 06日 下午 10:15:47
> > > 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> > >
> > > Hi everyone
> > >
> > > Thanks everyone for your input.
> > >
> > >
> > > @Yun
> > >
> > > >  I think you could add descriptions of how to align backfill time
> > travel
> > > with querying the latest data. And I think you should also update the
> > > "Discussion thread" in the original FLIP.
> > >
> > > Thank you for the suggestion, I will update it in the document.
> > >
> > > >  I have a question about getting the table schema from the catalog.
> I'm
> > > not sure whether the Catalog#getTable(tablePath, timestamp) will be
> > called
> > > only once.
> > >
> > > I understand that in a query, the schema of the table is determined
> > before
> > > execution. The schema used will be based on the latest schema within
> the
> > > TimeTravel period.
> > >
> > > In addition, due to current syntax limitations, we are unable to
> support
> > > the use of BETWEEN AND.
> > >
> > >
> > > @Jing
> > >
> > > >  Would you like to update your thoughts described in your previous
> > email
> > > about why SupportsTimeTravel has been rejected into the FLIP?
> > >
> > > Sure,  I updated the doc.
> > >
> > >
> > > >    Since we always directly add overload methods into Catalog
> according
> > > to new requirements, which makes the interface bloated
> > >
> > > Your concern is valid. If we need to support the long type version in
> the
> > > future, we may have to add another method "getTable(ObjectPath, long
> > > version)". However, I understand that
> > > "Catalog.getTable(tablePath).on(timeStamp)" may not meet the
> > requirements.
> > > The timestamp is for Catalog's use, and Catalog obtains the
> corresponding
> > > schema based on this time.
> > >
> > >
> > > @liu @Regards
> > >
> > > I am very sorry for the unclear description in the document. I have
> > updated
> > > relevant descriptions regarding why it needs to be implemented in
> > Catalog.
> > >
> > > Travel not only requires obtaining data at the corresponding time
> point,
> > > but also requires the corresponding Schema at that time point
> > >
> > >
> > > @Shammon
> > >
> > > > Flink or connector such as  iceberg/paimon can create sources from
> the
> > > `CatalogBaseTable` directly without the need to get the snapshot ID
> from
> > > `CatalogTable.getSnapshot()`.  What do you think of it?
> > >
> > > You are right, we don't need the getSnapshot interface for
> PaimonCatalog
> > or
> > > IcebergCatalog tables, but we may need it for temporary tables.
> > >
> > >
> > >
> > > Best,
> > > Feng
> > >
> > >
> > > On Tue, Jun 6, 2023 at 9:32 PM Feng Jin <ji...@gmail.com> wrote:
> > >
> > > > Sorry I replied to the wrong mail. Please ignore the last email.
> > > >
> > > >
> > > > Hi Leonard
> > > >
> > > > > 1. Unification SQL
> > > >
> > > > I agree that it is crucial for us to support both batch and streaming
> > > > processing.  The current design allows for the support of both batch
> > and
> > > > streaming processing. I'll update the FLIP later.
> > > >
> > > >
> > > > >  2.Semantics
> > > >
> > > > In my opinion, it would be feasible to perform the conversion based
> on
> > > the
> > > > current session time, regardless of whether it is TIMESTAMP or
> > > > TIMESTAMP_LTZ.
> > > >
> > > > However, this may indeed violate the restriction outlined in
> > > > FLINK-21978[1]  as Benchao mentioned, and I am uncertain as to
> whether
> > it
> > > > is reasonable.
> > > >
> > > >
> > > > >   3.  Some external systems may use timestamp value to mark a
> > version,
> > > > but others may use version number、file position、log offset.
> > > >
> > > > It is true that most systems support time-related operations, and I
> > > > believe that the current design is compatible with most systems.
> > However,
> > > > if we want to support long data type, it may require Calcite to
> support
> > > the
> > > > VERSION AS OF syntax. I understand that this is something that we may
> > > need
> > > > to consider in the future.
> > > >
> > > >
> > > > Best,
> > > > Feng
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-21978
> > > >
> > > > On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com> wrote:
> > > >
> > > >> Hi, Feng
> > > >>
> > > >> Thanks for driving this FLIP, very impressive feature that users
> want,
> > > >> I’ve some quick questions here.
> > > >>
> > > >> 1.Unification SQL:
> > > >>         The snapshot  concept exists both in Batch mode and
> Streaming
> > > >> mode,  could we consider a unified proposal? I think users won’t
> > another
> > > >> SQL syntax named
> > > >> Time travel for Streaming mode.
> > > >>
> > > >> 2.Semantics:
> > > >>         Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a
> > long
> > > >> timestamp value (getTable(ObjectPath tablePath, long timestamp)) we
> > need
> > > >> two information i.e. a TIMESTAMP value and current session timezone,
> > > how
> > > >> we deal the value with current proposed SQL syntax.
> > > >>
> > > >> 3. Is it enough using sinlge timestamp to track a snapshot(version)
> of
> > > >> external table?   Some external systems may use timestamp value to
> > mark
> > > a
> > > >> version, but others may use version number、file position、log offset.
> > > >>
> > > >> Best,
> > > >> Leonard
> > > >>
> > > >>
> > > >>
> > > >> > On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
> > > >> >
> > > >> > Hi Feng,
> > > >> >
> > > >> > I think this FLIP would provide one important feature to unify the
> > > >> stream-SQL and batch-SQL when we backfill the historical data in
> batch
> > > mode.
> > > >> >
> > > >> > For the "Syntax" session, I think you could add descriptions of
> how
> > to
> > > >> align backfill time travel with querying the latest data. And I
> think
> > > you
> > > >> should also update the "Discussion thread" in the original FLIP.
> > > >> >
> > > >> > Moreover, I have a question about getting the table schema from
> the
> > > >> catalog. I'm not sure whether the Catalog#getTable(tablePath,
> > timestamp)
> > > >> will be called only once. If we have a backfill query between
> > 2023-05-29
> > > >> and 2023-06-04 in the past week, and the table schema changed on
> > > >> 2023-06-01, will the query below detect the schema changes during
> > > backfill
> > > >> the whole week?
> > > >> >
> > > >> > SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
> > > >> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
> > > >> >
> > > >> > Best
> > > >> > Yun Tang
> > > >> >
> > > >> >
> > > >> > ________________________________
> > > >> > From: Shammon FY <zj...@gmail.com>
> > > >> > Sent: Thursday, June 1, 2023 17:57
> > > >> > To: dev@flink.apache.org <de...@flink.apache.org>
> > > >> > Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> > > >> >
> > > >> > Hi Feng,
> > > >> >
> > > >> > I have one minor comment about the public interface
> `Optional<Long>
> > > >> > getSnapshot()` in the `CatalogTable`.
> > > >> >
> > > >> > As we can get tables from the new method
> > `Catalog.getTable(ObjectPath
> > > >> > tablePath, long timestamp)`, I think the returned
> `CatalogBaseTable`
> > > >> will
> > > >> > have the information of timestamp. Flink or connector such as
> > > >> > iceberg/paimon can create sources from the `CatalogBaseTable`
> > directly
> > > >> > without the need to get the snapshot ID from
> > > >> `CatalogTable.getSnapshot()`.
> > > >> > What do you think of it?
> > > >> >
> > > >> > Best,
> > > >> > Shammon FY
> > > >> >
> > > >> >
> > > >> > On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <jing@ververica.com.invalid
> >
> > > >> wrote:
> > > >> >
> > > >> >> Hi Feng,
> > > >> >>
> > > >> >> Thanks for the proposal! Very interesting feature. Would you like
> > to
> > > >> update
> > > >> >> your thoughts described in your previous email about why
> > > >> SupportsTimeTravel
> > > >> >> has been rejected into the FLIP? This will help readers
> understand
> > > the
> > > >> >> context (in the future).
> > > >> >>
> > > >> >> Since we always directly add overload methods into Catalog
> > according
> > > >> to new
> > > >> >> requirements, which makes the interface bloated. Just out of
> > > curiosity,
> > > >> >> does it make sense to introduce some DSL design? Like
> > > >> >> Catalog.getTable(tablePath).on(timeStamp),
> > > >> >> Catalog.getTable(tablePath).current() for the most current
> version,
> > > and
> > > >> >> more room for further extension like timestamp range, etc. I
> > haven't
> > > >> read
> > > >> >> all the source code yet and I'm not sure if it is possible. But a
> > > >> >> design like this will keep the Catalog API lean and the API/DSL
> > will
> > > be
> > > >> >> self described and easier to use.
> > > >> >>
> > > >> >> Best regards,
> > > >> >> Jing
> > > >> >>
> > > >> >>
> > > >> >> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
> > > >> >> krzysiek.chmielewski@gmail.com> wrote:
> > > >> >>
> > > >> >>> Ok after second though I'm retracting my previous statement
> about
> > > >> Catalog
> > > >> >>> changes you proposed.
> > > >> >>> I do see a benefit for Delta connector actually with this change
> > and
> > > >> see
> > > >> >>> why this could be coupled with Catalog.
> > > >> >>>
> > > >> >>> Delta Connector SQL support, also ships a Delta Catalog
> > > implementation
> > > >> >> for
> > > >> >>> Flink.
> > > >> >>> For Delta Catalog, table schema information is fetched from
> > > underlying
> > > >> >>> _delta_log and not stored in metastore. For time travel we
> > actually
> > > >> had a
> > > >> >>> problem, that if we would like to timetravel back to some old
> > > version,
> > > >> >>> where schema was slightly different, then we would have a
> conflict
> > > >> since
> > > >> >>> Catalog would return current schema and not how it was for
> version
> > > X.
> > > >> >>>
> > > >> >>> With your change, our Delta Catalog can actually fetch schema
> for
> > > >> >> version X
> > > >> >>> and send it to DeltaTableFactory. Currency, Catalog can fetch
> only
> > > >> >> current
> > > >> >>> version. What we would also need however is version
> > > (number/timestamp)
> > > >> >> for
> > > >> >>> this table passed to DynamicTableFactory so we could properly
> set
> > > >> Delta
> > > >> >>> standalone library.
> > > >> >>>
> > > >> >>> Regards,
> > > >> >>> Krzysztof
> > > >> >>>
> > > >> >>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
> > > >> >>> krzysiek.chmielewski@gmail.com> napisał(a):
> > > >> >>>
> > > >> >>>> Hi,
> > > >> >>>> happy to see such a feature.
> > > >> >>>> Small note from my end regarding Catalog changes.
> > > >> >>>>
> > > >> >>>> TL;DR
> > > >> >>>> I don't think it is necessary to delegate this feature to the
> > > >> catalog.
> > > >> >> I
> > > >> >>>> think that since "timetravel" is per job/query property, its
> > should
> > > >> not
> > > >> >>> be
> > > >> >>>> coupled with the Catalog or table definition. In my opinion
> this
> > is
> > > >> >>>> something that DynamicTableFactory only has to know about. I
> > would
> > > >> >> rather
> > > >> >>>> see this feature as it is - SQL syntax enhancement but delegate
> > > >> clearly
> > > >> >>> to
> > > >> >>>> DynamicTableFactory.
> > > >> >>>>
> > > >> >>>> I've implemented timetravel feature for Delta Connector  [1]
> > using
> > > >> >>>> current Flink API.
> > > >> >>>> Docs are pending code review, but you can find them here [2]
> and
> > > >> >> examples
> > > >> >>>> are available here [3]
> > > >> >>>>
> > > >> >>>> The timetravel feature that I've implemented is based on Flink
> > > Query
> > > >> >>>> hints.
> > > >> >>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
> > > >> >>>>
> > > >> >>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is
> > > >> handled
> > > >> >>> not
> > > >> >>>> by Catalog but by DyntamicTableFactory implementation for Delta
> > > >> >>> connector.
> > > >> >>>> The value of this property is passed to Delta standalone lib
> API
> > > that
> > > >> >>>> returns table view for given version.
> > > >> >>>>
> > > >> >>>> I'm not sure how/if proposed change could benefit Delta
> connector
> > > >> >>>> implementation for this feature.
> > > >> >>>>
> > > >> >>>> Thanks,
> > > >> >>>> Krzysztof
> > > >> >>>>
> > > >> >>>> [1]
> > > >> >>>>
> > > >> >>>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> > > >> >>>> [2]
> > > >> https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> > > >> >>>> [3]
> > > >> >>>>
> > > >> >>>
> > > >> >>
> > > >>
> > >
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> > > >> >>>>
> > > >> >>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com>
> > napisał(a):
> > > >> >>>>
> > > >> >>>>> Hi, Feng
> > > >> >>>>>
> > > >> >>>>> Thanks for driving this FLIP, Time travel is very useful for
> > Flink
> > > >> >>>>> integrate with data lake system. I have one question why the
> > > >> >>>>> implementation
> > > >> >>>>> of TimeTravel is delegated to Catalog? Assuming that we use
> > Flink
> > > to
> > > >> >>> query
> > > >> >>>>> Hudi table with the time travel syntax, but we don't use the
> > > >> >>> HudiCatalog,
> > > >> >>>>> instead, we register the hudi table to InMemoryCatalog,  can
> we
> > > >> >> support
> > > >> >>>>> time travel for Hudi table in this case?
> > > >> >>>>> In contrast, I think time travel should bind to connector
> > instead
> > > of
> > > >> >>>>> Catalog, so the rejected alternative should be considered.
> > > >> >>>>>
> > > >> >>>>> Best,
> > > >> >>>>> Ron
> > > >> >>>>>
> > > >> >>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
> > > >> >>>>>
> > > >> >>>>>> Hi, Feng.
> > > >> >>>>>> Notice this FLIP only support batch mode for time travel.
> > Would
> > > it
> > > >> >>> also
> > > >> >>>>>> make sense to support stream mode to a read a snapshot of the
> > > table
> > > >> >>> as a
> > > >> >>>>>> bounded stream?
> > > >> >>>>>>
> > > >> >>>>>> Best regards,
> > > >> >>>>>> Yuxia
> > > >> >>>>>>
> > > >> >>>>>> ----- 原始邮件 -----
> > > >> >>>>>> 发件人: "Benchao Li" <li...@apache.org>
> > > >> >>>>>> 收件人: "dev" <de...@flink.apache.org>
> > > >> >>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> > > >> >>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> > > >> >>>>>>
> > > >> >>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
> > > >> >>>>>>
> > > >> >>>>>> This also depends on whether this is defined in standard or
> any
> > > >> >> known
> > > >> >>>>>> databases that have implemented this. If not, it would be
> hard
> > to
> > > >> >> push
> > > >> >>>>> it
> > > >> >>>>>> to Calcite.
> > > >> >>>>>>
> > > >> >>>>>> # getTable(ObjectPath object, long timestamp)
> > > >> >>>>>>
> > > >> >>>>>> Then we again come to the problem of "casting between
> timestamp
> > > and
> > > >> >>>>>> numeric", which has been disabled in FLINK-21978[1]. If
> you're
> > > >> gonna
> > > >> >>> use
> > > >> >>>>>> this, then we need to clarify that problem first.
> > > >> >>>>>>
> > > >> >>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
> > > >> >>>>>>
> > > >> >>>>>>> hi, thanks for your reply.
> > > >> >>>>>>>
> > > >> >>>>>>> @Benchao
> > > >> >>>>>>>> did you consider the pushdown abilities compatible
> > > >> >>>>>>>
> > > >> >>>>>>> In the current design, the implementation of TimeTravel is
> > > >> >> delegated
> > > >> >>>>> to
> > > >> >>>>>>> Catalog. We have added a function called getTable(ObjectPath
> > > >> >>>>> tablePath,
> > > >> >>>>>>> long timestamp) to obtain the corresponding CatalogBaseTable
> > at
> > > a
> > > >> >>>>>> specific
> > > >> >>>>>>> time.  Therefore, I think it will not have any impact on the
> > > >> >>> original
> > > >> >>>>>>> pushdown abilities.
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>>>  I see there is a rejected  design for adding
> > > >> >> SupportsTimeTravel,
> > > >> >>>>> but
> > > >> >>>>>> I
> > > >> >>>>>>> didn't see the alternative in  the FLIP doc
> > > >> >>>>>>>
> > > >> >>>>>>> Sorry, the document description is not very clear.
> Regarding
> > > >> >>> whether
> > > >> >>>>> to
> > > >> >>>>>>> support SupportTimeTravel, I have discussed it with yuxia.
> > Since
> > > >> >> we
> > > >> >>>>> have
> > > >> >>>>>>> already passed the corresponding time in
> getTable(ObjectPath,
> > > long
> > > >> >>>>>>> timestamp) of Catalog, SupportTimeTravel may not be
> necessary.
> > > >> >>>>>>>
> > > >> >>>>>>> In getTable(ObjectPath object, long timestamp), we can
> obtain
> > > the
> > > >> >>>>> schema
> > > >> >>>>>> of
> > > >> >>>>>>> the corresponding time point and put the SNAPSHOT that needs
> > to
> > > be
> > > >> >>>>>> consumed
> > > >> >>>>>>> into options.
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>> @Shammon
> > > >> >>>>>>>> Could we support this in Flink too?
> > > >> >>>>>>>
> > > >> >>>>>>> I personally think it's possible, but limited by Calcite's
> > > syntax
> > > >> >>>>>>> restrictions. I believe we should first support this syntax
> in
> > > >> >>>>> Calcite.
> > > >> >>>>>>> Currently, I think it may not be easy  to support this
> syntax
> > in
> > > >> >>>>> Flink's
> > > >> >>>>>>> parser. @Benchao, what do you think? Can Calcite support
> this
> > > >> >> syntax
> > > >> >>>>>>> ` VERSION AS OF`  ?
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>> Best,
> > > >> >>>>>>> Feng.
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <
> zjureel@gmail.com
> > >
> > > >> >>> wrote:
> > > >> >>>>>>>
> > > >> >>>>>>>> Thanks Feng, the feature of time travel sounds great!
> > > >> >>>>>>>>
> > > >> >>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
> > > >> >> iceberg
> > > >> >>>>>>> support
> > > >> >>>>>>>> snapshot or version. For example, users can query snapshot
> 1
> > > for
> > > >> >>>>> paimon
> > > >> >>>>>>> by
> > > >> >>>>>>>> the following statement
> > > >> >>>>>>>> SELECT * FROM t VERSION AS OF 1
> > > >> >>>>>>>>
> > > >> >>>>>>>> Could we support this in Flink too?
> > > >> >>>>>>>>
> > > >> >>>>>>>> Best,
> > > >> >>>>>>>> Shammon FY
> > > >> >>>>>>>>
> > > >> >>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
> > > >> >> libenchao@apache.org>
> > > >> >>>>>> wrote:
> > > >> >>>>>>>>
> > > >> >>>>>>>>> Regarding the implementation, did you consider the
> pushdown
> > > >> >>>>> abilities
> > > >> >>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
> > > >> >>> partition
> > > >> >>>>>>>> pushdown.
> > > >> >>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
> > > >> >> have a
> > > >> >>>>>>> concern
> > > >> >>>>>>>>> about this. Of course, it depends on your implementation
> > > >> >> detail,
> > > >> >>>>> what
> > > >> >>>>>>> is
> > > >> >>>>>>>>> important is that we'd better add some cross tests for
> > these.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Regarding the interface exposed to Connector, I see there
> > is a
> > > >> >>>>>> rejected
> > > >> >>>>>>>>> design for adding SupportsTimeTravel, but I didn't see the
> > > >> >>>>>> alternative
> > > >> >>>>>>> in
> > > >> >>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
> > > >> >> clarify
> > > >> >>>>>>> because
> > > >> >>>>>>>> we
> > > >> >>>>>>>>> need to know whether the Connector supports this, and what
> > > >> >>>>>>>> column/metadata
> > > >> >>>>>>>>> corresponds to 'system_time'.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>> Thanks for your reply
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> @Timo @BenChao @yuxia
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> Sorry for the mistake,  Currently , calcite only supports
> > > >> >>> `FOR
> > > >> >>>>>>>>> SYSTEM_TIME
> > > >> >>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
> > > >> >> OF`
> > > >> >>> .
> > > >> >>>>>> I've
> > > >> >>>>>>>>>> updated the syntax part of the FLIP.
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> @Timo
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> Yes, I think we need to convert TIMESTAMP to
> TIMESTAMP_LTZ
> > > >> >> and
> > > >> >>>>> then
> > > >> >>>>>>>>> convert
> > > >> >>>>>>>>>> it into a long value.
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>> How do we want to query the most recent version of a
> table
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
> > > >> >>> cause
> > > >> >>>>>>>>>> inconsistency with the real-time concept.
> > > >> >>>>>>>>>> However, from my personal understanding, the scope of
> `AS
> > > >> >> OF
> > > >> >>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
> > > >> >> record.
> > > >> >>>>> So,
> > > >> >>>>>> I
> > > >> >>>>>>>>> think
> > > >> >>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
> > > >> >>>>>>>>>> Additionally, if no version is specified, the latest
> > version
> > > >> >>>>> should
> > > >> >>>>>>> be
> > > >> >>>>>>>>> used
> > > >> >>>>>>>>>> by default.
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> Best,
> > > >> >>>>>>>>>> Feng
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
> > > >> >>>>> luoyuxia@alumni.sjtu.edu.cn
> > > >> >>>>>>>
> > > >> >>>>>>>>> wrote:
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
> > > >> >>> introduce
> > > >> >>>>>> time
> > > >> >>>>>>>>> travel
> > > >> >>>>>>>>>>> to Flink to have a better integration with external data
> > > >> >>>>> soruces.
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> I also share same concern about the syntax.
> > > >> >>>>>>>>>>> I see in the part of `Whether to support other syntax
> > > >> >>>>>>>> implementations`
> > > >> >>>>>>>>> in
> > > >> >>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
> > > >> >>>>> SYSTEM_TIME
> > > >> >>>>>>> AS
> > > >> >>>>>>>>> OF`,
> > > >> >>>>>>>>>>> right?
> > > >> >>>>>>>>>>> But the the syntax part in this FLIP, it seems to be `AS
> > > >> >> OF
> > > >> >>>>>>>> TIMESTAMP`
> > > >> >>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a
> mistake
> > > >> >> or
> > > >> >>>>> by
> > > >> >>>>>>>> design?
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> Best regards,
> > > >> >>>>>>>>>>> Yuxia
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> ----- 原始邮件 -----
> > > >> >>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
> > > >> >>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
> > > >> >>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
> > > >> >>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
> > > >> >>> Mode
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
> > > >> >>>>> instead
> > > >> >>>>>> of
> > > >> >>>>>>>> `FOR
> > > >> >>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is
> in
> > > >> >>> the
> > > >> >>>>> SQL
> > > >> >>>>>>>>>> standard
> > > >> >>>>>>>>>>> and has been supported in some database vendors such as
> > > >> >> SQL
> > > >> >>>>>> Server.
> > > >> >>>>>>>>> About
> > > >> >>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
> > > >> >>> supports
> > > >> >>>>>>> this,
> > > >> >>>>>>>> if
> > > >> >>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
> > > >> >> and
> > > >> >>> I
> > > >> >>>>>> would
> > > >> >>>>>>>>> give
> > > >> >>>>>>>>>> a
> > > >> >>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
> > > >> >> use
> > > >> >>>>> `FOR
> > > >> >>>>>>>> SYSTEM
> > > >> >>>>>>>>>> AS
> > > >> >>>>>>>>>>> OF`.
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四
> 19:02写道:
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>>> Also: How do we want to query the most recent version
> > > >> >> of a
> > > >> >>>>>> table?
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but according
> > > >> >> to
> > > >> >>>>> the
> > > >> >>>>>>> docs
> > > >> >>>>>>>>>> both
> > > >> >>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
> > > >> >> concerning
> > > >> >>>>> is
> > > >> >>>>>> the
> > > >> >>>>>>>> it
> > > >> >>>>>>>>>>>> actually is evalated row-based:
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> Returns the current SQL timestamp in the local time
> > > >> >>> zone,
> > > >> >>>>>> the
> > > >> >>>>>>>>> return
> > > >> >>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
> > > >> >> record
> > > >> >>> in
> > > >> >>>>>>>>> streaming
> > > >> >>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
> > > >> >> query
> > > >> >>>>>> starts
> > > >> >>>>>>>> and
> > > >> >>>>>>>>>>>> uses the same result for every row.
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> This could make it difficult to explain in a join
> > > >> >> scenario
> > > >> >>>>> of
> > > >> >>>>>>>>> multiple
> > > >> >>>>>>>>>>>> snapshotted tables.
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> Regards,
> > > >> >>>>>>>>>>>> Timo
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
> > > >> >>>>>>>>>>>>> Hi Feng,
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
> > > >> >> sense
> > > >> >>> to
> > > >> >>>>>>>> finally
> > > >> >>>>>>>>>>>>> support querying tables at a specific point in time or
> > > >> >>>>>>> hopefully
> > > >> >>>>>>>>> also
> > > >> >>>>>>>>>>>>> ranges soon. Following time-versioned tables.
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> Here is some feedback from my side:
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> 1. Syntax
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for this
> > > >> >>> but
> > > >> >>>>>> not
> > > >> >>>>>>>> `FOR
> > > >> >>>>>>>>>>>>> SYSTEM_TIME AS OF`?
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> It would be great to support `AS OF` also for
> > > >> >>>>> time-versioned
> > > >> >>>>>>>> joins
> > > >> >>>>>>>>>> and
> > > >> >>>>>>>>>>>>> have a unified and short syntax.
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
> > > >> >>> this
> > > >> >>>>>>>>> available
> > > >> >>>>>>>>>> in
> > > >> >>>>>>>>>>>>> Flink earlier by copying the corresponding classes
> > > >> >> until
> > > >> >>>>> the
> > > >> >>>>>>> next
> > > >> >>>>>>>>>>>>> Calcite upgrade is performed.
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> 2. Semantics
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
> > > >> >>>>>> timestamp
> > > >> >>>>>>>>> types
> > > >> >>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
> > > >> >>>>>> TIMESTAMP
> > > >> >>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
> > > >> >>>>> timestamp
> > > >> >>>>>> be?
> > > >> >>>>>>>> We
> > > >> >>>>>>>>>> will
> > > >> >>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> We definely need to clarify this because the past has
> > > >> >>>>> shown
> > > >> >>>>>>> that
> > > >> >>>>>>>>>>>>> daylight saving times make our lives hard.
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> Thanks,
> > > >> >>>>>>>>>>>>> Timo
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
> > > >> >>>>>>>>>>>>>> Hi, everyone.
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
> > > >> >> Support
> > > >> >>>>> Time
> > > >> >>>>>>>> Travel
> > > >> >>>>>>>>>> In
> > > >> >>>>>>>>>>>>>> Batch
> > > >> >>>>>>>>>>>>>> Mode [1]
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
> > > >> >>>>>> versions
> > > >> >>>>>>> of
> > > >> >>>>>>>>>> data.
> > > >> >>>>>>>>>>>> It
> > > >> >>>>>>>>>>>>>> allows users to specify a point in time and retrieve
> > > >> >>> the
> > > >> >>>>>> data
> > > >> >>>>>>>> and
> > > >> >>>>>>>>>>>>>> schema of
> > > >> >>>>>>>>>>>>>> a table as it appeared at that time. With time
> > > >> >> travel,
> > > >> >>>>> users
> > > >> >>>>>>> can
> > > >> >>>>>>>>>>> easily
> > > >> >>>>>>>>>>>>>> analyze and compare historical versions of data.
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> With the widespread use of data lake systems such as
> > > >> >>>>> Paimon,
> > > >> >>>>>>>>>> Iceberg,
> > > >> >>>>>>>>>>>> and
> > > >> >>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
> > > >> >>> users'
> > > >> >>>>>> data
> > > >> >>>>>>>>>>> analysis.
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> Looking forward to your opinions, any suggestions are
> > > >> >>>>>>> welcomed.
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> 1.
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>
> > > >> >>>>>
> > > >> >>>
> > > >> >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> Best.
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> Feng
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> --
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> Best,
> > > >> >>>>>>>>>>> Benchao Li
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> --
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Best,
> > > >> >>>>>>>>> Benchao Li
> > > >> >>>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> --
> > > >> >>>>>>
> > > >> >>>>>> Best,
> > > >> >>>>>> Benchao Li
> > > >> >>>>>>
> > > >> >>>>>
> > > >> >>>>
> > > >> >>>
> > > >> >>
> > > >>
> > > >>
> > >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>


-- 

Best,
Benchao Li

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Feng Jin <ji...@gmail.com>.
Hi all,

thanks for your input


@Benchao

>  The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP
WITHOUT TIME ZONE", converting it to unix timestamp would use UTC timezone,
which is not usually expected by users.

It was indeed the case before Flink 1.13, but now my understanding is that
there have been some slight changes in the definition of TIMESTAMP.

TIMESTAMP is currently used to specify the year, month, day, hour, minute
and second. We recommend that users use *UNIX_TIMESTAMP(CAST(timestamp_col
AS STRING))* to convert *TIMESTAMP values* and *long values*. The
*UNIX_TIMESTAMP* function will use the *LOCAL TIME ZONE*. Therefore,
whether converting TIMESTAMP or TIMESTAMP_LTZ to Long values will involve
using the *LOCAL TIME ZONE*.


Here is an test:

Flink SQL> SET 'table.local-time-zone' = 'UTC';
Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
STRING)) as `timestamp`;
---------------
 timestamp
 --------------
 0

Flink SQL> SET 'table.local-time-zone' = 'Asia/Shanghai';
Flink SQL> SELECT UNIX_TIMESTAMP(CAST(TIMESTAMP '1970-01-01 00:00:00' as
STRING)) as `timestamp`;
---------------
 timestamp
 --------------
 -28800

Therefore, the current conversion method exposed to users is also using
LOCAL TIME ZONE.


@yuxia

Thank you very much for providing the list of behaviors of TIMESTAMP in
other systems.

> I think we can align them to avoid the inconsistency to other engines and
provide convenience for the external connectors while integrating Flink's
time travel API.

+1 for this.

> Regarding the inconsistency, I think we can consider time-travel as a
specical case, and we do needs to highlight this in this FLIP.
As for "violate the restriction outlined in FLINK-21978[1]", since we cast
timestamp to epochMillis only for the internal use, and won't expose it to
users, I don't think it will violate the restriction.
Btw, please add a brief desc to explain the meaning of the parameter
`timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath, long
timestamp)`. Maybe something like "timestamp of the table snapt, which is
millseconds since 1970-01-01 00:00:00 UTC".

Thank you for the suggestions regarding the document. I will add them to
FLIP.


Best,
Feng


On Wed, Jun 7, 2023 at 12:18 PM Benchao Li <li...@apache.org> wrote:

> I also share the concern about the timezone problem.
>
> The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP WITHOUT
> TIME ZONE", converting it to unix timestamp would use UTC timezone, which
> is not usually expected by users.
>
> If we want to keep consistent with the standard, we probably should use
> "TIMESTAMP WITH LOCAL ZONE '2023-04-27 00:00:00'", which type is "TIMESTAMP
> WITH LOCAL TIME ZONE", and converting it to unix timestamp will consider
> the session timezone, which is the expected result. But it's inconvenient
> for users.
>
> Taking this a special case, and converting "TIMESTAMP '2023-04-27
> 00:00:00'" to a unix timestamp with session timezone, will be convenient
> for users, but will break the standard. I will +0.5 for this choice.
>
> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年6月7日周三 12:06写道:
>
> > Hi, Feng Jin.
> > I think the concern of Leonard may be the inconsistency of the behavior
> of
> > TIMESTAMP '2023-04-27 00:00:00' beween timetravel and other sql
> statement.
> >
> > For the normal sql:
> > `SELECT TIMESTAMP '2023-04-27 00:00:00'`, we won't consider timezone.
> > But for the sql for timetravl:
> > `SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP '2023-04-27
> > 00:00:00'`, we will consider the timezone and convert to UTC timestamp.
> >
> > The concern is valid. But for time travel, most style of engines,
> > Spark[1], Hive[2], Trino[3] also do the time conversion with considering
> > the seesion time zone. I think we can align them to avoid the
> inconsistency
> > to other engines and provide convenience for the external connectors
> while
> > integrating Flink's time travel API.
> >
> > Regarding the inconsistency, I think we can consider time-travel as a
> > specical case, and we do needs to highlight this in this FLIP.
> > As for "violate the restriction outlined in FLINK-21978[1]", since we
> cast
> > timestamp to epochMillis only for the internal use, and won't expose it
> to
> > users, I don't think it will violate the restriction.
> > Btw, please add a brief desc to explain the meaning of the parameter
> > `timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath,
> long
> > timestamp)`. Maybe something like "timestamp of the table snapt, which is
> > millseconds since 1970-01-01 00:00:00 UTC".
> >
> > [1]
> >
> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
> > [2]
> >
> https://github.com/apache/hive/blob/f5e69dc38d7ff26c70be19adc9d1a3ae90dc4cf2/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L989
> > [3]
> >
> https://github.com/trinodb/trino/blob/2433d9e60f1abb0d85c32374c1758525560e1a86/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L443
> >
> >
> > Best regards,
> > Yuxia
> >
> > ----- 原始邮件 -----
> > 发件人: "Feng Jin" <ji...@gmail.com>
> > 收件人: "dev" <de...@flink.apache.org>
> > 发送时间: 星期二, 2023年 6 月 06日 下午 10:15:47
> > 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >
> > Hi everyone
> >
> > Thanks everyone for your input.
> >
> >
> > @Yun
> >
> > >  I think you could add descriptions of how to align backfill time
> travel
> > with querying the latest data. And I think you should also update the
> > "Discussion thread" in the original FLIP.
> >
> > Thank you for the suggestion, I will update it in the document.
> >
> > >  I have a question about getting the table schema from the catalog. I'm
> > not sure whether the Catalog#getTable(tablePath, timestamp) will be
> called
> > only once.
> >
> > I understand that in a query, the schema of the table is determined
> before
> > execution. The schema used will be based on the latest schema within the
> > TimeTravel period.
> >
> > In addition, due to current syntax limitations, we are unable to support
> > the use of BETWEEN AND.
> >
> >
> > @Jing
> >
> > >  Would you like to update your thoughts described in your previous
> email
> > about why SupportsTimeTravel has been rejected into the FLIP?
> >
> > Sure,  I updated the doc.
> >
> >
> > >    Since we always directly add overload methods into Catalog according
> > to new requirements, which makes the interface bloated
> >
> > Your concern is valid. If we need to support the long type version in the
> > future, we may have to add another method "getTable(ObjectPath, long
> > version)". However, I understand that
> > "Catalog.getTable(tablePath).on(timeStamp)" may not meet the
> requirements.
> > The timestamp is for Catalog's use, and Catalog obtains the corresponding
> > schema based on this time.
> >
> >
> > @liu @Regards
> >
> > I am very sorry for the unclear description in the document. I have
> updated
> > relevant descriptions regarding why it needs to be implemented in
> Catalog.
> >
> > Travel not only requires obtaining data at the corresponding time point,
> > but also requires the corresponding Schema at that time point
> >
> >
> > @Shammon
> >
> > > Flink or connector such as  iceberg/paimon can create sources from the
> > `CatalogBaseTable` directly without the need to get the snapshot ID from
> > `CatalogTable.getSnapshot()`.  What do you think of it?
> >
> > You are right, we don't need the getSnapshot interface for PaimonCatalog
> or
> > IcebergCatalog tables, but we may need it for temporary tables.
> >
> >
> >
> > Best,
> > Feng
> >
> >
> > On Tue, Jun 6, 2023 at 9:32 PM Feng Jin <ji...@gmail.com> wrote:
> >
> > > Sorry I replied to the wrong mail. Please ignore the last email.
> > >
> > >
> > > Hi Leonard
> > >
> > > > 1. Unification SQL
> > >
> > > I agree that it is crucial for us to support both batch and streaming
> > > processing.  The current design allows for the support of both batch
> and
> > > streaming processing. I'll update the FLIP later.
> > >
> > >
> > > >  2.Semantics
> > >
> > > In my opinion, it would be feasible to perform the conversion based on
> > the
> > > current session time, regardless of whether it is TIMESTAMP or
> > > TIMESTAMP_LTZ.
> > >
> > > However, this may indeed violate the restriction outlined in
> > > FLINK-21978[1]  as Benchao mentioned, and I am uncertain as to whether
> it
> > > is reasonable.
> > >
> > >
> > > >   3.  Some external systems may use timestamp value to mark a
> version,
> > > but others may use version number、file position、log offset.
> > >
> > > It is true that most systems support time-related operations, and I
> > > believe that the current design is compatible with most systems.
> However,
> > > if we want to support long data type, it may require Calcite to support
> > the
> > > VERSION AS OF syntax. I understand that this is something that we may
> > need
> > > to consider in the future.
> > >
> > >
> > > Best,
> > > Feng
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-21978
> > >
> > > On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com> wrote:
> > >
> > >> Hi, Feng
> > >>
> > >> Thanks for driving this FLIP, very impressive feature that users want,
> > >> I’ve some quick questions here.
> > >>
> > >> 1.Unification SQL:
> > >>         The snapshot  concept exists both in Batch mode and Streaming
> > >> mode,  could we consider a unified proposal? I think users won’t
> another
> > >> SQL syntax named
> > >> Time travel for Streaming mode.
> > >>
> > >> 2.Semantics:
> > >>         Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a
> long
> > >> timestamp value (getTable(ObjectPath tablePath, long timestamp)) we
> need
> > >> two information i.e. a TIMESTAMP value and current session timezone,
> > how
> > >> we deal the value with current proposed SQL syntax.
> > >>
> > >> 3. Is it enough using sinlge timestamp to track a snapshot(version) of
> > >> external table?   Some external systems may use timestamp value to
> mark
> > a
> > >> version, but others may use version number、file position、log offset.
> > >>
> > >> Best,
> > >> Leonard
> > >>
> > >>
> > >>
> > >> > On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
> > >> >
> > >> > Hi Feng,
> > >> >
> > >> > I think this FLIP would provide one important feature to unify the
> > >> stream-SQL and batch-SQL when we backfill the historical data in batch
> > mode.
> > >> >
> > >> > For the "Syntax" session, I think you could add descriptions of how
> to
> > >> align backfill time travel with querying the latest data. And I think
> > you
> > >> should also update the "Discussion thread" in the original FLIP.
> > >> >
> > >> > Moreover, I have a question about getting the table schema from the
> > >> catalog. I'm not sure whether the Catalog#getTable(tablePath,
> timestamp)
> > >> will be called only once. If we have a backfill query between
> 2023-05-29
> > >> and 2023-06-04 in the past week, and the table schema changed on
> > >> 2023-06-01, will the query below detect the schema changes during
> > backfill
> > >> the whole week?
> > >> >
> > >> > SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
> > >> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
> > >> >
> > >> > Best
> > >> > Yun Tang
> > >> >
> > >> >
> > >> > ________________________________
> > >> > From: Shammon FY <zj...@gmail.com>
> > >> > Sent: Thursday, June 1, 2023 17:57
> > >> > To: dev@flink.apache.org <de...@flink.apache.org>
> > >> > Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> > >> >
> > >> > Hi Feng,
> > >> >
> > >> > I have one minor comment about the public interface `Optional<Long>
> > >> > getSnapshot()` in the `CatalogTable`.
> > >> >
> > >> > As we can get tables from the new method
> `Catalog.getTable(ObjectPath
> > >> > tablePath, long timestamp)`, I think the returned `CatalogBaseTable`
> > >> will
> > >> > have the information of timestamp. Flink or connector such as
> > >> > iceberg/paimon can create sources from the `CatalogBaseTable`
> directly
> > >> > without the need to get the snapshot ID from
> > >> `CatalogTable.getSnapshot()`.
> > >> > What do you think of it?
> > >> >
> > >> > Best,
> > >> > Shammon FY
> > >> >
> > >> >
> > >> > On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <ji...@ververica.com.invalid>
> > >> wrote:
> > >> >
> > >> >> Hi Feng,
> > >> >>
> > >> >> Thanks for the proposal! Very interesting feature. Would you like
> to
> > >> update
> > >> >> your thoughts described in your previous email about why
> > >> SupportsTimeTravel
> > >> >> has been rejected into the FLIP? This will help readers understand
> > the
> > >> >> context (in the future).
> > >> >>
> > >> >> Since we always directly add overload methods into Catalog
> according
> > >> to new
> > >> >> requirements, which makes the interface bloated. Just out of
> > curiosity,
> > >> >> does it make sense to introduce some DSL design? Like
> > >> >> Catalog.getTable(tablePath).on(timeStamp),
> > >> >> Catalog.getTable(tablePath).current() for the most current version,
> > and
> > >> >> more room for further extension like timestamp range, etc. I
> haven't
> > >> read
> > >> >> all the source code yet and I'm not sure if it is possible. But a
> > >> >> design like this will keep the Catalog API lean and the API/DSL
> will
> > be
> > >> >> self described and easier to use.
> > >> >>
> > >> >> Best regards,
> > >> >> Jing
> > >> >>
> > >> >>
> > >> >> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
> > >> >> krzysiek.chmielewski@gmail.com> wrote:
> > >> >>
> > >> >>> Ok after second though I'm retracting my previous statement about
> > >> Catalog
> > >> >>> changes you proposed.
> > >> >>> I do see a benefit for Delta connector actually with this change
> and
> > >> see
> > >> >>> why this could be coupled with Catalog.
> > >> >>>
> > >> >>> Delta Connector SQL support, also ships a Delta Catalog
> > implementation
> > >> >> for
> > >> >>> Flink.
> > >> >>> For Delta Catalog, table schema information is fetched from
> > underlying
> > >> >>> _delta_log and not stored in metastore. For time travel we
> actually
> > >> had a
> > >> >>> problem, that if we would like to timetravel back to some old
> > version,
> > >> >>> where schema was slightly different, then we would have a conflict
> > >> since
> > >> >>> Catalog would return current schema and not how it was for version
> > X.
> > >> >>>
> > >> >>> With your change, our Delta Catalog can actually fetch schema for
> > >> >> version X
> > >> >>> and send it to DeltaTableFactory. Currency, Catalog can fetch only
> > >> >> current
> > >> >>> version. What we would also need however is version
> > (number/timestamp)
> > >> >> for
> > >> >>> this table passed to DynamicTableFactory so we could properly set
> > >> Delta
> > >> >>> standalone library.
> > >> >>>
> > >> >>> Regards,
> > >> >>> Krzysztof
> > >> >>>
> > >> >>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
> > >> >>> krzysiek.chmielewski@gmail.com> napisał(a):
> > >> >>>
> > >> >>>> Hi,
> > >> >>>> happy to see such a feature.
> > >> >>>> Small note from my end regarding Catalog changes.
> > >> >>>>
> > >> >>>> TL;DR
> > >> >>>> I don't think it is necessary to delegate this feature to the
> > >> catalog.
> > >> >> I
> > >> >>>> think that since "timetravel" is per job/query property, its
> should
> > >> not
> > >> >>> be
> > >> >>>> coupled with the Catalog or table definition. In my opinion this
> is
> > >> >>>> something that DynamicTableFactory only has to know about. I
> would
> > >> >> rather
> > >> >>>> see this feature as it is - SQL syntax enhancement but delegate
> > >> clearly
> > >> >>> to
> > >> >>>> DynamicTableFactory.
> > >> >>>>
> > >> >>>> I've implemented timetravel feature for Delta Connector  [1]
> using
> > >> >>>> current Flink API.
> > >> >>>> Docs are pending code review, but you can find them here [2] and
> > >> >> examples
> > >> >>>> are available here [3]
> > >> >>>>
> > >> >>>> The timetravel feature that I've implemented is based on Flink
> > Query
> > >> >>>> hints.
> > >> >>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
> > >> >>>>
> > >> >>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is
> > >> handled
> > >> >>> not
> > >> >>>> by Catalog but by DyntamicTableFactory implementation for Delta
> > >> >>> connector.
> > >> >>>> The value of this property is passed to Delta standalone lib API
> > that
> > >> >>>> returns table view for given version.
> > >> >>>>
> > >> >>>> I'm not sure how/if proposed change could benefit Delta connector
> > >> >>>> implementation for this feature.
> > >> >>>>
> > >> >>>> Thanks,
> > >> >>>> Krzysztof
> > >> >>>>
> > >> >>>> [1]
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> > >> >>>> [2]
> > >> https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> > >> >>>> [3]
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> > >> >>>>
> > >> >>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com>
> napisał(a):
> > >> >>>>
> > >> >>>>> Hi, Feng
> > >> >>>>>
> > >> >>>>> Thanks for driving this FLIP, Time travel is very useful for
> Flink
> > >> >>>>> integrate with data lake system. I have one question why the
> > >> >>>>> implementation
> > >> >>>>> of TimeTravel is delegated to Catalog? Assuming that we use
> Flink
> > to
> > >> >>> query
> > >> >>>>> Hudi table with the time travel syntax, but we don't use the
> > >> >>> HudiCatalog,
> > >> >>>>> instead, we register the hudi table to InMemoryCatalog,  can we
> > >> >> support
> > >> >>>>> time travel for Hudi table in this case?
> > >> >>>>> In contrast, I think time travel should bind to connector
> instead
> > of
> > >> >>>>> Catalog, so the rejected alternative should be considered.
> > >> >>>>>
> > >> >>>>> Best,
> > >> >>>>> Ron
> > >> >>>>>
> > >> >>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
> > >> >>>>>
> > >> >>>>>> Hi, Feng.
> > >> >>>>>> Notice this FLIP only support batch mode for time travel.
> Would
> > it
> > >> >>> also
> > >> >>>>>> make sense to support stream mode to a read a snapshot of the
> > table
> > >> >>> as a
> > >> >>>>>> bounded stream?
> > >> >>>>>>
> > >> >>>>>> Best regards,
> > >> >>>>>> Yuxia
> > >> >>>>>>
> > >> >>>>>> ----- 原始邮件 -----
> > >> >>>>>> 发件人: "Benchao Li" <li...@apache.org>
> > >> >>>>>> 收件人: "dev" <de...@flink.apache.org>
> > >> >>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> > >> >>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> > >> >>>>>>
> > >> >>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
> > >> >>>>>>
> > >> >>>>>> This also depends on whether this is defined in standard or any
> > >> >> known
> > >> >>>>>> databases that have implemented this. If not, it would be hard
> to
> > >> >> push
> > >> >>>>> it
> > >> >>>>>> to Calcite.
> > >> >>>>>>
> > >> >>>>>> # getTable(ObjectPath object, long timestamp)
> > >> >>>>>>
> > >> >>>>>> Then we again come to the problem of "casting between timestamp
> > and
> > >> >>>>>> numeric", which has been disabled in FLINK-21978[1]. If you're
> > >> gonna
> > >> >>> use
> > >> >>>>>> this, then we need to clarify that problem first.
> > >> >>>>>>
> > >> >>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
> > >> >>>>>>
> > >> >>>>>>> hi, thanks for your reply.
> > >> >>>>>>>
> > >> >>>>>>> @Benchao
> > >> >>>>>>>> did you consider the pushdown abilities compatible
> > >> >>>>>>>
> > >> >>>>>>> In the current design, the implementation of TimeTravel is
> > >> >> delegated
> > >> >>>>> to
> > >> >>>>>>> Catalog. We have added a function called getTable(ObjectPath
> > >> >>>>> tablePath,
> > >> >>>>>>> long timestamp) to obtain the corresponding CatalogBaseTable
> at
> > a
> > >> >>>>>> specific
> > >> >>>>>>> time.  Therefore, I think it will not have any impact on the
> > >> >>> original
> > >> >>>>>>> pushdown abilities.
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>>>  I see there is a rejected  design for adding
> > >> >> SupportsTimeTravel,
> > >> >>>>> but
> > >> >>>>>> I
> > >> >>>>>>> didn't see the alternative in  the FLIP doc
> > >> >>>>>>>
> > >> >>>>>>> Sorry, the document description is not very clear.  Regarding
> > >> >>> whether
> > >> >>>>> to
> > >> >>>>>>> support SupportTimeTravel, I have discussed it with yuxia.
> Since
> > >> >> we
> > >> >>>>> have
> > >> >>>>>>> already passed the corresponding time in getTable(ObjectPath,
> > long
> > >> >>>>>>> timestamp) of Catalog, SupportTimeTravel may not be necessary.
> > >> >>>>>>>
> > >> >>>>>>> In getTable(ObjectPath object, long timestamp), we can obtain
> > the
> > >> >>>>> schema
> > >> >>>>>> of
> > >> >>>>>>> the corresponding time point and put the SNAPSHOT that needs
> to
> > be
> > >> >>>>>> consumed
> > >> >>>>>>> into options.
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>> @Shammon
> > >> >>>>>>>> Could we support this in Flink too?
> > >> >>>>>>>
> > >> >>>>>>> I personally think it's possible, but limited by Calcite's
> > syntax
> > >> >>>>>>> restrictions. I believe we should first support this syntax in
> > >> >>>>> Calcite.
> > >> >>>>>>> Currently, I think it may not be easy  to support this syntax
> in
> > >> >>>>> Flink's
> > >> >>>>>>> parser. @Benchao, what do you think? Can Calcite support this
> > >> >> syntax
> > >> >>>>>>> ` VERSION AS OF`  ?
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>> Best,
> > >> >>>>>>> Feng.
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <zjureel@gmail.com
> >
> > >> >>> wrote:
> > >> >>>>>>>
> > >> >>>>>>>> Thanks Feng, the feature of time travel sounds great!
> > >> >>>>>>>>
> > >> >>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
> > >> >> iceberg
> > >> >>>>>>> support
> > >> >>>>>>>> snapshot or version. For example, users can query snapshot 1
> > for
> > >> >>>>> paimon
> > >> >>>>>>> by
> > >> >>>>>>>> the following statement
> > >> >>>>>>>> SELECT * FROM t VERSION AS OF 1
> > >> >>>>>>>>
> > >> >>>>>>>> Could we support this in Flink too?
> > >> >>>>>>>>
> > >> >>>>>>>> Best,
> > >> >>>>>>>> Shammon FY
> > >> >>>>>>>>
> > >> >>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
> > >> >> libenchao@apache.org>
> > >> >>>>>> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>>> Regarding the implementation, did you consider the pushdown
> > >> >>>>> abilities
> > >> >>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
> > >> >>> partition
> > >> >>>>>>>> pushdown.
> > >> >>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
> > >> >> have a
> > >> >>>>>>> concern
> > >> >>>>>>>>> about this. Of course, it depends on your implementation
> > >> >> detail,
> > >> >>>>> what
> > >> >>>>>>> is
> > >> >>>>>>>>> important is that we'd better add some cross tests for
> these.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Regarding the interface exposed to Connector, I see there
> is a
> > >> >>>>>> rejected
> > >> >>>>>>>>> design for adding SupportsTimeTravel, but I didn't see the
> > >> >>>>>> alternative
> > >> >>>>>>> in
> > >> >>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
> > >> >> clarify
> > >> >>>>>>> because
> > >> >>>>>>>> we
> > >> >>>>>>>>> need to know whether the Connector supports this, and what
> > >> >>>>>>>> column/metadata
> > >> >>>>>>>>> corresponds to 'system_time'.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
> > >> >>>>>>>>>
> > >> >>>>>>>>>> Thanks for your reply
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> @Timo @BenChao @yuxia
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Sorry for the mistake,  Currently , calcite only supports
> > >> >>> `FOR
> > >> >>>>>>>>> SYSTEM_TIME
> > >> >>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
> > >> >> OF`
> > >> >>> .
> > >> >>>>>> I've
> > >> >>>>>>>>>> updated the syntax part of the FLIP.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> @Timo
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ
> > >> >> and
> > >> >>>>> then
> > >> >>>>>>>>> convert
> > >> >>>>>>>>>> it into a long value.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>> How do we want to query the most recent version of a table
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
> > >> >>> cause
> > >> >>>>>>>>>> inconsistency with the real-time concept.
> > >> >>>>>>>>>> However, from my personal understanding, the scope of  `AS
> > >> >> OF
> > >> >>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
> > >> >> record.
> > >> >>>>> So,
> > >> >>>>>> I
> > >> >>>>>>>>> think
> > >> >>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
> > >> >>>>>>>>>> Additionally, if no version is specified, the latest
> version
> > >> >>>>> should
> > >> >>>>>>> be
> > >> >>>>>>>>> used
> > >> >>>>>>>>>> by default.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Best,
> > >> >>>>>>>>>> Feng
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
> > >> >>>>> luoyuxia@alumni.sjtu.edu.cn
> > >> >>>>>>>
> > >> >>>>>>>>> wrote:
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
> > >> >>> introduce
> > >> >>>>>> time
> > >> >>>>>>>>> travel
> > >> >>>>>>>>>>> to Flink to have a better integration with external data
> > >> >>>>> soruces.
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> I also share same concern about the syntax.
> > >> >>>>>>>>>>> I see in the part of `Whether to support other syntax
> > >> >>>>>>>> implementations`
> > >> >>>>>>>>> in
> > >> >>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
> > >> >>>>> SYSTEM_TIME
> > >> >>>>>>> AS
> > >> >>>>>>>>> OF`,
> > >> >>>>>>>>>>> right?
> > >> >>>>>>>>>>> But the the syntax part in this FLIP, it seems to be `AS
> > >> >> OF
> > >> >>>>>>>> TIMESTAMP`
> > >> >>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a mistake
> > >> >> or
> > >> >>>>> by
> > >> >>>>>>>> design?
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Best regards,
> > >> >>>>>>>>>>> Yuxia
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> ----- 原始邮件 -----
> > >> >>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
> > >> >>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
> > >> >>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
> > >> >>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
> > >> >>> Mode
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
> > >> >>>>> instead
> > >> >>>>>> of
> > >> >>>>>>>> `FOR
> > >> >>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in
> > >> >>> the
> > >> >>>>> SQL
> > >> >>>>>>>>>> standard
> > >> >>>>>>>>>>> and has been supported in some database vendors such as
> > >> >> SQL
> > >> >>>>>> Server.
> > >> >>>>>>>>> About
> > >> >>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
> > >> >>> supports
> > >> >>>>>>> this,
> > >> >>>>>>>> if
> > >> >>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
> > >> >> and
> > >> >>> I
> > >> >>>>>> would
> > >> >>>>>>>>> give
> > >> >>>>>>>>>> a
> > >> >>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
> > >> >> use
> > >> >>>>> `FOR
> > >> >>>>>>>> SYSTEM
> > >> >>>>>>>>>> AS
> > >> >>>>>>>>>>> OF`.
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四 19:02写道:
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>> Also: How do we want to query the most recent version
> > >> >> of a
> > >> >>>>>> table?
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but according
> > >> >> to
> > >> >>>>> the
> > >> >>>>>>> docs
> > >> >>>>>>>>>> both
> > >> >>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
> > >> >> concerning
> > >> >>>>> is
> > >> >>>>>> the
> > >> >>>>>>>> it
> > >> >>>>>>>>>>>> actually is evalated row-based:
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>> Returns the current SQL timestamp in the local time
> > >> >>> zone,
> > >> >>>>>> the
> > >> >>>>>>>>> return
> > >> >>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
> > >> >> record
> > >> >>> in
> > >> >>>>>>>>> streaming
> > >> >>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
> > >> >> query
> > >> >>>>>> starts
> > >> >>>>>>>> and
> > >> >>>>>>>>>>>> uses the same result for every row.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> This could make it difficult to explain in a join
> > >> >> scenario
> > >> >>>>> of
> > >> >>>>>>>>> multiple
> > >> >>>>>>>>>>>> snapshotted tables.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> Regards,
> > >> >>>>>>>>>>>> Timo
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
> > >> >>>>>>>>>>>>> Hi Feng,
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
> > >> >> sense
> > >> >>> to
> > >> >>>>>>>> finally
> > >> >>>>>>>>>>>>> support querying tables at a specific point in time or
> > >> >>>>>>> hopefully
> > >> >>>>>>>>> also
> > >> >>>>>>>>>>>>> ranges soon. Following time-versioned tables.
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> Here is some feedback from my side:
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> 1. Syntax
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for this
> > >> >>> but
> > >> >>>>>> not
> > >> >>>>>>>> `FOR
> > >> >>>>>>>>>>>>> SYSTEM_TIME AS OF`?
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> It would be great to support `AS OF` also for
> > >> >>>>> time-versioned
> > >> >>>>>>>> joins
> > >> >>>>>>>>>> and
> > >> >>>>>>>>>>>>> have a unified and short syntax.
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
> > >> >>> this
> > >> >>>>>>>>> available
> > >> >>>>>>>>>> in
> > >> >>>>>>>>>>>>> Flink earlier by copying the corresponding classes
> > >> >> until
> > >> >>>>> the
> > >> >>>>>>> next
> > >> >>>>>>>>>>>>> Calcite upgrade is performed.
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> 2. Semantics
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
> > >> >>>>>> timestamp
> > >> >>>>>>>>> types
> > >> >>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
> > >> >>>>>> TIMESTAMP
> > >> >>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
> > >> >>>>> timestamp
> > >> >>>>>> be?
> > >> >>>>>>>> We
> > >> >>>>>>>>>> will
> > >> >>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> We definely need to clarify this because the past has
> > >> >>>>> shown
> > >> >>>>>>> that
> > >> >>>>>>>>>>>>> daylight saving times make our lives hard.
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> Thanks,
> > >> >>>>>>>>>>>>> Timo
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
> > >> >>>>>>>>>>>>>> Hi, everyone.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
> > >> >> Support
> > >> >>>>> Time
> > >> >>>>>>>> Travel
> > >> >>>>>>>>>> In
> > >> >>>>>>>>>>>>>> Batch
> > >> >>>>>>>>>>>>>> Mode [1]
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
> > >> >>>>>> versions
> > >> >>>>>>> of
> > >> >>>>>>>>>> data.
> > >> >>>>>>>>>>>> It
> > >> >>>>>>>>>>>>>> allows users to specify a point in time and retrieve
> > >> >>> the
> > >> >>>>>> data
> > >> >>>>>>>> and
> > >> >>>>>>>>>>>>>> schema of
> > >> >>>>>>>>>>>>>> a table as it appeared at that time. With time
> > >> >> travel,
> > >> >>>>> users
> > >> >>>>>>> can
> > >> >>>>>>>>>>> easily
> > >> >>>>>>>>>>>>>> analyze and compare historical versions of data.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> With the widespread use of data lake systems such as
> > >> >>>>> Paimon,
> > >> >>>>>>>>>> Iceberg,
> > >> >>>>>>>>>>>> and
> > >> >>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
> > >> >>> users'
> > >> >>>>>> data
> > >> >>>>>>>>>>> analysis.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> Looking forward to your opinions, any suggestions are
> > >> >>>>>>> welcomed.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> 1.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>
> > >> >>>>>>
> > >> >>>>>
> > >> >>>
> > >> >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> Best.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> Feng
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> --
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Best,
> > >> >>>>>>>>>>> Benchao Li
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> --
> > >> >>>>>>>>>
> > >> >>>>>>>>> Best,
> > >> >>>>>>>>> Benchao Li
> > >> >>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> --
> > >> >>>>>>
> > >> >>>>>> Best,
> > >> >>>>>> Benchao Li
> > >> >>>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> > >>
> >
>
>
> --
>
> Best,
> Benchao Li
>

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Benchao Li <li...@apache.org>.
I also share the concern about the timezone problem.

The type for "TIMESTAMP '2023-04-27 00:00:00'" should be "TIMESTAMP WITHOUT
TIME ZONE", converting it to unix timestamp would use UTC timezone, which
is not usually expected by users.

If we want to keep consistent with the standard, we probably should use
"TIMESTAMP WITH LOCAL ZONE '2023-04-27 00:00:00'", which type is "TIMESTAMP
WITH LOCAL TIME ZONE", and converting it to unix timestamp will consider
the session timezone, which is the expected result. But it's inconvenient
for users.

Taking this a special case, and converting "TIMESTAMP '2023-04-27
00:00:00'" to a unix timestamp with session timezone, will be convenient
for users, but will break the standard. I will +0.5 for this choice.

yuxia <lu...@alumni.sjtu.edu.cn> 于2023年6月7日周三 12:06写道:

> Hi, Feng Jin.
> I think the concern of Leonard may be the inconsistency of the behavior of
> TIMESTAMP '2023-04-27 00:00:00' beween timetravel and other sql statement.
>
> For the normal sql:
> `SELECT TIMESTAMP '2023-04-27 00:00:00'`, we won't consider timezone.
> But for the sql for timetravl:
> `SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP '2023-04-27
> 00:00:00'`, we will consider the timezone and convert to UTC timestamp.
>
> The concern is valid. But for time travel, most style of engines,
> Spark[1], Hive[2], Trino[3] also do the time conversion with considering
> the seesion time zone. I think we can align them to avoid the inconsistency
> to other engines and provide convenience for the external connectors while
> integrating Flink's time travel API.
>
> Regarding the inconsistency, I think we can consider time-travel as a
> specical case, and we do needs to highlight this in this FLIP.
> As for "violate the restriction outlined in FLINK-21978[1]", since we cast
> timestamp to epochMillis only for the internal use, and won't expose it to
> users, I don't think it will violate the restriction.
> Btw, please add a brief desc to explain the meaning of the parameter
> `timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath, long
> timestamp)`. Maybe something like "timestamp of the table snapt, which is
> millseconds since 1970-01-01 00:00:00 UTC".
>
> [1]
> https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
> [2]
> https://github.com/apache/hive/blob/f5e69dc38d7ff26c70be19adc9d1a3ae90dc4cf2/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L989
> [3]
> https://github.com/trinodb/trino/blob/2433d9e60f1abb0d85c32374c1758525560e1a86/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L443
>
>
> Best regards,
> Yuxia
>
> ----- 原始邮件 -----
> 发件人: "Feng Jin" <ji...@gmail.com>
> 收件人: "dev" <de...@flink.apache.org>
> 发送时间: 星期二, 2023年 6 月 06日 下午 10:15:47
> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>
> Hi everyone
>
> Thanks everyone for your input.
>
>
> @Yun
>
> >  I think you could add descriptions of how to align backfill time travel
> with querying the latest data. And I think you should also update the
> "Discussion thread" in the original FLIP.
>
> Thank you for the suggestion, I will update it in the document.
>
> >  I have a question about getting the table schema from the catalog. I'm
> not sure whether the Catalog#getTable(tablePath, timestamp) will be called
> only once.
>
> I understand that in a query, the schema of the table is determined before
> execution. The schema used will be based on the latest schema within the
> TimeTravel period.
>
> In addition, due to current syntax limitations, we are unable to support
> the use of BETWEEN AND.
>
>
> @Jing
>
> >  Would you like to update your thoughts described in your previous email
> about why SupportsTimeTravel has been rejected into the FLIP?
>
> Sure,  I updated the doc.
>
>
> >    Since we always directly add overload methods into Catalog according
> to new requirements, which makes the interface bloated
>
> Your concern is valid. If we need to support the long type version in the
> future, we may have to add another method "getTable(ObjectPath, long
> version)". However, I understand that
> "Catalog.getTable(tablePath).on(timeStamp)" may not meet the requirements.
> The timestamp is for Catalog's use, and Catalog obtains the corresponding
> schema based on this time.
>
>
> @liu @Regards
>
> I am very sorry for the unclear description in the document. I have updated
> relevant descriptions regarding why it needs to be implemented in Catalog.
>
> Travel not only requires obtaining data at the corresponding time point,
> but also requires the corresponding Schema at that time point
>
>
> @Shammon
>
> > Flink or connector such as  iceberg/paimon can create sources from the
> `CatalogBaseTable` directly without the need to get the snapshot ID from
> `CatalogTable.getSnapshot()`.  What do you think of it?
>
> You are right, we don't need the getSnapshot interface for PaimonCatalog or
> IcebergCatalog tables, but we may need it for temporary tables.
>
>
>
> Best,
> Feng
>
>
> On Tue, Jun 6, 2023 at 9:32 PM Feng Jin <ji...@gmail.com> wrote:
>
> > Sorry I replied to the wrong mail. Please ignore the last email.
> >
> >
> > Hi Leonard
> >
> > > 1. Unification SQL
> >
> > I agree that it is crucial for us to support both batch and streaming
> > processing.  The current design allows for the support of both batch and
> > streaming processing. I'll update the FLIP later.
> >
> >
> > >  2.Semantics
> >
> > In my opinion, it would be feasible to perform the conversion based on
> the
> > current session time, regardless of whether it is TIMESTAMP or
> > TIMESTAMP_LTZ.
> >
> > However, this may indeed violate the restriction outlined in
> > FLINK-21978[1]  as Benchao mentioned, and I am uncertain as to whether it
> > is reasonable.
> >
> >
> > >   3.  Some external systems may use timestamp value to mark a version,
> > but others may use version number、file position、log offset.
> >
> > It is true that most systems support time-related operations, and I
> > believe that the current design is compatible with most systems. However,
> > if we want to support long data type, it may require Calcite to support
> the
> > VERSION AS OF syntax. I understand that this is something that we may
> need
> > to consider in the future.
> >
> >
> > Best,
> > Feng
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-21978
> >
> > On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com> wrote:
> >
> >> Hi, Feng
> >>
> >> Thanks for driving this FLIP, very impressive feature that users want,
> >> I’ve some quick questions here.
> >>
> >> 1.Unification SQL:
> >>         The snapshot  concept exists both in Batch mode and Streaming
> >> mode,  could we consider a unified proposal? I think users won’t another
> >> SQL syntax named
> >> Time travel for Streaming mode.
> >>
> >> 2.Semantics:
> >>         Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a long
> >> timestamp value (getTable(ObjectPath tablePath, long timestamp)) we need
> >> two information i.e. a TIMESTAMP value and current session timezone,
> how
> >> we deal the value with current proposed SQL syntax.
> >>
> >> 3. Is it enough using sinlge timestamp to track a snapshot(version) of
> >> external table?   Some external systems may use timestamp value to mark
> a
> >> version, but others may use version number、file position、log offset.
> >>
> >> Best,
> >> Leonard
> >>
> >>
> >>
> >> > On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
> >> >
> >> > Hi Feng,
> >> >
> >> > I think this FLIP would provide one important feature to unify the
> >> stream-SQL and batch-SQL when we backfill the historical data in batch
> mode.
> >> >
> >> > For the "Syntax" session, I think you could add descriptions of how to
> >> align backfill time travel with querying the latest data. And I think
> you
> >> should also update the "Discussion thread" in the original FLIP.
> >> >
> >> > Moreover, I have a question about getting the table schema from the
> >> catalog. I'm not sure whether the Catalog#getTable(tablePath, timestamp)
> >> will be called only once. If we have a backfill query between 2023-05-29
> >> and 2023-06-04 in the past week, and the table schema changed on
> >> 2023-06-01, will the query below detect the schema changes during
> backfill
> >> the whole week?
> >> >
> >> > SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
> >> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
> >> >
> >> > Best
> >> > Yun Tang
> >> >
> >> >
> >> > ________________________________
> >> > From: Shammon FY <zj...@gmail.com>
> >> > Sent: Thursday, June 1, 2023 17:57
> >> > To: dev@flink.apache.org <de...@flink.apache.org>
> >> > Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >> >
> >> > Hi Feng,
> >> >
> >> > I have one minor comment about the public interface `Optional<Long>
> >> > getSnapshot()` in the `CatalogTable`.
> >> >
> >> > As we can get tables from the new method `Catalog.getTable(ObjectPath
> >> > tablePath, long timestamp)`, I think the returned `CatalogBaseTable`
> >> will
> >> > have the information of timestamp. Flink or connector such as
> >> > iceberg/paimon can create sources from the `CatalogBaseTable` directly
> >> > without the need to get the snapshot ID from
> >> `CatalogTable.getSnapshot()`.
> >> > What do you think of it?
> >> >
> >> > Best,
> >> > Shammon FY
> >> >
> >> >
> >> > On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <ji...@ververica.com.invalid>
> >> wrote:
> >> >
> >> >> Hi Feng,
> >> >>
> >> >> Thanks for the proposal! Very interesting feature. Would you like to
> >> update
> >> >> your thoughts described in your previous email about why
> >> SupportsTimeTravel
> >> >> has been rejected into the FLIP? This will help readers understand
> the
> >> >> context (in the future).
> >> >>
> >> >> Since we always directly add overload methods into Catalog according
> >> to new
> >> >> requirements, which makes the interface bloated. Just out of
> curiosity,
> >> >> does it make sense to introduce some DSL design? Like
> >> >> Catalog.getTable(tablePath).on(timeStamp),
> >> >> Catalog.getTable(tablePath).current() for the most current version,
> and
> >> >> more room for further extension like timestamp range, etc. I haven't
> >> read
> >> >> all the source code yet and I'm not sure if it is possible. But a
> >> >> design like this will keep the Catalog API lean and the API/DSL will
> be
> >> >> self described and easier to use.
> >> >>
> >> >> Best regards,
> >> >> Jing
> >> >>
> >> >>
> >> >> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
> >> >> krzysiek.chmielewski@gmail.com> wrote:
> >> >>
> >> >>> Ok after second though I'm retracting my previous statement about
> >> Catalog
> >> >>> changes you proposed.
> >> >>> I do see a benefit for Delta connector actually with this change and
> >> see
> >> >>> why this could be coupled with Catalog.
> >> >>>
> >> >>> Delta Connector SQL support, also ships a Delta Catalog
> implementation
> >> >> for
> >> >>> Flink.
> >> >>> For Delta Catalog, table schema information is fetched from
> underlying
> >> >>> _delta_log and not stored in metastore. For time travel we actually
> >> had a
> >> >>> problem, that if we would like to timetravel back to some old
> version,
> >> >>> where schema was slightly different, then we would have a conflict
> >> since
> >> >>> Catalog would return current schema and not how it was for version
> X.
> >> >>>
> >> >>> With your change, our Delta Catalog can actually fetch schema for
> >> >> version X
> >> >>> and send it to DeltaTableFactory. Currency, Catalog can fetch only
> >> >> current
> >> >>> version. What we would also need however is version
> (number/timestamp)
> >> >> for
> >> >>> this table passed to DynamicTableFactory so we could properly set
> >> Delta
> >> >>> standalone library.
> >> >>>
> >> >>> Regards,
> >> >>> Krzysztof
> >> >>>
> >> >>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
> >> >>> krzysiek.chmielewski@gmail.com> napisał(a):
> >> >>>
> >> >>>> Hi,
> >> >>>> happy to see such a feature.
> >> >>>> Small note from my end regarding Catalog changes.
> >> >>>>
> >> >>>> TL;DR
> >> >>>> I don't think it is necessary to delegate this feature to the
> >> catalog.
> >> >> I
> >> >>>> think that since "timetravel" is per job/query property, its should
> >> not
> >> >>> be
> >> >>>> coupled with the Catalog or table definition. In my opinion this is
> >> >>>> something that DynamicTableFactory only has to know about. I would
> >> >> rather
> >> >>>> see this feature as it is - SQL syntax enhancement but delegate
> >> clearly
> >> >>> to
> >> >>>> DynamicTableFactory.
> >> >>>>
> >> >>>> I've implemented timetravel feature for Delta Connector  [1]  using
> >> >>>> current Flink API.
> >> >>>> Docs are pending code review, but you can find them here [2] and
> >> >> examples
> >> >>>> are available here [3]
> >> >>>>
> >> >>>> The timetravel feature that I've implemented is based on Flink
> Query
> >> >>>> hints.
> >> >>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
> >> >>>>
> >> >>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is
> >> handled
> >> >>> not
> >> >>>> by Catalog but by DyntamicTableFactory implementation for Delta
> >> >>> connector.
> >> >>>> The value of this property is passed to Delta standalone lib API
> that
> >> >>>> returns table view for given version.
> >> >>>>
> >> >>>> I'm not sure how/if proposed change could benefit Delta connector
> >> >>>> implementation for this feature.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Krzysztof
> >> >>>>
> >> >>>> [1]
> >> >>>>
> >> >>>
> >> >>
> >>
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> >> >>>> [2]
> >> https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> >> >>>> [3]
> >> >>>>
> >> >>>
> >> >>
> >>
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> >> >>>>
> >> >>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com> napisał(a):
> >> >>>>
> >> >>>>> Hi, Feng
> >> >>>>>
> >> >>>>> Thanks for driving this FLIP, Time travel is very useful for Flink
> >> >>>>> integrate with data lake system. I have one question why the
> >> >>>>> implementation
> >> >>>>> of TimeTravel is delegated to Catalog? Assuming that we use Flink
> to
> >> >>> query
> >> >>>>> Hudi table with the time travel syntax, but we don't use the
> >> >>> HudiCatalog,
> >> >>>>> instead, we register the hudi table to InMemoryCatalog,  can we
> >> >> support
> >> >>>>> time travel for Hudi table in this case?
> >> >>>>> In contrast, I think time travel should bind to connector instead
> of
> >> >>>>> Catalog, so the rejected alternative should be considered.
> >> >>>>>
> >> >>>>> Best,
> >> >>>>> Ron
> >> >>>>>
> >> >>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
> >> >>>>>
> >> >>>>>> Hi, Feng.
> >> >>>>>> Notice this FLIP only support batch mode for time travel.  Would
> it
> >> >>> also
> >> >>>>>> make sense to support stream mode to a read a snapshot of the
> table
> >> >>> as a
> >> >>>>>> bounded stream?
> >> >>>>>>
> >> >>>>>> Best regards,
> >> >>>>>> Yuxia
> >> >>>>>>
> >> >>>>>> ----- 原始邮件 -----
> >> >>>>>> 发件人: "Benchao Li" <li...@apache.org>
> >> >>>>>> 收件人: "dev" <de...@flink.apache.org>
> >> >>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> >> >>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >> >>>>>>
> >> >>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
> >> >>>>>>
> >> >>>>>> This also depends on whether this is defined in standard or any
> >> >> known
> >> >>>>>> databases that have implemented this. If not, it would be hard to
> >> >> push
> >> >>>>> it
> >> >>>>>> to Calcite.
> >> >>>>>>
> >> >>>>>> # getTable(ObjectPath object, long timestamp)
> >> >>>>>>
> >> >>>>>> Then we again come to the problem of "casting between timestamp
> and
> >> >>>>>> numeric", which has been disabled in FLINK-21978[1]. If you're
> >> gonna
> >> >>> use
> >> >>>>>> this, then we need to clarify that problem first.
> >> >>>>>>
> >> >>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
> >> >>>>>>
> >> >>>>>>> hi, thanks for your reply.
> >> >>>>>>>
> >> >>>>>>> @Benchao
> >> >>>>>>>> did you consider the pushdown abilities compatible
> >> >>>>>>>
> >> >>>>>>> In the current design, the implementation of TimeTravel is
> >> >> delegated
> >> >>>>> to
> >> >>>>>>> Catalog. We have added a function called getTable(ObjectPath
> >> >>>>> tablePath,
> >> >>>>>>> long timestamp) to obtain the corresponding CatalogBaseTable at
> a
> >> >>>>>> specific
> >> >>>>>>> time.  Therefore, I think it will not have any impact on the
> >> >>> original
> >> >>>>>>> pushdown abilities.
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>>  I see there is a rejected  design for adding
> >> >> SupportsTimeTravel,
> >> >>>>> but
> >> >>>>>> I
> >> >>>>>>> didn't see the alternative in  the FLIP doc
> >> >>>>>>>
> >> >>>>>>> Sorry, the document description is not very clear.  Regarding
> >> >>> whether
> >> >>>>> to
> >> >>>>>>> support SupportTimeTravel, I have discussed it with yuxia. Since
> >> >> we
> >> >>>>> have
> >> >>>>>>> already passed the corresponding time in getTable(ObjectPath,
> long
> >> >>>>>>> timestamp) of Catalog, SupportTimeTravel may not be necessary.
> >> >>>>>>>
> >> >>>>>>> In getTable(ObjectPath object, long timestamp), we can obtain
> the
> >> >>>>> schema
> >> >>>>>> of
> >> >>>>>>> the corresponding time point and put the SNAPSHOT that needs to
> be
> >> >>>>>> consumed
> >> >>>>>>> into options.
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> @Shammon
> >> >>>>>>>> Could we support this in Flink too?
> >> >>>>>>>
> >> >>>>>>> I personally think it's possible, but limited by Calcite's
> syntax
> >> >>>>>>> restrictions. I believe we should first support this syntax in
> >> >>>>> Calcite.
> >> >>>>>>> Currently, I think it may not be easy  to support this syntax in
> >> >>>>> Flink's
> >> >>>>>>> parser. @Benchao, what do you think? Can Calcite support this
> >> >> syntax
> >> >>>>>>> ` VERSION AS OF`  ?
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> Best,
> >> >>>>>>> Feng.
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <zj...@gmail.com>
> >> >>> wrote:
> >> >>>>>>>
> >> >>>>>>>> Thanks Feng, the feature of time travel sounds great!
> >> >>>>>>>>
> >> >>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
> >> >> iceberg
> >> >>>>>>> support
> >> >>>>>>>> snapshot or version. For example, users can query snapshot 1
> for
> >> >>>>> paimon
> >> >>>>>>> by
> >> >>>>>>>> the following statement
> >> >>>>>>>> SELECT * FROM t VERSION AS OF 1
> >> >>>>>>>>
> >> >>>>>>>> Could we support this in Flink too?
> >> >>>>>>>>
> >> >>>>>>>> Best,
> >> >>>>>>>> Shammon FY
> >> >>>>>>>>
> >> >>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
> >> >> libenchao@apache.org>
> >> >>>>>> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> Regarding the implementation, did you consider the pushdown
> >> >>>>> abilities
> >> >>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
> >> >>> partition
> >> >>>>>>>> pushdown.
> >> >>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
> >> >> have a
> >> >>>>>>> concern
> >> >>>>>>>>> about this. Of course, it depends on your implementation
> >> >> detail,
> >> >>>>> what
> >> >>>>>>> is
> >> >>>>>>>>> important is that we'd better add some cross tests for these.
> >> >>>>>>>>>
> >> >>>>>>>>> Regarding the interface exposed to Connector, I see there is a
> >> >>>>>> rejected
> >> >>>>>>>>> design for adding SupportsTimeTravel, but I didn't see the
> >> >>>>>> alternative
> >> >>>>>>> in
> >> >>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
> >> >> clarify
> >> >>>>>>> because
> >> >>>>>>>> we
> >> >>>>>>>>> need to know whether the Connector supports this, and what
> >> >>>>>>>> column/metadata
> >> >>>>>>>>> corresponds to 'system_time'.
> >> >>>>>>>>>
> >> >>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
> >> >>>>>>>>>
> >> >>>>>>>>>> Thanks for your reply
> >> >>>>>>>>>>
> >> >>>>>>>>>> @Timo @BenChao @yuxia
> >> >>>>>>>>>>
> >> >>>>>>>>>> Sorry for the mistake,  Currently , calcite only supports
> >> >>> `FOR
> >> >>>>>>>>> SYSTEM_TIME
> >> >>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
> >> >> OF`
> >> >>> .
> >> >>>>>> I've
> >> >>>>>>>>>> updated the syntax part of the FLIP.
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> @Timo
> >> >>>>>>>>>>
> >> >>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
> >> >>>>>>>>>>
> >> >>>>>>>>>> Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ
> >> >> and
> >> >>>>> then
> >> >>>>>>>>> convert
> >> >>>>>>>>>> it into a long value.
> >> >>>>>>>>>>
> >> >>>>>>>>>>> How do we want to query the most recent version of a table
> >> >>>>>>>>>>
> >> >>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
> >> >>> cause
> >> >>>>>>>>>> inconsistency with the real-time concept.
> >> >>>>>>>>>> However, from my personal understanding, the scope of  `AS
> >> >> OF
> >> >>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
> >> >> record.
> >> >>>>> So,
> >> >>>>>> I
> >> >>>>>>>>> think
> >> >>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
> >> >>>>>>>>>> Additionally, if no version is specified, the latest version
> >> >>>>> should
> >> >>>>>>> be
> >> >>>>>>>>> used
> >> >>>>>>>>>> by default.
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Best,
> >> >>>>>>>>>> Feng
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
> >> >>>>> luoyuxia@alumni.sjtu.edu.cn
> >> >>>>>>>
> >> >>>>>>>>> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
> >> >>> introduce
> >> >>>>>> time
> >> >>>>>>>>> travel
> >> >>>>>>>>>>> to Flink to have a better integration with external data
> >> >>>>> soruces.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> I also share same concern about the syntax.
> >> >>>>>>>>>>> I see in the part of `Whether to support other syntax
> >> >>>>>>>> implementations`
> >> >>>>>>>>> in
> >> >>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
> >> >>>>> SYSTEM_TIME
> >> >>>>>>> AS
> >> >>>>>>>>> OF`,
> >> >>>>>>>>>>> right?
> >> >>>>>>>>>>> But the the syntax part in this FLIP, it seems to be `AS
> >> >> OF
> >> >>>>>>>> TIMESTAMP`
> >> >>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a mistake
> >> >> or
> >> >>>>> by
> >> >>>>>>>> design?
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Best regards,
> >> >>>>>>>>>>> Yuxia
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> ----- 原始邮件 -----
> >> >>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
> >> >>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
> >> >>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
> >> >>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
> >> >>> Mode
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
> >> >>>>> instead
> >> >>>>>> of
> >> >>>>>>>> `FOR
> >> >>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in
> >> >>> the
> >> >>>>> SQL
> >> >>>>>>>>>> standard
> >> >>>>>>>>>>> and has been supported in some database vendors such as
> >> >> SQL
> >> >>>>>> Server.
> >> >>>>>>>>> About
> >> >>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
> >> >>> supports
> >> >>>>>>> this,
> >> >>>>>>>> if
> >> >>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
> >> >> and
> >> >>> I
> >> >>>>>> would
> >> >>>>>>>>> give
> >> >>>>>>>>>> a
> >> >>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
> >> >> use
> >> >>>>> `FOR
> >> >>>>>>>> SYSTEM
> >> >>>>>>>>>> AS
> >> >>>>>>>>>>> OF`.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四 19:02写道:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Also: How do we want to query the most recent version
> >> >> of a
> >> >>>>>> table?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but according
> >> >> to
> >> >>>>> the
> >> >>>>>>> docs
> >> >>>>>>>>>> both
> >> >>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
> >> >> concerning
> >> >>>>> is
> >> >>>>>> the
> >> >>>>>>>> it
> >> >>>>>>>>>>>> actually is evalated row-based:
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>> Returns the current SQL timestamp in the local time
> >> >>> zone,
> >> >>>>>> the
> >> >>>>>>>>> return
> >> >>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
> >> >> record
> >> >>> in
> >> >>>>>>>>> streaming
> >> >>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
> >> >> query
> >> >>>>>> starts
> >> >>>>>>>> and
> >> >>>>>>>>>>>> uses the same result for every row.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> This could make it difficult to explain in a join
> >> >> scenario
> >> >>>>> of
> >> >>>>>>>>> multiple
> >> >>>>>>>>>>>> snapshotted tables.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Regards,
> >> >>>>>>>>>>>> Timo
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
> >> >>>>>>>>>>>>> Hi Feng,
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
> >> >> sense
> >> >>> to
> >> >>>>>>>> finally
> >> >>>>>>>>>>>>> support querying tables at a specific point in time or
> >> >>>>>>> hopefully
> >> >>>>>>>>> also
> >> >>>>>>>>>>>>> ranges soon. Following time-versioned tables.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Here is some feedback from my side:
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> 1. Syntax
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for this
> >> >>> but
> >> >>>>>> not
> >> >>>>>>>> `FOR
> >> >>>>>>>>>>>>> SYSTEM_TIME AS OF`?
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> It would be great to support `AS OF` also for
> >> >>>>> time-versioned
> >> >>>>>>>> joins
> >> >>>>>>>>>> and
> >> >>>>>>>>>>>>> have a unified and short syntax.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
> >> >>> this
> >> >>>>>>>>> available
> >> >>>>>>>>>> in
> >> >>>>>>>>>>>>> Flink earlier by copying the corresponding classes
> >> >> until
> >> >>>>> the
> >> >>>>>>> next
> >> >>>>>>>>>>>>> Calcite upgrade is performed.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> 2. Semantics
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
> >> >>>>>> timestamp
> >> >>>>>>>>> types
> >> >>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
> >> >>>>>> TIMESTAMP
> >> >>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
> >> >>>>> timestamp
> >> >>>>>> be?
> >> >>>>>>>> We
> >> >>>>>>>>>> will
> >> >>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> We definely need to clarify this because the past has
> >> >>>>> shown
> >> >>>>>>> that
> >> >>>>>>>>>>>>> daylight saving times make our lives hard.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Thanks,
> >> >>>>>>>>>>>>> Timo
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
> >> >>>>>>>>>>>>>> Hi, everyone.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
> >> >> Support
> >> >>>>> Time
> >> >>>>>>>> Travel
> >> >>>>>>>>>> In
> >> >>>>>>>>>>>>>> Batch
> >> >>>>>>>>>>>>>> Mode [1]
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
> >> >>>>>> versions
> >> >>>>>>> of
> >> >>>>>>>>>> data.
> >> >>>>>>>>>>>> It
> >> >>>>>>>>>>>>>> allows users to specify a point in time and retrieve
> >> >>> the
> >> >>>>>> data
> >> >>>>>>>> and
> >> >>>>>>>>>>>>>> schema of
> >> >>>>>>>>>>>>>> a table as it appeared at that time. With time
> >> >> travel,
> >> >>>>> users
> >> >>>>>>> can
> >> >>>>>>>>>>> easily
> >> >>>>>>>>>>>>>> analyze and compare historical versions of data.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> With the widespread use of data lake systems such as
> >> >>>>> Paimon,
> >> >>>>>>>>>> Iceberg,
> >> >>>>>>>>>>>> and
> >> >>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
> >> >>> users'
> >> >>>>>> data
> >> >>>>>>>>>>> analysis.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Looking forward to your opinions, any suggestions are
> >> >>>>>>> welcomed.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> 1.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>
> >> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Best.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Feng
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> --
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Best,
> >> >>>>>>>>>>> Benchao Li
> >> >>>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> --
> >> >>>>>>>>>
> >> >>>>>>>>> Best,
> >> >>>>>>>>> Benchao Li
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>>
> >> >>>>>> Best,
> >> >>>>>> Benchao Li
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> >>
>


-- 

Best,
Benchao Li

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by yuxia <lu...@alumni.sjtu.edu.cn>.
Hi, Feng Jin.
I think the concern of Leonard may be the inconsistency of the behavior of TIMESTAMP '2023-04-27 00:00:00' beween timetravel and other sql statement.

For the normal sql:
`SELECT TIMESTAMP '2023-04-27 00:00:00'`, we won't consider timezone.
But for the sql for timetravl:
`SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP '2023-04-27 00:00:00'`, we will consider the timezone and convert to UTC timestamp.

The concern is valid. But for time travel, most style of engines, Spark[1], Hive[2], Trino[3] also do the time conversion with considering the seesion time zone. I think we can align them to avoid the inconsistency to other engines and provide convenience for the external connectors while integrating Flink's time travel API.

Regarding the inconsistency, I think we can consider time-travel as a specical case, and we do needs to highlight this in this FLIP.
As for "violate the restriction outlined in FLINK-21978[1]", since we cast timestamp to epochMillis only for the internal use, and won't expose it to users, I don't think it will violate the restriction.
Btw, please add a brief desc to explain the meaning of the parameter `timestamp` in method `CatalogBaseTable getTable(ObjectPath tablePath, long timestamp)`. Maybe something like "timestamp of the table snapt, which is millseconds since 1970-01-01 00:00:00 UTC".

[1] https://github.com/apache/spark/blob/0ed48feab65f2d86f5dda3e16bd53f2f795f5bc5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TimeTravelSpec.scala#L56
[2] https://github.com/apache/hive/blob/f5e69dc38d7ff26c70be19adc9d1a3ae90dc4cf2/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L989
[3] https://github.com/trinodb/trino/blob/2433d9e60f1abb0d85c32374c1758525560e1a86/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L443


Best regards,
Yuxia

----- 原始邮件 -----
发件人: "Feng Jin" <ji...@gmail.com>
收件人: "dev" <de...@flink.apache.org>
发送时间: 星期二, 2023年 6 月 06日 下午 10:15:47
主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Hi everyone

Thanks everyone for your input.


@Yun

>  I think you could add descriptions of how to align backfill time travel
with querying the latest data. And I think you should also update the
"Discussion thread" in the original FLIP.

Thank you for the suggestion, I will update it in the document.

>  I have a question about getting the table schema from the catalog. I'm
not sure whether the Catalog#getTable(tablePath, timestamp) will be called
only once.

I understand that in a query, the schema of the table is determined before
execution. The schema used will be based on the latest schema within the
TimeTravel period.

In addition, due to current syntax limitations, we are unable to support
the use of BETWEEN AND.


@Jing

>  Would you like to update your thoughts described in your previous email
about why SupportsTimeTravel has been rejected into the FLIP?

Sure,  I updated the doc.


>    Since we always directly add overload methods into Catalog according
to new requirements, which makes the interface bloated

Your concern is valid. If we need to support the long type version in the
future, we may have to add another method "getTable(ObjectPath, long
version)". However, I understand that
"Catalog.getTable(tablePath).on(timeStamp)" may not meet the requirements.
The timestamp is for Catalog's use, and Catalog obtains the corresponding
schema based on this time.


@liu @Regards

I am very sorry for the unclear description in the document. I have updated
relevant descriptions regarding why it needs to be implemented in Catalog.

Travel not only requires obtaining data at the corresponding time point,
but also requires the corresponding Schema at that time point


@Shammon

> Flink or connector such as  iceberg/paimon can create sources from the
`CatalogBaseTable` directly without the need to get the snapshot ID from
`CatalogTable.getSnapshot()`.  What do you think of it?

You are right, we don't need the getSnapshot interface for PaimonCatalog or
IcebergCatalog tables, but we may need it for temporary tables.



Best,
Feng


On Tue, Jun 6, 2023 at 9:32 PM Feng Jin <ji...@gmail.com> wrote:

> Sorry I replied to the wrong mail. Please ignore the last email.
>
>
> Hi Leonard
>
> > 1. Unification SQL
>
> I agree that it is crucial for us to support both batch and streaming
> processing.  The current design allows for the support of both batch and
> streaming processing. I'll update the FLIP later.
>
>
> >  2.Semantics
>
> In my opinion, it would be feasible to perform the conversion based on the
> current session time, regardless of whether it is TIMESTAMP or
> TIMESTAMP_LTZ.
>
> However, this may indeed violate the restriction outlined in
> FLINK-21978[1]  as Benchao mentioned, and I am uncertain as to whether it
> is reasonable.
>
>
> >   3.  Some external systems may use timestamp value to mark a version,
> but others may use version number、file position、log offset.
>
> It is true that most systems support time-related operations, and I
> believe that the current design is compatible with most systems. However,
> if we want to support long data type, it may require Calcite to support the
> VERSION AS OF syntax. I understand that this is something that we may need
> to consider in the future.
>
>
> Best,
> Feng
>
> [1] https://issues.apache.org/jira/browse/FLINK-21978
>
> On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com> wrote:
>
>> Hi, Feng
>>
>> Thanks for driving this FLIP, very impressive feature that users want,
>> I’ve some quick questions here.
>>
>> 1.Unification SQL:
>>         The snapshot  concept exists both in Batch mode and Streaming
>> mode,  could we consider a unified proposal? I think users won’t another
>> SQL syntax named
>> Time travel for Streaming mode.
>>
>> 2.Semantics:
>>         Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a long
>> timestamp value (getTable(ObjectPath tablePath, long timestamp)) we need
>> two information i.e. a TIMESTAMP value and current session timezone,  how
>> we deal the value with current proposed SQL syntax.
>>
>> 3. Is it enough using sinlge timestamp to track a snapshot(version) of
>> external table?   Some external systems may use timestamp value to mark a
>> version, but others may use version number、file position、log offset.
>>
>> Best,
>> Leonard
>>
>>
>>
>> > On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
>> >
>> > Hi Feng,
>> >
>> > I think this FLIP would provide one important feature to unify the
>> stream-SQL and batch-SQL when we backfill the historical data in batch mode.
>> >
>> > For the "Syntax" session, I think you could add descriptions of how to
>> align backfill time travel with querying the latest data. And I think you
>> should also update the "Discussion thread" in the original FLIP.
>> >
>> > Moreover, I have a question about getting the table schema from the
>> catalog. I'm not sure whether the Catalog#getTable(tablePath, timestamp)
>> will be called only once. If we have a backfill query between 2023-05-29
>> and 2023-06-04 in the past week, and the table schema changed on
>> 2023-06-01, will the query below detect the schema changes during backfill
>> the whole week?
>> >
>> > SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
>> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
>> >
>> > Best
>> > Yun Tang
>> >
>> >
>> > ________________________________
>> > From: Shammon FY <zj...@gmail.com>
>> > Sent: Thursday, June 1, 2023 17:57
>> > To: dev@flink.apache.org <de...@flink.apache.org>
>> > Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>> >
>> > Hi Feng,
>> >
>> > I have one minor comment about the public interface `Optional<Long>
>> > getSnapshot()` in the `CatalogTable`.
>> >
>> > As we can get tables from the new method `Catalog.getTable(ObjectPath
>> > tablePath, long timestamp)`, I think the returned `CatalogBaseTable`
>> will
>> > have the information of timestamp. Flink or connector such as
>> > iceberg/paimon can create sources from the `CatalogBaseTable` directly
>> > without the need to get the snapshot ID from
>> `CatalogTable.getSnapshot()`.
>> > What do you think of it?
>> >
>> > Best,
>> > Shammon FY
>> >
>> >
>> > On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <ji...@ververica.com.invalid>
>> wrote:
>> >
>> >> Hi Feng,
>> >>
>> >> Thanks for the proposal! Very interesting feature. Would you like to
>> update
>> >> your thoughts described in your previous email about why
>> SupportsTimeTravel
>> >> has been rejected into the FLIP? This will help readers understand the
>> >> context (in the future).
>> >>
>> >> Since we always directly add overload methods into Catalog according
>> to new
>> >> requirements, which makes the interface bloated. Just out of curiosity,
>> >> does it make sense to introduce some DSL design? Like
>> >> Catalog.getTable(tablePath).on(timeStamp),
>> >> Catalog.getTable(tablePath).current() for the most current version, and
>> >> more room for further extension like timestamp range, etc. I haven't
>> read
>> >> all the source code yet and I'm not sure if it is possible. But a
>> >> design like this will keep the Catalog API lean and the API/DSL will be
>> >> self described and easier to use.
>> >>
>> >> Best regards,
>> >> Jing
>> >>
>> >>
>> >> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
>> >> krzysiek.chmielewski@gmail.com> wrote:
>> >>
>> >>> Ok after second though I'm retracting my previous statement about
>> Catalog
>> >>> changes you proposed.
>> >>> I do see a benefit for Delta connector actually with this change and
>> see
>> >>> why this could be coupled with Catalog.
>> >>>
>> >>> Delta Connector SQL support, also ships a Delta Catalog implementation
>> >> for
>> >>> Flink.
>> >>> For Delta Catalog, table schema information is fetched from underlying
>> >>> _delta_log and not stored in metastore. For time travel we actually
>> had a
>> >>> problem, that if we would like to timetravel back to some old version,
>> >>> where schema was slightly different, then we would have a conflict
>> since
>> >>> Catalog would return current schema and not how it was for version X.
>> >>>
>> >>> With your change, our Delta Catalog can actually fetch schema for
>> >> version X
>> >>> and send it to DeltaTableFactory. Currency, Catalog can fetch only
>> >> current
>> >>> version. What we would also need however is version (number/timestamp)
>> >> for
>> >>> this table passed to DynamicTableFactory so we could properly set
>> Delta
>> >>> standalone library.
>> >>>
>> >>> Regards,
>> >>> Krzysztof
>> >>>
>> >>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
>> >>> krzysiek.chmielewski@gmail.com> napisał(a):
>> >>>
>> >>>> Hi,
>> >>>> happy to see such a feature.
>> >>>> Small note from my end regarding Catalog changes.
>> >>>>
>> >>>> TL;DR
>> >>>> I don't think it is necessary to delegate this feature to the
>> catalog.
>> >> I
>> >>>> think that since "timetravel" is per job/query property, its should
>> not
>> >>> be
>> >>>> coupled with the Catalog or table definition. In my opinion this is
>> >>>> something that DynamicTableFactory only has to know about. I would
>> >> rather
>> >>>> see this feature as it is - SQL syntax enhancement but delegate
>> clearly
>> >>> to
>> >>>> DynamicTableFactory.
>> >>>>
>> >>>> I've implemented timetravel feature for Delta Connector  [1]  using
>> >>>> current Flink API.
>> >>>> Docs are pending code review, but you can find them here [2] and
>> >> examples
>> >>>> are available here [3]
>> >>>>
>> >>>> The timetravel feature that I've implemented is based on Flink Query
>> >>>> hints.
>> >>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
>> >>>>
>> >>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is
>> handled
>> >>> not
>> >>>> by Catalog but by DyntamicTableFactory implementation for Delta
>> >>> connector.
>> >>>> The value of this property is passed to Delta standalone lib API that
>> >>>> returns table view for given version.
>> >>>>
>> >>>> I'm not sure how/if proposed change could benefit Delta connector
>> >>>> implementation for this feature.
>> >>>>
>> >>>> Thanks,
>> >>>> Krzysztof
>> >>>>
>> >>>> [1]
>> >>>>
>> >>>
>> >>
>> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
>> >>>> [2]
>> https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
>> >>>> [3]
>> >>>>
>> >>>
>> >>
>> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
>> >>>>
>> >>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com> napisał(a):
>> >>>>
>> >>>>> Hi, Feng
>> >>>>>
>> >>>>> Thanks for driving this FLIP, Time travel is very useful for Flink
>> >>>>> integrate with data lake system. I have one question why the
>> >>>>> implementation
>> >>>>> of TimeTravel is delegated to Catalog? Assuming that we use Flink to
>> >>> query
>> >>>>> Hudi table with the time travel syntax, but we don't use the
>> >>> HudiCatalog,
>> >>>>> instead, we register the hudi table to InMemoryCatalog,  can we
>> >> support
>> >>>>> time travel for Hudi table in this case?
>> >>>>> In contrast, I think time travel should bind to connector instead of
>> >>>>> Catalog, so the rejected alternative should be considered.
>> >>>>>
>> >>>>> Best,
>> >>>>> Ron
>> >>>>>
>> >>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
>> >>>>>
>> >>>>>> Hi, Feng.
>> >>>>>> Notice this FLIP only support batch mode for time travel.  Would it
>> >>> also
>> >>>>>> make sense to support stream mode to a read a snapshot of the table
>> >>> as a
>> >>>>>> bounded stream?
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>> Yuxia
>> >>>>>>
>> >>>>>> ----- 原始邮件 -----
>> >>>>>> 发件人: "Benchao Li" <li...@apache.org>
>> >>>>>> 收件人: "dev" <de...@flink.apache.org>
>> >>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
>> >>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>> >>>>>>
>> >>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
>> >>>>>>
>> >>>>>> This also depends on whether this is defined in standard or any
>> >> known
>> >>>>>> databases that have implemented this. If not, it would be hard to
>> >> push
>> >>>>> it
>> >>>>>> to Calcite.
>> >>>>>>
>> >>>>>> # getTable(ObjectPath object, long timestamp)
>> >>>>>>
>> >>>>>> Then we again come to the problem of "casting between timestamp and
>> >>>>>> numeric", which has been disabled in FLINK-21978[1]. If you're
>> gonna
>> >>> use
>> >>>>>> this, then we need to clarify that problem first.
>> >>>>>>
>> >>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
>> >>>>>>
>> >>>>>>
>> >>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
>> >>>>>>
>> >>>>>>> hi, thanks for your reply.
>> >>>>>>>
>> >>>>>>> @Benchao
>> >>>>>>>> did you consider the pushdown abilities compatible
>> >>>>>>>
>> >>>>>>> In the current design, the implementation of TimeTravel is
>> >> delegated
>> >>>>> to
>> >>>>>>> Catalog. We have added a function called getTable(ObjectPath
>> >>>>> tablePath,
>> >>>>>>> long timestamp) to obtain the corresponding CatalogBaseTable at a
>> >>>>>> specific
>> >>>>>>> time.  Therefore, I think it will not have any impact on the
>> >>> original
>> >>>>>>> pushdown abilities.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>>  I see there is a rejected  design for adding
>> >> SupportsTimeTravel,
>> >>>>> but
>> >>>>>> I
>> >>>>>>> didn't see the alternative in  the FLIP doc
>> >>>>>>>
>> >>>>>>> Sorry, the document description is not very clear.  Regarding
>> >>> whether
>> >>>>> to
>> >>>>>>> support SupportTimeTravel, I have discussed it with yuxia. Since
>> >> we
>> >>>>> have
>> >>>>>>> already passed the corresponding time in getTable(ObjectPath, long
>> >>>>>>> timestamp) of Catalog, SupportTimeTravel may not be necessary.
>> >>>>>>>
>> >>>>>>> In getTable(ObjectPath object, long timestamp), we can obtain the
>> >>>>> schema
>> >>>>>> of
>> >>>>>>> the corresponding time point and put the SNAPSHOT that needs to be
>> >>>>>> consumed
>> >>>>>>> into options.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> @Shammon
>> >>>>>>>> Could we support this in Flink too?
>> >>>>>>>
>> >>>>>>> I personally think it's possible, but limited by Calcite's syntax
>> >>>>>>> restrictions. I believe we should first support this syntax in
>> >>>>> Calcite.
>> >>>>>>> Currently, I think it may not be easy  to support this syntax in
>> >>>>> Flink's
>> >>>>>>> parser. @Benchao, what do you think? Can Calcite support this
>> >> syntax
>> >>>>>>> ` VERSION AS OF`  ?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Feng.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <zj...@gmail.com>
>> >>> wrote:
>> >>>>>>>
>> >>>>>>>> Thanks Feng, the feature of time travel sounds great!
>> >>>>>>>>
>> >>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
>> >> iceberg
>> >>>>>>> support
>> >>>>>>>> snapshot or version. For example, users can query snapshot 1 for
>> >>>>> paimon
>> >>>>>>> by
>> >>>>>>>> the following statement
>> >>>>>>>> SELECT * FROM t VERSION AS OF 1
>> >>>>>>>>
>> >>>>>>>> Could we support this in Flink too?
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Shammon FY
>> >>>>>>>>
>> >>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
>> >> libenchao@apache.org>
>> >>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Regarding the implementation, did you consider the pushdown
>> >>>>> abilities
>> >>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
>> >>> partition
>> >>>>>>>> pushdown.
>> >>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
>> >> have a
>> >>>>>>> concern
>> >>>>>>>>> about this. Of course, it depends on your implementation
>> >> detail,
>> >>>>> what
>> >>>>>>> is
>> >>>>>>>>> important is that we'd better add some cross tests for these.
>> >>>>>>>>>
>> >>>>>>>>> Regarding the interface exposed to Connector, I see there is a
>> >>>>>> rejected
>> >>>>>>>>> design for adding SupportsTimeTravel, but I didn't see the
>> >>>>>> alternative
>> >>>>>>> in
>> >>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
>> >> clarify
>> >>>>>>> because
>> >>>>>>>> we
>> >>>>>>>>> need to know whether the Connector supports this, and what
>> >>>>>>>> column/metadata
>> >>>>>>>>> corresponds to 'system_time'.
>> >>>>>>>>>
>> >>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
>> >>>>>>>>>
>> >>>>>>>>>> Thanks for your reply
>> >>>>>>>>>>
>> >>>>>>>>>> @Timo @BenChao @yuxia
>> >>>>>>>>>>
>> >>>>>>>>>> Sorry for the mistake,  Currently , calcite only supports
>> >>> `FOR
>> >>>>>>>>> SYSTEM_TIME
>> >>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
>> >> OF`
>> >>> .
>> >>>>>> I've
>> >>>>>>>>>> updated the syntax part of the FLIP.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> @Timo
>> >>>>>>>>>>
>> >>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
>> >>>>>>>>>>
>> >>>>>>>>>> Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ
>> >> and
>> >>>>> then
>> >>>>>>>>> convert
>> >>>>>>>>>> it into a long value.
>> >>>>>>>>>>
>> >>>>>>>>>>> How do we want to query the most recent version of a table
>> >>>>>>>>>>
>> >>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
>> >>> cause
>> >>>>>>>>>> inconsistency with the real-time concept.
>> >>>>>>>>>> However, from my personal understanding, the scope of  `AS
>> >> OF
>> >>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
>> >> record.
>> >>>>> So,
>> >>>>>> I
>> >>>>>>>>> think
>> >>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
>> >>>>>>>>>> Additionally, if no version is specified, the latest version
>> >>>>> should
>> >>>>>>> be
>> >>>>>>>>> used
>> >>>>>>>>>> by default.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Feng
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
>> >>>>> luoyuxia@alumni.sjtu.edu.cn
>> >>>>>>>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
>> >>> introduce
>> >>>>>> time
>> >>>>>>>>> travel
>> >>>>>>>>>>> to Flink to have a better integration with external data
>> >>>>> soruces.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I also share same concern about the syntax.
>> >>>>>>>>>>> I see in the part of `Whether to support other syntax
>> >>>>>>>> implementations`
>> >>>>>>>>> in
>> >>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
>> >>>>> SYSTEM_TIME
>> >>>>>>> AS
>> >>>>>>>>> OF`,
>> >>>>>>>>>>> right?
>> >>>>>>>>>>> But the the syntax part in this FLIP, it seems to be `AS
>> >> OF
>> >>>>>>>> TIMESTAMP`
>> >>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a mistake
>> >> or
>> >>>>> by
>> >>>>>>>> design?
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best regards,
>> >>>>>>>>>>> Yuxia
>> >>>>>>>>>>>
>> >>>>>>>>>>> ----- 原始邮件 -----
>> >>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
>> >>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
>> >>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
>> >>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
>> >>> Mode
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
>> >>>>> instead
>> >>>>>> of
>> >>>>>>>> `FOR
>> >>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in
>> >>> the
>> >>>>> SQL
>> >>>>>>>>>> standard
>> >>>>>>>>>>> and has been supported in some database vendors such as
>> >> SQL
>> >>>>>> Server.
>> >>>>>>>>> About
>> >>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
>> >>> supports
>> >>>>>>> this,
>> >>>>>>>> if
>> >>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
>> >> and
>> >>> I
>> >>>>>> would
>> >>>>>>>>> give
>> >>>>>>>>>> a
>> >>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
>> >> use
>> >>>>> `FOR
>> >>>>>>>> SYSTEM
>> >>>>>>>>>> AS
>> >>>>>>>>>>> OF`.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四 19:02写道:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Also: How do we want to query the most recent version
>> >> of a
>> >>>>>> table?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but according
>> >> to
>> >>>>> the
>> >>>>>>> docs
>> >>>>>>>>>> both
>> >>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
>> >> concerning
>> >>>>> is
>> >>>>>> the
>> >>>>>>>> it
>> >>>>>>>>>>>> actually is evalated row-based:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Returns the current SQL timestamp in the local time
>> >>> zone,
>> >>>>>> the
>> >>>>>>>>> return
>> >>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
>> >> record
>> >>> in
>> >>>>>>>>> streaming
>> >>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
>> >> query
>> >>>>>> starts
>> >>>>>>>> and
>> >>>>>>>>>>>> uses the same result for every row.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> This could make it difficult to explain in a join
>> >> scenario
>> >>>>> of
>> >>>>>>>>> multiple
>> >>>>>>>>>>>> snapshotted tables.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Regards,
>> >>>>>>>>>>>> Timo
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
>> >>>>>>>>>>>>> Hi Feng,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
>> >> sense
>> >>> to
>> >>>>>>>> finally
>> >>>>>>>>>>>>> support querying tables at a specific point in time or
>> >>>>>>> hopefully
>> >>>>>>>>> also
>> >>>>>>>>>>>>> ranges soon. Following time-versioned tables.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Here is some feedback from my side:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> 1. Syntax
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for this
>> >>> but
>> >>>>>> not
>> >>>>>>>> `FOR
>> >>>>>>>>>>>>> SYSTEM_TIME AS OF`?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> It would be great to support `AS OF` also for
>> >>>>> time-versioned
>> >>>>>>>> joins
>> >>>>>>>>>> and
>> >>>>>>>>>>>>> have a unified and short syntax.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
>> >>> this
>> >>>>>>>>> available
>> >>>>>>>>>> in
>> >>>>>>>>>>>>> Flink earlier by copying the corresponding classes
>> >> until
>> >>>>> the
>> >>>>>>> next
>> >>>>>>>>>>>>> Calcite upgrade is performed.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> 2. Semantics
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
>> >>>>>> timestamp
>> >>>>>>>>> types
>> >>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
>> >>>>>> TIMESTAMP
>> >>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
>> >>>>> timestamp
>> >>>>>> be?
>> >>>>>>>> We
>> >>>>>>>>>> will
>> >>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> We definely need to clarify this because the past has
>> >>>>> shown
>> >>>>>>> that
>> >>>>>>>>>>>>> daylight saving times make our lives hard.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>> Timo
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
>> >>>>>>>>>>>>>> Hi, everyone.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
>> >> Support
>> >>>>> Time
>> >>>>>>>> Travel
>> >>>>>>>>>> In
>> >>>>>>>>>>>>>> Batch
>> >>>>>>>>>>>>>> Mode [1]
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
>> >>>>>> versions
>> >>>>>>> of
>> >>>>>>>>>> data.
>> >>>>>>>>>>>> It
>> >>>>>>>>>>>>>> allows users to specify a point in time and retrieve
>> >>> the
>> >>>>>> data
>> >>>>>>>> and
>> >>>>>>>>>>>>>> schema of
>> >>>>>>>>>>>>>> a table as it appeared at that time. With time
>> >> travel,
>> >>>>> users
>> >>>>>>> can
>> >>>>>>>>>>> easily
>> >>>>>>>>>>>>>> analyze and compare historical versions of data.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> With the widespread use of data lake systems such as
>> >>>>> Paimon,
>> >>>>>>>>>> Iceberg,
>> >>>>>>>>>>>> and
>> >>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
>> >>> users'
>> >>>>>> data
>> >>>>>>>>>>> analysis.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Looking forward to your opinions, any suggestions are
>> >>>>>>> welcomed.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 1.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Best.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Feng
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>> Benchao Li
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Best,
>> >>>>>>>>> Benchao Li
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> Benchao Li
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>>
>>

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Feng Jin <ji...@gmail.com>.
Hi Leonard,

Thanks for your reply.

> 1. a  How to construct a CatalogDescriptor ?

I think it would be helpful to add a method for constructing a
CatalogDescriptor, as you mentioned in 1.c. I will update the documentation
later.

> 1.b  How to visit the fields ? Could we use Configuration instead of
Map<String, String> ?

I believe that the use of Map<String, String> options is only intended for
creating a catalog and not for accessing internal parameters.

Since all of the relevant parameters for CREATE CATALOG are also stored in
Map<String, String> options, my understanding is that using Map<String,
String> options should suffice.

Here is the implementation of execute CREATE CATALOG statement.
```java
private TableResultInternal createCatalog(CreateCatalogOperation operation)
{
String exMsg = getDDLOpExecuteErrorMsg(operation.asSummaryString());
try {
String catalogName = operation.getCatalogName();
Map<String, String> properties = operation.getProperties();

Catalog catalog =
FactoryUtil.createCatalog(
catalogName,
properties,
tableConfig,
resourceManager.getUserClassLoader());
catalogManager.registerCatalog(catalogName, catalog);

return TableResultImpl.TABLE_RESULT_OK;
} catch (CatalogException e) {
throw new ValidationException(exMsg, e);
}
}
```


>  2. Do we have plan to offer a default CatalogStore if user didn’t config
this?

Yes, the in-memory catalogStore will be used as the default CatalogStore
even if the user has not configured one


Best,
Feng



On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com> wrote:

> Hi, Feng
>
> Thanks for driving this FLIP, very impressive feature that users want,
> I’ve some quick questions here.
>
> 1.Unification SQL:
>         The snapshot  concept exists both in Batch mode and Streaming
> mode,  could we consider a unified proposal? I think users won’t another
> SQL syntax named
> Time travel for Streaming mode.
>
> 2.Semantics:
>         Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a long
> timestamp value (getTable(ObjectPath tablePath, long timestamp)) we need
> two information i.e. a TIMESTAMP value and current session timezone,  how
> we deal the value with current proposed SQL syntax.
>
> 3. Is it enough using sinlge timestamp to track a snapshot(version) of
> external table?   Some external systems may use timestamp value to mark a
> version, but others may use version number、file position、log offset.
>
> Best,
> Leonard
>
>
>
> > On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
> >
> > Hi Feng,
> >
> > I think this FLIP would provide one important feature to unify the
> stream-SQL and batch-SQL when we backfill the historical data in batch mode.
> >
> > For the "Syntax" session, I think you could add descriptions of how to
> align backfill time travel with querying the latest data. And I think you
> should also update the "Discussion thread" in the original FLIP.
> >
> > Moreover, I have a question about getting the table schema from the
> catalog. I'm not sure whether the Catalog#getTable(tablePath, timestamp)
> will be called only once. If we have a backfill query between 2023-05-29
> and 2023-06-04 in the past week, and the table schema changed on
> 2023-06-01, will the query below detect the schema changes during backfill
> the whole week?
> >
> > SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
> >
> > Best
> > Yun Tang
> >
> >
> > ________________________________
> > From: Shammon FY <zj...@gmail.com>
> > Sent: Thursday, June 1, 2023 17:57
> > To: dev@flink.apache.org <de...@flink.apache.org>
> > Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >
> > Hi Feng,
> >
> > I have one minor comment about the public interface `Optional<Long>
> > getSnapshot()` in the `CatalogTable`.
> >
> > As we can get tables from the new method `Catalog.getTable(ObjectPath
> > tablePath, long timestamp)`, I think the returned `CatalogBaseTable` will
> > have the information of timestamp. Flink or connector such as
> > iceberg/paimon can create sources from the `CatalogBaseTable` directly
> > without the need to get the snapshot ID from
> `CatalogTable.getSnapshot()`.
> > What do you think of it?
> >
> > Best,
> > Shammon FY
> >
> >
> > On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <ji...@ververica.com.invalid>
> wrote:
> >
> >> Hi Feng,
> >>
> >> Thanks for the proposal! Very interesting feature. Would you like to
> update
> >> your thoughts described in your previous email about why
> SupportsTimeTravel
> >> has been rejected into the FLIP? This will help readers understand the
> >> context (in the future).
> >>
> >> Since we always directly add overload methods into Catalog according to
> new
> >> requirements, which makes the interface bloated. Just out of curiosity,
> >> does it make sense to introduce some DSL design? Like
> >> Catalog.getTable(tablePath).on(timeStamp),
> >> Catalog.getTable(tablePath).current() for the most current version, and
> >> more room for further extension like timestamp range, etc. I haven't
> read
> >> all the source code yet and I'm not sure if it is possible. But a
> >> design like this will keep the Catalog API lean and the API/DSL will be
> >> self described and easier to use.
> >>
> >> Best regards,
> >> Jing
> >>
> >>
> >> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
> >> krzysiek.chmielewski@gmail.com> wrote:
> >>
> >>> Ok after second though I'm retracting my previous statement about
> Catalog
> >>> changes you proposed.
> >>> I do see a benefit for Delta connector actually with this change and
> see
> >>> why this could be coupled with Catalog.
> >>>
> >>> Delta Connector SQL support, also ships a Delta Catalog implementation
> >> for
> >>> Flink.
> >>> For Delta Catalog, table schema information is fetched from underlying
> >>> _delta_log and not stored in metastore. For time travel we actually
> had a
> >>> problem, that if we would like to timetravel back to some old version,
> >>> where schema was slightly different, then we would have a conflict
> since
> >>> Catalog would return current schema and not how it was for version X.
> >>>
> >>> With your change, our Delta Catalog can actually fetch schema for
> >> version X
> >>> and send it to DeltaTableFactory. Currency, Catalog can fetch only
> >> current
> >>> version. What we would also need however is version (number/timestamp)
> >> for
> >>> this table passed to DynamicTableFactory so we could properly set Delta
> >>> standalone library.
> >>>
> >>> Regards,
> >>> Krzysztof
> >>>
> >>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
> >>> krzysiek.chmielewski@gmail.com> napisał(a):
> >>>
> >>>> Hi,
> >>>> happy to see such a feature.
> >>>> Small note from my end regarding Catalog changes.
> >>>>
> >>>> TL;DR
> >>>> I don't think it is necessary to delegate this feature to the catalog.
> >> I
> >>>> think that since "timetravel" is per job/query property, its should
> not
> >>> be
> >>>> coupled with the Catalog or table definition. In my opinion this is
> >>>> something that DynamicTableFactory only has to know about. I would
> >> rather
> >>>> see this feature as it is - SQL syntax enhancement but delegate
> clearly
> >>> to
> >>>> DynamicTableFactory.
> >>>>
> >>>> I've implemented timetravel feature for Delta Connector  [1]  using
> >>>> current Flink API.
> >>>> Docs are pending code review, but you can find them here [2] and
> >> examples
> >>>> are available here [3]
> >>>>
> >>>> The timetravel feature that I've implemented is based on Flink Query
> >>>> hints.
> >>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
> >>>>
> >>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is handled
> >>> not
> >>>> by Catalog but by DyntamicTableFactory implementation for Delta
> >>> connector.
> >>>> The value of this property is passed to Delta standalone lib API that
> >>>> returns table view for given version.
> >>>>
> >>>> I'm not sure how/if proposed change could benefit Delta connector
> >>>> implementation for this feature.
> >>>>
> >>>> Thanks,
> >>>> Krzysztof
> >>>>
> >>>> [1]
> >>>>
> >>>
> >>
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> >>>> [2] https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> >>>> [3]
> >>>>
> >>>
> >>
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> >>>>
> >>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com> napisał(a):
> >>>>
> >>>>> Hi, Feng
> >>>>>
> >>>>> Thanks for driving this FLIP, Time travel is very useful for Flink
> >>>>> integrate with data lake system. I have one question why the
> >>>>> implementation
> >>>>> of TimeTravel is delegated to Catalog? Assuming that we use Flink to
> >>> query
> >>>>> Hudi table with the time travel syntax, but we don't use the
> >>> HudiCatalog,
> >>>>> instead, we register the hudi table to InMemoryCatalog,  can we
> >> support
> >>>>> time travel for Hudi table in this case?
> >>>>> In contrast, I think time travel should bind to connector instead of
> >>>>> Catalog, so the rejected alternative should be considered.
> >>>>>
> >>>>> Best,
> >>>>> Ron
> >>>>>
> >>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
> >>>>>
> >>>>>> Hi, Feng.
> >>>>>> Notice this FLIP only support batch mode for time travel.  Would it
> >>> also
> >>>>>> make sense to support stream mode to a read a snapshot of the table
> >>> as a
> >>>>>> bounded stream?
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Yuxia
> >>>>>>
> >>>>>> ----- 原始邮件 -----
> >>>>>> 发件人: "Benchao Li" <li...@apache.org>
> >>>>>> 收件人: "dev" <de...@flink.apache.org>
> >>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> >>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >>>>>>
> >>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
> >>>>>>
> >>>>>> This also depends on whether this is defined in standard or any
> >> known
> >>>>>> databases that have implemented this. If not, it would be hard to
> >> push
> >>>>> it
> >>>>>> to Calcite.
> >>>>>>
> >>>>>> # getTable(ObjectPath object, long timestamp)
> >>>>>>
> >>>>>> Then we again come to the problem of "casting between timestamp and
> >>>>>> numeric", which has been disabled in FLINK-21978[1]. If you're gonna
> >>> use
> >>>>>> this, then we need to clarify that problem first.
> >>>>>>
> >>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
> >>>>>>
> >>>>>>
> >>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
> >>>>>>
> >>>>>>> hi, thanks for your reply.
> >>>>>>>
> >>>>>>> @Benchao
> >>>>>>>> did you consider the pushdown abilities compatible
> >>>>>>>
> >>>>>>> In the current design, the implementation of TimeTravel is
> >> delegated
> >>>>> to
> >>>>>>> Catalog. We have added a function called getTable(ObjectPath
> >>>>> tablePath,
> >>>>>>> long timestamp) to obtain the corresponding CatalogBaseTable at a
> >>>>>> specific
> >>>>>>> time.  Therefore, I think it will not have any impact on the
> >>> original
> >>>>>>> pushdown abilities.
> >>>>>>>
> >>>>>>>
> >>>>>>>>  I see there is a rejected  design for adding
> >> SupportsTimeTravel,
> >>>>> but
> >>>>>> I
> >>>>>>> didn't see the alternative in  the FLIP doc
> >>>>>>>
> >>>>>>> Sorry, the document description is not very clear.  Regarding
> >>> whether
> >>>>> to
> >>>>>>> support SupportTimeTravel, I have discussed it with yuxia. Since
> >> we
> >>>>> have
> >>>>>>> already passed the corresponding time in getTable(ObjectPath, long
> >>>>>>> timestamp) of Catalog, SupportTimeTravel may not be necessary.
> >>>>>>>
> >>>>>>> In getTable(ObjectPath object, long timestamp), we can obtain the
> >>>>> schema
> >>>>>> of
> >>>>>>> the corresponding time point and put the SNAPSHOT that needs to be
> >>>>>> consumed
> >>>>>>> into options.
> >>>>>>>
> >>>>>>>
> >>>>>>> @Shammon
> >>>>>>>> Could we support this in Flink too?
> >>>>>>>
> >>>>>>> I personally think it's possible, but limited by Calcite's syntax
> >>>>>>> restrictions. I believe we should first support this syntax in
> >>>>> Calcite.
> >>>>>>> Currently, I think it may not be easy  to support this syntax in
> >>>>> Flink's
> >>>>>>> parser. @Benchao, what do you think? Can Calcite support this
> >> syntax
> >>>>>>> ` VERSION AS OF`  ?
> >>>>>>>
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Feng.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <zj...@gmail.com>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Thanks Feng, the feature of time travel sounds great!
> >>>>>>>>
> >>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
> >> iceberg
> >>>>>>> support
> >>>>>>>> snapshot or version. For example, users can query snapshot 1 for
> >>>>> paimon
> >>>>>>> by
> >>>>>>>> the following statement
> >>>>>>>> SELECT * FROM t VERSION AS OF 1
> >>>>>>>>
> >>>>>>>> Could we support this in Flink too?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Shammon FY
> >>>>>>>>
> >>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
> >> libenchao@apache.org>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Regarding the implementation, did you consider the pushdown
> >>>>> abilities
> >>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
> >>> partition
> >>>>>>>> pushdown.
> >>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
> >> have a
> >>>>>>> concern
> >>>>>>>>> about this. Of course, it depends on your implementation
> >> detail,
> >>>>> what
> >>>>>>> is
> >>>>>>>>> important is that we'd better add some cross tests for these.
> >>>>>>>>>
> >>>>>>>>> Regarding the interface exposed to Connector, I see there is a
> >>>>>> rejected
> >>>>>>>>> design for adding SupportsTimeTravel, but I didn't see the
> >>>>>> alternative
> >>>>>>> in
> >>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
> >> clarify
> >>>>>>> because
> >>>>>>>> we
> >>>>>>>>> need to know whether the Connector supports this, and what
> >>>>>>>> column/metadata
> >>>>>>>>> corresponds to 'system_time'.
> >>>>>>>>>
> >>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
> >>>>>>>>>
> >>>>>>>>>> Thanks for your reply
> >>>>>>>>>>
> >>>>>>>>>> @Timo @BenChao @yuxia
> >>>>>>>>>>
> >>>>>>>>>> Sorry for the mistake,  Currently , calcite only supports
> >>> `FOR
> >>>>>>>>> SYSTEM_TIME
> >>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
> >> OF`
> >>> .
> >>>>>> I've
> >>>>>>>>>> updated the syntax part of the FLIP.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> @Timo
> >>>>>>>>>>
> >>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
> >>>>>>>>>>
> >>>>>>>>>> Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ
> >> and
> >>>>> then
> >>>>>>>>> convert
> >>>>>>>>>> it into a long value.
> >>>>>>>>>>
> >>>>>>>>>>> How do we want to query the most recent version of a table
> >>>>>>>>>>
> >>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
> >>> cause
> >>>>>>>>>> inconsistency with the real-time concept.
> >>>>>>>>>> However, from my personal understanding, the scope of  `AS
> >> OF
> >>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
> >> record.
> >>>>> So,
> >>>>>> I
> >>>>>>>>> think
> >>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
> >>>>>>>>>> Additionally, if no version is specified, the latest version
> >>>>> should
> >>>>>>> be
> >>>>>>>>> used
> >>>>>>>>>> by default.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Feng
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
> >>>>> luoyuxia@alumni.sjtu.edu.cn
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
> >>> introduce
> >>>>>> time
> >>>>>>>>> travel
> >>>>>>>>>>> to Flink to have a better integration with external data
> >>>>> soruces.
> >>>>>>>>>>>
> >>>>>>>>>>> I also share same concern about the syntax.
> >>>>>>>>>>> I see in the part of `Whether to support other syntax
> >>>>>>>> implementations`
> >>>>>>>>> in
> >>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
> >>>>> SYSTEM_TIME
> >>>>>>> AS
> >>>>>>>>> OF`,
> >>>>>>>>>>> right?
> >>>>>>>>>>> But the the syntax part in this FLIP, it seems to be `AS
> >> OF
> >>>>>>>> TIMESTAMP`
> >>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a mistake
> >> or
> >>>>> by
> >>>>>>>> design?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Best regards,
> >>>>>>>>>>> Yuxia
> >>>>>>>>>>>
> >>>>>>>>>>> ----- 原始邮件 -----
> >>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
> >>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
> >>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
> >>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
> >>> Mode
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
> >>>>> instead
> >>>>>> of
> >>>>>>>> `FOR
> >>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in
> >>> the
> >>>>> SQL
> >>>>>>>>>> standard
> >>>>>>>>>>> and has been supported in some database vendors such as
> >> SQL
> >>>>>> Server.
> >>>>>>>>> About
> >>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
> >>> supports
> >>>>>>> this,
> >>>>>>>> if
> >>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
> >> and
> >>> I
> >>>>>> would
> >>>>>>>>> give
> >>>>>>>>>> a
> >>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
> >> use
> >>>>> `FOR
> >>>>>>>> SYSTEM
> >>>>>>>>>> AS
> >>>>>>>>>>> OF`.
> >>>>>>>>>>>
> >>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四 19:02写道:
> >>>>>>>>>>>
> >>>>>>>>>>>> Also: How do we want to query the most recent version
> >> of a
> >>>>>> table?
> >>>>>>>>>>>>
> >>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but according
> >> to
> >>>>> the
> >>>>>>> docs
> >>>>>>>>>> both
> >>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
> >> concerning
> >>>>> is
> >>>>>> the
> >>>>>>>> it
> >>>>>>>>>>>> actually is evalated row-based:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Returns the current SQL timestamp in the local time
> >>> zone,
> >>>>>> the
> >>>>>>>>> return
> >>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
> >> record
> >>> in
> >>>>>>>>> streaming
> >>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
> >> query
> >>>>>> starts
> >>>>>>>> and
> >>>>>>>>>>>> uses the same result for every row.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This could make it difficult to explain in a join
> >> scenario
> >>>>> of
> >>>>>>>>> multiple
> >>>>>>>>>>>> snapshotted tables.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Timo
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
> >>>>>>>>>>>>> Hi Feng,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
> >> sense
> >>> to
> >>>>>>>> finally
> >>>>>>>>>>>>> support querying tables at a specific point in time or
> >>>>>>> hopefully
> >>>>>>>>> also
> >>>>>>>>>>>>> ranges soon. Following time-versioned tables.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Here is some feedback from my side:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1. Syntax
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for this
> >>> but
> >>>>>> not
> >>>>>>>> `FOR
> >>>>>>>>>>>>> SYSTEM_TIME AS OF`?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It would be great to support `AS OF` also for
> >>>>> time-versioned
> >>>>>>>> joins
> >>>>>>>>>> and
> >>>>>>>>>>>>> have a unified and short syntax.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
> >>> this
> >>>>>>>>> available
> >>>>>>>>>> in
> >>>>>>>>>>>>> Flink earlier by copying the corresponding classes
> >> until
> >>>>> the
> >>>>>>> next
> >>>>>>>>>>>>> Calcite upgrade is performed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2. Semantics
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
> >>>>>> timestamp
> >>>>>>>>> types
> >>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
> >>>>>> TIMESTAMP
> >>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
> >>>>> timestamp
> >>>>>> be?
> >>>>>>>> We
> >>>>>>>>>> will
> >>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We definely need to clarify this because the past has
> >>>>> shown
> >>>>>>> that
> >>>>>>>>>>>>> daylight saving times make our lives hard.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
> >>>>>>>>>>>>>> Hi, everyone.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
> >> Support
> >>>>> Time
> >>>>>>>> Travel
> >>>>>>>>>> In
> >>>>>>>>>>>>>> Batch
> >>>>>>>>>>>>>> Mode [1]
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
> >>>>>> versions
> >>>>>>> of
> >>>>>>>>>> data.
> >>>>>>>>>>>> It
> >>>>>>>>>>>>>> allows users to specify a point in time and retrieve
> >>> the
> >>>>>> data
> >>>>>>>> and
> >>>>>>>>>>>>>> schema of
> >>>>>>>>>>>>>> a table as it appeared at that time. With time
> >> travel,
> >>>>> users
> >>>>>>> can
> >>>>>>>>>>> easily
> >>>>>>>>>>>>>> analyze and compare historical versions of data.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> With the widespread use of data lake systems such as
> >>>>> Paimon,
> >>>>>>>>>> Iceberg,
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
> >>> users'
> >>>>>> data
> >>>>>>>>>>> analysis.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Looking forward to your opinions, any suggestions are
> >>>>>>> welcomed.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Feng
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Benchao Li
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Benchao Li
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> Best,
> >>>>>> Benchao Li
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Feng Jin <ji...@gmail.com>.
Hi everyone

Thanks everyone for your input.


@Yun

>  I think you could add descriptions of how to align backfill time travel
with querying the latest data. And I think you should also update the
"Discussion thread" in the original FLIP.

Thank you for the suggestion, I will update it in the document.

>  I have a question about getting the table schema from the catalog. I'm
not sure whether the Catalog#getTable(tablePath, timestamp) will be called
only once.

I understand that in a query, the schema of the table is determined before
execution. The schema used will be based on the latest schema within the
TimeTravel period.

In addition, due to current syntax limitations, we are unable to support
the use of BETWEEN AND.


@Jing

>  Would you like to update your thoughts described in your previous email
about why SupportsTimeTravel has been rejected into the FLIP?

Sure,  I updated the doc.


>    Since we always directly add overload methods into Catalog according
to new requirements, which makes the interface bloated

Your concern is valid. If we need to support the long type version in the
future, we may have to add another method "getTable(ObjectPath, long
version)". However, I understand that
"Catalog.getTable(tablePath).on(timeStamp)" may not meet the requirements.
The timestamp is for Catalog's use, and Catalog obtains the corresponding
schema based on this time.


@liu @Regards

I am very sorry for the unclear description in the document. I have updated
relevant descriptions regarding why it needs to be implemented in Catalog.

Travel not only requires obtaining data at the corresponding time point,
but also requires the corresponding Schema at that time point


@Shammon

> Flink or connector such as  iceberg/paimon can create sources from the
`CatalogBaseTable` directly without the need to get the snapshot ID from
`CatalogTable.getSnapshot()`.  What do you think of it?

You are right, we don't need the getSnapshot interface for PaimonCatalog or
IcebergCatalog tables, but we may need it for temporary tables.



Best,
Feng


On Tue, Jun 6, 2023 at 9:32 PM Feng Jin <ji...@gmail.com> wrote:

> Sorry I replied to the wrong mail. Please ignore the last email.
>
>
> Hi Leonard
>
> > 1. Unification SQL
>
> I agree that it is crucial for us to support both batch and streaming
> processing.  The current design allows for the support of both batch and
> streaming processing. I'll update the FLIP later.
>
>
> >  2.Semantics
>
> In my opinion, it would be feasible to perform the conversion based on the
> current session time, regardless of whether it is TIMESTAMP or
> TIMESTAMP_LTZ.
>
> However, this may indeed violate the restriction outlined in
> FLINK-21978[1]  as Benchao mentioned, and I am uncertain as to whether it
> is reasonable.
>
>
> >   3.  Some external systems may use timestamp value to mark a version,
> but others may use version number、file position、log offset.
>
> It is true that most systems support time-related operations, and I
> believe that the current design is compatible with most systems. However,
> if we want to support long data type, it may require Calcite to support the
> VERSION AS OF syntax. I understand that this is something that we may need
> to consider in the future.
>
>
> Best,
> Feng
>
> [1] https://issues.apache.org/jira/browse/FLINK-21978
>
> On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com> wrote:
>
>> Hi, Feng
>>
>> Thanks for driving this FLIP, very impressive feature that users want,
>> I’ve some quick questions here.
>>
>> 1.Unification SQL:
>>         The snapshot  concept exists both in Batch mode and Streaming
>> mode,  could we consider a unified proposal? I think users won’t another
>> SQL syntax named
>> Time travel for Streaming mode.
>>
>> 2.Semantics:
>>         Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a long
>> timestamp value (getTable(ObjectPath tablePath, long timestamp)) we need
>> two information i.e. a TIMESTAMP value and current session timezone,  how
>> we deal the value with current proposed SQL syntax.
>>
>> 3. Is it enough using sinlge timestamp to track a snapshot(version) of
>> external table?   Some external systems may use timestamp value to mark a
>> version, but others may use version number、file position、log offset.
>>
>> Best,
>> Leonard
>>
>>
>>
>> > On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
>> >
>> > Hi Feng,
>> >
>> > I think this FLIP would provide one important feature to unify the
>> stream-SQL and batch-SQL when we backfill the historical data in batch mode.
>> >
>> > For the "Syntax" session, I think you could add descriptions of how to
>> align backfill time travel with querying the latest data. And I think you
>> should also update the "Discussion thread" in the original FLIP.
>> >
>> > Moreover, I have a question about getting the table schema from the
>> catalog. I'm not sure whether the Catalog#getTable(tablePath, timestamp)
>> will be called only once. If we have a backfill query between 2023-05-29
>> and 2023-06-04 in the past week, and the table schema changed on
>> 2023-06-01, will the query below detect the schema changes during backfill
>> the whole week?
>> >
>> > SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
>> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
>> >
>> > Best
>> > Yun Tang
>> >
>> >
>> > ________________________________
>> > From: Shammon FY <zj...@gmail.com>
>> > Sent: Thursday, June 1, 2023 17:57
>> > To: dev@flink.apache.org <de...@flink.apache.org>
>> > Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>> >
>> > Hi Feng,
>> >
>> > I have one minor comment about the public interface `Optional<Long>
>> > getSnapshot()` in the `CatalogTable`.
>> >
>> > As we can get tables from the new method `Catalog.getTable(ObjectPath
>> > tablePath, long timestamp)`, I think the returned `CatalogBaseTable`
>> will
>> > have the information of timestamp. Flink or connector such as
>> > iceberg/paimon can create sources from the `CatalogBaseTable` directly
>> > without the need to get the snapshot ID from
>> `CatalogTable.getSnapshot()`.
>> > What do you think of it?
>> >
>> > Best,
>> > Shammon FY
>> >
>> >
>> > On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <ji...@ververica.com.invalid>
>> wrote:
>> >
>> >> Hi Feng,
>> >>
>> >> Thanks for the proposal! Very interesting feature. Would you like to
>> update
>> >> your thoughts described in your previous email about why
>> SupportsTimeTravel
>> >> has been rejected into the FLIP? This will help readers understand the
>> >> context (in the future).
>> >>
>> >> Since we always directly add overload methods into Catalog according
>> to new
>> >> requirements, which makes the interface bloated. Just out of curiosity,
>> >> does it make sense to introduce some DSL design? Like
>> >> Catalog.getTable(tablePath).on(timeStamp),
>> >> Catalog.getTable(tablePath).current() for the most current version, and
>> >> more room for further extension like timestamp range, etc. I haven't
>> read
>> >> all the source code yet and I'm not sure if it is possible. But a
>> >> design like this will keep the Catalog API lean and the API/DSL will be
>> >> self described and easier to use.
>> >>
>> >> Best regards,
>> >> Jing
>> >>
>> >>
>> >> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
>> >> krzysiek.chmielewski@gmail.com> wrote:
>> >>
>> >>> Ok after second though I'm retracting my previous statement about
>> Catalog
>> >>> changes you proposed.
>> >>> I do see a benefit for Delta connector actually with this change and
>> see
>> >>> why this could be coupled with Catalog.
>> >>>
>> >>> Delta Connector SQL support, also ships a Delta Catalog implementation
>> >> for
>> >>> Flink.
>> >>> For Delta Catalog, table schema information is fetched from underlying
>> >>> _delta_log and not stored in metastore. For time travel we actually
>> had a
>> >>> problem, that if we would like to timetravel back to some old version,
>> >>> where schema was slightly different, then we would have a conflict
>> since
>> >>> Catalog would return current schema and not how it was for version X.
>> >>>
>> >>> With your change, our Delta Catalog can actually fetch schema for
>> >> version X
>> >>> and send it to DeltaTableFactory. Currency, Catalog can fetch only
>> >> current
>> >>> version. What we would also need however is version (number/timestamp)
>> >> for
>> >>> this table passed to DynamicTableFactory so we could properly set
>> Delta
>> >>> standalone library.
>> >>>
>> >>> Regards,
>> >>> Krzysztof
>> >>>
>> >>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
>> >>> krzysiek.chmielewski@gmail.com> napisał(a):
>> >>>
>> >>>> Hi,
>> >>>> happy to see such a feature.
>> >>>> Small note from my end regarding Catalog changes.
>> >>>>
>> >>>> TL;DR
>> >>>> I don't think it is necessary to delegate this feature to the
>> catalog.
>> >> I
>> >>>> think that since "timetravel" is per job/query property, its should
>> not
>> >>> be
>> >>>> coupled with the Catalog or table definition. In my opinion this is
>> >>>> something that DynamicTableFactory only has to know about. I would
>> >> rather
>> >>>> see this feature as it is - SQL syntax enhancement but delegate
>> clearly
>> >>> to
>> >>>> DynamicTableFactory.
>> >>>>
>> >>>> I've implemented timetravel feature for Delta Connector  [1]  using
>> >>>> current Flink API.
>> >>>> Docs are pending code review, but you can find them here [2] and
>> >> examples
>> >>>> are available here [3]
>> >>>>
>> >>>> The timetravel feature that I've implemented is based on Flink Query
>> >>>> hints.
>> >>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
>> >>>>
>> >>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is
>> handled
>> >>> not
>> >>>> by Catalog but by DyntamicTableFactory implementation for Delta
>> >>> connector.
>> >>>> The value of this property is passed to Delta standalone lib API that
>> >>>> returns table view for given version.
>> >>>>
>> >>>> I'm not sure how/if proposed change could benefit Delta connector
>> >>>> implementation for this feature.
>> >>>>
>> >>>> Thanks,
>> >>>> Krzysztof
>> >>>>
>> >>>> [1]
>> >>>>
>> >>>
>> >>
>> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
>> >>>> [2]
>> https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
>> >>>> [3]
>> >>>>
>> >>>
>> >>
>> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
>> >>>>
>> >>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com> napisał(a):
>> >>>>
>> >>>>> Hi, Feng
>> >>>>>
>> >>>>> Thanks for driving this FLIP, Time travel is very useful for Flink
>> >>>>> integrate with data lake system. I have one question why the
>> >>>>> implementation
>> >>>>> of TimeTravel is delegated to Catalog? Assuming that we use Flink to
>> >>> query
>> >>>>> Hudi table with the time travel syntax, but we don't use the
>> >>> HudiCatalog,
>> >>>>> instead, we register the hudi table to InMemoryCatalog,  can we
>> >> support
>> >>>>> time travel for Hudi table in this case?
>> >>>>> In contrast, I think time travel should bind to connector instead of
>> >>>>> Catalog, so the rejected alternative should be considered.
>> >>>>>
>> >>>>> Best,
>> >>>>> Ron
>> >>>>>
>> >>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
>> >>>>>
>> >>>>>> Hi, Feng.
>> >>>>>> Notice this FLIP only support batch mode for time travel.  Would it
>> >>> also
>> >>>>>> make sense to support stream mode to a read a snapshot of the table
>> >>> as a
>> >>>>>> bounded stream?
>> >>>>>>
>> >>>>>> Best regards,
>> >>>>>> Yuxia
>> >>>>>>
>> >>>>>> ----- 原始邮件 -----
>> >>>>>> 发件人: "Benchao Li" <li...@apache.org>
>> >>>>>> 收件人: "dev" <de...@flink.apache.org>
>> >>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
>> >>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>> >>>>>>
>> >>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
>> >>>>>>
>> >>>>>> This also depends on whether this is defined in standard or any
>> >> known
>> >>>>>> databases that have implemented this. If not, it would be hard to
>> >> push
>> >>>>> it
>> >>>>>> to Calcite.
>> >>>>>>
>> >>>>>> # getTable(ObjectPath object, long timestamp)
>> >>>>>>
>> >>>>>> Then we again come to the problem of "casting between timestamp and
>> >>>>>> numeric", which has been disabled in FLINK-21978[1]. If you're
>> gonna
>> >>> use
>> >>>>>> this, then we need to clarify that problem first.
>> >>>>>>
>> >>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
>> >>>>>>
>> >>>>>>
>> >>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
>> >>>>>>
>> >>>>>>> hi, thanks for your reply.
>> >>>>>>>
>> >>>>>>> @Benchao
>> >>>>>>>> did you consider the pushdown abilities compatible
>> >>>>>>>
>> >>>>>>> In the current design, the implementation of TimeTravel is
>> >> delegated
>> >>>>> to
>> >>>>>>> Catalog. We have added a function called getTable(ObjectPath
>> >>>>> tablePath,
>> >>>>>>> long timestamp) to obtain the corresponding CatalogBaseTable at a
>> >>>>>> specific
>> >>>>>>> time.  Therefore, I think it will not have any impact on the
>> >>> original
>> >>>>>>> pushdown abilities.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>>  I see there is a rejected  design for adding
>> >> SupportsTimeTravel,
>> >>>>> but
>> >>>>>> I
>> >>>>>>> didn't see the alternative in  the FLIP doc
>> >>>>>>>
>> >>>>>>> Sorry, the document description is not very clear.  Regarding
>> >>> whether
>> >>>>> to
>> >>>>>>> support SupportTimeTravel, I have discussed it with yuxia. Since
>> >> we
>> >>>>> have
>> >>>>>>> already passed the corresponding time in getTable(ObjectPath, long
>> >>>>>>> timestamp) of Catalog, SupportTimeTravel may not be necessary.
>> >>>>>>>
>> >>>>>>> In getTable(ObjectPath object, long timestamp), we can obtain the
>> >>>>> schema
>> >>>>>> of
>> >>>>>>> the corresponding time point and put the SNAPSHOT that needs to be
>> >>>>>> consumed
>> >>>>>>> into options.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> @Shammon
>> >>>>>>>> Could we support this in Flink too?
>> >>>>>>>
>> >>>>>>> I personally think it's possible, but limited by Calcite's syntax
>> >>>>>>> restrictions. I believe we should first support this syntax in
>> >>>>> Calcite.
>> >>>>>>> Currently, I think it may not be easy  to support this syntax in
>> >>>>> Flink's
>> >>>>>>> parser. @Benchao, what do you think? Can Calcite support this
>> >> syntax
>> >>>>>>> ` VERSION AS OF`  ?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Feng.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <zj...@gmail.com>
>> >>> wrote:
>> >>>>>>>
>> >>>>>>>> Thanks Feng, the feature of time travel sounds great!
>> >>>>>>>>
>> >>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
>> >> iceberg
>> >>>>>>> support
>> >>>>>>>> snapshot or version. For example, users can query snapshot 1 for
>> >>>>> paimon
>> >>>>>>> by
>> >>>>>>>> the following statement
>> >>>>>>>> SELECT * FROM t VERSION AS OF 1
>> >>>>>>>>
>> >>>>>>>> Could we support this in Flink too?
>> >>>>>>>>
>> >>>>>>>> Best,
>> >>>>>>>> Shammon FY
>> >>>>>>>>
>> >>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
>> >> libenchao@apache.org>
>> >>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Regarding the implementation, did you consider the pushdown
>> >>>>> abilities
>> >>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
>> >>> partition
>> >>>>>>>> pushdown.
>> >>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
>> >> have a
>> >>>>>>> concern
>> >>>>>>>>> about this. Of course, it depends on your implementation
>> >> detail,
>> >>>>> what
>> >>>>>>> is
>> >>>>>>>>> important is that we'd better add some cross tests for these.
>> >>>>>>>>>
>> >>>>>>>>> Regarding the interface exposed to Connector, I see there is a
>> >>>>>> rejected
>> >>>>>>>>> design for adding SupportsTimeTravel, but I didn't see the
>> >>>>>> alternative
>> >>>>>>> in
>> >>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
>> >> clarify
>> >>>>>>> because
>> >>>>>>>> we
>> >>>>>>>>> need to know whether the Connector supports this, and what
>> >>>>>>>> column/metadata
>> >>>>>>>>> corresponds to 'system_time'.
>> >>>>>>>>>
>> >>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
>> >>>>>>>>>
>> >>>>>>>>>> Thanks for your reply
>> >>>>>>>>>>
>> >>>>>>>>>> @Timo @BenChao @yuxia
>> >>>>>>>>>>
>> >>>>>>>>>> Sorry for the mistake,  Currently , calcite only supports
>> >>> `FOR
>> >>>>>>>>> SYSTEM_TIME
>> >>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
>> >> OF`
>> >>> .
>> >>>>>> I've
>> >>>>>>>>>> updated the syntax part of the FLIP.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> @Timo
>> >>>>>>>>>>
>> >>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
>> >>>>>>>>>>
>> >>>>>>>>>> Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ
>> >> and
>> >>>>> then
>> >>>>>>>>> convert
>> >>>>>>>>>> it into a long value.
>> >>>>>>>>>>
>> >>>>>>>>>>> How do we want to query the most recent version of a table
>> >>>>>>>>>>
>> >>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
>> >>> cause
>> >>>>>>>>>> inconsistency with the real-time concept.
>> >>>>>>>>>> However, from my personal understanding, the scope of  `AS
>> >> OF
>> >>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
>> >> record.
>> >>>>> So,
>> >>>>>> I
>> >>>>>>>>> think
>> >>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
>> >>>>>>>>>> Additionally, if no version is specified, the latest version
>> >>>>> should
>> >>>>>>> be
>> >>>>>>>>> used
>> >>>>>>>>>> by default.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Feng
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
>> >>>>> luoyuxia@alumni.sjtu.edu.cn
>> >>>>>>>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
>> >>> introduce
>> >>>>>> time
>> >>>>>>>>> travel
>> >>>>>>>>>>> to Flink to have a better integration with external data
>> >>>>> soruces.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I also share same concern about the syntax.
>> >>>>>>>>>>> I see in the part of `Whether to support other syntax
>> >>>>>>>> implementations`
>> >>>>>>>>> in
>> >>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
>> >>>>> SYSTEM_TIME
>> >>>>>>> AS
>> >>>>>>>>> OF`,
>> >>>>>>>>>>> right?
>> >>>>>>>>>>> But the the syntax part in this FLIP, it seems to be `AS
>> >> OF
>> >>>>>>>> TIMESTAMP`
>> >>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a mistake
>> >> or
>> >>>>> by
>> >>>>>>>> design?
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best regards,
>> >>>>>>>>>>> Yuxia
>> >>>>>>>>>>>
>> >>>>>>>>>>> ----- 原始邮件 -----
>> >>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
>> >>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
>> >>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
>> >>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
>> >>> Mode
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
>> >>>>> instead
>> >>>>>> of
>> >>>>>>>> `FOR
>> >>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in
>> >>> the
>> >>>>> SQL
>> >>>>>>>>>> standard
>> >>>>>>>>>>> and has been supported in some database vendors such as
>> >> SQL
>> >>>>>> Server.
>> >>>>>>>>> About
>> >>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
>> >>> supports
>> >>>>>>> this,
>> >>>>>>>> if
>> >>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
>> >> and
>> >>> I
>> >>>>>> would
>> >>>>>>>>> give
>> >>>>>>>>>> a
>> >>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
>> >> use
>> >>>>> `FOR
>> >>>>>>>> SYSTEM
>> >>>>>>>>>> AS
>> >>>>>>>>>>> OF`.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四 19:02写道:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Also: How do we want to query the most recent version
>> >> of a
>> >>>>>> table?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but according
>> >> to
>> >>>>> the
>> >>>>>>> docs
>> >>>>>>>>>> both
>> >>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
>> >> concerning
>> >>>>> is
>> >>>>>> the
>> >>>>>>>> it
>> >>>>>>>>>>>> actually is evalated row-based:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Returns the current SQL timestamp in the local time
>> >>> zone,
>> >>>>>> the
>> >>>>>>>>> return
>> >>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
>> >> record
>> >>> in
>> >>>>>>>>> streaming
>> >>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
>> >> query
>> >>>>>> starts
>> >>>>>>>> and
>> >>>>>>>>>>>> uses the same result for every row.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> This could make it difficult to explain in a join
>> >> scenario
>> >>>>> of
>> >>>>>>>>> multiple
>> >>>>>>>>>>>> snapshotted tables.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Regards,
>> >>>>>>>>>>>> Timo
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
>> >>>>>>>>>>>>> Hi Feng,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
>> >> sense
>> >>> to
>> >>>>>>>> finally
>> >>>>>>>>>>>>> support querying tables at a specific point in time or
>> >>>>>>> hopefully
>> >>>>>>>>> also
>> >>>>>>>>>>>>> ranges soon. Following time-versioned tables.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Here is some feedback from my side:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> 1. Syntax
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for this
>> >>> but
>> >>>>>> not
>> >>>>>>>> `FOR
>> >>>>>>>>>>>>> SYSTEM_TIME AS OF`?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> It would be great to support `AS OF` also for
>> >>>>> time-versioned
>> >>>>>>>> joins
>> >>>>>>>>>> and
>> >>>>>>>>>>>>> have a unified and short syntax.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
>> >>> this
>> >>>>>>>>> available
>> >>>>>>>>>> in
>> >>>>>>>>>>>>> Flink earlier by copying the corresponding classes
>> >> until
>> >>>>> the
>> >>>>>>> next
>> >>>>>>>>>>>>> Calcite upgrade is performed.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> 2. Semantics
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
>> >>>>>> timestamp
>> >>>>>>>>> types
>> >>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
>> >>>>>> TIMESTAMP
>> >>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
>> >>>>> timestamp
>> >>>>>> be?
>> >>>>>>>> We
>> >>>>>>>>>> will
>> >>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> We definely need to clarify this because the past has
>> >>>>> shown
>> >>>>>>> that
>> >>>>>>>>>>>>> daylight saving times make our lives hard.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>> Timo
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
>> >>>>>>>>>>>>>> Hi, everyone.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
>> >> Support
>> >>>>> Time
>> >>>>>>>> Travel
>> >>>>>>>>>> In
>> >>>>>>>>>>>>>> Batch
>> >>>>>>>>>>>>>> Mode [1]
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
>> >>>>>> versions
>> >>>>>>> of
>> >>>>>>>>>> data.
>> >>>>>>>>>>>> It
>> >>>>>>>>>>>>>> allows users to specify a point in time and retrieve
>> >>> the
>> >>>>>> data
>> >>>>>>>> and
>> >>>>>>>>>>>>>> schema of
>> >>>>>>>>>>>>>> a table as it appeared at that time. With time
>> >> travel,
>> >>>>> users
>> >>>>>>> can
>> >>>>>>>>>>> easily
>> >>>>>>>>>>>>>> analyze and compare historical versions of data.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> With the widespread use of data lake systems such as
>> >>>>> Paimon,
>> >>>>>>>>>> Iceberg,
>> >>>>>>>>>>>> and
>> >>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
>> >>> users'
>> >>>>>> data
>> >>>>>>>>>>> analysis.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Looking forward to your opinions, any suggestions are
>> >>>>>>> welcomed.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 1.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Best.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Feng
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>> Benchao Li
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>>
>> >>>>>>>>> Best,
>> >>>>>>>>> Benchao Li
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> Benchao Li
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>>
>>

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Feng Jin <ji...@gmail.com>.
Sorry I replied to the wrong mail. Please ignore the last email.


Hi Leonard

> 1. Unification SQL

I agree that it is crucial for us to support both batch and streaming
processing.  The current design allows for the support of both batch and
streaming processing. I'll update the FLIP later.


>  2.Semantics

In my opinion, it would be feasible to perform the conversion based on the
current session time, regardless of whether it is TIMESTAMP or
TIMESTAMP_LTZ.

However, this may indeed violate the restriction outlined in
FLINK-21978[1]  as Benchao mentioned, and I am uncertain as to whether it
is reasonable.


>   3.  Some external systems may use timestamp value to mark a version,
but others may use version number、file position、log offset.

It is true that most systems support time-related operations, and I believe
that the current design is compatible with most systems. However, if we
want to support long data type, it may require Calcite to support the
VERSION AS OF syntax. I understand that this is something that we may need
to consider in the future.


Best,
Feng

[1] https://issues.apache.org/jira/browse/FLINK-21978

On Tue, Jun 6, 2023 at 8:28 PM Leonard Xu <xb...@gmail.com> wrote:

> Hi, Feng
>
> Thanks for driving this FLIP, very impressive feature that users want,
> I’ve some quick questions here.
>
> 1.Unification SQL:
>         The snapshot  concept exists both in Batch mode and Streaming
> mode,  could we consider a unified proposal? I think users won’t another
> SQL syntax named
> Time travel for Streaming mode.
>
> 2.Semantics:
>         Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a long
> timestamp value (getTable(ObjectPath tablePath, long timestamp)) we need
> two information i.e. a TIMESTAMP value and current session timezone,  how
> we deal the value with current proposed SQL syntax.
>
> 3. Is it enough using sinlge timestamp to track a snapshot(version) of
> external table?   Some external systems may use timestamp value to mark a
> version, but others may use version number、file position、log offset.
>
> Best,
> Leonard
>
>
>
> > On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
> >
> > Hi Feng,
> >
> > I think this FLIP would provide one important feature to unify the
> stream-SQL and batch-SQL when we backfill the historical data in batch mode.
> >
> > For the "Syntax" session, I think you could add descriptions of how to
> align backfill time travel with querying the latest data. And I think you
> should also update the "Discussion thread" in the original FLIP.
> >
> > Moreover, I have a question about getting the table schema from the
> catalog. I'm not sure whether the Catalog#getTable(tablePath, timestamp)
> will be called only once. If we have a backfill query between 2023-05-29
> and 2023-06-04 in the past week, and the table schema changed on
> 2023-06-01, will the query below detect the schema changes during backfill
> the whole week?
> >
> > SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN
> '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
> >
> > Best
> > Yun Tang
> >
> >
> > ________________________________
> > From: Shammon FY <zj...@gmail.com>
> > Sent: Thursday, June 1, 2023 17:57
> > To: dev@flink.apache.org <de...@flink.apache.org>
> > Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >
> > Hi Feng,
> >
> > I have one minor comment about the public interface `Optional<Long>
> > getSnapshot()` in the `CatalogTable`.
> >
> > As we can get tables from the new method `Catalog.getTable(ObjectPath
> > tablePath, long timestamp)`, I think the returned `CatalogBaseTable` will
> > have the information of timestamp. Flink or connector such as
> > iceberg/paimon can create sources from the `CatalogBaseTable` directly
> > without the need to get the snapshot ID from
> `CatalogTable.getSnapshot()`.
> > What do you think of it?
> >
> > Best,
> > Shammon FY
> >
> >
> > On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <ji...@ververica.com.invalid>
> wrote:
> >
> >> Hi Feng,
> >>
> >> Thanks for the proposal! Very interesting feature. Would you like to
> update
> >> your thoughts described in your previous email about why
> SupportsTimeTravel
> >> has been rejected into the FLIP? This will help readers understand the
> >> context (in the future).
> >>
> >> Since we always directly add overload methods into Catalog according to
> new
> >> requirements, which makes the interface bloated. Just out of curiosity,
> >> does it make sense to introduce some DSL design? Like
> >> Catalog.getTable(tablePath).on(timeStamp),
> >> Catalog.getTable(tablePath).current() for the most current version, and
> >> more room for further extension like timestamp range, etc. I haven't
> read
> >> all the source code yet and I'm not sure if it is possible. But a
> >> design like this will keep the Catalog API lean and the API/DSL will be
> >> self described and easier to use.
> >>
> >> Best regards,
> >> Jing
> >>
> >>
> >> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
> >> krzysiek.chmielewski@gmail.com> wrote:
> >>
> >>> Ok after second though I'm retracting my previous statement about
> Catalog
> >>> changes you proposed.
> >>> I do see a benefit for Delta connector actually with this change and
> see
> >>> why this could be coupled with Catalog.
> >>>
> >>> Delta Connector SQL support, also ships a Delta Catalog implementation
> >> for
> >>> Flink.
> >>> For Delta Catalog, table schema information is fetched from underlying
> >>> _delta_log and not stored in metastore. For time travel we actually
> had a
> >>> problem, that if we would like to timetravel back to some old version,
> >>> where schema was slightly different, then we would have a conflict
> since
> >>> Catalog would return current schema and not how it was for version X.
> >>>
> >>> With your change, our Delta Catalog can actually fetch schema for
> >> version X
> >>> and send it to DeltaTableFactory. Currency, Catalog can fetch only
> >> current
> >>> version. What we would also need however is version (number/timestamp)
> >> for
> >>> this table passed to DynamicTableFactory so we could properly set Delta
> >>> standalone library.
> >>>
> >>> Regards,
> >>> Krzysztof
> >>>
> >>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
> >>> krzysiek.chmielewski@gmail.com> napisał(a):
> >>>
> >>>> Hi,
> >>>> happy to see such a feature.
> >>>> Small note from my end regarding Catalog changes.
> >>>>
> >>>> TL;DR
> >>>> I don't think it is necessary to delegate this feature to the catalog.
> >> I
> >>>> think that since "timetravel" is per job/query property, its should
> not
> >>> be
> >>>> coupled with the Catalog or table definition. In my opinion this is
> >>>> something that DynamicTableFactory only has to know about. I would
> >> rather
> >>>> see this feature as it is - SQL syntax enhancement but delegate
> clearly
> >>> to
> >>>> DynamicTableFactory.
> >>>>
> >>>> I've implemented timetravel feature for Delta Connector  [1]  using
> >>>> current Flink API.
> >>>> Docs are pending code review, but you can find them here [2] and
> >> examples
> >>>> are available here [3]
> >>>>
> >>>> The timetravel feature that I've implemented is based on Flink Query
> >>>> hints.
> >>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
> >>>>
> >>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is handled
> >>> not
> >>>> by Catalog but by DyntamicTableFactory implementation for Delta
> >>> connector.
> >>>> The value of this property is passed to Delta standalone lib API that
> >>>> returns table view for given version.
> >>>>
> >>>> I'm not sure how/if proposed change could benefit Delta connector
> >>>> implementation for this feature.
> >>>>
> >>>> Thanks,
> >>>> Krzysztof
> >>>>
> >>>> [1]
> >>>>
> >>>
> >>
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> >>>> [2] https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> >>>> [3]
> >>>>
> >>>
> >>
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> >>>>
> >>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com> napisał(a):
> >>>>
> >>>>> Hi, Feng
> >>>>>
> >>>>> Thanks for driving this FLIP, Time travel is very useful for Flink
> >>>>> integrate with data lake system. I have one question why the
> >>>>> implementation
> >>>>> of TimeTravel is delegated to Catalog? Assuming that we use Flink to
> >>> query
> >>>>> Hudi table with the time travel syntax, but we don't use the
> >>> HudiCatalog,
> >>>>> instead, we register the hudi table to InMemoryCatalog,  can we
> >> support
> >>>>> time travel for Hudi table in this case?
> >>>>> In contrast, I think time travel should bind to connector instead of
> >>>>> Catalog, so the rejected alternative should be considered.
> >>>>>
> >>>>> Best,
> >>>>> Ron
> >>>>>
> >>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
> >>>>>
> >>>>>> Hi, Feng.
> >>>>>> Notice this FLIP only support batch mode for time travel.  Would it
> >>> also
> >>>>>> make sense to support stream mode to a read a snapshot of the table
> >>> as a
> >>>>>> bounded stream?
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Yuxia
> >>>>>>
> >>>>>> ----- 原始邮件 -----
> >>>>>> 发件人: "Benchao Li" <li...@apache.org>
> >>>>>> 收件人: "dev" <de...@flink.apache.org>
> >>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> >>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> >>>>>>
> >>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
> >>>>>>
> >>>>>> This also depends on whether this is defined in standard or any
> >> known
> >>>>>> databases that have implemented this. If not, it would be hard to
> >> push
> >>>>> it
> >>>>>> to Calcite.
> >>>>>>
> >>>>>> # getTable(ObjectPath object, long timestamp)
> >>>>>>
> >>>>>> Then we again come to the problem of "casting between timestamp and
> >>>>>> numeric", which has been disabled in FLINK-21978[1]. If you're gonna
> >>> use
> >>>>>> this, then we need to clarify that problem first.
> >>>>>>
> >>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
> >>>>>>
> >>>>>>
> >>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
> >>>>>>
> >>>>>>> hi, thanks for your reply.
> >>>>>>>
> >>>>>>> @Benchao
> >>>>>>>> did you consider the pushdown abilities compatible
> >>>>>>>
> >>>>>>> In the current design, the implementation of TimeTravel is
> >> delegated
> >>>>> to
> >>>>>>> Catalog. We have added a function called getTable(ObjectPath
> >>>>> tablePath,
> >>>>>>> long timestamp) to obtain the corresponding CatalogBaseTable at a
> >>>>>> specific
> >>>>>>> time.  Therefore, I think it will not have any impact on the
> >>> original
> >>>>>>> pushdown abilities.
> >>>>>>>
> >>>>>>>
> >>>>>>>>  I see there is a rejected  design for adding
> >> SupportsTimeTravel,
> >>>>> but
> >>>>>> I
> >>>>>>> didn't see the alternative in  the FLIP doc
> >>>>>>>
> >>>>>>> Sorry, the document description is not very clear.  Regarding
> >>> whether
> >>>>> to
> >>>>>>> support SupportTimeTravel, I have discussed it with yuxia. Since
> >> we
> >>>>> have
> >>>>>>> already passed the corresponding time in getTable(ObjectPath, long
> >>>>>>> timestamp) of Catalog, SupportTimeTravel may not be necessary.
> >>>>>>>
> >>>>>>> In getTable(ObjectPath object, long timestamp), we can obtain the
> >>>>> schema
> >>>>>> of
> >>>>>>> the corresponding time point and put the SNAPSHOT that needs to be
> >>>>>> consumed
> >>>>>>> into options.
> >>>>>>>
> >>>>>>>
> >>>>>>> @Shammon
> >>>>>>>> Could we support this in Flink too?
> >>>>>>>
> >>>>>>> I personally think it's possible, but limited by Calcite's syntax
> >>>>>>> restrictions. I believe we should first support this syntax in
> >>>>> Calcite.
> >>>>>>> Currently, I think it may not be easy  to support this syntax in
> >>>>> Flink's
> >>>>>>> parser. @Benchao, what do you think? Can Calcite support this
> >> syntax
> >>>>>>> ` VERSION AS OF`  ?
> >>>>>>>
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Feng.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <zj...@gmail.com>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Thanks Feng, the feature of time travel sounds great!
> >>>>>>>>
> >>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
> >> iceberg
> >>>>>>> support
> >>>>>>>> snapshot or version. For example, users can query snapshot 1 for
> >>>>> paimon
> >>>>>>> by
> >>>>>>>> the following statement
> >>>>>>>> SELECT * FROM t VERSION AS OF 1
> >>>>>>>>
> >>>>>>>> Could we support this in Flink too?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Shammon FY
> >>>>>>>>
> >>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
> >> libenchao@apache.org>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Regarding the implementation, did you consider the pushdown
> >>>>> abilities
> >>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
> >>> partition
> >>>>>>>> pushdown.
> >>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
> >> have a
> >>>>>>> concern
> >>>>>>>>> about this. Of course, it depends on your implementation
> >> detail,
> >>>>> what
> >>>>>>> is
> >>>>>>>>> important is that we'd better add some cross tests for these.
> >>>>>>>>>
> >>>>>>>>> Regarding the interface exposed to Connector, I see there is a
> >>>>>> rejected
> >>>>>>>>> design for adding SupportsTimeTravel, but I didn't see the
> >>>>>> alternative
> >>>>>>> in
> >>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
> >> clarify
> >>>>>>> because
> >>>>>>>> we
> >>>>>>>>> need to know whether the Connector supports this, and what
> >>>>>>>> column/metadata
> >>>>>>>>> corresponds to 'system_time'.
> >>>>>>>>>
> >>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
> >>>>>>>>>
> >>>>>>>>>> Thanks for your reply
> >>>>>>>>>>
> >>>>>>>>>> @Timo @BenChao @yuxia
> >>>>>>>>>>
> >>>>>>>>>> Sorry for the mistake,  Currently , calcite only supports
> >>> `FOR
> >>>>>>>>> SYSTEM_TIME
> >>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
> >> OF`
> >>> .
> >>>>>> I've
> >>>>>>>>>> updated the syntax part of the FLIP.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> @Timo
> >>>>>>>>>>
> >>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
> >>>>>>>>>>
> >>>>>>>>>> Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ
> >> and
> >>>>> then
> >>>>>>>>> convert
> >>>>>>>>>> it into a long value.
> >>>>>>>>>>
> >>>>>>>>>>> How do we want to query the most recent version of a table
> >>>>>>>>>>
> >>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
> >>> cause
> >>>>>>>>>> inconsistency with the real-time concept.
> >>>>>>>>>> However, from my personal understanding, the scope of  `AS
> >> OF
> >>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
> >> record.
> >>>>> So,
> >>>>>> I
> >>>>>>>>> think
> >>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
> >>>>>>>>>> Additionally, if no version is specified, the latest version
> >>>>> should
> >>>>>>> be
> >>>>>>>>> used
> >>>>>>>>>> by default.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Feng
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
> >>>>> luoyuxia@alumni.sjtu.edu.cn
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
> >>> introduce
> >>>>>> time
> >>>>>>>>> travel
> >>>>>>>>>>> to Flink to have a better integration with external data
> >>>>> soruces.
> >>>>>>>>>>>
> >>>>>>>>>>> I also share same concern about the syntax.
> >>>>>>>>>>> I see in the part of `Whether to support other syntax
> >>>>>>>> implementations`
> >>>>>>>>> in
> >>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
> >>>>> SYSTEM_TIME
> >>>>>>> AS
> >>>>>>>>> OF`,
> >>>>>>>>>>> right?
> >>>>>>>>>>> But the the syntax part in this FLIP, it seems to be `AS
> >> OF
> >>>>>>>> TIMESTAMP`
> >>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a mistake
> >> or
> >>>>> by
> >>>>>>>> design?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Best regards,
> >>>>>>>>>>> Yuxia
> >>>>>>>>>>>
> >>>>>>>>>>> ----- 原始邮件 -----
> >>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
> >>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
> >>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
> >>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
> >>> Mode
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
> >>>>> instead
> >>>>>> of
> >>>>>>>> `FOR
> >>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in
> >>> the
> >>>>> SQL
> >>>>>>>>>> standard
> >>>>>>>>>>> and has been supported in some database vendors such as
> >> SQL
> >>>>>> Server.
> >>>>>>>>> About
> >>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
> >>> supports
> >>>>>>> this,
> >>>>>>>> if
> >>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
> >> and
> >>> I
> >>>>>> would
> >>>>>>>>> give
> >>>>>>>>>> a
> >>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
> >> use
> >>>>> `FOR
> >>>>>>>> SYSTEM
> >>>>>>>>>> AS
> >>>>>>>>>>> OF`.
> >>>>>>>>>>>
> >>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四 19:02写道:
> >>>>>>>>>>>
> >>>>>>>>>>>> Also: How do we want to query the most recent version
> >> of a
> >>>>>> table?
> >>>>>>>>>>>>
> >>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but according
> >> to
> >>>>> the
> >>>>>>> docs
> >>>>>>>>>> both
> >>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
> >> concerning
> >>>>> is
> >>>>>> the
> >>>>>>>> it
> >>>>>>>>>>>> actually is evalated row-based:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Returns the current SQL timestamp in the local time
> >>> zone,
> >>>>>> the
> >>>>>>>>> return
> >>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
> >> record
> >>> in
> >>>>>>>>> streaming
> >>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
> >> query
> >>>>>> starts
> >>>>>>>> and
> >>>>>>>>>>>> uses the same result for every row.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This could make it difficult to explain in a join
> >> scenario
> >>>>> of
> >>>>>>>>> multiple
> >>>>>>>>>>>> snapshotted tables.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Timo
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
> >>>>>>>>>>>>> Hi Feng,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
> >> sense
> >>> to
> >>>>>>>> finally
> >>>>>>>>>>>>> support querying tables at a specific point in time or
> >>>>>>> hopefully
> >>>>>>>>> also
> >>>>>>>>>>>>> ranges soon. Following time-versioned tables.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Here is some feedback from my side:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1. Syntax
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for this
> >>> but
> >>>>>> not
> >>>>>>>> `FOR
> >>>>>>>>>>>>> SYSTEM_TIME AS OF`?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It would be great to support `AS OF` also for
> >>>>> time-versioned
> >>>>>>>> joins
> >>>>>>>>>> and
> >>>>>>>>>>>>> have a unified and short syntax.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
> >>> this
> >>>>>>>>> available
> >>>>>>>>>> in
> >>>>>>>>>>>>> Flink earlier by copying the corresponding classes
> >> until
> >>>>> the
> >>>>>>> next
> >>>>>>>>>>>>> Calcite upgrade is performed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2. Semantics
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
> >>>>>> timestamp
> >>>>>>>>> types
> >>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
> >>>>>> TIMESTAMP
> >>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
> >>>>> timestamp
> >>>>>> be?
> >>>>>>>> We
> >>>>>>>>>> will
> >>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We definely need to clarify this because the past has
> >>>>> shown
> >>>>>>> that
> >>>>>>>>>>>>> daylight saving times make our lives hard.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
> >>>>>>>>>>>>>> Hi, everyone.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
> >> Support
> >>>>> Time
> >>>>>>>> Travel
> >>>>>>>>>> In
> >>>>>>>>>>>>>> Batch
> >>>>>>>>>>>>>> Mode [1]
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
> >>>>>> versions
> >>>>>>> of
> >>>>>>>>>> data.
> >>>>>>>>>>>> It
> >>>>>>>>>>>>>> allows users to specify a point in time and retrieve
> >>> the
> >>>>>> data
> >>>>>>>> and
> >>>>>>>>>>>>>> schema of
> >>>>>>>>>>>>>> a table as it appeared at that time. With time
> >> travel,
> >>>>> users
> >>>>>>> can
> >>>>>>>>>>> easily
> >>>>>>>>>>>>>> analyze and compare historical versions of data.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> With the widespread use of data lake systems such as
> >>>>> Paimon,
> >>>>>>>>>> Iceberg,
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
> >>> users'
> >>>>>> data
> >>>>>>>>>>> analysis.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Looking forward to your opinions, any suggestions are
> >>>>>>> welcomed.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Feng
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Benchao Li
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Benchao Li
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> Best,
> >>>>>> Benchao Li
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Leonard Xu <xb...@gmail.com>.
Hi, Feng

Thanks for driving this FLIP, very impressive feature that users want, I’ve some quick questions here.

1.Unification SQL: 
	The snapshot  concept exists both in Batch mode and Streaming mode,  could we consider a unified proposal? I think users won’t another SQL syntax named
Time travel for Streaming mode.

2.Semantics:
	Flink supports TIMESTAMP and TIMESTAMP_LTZ types, to get a long timestamp value (getTable(ObjectPath tablePath, long timestamp)) we need two information i.e. a TIMESTAMP value and current session timezone,  how we deal the value with current proposed SQL syntax.

3. Is it enough using sinlge timestamp to track a snapshot(version) of external table?   Some external systems may use timestamp value to mark a version, but others may use version number、file position、log offset.

Best,
Leonard



> On Jun 5, 2023, at 3:28 PM, Yun Tang <my...@live.com> wrote:
> 
> Hi Feng,
> 
> I think this FLIP would provide one important feature to unify the stream-SQL and batch-SQL when we backfill the historical data in batch mode.
> 
> For the "Syntax" session, I think you could add descriptions of how to align backfill time travel with querying the latest data. And I think you should also update the "Discussion thread" in the original FLIP.
> 
> Moreover, I have a question about getting the table schema from the catalog. I'm not sure whether the Catalog#getTable(tablePath, timestamp) will be called only once. If we have a backfill query between 2023-05-29 and 2023-06-04 in the past week, and the table schema changed on 2023-06-01, will the query below detect the schema changes during backfill the whole week?
> 
> SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'
> 
> Best
> Yun Tang
> 
> 
> ________________________________
> From: Shammon FY <zj...@gmail.com>
> Sent: Thursday, June 1, 2023 17:57
> To: dev@flink.apache.org <de...@flink.apache.org>
> Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> 
> Hi Feng,
> 
> I have one minor comment about the public interface `Optional<Long>
> getSnapshot()` in the `CatalogTable`.
> 
> As we can get tables from the new method `Catalog.getTable(ObjectPath
> tablePath, long timestamp)`, I think the returned `CatalogBaseTable` will
> have the information of timestamp. Flink or connector such as
> iceberg/paimon can create sources from the `CatalogBaseTable` directly
> without the need to get the snapshot ID from `CatalogTable.getSnapshot()`.
> What do you think of it?
> 
> Best,
> Shammon FY
> 
> 
> On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <ji...@ververica.com.invalid> wrote:
> 
>> Hi Feng,
>> 
>> Thanks for the proposal! Very interesting feature. Would you like to update
>> your thoughts described in your previous email about why SupportsTimeTravel
>> has been rejected into the FLIP? This will help readers understand the
>> context (in the future).
>> 
>> Since we always directly add overload methods into Catalog according to new
>> requirements, which makes the interface bloated. Just out of curiosity,
>> does it make sense to introduce some DSL design? Like
>> Catalog.getTable(tablePath).on(timeStamp),
>> Catalog.getTable(tablePath).current() for the most current version, and
>> more room for further extension like timestamp range, etc. I haven't read
>> all the source code yet and I'm not sure if it is possible. But a
>> design like this will keep the Catalog API lean and the API/DSL will be
>> self described and easier to use.
>> 
>> Best regards,
>> Jing
>> 
>> 
>> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
>> krzysiek.chmielewski@gmail.com> wrote:
>> 
>>> Ok after second though I'm retracting my previous statement about Catalog
>>> changes you proposed.
>>> I do see a benefit for Delta connector actually with this change and see
>>> why this could be coupled with Catalog.
>>> 
>>> Delta Connector SQL support, also ships a Delta Catalog implementation
>> for
>>> Flink.
>>> For Delta Catalog, table schema information is fetched from underlying
>>> _delta_log and not stored in metastore. For time travel we actually had a
>>> problem, that if we would like to timetravel back to some old version,
>>> where schema was slightly different, then we would have a conflict since
>>> Catalog would return current schema and not how it was for version X.
>>> 
>>> With your change, our Delta Catalog can actually fetch schema for
>> version X
>>> and send it to DeltaTableFactory. Currency, Catalog can fetch only
>> current
>>> version. What we would also need however is version (number/timestamp)
>> for
>>> this table passed to DynamicTableFactory so we could properly set Delta
>>> standalone library.
>>> 
>>> Regards,
>>> Krzysztof
>>> 
>>> śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
>>> krzysiek.chmielewski@gmail.com> napisał(a):
>>> 
>>>> Hi,
>>>> happy to see such a feature.
>>>> Small note from my end regarding Catalog changes.
>>>> 
>>>> TL;DR
>>>> I don't think it is necessary to delegate this feature to the catalog.
>> I
>>>> think that since "timetravel" is per job/query property, its should not
>>> be
>>>> coupled with the Catalog or table definition. In my opinion this is
>>>> something that DynamicTableFactory only has to know about. I would
>> rather
>>>> see this feature as it is - SQL syntax enhancement but delegate clearly
>>> to
>>>> DynamicTableFactory.
>>>> 
>>>> I've implemented timetravel feature for Delta Connector  [1]  using
>>>> current Flink API.
>>>> Docs are pending code review, but you can find them here [2] and
>> examples
>>>> are available here [3]
>>>> 
>>>> The timetravel feature that I've implemented is based on Flink Query
>>>> hints.
>>>> "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
>>>> 
>>>> The " versionAsOf" (we also have 'timestampAsOf') parameter is handled
>>> not
>>>> by Catalog but by DyntamicTableFactory implementation for Delta
>>> connector.
>>>> The value of this property is passed to Delta standalone lib API that
>>>> returns table view for given version.
>>>> 
>>>> I'm not sure how/if proposed change could benefit Delta connector
>>>> implementation for this feature.
>>>> 
>>>> Thanks,
>>>> Krzysztof
>>>> 
>>>> [1]
>>>> 
>>> 
>> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
>>>> [2] https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
>>>> [3]
>>>> 
>>> 
>> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
>>>> 
>>>> śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com> napisał(a):
>>>> 
>>>>> Hi, Feng
>>>>> 
>>>>> Thanks for driving this FLIP, Time travel is very useful for Flink
>>>>> integrate with data lake system. I have one question why the
>>>>> implementation
>>>>> of TimeTravel is delegated to Catalog? Assuming that we use Flink to
>>> query
>>>>> Hudi table with the time travel syntax, but we don't use the
>>> HudiCatalog,
>>>>> instead, we register the hudi table to InMemoryCatalog,  can we
>> support
>>>>> time travel for Hudi table in this case?
>>>>> In contrast, I think time travel should bind to connector instead of
>>>>> Catalog, so the rejected alternative should be considered.
>>>>> 
>>>>> Best,
>>>>> Ron
>>>>> 
>>>>> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
>>>>> 
>>>>>> Hi, Feng.
>>>>>> Notice this FLIP only support batch mode for time travel.  Would it
>>> also
>>>>>> make sense to support stream mode to a read a snapshot of the table
>>> as a
>>>>>> bounded stream?
>>>>>> 
>>>>>> Best regards,
>>>>>> Yuxia
>>>>>> 
>>>>>> ----- 原始邮件 -----
>>>>>> 发件人: "Benchao Li" <li...@apache.org>
>>>>>> 收件人: "dev" <de...@flink.apache.org>
>>>>>> 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
>>>>>> 
>>>>>> # Can Calcite support this syntax ` VERSION AS OF`  ?
>>>>>> 
>>>>>> This also depends on whether this is defined in standard or any
>> known
>>>>>> databases that have implemented this. If not, it would be hard to
>> push
>>>>> it
>>>>>> to Calcite.
>>>>>> 
>>>>>> # getTable(ObjectPath object, long timestamp)
>>>>>> 
>>>>>> Then we again come to the problem of "casting between timestamp and
>>>>>> numeric", which has been disabled in FLINK-21978[1]. If you're gonna
>>> use
>>>>>> this, then we need to clarify that problem first.
>>>>>> 
>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-21978
>>>>>> 
>>>>>> 
>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
>>>>>> 
>>>>>>> hi, thanks for your reply.
>>>>>>> 
>>>>>>> @Benchao
>>>>>>>> did you consider the pushdown abilities compatible
>>>>>>> 
>>>>>>> In the current design, the implementation of TimeTravel is
>> delegated
>>>>> to
>>>>>>> Catalog. We have added a function called getTable(ObjectPath
>>>>> tablePath,
>>>>>>> long timestamp) to obtain the corresponding CatalogBaseTable at a
>>>>>> specific
>>>>>>> time.  Therefore, I think it will not have any impact on the
>>> original
>>>>>>> pushdown abilities.
>>>>>>> 
>>>>>>> 
>>>>>>>>  I see there is a rejected  design for adding
>> SupportsTimeTravel,
>>>>> but
>>>>>> I
>>>>>>> didn't see the alternative in  the FLIP doc
>>>>>>> 
>>>>>>> Sorry, the document description is not very clear.  Regarding
>>> whether
>>>>> to
>>>>>>> support SupportTimeTravel, I have discussed it with yuxia. Since
>> we
>>>>> have
>>>>>>> already passed the corresponding time in getTable(ObjectPath, long
>>>>>>> timestamp) of Catalog, SupportTimeTravel may not be necessary.
>>>>>>> 
>>>>>>> In getTable(ObjectPath object, long timestamp), we can obtain the
>>>>> schema
>>>>>> of
>>>>>>> the corresponding time point and put the SNAPSHOT that needs to be
>>>>>> consumed
>>>>>>> into options.
>>>>>>> 
>>>>>>> 
>>>>>>> @Shammon
>>>>>>>> Could we support this in Flink too?
>>>>>>> 
>>>>>>> I personally think it's possible, but limited by Calcite's syntax
>>>>>>> restrictions. I believe we should first support this syntax in
>>>>> Calcite.
>>>>>>> Currently, I think it may not be easy  to support this syntax in
>>>>> Flink's
>>>>>>> parser. @Benchao, what do you think? Can Calcite support this
>> syntax
>>>>>>> ` VERSION AS OF`  ?
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> Feng.
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, May 26, 2023 at 2:55 PM Shammon FY <zj...@gmail.com>
>>> wrote:
>>>>>>> 
>>>>>>>> Thanks Feng, the feature of time travel sounds great!
>>>>>>>> 
>>>>>>>> In addition to SYSTEM_TIME, lake houses such as paimon and
>> iceberg
>>>>>>> support
>>>>>>>> snapshot or version. For example, users can query snapshot 1 for
>>>>> paimon
>>>>>>> by
>>>>>>>> the following statement
>>>>>>>> SELECT * FROM t VERSION AS OF 1
>>>>>>>> 
>>>>>>>> Could we support this in Flink too?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Shammon FY
>>>>>>>> 
>>>>>>>> On Fri, May 26, 2023 at 1:20 PM Benchao Li <
>> libenchao@apache.org>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Regarding the implementation, did you consider the pushdown
>>>>> abilities
>>>>>>>>> compatible, e.g., projection pushdown, filter pushdown,
>>> partition
>>>>>>>> pushdown.
>>>>>>>>> Since `Snapshot` is not handled much in existing rules, I
>> have a
>>>>>>> concern
>>>>>>>>> about this. Of course, it depends on your implementation
>> detail,
>>>>> what
>>>>>>> is
>>>>>>>>> important is that we'd better add some cross tests for these.
>>>>>>>>> 
>>>>>>>>> Regarding the interface exposed to Connector, I see there is a
>>>>>> rejected
>>>>>>>>> design for adding SupportsTimeTravel, but I didn't see the
>>>>>> alternative
>>>>>>> in
>>>>>>>>> the FLIP doc. IMO, this is an important thing we need to
>> clarify
>>>>>>> because
>>>>>>>> we
>>>>>>>>> need to know whether the Connector supports this, and what
>>>>>>>> column/metadata
>>>>>>>>> corresponds to 'system_time'.
>>>>>>>>> 
>>>>>>>>> Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
>>>>>>>>> 
>>>>>>>>>> Thanks for your reply
>>>>>>>>>> 
>>>>>>>>>> @Timo @BenChao @yuxia
>>>>>>>>>> 
>>>>>>>>>> Sorry for the mistake,  Currently , calcite only supports
>>> `FOR
>>>>>>>>> SYSTEM_TIME
>>>>>>>>>> AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
>> OF`
>>> .
>>>>>> I've
>>>>>>>>>> updated the syntax part of the FLIP.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> @Timo
>>>>>>>>>> 
>>>>>>>>>>> We will convert it to TIMESTAMP_LTZ?
>>>>>>>>>> 
>>>>>>>>>> Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ
>> and
>>>>> then
>>>>>>>>> convert
>>>>>>>>>> it into a long value.
>>>>>>>>>> 
>>>>>>>>>>> How do we want to query the most recent version of a table
>>>>>>>>>> 
>>>>>>>>>> I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
>>> cause
>>>>>>>>>> inconsistency with the real-time concept.
>>>>>>>>>> However, from my personal understanding, the scope of  `AS
>> OF
>>>>>>>>>> CURRENT_TIMESTAMP` is the table itself, not the table
>> record.
>>>>> So,
>>>>>> I
>>>>>>>>> think
>>>>>>>>>> using CURRENT_TIMESTAMP should also be reasonable?.
>>>>>>>>>> Additionally, if no version is specified, the latest version
>>>>> should
>>>>>>> be
>>>>>>>>> used
>>>>>>>>>> by default.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Feng
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Thu, May 25, 2023 at 7:47 PM yuxia <
>>>>> luoyuxia@alumni.sjtu.edu.cn
>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks Feng for bringing this up. It'll be great to
>>> introduce
>>>>>> time
>>>>>>>>> travel
>>>>>>>>>>> to Flink to have a better integration with external data
>>>>> soruces.
>>>>>>>>>>> 
>>>>>>>>>>> I also share same concern about the syntax.
>>>>>>>>>>> I see in the part of `Whether to support other syntax
>>>>>>>> implementations`
>>>>>>>>> in
>>>>>>>>>>> this FLIP, seems the syntax in Calcite should be `FOR
>>>>> SYSTEM_TIME
>>>>>>> AS
>>>>>>>>> OF`,
>>>>>>>>>>> right?
>>>>>>>>>>> But the the syntax part in this FLIP, it seems to be `AS
>> OF
>>>>>>>> TIMESTAMP`
>>>>>>>>>>> instead of  `FOR SYSTEM_TIME AS OF`. Is it just a mistake
>> or
>>>>> by
>>>>>>>> design?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Yuxia
>>>>>>>>>>> 
>>>>>>>>>>> ----- 原始邮件 -----
>>>>>>>>>>> 发件人: "Benchao Li" <li...@apache.org>
>>>>>>>>>>> 收件人: "dev" <de...@flink.apache.org>
>>>>>>>>>>> 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
>>>>>>>>>>> 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
>>> Mode
>>>>>>>>>>> 
>>>>>>>>>>> Thanks Feng, it's exciting to have this ability.
>>>>>>>>>>> 
>>>>>>>>>>> Regarding the syntax section, are you proposing `AS OF`
>>>>> instead
>>>>>> of
>>>>>>>> `FOR
>>>>>>>>>>> SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in
>>> the
>>>>> SQL
>>>>>>>>>> standard
>>>>>>>>>>> and has been supported in some database vendors such as
>> SQL
>>>>>> Server.
>>>>>>>>> About
>>>>>>>>>>> `AS OF`, is it in the standard or any database vendor
>>> supports
>>>>>>> this,
>>>>>>>> if
>>>>>>>>>>> yes, I think it's worth to add this support to Calcite,
>> and
>>> I
>>>>>> would
>>>>>>>>> give
>>>>>>>>>> a
>>>>>>>>>>> hand in Calcite side. Otherwise, I think we'd better to
>> use
>>>>> `FOR
>>>>>>>> SYSTEM
>>>>>>>>>> AS
>>>>>>>>>>> OF`.
>>>>>>>>>>> 
>>>>>>>>>>> Timo Walther <tw...@apache.org> 于2023年5月25日周四 19:02写道:
>>>>>>>>>>> 
>>>>>>>>>>>> Also: How do we want to query the most recent version
>> of a
>>>>>> table?
>>>>>>>>>>>> 
>>>>>>>>>>>> `AS OF CURRENT_TIMESTAMP` would be ideal, but according
>> to
>>>>> the
>>>>>>> docs
>>>>>>>>>> both
>>>>>>>>>>>> the type is TIMESTAMP_LTZ and what is even more
>> concerning
>>>>> is
>>>>>> the
>>>>>>>> it
>>>>>>>>>>>> actually is evalated row-based:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Returns the current SQL timestamp in the local time
>>> zone,
>>>>>> the
>>>>>>>>> return
>>>>>>>>>>>> type is TIMESTAMP_LTZ(3). It is evaluated for each
>> record
>>> in
>>>>>>>>> streaming
>>>>>>>>>>>> mode. But in batch mode, it is evaluated once as the
>> query
>>>>>> starts
>>>>>>>> and
>>>>>>>>>>>> uses the same result for every row.
>>>>>>>>>>>> 
>>>>>>>>>>>> This could make it difficult to explain in a join
>> scenario
>>>>> of
>>>>>>>>> multiple
>>>>>>>>>>>> snapshotted tables.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Timo
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 25.05.23 12:29, Timo Walther wrote:
>>>>>>>>>>>>> Hi Feng,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> thanks for proposing this FLIP. It makes a lot of
>> sense
>>> to
>>>>>>>> finally
>>>>>>>>>>>>> support querying tables at a specific point in time or
>>>>>>> hopefully
>>>>>>>>> also
>>>>>>>>>>>>> ranges soon. Following time-versioned tables.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here is some feedback from my side:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. Syntax
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Can you elaborate a bit on the Calcite restrictions?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Does Calcite currently support `AS OF` syntax for this
>>> but
>>>>>> not
>>>>>>>> `FOR
>>>>>>>>>>>>> SYSTEM_TIME AS OF`?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It would be great to support `AS OF` also for
>>>>> time-versioned
>>>>>>>> joins
>>>>>>>>>> and
>>>>>>>>>>>>> have a unified and short syntax.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Once a fix is merged in Calcite for this, we can make
>>> this
>>>>>>>>> available
>>>>>>>>>> in
>>>>>>>>>>>>> Flink earlier by copying the corresponding classes
>> until
>>>>> the
>>>>>>> next
>>>>>>>>>>>>> Calcite upgrade is performed.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2. Semantics
>>>>>>>>>>>>> 
>>>>>>>>>>>>> How do we interpret the timestamp? In Flink we have 2
>>>>>> timestamp
>>>>>>>>> types
>>>>>>>>>>>>> (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
>>>>>> TIMESTAMP
>>>>>>>>>>>>> '2023-04-27 00:00:00', in which timezone will the
>>>>> timestamp
>>>>>> be?
>>>>>>>> We
>>>>>>>>>> will
>>>>>>>>>>>>> convert it to TIMESTAMP_LTZ?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We definely need to clarify this because the past has
>>>>> shown
>>>>>>> that
>>>>>>>>>>>>> daylight saving times make our lives hard.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Timo
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 25.05.23 10:57, Feng Jin wrote:
>>>>>>>>>>>>>> Hi, everyone.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I’d like to start a discussion about FLIP-308:
>> Support
>>>>> Time
>>>>>>>> Travel
>>>>>>>>>> In
>>>>>>>>>>>>>> Batch
>>>>>>>>>>>>>> Mode [1]
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Time travel is a SQL syntax used to query historical
>>>>>> versions
>>>>>>> of
>>>>>>>>>> data.
>>>>>>>>>>>> It
>>>>>>>>>>>>>> allows users to specify a point in time and retrieve
>>> the
>>>>>> data
>>>>>>>> and
>>>>>>>>>>>>>> schema of
>>>>>>>>>>>>>> a table as it appeared at that time. With time
>> travel,
>>>>> users
>>>>>>> can
>>>>>>>>>>> easily
>>>>>>>>>>>>>> analyze and compare historical versions of data.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> With the widespread use of data lake systems such as
>>>>> Paimon,
>>>>>>>>>> Iceberg,
>>>>>>>>>>>> and
>>>>>>>>>>>>>> Hudi, time travel can provide more convenience for
>>> users'
>>>>>> data
>>>>>>>>>>> analysis.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Looking forward to your opinions, any suggestions are
>>>>>>> welcomed.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 1.
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Feng
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Benchao Li
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Benchao Li
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> Best,
>>>>>> Benchao Li
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 


Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Posted by Yun Tang <my...@live.com>.
Hi Feng,

I think this FLIP would provide one important feature to unify the stream-SQL and batch-SQL when we backfill the historical data in batch mode.

For the "Syntax" session, I think you could add descriptions of how to align backfill time travel with querying the latest data. And I think you should also update the "Discussion thread" in the original FLIP.

Moreover, I have a question about getting the table schema from the catalog. I'm not sure whether the Catalog#getTable(tablePath, timestamp) will be called only once. If we have a backfill query between 2023-05-29 and 2023-06-04 in the past week, and the table schema changed on 2023-06-01, will the query below detect the schema changes during backfill the whole week?

SELECT * FROM paimon_tb FOR SYSTEM_TIME AS OF TIMESTAMP BETWEEN '2023-05-29 00:00:00' AND '2023-06-05 00:00:00'

Best
Yun Tang


________________________________
From: Shammon FY <zj...@gmail.com>
Sent: Thursday, June 1, 2023 17:57
To: dev@flink.apache.org <de...@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode

Hi Feng,

I have one minor comment about the public interface `Optional<Long>
getSnapshot()` in the `CatalogTable`.

As we can get tables from the new method `Catalog.getTable(ObjectPath
tablePath, long timestamp)`, I think the returned `CatalogBaseTable` will
have the information of timestamp. Flink or connector such as
iceberg/paimon can create sources from the `CatalogBaseTable` directly
without the need to get the snapshot ID from `CatalogTable.getSnapshot()`.
What do you think of it?

Best,
Shammon FY


On Thu, Jun 1, 2023 at 7:22 AM Jing Ge <ji...@ververica.com.invalid> wrote:

> Hi Feng,
>
> Thanks for the proposal! Very interesting feature. Would you like to update
> your thoughts described in your previous email about why SupportsTimeTravel
> has been rejected into the FLIP? This will help readers understand the
> context (in the future).
>
> Since we always directly add overload methods into Catalog according to new
> requirements, which makes the interface bloated. Just out of curiosity,
> does it make sense to introduce some DSL design? Like
> Catalog.getTable(tablePath).on(timeStamp),
> Catalog.getTable(tablePath).current() for the most current version, and
> more room for further extension like timestamp range, etc. I haven't read
> all the source code yet and I'm not sure if it is possible. But a
> design like this will keep the Catalog API lean and the API/DSL will be
> self described and easier to use.
>
> Best regards,
> Jing
>
>
> On Wed, May 31, 2023 at 12:08 PM Krzysztof Chmielewski <
> krzysiek.chmielewski@gmail.com> wrote:
>
> > Ok after second though I'm retracting my previous statement about Catalog
> > changes you proposed.
> > I do see a benefit for Delta connector actually with this change and see
> > why this could be coupled with Catalog.
> >
> > Delta Connector SQL support, also ships a Delta Catalog implementation
> for
> > Flink.
> > For Delta Catalog, table schema information is fetched from underlying
> > _delta_log and not stored in metastore. For time travel we actually had a
> > problem, that if we would like to timetravel back to some old version,
> > where schema was slightly different, then we would have a conflict since
> > Catalog would return current schema and not how it was for version X.
> >
> > With your change, our Delta Catalog can actually fetch schema for
> version X
> > and send it to DeltaTableFactory. Currency, Catalog can fetch only
> current
> > version. What we would also need however is version (number/timestamp)
> for
> > this table passed to DynamicTableFactory so we could properly set Delta
> > standalone library.
> >
> > Regards,
> > Krzysztof
> >
> > śr., 31 maj 2023 o 10:37 Krzysztof Chmielewski <
> > krzysiek.chmielewski@gmail.com> napisał(a):
> >
> > > Hi,
> > > happy to see such a feature.
> > > Small note from my end regarding Catalog changes.
> > >
> > > TL;DR
> > > I don't think it is necessary to delegate this feature to the catalog.
> I
> > > think that since "timetravel" is per job/query property, its should not
> > be
> > > coupled with the Catalog or table definition. In my opinion this is
> > > something that DynamicTableFactory only has to know about. I would
> rather
> > > see this feature as it is - SQL syntax enhancement but delegate clearly
> > to
> > > DynamicTableFactory.
> > >
> > > I've implemented timetravel feature for Delta Connector  [1]  using
> > > current Flink API.
> > > Docs are pending code review, but you can find them here [2] and
> examples
> > > are available here [3]
> > >
> > > The timetravel feature that I've implemented is based on Flink Query
> > > hints.
> > > "SELECT * FROM sourceTable /*+ OPTIONS('versionAsOf' = '1') */"
> > >
> > > The " versionAsOf" (we also have 'timestampAsOf') parameter is handled
> > not
> > > by Catalog but by DyntamicTableFactory implementation for Delta
> > connector.
> > > The value of this property is passed to Delta standalone lib API that
> > > returns table view for given version.
> > >
> > > I'm not sure how/if proposed change could benefit Delta connector
> > > implementation for this feature.
> > >
> > > Thanks,
> > > Krzysztof
> > >
> > > [1]
> > >
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/flink
> > > [2] https://github.com/kristoffSC/connectors/tree/FlinkSQL_PR_15-docs
> > > [3]
> > >
> >
> https://github.com/delta-io/connectors/tree/flink_table_catalog_feature_branch/examples/flink-example/src/main/java/org/example/sql
> > >
> > > śr., 31 maj 2023 o 06:03 liu ron <ro...@gmail.com> napisał(a):
> > >
> > >> Hi, Feng
> > >>
> > >> Thanks for driving this FLIP, Time travel is very useful for Flink
> > >> integrate with data lake system. I have one question why the
> > >> implementation
> > >> of TimeTravel is delegated to Catalog? Assuming that we use Flink to
> > query
> > >> Hudi table with the time travel syntax, but we don't use the
> > HudiCatalog,
> > >> instead, we register the hudi table to InMemoryCatalog,  can we
> support
> > >> time travel for Hudi table in this case?
> > >> In contrast, I think time travel should bind to connector instead of
> > >> Catalog, so the rejected alternative should be considered.
> > >>
> > >> Best,
> > >> Ron
> > >>
> > >> yuxia <lu...@alumni.sjtu.edu.cn> 于2023年5月30日周二 09:40写道:
> > >>
> > >> > Hi, Feng.
> > >> > Notice this FLIP only support batch mode for time travel.  Would it
> > also
> > >> > make sense to support stream mode to a read a snapshot of the table
> > as a
> > >> > bounded stream?
> > >> >
> > >> > Best regards,
> > >> > Yuxia
> > >> >
> > >> > ----- 原始邮件 -----
> > >> > 发件人: "Benchao Li" <li...@apache.org>
> > >> > 收件人: "dev" <de...@flink.apache.org>
> > >> > 发送时间: 星期一, 2023年 5 月 29日 下午 6:04:53
> > >> > 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch Mode
> > >> >
> > >> > # Can Calcite support this syntax ` VERSION AS OF`  ?
> > >> >
> > >> > This also depends on whether this is defined in standard or any
> known
> > >> > databases that have implemented this. If not, it would be hard to
> push
> > >> it
> > >> > to Calcite.
> > >> >
> > >> > # getTable(ObjectPath object, long timestamp)
> > >> >
> > >> > Then we again come to the problem of "casting between timestamp and
> > >> > numeric", which has been disabled in FLINK-21978[1]. If you're gonna
> > use
> > >> > this, then we need to clarify that problem first.
> > >> >
> > >> > [1] https://issues.apache.org/jira/browse/FLINK-21978
> > >> >
> > >> >
> > >> > Feng Jin <ji...@gmail.com> 于2023年5月29日周一 15:57写道:
> > >> >
> > >> > > hi, thanks for your reply.
> > >> > >
> > >> > > @Benchao
> > >> > > > did you consider the pushdown abilities compatible
> > >> > >
> > >> > > In the current design, the implementation of TimeTravel is
> delegated
> > >> to
> > >> > > Catalog. We have added a function called getTable(ObjectPath
> > >> tablePath,
> > >> > > long timestamp) to obtain the corresponding CatalogBaseTable at a
> > >> > specific
> > >> > > time.  Therefore, I think it will not have any impact on the
> > original
> > >> > > pushdown abilities.
> > >> > >
> > >> > >
> > >> > > >   I see there is a rejected  design for adding
> SupportsTimeTravel,
> > >> but
> > >> > I
> > >> > > didn't see the alternative in  the FLIP doc
> > >> > >
> > >> > > Sorry, the document description is not very clear.  Regarding
> > whether
> > >> to
> > >> > > support SupportTimeTravel, I have discussed it with yuxia. Since
> we
> > >> have
> > >> > > already passed the corresponding time in getTable(ObjectPath, long
> > >> > > timestamp) of Catalog, SupportTimeTravel may not be necessary.
> > >> > >
> > >> > > In getTable(ObjectPath object, long timestamp), we can obtain the
> > >> schema
> > >> > of
> > >> > > the corresponding time point and put the SNAPSHOT that needs to be
> > >> > consumed
> > >> > > into options.
> > >> > >
> > >> > >
> > >> > > @Shammon
> > >> > > > Could we support this in Flink too?
> > >> > >
> > >> > > I personally think it's possible, but limited by Calcite's syntax
> > >> > > restrictions. I believe we should first support this syntax in
> > >> Calcite.
> > >> > > Currently, I think it may not be easy  to support this syntax in
> > >> Flink's
> > >> > > parser. @Benchao, what do you think? Can Calcite support this
> syntax
> > >> > > ` VERSION AS OF`  ?
> > >> > >
> > >> > >
> > >> > > Best,
> > >> > > Feng.
> > >> > >
> > >> > >
> > >> > > On Fri, May 26, 2023 at 2:55 PM Shammon FY <zj...@gmail.com>
> > wrote:
> > >> > >
> > >> > > > Thanks Feng, the feature of time travel sounds great!
> > >> > > >
> > >> > > > In addition to SYSTEM_TIME, lake houses such as paimon and
> iceberg
> > >> > > support
> > >> > > > snapshot or version. For example, users can query snapshot 1 for
> > >> paimon
> > >> > > by
> > >> > > > the following statement
> > >> > > > SELECT * FROM t VERSION AS OF 1
> > >> > > >
> > >> > > > Could we support this in Flink too?
> > >> > > >
> > >> > > > Best,
> > >> > > > Shammon FY
> > >> > > >
> > >> > > > On Fri, May 26, 2023 at 1:20 PM Benchao Li <
> libenchao@apache.org>
> > >> > wrote:
> > >> > > >
> > >> > > > > Regarding the implementation, did you consider the pushdown
> > >> abilities
> > >> > > > > compatible, e.g., projection pushdown, filter pushdown,
> > partition
> > >> > > > pushdown.
> > >> > > > > Since `Snapshot` is not handled much in existing rules, I
> have a
> > >> > > concern
> > >> > > > > about this. Of course, it depends on your implementation
> detail,
> > >> what
> > >> > > is
> > >> > > > > important is that we'd better add some cross tests for these.
> > >> > > > >
> > >> > > > > Regarding the interface exposed to Connector, I see there is a
> > >> > rejected
> > >> > > > > design for adding SupportsTimeTravel, but I didn't see the
> > >> > alternative
> > >> > > in
> > >> > > > > the FLIP doc. IMO, this is an important thing we need to
> clarify
> > >> > > because
> > >> > > > we
> > >> > > > > need to know whether the Connector supports this, and what
> > >> > > > column/metadata
> > >> > > > > corresponds to 'system_time'.
> > >> > > > >
> > >> > > > > Feng Jin <ji...@gmail.com> 于2023年5月25日周四 22:50写道:
> > >> > > > >
> > >> > > > > > Thanks for your reply
> > >> > > > > >
> > >> > > > > > @Timo @BenChao @yuxia
> > >> > > > > >
> > >> > > > > > Sorry for the mistake,  Currently , calcite only supports
> > `FOR
> > >> > > > > SYSTEM_TIME
> > >> > > > > > AS OF `  syntax.  We can only support `FOR SYSTEM_TIME AS
> OF`
> > .
> > >> > I've
> > >> > > > > > updated the syntax part of the FLIP.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > @Timo
> > >> > > > > >
> > >> > > > > > > We will convert it to TIMESTAMP_LTZ?
> > >> > > > > >
> > >> > > > > > Yes, I think we need to convert TIMESTAMP to TIMESTAMP_LTZ
> and
> > >> then
> > >> > > > > convert
> > >> > > > > > it into a long value.
> > >> > > > > >
> > >> > > > > > > How do we want to query the most recent version of a table
> > >> > > > > >
> > >> > > > > > I think we can use `AS OF CURRENT_TIMESTAMP` ,But it does
> > cause
> > >> > > > > > inconsistency with the real-time concept.
> > >> > > > > > However, from my personal understanding, the scope of  `AS
> OF
> > >> > > > > > CURRENT_TIMESTAMP` is the table itself, not the table
> record.
> > >> So,
> > >> > I
> > >> > > > > think
> > >> > > > > > using CURRENT_TIMESTAMP should also be reasonable?.
> > >> > > > > > Additionally, if no version is specified, the latest version
> > >> should
> > >> > > be
> > >> > > > > used
> > >> > > > > > by default.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Best,
> > >> > > > > > Feng
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Thu, May 25, 2023 at 7:47 PM yuxia <
> > >> luoyuxia@alumni.sjtu.edu.cn
> > >> > >
> > >> > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Thanks Feng for bringing this up. It'll be great to
> > introduce
> > >> > time
> > >> > > > > travel
> > >> > > > > > > to Flink to have a better integration with external data
> > >> soruces.
> > >> > > > > > >
> > >> > > > > > > I also share same concern about the syntax.
> > >> > > > > > > I see in the part of `Whether to support other syntax
> > >> > > > implementations`
> > >> > > > > in
> > >> > > > > > > this FLIP, seems the syntax in Calcite should be `FOR
> > >> SYSTEM_TIME
> > >> > > AS
> > >> > > > > OF`,
> > >> > > > > > > right?
> > >> > > > > > > But the the syntax part in this FLIP, it seems to be `AS
> OF
> > >> > > > TIMESTAMP`
> > >> > > > > > > instead of  `FOR SYSTEM_TIME AS OF`. Is it just a mistake
> or
> > >> by
> > >> > > > design?
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Best regards,
> > >> > > > > > > Yuxia
> > >> > > > > > >
> > >> > > > > > > ----- 原始邮件 -----
> > >> > > > > > > 发件人: "Benchao Li" <li...@apache.org>
> > >> > > > > > > 收件人: "dev" <de...@flink.apache.org>
> > >> > > > > > > 发送时间: 星期四, 2023年 5 月 25日 下午 7:27:17
> > >> > > > > > > 主题: Re: [DISCUSS] FLIP-308: Support Time Travel In Batch
> > Mode
> > >> > > > > > >
> > >> > > > > > > Thanks Feng, it's exciting to have this ability.
> > >> > > > > > >
> > >> > > > > > > Regarding the syntax section, are you proposing `AS OF`
> > >> instead
> > >> > of
> > >> > > > `FOR
> > >> > > > > > > SYSTEM AS OF` to do this? I know `FOR SYSTEM AS OF` is in
> > the
> > >> SQL
> > >> > > > > > standard
> > >> > > > > > > and has been supported in some database vendors such as
> SQL
> > >> > Server.
> > >> > > > > About
> > >> > > > > > > `AS OF`, is it in the standard or any database vendor
> > supports
> > >> > > this,
> > >> > > > if
> > >> > > > > > > yes, I think it's worth to add this support to Calcite,
> and
> > I
> > >> > would
> > >> > > > > give
> > >> > > > > > a
> > >> > > > > > > hand in Calcite side. Otherwise, I think we'd better to
> use
> > >> `FOR
> > >> > > > SYSTEM
> > >> > > > > > AS
> > >> > > > > > > OF`.
> > >> > > > > > >
> > >> > > > > > > Timo Walther <tw...@apache.org> 于2023年5月25日周四 19:02写道:
> > >> > > > > > >
> > >> > > > > > > > Also: How do we want to query the most recent version
> of a
> > >> > table?
> > >> > > > > > > >
> > >> > > > > > > > `AS OF CURRENT_TIMESTAMP` would be ideal, but according
> to
> > >> the
> > >> > > docs
> > >> > > > > > both
> > >> > > > > > > > the type is TIMESTAMP_LTZ and what is even more
> concerning
> > >> is
> > >> > the
> > >> > > > it
> > >> > > > > > > > actually is evalated row-based:
> > >> > > > > > > >
> > >> > > > > > > >  > Returns the current SQL timestamp in the local time
> > zone,
> > >> > the
> > >> > > > > return
> > >> > > > > > > > type is TIMESTAMP_LTZ(3). It is evaluated for each
> record
> > in
> > >> > > > > streaming
> > >> > > > > > > > mode. But in batch mode, it is evaluated once as the
> query
> > >> > starts
> > >> > > > and
> > >> > > > > > > > uses the same result for every row.
> > >> > > > > > > >
> > >> > > > > > > > This could make it difficult to explain in a join
> scenario
> > >> of
> > >> > > > > multiple
> > >> > > > > > > > snapshotted tables.
> > >> > > > > > > >
> > >> > > > > > > > Regards,
> > >> > > > > > > > Timo
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On 25.05.23 12:29, Timo Walther wrote:
> > >> > > > > > > > > Hi Feng,
> > >> > > > > > > > >
> > >> > > > > > > > > thanks for proposing this FLIP. It makes a lot of
> sense
> > to
> > >> > > > finally
> > >> > > > > > > > > support querying tables at a specific point in time or
> > >> > > hopefully
> > >> > > > > also
> > >> > > > > > > > > ranges soon. Following time-versioned tables.
> > >> > > > > > > > >
> > >> > > > > > > > > Here is some feedback from my side:
> > >> > > > > > > > >
> > >> > > > > > > > > 1. Syntax
> > >> > > > > > > > >
> > >> > > > > > > > > Can you elaborate a bit on the Calcite restrictions?
> > >> > > > > > > > >
> > >> > > > > > > > > Does Calcite currently support `AS OF` syntax for this
> > but
> > >> > not
> > >> > > > `FOR
> > >> > > > > > > > > SYSTEM_TIME AS OF`?
> > >> > > > > > > > >
> > >> > > > > > > > > It would be great to support `AS OF` also for
> > >> time-versioned
> > >> > > > joins
> > >> > > > > > and
> > >> > > > > > > > > have a unified and short syntax.
> > >> > > > > > > > >
> > >> > > > > > > > > Once a fix is merged in Calcite for this, we can make
> > this
> > >> > > > > available
> > >> > > > > > in
> > >> > > > > > > > > Flink earlier by copying the corresponding classes
> until
> > >> the
> > >> > > next
> > >> > > > > > > > > Calcite upgrade is performed.
> > >> > > > > > > > >
> > >> > > > > > > > > 2. Semantics
> > >> > > > > > > > >
> > >> > > > > > > > > How do we interpret the timestamp? In Flink we have 2
> > >> > timestamp
> > >> > > > > types
> > >> > > > > > > > > (TIMESTAMP and TIMESTAMP_LTZ). If users specify AS OF
> > >> > TIMESTAMP
> > >> > > > > > > > > '2023-04-27 00:00:00', in which timezone will the
> > >> timestamp
> > >> > be?
> > >> > > > We
> > >> > > > > > will
> > >> > > > > > > > > convert it to TIMESTAMP_LTZ?
> > >> > > > > > > > >
> > >> > > > > > > > > We definely need to clarify this because the past has
> > >> shown
> > >> > > that
> > >> > > > > > > > > daylight saving times make our lives hard.
> > >> > > > > > > > >
> > >> > > > > > > > > Thanks,
> > >> > > > > > > > > Timo
> > >> > > > > > > > >
> > >> > > > > > > > > On 25.05.23 10:57, Feng Jin wrote:
> > >> > > > > > > > >> Hi, everyone.
> > >> > > > > > > > >>
> > >> > > > > > > > >> I’d like to start a discussion about FLIP-308:
> Support
> > >> Time
> > >> > > > Travel
> > >> > > > > > In
> > >> > > > > > > > >> Batch
> > >> > > > > > > > >> Mode [1]
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> Time travel is a SQL syntax used to query historical
> > >> > versions
> > >> > > of
> > >> > > > > > data.
> > >> > > > > > > > It
> > >> > > > > > > > >> allows users to specify a point in time and retrieve
> > the
> > >> > data
> > >> > > > and
> > >> > > > > > > > >> schema of
> > >> > > > > > > > >> a table as it appeared at that time. With time
> travel,
> > >> users
> > >> > > can
> > >> > > > > > > easily
> > >> > > > > > > > >> analyze and compare historical versions of data.
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> With the widespread use of data lake systems such as
> > >> Paimon,
> > >> > > > > > Iceberg,
> > >> > > > > > > > and
> > >> > > > > > > > >> Hudi, time travel can provide more convenience for
> > users'
> > >> > data
> > >> > > > > > > analysis.
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> Looking forward to your opinions, any suggestions are
> > >> > > welcomed.
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> 1.
> > >> > > > > > > > >>
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-308%3A+Support+Time+Travel+In+Batch+Mode
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >>
> > >> > > > > > > > >> Best.
> > >> > > > > > > > >>
> > >> > > > > > > > >> Feng
> > >> > > > > > > > >>
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > >
> > >> > > > > > > Best,
> > >> > > > > > > Benchao Li
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Benchao Li
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Best,
> > >> > Benchao Li
> > >> >
> > >>
> > >
> >
>