You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Lincoln Lee <li...@gmail.com> on 2022/05/19 10:24:29 UTC

[DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Dear Flink developers,

I would like to open a discussion on FLIP 234 [1] to support retryable
lookup join to solve delayed updates issue, as a pre-work for this
solution, we proposed FLIP-232[2] which adds a generic retry support for
Async I/O.
We prefer to offer this retry capability via query hints, similar to new
join hints proposed in FLINK-27625[3] & FLIP-204[4].

This feature is backwards compatible and transparently to connectors. For
existing connectors which implements AsyncTableFunction, can easily enable
async retry via the new join hint.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
[2]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
[3] https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
[4]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join

Best,
Lincoln Lee

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Rascal Wu <st...@gmail.com>.
got it, thanks a lot for your help.

Lincoln Lee <li...@gmail.com> 于2022年8月11日周四 17:29写道:

> Hi Rascal,
>
>      Yes, you got the right answer for these two questions. If you have
> more questions about using Flink, welcome mailing to user support and
> questions mailing list: user@flink.apache.org / user-zh@flink.apache.org (
> https://flink.apache.org/community.html#mailing-lists). This dev mailing
> list is mainly for development related discussions.
>
>
> Best,
> Lincoln Lee
>
>
> Rascal Wu <st...@gmail.com> 于2022年8月11日周四 13:18写道:
>
> > Thanks Lincoln a lot for your quick response and clear explanation.
> >
> > It's clear now. So regarding "*For the retry support in FLIP-234, it only
> > provide a 'lookup_miss' retry predicate which is to retry if can not find
> > it*" , try to double confirm again:
> > 1) if the dim record doesn't exist in dim table after retry finished, the
> > output will be +[order1, id1, null], if the dim record coming in dim
> table
> > before last retry, the output will be +[order1, id1, xxx], right? (assume
> > left join here)
> > 2) And if I expect the latest event (+[order1, id1, 12]) update the
> > previous event (+[order1, id1, 10]), I should use regular join instead of
> > lookup join, right?
> >
> > Lincoln Lee <li...@gmail.com> 于2022年8月11日周四 12:01写道:
> >
> > > @Rascal  Thanks for looking at this new feature! There is a lot of
> > content
> > > in these two flips and we will prepare a more detailed user
> documentation
> > > before the 1.16 release.
> > >
> > > First of all, unlike regular join, the lookup join only triggers the
> > lookup
> > > action (access the dimension table) through the records of the stream
> > > table, and the updates of the dimension table itself do not actively
> > > trigger the updating results to
> > > downstream.
> > >
> > > For your example, whether retry is enabled or not, only the first
> > > "+[order1, id1, 10]" will sent to downstream (because only one order
> > record
> > > has come), the new updates of dimension table "10:02 -> (id1, 12)" will
> > not
> > > trigger updates.
> > >
> > > For the retry support in FLIP-234, it only provide a 'lookup_miss'
> retry
> > > predicate which is to retry if can not find it (not always equivalent
> to
> > > the complete join condition [1]), and not to trigger the retry if can
> > find
> > > a non-null value from the dimension table. If more complex value check
> > > logic required, a viable way is to implement a custom
> > `AsyncRetryPredicate`
> > > in DataStream API (as FLIP-232 introduced).
> > >
> > > [1]: for different connector, the index-lookup capability maybe
> > different,
> > > e.g.,  HBase can lookup on rowkey only (by default, without secondary
> > > index), while RDBMS can provided more powerful index-lookup
> capabilities
> > > let's see the lookup join example from flink document(
> > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join
> > > )
> > >
> > >
> > > if the query with a join condition "ON o.customer_id = c.id and
> > c.country
> > > =
> > > 'US'"
> > >
> > > -- enrich each order with customer informationSELECT o.order_id,
> > > o.total, c.country, c.zipFROM Orders AS o  JOIN Customers FOR
> > > SYSTEM_TIME AS OF o.proc_time AS c    ON o.customer_id = c.id and
> > > c.country = 'US';
> > >
> > > for the dimension table in mysql, all of the columns (id & country) can
> > be
> > > used as index-lookup condition
> > >
> > > -- Customers is backed by the JDBC connector and can be used for
> > > lookup joinsCREATE TEMPORARY TABLE Customers (  id INT,  name STRING,
> > > country STRING,  zip STRING) WITH (  'connector' = 'jdbc',  'url' =
> > > 'jdbc:mysql://mysqlhost:3306/customerdb',  'table-name' =
> > > 'customers');
> > >
> > >
> > >
> > > while if the same table stored in HBase (with no secondary index), only
> > the
> > > 'id' column (rowkey in HBase) can be the index-lookup condition
> > >
> > > -- Customers is backed by the JDBC connector and can be used for
> > > lookup joinsCREATE TEMPORARY TABLE Customers (  id INT,  name STRING,
> > > country STRING,  zip STRING,
> > >     PRIMARY KEY (id) NOT ENFORCED) WITH (  'connector' = 'hbase-xxx',
> > > ...);
> > >
> > >
> > > so, the 'lookup_miss' retry predicte may result differently in
> different
> > > connectors.
> > >
> > > wish this can helps.
> > >
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Rascal Wu <st...@gmail.com> 于2022年8月11日周四 10:32写道:
> > >
> > > > Hi here,
> > > >
> > > > Sorry for digging up this old email.
> > > >
> > > > May I consult one question about what's the final behavior we
> expected
> > > > after we enabled retryable mechanism. I viewed the FLIP-232 &
> FLIP-234,
> > > but
> > > > I didn't get a clear answer as lacking of some knowledge.
> > > >
> > > > Let me take a example there, let's assume we have a dim table(id,
> > price),
> > > > the record(id1) existed in dim table before 10:00, and then it was
> > > updated
> > > > in business system at 10:01, but delay updated in dim table at 10:02.
> > the
> > > > record timeline as follow:
> > > > 10:00 -> (id1, 10)
> > > > 10:00 -> the id1 records was been updated, but delayed persistency or
> > > sync
> > > > to dim table.
> > > > 10:02 -> (id1, 12)
> > > >
> > > > And a Flink application processes an order record that will join the
> > dim
> > > > table at 10:01, so it will output an event +[order1, id1, 10].
> > > > And if we enable retry mechanism no matter it's sync or async, does
> it
> > > mean
> > > > there will be two events sink to downstream:
> > > > 1. retract event: -[order1, id1, 10]
> > > > 2. new event: +[order1, id1, 12]
> > > >
> > > > does the above all behavior is what we expected to solve the delay
> > update
> > > > for dimension table?
> > > >
> > > >
> > > > Lincoln Lee <li...@gmail.com> 于2022年6月7日周二 12:19写道:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I started a vote for this FLIP [1], please vote there or ask
> > additional
> > > > > questions here. [2]
> > > > >
> > > > > [1]
> https://lists.apache.org/thread/bb0kqjs8co3hhmtklmwptws4fc4rz810
> > > > > [2]
> https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
> > > > >
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > > >
> > > > > Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 15:51写道:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > When reviewing the name of the hint option
> > > 'miss-retry'='true|false', I
> > > > > > feel the current name is not precise enough, it might be easier
> to
> > > > > > understand by using the retry-predicate directly from flip-232,
> > > > > > e.g. 'retry-predicate'='lookup-miss', which has the additional
> > > benefit
> > > > of
> > > > > > extensibility(maybe more retry condition in the future).
> > > > > >
> > > > > > Jark & Jingsong, do you have any suggestions? If we agree with
> the
> > > name
> > > > > > 'retry-predicate' or other better candidate, I'll update the
> FLIP.
> > > > > >
> > > > > > Best,
> > > > > > Lincoln Lee
> > > > > >
> > > > > >
> > > > > > Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 11:23写道:
> > > > > >
> > > > > >> Hi everyone,
> > > > > >>
> > > > > >> I've updated the FLIP[1] based on this discussion thread that we
> > > agree
> > > > > to
> > > > > >> have a single unified 'LOOKUP' hint and also a related part in
> > > > > FLIP-221[2]
> > > > > >>  which is mainly for the necessity of the common table option
> > > > > >> 'lookup.async'.
> > > > > >>
> > > > > >> The main updates are:
> > > > > >> 1. the new unified 'LOOKUP' hint, make retry support both on
> sync
> > > and
> > > > > >> async lookup
> > > > > >> 2. clarify the default choice of the planner for those
> connectors
> > > > which
> > > > > >> have both sync and async lookup capabilities, and how to deal
> with
> > > the
> > > > > >> query hint
> > > > > >> 3. will add a followup issue to discuss whether to remove the
> > > > > >> 'lookup.async' option in HBase connector.
> > > > > >>
> > > > > >> [1]
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > >> [2]
> > > https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq
> > > > > >>
> > > > > >> Best,
> > > > > >> Lincoln Lee
> > > > > >>
> > > > > >>
> > > > > >> Lincoln Lee <li...@gmail.com> 于2022年6月1日周三 16:03写道:
> > > > > >>
> > > > > >>> Hi Jingsong,
> > > > > >>>
> > > > > >>> There will be no change for connectors with only one capability
> > > (sync
> > > > > or
> > > > > >>> async).
> > > > > >>>
> > > > > >>> Query hint works in a best effort manner, so if users
> specifies a
> > > > hint
> > > > > >>> with invalid option, the query plan keeps unchanged, e.g., use
> > > > > >>> LOOKUP('table'='customer', 'async'='true'), but backend lookup
> > > source
> > > > > only
> > > > > >>> implemented the sync lookup function, then the async lookup
> hint
> > > > takes
> > > > > no
> > > > > >>> effect.
> > > > > >>>
> > > > > >>> For these connectors which can have both capabilities of async
> > and
> > > > sync
> > > > > >>> lookup, our advice for the connector developer is implementing
> > both
> > > > > sync
> > > > > >>> and async interfaces if both capabilities have suitable use
> > cases,
> > > > and
> > > > > the
> > > > > >>> planner can decide which capability is the preferable one based
> > on
> > > > cost
> > > > > >>> model or maybe other mechanism (another use case is exactly
> what
> > > > we're
> > > > > >>> discussing here, users can give the query hint), otherwise
> choose
> > > one
> > > > > >>> interface to implement.
> > > > > >>>
> > > > > >>> Also, this should be clarified for the lookup function related
> > > APIs.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Lincoln Lee
> > > > > >>>
> > > > > >>>
> > > > > >>> Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 15:18写道:
> > > > > >>>
> > > > > >>>> Hi Lincoln,
> > > > > >>>>
> > > > > >>>> > It's better making decisions at the query level when a
> > connector
> > > > has
> > > > > >>>> both
> > > > > >>>> capabilities.
> > > > > >>>>
> > > > > >>>> Can you clarify the mechanism?
> > > > > >>>> - only sync connector: What connector developers should do
> > > > > >>>> - only async connector: What connector developers should do
> > > > > >>>> - both async and sync connector: What connector developers
> > should
> > > do
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Jingsong
> > > > > >>>>
> > > > > >>>> On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <
> > > lincoln.86xy@gmail.com>
> > > > > >>>> wrote:
> > > > > >>>>
> > > > > >>>> > Hi Jingsong,
> > > > > >>>> >
> > > > > >>>> > Thanks for your feedback!
> > > > > >>>> >
> > > > > >>>> > Yes, the existing HBase connector use an option
> 'lookup.async'
> > > to
> > > > > >>>> control
> > > > > >>>> > its lookup source implementations that exposed to the
> planner,
> > > > > >>>> however it's
> > > > > >>>> > a private option for the HBase connector only, so it will
> not
> > > > affect
> > > > > >>>> the
> > > > > >>>> > common API.
> > > > > >>>> >
> > > > > >>>> > And as discussed in the mailing thread of FLIP-221[1], we
> got
> > a
> > > > > >>>> consensus
> > > > > >>>> > that do not make it as a common option. It's better making
> > > > decisions
> > > > > >>>> at the
> > > > > >>>> > query level when a connector has both capabilities.
> > > > > >>>> >
> > > > > >>>> > So if everything goes well, we should discuss it whether to
> > > > > deprecate
> > > > > >>>> > the 'lookup.async'
> > > > > >>>> > or not for HBase connector after the hint been done.
> > > > > >>>> >
> > > > > >>>> > This will be mentioned in the Compatibility part of this
> > > FLIP[2].
> > > > > >>>> >
> > > > > >>>> > WDYT?
> > > > > >>>> >
> > > > > >>>> > [1]:
> > > > > https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
> > > > > >>>> > [2]:
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> > > > > >>>> >
> > > > > >>>> > Best,
> > > > > >>>> > Lincoln Lee
> > > > > >>>> >
> > > > > >>>> >
> > > > > >>>> > Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:
> > > > > >>>> >
> > > > > >>>> > > Hi Lincoln,
> > > > > >>>> > >
> > > > > >>>> > > The unified lookup hint is what I want.
> > > > > >>>> > >
> > > > > >>>> > > And I like 'async'='true|false' option.
> > > > > >>>> > >
> > > > > >>>> > > But there is a compatibility issue, as I remember if async
> > is
> > > > > >>>> currently
> > > > > >>>> > > controlled by connector, and this may also require some
> API
> > > > > changes?
> > > > > >>>> > >
> > > > > >>>> > > We need to have a clear story for the connector combined
> > with
> > > > this
> > > > > >>>> > option:
> > > > > >>>> > > - only sync connector
> > > > > >>>> > > - only async connector
> > > > > >>>> > > - both async and sync connector
> > > > > >>>> > >
> > > > > >>>> > > Best,
> > > > > >>>> > > Jingsong
> > > > > >>>> > >
> > > > > >>>> > > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <
> > > > > lincoln.86xy@gmail.com
> > > > > >>>> >
> > > > > >>>> > > wrote:
> > > > > >>>> > >
> > > > > >>>> > > > Thanks Jark for your quick response and the consensus!
> > > > > >>>> > > >
> > > > > >>>> > > > And I will update the FLIP after Jingsong or other
> > > developers
> > > > > >>>> confirm
> > > > > >>>> > > that
> > > > > >>>> > > > there is no problem.
> > > > > >>>> > > >
> > > > > >>>> > > > Best,
> > > > > >>>> > > > Lincoln Lee
> > > > > >>>> > > >
> > > > > >>>> > > >
> > > > > >>>> > > > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
> > > > > >>>> > > >
> > > > > >>>> > > > > Thanks for the update.
> > > > > >>>> > > > >
> > > > > >>>> > > > > The unified lookup hint looks good to me.
> > > > > >>>> > > > > And thanks for explaining ALLOW_UNORDERED.
> > > > > >>>> > > > >
> > > > > >>>> > > > > Best,
> > > > > >>>> > > > > Jark
> > > > > >>>> > > > >
> > > > > >>>> > > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <
> > > > > >>>> lincoln.86xy@gmail.com>
> > > > > >>>> > > > wrote:
> > > > > >>>> > > > >
> > > > > >>>> > > > > > Hi Jark & Jingsong,
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > Thanks for your feedback!
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > 1.) support retry on sync lookup
> > > > > >>>> > > > > > I also agree with supporting it, this will be useful
> > for
> > > > > >>>> connectors
> > > > > >>>> > > > that
> > > > > >>>> > > > > > don't have asynchronous lookup implementations and
> can
> > > > also
> > > > > >>>> solve
> > > > > >>>> > the
> > > > > >>>> > > > > ASYNC
> > > > > >>>> > > > > > non-target problem to some extent(because the
> retrying
> > > is
> > > > > >>>> blocking
> > > > > >>>> > > for
> > > > > >>>> > > > > sync
> > > > > >>>> > > > > > lookup, and may accumulate delay, but it maybe
> > > acceptable
> > > > > for
> > > > > >>>> the
> > > > > >>>> > > case
> > > > > >>>> > > > > that
> > > > > >>>> > > > > > most or all data want to do a delayed lookup).
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > For the api perspective, we can do some unification.
> > > Let's
> > > > > >>>> think of
> > > > > >>>> > > the
> > > > > >>>> > > > > > whole user story for lookup join:
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one:
> > > > LOOKUP
> > > > > by
> > > > > >>>> > > > different
> > > > > >>>> > > > > > hint option values: 'async'='true|false'
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY
> can
> > > > share
> > > > > >>>> the
> > > > > >>>> > > > > > LOOKUP_MISS_RETRY with hint option:
> > > > > 'miss-retry'='true|false'
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > we can use one single hint LOOKUP with different
> hint
> > > > > options
> > > > > >>>> > > > > > ('async'='true|false', 'miss-retry'='true|false') to
> > > cover
> > > > > all
> > > > > >>>> > > related
> > > > > >>>> > > > > > functionalities.
> > > > > >>>> > > > > > Compared to multiple hints with different subsets of
> > > > > >>>> > functionality, a
> > > > > >>>> > > > > > single hint may be easier for users to understand
> and
> > > use,
> > > > > and
> > > > > >>>> > > specific
> > > > > >>>> > > > > > parameters can be quickly found through
> documentation
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > the support matrix will be:
> > > > > >>>> > > > > > lookup support async retry
> > > > > >>>> > > > > > sync w/o retry N N
> > > > > >>>> > > > > > sync w/ retry N Y
> > > > > >>>> > > > > > async w/o retry Y N
> > > > > >>>> > > > > > async w/ retry Y Y
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > and the available hint options for each mode:
> > > > > >>>> > > > > > mode support hint options
> > > > > >>>> > > > > > async async'='true'
> > > > > >>>> > > > > > 'output-mode'='ordere|allow-unordered'
> > > > > >>>> > > > > > 'capacity'='100'
> > > > > >>>> > > > > > 'timeout'='180s'
> > > > > >>>> > > > > > retry miss-retry'='true'
> > > > > >>>> > > > > > 'retry-strategy'='fixed-delay'
> > > > > >>>> > > > > > 'delay'='10s'
> > > > > >>>> > > > > > 'max-attempts'='3'
> > > > > >>>> > > > > >
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > 2.) 'allow-unordered' vs 'unordered' for
> > > > > >>>> > > > > > 'table.exec.async-lookup.output-mode'
> > > > > >>>> > > > > > Yes, make it align with DataStream Api maybe more
> > > > intuitive,
> > > > > >>>> but
> > > > > >>>> > > > there's
> > > > > >>>> > > > > > some difference in table layer that makes the
> > > > > >>>> 'allow-unordered'
> > > > > >>>> > > > > meaningful:
> > > > > >>>> > > > > > updates in the pipeline need to be processed in
> order,
> > > > > >>>> > > ALLOW_UNORDERED
> > > > > >>>> > > > > > means if users allow unordered result, it will
> attempt
> > > to
> > > > > use
> > > > > >>>> > > > > > AsyncDataStream.OutputMode.UNORDERED when it does
> not
> > > > affect
> > > > > >>>> the
> > > > > >>>> > > > > > correctness of the result, otherwise ORDERED will be
> > > still
> > > > > >>>> used.
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > Another choice is that when the user specifies
> > unordered
> > > > > mode,
> > > > > >>>> > > planner
> > > > > >>>> > > > > > throws an error when it finds that it may affect
> > > > > correctness.
> > > > > >>>> But
> > > > > >>>> > > this
> > > > > >>>> > > > is
> > > > > >>>> > > > > > not user-friendly and is not consistent with the
> > > customary
> > > > > >>>> > treatment
> > > > > >>>> > > of
> > > > > >>>> > > > > > invalid query hints(best effort).
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > I opened a pr
> > > https://github.com/apache/flink/pull/19759
> > > > > for
> > > > > >>>> the
> > > > > >>>> > new
> > > > > >>>> > > > > > option
> > > > > >>>> > > > > > 'table.exec.async-lookup.output-mode' and also a
> > > > discussion
> > > > > on
> > > > > >>>> > > > > FLINK-27625:
> > > > > >>>> > > > > > add query hint 'ASYNC_LOOKUP' for async lookup
> > > join(Since
> > > > > the
> > > > > >>>> > changes
> > > > > >>>> > > > > were
> > > > > >>>> > > > > > relatively minor, no new flip was created)
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > If we can reach a consensus on the single unified
> > hint,
> > > > > e.g.,
> > > > > >>>> > LOOKUP,
> > > > > >>>> > > > > then
> > > > > >>>> > > > > > FLINK-27625 can be covered.
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > WDYT?
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > Best,
> > > > > >>>> > > > > > Lincoln Lee
> > > > > >>>> > > > > >
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > > Hi Lincoln,
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > > Delayed Dim Join is a frequently requested
> feature,
> > > it's
> > > > > >>>> exciting
> > > > > >>>> > > to
> > > > > >>>> > > > > see
> > > > > >>>> > > > > > > this feature is on the road.
> > > > > >>>> > > > > > > The FLIP looks good to me in general. I only left
> > some
> > > > > minor
> > > > > >>>> > > > comments.
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > > 1) support retry for sync lookup
> > > > > >>>> > > > > > > I'm also fine with the idea proposed by Jingsong.
> > But
> > > > this
> > > > > >>>> > doesn't
> > > > > >>>> > > > > > conflict
> > > > > >>>> > > > > > > with the FLIP and can
> > > > > >>>> > > > > > > be future work. It would be great if we can
> > determine
> > > > the
> > > > > >>>> APIs
> > > > > >>>> > > first.
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > > 1) "allow-unordered" => "unordered"
> > > > > >>>> > > > > > > I would prefer the "unordered" output mode rather
> > than
> > > > > >>>> > > > > "allow-unordered".
> > > > > >>>> > > > > > > Because this fully aligns with the DataStream
> > > behaviors
> > > > > and
> > > > > >>>> > avoids
> > > > > >>>> > > > > > > confusion on the differences.
> > > > > >>>> > > > > > > I understand the purpose that adding a "allow"
> > prefix
> > > > > here,
> > > > > >>>> but I
> > > > > >>>> > > > think
> > > > > >>>> > > > > > the
> > > > > >>>> > > > > > > semantic is fine to just
> > > > > >>>> > > > > > > use "unordered" here. We didn't see any users
> > confused
> > > > > about
> > > > > >>>> > > > > > > OutputMode#UNORDERED.
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > > Best,
> > > > > >>>> > > > > > > Jark
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
> > > > > >>>> > jingsonglee0@gmail.com>
> > > > > >>>> > > > > > wrote:
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > > > > Thanks Lincoln for your proposal.
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > > Take a look at `strategy: fixed-delay delay:
> > > duration,
> > > > > >>>> e.g.,
> > > > > >>>> > 10s
> > > > > >>>> > > > > > > > max-attempts: integer, e.g., 3`.
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > > Are these options only for async? It looks like
> > > normal
> > > > > >>>> lookups
> > > > > >>>> > > work
> > > > > >>>> > > > > > too?
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > > One thing is: most of the lookup functions seem
> to
> > > be
> > > > > >>>> > synchronous
> > > > > >>>> > > > > now?
> > > > > >>>> > > > > > > > There are not so many asynchronous ones?
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > > Best,
> > > > > >>>> > > > > > > > Jingsong
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
> > > > > >>>> > > > lincoln.86xy@gmail.com
> > > > > >>>> > > > > >
> > > > > >>>> > > > > > > > wrote:
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > > > > Hi all,
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > > Considering the new common table option
> > > > > >>>> 'lookup.max-retries'
> > > > > >>>> > > > > proposed
> > > > > >>>> > > > > > > in
> > > > > >>>> > > > > > > > > FLIP-221[1] which is commonly used for
> exception
> > > > > >>>> handling in
> > > > > >>>> > > > > > connector
> > > > > >>>> > > > > > > > > implementation, we should clearly distinguish
> > > > > >>>> > > ASYNC_LOOKUP_RETRY
> > > > > >>>> > > > > from
> > > > > >>>> > > > > > > it
> > > > > >>>> > > > > > > > to
> > > > > >>>> > > > > > > > > avoid confusing users.
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can
> change
> > > to
> > > > > >>>> > > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name
> > implies,
> > > > > >>>> restrict
> > > > > >>>> > it
> > > > > >>>> > > to
> > > > > >>>> > > > > > > support
> > > > > >>>> > > > > > > > > retries only for lookup misses and no longer
> > > include
> > > > > >>>> > exceptions
> > > > > >>>> > > > > (for
> > > > > >>>> > > > > > > sql
> > > > > >>>> > > > > > > > > connectors, let the connector implementer
> decide
> > > how
> > > > > to
> > > > > >>>> > handle
> > > > > >>>> > > > > > > exceptions
> > > > > >>>> > > > > > > > > since there are various kinds of retryable
> > > > exceptions
> > > > > >>>> and can
> > > > > >>>> > > not
> > > > > >>>> > > > > > retry
> > > > > >>>> > > > > > > > > ones).
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > > The FLIP[2] has been updated.
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > > [1]
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > >
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > > > >>>> > > > > > > > > [2]
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > >
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > > Best,
> > > > > >>>> > > > > > > > > Lincoln Lee
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > > Lincoln Lee <li...@gmail.com>
> > > 于2022年5月19日周四
> > > > > >>>> 18:24写道:
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > > > > Dear Flink developers,
> > > > > >>>> > > > > > > > > >
> > > > > >>>> > > > > > > > > > I would like to open a discussion on FLIP
> 234
> > > [1]
> > > > to
> > > > > >>>> > support
> > > > > >>>> > > > > > > retryable
> > > > > >>>> > > > > > > > > > lookup join to solve delayed updates issue,
> > as a
> > > > > >>>> pre-work
> > > > > >>>> > for
> > > > > >>>> > > > > this
> > > > > >>>> > > > > > > > > > solution, we proposed FLIP-232[2] which
> adds a
> > > > > generic
> > > > > >>>> > retry
> > > > > >>>> > > > > > support
> > > > > >>>> > > > > > > > for
> > > > > >>>> > > > > > > > > > Async I/O.
> > > > > >>>> > > > > > > > > > We prefer to offer this retry capability via
> > > query
> > > > > >>>> hints,
> > > > > >>>> > > > similar
> > > > > >>>> > > > > > to
> > > > > >>>> > > > > > > > new
> > > > > >>>> > > > > > > > > > join hints proposed in FLINK-27625[3] &
> > > > FLIP-204[4].
> > > > > >>>> > > > > > > > > >
> > > > > >>>> > > > > > > > > > This feature is backwards compatible and
> > > > > >>>> transparently to
> > > > > >>>> > > > > > connectors.
> > > > > >>>> > > > > > > > For
> > > > > >>>> > > > > > > > > > existing connectors which implements
> > > > > >>>> AsyncTableFunction,
> > > > > >>>> > can
> > > > > >>>> > > > > easily
> > > > > >>>> > > > > > > > > enable
> > > > > >>>> > > > > > > > > > async retry via the new join hint.
> > > > > >>>> > > > > > > > > >
> > > > > >>>> > > > > > > > > > [1]
> > > > > >>>> > > > > > > > > >
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > >
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > >>>> > > > > > > > > > [2]
> > > > > >>>> > > > > > > > > >
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > >
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > > > >>>> > > > > > > > > > [3]
> > > > > >>>> > > > > >
> > > > > >>>>
> > https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > > > >>>> > > > > > > > > > [4]
> > > > > >>>> > > > > > > > > >
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > >
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > > > >>>> > > > > > > > > >
> > > > > >>>> > > > > > > > > > Best,
> > > > > >>>> > > > > > > > > > Lincoln Lee
> > > > > >>>> > > > > > > > > >
> > > > > >>>> > > > > > > > >
> > > > > >>>> > > > > > > >
> > > > > >>>> > > > > > >
> > > > > >>>> > > > > >
> > > > > >>>> > > > >
> > > > > >>>> > > >
> > > > > >>>> > >
> > > > > >>>> >
> > > > > >>>>
> > > > > >>>
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Lincoln Lee <li...@gmail.com>.
Hi Rascal,

     Yes, you got the right answer for these two questions. If you have
more questions about using Flink, welcome mailing to user support and
questions mailing list: user@flink.apache.org / user-zh@flink.apache.org (
https://flink.apache.org/community.html#mailing-lists). This dev mailing
list is mainly for development related discussions.


Best,
Lincoln Lee


Rascal Wu <st...@gmail.com> 于2022年8月11日周四 13:18写道:

> Thanks Lincoln a lot for your quick response and clear explanation.
>
> It's clear now. So regarding "*For the retry support in FLIP-234, it only
> provide a 'lookup_miss' retry predicate which is to retry if can not find
> it*" , try to double confirm again:
> 1) if the dim record doesn't exist in dim table after retry finished, the
> output will be +[order1, id1, null], if the dim record coming in dim table
> before last retry, the output will be +[order1, id1, xxx], right? (assume
> left join here)
> 2) And if I expect the latest event (+[order1, id1, 12]) update the
> previous event (+[order1, id1, 10]), I should use regular join instead of
> lookup join, right?
>
> Lincoln Lee <li...@gmail.com> 于2022年8月11日周四 12:01写道:
>
> > @Rascal  Thanks for looking at this new feature! There is a lot of
> content
> > in these two flips and we will prepare a more detailed user documentation
> > before the 1.16 release.
> >
> > First of all, unlike regular join, the lookup join only triggers the
> lookup
> > action (access the dimension table) through the records of the stream
> > table, and the updates of the dimension table itself do not actively
> > trigger the updating results to
> > downstream.
> >
> > For your example, whether retry is enabled or not, only the first
> > "+[order1, id1, 10]" will sent to downstream (because only one order
> record
> > has come), the new updates of dimension table "10:02 -> (id1, 12)" will
> not
> > trigger updates.
> >
> > For the retry support in FLIP-234, it only provide a 'lookup_miss' retry
> > predicate which is to retry if can not find it (not always equivalent to
> > the complete join condition [1]), and not to trigger the retry if can
> find
> > a non-null value from the dimension table. If more complex value check
> > logic required, a viable way is to implement a custom
> `AsyncRetryPredicate`
> > in DataStream API (as FLIP-232 introduced).
> >
> > [1]: for different connector, the index-lookup capability maybe
> different,
> > e.g.,  HBase can lookup on rowkey only (by default, without secondary
> > index), while RDBMS can provided more powerful index-lookup capabilities
> > let's see the lookup join example from flink document(
> >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join
> > )
> >
> >
> > if the query with a join condition "ON o.customer_id = c.id and
> c.country
> > =
> > 'US'"
> >
> > -- enrich each order with customer informationSELECT o.order_id,
> > o.total, c.country, c.zipFROM Orders AS o  JOIN Customers FOR
> > SYSTEM_TIME AS OF o.proc_time AS c    ON o.customer_id = c.id and
> > c.country = 'US';
> >
> > for the dimension table in mysql, all of the columns (id & country) can
> be
> > used as index-lookup condition
> >
> > -- Customers is backed by the JDBC connector and can be used for
> > lookup joinsCREATE TEMPORARY TABLE Customers (  id INT,  name STRING,
> > country STRING,  zip STRING) WITH (  'connector' = 'jdbc',  'url' =
> > 'jdbc:mysql://mysqlhost:3306/customerdb',  'table-name' =
> > 'customers');
> >
> >
> >
> > while if the same table stored in HBase (with no secondary index), only
> the
> > 'id' column (rowkey in HBase) can be the index-lookup condition
> >
> > -- Customers is backed by the JDBC connector and can be used for
> > lookup joinsCREATE TEMPORARY TABLE Customers (  id INT,  name STRING,
> > country STRING,  zip STRING,
> >     PRIMARY KEY (id) NOT ENFORCED) WITH (  'connector' = 'hbase-xxx',
> > ...);
> >
> >
> > so, the 'lookup_miss' retry predicte may result differently in different
> > connectors.
> >
> > wish this can helps.
> >
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Rascal Wu <st...@gmail.com> 于2022年8月11日周四 10:32写道:
> >
> > > Hi here,
> > >
> > > Sorry for digging up this old email.
> > >
> > > May I consult one question about what's the final behavior we expected
> > > after we enabled retryable mechanism. I viewed the FLIP-232 & FLIP-234,
> > but
> > > I didn't get a clear answer as lacking of some knowledge.
> > >
> > > Let me take a example there, let's assume we have a dim table(id,
> price),
> > > the record(id1) existed in dim table before 10:00, and then it was
> > updated
> > > in business system at 10:01, but delay updated in dim table at 10:02.
> the
> > > record timeline as follow:
> > > 10:00 -> (id1, 10)
> > > 10:00 -> the id1 records was been updated, but delayed persistency or
> > sync
> > > to dim table.
> > > 10:02 -> (id1, 12)
> > >
> > > And a Flink application processes an order record that will join the
> dim
> > > table at 10:01, so it will output an event +[order1, id1, 10].
> > > And if we enable retry mechanism no matter it's sync or async, does it
> > mean
> > > there will be two events sink to downstream:
> > > 1. retract event: -[order1, id1, 10]
> > > 2. new event: +[order1, id1, 12]
> > >
> > > does the above all behavior is what we expected to solve the delay
> update
> > > for dimension table?
> > >
> > >
> > > Lincoln Lee <li...@gmail.com> 于2022年6月7日周二 12:19写道:
> > >
> > > > Hi everyone,
> > > >
> > > > I started a vote for this FLIP [1], please vote there or ask
> additional
> > > > questions here. [2]
> > > >
> > > > [1] https://lists.apache.org/thread/bb0kqjs8co3hhmtklmwptws4fc4rz810
> > > > [2] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
> > > >
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 15:51写道:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > When reviewing the name of the hint option
> > 'miss-retry'='true|false', I
> > > > > feel the current name is not precise enough, it might be easier to
> > > > > understand by using the retry-predicate directly from flip-232,
> > > > > e.g. 'retry-predicate'='lookup-miss', which has the additional
> > benefit
> > > of
> > > > > extensibility(maybe more retry condition in the future).
> > > > >
> > > > > Jark & Jingsong, do you have any suggestions? If we agree with the
> > name
> > > > > 'retry-predicate' or other better candidate, I'll update the FLIP.
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > > >
> > > > > Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 11:23写道:
> > > > >
> > > > >> Hi everyone,
> > > > >>
> > > > >> I've updated the FLIP[1] based on this discussion thread that we
> > agree
> > > > to
> > > > >> have a single unified 'LOOKUP' hint and also a related part in
> > > > FLIP-221[2]
> > > > >>  which is mainly for the necessity of the common table option
> > > > >> 'lookup.async'.
> > > > >>
> > > > >> The main updates are:
> > > > >> 1. the new unified 'LOOKUP' hint, make retry support both on sync
> > and
> > > > >> async lookup
> > > > >> 2. clarify the default choice of the planner for those connectors
> > > which
> > > > >> have both sync and async lookup capabilities, and how to deal with
> > the
> > > > >> query hint
> > > > >> 3. will add a followup issue to discuss whether to remove the
> > > > >> 'lookup.async' option in HBase connector.
> > > > >>
> > > > >> [1]
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > >> [2]
> > https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq
> > > > >>
> > > > >> Best,
> > > > >> Lincoln Lee
> > > > >>
> > > > >>
> > > > >> Lincoln Lee <li...@gmail.com> 于2022年6月1日周三 16:03写道:
> > > > >>
> > > > >>> Hi Jingsong,
> > > > >>>
> > > > >>> There will be no change for connectors with only one capability
> > (sync
> > > > or
> > > > >>> async).
> > > > >>>
> > > > >>> Query hint works in a best effort manner, so if users specifies a
> > > hint
> > > > >>> with invalid option, the query plan keeps unchanged, e.g., use
> > > > >>> LOOKUP('table'='customer', 'async'='true'), but backend lookup
> > source
> > > > only
> > > > >>> implemented the sync lookup function, then the async lookup hint
> > > takes
> > > > no
> > > > >>> effect.
> > > > >>>
> > > > >>> For these connectors which can have both capabilities of async
> and
> > > sync
> > > > >>> lookup, our advice for the connector developer is implementing
> both
> > > > sync
> > > > >>> and async interfaces if both capabilities have suitable use
> cases,
> > > and
> > > > the
> > > > >>> planner can decide which capability is the preferable one based
> on
> > > cost
> > > > >>> model or maybe other mechanism (another use case is exactly what
> > > we're
> > > > >>> discussing here, users can give the query hint), otherwise choose
> > one
> > > > >>> interface to implement.
> > > > >>>
> > > > >>> Also, this should be clarified for the lookup function related
> > APIs.
> > > > >>>
> > > > >>> Best,
> > > > >>> Lincoln Lee
> > > > >>>
> > > > >>>
> > > > >>> Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 15:18写道:
> > > > >>>
> > > > >>>> Hi Lincoln,
> > > > >>>>
> > > > >>>> > It's better making decisions at the query level when a
> connector
> > > has
> > > > >>>> both
> > > > >>>> capabilities.
> > > > >>>>
> > > > >>>> Can you clarify the mechanism?
> > > > >>>> - only sync connector: What connector developers should do
> > > > >>>> - only async connector: What connector developers should do
> > > > >>>> - both async and sync connector: What connector developers
> should
> > do
> > > > >>>>
> > > > >>>> Best,
> > > > >>>> Jingsong
> > > > >>>>
> > > > >>>> On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <
> > lincoln.86xy@gmail.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>> > Hi Jingsong,
> > > > >>>> >
> > > > >>>> > Thanks for your feedback!
> > > > >>>> >
> > > > >>>> > Yes, the existing HBase connector use an option 'lookup.async'
> > to
> > > > >>>> control
> > > > >>>> > its lookup source implementations that exposed to the planner,
> > > > >>>> however it's
> > > > >>>> > a private option for the HBase connector only, so it will not
> > > affect
> > > > >>>> the
> > > > >>>> > common API.
> > > > >>>> >
> > > > >>>> > And as discussed in the mailing thread of FLIP-221[1], we got
> a
> > > > >>>> consensus
> > > > >>>> > that do not make it as a common option. It's better making
> > > decisions
> > > > >>>> at the
> > > > >>>> > query level when a connector has both capabilities.
> > > > >>>> >
> > > > >>>> > So if everything goes well, we should discuss it whether to
> > > > deprecate
> > > > >>>> > the 'lookup.async'
> > > > >>>> > or not for HBase connector after the hint been done.
> > > > >>>> >
> > > > >>>> > This will be mentioned in the Compatibility part of this
> > FLIP[2].
> > > > >>>> >
> > > > >>>> > WDYT?
> > > > >>>> >
> > > > >>>> > [1]:
> > > > https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
> > > > >>>> > [2]:
> > > > >>>> >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> > > > >>>> >
> > > > >>>> > Best,
> > > > >>>> > Lincoln Lee
> > > > >>>> >
> > > > >>>> >
> > > > >>>> > Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:
> > > > >>>> >
> > > > >>>> > > Hi Lincoln,
> > > > >>>> > >
> > > > >>>> > > The unified lookup hint is what I want.
> > > > >>>> > >
> > > > >>>> > > And I like 'async'='true|false' option.
> > > > >>>> > >
> > > > >>>> > > But there is a compatibility issue, as I remember if async
> is
> > > > >>>> currently
> > > > >>>> > > controlled by connector, and this may also require some API
> > > > changes?
> > > > >>>> > >
> > > > >>>> > > We need to have a clear story for the connector combined
> with
> > > this
> > > > >>>> > option:
> > > > >>>> > > - only sync connector
> > > > >>>> > > - only async connector
> > > > >>>> > > - both async and sync connector
> > > > >>>> > >
> > > > >>>> > > Best,
> > > > >>>> > > Jingsong
> > > > >>>> > >
> > > > >>>> > > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <
> > > > lincoln.86xy@gmail.com
> > > > >>>> >
> > > > >>>> > > wrote:
> > > > >>>> > >
> > > > >>>> > > > Thanks Jark for your quick response and the consensus!
> > > > >>>> > > >
> > > > >>>> > > > And I will update the FLIP after Jingsong or other
> > developers
> > > > >>>> confirm
> > > > >>>> > > that
> > > > >>>> > > > there is no problem.
> > > > >>>> > > >
> > > > >>>> > > > Best,
> > > > >>>> > > > Lincoln Lee
> > > > >>>> > > >
> > > > >>>> > > >
> > > > >>>> > > > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
> > > > >>>> > > >
> > > > >>>> > > > > Thanks for the update.
> > > > >>>> > > > >
> > > > >>>> > > > > The unified lookup hint looks good to me.
> > > > >>>> > > > > And thanks for explaining ALLOW_UNORDERED.
> > > > >>>> > > > >
> > > > >>>> > > > > Best,
> > > > >>>> > > > > Jark
> > > > >>>> > > > >
> > > > >>>> > > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <
> > > > >>>> lincoln.86xy@gmail.com>
> > > > >>>> > > > wrote:
> > > > >>>> > > > >
> > > > >>>> > > > > > Hi Jark & Jingsong,
> > > > >>>> > > > > >
> > > > >>>> > > > > > Thanks for your feedback!
> > > > >>>> > > > > >
> > > > >>>> > > > > > 1.) support retry on sync lookup
> > > > >>>> > > > > > I also agree with supporting it, this will be useful
> for
> > > > >>>> connectors
> > > > >>>> > > > that
> > > > >>>> > > > > > don't have asynchronous lookup implementations and can
> > > also
> > > > >>>> solve
> > > > >>>> > the
> > > > >>>> > > > > ASYNC
> > > > >>>> > > > > > non-target problem to some extent(because the retrying
> > is
> > > > >>>> blocking
> > > > >>>> > > for
> > > > >>>> > > > > sync
> > > > >>>> > > > > > lookup, and may accumulate delay, but it maybe
> > acceptable
> > > > for
> > > > >>>> the
> > > > >>>> > > case
> > > > >>>> > > > > that
> > > > >>>> > > > > > most or all data want to do a delayed lookup).
> > > > >>>> > > > > >
> > > > >>>> > > > > > For the api perspective, we can do some unification.
> > Let's
> > > > >>>> think of
> > > > >>>> > > the
> > > > >>>> > > > > > whole user story for lookup join:
> > > > >>>> > > > > >
> > > > >>>> > > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one:
> > > LOOKUP
> > > > by
> > > > >>>> > > > different
> > > > >>>> > > > > > hint option values: 'async'='true|false'
> > > > >>>> > > > > >
> > > > >>>> > > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can
> > > share
> > > > >>>> the
> > > > >>>> > > > > > LOOKUP_MISS_RETRY with hint option:
> > > > 'miss-retry'='true|false'
> > > > >>>> > > > > >
> > > > >>>> > > > > > we can use one single hint LOOKUP with different hint
> > > > options
> > > > >>>> > > > > > ('async'='true|false', 'miss-retry'='true|false') to
> > cover
> > > > all
> > > > >>>> > > related
> > > > >>>> > > > > > functionalities.
> > > > >>>> > > > > > Compared to multiple hints with different subsets of
> > > > >>>> > functionality, a
> > > > >>>> > > > > > single hint may be easier for users to understand and
> > use,
> > > > and
> > > > >>>> > > specific
> > > > >>>> > > > > > parameters can be quickly found through documentation
> > > > >>>> > > > > >
> > > > >>>> > > > > > the support matrix will be:
> > > > >>>> > > > > > lookup support async retry
> > > > >>>> > > > > > sync w/o retry N N
> > > > >>>> > > > > > sync w/ retry N Y
> > > > >>>> > > > > > async w/o retry Y N
> > > > >>>> > > > > > async w/ retry Y Y
> > > > >>>> > > > > >
> > > > >>>> > > > > > and the available hint options for each mode:
> > > > >>>> > > > > > mode support hint options
> > > > >>>> > > > > > async async'='true'
> > > > >>>> > > > > > 'output-mode'='ordere|allow-unordered'
> > > > >>>> > > > > > 'capacity'='100'
> > > > >>>> > > > > > 'timeout'='180s'
> > > > >>>> > > > > > retry miss-retry'='true'
> > > > >>>> > > > > > 'retry-strategy'='fixed-delay'
> > > > >>>> > > > > > 'delay'='10s'
> > > > >>>> > > > > > 'max-attempts'='3'
> > > > >>>> > > > > >
> > > > >>>> > > > > >
> > > > >>>> > > > > > 2.) 'allow-unordered' vs 'unordered' for
> > > > >>>> > > > > > 'table.exec.async-lookup.output-mode'
> > > > >>>> > > > > > Yes, make it align with DataStream Api maybe more
> > > intuitive,
> > > > >>>> but
> > > > >>>> > > > there's
> > > > >>>> > > > > > some difference in table layer that makes the
> > > > >>>> 'allow-unordered'
> > > > >>>> > > > > meaningful:
> > > > >>>> > > > > > updates in the pipeline need to be processed in order,
> > > > >>>> > > ALLOW_UNORDERED
> > > > >>>> > > > > > means if users allow unordered result, it will attempt
> > to
> > > > use
> > > > >>>> > > > > > AsyncDataStream.OutputMode.UNORDERED when it does not
> > > affect
> > > > >>>> the
> > > > >>>> > > > > > correctness of the result, otherwise ORDERED will be
> > still
> > > > >>>> used.
> > > > >>>> > > > > >
> > > > >>>> > > > > > Another choice is that when the user specifies
> unordered
> > > > mode,
> > > > >>>> > > planner
> > > > >>>> > > > > > throws an error when it finds that it may affect
> > > > correctness.
> > > > >>>> But
> > > > >>>> > > this
> > > > >>>> > > > is
> > > > >>>> > > > > > not user-friendly and is not consistent with the
> > customary
> > > > >>>> > treatment
> > > > >>>> > > of
> > > > >>>> > > > > > invalid query hints(best effort).
> > > > >>>> > > > > >
> > > > >>>> > > > > > I opened a pr
> > https://github.com/apache/flink/pull/19759
> > > > for
> > > > >>>> the
> > > > >>>> > new
> > > > >>>> > > > > > option
> > > > >>>> > > > > > 'table.exec.async-lookup.output-mode' and also a
> > > discussion
> > > > on
> > > > >>>> > > > > FLINK-27625:
> > > > >>>> > > > > > add query hint 'ASYNC_LOOKUP' for async lookup
> > join(Since
> > > > the
> > > > >>>> > changes
> > > > >>>> > > > > were
> > > > >>>> > > > > > relatively minor, no new flip was created)
> > > > >>>> > > > > >
> > > > >>>> > > > > > If we can reach a consensus on the single unified
> hint,
> > > > e.g.,
> > > > >>>> > LOOKUP,
> > > > >>>> > > > > then
> > > > >>>> > > > > > FLINK-27625 can be covered.
> > > > >>>> > > > > >
> > > > >>>> > > > > > WDYT?
> > > > >>>> > > > > >
> > > > >>>> > > > > > Best,
> > > > >>>> > > > > > Lincoln Lee
> > > > >>>> > > > > >
> > > > >>>> > > > > >
> > > > >>>> > > > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
> > > > >>>> > > > > >
> > > > >>>> > > > > > > Hi Lincoln,
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > Delayed Dim Join is a frequently requested feature,
> > it's
> > > > >>>> exciting
> > > > >>>> > > to
> > > > >>>> > > > > see
> > > > >>>> > > > > > > this feature is on the road.
> > > > >>>> > > > > > > The FLIP looks good to me in general. I only left
> some
> > > > minor
> > > > >>>> > > > comments.
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > 1) support retry for sync lookup
> > > > >>>> > > > > > > I'm also fine with the idea proposed by Jingsong.
> But
> > > this
> > > > >>>> > doesn't
> > > > >>>> > > > > > conflict
> > > > >>>> > > > > > > with the FLIP and can
> > > > >>>> > > > > > > be future work. It would be great if we can
> determine
> > > the
> > > > >>>> APIs
> > > > >>>> > > first.
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > 1) "allow-unordered" => "unordered"
> > > > >>>> > > > > > > I would prefer the "unordered" output mode rather
> than
> > > > >>>> > > > > "allow-unordered".
> > > > >>>> > > > > > > Because this fully aligns with the DataStream
> > behaviors
> > > > and
> > > > >>>> > avoids
> > > > >>>> > > > > > > confusion on the differences.
> > > > >>>> > > > > > > I understand the purpose that adding a "allow"
> prefix
> > > > here,
> > > > >>>> but I
> > > > >>>> > > > think
> > > > >>>> > > > > > the
> > > > >>>> > > > > > > semantic is fine to just
> > > > >>>> > > > > > > use "unordered" here. We didn't see any users
> confused
> > > > about
> > > > >>>> > > > > > > OutputMode#UNORDERED.
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > Best,
> > > > >>>> > > > > > > Jark
> > > > >>>> > > > > > >
> > > > >>>> > > > > > >
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
> > > > >>>> > jingsonglee0@gmail.com>
> > > > >>>> > > > > > wrote:
> > > > >>>> > > > > > >
> > > > >>>> > > > > > > > Thanks Lincoln for your proposal.
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > > Take a look at `strategy: fixed-delay delay:
> > duration,
> > > > >>>> e.g.,
> > > > >>>> > 10s
> > > > >>>> > > > > > > > max-attempts: integer, e.g., 3`.
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > > Are these options only for async? It looks like
> > normal
> > > > >>>> lookups
> > > > >>>> > > work
> > > > >>>> > > > > > too?
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > > One thing is: most of the lookup functions seem to
> > be
> > > > >>>> > synchronous
> > > > >>>> > > > > now?
> > > > >>>> > > > > > > > There are not so many asynchronous ones?
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > > Best,
> > > > >>>> > > > > > > > Jingsong
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
> > > > >>>> > > > lincoln.86xy@gmail.com
> > > > >>>> > > > > >
> > > > >>>> > > > > > > > wrote:
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > > > > Hi all,
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > > Considering the new common table option
> > > > >>>> 'lookup.max-retries'
> > > > >>>> > > > > proposed
> > > > >>>> > > > > > > in
> > > > >>>> > > > > > > > > FLIP-221[1] which is commonly used for exception
> > > > >>>> handling in
> > > > >>>> > > > > > connector
> > > > >>>> > > > > > > > > implementation, we should clearly distinguish
> > > > >>>> > > ASYNC_LOOKUP_RETRY
> > > > >>>> > > > > from
> > > > >>>> > > > > > > it
> > > > >>>> > > > > > > > to
> > > > >>>> > > > > > > > > avoid confusing users.
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change
> > to
> > > > >>>> > > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name
> implies,
> > > > >>>> restrict
> > > > >>>> > it
> > > > >>>> > > to
> > > > >>>> > > > > > > support
> > > > >>>> > > > > > > > > retries only for lookup misses and no longer
> > include
> > > > >>>> > exceptions
> > > > >>>> > > > > (for
> > > > >>>> > > > > > > sql
> > > > >>>> > > > > > > > > connectors, let the connector implementer decide
> > how
> > > > to
> > > > >>>> > handle
> > > > >>>> > > > > > > exceptions
> > > > >>>> > > > > > > > > since there are various kinds of retryable
> > > exceptions
> > > > >>>> and can
> > > > >>>> > > not
> > > > >>>> > > > > > retry
> > > > >>>> > > > > > > > > ones).
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > > The FLIP[2] has been updated.
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > > [1]
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > >
> > > > >>>> > > > > >
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > > >>>> > > > > > > > > [2]
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > >
> > > > >>>> > > > > >
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > > Best,
> > > > >>>> > > > > > > > > Lincoln Lee
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > > Lincoln Lee <li...@gmail.com>
> > 于2022年5月19日周四
> > > > >>>> 18:24写道:
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > > > > Dear Flink developers,
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > > > I would like to open a discussion on FLIP 234
> > [1]
> > > to
> > > > >>>> > support
> > > > >>>> > > > > > > retryable
> > > > >>>> > > > > > > > > > lookup join to solve delayed updates issue,
> as a
> > > > >>>> pre-work
> > > > >>>> > for
> > > > >>>> > > > > this
> > > > >>>> > > > > > > > > > solution, we proposed FLIP-232[2] which adds a
> > > > generic
> > > > >>>> > retry
> > > > >>>> > > > > > support
> > > > >>>> > > > > > > > for
> > > > >>>> > > > > > > > > > Async I/O.
> > > > >>>> > > > > > > > > > We prefer to offer this retry capability via
> > query
> > > > >>>> hints,
> > > > >>>> > > > similar
> > > > >>>> > > > > > to
> > > > >>>> > > > > > > > new
> > > > >>>> > > > > > > > > > join hints proposed in FLINK-27625[3] &
> > > FLIP-204[4].
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > > > This feature is backwards compatible and
> > > > >>>> transparently to
> > > > >>>> > > > > > connectors.
> > > > >>>> > > > > > > > For
> > > > >>>> > > > > > > > > > existing connectors which implements
> > > > >>>> AsyncTableFunction,
> > > > >>>> > can
> > > > >>>> > > > > easily
> > > > >>>> > > > > > > > > enable
> > > > >>>> > > > > > > > > > async retry via the new join hint.
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > > > [1]
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > >
> > > > >>>> > > > > >
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > >>>> > > > > > > > > > [2]
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > >
> > > > >>>> > > > > >
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > > >>>> > > > > > > > > > [3]
> > > > >>>> > > > > >
> > > > >>>>
> https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > > >>>> > > > > > > > > > [4]
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > >
> > > > >>>> > > > > >
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > > > Best,
> > > > >>>> > > > > > > > > > Lincoln Lee
> > > > >>>> > > > > > > > > >
> > > > >>>> > > > > > > > >
> > > > >>>> > > > > > > >
> > > > >>>> > > > > > >
> > > > >>>> > > > > >
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > > >>>
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Rascal Wu <st...@gmail.com>.
Thanks Lincoln a lot for your quick response and clear explanation.

It's clear now. So regarding "*For the retry support in FLIP-234, it only
provide a 'lookup_miss' retry predicate which is to retry if can not find
it*" , try to double confirm again:
1) if the dim record doesn't exist in dim table after retry finished, the
output will be +[order1, id1, null], if the dim record coming in dim table
before last retry, the output will be +[order1, id1, xxx], right? (assume
left join here)
2) And if I expect the latest event (+[order1, id1, 12]) update the
previous event (+[order1, id1, 10]), I should use regular join instead of
lookup join, right?

Lincoln Lee <li...@gmail.com> 于2022年8月11日周四 12:01写道:

> @Rascal  Thanks for looking at this new feature! There is a lot of content
> in these two flips and we will prepare a more detailed user documentation
> before the 1.16 release.
>
> First of all, unlike regular join, the lookup join only triggers the lookup
> action (access the dimension table) through the records of the stream
> table, and the updates of the dimension table itself do not actively
> trigger the updating results to
> downstream.
>
> For your example, whether retry is enabled or not, only the first
> "+[order1, id1, 10]" will sent to downstream (because only one order record
> has come), the new updates of dimension table "10:02 -> (id1, 12)" will not
> trigger updates.
>
> For the retry support in FLIP-234, it only provide a 'lookup_miss' retry
> predicate which is to retry if can not find it (not always equivalent to
> the complete join condition [1]), and not to trigger the retry if can find
> a non-null value from the dimension table. If more complex value check
> logic required, a viable way is to implement a custom `AsyncRetryPredicate`
> in DataStream API (as FLIP-232 introduced).
>
> [1]: for different connector, the index-lookup capability maybe different,
> e.g.,  HBase can lookup on rowkey only (by default, without secondary
> index), while RDBMS can provided more powerful index-lookup capabilities
> let's see the lookup join example from flink document(
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join
> )
>
>
> if the query with a join condition "ON o.customer_id = c.id and c.country
> =
> 'US'"
>
> -- enrich each order with customer informationSELECT o.order_id,
> o.total, c.country, c.zipFROM Orders AS o  JOIN Customers FOR
> SYSTEM_TIME AS OF o.proc_time AS c    ON o.customer_id = c.id and
> c.country = 'US';
>
> for the dimension table in mysql, all of the columns (id & country) can be
> used as index-lookup condition
>
> -- Customers is backed by the JDBC connector and can be used for
> lookup joinsCREATE TEMPORARY TABLE Customers (  id INT,  name STRING,
> country STRING,  zip STRING) WITH (  'connector' = 'jdbc',  'url' =
> 'jdbc:mysql://mysqlhost:3306/customerdb',  'table-name' =
> 'customers');
>
>
>
> while if the same table stored in HBase (with no secondary index), only the
> 'id' column (rowkey in HBase) can be the index-lookup condition
>
> -- Customers is backed by the JDBC connector and can be used for
> lookup joinsCREATE TEMPORARY TABLE Customers (  id INT,  name STRING,
> country STRING,  zip STRING,
>     PRIMARY KEY (id) NOT ENFORCED) WITH (  'connector' = 'hbase-xxx',
> ...);
>
>
> so, the 'lookup_miss' retry predicte may result differently in different
> connectors.
>
> wish this can helps.
>
>
> Best,
> Lincoln Lee
>
>
> Rascal Wu <st...@gmail.com> 于2022年8月11日周四 10:32写道:
>
> > Hi here,
> >
> > Sorry for digging up this old email.
> >
> > May I consult one question about what's the final behavior we expected
> > after we enabled retryable mechanism. I viewed the FLIP-232 & FLIP-234,
> but
> > I didn't get a clear answer as lacking of some knowledge.
> >
> > Let me take a example there, let's assume we have a dim table(id, price),
> > the record(id1) existed in dim table before 10:00, and then it was
> updated
> > in business system at 10:01, but delay updated in dim table at 10:02. the
> > record timeline as follow:
> > 10:00 -> (id1, 10)
> > 10:00 -> the id1 records was been updated, but delayed persistency or
> sync
> > to dim table.
> > 10:02 -> (id1, 12)
> >
> > And a Flink application processes an order record that will join the dim
> > table at 10:01, so it will output an event +[order1, id1, 10].
> > And if we enable retry mechanism no matter it's sync or async, does it
> mean
> > there will be two events sink to downstream:
> > 1. retract event: -[order1, id1, 10]
> > 2. new event: +[order1, id1, 12]
> >
> > does the above all behavior is what we expected to solve the delay update
> > for dimension table?
> >
> >
> > Lincoln Lee <li...@gmail.com> 于2022年6月7日周二 12:19写道:
> >
> > > Hi everyone,
> > >
> > > I started a vote for this FLIP [1], please vote there or ask additional
> > > questions here. [2]
> > >
> > > [1] https://lists.apache.org/thread/bb0kqjs8co3hhmtklmwptws4fc4rz810
> > > [2] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
> > >
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 15:51写道:
> > >
> > > > Hi everyone,
> > > >
> > > > When reviewing the name of the hint option
> 'miss-retry'='true|false', I
> > > > feel the current name is not precise enough, it might be easier to
> > > > understand by using the retry-predicate directly from flip-232,
> > > > e.g. 'retry-predicate'='lookup-miss', which has the additional
> benefit
> > of
> > > > extensibility(maybe more retry condition in the future).
> > > >
> > > > Jark & Jingsong, do you have any suggestions? If we agree with the
> name
> > > > 'retry-predicate' or other better candidate, I'll update the FLIP.
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 11:23写道:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> I've updated the FLIP[1] based on this discussion thread that we
> agree
> > > to
> > > >> have a single unified 'LOOKUP' hint and also a related part in
> > > FLIP-221[2]
> > > >>  which is mainly for the necessity of the common table option
> > > >> 'lookup.async'.
> > > >>
> > > >> The main updates are:
> > > >> 1. the new unified 'LOOKUP' hint, make retry support both on sync
> and
> > > >> async lookup
> > > >> 2. clarify the default choice of the planner for those connectors
> > which
> > > >> have both sync and async lookup capabilities, and how to deal with
> the
> > > >> query hint
> > > >> 3. will add a followup issue to discuss whether to remove the
> > > >> 'lookup.async' option in HBase connector.
> > > >>
> > > >> [1]
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > >> [2]
> https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq
> > > >>
> > > >> Best,
> > > >> Lincoln Lee
> > > >>
> > > >>
> > > >> Lincoln Lee <li...@gmail.com> 于2022年6月1日周三 16:03写道:
> > > >>
> > > >>> Hi Jingsong,
> > > >>>
> > > >>> There will be no change for connectors with only one capability
> (sync
> > > or
> > > >>> async).
> > > >>>
> > > >>> Query hint works in a best effort manner, so if users specifies a
> > hint
> > > >>> with invalid option, the query plan keeps unchanged, e.g., use
> > > >>> LOOKUP('table'='customer', 'async'='true'), but backend lookup
> source
> > > only
> > > >>> implemented the sync lookup function, then the async lookup hint
> > takes
> > > no
> > > >>> effect.
> > > >>>
> > > >>> For these connectors which can have both capabilities of async and
> > sync
> > > >>> lookup, our advice for the connector developer is implementing both
> > > sync
> > > >>> and async interfaces if both capabilities have suitable use cases,
> > and
> > > the
> > > >>> planner can decide which capability is the preferable one based on
> > cost
> > > >>> model or maybe other mechanism (another use case is exactly what
> > we're
> > > >>> discussing here, users can give the query hint), otherwise choose
> one
> > > >>> interface to implement.
> > > >>>
> > > >>> Also, this should be clarified for the lookup function related
> APIs.
> > > >>>
> > > >>> Best,
> > > >>> Lincoln Lee
> > > >>>
> > > >>>
> > > >>> Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 15:18写道:
> > > >>>
> > > >>>> Hi Lincoln,
> > > >>>>
> > > >>>> > It's better making decisions at the query level when a connector
> > has
> > > >>>> both
> > > >>>> capabilities.
> > > >>>>
> > > >>>> Can you clarify the mechanism?
> > > >>>> - only sync connector: What connector developers should do
> > > >>>> - only async connector: What connector developers should do
> > > >>>> - both async and sync connector: What connector developers should
> do
> > > >>>>
> > > >>>> Best,
> > > >>>> Jingsong
> > > >>>>
> > > >>>> On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <
> lincoln.86xy@gmail.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>> > Hi Jingsong,
> > > >>>> >
> > > >>>> > Thanks for your feedback!
> > > >>>> >
> > > >>>> > Yes, the existing HBase connector use an option 'lookup.async'
> to
> > > >>>> control
> > > >>>> > its lookup source implementations that exposed to the planner,
> > > >>>> however it's
> > > >>>> > a private option for the HBase connector only, so it will not
> > affect
> > > >>>> the
> > > >>>> > common API.
> > > >>>> >
> > > >>>> > And as discussed in the mailing thread of FLIP-221[1], we got a
> > > >>>> consensus
> > > >>>> > that do not make it as a common option. It's better making
> > decisions
> > > >>>> at the
> > > >>>> > query level when a connector has both capabilities.
> > > >>>> >
> > > >>>> > So if everything goes well, we should discuss it whether to
> > > deprecate
> > > >>>> > the 'lookup.async'
> > > >>>> > or not for HBase connector after the hint been done.
> > > >>>> >
> > > >>>> > This will be mentioned in the Compatibility part of this
> FLIP[2].
> > > >>>> >
> > > >>>> > WDYT?
> > > >>>> >
> > > >>>> > [1]:
> > > https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
> > > >>>> > [2]:
> > > >>>> >
> > > >>>> >
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> > > >>>> >
> > > >>>> > Best,
> > > >>>> > Lincoln Lee
> > > >>>> >
> > > >>>> >
> > > >>>> > Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:
> > > >>>> >
> > > >>>> > > Hi Lincoln,
> > > >>>> > >
> > > >>>> > > The unified lookup hint is what I want.
> > > >>>> > >
> > > >>>> > > And I like 'async'='true|false' option.
> > > >>>> > >
> > > >>>> > > But there is a compatibility issue, as I remember if async is
> > > >>>> currently
> > > >>>> > > controlled by connector, and this may also require some API
> > > changes?
> > > >>>> > >
> > > >>>> > > We need to have a clear story for the connector combined with
> > this
> > > >>>> > option:
> > > >>>> > > - only sync connector
> > > >>>> > > - only async connector
> > > >>>> > > - both async and sync connector
> > > >>>> > >
> > > >>>> > > Best,
> > > >>>> > > Jingsong
> > > >>>> > >
> > > >>>> > > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <
> > > lincoln.86xy@gmail.com
> > > >>>> >
> > > >>>> > > wrote:
> > > >>>> > >
> > > >>>> > > > Thanks Jark for your quick response and the consensus!
> > > >>>> > > >
> > > >>>> > > > And I will update the FLIP after Jingsong or other
> developers
> > > >>>> confirm
> > > >>>> > > that
> > > >>>> > > > there is no problem.
> > > >>>> > > >
> > > >>>> > > > Best,
> > > >>>> > > > Lincoln Lee
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> > > > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
> > > >>>> > > >
> > > >>>> > > > > Thanks for the update.
> > > >>>> > > > >
> > > >>>> > > > > The unified lookup hint looks good to me.
> > > >>>> > > > > And thanks for explaining ALLOW_UNORDERED.
> > > >>>> > > > >
> > > >>>> > > > > Best,
> > > >>>> > > > > Jark
> > > >>>> > > > >
> > > >>>> > > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <
> > > >>>> lincoln.86xy@gmail.com>
> > > >>>> > > > wrote:
> > > >>>> > > > >
> > > >>>> > > > > > Hi Jark & Jingsong,
> > > >>>> > > > > >
> > > >>>> > > > > > Thanks for your feedback!
> > > >>>> > > > > >
> > > >>>> > > > > > 1.) support retry on sync lookup
> > > >>>> > > > > > I also agree with supporting it, this will be useful for
> > > >>>> connectors
> > > >>>> > > > that
> > > >>>> > > > > > don't have asynchronous lookup implementations and can
> > also
> > > >>>> solve
> > > >>>> > the
> > > >>>> > > > > ASYNC
> > > >>>> > > > > > non-target problem to some extent(because the retrying
> is
> > > >>>> blocking
> > > >>>> > > for
> > > >>>> > > > > sync
> > > >>>> > > > > > lookup, and may accumulate delay, but it maybe
> acceptable
> > > for
> > > >>>> the
> > > >>>> > > case
> > > >>>> > > > > that
> > > >>>> > > > > > most or all data want to do a delayed lookup).
> > > >>>> > > > > >
> > > >>>> > > > > > For the api perspective, we can do some unification.
> Let's
> > > >>>> think of
> > > >>>> > > the
> > > >>>> > > > > > whole user story for lookup join:
> > > >>>> > > > > >
> > > >>>> > > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one:
> > LOOKUP
> > > by
> > > >>>> > > > different
> > > >>>> > > > > > hint option values: 'async'='true|false'
> > > >>>> > > > > >
> > > >>>> > > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can
> > share
> > > >>>> the
> > > >>>> > > > > > LOOKUP_MISS_RETRY with hint option:
> > > 'miss-retry'='true|false'
> > > >>>> > > > > >
> > > >>>> > > > > > we can use one single hint LOOKUP with different hint
> > > options
> > > >>>> > > > > > ('async'='true|false', 'miss-retry'='true|false') to
> cover
> > > all
> > > >>>> > > related
> > > >>>> > > > > > functionalities.
> > > >>>> > > > > > Compared to multiple hints with different subsets of
> > > >>>> > functionality, a
> > > >>>> > > > > > single hint may be easier for users to understand and
> use,
> > > and
> > > >>>> > > specific
> > > >>>> > > > > > parameters can be quickly found through documentation
> > > >>>> > > > > >
> > > >>>> > > > > > the support matrix will be:
> > > >>>> > > > > > lookup support async retry
> > > >>>> > > > > > sync w/o retry N N
> > > >>>> > > > > > sync w/ retry N Y
> > > >>>> > > > > > async w/o retry Y N
> > > >>>> > > > > > async w/ retry Y Y
> > > >>>> > > > > >
> > > >>>> > > > > > and the available hint options for each mode:
> > > >>>> > > > > > mode support hint options
> > > >>>> > > > > > async async'='true'
> > > >>>> > > > > > 'output-mode'='ordere|allow-unordered'
> > > >>>> > > > > > 'capacity'='100'
> > > >>>> > > > > > 'timeout'='180s'
> > > >>>> > > > > > retry miss-retry'='true'
> > > >>>> > > > > > 'retry-strategy'='fixed-delay'
> > > >>>> > > > > > 'delay'='10s'
> > > >>>> > > > > > 'max-attempts'='3'
> > > >>>> > > > > >
> > > >>>> > > > > >
> > > >>>> > > > > > 2.) 'allow-unordered' vs 'unordered' for
> > > >>>> > > > > > 'table.exec.async-lookup.output-mode'
> > > >>>> > > > > > Yes, make it align with DataStream Api maybe more
> > intuitive,
> > > >>>> but
> > > >>>> > > > there's
> > > >>>> > > > > > some difference in table layer that makes the
> > > >>>> 'allow-unordered'
> > > >>>> > > > > meaningful:
> > > >>>> > > > > > updates in the pipeline need to be processed in order,
> > > >>>> > > ALLOW_UNORDERED
> > > >>>> > > > > > means if users allow unordered result, it will attempt
> to
> > > use
> > > >>>> > > > > > AsyncDataStream.OutputMode.UNORDERED when it does not
> > affect
> > > >>>> the
> > > >>>> > > > > > correctness of the result, otherwise ORDERED will be
> still
> > > >>>> used.
> > > >>>> > > > > >
> > > >>>> > > > > > Another choice is that when the user specifies unordered
> > > mode,
> > > >>>> > > planner
> > > >>>> > > > > > throws an error when it finds that it may affect
> > > correctness.
> > > >>>> But
> > > >>>> > > this
> > > >>>> > > > is
> > > >>>> > > > > > not user-friendly and is not consistent with the
> customary
> > > >>>> > treatment
> > > >>>> > > of
> > > >>>> > > > > > invalid query hints(best effort).
> > > >>>> > > > > >
> > > >>>> > > > > > I opened a pr
> https://github.com/apache/flink/pull/19759
> > > for
> > > >>>> the
> > > >>>> > new
> > > >>>> > > > > > option
> > > >>>> > > > > > 'table.exec.async-lookup.output-mode' and also a
> > discussion
> > > on
> > > >>>> > > > > FLINK-27625:
> > > >>>> > > > > > add query hint 'ASYNC_LOOKUP' for async lookup
> join(Since
> > > the
> > > >>>> > changes
> > > >>>> > > > > were
> > > >>>> > > > > > relatively minor, no new flip was created)
> > > >>>> > > > > >
> > > >>>> > > > > > If we can reach a consensus on the single unified hint,
> > > e.g.,
> > > >>>> > LOOKUP,
> > > >>>> > > > > then
> > > >>>> > > > > > FLINK-27625 can be covered.
> > > >>>> > > > > >
> > > >>>> > > > > > WDYT?
> > > >>>> > > > > >
> > > >>>> > > > > > Best,
> > > >>>> > > > > > Lincoln Lee
> > > >>>> > > > > >
> > > >>>> > > > > >
> > > >>>> > > > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
> > > >>>> > > > > >
> > > >>>> > > > > > > Hi Lincoln,
> > > >>>> > > > > > >
> > > >>>> > > > > > > Delayed Dim Join is a frequently requested feature,
> it's
> > > >>>> exciting
> > > >>>> > > to
> > > >>>> > > > > see
> > > >>>> > > > > > > this feature is on the road.
> > > >>>> > > > > > > The FLIP looks good to me in general. I only left some
> > > minor
> > > >>>> > > > comments.
> > > >>>> > > > > > >
> > > >>>> > > > > > > 1) support retry for sync lookup
> > > >>>> > > > > > > I'm also fine with the idea proposed by Jingsong. But
> > this
> > > >>>> > doesn't
> > > >>>> > > > > > conflict
> > > >>>> > > > > > > with the FLIP and can
> > > >>>> > > > > > > be future work. It would be great if we can determine
> > the
> > > >>>> APIs
> > > >>>> > > first.
> > > >>>> > > > > > >
> > > >>>> > > > > > > 1) "allow-unordered" => "unordered"
> > > >>>> > > > > > > I would prefer the "unordered" output mode rather than
> > > >>>> > > > > "allow-unordered".
> > > >>>> > > > > > > Because this fully aligns with the DataStream
> behaviors
> > > and
> > > >>>> > avoids
> > > >>>> > > > > > > confusion on the differences.
> > > >>>> > > > > > > I understand the purpose that adding a "allow" prefix
> > > here,
> > > >>>> but I
> > > >>>> > > > think
> > > >>>> > > > > > the
> > > >>>> > > > > > > semantic is fine to just
> > > >>>> > > > > > > use "unordered" here. We didn't see any users confused
> > > about
> > > >>>> > > > > > > OutputMode#UNORDERED.
> > > >>>> > > > > > >
> > > >>>> > > > > > > Best,
> > > >>>> > > > > > > Jark
> > > >>>> > > > > > >
> > > >>>> > > > > > >
> > > >>>> > > > > > >
> > > >>>> > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
> > > >>>> > jingsonglee0@gmail.com>
> > > >>>> > > > > > wrote:
> > > >>>> > > > > > >
> > > >>>> > > > > > > > Thanks Lincoln for your proposal.
> > > >>>> > > > > > > >
> > > >>>> > > > > > > > Take a look at `strategy: fixed-delay delay:
> duration,
> > > >>>> e.g.,
> > > >>>> > 10s
> > > >>>> > > > > > > > max-attempts: integer, e.g., 3`.
> > > >>>> > > > > > > >
> > > >>>> > > > > > > > Are these options only for async? It looks like
> normal
> > > >>>> lookups
> > > >>>> > > work
> > > >>>> > > > > > too?
> > > >>>> > > > > > > >
> > > >>>> > > > > > > > One thing is: most of the lookup functions seem to
> be
> > > >>>> > synchronous
> > > >>>> > > > > now?
> > > >>>> > > > > > > > There are not so many asynchronous ones?
> > > >>>> > > > > > > >
> > > >>>> > > > > > > > Best,
> > > >>>> > > > > > > > Jingsong
> > > >>>> > > > > > > >
> > > >>>> > > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
> > > >>>> > > > lincoln.86xy@gmail.com
> > > >>>> > > > > >
> > > >>>> > > > > > > > wrote:
> > > >>>> > > > > > > >
> > > >>>> > > > > > > > > Hi all,
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > > Considering the new common table option
> > > >>>> 'lookup.max-retries'
> > > >>>> > > > > proposed
> > > >>>> > > > > > > in
> > > >>>> > > > > > > > > FLIP-221[1] which is commonly used for exception
> > > >>>> handling in
> > > >>>> > > > > > connector
> > > >>>> > > > > > > > > implementation, we should clearly distinguish
> > > >>>> > > ASYNC_LOOKUP_RETRY
> > > >>>> > > > > from
> > > >>>> > > > > > > it
> > > >>>> > > > > > > > to
> > > >>>> > > > > > > > > avoid confusing users.
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change
> to
> > > >>>> > > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies,
> > > >>>> restrict
> > > >>>> > it
> > > >>>> > > to
> > > >>>> > > > > > > support
> > > >>>> > > > > > > > > retries only for lookup misses and no longer
> include
> > > >>>> > exceptions
> > > >>>> > > > > (for
> > > >>>> > > > > > > sql
> > > >>>> > > > > > > > > connectors, let the connector implementer decide
> how
> > > to
> > > >>>> > handle
> > > >>>> > > > > > > exceptions
> > > >>>> > > > > > > > > since there are various kinds of retryable
> > exceptions
> > > >>>> and can
> > > >>>> > > not
> > > >>>> > > > > > retry
> > > >>>> > > > > > > > > ones).
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > > The FLIP[2] has been updated.
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > > [1]
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > >
> > > >>>> > > > > > >
> > > >>>> > > > > >
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > >>>> > > > > > > > > [2]
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > >
> > > >>>> > > > > > >
> > > >>>> > > > > >
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > > Best,
> > > >>>> > > > > > > > > Lincoln Lee
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > > Lincoln Lee <li...@gmail.com>
> 于2022年5月19日周四
> > > >>>> 18:24写道:
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > > > > Dear Flink developers,
> > > >>>> > > > > > > > > >
> > > >>>> > > > > > > > > > I would like to open a discussion on FLIP 234
> [1]
> > to
> > > >>>> > support
> > > >>>> > > > > > > retryable
> > > >>>> > > > > > > > > > lookup join to solve delayed updates issue, as a
> > > >>>> pre-work
> > > >>>> > for
> > > >>>> > > > > this
> > > >>>> > > > > > > > > > solution, we proposed FLIP-232[2] which adds a
> > > generic
> > > >>>> > retry
> > > >>>> > > > > > support
> > > >>>> > > > > > > > for
> > > >>>> > > > > > > > > > Async I/O.
> > > >>>> > > > > > > > > > We prefer to offer this retry capability via
> query
> > > >>>> hints,
> > > >>>> > > > similar
> > > >>>> > > > > > to
> > > >>>> > > > > > > > new
> > > >>>> > > > > > > > > > join hints proposed in FLINK-27625[3] &
> > FLIP-204[4].
> > > >>>> > > > > > > > > >
> > > >>>> > > > > > > > > > This feature is backwards compatible and
> > > >>>> transparently to
> > > >>>> > > > > > connectors.
> > > >>>> > > > > > > > For
> > > >>>> > > > > > > > > > existing connectors which implements
> > > >>>> AsyncTableFunction,
> > > >>>> > can
> > > >>>> > > > > easily
> > > >>>> > > > > > > > > enable
> > > >>>> > > > > > > > > > async retry via the new join hint.
> > > >>>> > > > > > > > > >
> > > >>>> > > > > > > > > > [1]
> > > >>>> > > > > > > > > >
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > >
> > > >>>> > > > > > >
> > > >>>> > > > > >
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > >>>> > > > > > > > > > [2]
> > > >>>> > > > > > > > > >
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > >
> > > >>>> > > > > > >
> > > >>>> > > > > >
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > >>>> > > > > > > > > > [3]
> > > >>>> > > > > >
> > > >>>> https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > >>>> > > > > > > > > > [4]
> > > >>>> > > > > > > > > >
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > >
> > > >>>> > > > > > >
> > > >>>> > > > > >
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > >>>> > > > > > > > > >
> > > >>>> > > > > > > > > > Best,
> > > >>>> > > > > > > > > > Lincoln Lee
> > > >>>> > > > > > > > > >
> > > >>>> > > > > > > > >
> > > >>>> > > > > > > >
> > > >>>> > > > > > >
> > > >>>> > > > > >
> > > >>>> > > > >
> > > >>>> > > >
> > > >>>> > >
> > > >>>> >
> > > >>>>
> > > >>>
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Lincoln Lee <li...@gmail.com>.
@Rascal  Thanks for looking at this new feature! There is a lot of content
in these two flips and we will prepare a more detailed user documentation
before the 1.16 release.

First of all, unlike regular join, the lookup join only triggers the lookup
action (access the dimension table) through the records of the stream
table, and the updates of the dimension table itself do not actively
trigger the updating results to
downstream.

For your example, whether retry is enabled or not, only the first
"+[order1, id1, 10]" will sent to downstream (because only one order record
has come), the new updates of dimension table "10:02 -> (id1, 12)" will not
trigger updates.

For the retry support in FLIP-234, it only provide a 'lookup_miss' retry
predicate which is to retry if can not find it (not always equivalent to
the complete join condition [1]), and not to trigger the retry if can find
a non-null value from the dimension table. If more complex value check
logic required, a viable way is to implement a custom `AsyncRetryPredicate`
in DataStream API (as FLIP-232 introduced).

[1]: for different connector, the index-lookup capability maybe different,
e.g.,  HBase can lookup on rowkey only (by default, without secondary
index), while RDBMS can provided more powerful index-lookup capabilities
let's see the lookup join example from flink document(
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join)


if the query with a join condition "ON o.customer_id = c.id and c.country =
'US'"

-- enrich each order with customer informationSELECT o.order_id,
o.total, c.country, c.zipFROM Orders AS o  JOIN Customers FOR
SYSTEM_TIME AS OF o.proc_time AS c    ON o.customer_id = c.id and
c.country = 'US';

for the dimension table in mysql, all of the columns (id & country) can be
used as index-lookup condition

-- Customers is backed by the JDBC connector and can be used for
lookup joinsCREATE TEMPORARY TABLE Customers (  id INT,  name STRING,
country STRING,  zip STRING) WITH (  'connector' = 'jdbc',  'url' =
'jdbc:mysql://mysqlhost:3306/customerdb',  'table-name' =
'customers');



while if the same table stored in HBase (with no secondary index), only the
'id' column (rowkey in HBase) can be the index-lookup condition

-- Customers is backed by the JDBC connector and can be used for
lookup joinsCREATE TEMPORARY TABLE Customers (  id INT,  name STRING,
country STRING,  zip STRING,
    PRIMARY KEY (id) NOT ENFORCED) WITH (  'connector' = 'hbase-xxx',  ...);


so, the 'lookup_miss' retry predicte may result differently in different
connectors.

wish this can helps.


Best,
Lincoln Lee


Rascal Wu <st...@gmail.com> 于2022年8月11日周四 10:32写道:

> Hi here,
>
> Sorry for digging up this old email.
>
> May I consult one question about what's the final behavior we expected
> after we enabled retryable mechanism. I viewed the FLIP-232 & FLIP-234, but
> I didn't get a clear answer as lacking of some knowledge.
>
> Let me take a example there, let's assume we have a dim table(id, price),
> the record(id1) existed in dim table before 10:00, and then it was updated
> in business system at 10:01, but delay updated in dim table at 10:02. the
> record timeline as follow:
> 10:00 -> (id1, 10)
> 10:00 -> the id1 records was been updated, but delayed persistency or sync
> to dim table.
> 10:02 -> (id1, 12)
>
> And a Flink application processes an order record that will join the dim
> table at 10:01, so it will output an event +[order1, id1, 10].
> And if we enable retry mechanism no matter it's sync or async, does it mean
> there will be two events sink to downstream:
> 1. retract event: -[order1, id1, 10]
> 2. new event: +[order1, id1, 12]
>
> does the above all behavior is what we expected to solve the delay update
> for dimension table?
>
>
> Lincoln Lee <li...@gmail.com> 于2022年6月7日周二 12:19写道:
>
> > Hi everyone,
> >
> > I started a vote for this FLIP [1], please vote there or ask additional
> > questions here. [2]
> >
> > [1] https://lists.apache.org/thread/bb0kqjs8co3hhmtklmwptws4fc4rz810
> > [2] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
> >
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 15:51写道:
> >
> > > Hi everyone,
> > >
> > > When reviewing the name of the hint option 'miss-retry'='true|false', I
> > > feel the current name is not precise enough, it might be easier to
> > > understand by using the retry-predicate directly from flip-232,
> > > e.g. 'retry-predicate'='lookup-miss', which has the additional benefit
> of
> > > extensibility(maybe more retry condition in the future).
> > >
> > > Jark & Jingsong, do you have any suggestions? If we agree with the name
> > > 'retry-predicate' or other better candidate, I'll update the FLIP.
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 11:23写道:
> > >
> > >> Hi everyone,
> > >>
> > >> I've updated the FLIP[1] based on this discussion thread that we agree
> > to
> > >> have a single unified 'LOOKUP' hint and also a related part in
> > FLIP-221[2]
> > >>  which is mainly for the necessity of the common table option
> > >> 'lookup.async'.
> > >>
> > >> The main updates are:
> > >> 1. the new unified 'LOOKUP' hint, make retry support both on sync and
> > >> async lookup
> > >> 2. clarify the default choice of the planner for those connectors
> which
> > >> have both sync and async lookup capabilities, and how to deal with the
> > >> query hint
> > >> 3. will add a followup issue to discuss whether to remove the
> > >> 'lookup.async' option in HBase connector.
> > >>
> > >> [1]
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > >> [2] https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq
> > >>
> > >> Best,
> > >> Lincoln Lee
> > >>
> > >>
> > >> Lincoln Lee <li...@gmail.com> 于2022年6月1日周三 16:03写道:
> > >>
> > >>> Hi Jingsong,
> > >>>
> > >>> There will be no change for connectors with only one capability (sync
> > or
> > >>> async).
> > >>>
> > >>> Query hint works in a best effort manner, so if users specifies a
> hint
> > >>> with invalid option, the query plan keeps unchanged, e.g., use
> > >>> LOOKUP('table'='customer', 'async'='true'), but backend lookup source
> > only
> > >>> implemented the sync lookup function, then the async lookup hint
> takes
> > no
> > >>> effect.
> > >>>
> > >>> For these connectors which can have both capabilities of async and
> sync
> > >>> lookup, our advice for the connector developer is implementing both
> > sync
> > >>> and async interfaces if both capabilities have suitable use cases,
> and
> > the
> > >>> planner can decide which capability is the preferable one based on
> cost
> > >>> model or maybe other mechanism (another use case is exactly what
> we're
> > >>> discussing here, users can give the query hint), otherwise choose one
> > >>> interface to implement.
> > >>>
> > >>> Also, this should be clarified for the lookup function related APIs.
> > >>>
> > >>> Best,
> > >>> Lincoln Lee
> > >>>
> > >>>
> > >>> Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 15:18写道:
> > >>>
> > >>>> Hi Lincoln,
> > >>>>
> > >>>> > It's better making decisions at the query level when a connector
> has
> > >>>> both
> > >>>> capabilities.
> > >>>>
> > >>>> Can you clarify the mechanism?
> > >>>> - only sync connector: What connector developers should do
> > >>>> - only async connector: What connector developers should do
> > >>>> - both async and sync connector: What connector developers should do
> > >>>>
> > >>>> Best,
> > >>>> Jingsong
> > >>>>
> > >>>> On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <li...@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>> > Hi Jingsong,
> > >>>> >
> > >>>> > Thanks for your feedback!
> > >>>> >
> > >>>> > Yes, the existing HBase connector use an option 'lookup.async' to
> > >>>> control
> > >>>> > its lookup source implementations that exposed to the planner,
> > >>>> however it's
> > >>>> > a private option for the HBase connector only, so it will not
> affect
> > >>>> the
> > >>>> > common API.
> > >>>> >
> > >>>> > And as discussed in the mailing thread of FLIP-221[1], we got a
> > >>>> consensus
> > >>>> > that do not make it as a common option. It's better making
> decisions
> > >>>> at the
> > >>>> > query level when a connector has both capabilities.
> > >>>> >
> > >>>> > So if everything goes well, we should discuss it whether to
> > deprecate
> > >>>> > the 'lookup.async'
> > >>>> > or not for HBase connector after the hint been done.
> > >>>> >
> > >>>> > This will be mentioned in the Compatibility part of this FLIP[2].
> > >>>> >
> > >>>> > WDYT?
> > >>>> >
> > >>>> > [1]:
> > https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
> > >>>> > [2]:
> > >>>> >
> > >>>> >
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> > >>>> >
> > >>>> > Best,
> > >>>> > Lincoln Lee
> > >>>> >
> > >>>> >
> > >>>> > Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:
> > >>>> >
> > >>>> > > Hi Lincoln,
> > >>>> > >
> > >>>> > > The unified lookup hint is what I want.
> > >>>> > >
> > >>>> > > And I like 'async'='true|false' option.
> > >>>> > >
> > >>>> > > But there is a compatibility issue, as I remember if async is
> > >>>> currently
> > >>>> > > controlled by connector, and this may also require some API
> > changes?
> > >>>> > >
> > >>>> > > We need to have a clear story for the connector combined with
> this
> > >>>> > option:
> > >>>> > > - only sync connector
> > >>>> > > - only async connector
> > >>>> > > - both async and sync connector
> > >>>> > >
> > >>>> > > Best,
> > >>>> > > Jingsong
> > >>>> > >
> > >>>> > > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <
> > lincoln.86xy@gmail.com
> > >>>> >
> > >>>> > > wrote:
> > >>>> > >
> > >>>> > > > Thanks Jark for your quick response and the consensus!
> > >>>> > > >
> > >>>> > > > And I will update the FLIP after Jingsong or other developers
> > >>>> confirm
> > >>>> > > that
> > >>>> > > > there is no problem.
> > >>>> > > >
> > >>>> > > > Best,
> > >>>> > > > Lincoln Lee
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
> > >>>> > > >
> > >>>> > > > > Thanks for the update.
> > >>>> > > > >
> > >>>> > > > > The unified lookup hint looks good to me.
> > >>>> > > > > And thanks for explaining ALLOW_UNORDERED.
> > >>>> > > > >
> > >>>> > > > > Best,
> > >>>> > > > > Jark
> > >>>> > > > >
> > >>>> > > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <
> > >>>> lincoln.86xy@gmail.com>
> > >>>> > > > wrote:
> > >>>> > > > >
> > >>>> > > > > > Hi Jark & Jingsong,
> > >>>> > > > > >
> > >>>> > > > > > Thanks for your feedback!
> > >>>> > > > > >
> > >>>> > > > > > 1.) support retry on sync lookup
> > >>>> > > > > > I also agree with supporting it, this will be useful for
> > >>>> connectors
> > >>>> > > > that
> > >>>> > > > > > don't have asynchronous lookup implementations and can
> also
> > >>>> solve
> > >>>> > the
> > >>>> > > > > ASYNC
> > >>>> > > > > > non-target problem to some extent(because the retrying is
> > >>>> blocking
> > >>>> > > for
> > >>>> > > > > sync
> > >>>> > > > > > lookup, and may accumulate delay, but it maybe acceptable
> > for
> > >>>> the
> > >>>> > > case
> > >>>> > > > > that
> > >>>> > > > > > most or all data want to do a delayed lookup).
> > >>>> > > > > >
> > >>>> > > > > > For the api perspective, we can do some unification. Let's
> > >>>> think of
> > >>>> > > the
> > >>>> > > > > > whole user story for lookup join:
> > >>>> > > > > >
> > >>>> > > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one:
> LOOKUP
> > by
> > >>>> > > > different
> > >>>> > > > > > hint option values: 'async'='true|false'
> > >>>> > > > > >
> > >>>> > > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can
> share
> > >>>> the
> > >>>> > > > > > LOOKUP_MISS_RETRY with hint option:
> > 'miss-retry'='true|false'
> > >>>> > > > > >
> > >>>> > > > > > we can use one single hint LOOKUP with different hint
> > options
> > >>>> > > > > > ('async'='true|false', 'miss-retry'='true|false') to cover
> > all
> > >>>> > > related
> > >>>> > > > > > functionalities.
> > >>>> > > > > > Compared to multiple hints with different subsets of
> > >>>> > functionality, a
> > >>>> > > > > > single hint may be easier for users to understand and use,
> > and
> > >>>> > > specific
> > >>>> > > > > > parameters can be quickly found through documentation
> > >>>> > > > > >
> > >>>> > > > > > the support matrix will be:
> > >>>> > > > > > lookup support async retry
> > >>>> > > > > > sync w/o retry N N
> > >>>> > > > > > sync w/ retry N Y
> > >>>> > > > > > async w/o retry Y N
> > >>>> > > > > > async w/ retry Y Y
> > >>>> > > > > >
> > >>>> > > > > > and the available hint options for each mode:
> > >>>> > > > > > mode support hint options
> > >>>> > > > > > async async'='true'
> > >>>> > > > > > 'output-mode'='ordere|allow-unordered'
> > >>>> > > > > > 'capacity'='100'
> > >>>> > > > > > 'timeout'='180s'
> > >>>> > > > > > retry miss-retry'='true'
> > >>>> > > > > > 'retry-strategy'='fixed-delay'
> > >>>> > > > > > 'delay'='10s'
> > >>>> > > > > > 'max-attempts'='3'
> > >>>> > > > > >
> > >>>> > > > > >
> > >>>> > > > > > 2.) 'allow-unordered' vs 'unordered' for
> > >>>> > > > > > 'table.exec.async-lookup.output-mode'
> > >>>> > > > > > Yes, make it align with DataStream Api maybe more
> intuitive,
> > >>>> but
> > >>>> > > > there's
> > >>>> > > > > > some difference in table layer that makes the
> > >>>> 'allow-unordered'
> > >>>> > > > > meaningful:
> > >>>> > > > > > updates in the pipeline need to be processed in order,
> > >>>> > > ALLOW_UNORDERED
> > >>>> > > > > > means if users allow unordered result, it will attempt to
> > use
> > >>>> > > > > > AsyncDataStream.OutputMode.UNORDERED when it does not
> affect
> > >>>> the
> > >>>> > > > > > correctness of the result, otherwise ORDERED will be still
> > >>>> used.
> > >>>> > > > > >
> > >>>> > > > > > Another choice is that when the user specifies unordered
> > mode,
> > >>>> > > planner
> > >>>> > > > > > throws an error when it finds that it may affect
> > correctness.
> > >>>> But
> > >>>> > > this
> > >>>> > > > is
> > >>>> > > > > > not user-friendly and is not consistent with the customary
> > >>>> > treatment
> > >>>> > > of
> > >>>> > > > > > invalid query hints(best effort).
> > >>>> > > > > >
> > >>>> > > > > > I opened a pr https://github.com/apache/flink/pull/19759
> > for
> > >>>> the
> > >>>> > new
> > >>>> > > > > > option
> > >>>> > > > > > 'table.exec.async-lookup.output-mode' and also a
> discussion
> > on
> > >>>> > > > > FLINK-27625:
> > >>>> > > > > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since
> > the
> > >>>> > changes
> > >>>> > > > > were
> > >>>> > > > > > relatively minor, no new flip was created)
> > >>>> > > > > >
> > >>>> > > > > > If we can reach a consensus on the single unified hint,
> > e.g.,
> > >>>> > LOOKUP,
> > >>>> > > > > then
> > >>>> > > > > > FLINK-27625 can be covered.
> > >>>> > > > > >
> > >>>> > > > > > WDYT?
> > >>>> > > > > >
> > >>>> > > > > > Best,
> > >>>> > > > > > Lincoln Lee
> > >>>> > > > > >
> > >>>> > > > > >
> > >>>> > > > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
> > >>>> > > > > >
> > >>>> > > > > > > Hi Lincoln,
> > >>>> > > > > > >
> > >>>> > > > > > > Delayed Dim Join is a frequently requested feature, it's
> > >>>> exciting
> > >>>> > > to
> > >>>> > > > > see
> > >>>> > > > > > > this feature is on the road.
> > >>>> > > > > > > The FLIP looks good to me in general. I only left some
> > minor
> > >>>> > > > comments.
> > >>>> > > > > > >
> > >>>> > > > > > > 1) support retry for sync lookup
> > >>>> > > > > > > I'm also fine with the idea proposed by Jingsong. But
> this
> > >>>> > doesn't
> > >>>> > > > > > conflict
> > >>>> > > > > > > with the FLIP and can
> > >>>> > > > > > > be future work. It would be great if we can determine
> the
> > >>>> APIs
> > >>>> > > first.
> > >>>> > > > > > >
> > >>>> > > > > > > 1) "allow-unordered" => "unordered"
> > >>>> > > > > > > I would prefer the "unordered" output mode rather than
> > >>>> > > > > "allow-unordered".
> > >>>> > > > > > > Because this fully aligns with the DataStream behaviors
> > and
> > >>>> > avoids
> > >>>> > > > > > > confusion on the differences.
> > >>>> > > > > > > I understand the purpose that adding a "allow" prefix
> > here,
> > >>>> but I
> > >>>> > > > think
> > >>>> > > > > > the
> > >>>> > > > > > > semantic is fine to just
> > >>>> > > > > > > use "unordered" here. We didn't see any users confused
> > about
> > >>>> > > > > > > OutputMode#UNORDERED.
> > >>>> > > > > > >
> > >>>> > > > > > > Best,
> > >>>> > > > > > > Jark
> > >>>> > > > > > >
> > >>>> > > > > > >
> > >>>> > > > > > >
> > >>>> > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
> > >>>> > jingsonglee0@gmail.com>
> > >>>> > > > > > wrote:
> > >>>> > > > > > >
> > >>>> > > > > > > > Thanks Lincoln for your proposal.
> > >>>> > > > > > > >
> > >>>> > > > > > > > Take a look at `strategy: fixed-delay delay: duration,
> > >>>> e.g.,
> > >>>> > 10s
> > >>>> > > > > > > > max-attempts: integer, e.g., 3`.
> > >>>> > > > > > > >
> > >>>> > > > > > > > Are these options only for async? It looks like normal
> > >>>> lookups
> > >>>> > > work
> > >>>> > > > > > too?
> > >>>> > > > > > > >
> > >>>> > > > > > > > One thing is: most of the lookup functions seem to be
> > >>>> > synchronous
> > >>>> > > > > now?
> > >>>> > > > > > > > There are not so many asynchronous ones?
> > >>>> > > > > > > >
> > >>>> > > > > > > > Best,
> > >>>> > > > > > > > Jingsong
> > >>>> > > > > > > >
> > >>>> > > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
> > >>>> > > > lincoln.86xy@gmail.com
> > >>>> > > > > >
> > >>>> > > > > > > > wrote:
> > >>>> > > > > > > >
> > >>>> > > > > > > > > Hi all,
> > >>>> > > > > > > > >
> > >>>> > > > > > > > > Considering the new common table option
> > >>>> 'lookup.max-retries'
> > >>>> > > > > proposed
> > >>>> > > > > > > in
> > >>>> > > > > > > > > FLIP-221[1] which is commonly used for exception
> > >>>> handling in
> > >>>> > > > > > connector
> > >>>> > > > > > > > > implementation, we should clearly distinguish
> > >>>> > > ASYNC_LOOKUP_RETRY
> > >>>> > > > > from
> > >>>> > > > > > > it
> > >>>> > > > > > > > to
> > >>>> > > > > > > > > avoid confusing users.
> > >>>> > > > > > > > >
> > >>>> > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
> > >>>> > > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies,
> > >>>> restrict
> > >>>> > it
> > >>>> > > to
> > >>>> > > > > > > support
> > >>>> > > > > > > > > retries only for lookup misses and no longer include
> > >>>> > exceptions
> > >>>> > > > > (for
> > >>>> > > > > > > sql
> > >>>> > > > > > > > > connectors, let the connector implementer decide how
> > to
> > >>>> > handle
> > >>>> > > > > > > exceptions
> > >>>> > > > > > > > > since there are various kinds of retryable
> exceptions
> > >>>> and can
> > >>>> > > not
> > >>>> > > > > > retry
> > >>>> > > > > > > > > ones).
> > >>>> > > > > > > > >
> > >>>> > > > > > > > > The FLIP[2] has been updated.
> > >>>> > > > > > > > >
> > >>>> > > > > > > > > [1]
> > >>>> > > > > > > > >
> > >>>> > > > > > > > >
> > >>>> > > > > > > >
> > >>>> > > > > > >
> > >>>> > > > > >
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > >>>> > > > > > > > > [2]
> > >>>> > > > > > > > >
> > >>>> > > > > > > > >
> > >>>> > > > > > > >
> > >>>> > > > > > >
> > >>>> > > > > >
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > >>>> > > > > > > > >
> > >>>> > > > > > > > >
> > >>>> > > > > > > > > Best,
> > >>>> > > > > > > > > Lincoln Lee
> > >>>> > > > > > > > >
> > >>>> > > > > > > > >
> > >>>> > > > > > > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四
> > >>>> 18:24写道:
> > >>>> > > > > > > > >
> > >>>> > > > > > > > > > Dear Flink developers,
> > >>>> > > > > > > > > >
> > >>>> > > > > > > > > > I would like to open a discussion on FLIP 234 [1]
> to
> > >>>> > support
> > >>>> > > > > > > retryable
> > >>>> > > > > > > > > > lookup join to solve delayed updates issue, as a
> > >>>> pre-work
> > >>>> > for
> > >>>> > > > > this
> > >>>> > > > > > > > > > solution, we proposed FLIP-232[2] which adds a
> > generic
> > >>>> > retry
> > >>>> > > > > > support
> > >>>> > > > > > > > for
> > >>>> > > > > > > > > > Async I/O.
> > >>>> > > > > > > > > > We prefer to offer this retry capability via query
> > >>>> hints,
> > >>>> > > > similar
> > >>>> > > > > > to
> > >>>> > > > > > > > new
> > >>>> > > > > > > > > > join hints proposed in FLINK-27625[3] &
> FLIP-204[4].
> > >>>> > > > > > > > > >
> > >>>> > > > > > > > > > This feature is backwards compatible and
> > >>>> transparently to
> > >>>> > > > > > connectors.
> > >>>> > > > > > > > For
> > >>>> > > > > > > > > > existing connectors which implements
> > >>>> AsyncTableFunction,
> > >>>> > can
> > >>>> > > > > easily
> > >>>> > > > > > > > > enable
> > >>>> > > > > > > > > > async retry via the new join hint.
> > >>>> > > > > > > > > >
> > >>>> > > > > > > > > > [1]
> > >>>> > > > > > > > > >
> > >>>> > > > > > > > >
> > >>>> > > > > > > >
> > >>>> > > > > > >
> > >>>> > > > > >
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > >>>> > > > > > > > > > [2]
> > >>>> > > > > > > > > >
> > >>>> > > > > > > > >
> > >>>> > > > > > > >
> > >>>> > > > > > >
> > >>>> > > > > >
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > >>>> > > > > > > > > > [3]
> > >>>> > > > > >
> > >>>> https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > >>>> > > > > > > > > > [4]
> > >>>> > > > > > > > > >
> > >>>> > > > > > > > >
> > >>>> > > > > > > >
> > >>>> > > > > > >
> > >>>> > > > > >
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > >>>> > > > > > > > > >
> > >>>> > > > > > > > > > Best,
> > >>>> > > > > > > > > > Lincoln Lee
> > >>>> > > > > > > > > >
> > >>>> > > > > > > > >
> > >>>> > > > > > > >
> > >>>> > > > > > >
> > >>>> > > > > >
> > >>>> > > > >
> > >>>> > > >
> > >>>> > >
> > >>>> >
> > >>>>
> > >>>
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Rascal Wu <st...@gmail.com>.
Hi here,

Sorry for digging up this old email.

May I consult one question about what's the final behavior we expected
after we enabled retryable mechanism. I viewed the FLIP-232 & FLIP-234, but
I didn't get a clear answer as lacking of some knowledge.

Let me take a example there, let's assume we have a dim table(id, price),
the record(id1) existed in dim table before 10:00, and then it was updated
in business system at 10:01, but delay updated in dim table at 10:02. the
record timeline as follow:
10:00 -> (id1, 10)
10:00 -> the id1 records was been updated, but delayed persistency or sync
to dim table.
10:02 -> (id1, 12)

And a Flink application processes an order record that will join the dim
table at 10:01, so it will output an event +[order1, id1, 10].
And if we enable retry mechanism no matter it's sync or async, does it mean
there will be two events sink to downstream:
1. retract event: -[order1, id1, 10]
2. new event: +[order1, id1, 12]

does the above all behavior is what we expected to solve the delay update
for dimension table?


Lincoln Lee <li...@gmail.com> 于2022年6月7日周二 12:19写道:

> Hi everyone,
>
> I started a vote for this FLIP [1], please vote there or ask additional
> questions here. [2]
>
> [1] https://lists.apache.org/thread/bb0kqjs8co3hhmtklmwptws4fc4rz810
> [2] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h
>
>
> Best,
> Lincoln Lee
>
>
> Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 15:51写道:
>
> > Hi everyone,
> >
> > When reviewing the name of the hint option 'miss-retry'='true|false', I
> > feel the current name is not precise enough, it might be easier to
> > understand by using the retry-predicate directly from flip-232,
> > e.g. 'retry-predicate'='lookup-miss', which has the additional benefit of
> > extensibility(maybe more retry condition in the future).
> >
> > Jark & Jingsong, do you have any suggestions? If we agree with the name
> > 'retry-predicate' or other better candidate, I'll update the FLIP.
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 11:23写道:
> >
> >> Hi everyone,
> >>
> >> I've updated the FLIP[1] based on this discussion thread that we agree
> to
> >> have a single unified 'LOOKUP' hint and also a related part in
> FLIP-221[2]
> >>  which is mainly for the necessity of the common table option
> >> 'lookup.async'.
> >>
> >> The main updates are:
> >> 1. the new unified 'LOOKUP' hint, make retry support both on sync and
> >> async lookup
> >> 2. clarify the default choice of the planner for those connectors which
> >> have both sync and async lookup capabilities, and how to deal with the
> >> query hint
> >> 3. will add a followup issue to discuss whether to remove the
> >> 'lookup.async' option in HBase connector.
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> >> [2] https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq
> >>
> >> Best,
> >> Lincoln Lee
> >>
> >>
> >> Lincoln Lee <li...@gmail.com> 于2022年6月1日周三 16:03写道:
> >>
> >>> Hi Jingsong,
> >>>
> >>> There will be no change for connectors with only one capability (sync
> or
> >>> async).
> >>>
> >>> Query hint works in a best effort manner, so if users specifies a hint
> >>> with invalid option, the query plan keeps unchanged, e.g., use
> >>> LOOKUP('table'='customer', 'async'='true'), but backend lookup source
> only
> >>> implemented the sync lookup function, then the async lookup hint takes
> no
> >>> effect.
> >>>
> >>> For these connectors which can have both capabilities of async and sync
> >>> lookup, our advice for the connector developer is implementing both
> sync
> >>> and async interfaces if both capabilities have suitable use cases, and
> the
> >>> planner can decide which capability is the preferable one based on cost
> >>> model or maybe other mechanism (another use case is exactly what we're
> >>> discussing here, users can give the query hint), otherwise choose one
> >>> interface to implement.
> >>>
> >>> Also, this should be clarified for the lookup function related APIs.
> >>>
> >>> Best,
> >>> Lincoln Lee
> >>>
> >>>
> >>> Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 15:18写道:
> >>>
> >>>> Hi Lincoln,
> >>>>
> >>>> > It's better making decisions at the query level when a connector has
> >>>> both
> >>>> capabilities.
> >>>>
> >>>> Can you clarify the mechanism?
> >>>> - only sync connector: What connector developers should do
> >>>> - only async connector: What connector developers should do
> >>>> - both async and sync connector: What connector developers should do
> >>>>
> >>>> Best,
> >>>> Jingsong
> >>>>
> >>>> On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <li...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > Hi Jingsong,
> >>>> >
> >>>> > Thanks for your feedback!
> >>>> >
> >>>> > Yes, the existing HBase connector use an option 'lookup.async' to
> >>>> control
> >>>> > its lookup source implementations that exposed to the planner,
> >>>> however it's
> >>>> > a private option for the HBase connector only, so it will not affect
> >>>> the
> >>>> > common API.
> >>>> >
> >>>> > And as discussed in the mailing thread of FLIP-221[1], we got a
> >>>> consensus
> >>>> > that do not make it as a common option. It's better making decisions
> >>>> at the
> >>>> > query level when a connector has both capabilities.
> >>>> >
> >>>> > So if everything goes well, we should discuss it whether to
> deprecate
> >>>> > the 'lookup.async'
> >>>> > or not for HBase connector after the hint been done.
> >>>> >
> >>>> > This will be mentioned in the Compatibility part of this FLIP[2].
> >>>> >
> >>>> > WDYT?
> >>>> >
> >>>> > [1]:
> https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
> >>>> > [2]:
> >>>> >
> >>>> >
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> >>>> >
> >>>> > Best,
> >>>> > Lincoln Lee
> >>>> >
> >>>> >
> >>>> > Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:
> >>>> >
> >>>> > > Hi Lincoln,
> >>>> > >
> >>>> > > The unified lookup hint is what I want.
> >>>> > >
> >>>> > > And I like 'async'='true|false' option.
> >>>> > >
> >>>> > > But there is a compatibility issue, as I remember if async is
> >>>> currently
> >>>> > > controlled by connector, and this may also require some API
> changes?
> >>>> > >
> >>>> > > We need to have a clear story for the connector combined with this
> >>>> > option:
> >>>> > > - only sync connector
> >>>> > > - only async connector
> >>>> > > - both async and sync connector
> >>>> > >
> >>>> > > Best,
> >>>> > > Jingsong
> >>>> > >
> >>>> > > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <
> lincoln.86xy@gmail.com
> >>>> >
> >>>> > > wrote:
> >>>> > >
> >>>> > > > Thanks Jark for your quick response and the consensus!
> >>>> > > >
> >>>> > > > And I will update the FLIP after Jingsong or other developers
> >>>> confirm
> >>>> > > that
> >>>> > > > there is no problem.
> >>>> > > >
> >>>> > > > Best,
> >>>> > > > Lincoln Lee
> >>>> > > >
> >>>> > > >
> >>>> > > > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
> >>>> > > >
> >>>> > > > > Thanks for the update.
> >>>> > > > >
> >>>> > > > > The unified lookup hint looks good to me.
> >>>> > > > > And thanks for explaining ALLOW_UNORDERED.
> >>>> > > > >
> >>>> > > > > Best,
> >>>> > > > > Jark
> >>>> > > > >
> >>>> > > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <
> >>>> lincoln.86xy@gmail.com>
> >>>> > > > wrote:
> >>>> > > > >
> >>>> > > > > > Hi Jark & Jingsong,
> >>>> > > > > >
> >>>> > > > > > Thanks for your feedback!
> >>>> > > > > >
> >>>> > > > > > 1.) support retry on sync lookup
> >>>> > > > > > I also agree with supporting it, this will be useful for
> >>>> connectors
> >>>> > > > that
> >>>> > > > > > don't have asynchronous lookup implementations and can also
> >>>> solve
> >>>> > the
> >>>> > > > > ASYNC
> >>>> > > > > > non-target problem to some extent(because the retrying is
> >>>> blocking
> >>>> > > for
> >>>> > > > > sync
> >>>> > > > > > lookup, and may accumulate delay, but it maybe acceptable
> for
> >>>> the
> >>>> > > case
> >>>> > > > > that
> >>>> > > > > > most or all data want to do a delayed lookup).
> >>>> > > > > >
> >>>> > > > > > For the api perspective, we can do some unification. Let's
> >>>> think of
> >>>> > > the
> >>>> > > > > > whole user story for lookup join:
> >>>> > > > > >
> >>>> > > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP
> by
> >>>> > > > different
> >>>> > > > > > hint option values: 'async'='true|false'
> >>>> > > > > >
> >>>> > > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share
> >>>> the
> >>>> > > > > > LOOKUP_MISS_RETRY with hint option:
> 'miss-retry'='true|false'
> >>>> > > > > >
> >>>> > > > > > we can use one single hint LOOKUP with different hint
> options
> >>>> > > > > > ('async'='true|false', 'miss-retry'='true|false') to cover
> all
> >>>> > > related
> >>>> > > > > > functionalities.
> >>>> > > > > > Compared to multiple hints with different subsets of
> >>>> > functionality, a
> >>>> > > > > > single hint may be easier for users to understand and use,
> and
> >>>> > > specific
> >>>> > > > > > parameters can be quickly found through documentation
> >>>> > > > > >
> >>>> > > > > > the support matrix will be:
> >>>> > > > > > lookup support async retry
> >>>> > > > > > sync w/o retry N N
> >>>> > > > > > sync w/ retry N Y
> >>>> > > > > > async w/o retry Y N
> >>>> > > > > > async w/ retry Y Y
> >>>> > > > > >
> >>>> > > > > > and the available hint options for each mode:
> >>>> > > > > > mode support hint options
> >>>> > > > > > async async'='true'
> >>>> > > > > > 'output-mode'='ordere|allow-unordered'
> >>>> > > > > > 'capacity'='100'
> >>>> > > > > > 'timeout'='180s'
> >>>> > > > > > retry miss-retry'='true'
> >>>> > > > > > 'retry-strategy'='fixed-delay'
> >>>> > > > > > 'delay'='10s'
> >>>> > > > > > 'max-attempts'='3'
> >>>> > > > > >
> >>>> > > > > >
> >>>> > > > > > 2.) 'allow-unordered' vs 'unordered' for
> >>>> > > > > > 'table.exec.async-lookup.output-mode'
> >>>> > > > > > Yes, make it align with DataStream Api maybe more intuitive,
> >>>> but
> >>>> > > > there's
> >>>> > > > > > some difference in table layer that makes the
> >>>> 'allow-unordered'
> >>>> > > > > meaningful:
> >>>> > > > > > updates in the pipeline need to be processed in order,
> >>>> > > ALLOW_UNORDERED
> >>>> > > > > > means if users allow unordered result, it will attempt to
> use
> >>>> > > > > > AsyncDataStream.OutputMode.UNORDERED when it does not affect
> >>>> the
> >>>> > > > > > correctness of the result, otherwise ORDERED will be still
> >>>> used.
> >>>> > > > > >
> >>>> > > > > > Another choice is that when the user specifies unordered
> mode,
> >>>> > > planner
> >>>> > > > > > throws an error when it finds that it may affect
> correctness.
> >>>> But
> >>>> > > this
> >>>> > > > is
> >>>> > > > > > not user-friendly and is not consistent with the customary
> >>>> > treatment
> >>>> > > of
> >>>> > > > > > invalid query hints(best effort).
> >>>> > > > > >
> >>>> > > > > > I opened a pr https://github.com/apache/flink/pull/19759
> for
> >>>> the
> >>>> > new
> >>>> > > > > > option
> >>>> > > > > > 'table.exec.async-lookup.output-mode' and also a discussion
> on
> >>>> > > > > FLINK-27625:
> >>>> > > > > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since
> the
> >>>> > changes
> >>>> > > > > were
> >>>> > > > > > relatively minor, no new flip was created)
> >>>> > > > > >
> >>>> > > > > > If we can reach a consensus on the single unified hint,
> e.g.,
> >>>> > LOOKUP,
> >>>> > > > > then
> >>>> > > > > > FLINK-27625 can be covered.
> >>>> > > > > >
> >>>> > > > > > WDYT?
> >>>> > > > > >
> >>>> > > > > > Best,
> >>>> > > > > > Lincoln Lee
> >>>> > > > > >
> >>>> > > > > >
> >>>> > > > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
> >>>> > > > > >
> >>>> > > > > > > Hi Lincoln,
> >>>> > > > > > >
> >>>> > > > > > > Delayed Dim Join is a frequently requested feature, it's
> >>>> exciting
> >>>> > > to
> >>>> > > > > see
> >>>> > > > > > > this feature is on the road.
> >>>> > > > > > > The FLIP looks good to me in general. I only left some
> minor
> >>>> > > > comments.
> >>>> > > > > > >
> >>>> > > > > > > 1) support retry for sync lookup
> >>>> > > > > > > I'm also fine with the idea proposed by Jingsong. But this
> >>>> > doesn't
> >>>> > > > > > conflict
> >>>> > > > > > > with the FLIP and can
> >>>> > > > > > > be future work. It would be great if we can determine the
> >>>> APIs
> >>>> > > first.
> >>>> > > > > > >
> >>>> > > > > > > 1) "allow-unordered" => "unordered"
> >>>> > > > > > > I would prefer the "unordered" output mode rather than
> >>>> > > > > "allow-unordered".
> >>>> > > > > > > Because this fully aligns with the DataStream behaviors
> and
> >>>> > avoids
> >>>> > > > > > > confusion on the differences.
> >>>> > > > > > > I understand the purpose that adding a "allow" prefix
> here,
> >>>> but I
> >>>> > > > think
> >>>> > > > > > the
> >>>> > > > > > > semantic is fine to just
> >>>> > > > > > > use "unordered" here. We didn't see any users confused
> about
> >>>> > > > > > > OutputMode#UNORDERED.
> >>>> > > > > > >
> >>>> > > > > > > Best,
> >>>> > > > > > > Jark
> >>>> > > > > > >
> >>>> > > > > > >
> >>>> > > > > > >
> >>>> > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
> >>>> > jingsonglee0@gmail.com>
> >>>> > > > > > wrote:
> >>>> > > > > > >
> >>>> > > > > > > > Thanks Lincoln for your proposal.
> >>>> > > > > > > >
> >>>> > > > > > > > Take a look at `strategy: fixed-delay delay: duration,
> >>>> e.g.,
> >>>> > 10s
> >>>> > > > > > > > max-attempts: integer, e.g., 3`.
> >>>> > > > > > > >
> >>>> > > > > > > > Are these options only for async? It looks like normal
> >>>> lookups
> >>>> > > work
> >>>> > > > > > too?
> >>>> > > > > > > >
> >>>> > > > > > > > One thing is: most of the lookup functions seem to be
> >>>> > synchronous
> >>>> > > > > now?
> >>>> > > > > > > > There are not so many asynchronous ones?
> >>>> > > > > > > >
> >>>> > > > > > > > Best,
> >>>> > > > > > > > Jingsong
> >>>> > > > > > > >
> >>>> > > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
> >>>> > > > lincoln.86xy@gmail.com
> >>>> > > > > >
> >>>> > > > > > > > wrote:
> >>>> > > > > > > >
> >>>> > > > > > > > > Hi all,
> >>>> > > > > > > > >
> >>>> > > > > > > > > Considering the new common table option
> >>>> 'lookup.max-retries'
> >>>> > > > > proposed
> >>>> > > > > > > in
> >>>> > > > > > > > > FLIP-221[1] which is commonly used for exception
> >>>> handling in
> >>>> > > > > > connector
> >>>> > > > > > > > > implementation, we should clearly distinguish
> >>>> > > ASYNC_LOOKUP_RETRY
> >>>> > > > > from
> >>>> > > > > > > it
> >>>> > > > > > > > to
> >>>> > > > > > > > > avoid confusing users.
> >>>> > > > > > > > >
> >>>> > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
> >>>> > > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies,
> >>>> restrict
> >>>> > it
> >>>> > > to
> >>>> > > > > > > support
> >>>> > > > > > > > > retries only for lookup misses and no longer include
> >>>> > exceptions
> >>>> > > > > (for
> >>>> > > > > > > sql
> >>>> > > > > > > > > connectors, let the connector implementer decide how
> to
> >>>> > handle
> >>>> > > > > > > exceptions
> >>>> > > > > > > > > since there are various kinds of retryable exceptions
> >>>> and can
> >>>> > > not
> >>>> > > > > > retry
> >>>> > > > > > > > > ones).
> >>>> > > > > > > > >
> >>>> > > > > > > > > The FLIP[2] has been updated.
> >>>> > > > > > > > >
> >>>> > > > > > > > > [1]
> >>>> > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > >
> >>>> > > > > >
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> >>>> > > > > > > > > [2]
> >>>> > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > >
> >>>> > > > > >
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> >>>> > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > > > Best,
> >>>> > > > > > > > > Lincoln Lee
> >>>> > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四
> >>>> 18:24写道:
> >>>> > > > > > > > >
> >>>> > > > > > > > > > Dear Flink developers,
> >>>> > > > > > > > > >
> >>>> > > > > > > > > > I would like to open a discussion on FLIP 234 [1] to
> >>>> > support
> >>>> > > > > > > retryable
> >>>> > > > > > > > > > lookup join to solve delayed updates issue, as a
> >>>> pre-work
> >>>> > for
> >>>> > > > > this
> >>>> > > > > > > > > > solution, we proposed FLIP-232[2] which adds a
> generic
> >>>> > retry
> >>>> > > > > > support
> >>>> > > > > > > > for
> >>>> > > > > > > > > > Async I/O.
> >>>> > > > > > > > > > We prefer to offer this retry capability via query
> >>>> hints,
> >>>> > > > similar
> >>>> > > > > > to
> >>>> > > > > > > > new
> >>>> > > > > > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> >>>> > > > > > > > > >
> >>>> > > > > > > > > > This feature is backwards compatible and
> >>>> transparently to
> >>>> > > > > > connectors.
> >>>> > > > > > > > For
> >>>> > > > > > > > > > existing connectors which implements
> >>>> AsyncTableFunction,
> >>>> > can
> >>>> > > > > easily
> >>>> > > > > > > > > enable
> >>>> > > > > > > > > > async retry via the new join hint.
> >>>> > > > > > > > > >
> >>>> > > > > > > > > > [1]
> >>>> > > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > >
> >>>> > > > > >
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> >>>> > > > > > > > > > [2]
> >>>> > > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > >
> >>>> > > > > >
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> >>>> > > > > > > > > > [3]
> >>>> > > > > >
> >>>> https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> >>>> > > > > > > > > > [4]
> >>>> > > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > >
> >>>> > > > > >
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> >>>> > > > > > > > > >
> >>>> > > > > > > > > > Best,
> >>>> > > > > > > > > > Lincoln Lee
> >>>> > > > > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > >
> >>>> > > > > >
> >>>> > > > >
> >>>> > > >
> >>>> > >
> >>>> >
> >>>>
> >>>
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Lincoln Lee <li...@gmail.com>.
Hi everyone,

I started a vote for this FLIP [1], please vote there or ask additional
questions here. [2]

[1] https://lists.apache.org/thread/bb0kqjs8co3hhmtklmwptws4fc4rz810
[2] https://lists.apache.org/thread/9k1sl2519kh2n3yttwqc00p07xdfns3h


Best,
Lincoln Lee


Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 15:51写道:

> Hi everyone,
>
> When reviewing the name of the hint option 'miss-retry'='true|false', I
> feel the current name is not precise enough, it might be easier to
> understand by using the retry-predicate directly from flip-232,
> e.g. 'retry-predicate'='lookup-miss', which has the additional benefit of
> extensibility(maybe more retry condition in the future).
>
> Jark & Jingsong, do you have any suggestions? If we agree with the name
> 'retry-predicate' or other better candidate, I'll update the FLIP.
>
> Best,
> Lincoln Lee
>
>
> Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 11:23写道:
>
>> Hi everyone,
>>
>> I've updated the FLIP[1] based on this discussion thread that we agree to
>> have a single unified 'LOOKUP' hint and also a related part in FLIP-221[2]
>>  which is mainly for the necessity of the common table option
>> 'lookup.async'.
>>
>> The main updates are:
>> 1. the new unified 'LOOKUP' hint, make retry support both on sync and
>> async lookup
>> 2. clarify the default choice of the planner for those connectors which
>> have both sync and async lookup capabilities, and how to deal with the
>> query hint
>> 3. will add a followup issue to discuss whether to remove the
>> 'lookup.async' option in HBase connector.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
>> [2] https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq
>>
>> Best,
>> Lincoln Lee
>>
>>
>> Lincoln Lee <li...@gmail.com> 于2022年6月1日周三 16:03写道:
>>
>>> Hi Jingsong,
>>>
>>> There will be no change for connectors with only one capability (sync or
>>> async).
>>>
>>> Query hint works in a best effort manner, so if users specifies a hint
>>> with invalid option, the query plan keeps unchanged, e.g., use
>>> LOOKUP('table'='customer', 'async'='true'), but backend lookup source only
>>> implemented the sync lookup function, then the async lookup hint takes no
>>> effect.
>>>
>>> For these connectors which can have both capabilities of async and sync
>>> lookup, our advice for the connector developer is implementing both sync
>>> and async interfaces if both capabilities have suitable use cases, and the
>>> planner can decide which capability is the preferable one based on cost
>>> model or maybe other mechanism (another use case is exactly what we're
>>> discussing here, users can give the query hint), otherwise choose one
>>> interface to implement.
>>>
>>> Also, this should be clarified for the lookup function related APIs.
>>>
>>> Best,
>>> Lincoln Lee
>>>
>>>
>>> Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 15:18写道:
>>>
>>>> Hi Lincoln,
>>>>
>>>> > It's better making decisions at the query level when a connector has
>>>> both
>>>> capabilities.
>>>>
>>>> Can you clarify the mechanism?
>>>> - only sync connector: What connector developers should do
>>>> - only async connector: What connector developers should do
>>>> - both async and sync connector: What connector developers should do
>>>>
>>>> Best,
>>>> Jingsong
>>>>
>>>> On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <li...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Jingsong,
>>>> >
>>>> > Thanks for your feedback!
>>>> >
>>>> > Yes, the existing HBase connector use an option 'lookup.async' to
>>>> control
>>>> > its lookup source implementations that exposed to the planner,
>>>> however it's
>>>> > a private option for the HBase connector only, so it will not affect
>>>> the
>>>> > common API.
>>>> >
>>>> > And as discussed in the mailing thread of FLIP-221[1], we got a
>>>> consensus
>>>> > that do not make it as a common option. It's better making decisions
>>>> at the
>>>> > query level when a connector has both capabilities.
>>>> >
>>>> > So if everything goes well, we should discuss it whether to deprecate
>>>> > the 'lookup.async'
>>>> > or not for HBase connector after the hint been done.
>>>> >
>>>> > This will be mentioned in the Compatibility part of this FLIP[2].
>>>> >
>>>> > WDYT?
>>>> >
>>>> > [1]: https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
>>>> > [2]:
>>>> >
>>>> >
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
>>>> >
>>>> > Best,
>>>> > Lincoln Lee
>>>> >
>>>> >
>>>> > Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:
>>>> >
>>>> > > Hi Lincoln,
>>>> > >
>>>> > > The unified lookup hint is what I want.
>>>> > >
>>>> > > And I like 'async'='true|false' option.
>>>> > >
>>>> > > But there is a compatibility issue, as I remember if async is
>>>> currently
>>>> > > controlled by connector, and this may also require some API changes?
>>>> > >
>>>> > > We need to have a clear story for the connector combined with this
>>>> > option:
>>>> > > - only sync connector
>>>> > > - only async connector
>>>> > > - both async and sync connector
>>>> > >
>>>> > > Best,
>>>> > > Jingsong
>>>> > >
>>>> > > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <lincoln.86xy@gmail.com
>>>> >
>>>> > > wrote:
>>>> > >
>>>> > > > Thanks Jark for your quick response and the consensus!
>>>> > > >
>>>> > > > And I will update the FLIP after Jingsong or other developers
>>>> confirm
>>>> > > that
>>>> > > > there is no problem.
>>>> > > >
>>>> > > > Best,
>>>> > > > Lincoln Lee
>>>> > > >
>>>> > > >
>>>> > > > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
>>>> > > >
>>>> > > > > Thanks for the update.
>>>> > > > >
>>>> > > > > The unified lookup hint looks good to me.
>>>> > > > > And thanks for explaining ALLOW_UNORDERED.
>>>> > > > >
>>>> > > > > Best,
>>>> > > > > Jark
>>>> > > > >
>>>> > > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <
>>>> lincoln.86xy@gmail.com>
>>>> > > > wrote:
>>>> > > > >
>>>> > > > > > Hi Jark & Jingsong,
>>>> > > > > >
>>>> > > > > > Thanks for your feedback!
>>>> > > > > >
>>>> > > > > > 1.) support retry on sync lookup
>>>> > > > > > I also agree with supporting it, this will be useful for
>>>> connectors
>>>> > > > that
>>>> > > > > > don't have asynchronous lookup implementations and can also
>>>> solve
>>>> > the
>>>> > > > > ASYNC
>>>> > > > > > non-target problem to some extent(because the retrying is
>>>> blocking
>>>> > > for
>>>> > > > > sync
>>>> > > > > > lookup, and may accumulate delay, but it maybe acceptable for
>>>> the
>>>> > > case
>>>> > > > > that
>>>> > > > > > most or all data want to do a delayed lookup).
>>>> > > > > >
>>>> > > > > > For the api perspective, we can do some unification. Let's
>>>> think of
>>>> > > the
>>>> > > > > > whole user story for lookup join:
>>>> > > > > >
>>>> > > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by
>>>> > > > different
>>>> > > > > > hint option values: 'async'='true|false'
>>>> > > > > >
>>>> > > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share
>>>> the
>>>> > > > > > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'
>>>> > > > > >
>>>> > > > > > we can use one single hint LOOKUP with different hint options
>>>> > > > > > ('async'='true|false', 'miss-retry'='true|false') to cover all
>>>> > > related
>>>> > > > > > functionalities.
>>>> > > > > > Compared to multiple hints with different subsets of
>>>> > functionality, a
>>>> > > > > > single hint may be easier for users to understand and use, and
>>>> > > specific
>>>> > > > > > parameters can be quickly found through documentation
>>>> > > > > >
>>>> > > > > > the support matrix will be:
>>>> > > > > > lookup support async retry
>>>> > > > > > sync w/o retry N N
>>>> > > > > > sync w/ retry N Y
>>>> > > > > > async w/o retry Y N
>>>> > > > > > async w/ retry Y Y
>>>> > > > > >
>>>> > > > > > and the available hint options for each mode:
>>>> > > > > > mode support hint options
>>>> > > > > > async async'='true'
>>>> > > > > > 'output-mode'='ordere|allow-unordered'
>>>> > > > > > 'capacity'='100'
>>>> > > > > > 'timeout'='180s'
>>>> > > > > > retry miss-retry'='true'
>>>> > > > > > 'retry-strategy'='fixed-delay'
>>>> > > > > > 'delay'='10s'
>>>> > > > > > 'max-attempts'='3'
>>>> > > > > >
>>>> > > > > >
>>>> > > > > > 2.) 'allow-unordered' vs 'unordered' for
>>>> > > > > > 'table.exec.async-lookup.output-mode'
>>>> > > > > > Yes, make it align with DataStream Api maybe more intuitive,
>>>> but
>>>> > > > there's
>>>> > > > > > some difference in table layer that makes the
>>>> 'allow-unordered'
>>>> > > > > meaningful:
>>>> > > > > > updates in the pipeline need to be processed in order,
>>>> > > ALLOW_UNORDERED
>>>> > > > > > means if users allow unordered result, it will attempt to use
>>>> > > > > > AsyncDataStream.OutputMode.UNORDERED when it does not affect
>>>> the
>>>> > > > > > correctness of the result, otherwise ORDERED will be still
>>>> used.
>>>> > > > > >
>>>> > > > > > Another choice is that when the user specifies unordered mode,
>>>> > > planner
>>>> > > > > > throws an error when it finds that it may affect correctness.
>>>> But
>>>> > > this
>>>> > > > is
>>>> > > > > > not user-friendly and is not consistent with the customary
>>>> > treatment
>>>> > > of
>>>> > > > > > invalid query hints(best effort).
>>>> > > > > >
>>>> > > > > > I opened a pr https://github.com/apache/flink/pull/19759 for
>>>> the
>>>> > new
>>>> > > > > > option
>>>> > > > > > 'table.exec.async-lookup.output-mode' and also a discussion on
>>>> > > > > FLINK-27625:
>>>> > > > > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the
>>>> > changes
>>>> > > > > were
>>>> > > > > > relatively minor, no new flip was created)
>>>> > > > > >
>>>> > > > > > If we can reach a consensus on the single unified hint, e.g.,
>>>> > LOOKUP,
>>>> > > > > then
>>>> > > > > > FLINK-27625 can be covered.
>>>> > > > > >
>>>> > > > > > WDYT?
>>>> > > > > >
>>>> > > > > > Best,
>>>> > > > > > Lincoln Lee
>>>> > > > > >
>>>> > > > > >
>>>> > > > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
>>>> > > > > >
>>>> > > > > > > Hi Lincoln,
>>>> > > > > > >
>>>> > > > > > > Delayed Dim Join is a frequently requested feature, it's
>>>> exciting
>>>> > > to
>>>> > > > > see
>>>> > > > > > > this feature is on the road.
>>>> > > > > > > The FLIP looks good to me in general. I only left some minor
>>>> > > > comments.
>>>> > > > > > >
>>>> > > > > > > 1) support retry for sync lookup
>>>> > > > > > > I'm also fine with the idea proposed by Jingsong. But this
>>>> > doesn't
>>>> > > > > > conflict
>>>> > > > > > > with the FLIP and can
>>>> > > > > > > be future work. It would be great if we can determine the
>>>> APIs
>>>> > > first.
>>>> > > > > > >
>>>> > > > > > > 1) "allow-unordered" => "unordered"
>>>> > > > > > > I would prefer the "unordered" output mode rather than
>>>> > > > > "allow-unordered".
>>>> > > > > > > Because this fully aligns with the DataStream behaviors and
>>>> > avoids
>>>> > > > > > > confusion on the differences.
>>>> > > > > > > I understand the purpose that adding a "allow" prefix here,
>>>> but I
>>>> > > > think
>>>> > > > > > the
>>>> > > > > > > semantic is fine to just
>>>> > > > > > > use "unordered" here. We didn't see any users confused about
>>>> > > > > > > OutputMode#UNORDERED.
>>>> > > > > > >
>>>> > > > > > > Best,
>>>> > > > > > > Jark
>>>> > > > > > >
>>>> > > > > > >
>>>> > > > > > >
>>>> > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
>>>> > jingsonglee0@gmail.com>
>>>> > > > > > wrote:
>>>> > > > > > >
>>>> > > > > > > > Thanks Lincoln for your proposal.
>>>> > > > > > > >
>>>> > > > > > > > Take a look at `strategy: fixed-delay delay: duration,
>>>> e.g.,
>>>> > 10s
>>>> > > > > > > > max-attempts: integer, e.g., 3`.
>>>> > > > > > > >
>>>> > > > > > > > Are these options only for async? It looks like normal
>>>> lookups
>>>> > > work
>>>> > > > > > too?
>>>> > > > > > > >
>>>> > > > > > > > One thing is: most of the lookup functions seem to be
>>>> > synchronous
>>>> > > > > now?
>>>> > > > > > > > There are not so many asynchronous ones?
>>>> > > > > > > >
>>>> > > > > > > > Best,
>>>> > > > > > > > Jingsong
>>>> > > > > > > >
>>>> > > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
>>>> > > > lincoln.86xy@gmail.com
>>>> > > > > >
>>>> > > > > > > > wrote:
>>>> > > > > > > >
>>>> > > > > > > > > Hi all,
>>>> > > > > > > > >
>>>> > > > > > > > > Considering the new common table option
>>>> 'lookup.max-retries'
>>>> > > > > proposed
>>>> > > > > > > in
>>>> > > > > > > > > FLIP-221[1] which is commonly used for exception
>>>> handling in
>>>> > > > > > connector
>>>> > > > > > > > > implementation, we should clearly distinguish
>>>> > > ASYNC_LOOKUP_RETRY
>>>> > > > > from
>>>> > > > > > > it
>>>> > > > > > > > to
>>>> > > > > > > > > avoid confusing users.
>>>> > > > > > > > >
>>>> > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
>>>> > > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies,
>>>> restrict
>>>> > it
>>>> > > to
>>>> > > > > > > support
>>>> > > > > > > > > retries only for lookup misses and no longer include
>>>> > exceptions
>>>> > > > > (for
>>>> > > > > > > sql
>>>> > > > > > > > > connectors, let the connector implementer decide how to
>>>> > handle
>>>> > > > > > > exceptions
>>>> > > > > > > > > since there are various kinds of retryable exceptions
>>>> and can
>>>> > > not
>>>> > > > > > retry
>>>> > > > > > > > > ones).
>>>> > > > > > > > >
>>>> > > > > > > > > The FLIP[2] has been updated.
>>>> > > > > > > > >
>>>> > > > > > > > > [1]
>>>> > > > > > > > >
>>>> > > > > > > > >
>>>> > > > > > > >
>>>> > > > > > >
>>>> > > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
>>>> > > > > > > > > [2]
>>>> > > > > > > > >
>>>> > > > > > > > >
>>>> > > > > > > >
>>>> > > > > > >
>>>> > > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
>>>> > > > > > > > >
>>>> > > > > > > > >
>>>> > > > > > > > > Best,
>>>> > > > > > > > > Lincoln Lee
>>>> > > > > > > > >
>>>> > > > > > > > >
>>>> > > > > > > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四
>>>> 18:24写道:
>>>> > > > > > > > >
>>>> > > > > > > > > > Dear Flink developers,
>>>> > > > > > > > > >
>>>> > > > > > > > > > I would like to open a discussion on FLIP 234 [1] to
>>>> > support
>>>> > > > > > > retryable
>>>> > > > > > > > > > lookup join to solve delayed updates issue, as a
>>>> pre-work
>>>> > for
>>>> > > > > this
>>>> > > > > > > > > > solution, we proposed FLIP-232[2] which adds a generic
>>>> > retry
>>>> > > > > > support
>>>> > > > > > > > for
>>>> > > > > > > > > > Async I/O.
>>>> > > > > > > > > > We prefer to offer this retry capability via query
>>>> hints,
>>>> > > > similar
>>>> > > > > > to
>>>> > > > > > > > new
>>>> > > > > > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
>>>> > > > > > > > > >
>>>> > > > > > > > > > This feature is backwards compatible and
>>>> transparently to
>>>> > > > > > connectors.
>>>> > > > > > > > For
>>>> > > > > > > > > > existing connectors which implements
>>>> AsyncTableFunction,
>>>> > can
>>>> > > > > easily
>>>> > > > > > > > > enable
>>>> > > > > > > > > > async retry via the new join hint.
>>>> > > > > > > > > >
>>>> > > > > > > > > > [1]
>>>> > > > > > > > > >
>>>> > > > > > > > >
>>>> > > > > > > >
>>>> > > > > > >
>>>> > > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
>>>> > > > > > > > > > [2]
>>>> > > > > > > > > >
>>>> > > > > > > > >
>>>> > > > > > > >
>>>> > > > > > >
>>>> > > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
>>>> > > > > > > > > > [3]
>>>> > > > > >
>>>> https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
>>>> > > > > > > > > > [4]
>>>> > > > > > > > > >
>>>> > > > > > > > >
>>>> > > > > > > >
>>>> > > > > > >
>>>> > > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
>>>> > > > > > > > > >
>>>> > > > > > > > > > Best,
>>>> > > > > > > > > > Lincoln Lee
>>>> > > > > > > > > >
>>>> > > > > > > > >
>>>> > > > > > > >
>>>> > > > > > >
>>>> > > > > >
>>>> > > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Lincoln Lee <li...@gmail.com>.
Hi everyone,

When reviewing the name of the hint option 'miss-retry'='true|false', I
feel the current name is not precise enough, it might be easier to
understand by using the retry-predicate directly from flip-232,
e.g. 'retry-predicate'='lookup-miss', which has the additional benefit of
extensibility(maybe more retry condition in the future).

Jark & Jingsong, do you have any suggestions? If we agree with the name
'retry-predicate' or other better candidate, I'll update the FLIP.

Best,
Lincoln Lee


Lincoln Lee <li...@gmail.com> 于2022年6月2日周四 11:23写道:

> Hi everyone,
>
> I've updated the FLIP[1] based on this discussion thread that we agree to
> have a single unified 'LOOKUP' hint and also a related part in FLIP-221[2]
>  which is mainly for the necessity of the common table option
> 'lookup.async'.
>
> The main updates are:
> 1. the new unified 'LOOKUP' hint, make retry support both on sync and
> async lookup
> 2. clarify the default choice of the planner for those connectors which
> have both sync and async lookup capabilities, and how to deal with the
> query hint
> 3. will add a followup issue to discuss whether to remove the
> 'lookup.async' option in HBase connector.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> [2] https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq
>
> Best,
> Lincoln Lee
>
>
> Lincoln Lee <li...@gmail.com> 于2022年6月1日周三 16:03写道:
>
>> Hi Jingsong,
>>
>> There will be no change for connectors with only one capability (sync or
>> async).
>>
>> Query hint works in a best effort manner, so if users specifies a hint
>> with invalid option, the query plan keeps unchanged, e.g., use
>> LOOKUP('table'='customer', 'async'='true'), but backend lookup source only
>> implemented the sync lookup function, then the async lookup hint takes no
>> effect.
>>
>> For these connectors which can have both capabilities of async and sync
>> lookup, our advice for the connector developer is implementing both sync
>> and async interfaces if both capabilities have suitable use cases, and the
>> planner can decide which capability is the preferable one based on cost
>> model or maybe other mechanism (another use case is exactly what we're
>> discussing here, users can give the query hint), otherwise choose one
>> interface to implement.
>>
>> Also, this should be clarified for the lookup function related APIs.
>>
>> Best,
>> Lincoln Lee
>>
>>
>> Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 15:18写道:
>>
>>> Hi Lincoln,
>>>
>>> > It's better making decisions at the query level when a connector has
>>> both
>>> capabilities.
>>>
>>> Can you clarify the mechanism?
>>> - only sync connector: What connector developers should do
>>> - only async connector: What connector developers should do
>>> - both async and sync connector: What connector developers should do
>>>
>>> Best,
>>> Jingsong
>>>
>>> On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <li...@gmail.com>
>>> wrote:
>>>
>>> > Hi Jingsong,
>>> >
>>> > Thanks for your feedback!
>>> >
>>> > Yes, the existing HBase connector use an option 'lookup.async' to
>>> control
>>> > its lookup source implementations that exposed to the planner, however
>>> it's
>>> > a private option for the HBase connector only, so it will not affect
>>> the
>>> > common API.
>>> >
>>> > And as discussed in the mailing thread of FLIP-221[1], we got a
>>> consensus
>>> > that do not make it as a common option. It's better making decisions
>>> at the
>>> > query level when a connector has both capabilities.
>>> >
>>> > So if everything goes well, we should discuss it whether to deprecate
>>> > the 'lookup.async'
>>> > or not for HBase connector after the hint been done.
>>> >
>>> > This will be mentioned in the Compatibility part of this FLIP[2].
>>> >
>>> > WDYT?
>>> >
>>> > [1]: https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
>>> > [2]:
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
>>> >
>>> > Best,
>>> > Lincoln Lee
>>> >
>>> >
>>> > Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:
>>> >
>>> > > Hi Lincoln,
>>> > >
>>> > > The unified lookup hint is what I want.
>>> > >
>>> > > And I like 'async'='true|false' option.
>>> > >
>>> > > But there is a compatibility issue, as I remember if async is
>>> currently
>>> > > controlled by connector, and this may also require some API changes?
>>> > >
>>> > > We need to have a clear story for the connector combined with this
>>> > option:
>>> > > - only sync connector
>>> > > - only async connector
>>> > > - both async and sync connector
>>> > >
>>> > > Best,
>>> > > Jingsong
>>> > >
>>> > > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <li...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Thanks Jark for your quick response and the consensus!
>>> > > >
>>> > > > And I will update the FLIP after Jingsong or other developers
>>> confirm
>>> > > that
>>> > > > there is no problem.
>>> > > >
>>> > > > Best,
>>> > > > Lincoln Lee
>>> > > >
>>> > > >
>>> > > > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
>>> > > >
>>> > > > > Thanks for the update.
>>> > > > >
>>> > > > > The unified lookup hint looks good to me.
>>> > > > > And thanks for explaining ALLOW_UNORDERED.
>>> > > > >
>>> > > > > Best,
>>> > > > > Jark
>>> > > > >
>>> > > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <
>>> lincoln.86xy@gmail.com>
>>> > > > wrote:
>>> > > > >
>>> > > > > > Hi Jark & Jingsong,
>>> > > > > >
>>> > > > > > Thanks for your feedback!
>>> > > > > >
>>> > > > > > 1.) support retry on sync lookup
>>> > > > > > I also agree with supporting it, this will be useful for
>>> connectors
>>> > > > that
>>> > > > > > don't have asynchronous lookup implementations and can also
>>> solve
>>> > the
>>> > > > > ASYNC
>>> > > > > > non-target problem to some extent(because the retrying is
>>> blocking
>>> > > for
>>> > > > > sync
>>> > > > > > lookup, and may accumulate delay, but it maybe acceptable for
>>> the
>>> > > case
>>> > > > > that
>>> > > > > > most or all data want to do a delayed lookup).
>>> > > > > >
>>> > > > > > For the api perspective, we can do some unification. Let's
>>> think of
>>> > > the
>>> > > > > > whole user story for lookup join:
>>> > > > > >
>>> > > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by
>>> > > > different
>>> > > > > > hint option values: 'async'='true|false'
>>> > > > > >
>>> > > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the
>>> > > > > > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'
>>> > > > > >
>>> > > > > > we can use one single hint LOOKUP with different hint options
>>> > > > > > ('async'='true|false', 'miss-retry'='true|false') to cover all
>>> > > related
>>> > > > > > functionalities.
>>> > > > > > Compared to multiple hints with different subsets of
>>> > functionality, a
>>> > > > > > single hint may be easier for users to understand and use, and
>>> > > specific
>>> > > > > > parameters can be quickly found through documentation
>>> > > > > >
>>> > > > > > the support matrix will be:
>>> > > > > > lookup support async retry
>>> > > > > > sync w/o retry N N
>>> > > > > > sync w/ retry N Y
>>> > > > > > async w/o retry Y N
>>> > > > > > async w/ retry Y Y
>>> > > > > >
>>> > > > > > and the available hint options for each mode:
>>> > > > > > mode support hint options
>>> > > > > > async async'='true'
>>> > > > > > 'output-mode'='ordere|allow-unordered'
>>> > > > > > 'capacity'='100'
>>> > > > > > 'timeout'='180s'
>>> > > > > > retry miss-retry'='true'
>>> > > > > > 'retry-strategy'='fixed-delay'
>>> > > > > > 'delay'='10s'
>>> > > > > > 'max-attempts'='3'
>>> > > > > >
>>> > > > > >
>>> > > > > > 2.) 'allow-unordered' vs 'unordered' for
>>> > > > > > 'table.exec.async-lookup.output-mode'
>>> > > > > > Yes, make it align with DataStream Api maybe more intuitive,
>>> but
>>> > > > there's
>>> > > > > > some difference in table layer that makes the 'allow-unordered'
>>> > > > > meaningful:
>>> > > > > > updates in the pipeline need to be processed in order,
>>> > > ALLOW_UNORDERED
>>> > > > > > means if users allow unordered result, it will attempt to use
>>> > > > > > AsyncDataStream.OutputMode.UNORDERED when it does not affect
>>> the
>>> > > > > > correctness of the result, otherwise ORDERED will be still
>>> used.
>>> > > > > >
>>> > > > > > Another choice is that when the user specifies unordered mode,
>>> > > planner
>>> > > > > > throws an error when it finds that it may affect correctness.
>>> But
>>> > > this
>>> > > > is
>>> > > > > > not user-friendly and is not consistent with the customary
>>> > treatment
>>> > > of
>>> > > > > > invalid query hints(best effort).
>>> > > > > >
>>> > > > > > I opened a pr https://github.com/apache/flink/pull/19759 for
>>> the
>>> > new
>>> > > > > > option
>>> > > > > > 'table.exec.async-lookup.output-mode' and also a discussion on
>>> > > > > FLINK-27625:
>>> > > > > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the
>>> > changes
>>> > > > > were
>>> > > > > > relatively minor, no new flip was created)
>>> > > > > >
>>> > > > > > If we can reach a consensus on the single unified hint, e.g.,
>>> > LOOKUP,
>>> > > > > then
>>> > > > > > FLINK-27625 can be covered.
>>> > > > > >
>>> > > > > > WDYT?
>>> > > > > >
>>> > > > > > Best,
>>> > > > > > Lincoln Lee
>>> > > > > >
>>> > > > > >
>>> > > > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
>>> > > > > >
>>> > > > > > > Hi Lincoln,
>>> > > > > > >
>>> > > > > > > Delayed Dim Join is a frequently requested feature, it's
>>> exciting
>>> > > to
>>> > > > > see
>>> > > > > > > this feature is on the road.
>>> > > > > > > The FLIP looks good to me in general. I only left some minor
>>> > > > comments.
>>> > > > > > >
>>> > > > > > > 1) support retry for sync lookup
>>> > > > > > > I'm also fine with the idea proposed by Jingsong. But this
>>> > doesn't
>>> > > > > > conflict
>>> > > > > > > with the FLIP and can
>>> > > > > > > be future work. It would be great if we can determine the
>>> APIs
>>> > > first.
>>> > > > > > >
>>> > > > > > > 1) "allow-unordered" => "unordered"
>>> > > > > > > I would prefer the "unordered" output mode rather than
>>> > > > > "allow-unordered".
>>> > > > > > > Because this fully aligns with the DataStream behaviors and
>>> > avoids
>>> > > > > > > confusion on the differences.
>>> > > > > > > I understand the purpose that adding a "allow" prefix here,
>>> but I
>>> > > > think
>>> > > > > > the
>>> > > > > > > semantic is fine to just
>>> > > > > > > use "unordered" here. We didn't see any users confused about
>>> > > > > > > OutputMode#UNORDERED.
>>> > > > > > >
>>> > > > > > > Best,
>>> > > > > > > Jark
>>> > > > > > >
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
>>> > jingsonglee0@gmail.com>
>>> > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > Thanks Lincoln for your proposal.
>>> > > > > > > >
>>> > > > > > > > Take a look at `strategy: fixed-delay delay: duration,
>>> e.g.,
>>> > 10s
>>> > > > > > > > max-attempts: integer, e.g., 3`.
>>> > > > > > > >
>>> > > > > > > > Are these options only for async? It looks like normal
>>> lookups
>>> > > work
>>> > > > > > too?
>>> > > > > > > >
>>> > > > > > > > One thing is: most of the lookup functions seem to be
>>> > synchronous
>>> > > > > now?
>>> > > > > > > > There are not so many asynchronous ones?
>>> > > > > > > >
>>> > > > > > > > Best,
>>> > > > > > > > Jingsong
>>> > > > > > > >
>>> > > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
>>> > > > lincoln.86xy@gmail.com
>>> > > > > >
>>> > > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > > > Hi all,
>>> > > > > > > > >
>>> > > > > > > > > Considering the new common table option
>>> 'lookup.max-retries'
>>> > > > > proposed
>>> > > > > > > in
>>> > > > > > > > > FLIP-221[1] which is commonly used for exception
>>> handling in
>>> > > > > > connector
>>> > > > > > > > > implementation, we should clearly distinguish
>>> > > ASYNC_LOOKUP_RETRY
>>> > > > > from
>>> > > > > > > it
>>> > > > > > > > to
>>> > > > > > > > > avoid confusing users.
>>> > > > > > > > >
>>> > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
>>> > > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies,
>>> restrict
>>> > it
>>> > > to
>>> > > > > > > support
>>> > > > > > > > > retries only for lookup misses and no longer include
>>> > exceptions
>>> > > > > (for
>>> > > > > > > sql
>>> > > > > > > > > connectors, let the connector implementer decide how to
>>> > handle
>>> > > > > > > exceptions
>>> > > > > > > > > since there are various kinds of retryable exceptions
>>> and can
>>> > > not
>>> > > > > > retry
>>> > > > > > > > > ones).
>>> > > > > > > > >
>>> > > > > > > > > The FLIP[2] has been updated.
>>> > > > > > > > >
>>> > > > > > > > > [1]
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
>>> > > > > > > > > [2]
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > Best,
>>> > > > > > > > > Lincoln Lee
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四
>>> 18:24写道:
>>> > > > > > > > >
>>> > > > > > > > > > Dear Flink developers,
>>> > > > > > > > > >
>>> > > > > > > > > > I would like to open a discussion on FLIP 234 [1] to
>>> > support
>>> > > > > > > retryable
>>> > > > > > > > > > lookup join to solve delayed updates issue, as a
>>> pre-work
>>> > for
>>> > > > > this
>>> > > > > > > > > > solution, we proposed FLIP-232[2] which adds a generic
>>> > retry
>>> > > > > > support
>>> > > > > > > > for
>>> > > > > > > > > > Async I/O.
>>> > > > > > > > > > We prefer to offer this retry capability via query
>>> hints,
>>> > > > similar
>>> > > > > > to
>>> > > > > > > > new
>>> > > > > > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
>>> > > > > > > > > >
>>> > > > > > > > > > This feature is backwards compatible and transparently
>>> to
>>> > > > > > connectors.
>>> > > > > > > > For
>>> > > > > > > > > > existing connectors which implements
>>> AsyncTableFunction,
>>> > can
>>> > > > > easily
>>> > > > > > > > > enable
>>> > > > > > > > > > async retry via the new join hint.
>>> > > > > > > > > >
>>> > > > > > > > > > [1]
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
>>> > > > > > > > > > [2]
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
>>> > > > > > > > > > [3]
>>> > > > > >
>>> https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
>>> > > > > > > > > > [4]
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
>>> > > > > > > > > >
>>> > > > > > > > > > Best,
>>> > > > > > > > > > Lincoln Lee
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Lincoln Lee <li...@gmail.com>.
Hi everyone,

I've updated the FLIP[1] based on this discussion thread that we agree to
have a single unified 'LOOKUP' hint and also a related part in FLIP-221[2]
 which is mainly for the necessity of the common table option
'lookup.async'.

The main updates are:
1. the new unified 'LOOKUP' hint, make retry support both on sync and async
lookup
2. clarify the default choice of the planner for those connectors which
have both sync and async lookup capabilities, and how to deal with the
query hint
3. will add a followup issue to discuss whether to remove the
'lookup.async' option in HBase connector.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
[2] https://lists.apache.org/thread/1vokqdnnt01yycl7y1p74g556cc8yvtq

Best,
Lincoln Lee


Lincoln Lee <li...@gmail.com> 于2022年6月1日周三 16:03写道:

> Hi Jingsong,
>
> There will be no change for connectors with only one capability (sync or
> async).
>
> Query hint works in a best effort manner, so if users specifies a hint
> with invalid option, the query plan keeps unchanged, e.g., use
> LOOKUP('table'='customer', 'async'='true'), but backend lookup source only
> implemented the sync lookup function, then the async lookup hint takes no
> effect.
>
> For these connectors which can have both capabilities of async and sync
> lookup, our advice for the connector developer is implementing both sync
> and async interfaces if both capabilities have suitable use cases, and the
> planner can decide which capability is the preferable one based on cost
> model or maybe other mechanism (another use case is exactly what we're
> discussing here, users can give the query hint), otherwise choose one
> interface to implement.
>
> Also, this should be clarified for the lookup function related APIs.
>
> Best,
> Lincoln Lee
>
>
> Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 15:18写道:
>
>> Hi Lincoln,
>>
>> > It's better making decisions at the query level when a connector has
>> both
>> capabilities.
>>
>> Can you clarify the mechanism?
>> - only sync connector: What connector developers should do
>> - only async connector: What connector developers should do
>> - both async and sync connector: What connector developers should do
>>
>> Best,
>> Jingsong
>>
>> On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <li...@gmail.com>
>> wrote:
>>
>> > Hi Jingsong,
>> >
>> > Thanks for your feedback!
>> >
>> > Yes, the existing HBase connector use an option 'lookup.async' to
>> control
>> > its lookup source implementations that exposed to the planner, however
>> it's
>> > a private option for the HBase connector only, so it will not affect the
>> > common API.
>> >
>> > And as discussed in the mailing thread of FLIP-221[1], we got a
>> consensus
>> > that do not make it as a common option. It's better making decisions at
>> the
>> > query level when a connector has both capabilities.
>> >
>> > So if everything goes well, we should discuss it whether to deprecate
>> > the 'lookup.async'
>> > or not for HBase connector after the hint been done.
>> >
>> > This will be mentioned in the Compatibility part of this FLIP[2].
>> >
>> > WDYT?
>> >
>> > [1]: https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
>> > [2]:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
>> >
>> > Best,
>> > Lincoln Lee
>> >
>> >
>> > Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:
>> >
>> > > Hi Lincoln,
>> > >
>> > > The unified lookup hint is what I want.
>> > >
>> > > And I like 'async'='true|false' option.
>> > >
>> > > But there is a compatibility issue, as I remember if async is
>> currently
>> > > controlled by connector, and this may also require some API changes?
>> > >
>> > > We need to have a clear story for the connector combined with this
>> > option:
>> > > - only sync connector
>> > > - only async connector
>> > > - both async and sync connector
>> > >
>> > > Best,
>> > > Jingsong
>> > >
>> > > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <li...@gmail.com>
>> > > wrote:
>> > >
>> > > > Thanks Jark for your quick response and the consensus!
>> > > >
>> > > > And I will update the FLIP after Jingsong or other developers
>> confirm
>> > > that
>> > > > there is no problem.
>> > > >
>> > > > Best,
>> > > > Lincoln Lee
>> > > >
>> > > >
>> > > > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
>> > > >
>> > > > > Thanks for the update.
>> > > > >
>> > > > > The unified lookup hint looks good to me.
>> > > > > And thanks for explaining ALLOW_UNORDERED.
>> > > > >
>> > > > > Best,
>> > > > > Jark
>> > > > >
>> > > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <lincoln.86xy@gmail.com
>> >
>> > > > wrote:
>> > > > >
>> > > > > > Hi Jark & Jingsong,
>> > > > > >
>> > > > > > Thanks for your feedback!
>> > > > > >
>> > > > > > 1.) support retry on sync lookup
>> > > > > > I also agree with supporting it, this will be useful for
>> connectors
>> > > > that
>> > > > > > don't have asynchronous lookup implementations and can also
>> solve
>> > the
>> > > > > ASYNC
>> > > > > > non-target problem to some extent(because the retrying is
>> blocking
>> > > for
>> > > > > sync
>> > > > > > lookup, and may accumulate delay, but it maybe acceptable for
>> the
>> > > case
>> > > > > that
>> > > > > > most or all data want to do a delayed lookup).
>> > > > > >
>> > > > > > For the api perspective, we can do some unification. Let's
>> think of
>> > > the
>> > > > > > whole user story for lookup join:
>> > > > > >
>> > > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by
>> > > > different
>> > > > > > hint option values: 'async'='true|false'
>> > > > > >
>> > > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the
>> > > > > > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'
>> > > > > >
>> > > > > > we can use one single hint LOOKUP with different hint options
>> > > > > > ('async'='true|false', 'miss-retry'='true|false') to cover all
>> > > related
>> > > > > > functionalities.
>> > > > > > Compared to multiple hints with different subsets of
>> > functionality, a
>> > > > > > single hint may be easier for users to understand and use, and
>> > > specific
>> > > > > > parameters can be quickly found through documentation
>> > > > > >
>> > > > > > the support matrix will be:
>> > > > > > lookup support async retry
>> > > > > > sync w/o retry N N
>> > > > > > sync w/ retry N Y
>> > > > > > async w/o retry Y N
>> > > > > > async w/ retry Y Y
>> > > > > >
>> > > > > > and the available hint options for each mode:
>> > > > > > mode support hint options
>> > > > > > async async'='true'
>> > > > > > 'output-mode'='ordere|allow-unordered'
>> > > > > > 'capacity'='100'
>> > > > > > 'timeout'='180s'
>> > > > > > retry miss-retry'='true'
>> > > > > > 'retry-strategy'='fixed-delay'
>> > > > > > 'delay'='10s'
>> > > > > > 'max-attempts'='3'
>> > > > > >
>> > > > > >
>> > > > > > 2.) 'allow-unordered' vs 'unordered' for
>> > > > > > 'table.exec.async-lookup.output-mode'
>> > > > > > Yes, make it align with DataStream Api maybe more intuitive, but
>> > > > there's
>> > > > > > some difference in table layer that makes the 'allow-unordered'
>> > > > > meaningful:
>> > > > > > updates in the pipeline need to be processed in order,
>> > > ALLOW_UNORDERED
>> > > > > > means if users allow unordered result, it will attempt to use
>> > > > > > AsyncDataStream.OutputMode.UNORDERED when it does not affect the
>> > > > > > correctness of the result, otherwise ORDERED will be still used.
>> > > > > >
>> > > > > > Another choice is that when the user specifies unordered mode,
>> > > planner
>> > > > > > throws an error when it finds that it may affect correctness.
>> But
>> > > this
>> > > > is
>> > > > > > not user-friendly and is not consistent with the customary
>> > treatment
>> > > of
>> > > > > > invalid query hints(best effort).
>> > > > > >
>> > > > > > I opened a pr https://github.com/apache/flink/pull/19759 for
>> the
>> > new
>> > > > > > option
>> > > > > > 'table.exec.async-lookup.output-mode' and also a discussion on
>> > > > > FLINK-27625:
>> > > > > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the
>> > changes
>> > > > > were
>> > > > > > relatively minor, no new flip was created)
>> > > > > >
>> > > > > > If we can reach a consensus on the single unified hint, e.g.,
>> > LOOKUP,
>> > > > > then
>> > > > > > FLINK-27625 can be covered.
>> > > > > >
>> > > > > > WDYT?
>> > > > > >
>> > > > > > Best,
>> > > > > > Lincoln Lee
>> > > > > >
>> > > > > >
>> > > > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
>> > > > > >
>> > > > > > > Hi Lincoln,
>> > > > > > >
>> > > > > > > Delayed Dim Join is a frequently requested feature, it's
>> exciting
>> > > to
>> > > > > see
>> > > > > > > this feature is on the road.
>> > > > > > > The FLIP looks good to me in general. I only left some minor
>> > > > comments.
>> > > > > > >
>> > > > > > > 1) support retry for sync lookup
>> > > > > > > I'm also fine with the idea proposed by Jingsong. But this
>> > doesn't
>> > > > > > conflict
>> > > > > > > with the FLIP and can
>> > > > > > > be future work. It would be great if we can determine the APIs
>> > > first.
>> > > > > > >
>> > > > > > > 1) "allow-unordered" => "unordered"
>> > > > > > > I would prefer the "unordered" output mode rather than
>> > > > > "allow-unordered".
>> > > > > > > Because this fully aligns with the DataStream behaviors and
>> > avoids
>> > > > > > > confusion on the differences.
>> > > > > > > I understand the purpose that adding a "allow" prefix here,
>> but I
>> > > > think
>> > > > > > the
>> > > > > > > semantic is fine to just
>> > > > > > > use "unordered" here. We didn't see any users confused about
>> > > > > > > OutputMode#UNORDERED.
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Jark
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
>> > jingsonglee0@gmail.com>
>> > > > > > wrote:
>> > > > > > >
>> > > > > > > > Thanks Lincoln for your proposal.
>> > > > > > > >
>> > > > > > > > Take a look at `strategy: fixed-delay delay: duration, e.g.,
>> > 10s
>> > > > > > > > max-attempts: integer, e.g., 3`.
>> > > > > > > >
>> > > > > > > > Are these options only for async? It looks like normal
>> lookups
>> > > work
>> > > > > > too?
>> > > > > > > >
>> > > > > > > > One thing is: most of the lookup functions seem to be
>> > synchronous
>> > > > > now?
>> > > > > > > > There are not so many asynchronous ones?
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Jingsong
>> > > > > > > >
>> > > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
>> > > > lincoln.86xy@gmail.com
>> > > > > >
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi all,
>> > > > > > > > >
>> > > > > > > > > Considering the new common table option
>> 'lookup.max-retries'
>> > > > > proposed
>> > > > > > > in
>> > > > > > > > > FLIP-221[1] which is commonly used for exception handling
>> in
>> > > > > > connector
>> > > > > > > > > implementation, we should clearly distinguish
>> > > ASYNC_LOOKUP_RETRY
>> > > > > from
>> > > > > > > it
>> > > > > > > > to
>> > > > > > > > > avoid confusing users.
>> > > > > > > > >
>> > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
>> > > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies,
>> restrict
>> > it
>> > > to
>> > > > > > > support
>> > > > > > > > > retries only for lookup misses and no longer include
>> > exceptions
>> > > > > (for
>> > > > > > > sql
>> > > > > > > > > connectors, let the connector implementer decide how to
>> > handle
>> > > > > > > exceptions
>> > > > > > > > > since there are various kinds of retryable exceptions and
>> can
>> > > not
>> > > > > > retry
>> > > > > > > > > ones).
>> > > > > > > > >
>> > > > > > > > > The FLIP[2] has been updated.
>> > > > > > > > >
>> > > > > > > > > [1]
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
>> > > > > > > > > [2]
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Best,
>> > > > > > > > > Lincoln Lee
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四
>> 18:24写道:
>> > > > > > > > >
>> > > > > > > > > > Dear Flink developers,
>> > > > > > > > > >
>> > > > > > > > > > I would like to open a discussion on FLIP 234 [1] to
>> > support
>> > > > > > > retryable
>> > > > > > > > > > lookup join to solve delayed updates issue, as a
>> pre-work
>> > for
>> > > > > this
>> > > > > > > > > > solution, we proposed FLIP-232[2] which adds a generic
>> > retry
>> > > > > > support
>> > > > > > > > for
>> > > > > > > > > > Async I/O.
>> > > > > > > > > > We prefer to offer this retry capability via query
>> hints,
>> > > > similar
>> > > > > > to
>> > > > > > > > new
>> > > > > > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
>> > > > > > > > > >
>> > > > > > > > > > This feature is backwards compatible and transparently
>> to
>> > > > > > connectors.
>> > > > > > > > For
>> > > > > > > > > > existing connectors which implements AsyncTableFunction,
>> > can
>> > > > > easily
>> > > > > > > > > enable
>> > > > > > > > > > async retry via the new join hint.
>> > > > > > > > > >
>> > > > > > > > > > [1]
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
>> > > > > > > > > > [2]
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
>> > > > > > > > > > [3]
>> > > > > >
>> https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
>> > > > > > > > > > [4]
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
>> > > > > > > > > >
>> > > > > > > > > > Best,
>> > > > > > > > > > Lincoln Lee
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Lincoln Lee <li...@gmail.com>.
Hi Jingsong,

There will be no change for connectors with only one capability (sync or
async).

Query hint works in a best effort manner, so if users specifies a hint with
invalid option, the query plan keeps unchanged, e.g., use
LOOKUP('table'='customer', 'async'='true'), but backend lookup source only
implemented the sync lookup function, then the async lookup hint takes no
effect.

For these connectors which can have both capabilities of async and sync
lookup, our advice for the connector developer is implementing both sync
and async interfaces if both capabilities have suitable use cases, and the
planner can decide which capability is the preferable one based on cost
model or maybe other mechanism (another use case is exactly what we're
discussing here, users can give the query hint), otherwise choose one
interface to implement.

Also, this should be clarified for the lookup function related APIs.

Best,
Lincoln Lee


Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 15:18写道:

> Hi Lincoln,
>
> > It's better making decisions at the query level when a connector has both
> capabilities.
>
> Can you clarify the mechanism?
> - only sync connector: What connector developers should do
> - only async connector: What connector developers should do
> - both async and sync connector: What connector developers should do
>
> Best,
> Jingsong
>
> On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <li...@gmail.com> wrote:
>
> > Hi Jingsong,
> >
> > Thanks for your feedback!
> >
> > Yes, the existing HBase connector use an option 'lookup.async' to control
> > its lookup source implementations that exposed to the planner, however
> it's
> > a private option for the HBase connector only, so it will not affect the
> > common API.
> >
> > And as discussed in the mailing thread of FLIP-221[1], we got a consensus
> > that do not make it as a common option. It's better making decisions at
> the
> > query level when a connector has both capabilities.
> >
> > So if everything goes well, we should discuss it whether to deprecate
> > the 'lookup.async'
> > or not for HBase connector after the hint been done.
> >
> > This will be mentioned in the Compatibility part of this FLIP[2].
> >
> > WDYT?
> >
> > [1]: https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
> > [2]:
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:
> >
> > > Hi Lincoln,
> > >
> > > The unified lookup hint is what I want.
> > >
> > > And I like 'async'='true|false' option.
> > >
> > > But there is a compatibility issue, as I remember if async is currently
> > > controlled by connector, and this may also require some API changes?
> > >
> > > We need to have a clear story for the connector combined with this
> > option:
> > > - only sync connector
> > > - only async connector
> > > - both async and sync connector
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <li...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Jark for your quick response and the consensus!
> > > >
> > > > And I will update the FLIP after Jingsong or other developers confirm
> > > that
> > > > there is no problem.
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
> > > >
> > > > > Thanks for the update.
> > > > >
> > > > > The unified lookup hint looks good to me.
> > > > > And thanks for explaining ALLOW_UNORDERED.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <li...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi Jark & Jingsong,
> > > > > >
> > > > > > Thanks for your feedback!
> > > > > >
> > > > > > 1.) support retry on sync lookup
> > > > > > I also agree with supporting it, this will be useful for
> connectors
> > > > that
> > > > > > don't have asynchronous lookup implementations and can also solve
> > the
> > > > > ASYNC
> > > > > > non-target problem to some extent(because the retrying is
> blocking
> > > for
> > > > > sync
> > > > > > lookup, and may accumulate delay, but it maybe acceptable for the
> > > case
> > > > > that
> > > > > > most or all data want to do a delayed lookup).
> > > > > >
> > > > > > For the api perspective, we can do some unification. Let's think
> of
> > > the
> > > > > > whole user story for lookup join:
> > > > > >
> > > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by
> > > > different
> > > > > > hint option values: 'async'='true|false'
> > > > > >
> > > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the
> > > > > > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'
> > > > > >
> > > > > > we can use one single hint LOOKUP with different hint options
> > > > > > ('async'='true|false', 'miss-retry'='true|false') to cover all
> > > related
> > > > > > functionalities.
> > > > > > Compared to multiple hints with different subsets of
> > functionality, a
> > > > > > single hint may be easier for users to understand and use, and
> > > specific
> > > > > > parameters can be quickly found through documentation
> > > > > >
> > > > > > the support matrix will be:
> > > > > > lookup support async retry
> > > > > > sync w/o retry N N
> > > > > > sync w/ retry N Y
> > > > > > async w/o retry Y N
> > > > > > async w/ retry Y Y
> > > > > >
> > > > > > and the available hint options for each mode:
> > > > > > mode support hint options
> > > > > > async async'='true'
> > > > > > 'output-mode'='ordere|allow-unordered'
> > > > > > 'capacity'='100'
> > > > > > 'timeout'='180s'
> > > > > > retry miss-retry'='true'
> > > > > > 'retry-strategy'='fixed-delay'
> > > > > > 'delay'='10s'
> > > > > > 'max-attempts'='3'
> > > > > >
> > > > > >
> > > > > > 2.) 'allow-unordered' vs 'unordered' for
> > > > > > 'table.exec.async-lookup.output-mode'
> > > > > > Yes, make it align with DataStream Api maybe more intuitive, but
> > > > there's
> > > > > > some difference in table layer that makes the 'allow-unordered'
> > > > > meaningful:
> > > > > > updates in the pipeline need to be processed in order,
> > > ALLOW_UNORDERED
> > > > > > means if users allow unordered result, it will attempt to use
> > > > > > AsyncDataStream.OutputMode.UNORDERED when it does not affect the
> > > > > > correctness of the result, otherwise ORDERED will be still used.
> > > > > >
> > > > > > Another choice is that when the user specifies unordered mode,
> > > planner
> > > > > > throws an error when it finds that it may affect correctness. But
> > > this
> > > > is
> > > > > > not user-friendly and is not consistent with the customary
> > treatment
> > > of
> > > > > > invalid query hints(best effort).
> > > > > >
> > > > > > I opened a pr https://github.com/apache/flink/pull/19759 for the
> > new
> > > > > > option
> > > > > > 'table.exec.async-lookup.output-mode' and also a discussion on
> > > > > FLINK-27625:
> > > > > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the
> > changes
> > > > > were
> > > > > > relatively minor, no new flip was created)
> > > > > >
> > > > > > If we can reach a consensus on the single unified hint, e.g.,
> > LOOKUP,
> > > > > then
> > > > > > FLINK-27625 can be covered.
> > > > > >
> > > > > > WDYT?
> > > > > >
> > > > > > Best,
> > > > > > Lincoln Lee
> > > > > >
> > > > > >
> > > > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
> > > > > >
> > > > > > > Hi Lincoln,
> > > > > > >
> > > > > > > Delayed Dim Join is a frequently requested feature, it's
> exciting
> > > to
> > > > > see
> > > > > > > this feature is on the road.
> > > > > > > The FLIP looks good to me in general. I only left some minor
> > > > comments.
> > > > > > >
> > > > > > > 1) support retry for sync lookup
> > > > > > > I'm also fine with the idea proposed by Jingsong. But this
> > doesn't
> > > > > > conflict
> > > > > > > with the FLIP and can
> > > > > > > be future work. It would be great if we can determine the APIs
> > > first.
> > > > > > >
> > > > > > > 1) "allow-unordered" => "unordered"
> > > > > > > I would prefer the "unordered" output mode rather than
> > > > > "allow-unordered".
> > > > > > > Because this fully aligns with the DataStream behaviors and
> > avoids
> > > > > > > confusion on the differences.
> > > > > > > I understand the purpose that adding a "allow" prefix here,
> but I
> > > > think
> > > > > > the
> > > > > > > semantic is fine to just
> > > > > > > use "unordered" here. We didn't see any users confused about
> > > > > > > OutputMode#UNORDERED.
> > > > > > >
> > > > > > > Best,
> > > > > > > Jark
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
> > jingsonglee0@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Lincoln for your proposal.
> > > > > > > >
> > > > > > > > Take a look at `strategy: fixed-delay delay: duration, e.g.,
> > 10s
> > > > > > > > max-attempts: integer, e.g., 3`.
> > > > > > > >
> > > > > > > > Are these options only for async? It looks like normal
> lookups
> > > work
> > > > > > too?
> > > > > > > >
> > > > > > > > One thing is: most of the lookup functions seem to be
> > synchronous
> > > > > now?
> > > > > > > > There are not so many asynchronous ones?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jingsong
> > > > > > > >
> > > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
> > > > lincoln.86xy@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > Considering the new common table option
> 'lookup.max-retries'
> > > > > proposed
> > > > > > > in
> > > > > > > > > FLIP-221[1] which is commonly used for exception handling
> in
> > > > > > connector
> > > > > > > > > implementation, we should clearly distinguish
> > > ASYNC_LOOKUP_RETRY
> > > > > from
> > > > > > > it
> > > > > > > > to
> > > > > > > > > avoid confusing users.
> > > > > > > > >
> > > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
> > > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict
> > it
> > > to
> > > > > > > support
> > > > > > > > > retries only for lookup misses and no longer include
> > exceptions
> > > > > (for
> > > > > > > sql
> > > > > > > > > connectors, let the connector implementer decide how to
> > handle
> > > > > > > exceptions
> > > > > > > > > since there are various kinds of retryable exceptions and
> can
> > > not
> > > > > > retry
> > > > > > > > > ones).
> > > > > > > > >
> > > > > > > > > The FLIP[2] has been updated.
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Lincoln Lee
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四
> 18:24写道:
> > > > > > > > >
> > > > > > > > > > Dear Flink developers,
> > > > > > > > > >
> > > > > > > > > > I would like to open a discussion on FLIP 234 [1] to
> > support
> > > > > > > retryable
> > > > > > > > > > lookup join to solve delayed updates issue, as a pre-work
> > for
> > > > > this
> > > > > > > > > > solution, we proposed FLIP-232[2] which adds a generic
> > retry
> > > > > > support
> > > > > > > > for
> > > > > > > > > > Async I/O.
> > > > > > > > > > We prefer to offer this retry capability via query hints,
> > > > similar
> > > > > > to
> > > > > > > > new
> > > > > > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> > > > > > > > > >
> > > > > > > > > > This feature is backwards compatible and transparently to
> > > > > > connectors.
> > > > > > > > For
> > > > > > > > > > existing connectors which implements AsyncTableFunction,
> > can
> > > > > easily
> > > > > > > > > enable
> > > > > > > > > > async retry via the new join hint.
> > > > > > > > > >
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > > > > > > [2]
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > > > > > > > > [3]
> > > > > > https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > > > > > > > > [4]
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Lincoln Lee
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Jingsong Li <ji...@gmail.com>.
Hi Lincoln,

> It's better making decisions at the query level when a connector has both
capabilities.

Can you clarify the mechanism?
- only sync connector: What connector developers should do
- only async connector: What connector developers should do
- both async and sync connector: What connector developers should do

Best,
Jingsong

On Wed, Jun 1, 2022 at 2:29 PM Lincoln Lee <li...@gmail.com> wrote:

> Hi Jingsong,
>
> Thanks for your feedback!
>
> Yes, the existing HBase connector use an option 'lookup.async' to control
> its lookup source implementations that exposed to the planner, however it's
> a private option for the HBase connector only, so it will not affect the
> common API.
>
> And as discussed in the mailing thread of FLIP-221[1], we got a consensus
> that do not make it as a common option. It's better making decisions at the
> query level when a connector has both capabilities.
>
> So if everything goes well, we should discuss it whether to deprecate
> the 'lookup.async'
> or not for HBase connector after the hint been done.
>
> This will be mentioned in the Compatibility part of this FLIP[2].
>
> WDYT?
>
> [1]: https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
> [2]:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode
>
> Best,
> Lincoln Lee
>
>
> Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:
>
> > Hi Lincoln,
> >
> > The unified lookup hint is what I want.
> >
> > And I like 'async'='true|false' option.
> >
> > But there is a compatibility issue, as I remember if async is currently
> > controlled by connector, and this may also require some API changes?
> >
> > We need to have a clear story for the connector combined with this
> option:
> > - only sync connector
> > - only async connector
> > - both async and sync connector
> >
> > Best,
> > Jingsong
> >
> > On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <li...@gmail.com>
> > wrote:
> >
> > > Thanks Jark for your quick response and the consensus!
> > >
> > > And I will update the FLIP after Jingsong or other developers confirm
> > that
> > > there is no problem.
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
> > >
> > > > Thanks for the update.
> > > >
> > > > The unified lookup hint looks good to me.
> > > > And thanks for explaining ALLOW_UNORDERED.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <li...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Jark & Jingsong,
> > > > >
> > > > > Thanks for your feedback!
> > > > >
> > > > > 1.) support retry on sync lookup
> > > > > I also agree with supporting it, this will be useful for connectors
> > > that
> > > > > don't have asynchronous lookup implementations and can also solve
> the
> > > > ASYNC
> > > > > non-target problem to some extent(because the retrying is blocking
> > for
> > > > sync
> > > > > lookup, and may accumulate delay, but it maybe acceptable for the
> > case
> > > > that
> > > > > most or all data want to do a delayed lookup).
> > > > >
> > > > > For the api perspective, we can do some unification. Let's think of
> > the
> > > > > whole user story for lookup join:
> > > > >
> > > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by
> > > different
> > > > > hint option values: 'async'='true|false'
> > > > >
> > > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the
> > > > > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'
> > > > >
> > > > > we can use one single hint LOOKUP with different hint options
> > > > > ('async'='true|false', 'miss-retry'='true|false') to cover all
> > related
> > > > > functionalities.
> > > > > Compared to multiple hints with different subsets of
> functionality, a
> > > > > single hint may be easier for users to understand and use, and
> > specific
> > > > > parameters can be quickly found through documentation
> > > > >
> > > > > the support matrix will be:
> > > > > lookup support async retry
> > > > > sync w/o retry N N
> > > > > sync w/ retry N Y
> > > > > async w/o retry Y N
> > > > > async w/ retry Y Y
> > > > >
> > > > > and the available hint options for each mode:
> > > > > mode support hint options
> > > > > async async'='true'
> > > > > 'output-mode'='ordere|allow-unordered'
> > > > > 'capacity'='100'
> > > > > 'timeout'='180s'
> > > > > retry miss-retry'='true'
> > > > > 'retry-strategy'='fixed-delay'
> > > > > 'delay'='10s'
> > > > > 'max-attempts'='3'
> > > > >
> > > > >
> > > > > 2.) 'allow-unordered' vs 'unordered' for
> > > > > 'table.exec.async-lookup.output-mode'
> > > > > Yes, make it align with DataStream Api maybe more intuitive, but
> > > there's
> > > > > some difference in table layer that makes the 'allow-unordered'
> > > > meaningful:
> > > > > updates in the pipeline need to be processed in order,
> > ALLOW_UNORDERED
> > > > > means if users allow unordered result, it will attempt to use
> > > > > AsyncDataStream.OutputMode.UNORDERED when it does not affect the
> > > > > correctness of the result, otherwise ORDERED will be still used.
> > > > >
> > > > > Another choice is that when the user specifies unordered mode,
> > planner
> > > > > throws an error when it finds that it may affect correctness. But
> > this
> > > is
> > > > > not user-friendly and is not consistent with the customary
> treatment
> > of
> > > > > invalid query hints(best effort).
> > > > >
> > > > > I opened a pr https://github.com/apache/flink/pull/19759 for the
> new
> > > > > option
> > > > > 'table.exec.async-lookup.output-mode' and also a discussion on
> > > > FLINK-27625:
> > > > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the
> changes
> > > > were
> > > > > relatively minor, no new flip was created)
> > > > >
> > > > > If we can reach a consensus on the single unified hint, e.g.,
> LOOKUP,
> > > > then
> > > > > FLINK-27625 can be covered.
> > > > >
> > > > > WDYT?
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > > >
> > > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
> > > > >
> > > > > > Hi Lincoln,
> > > > > >
> > > > > > Delayed Dim Join is a frequently requested feature, it's exciting
> > to
> > > > see
> > > > > > this feature is on the road.
> > > > > > The FLIP looks good to me in general. I only left some minor
> > > comments.
> > > > > >
> > > > > > 1) support retry for sync lookup
> > > > > > I'm also fine with the idea proposed by Jingsong. But this
> doesn't
> > > > > conflict
> > > > > > with the FLIP and can
> > > > > > be future work. It would be great if we can determine the APIs
> > first.
> > > > > >
> > > > > > 1) "allow-unordered" => "unordered"
> > > > > > I would prefer the "unordered" output mode rather than
> > > > "allow-unordered".
> > > > > > Because this fully aligns with the DataStream behaviors and
> avoids
> > > > > > confusion on the differences.
> > > > > > I understand the purpose that adding a "allow" prefix here, but I
> > > think
> > > > > the
> > > > > > semantic is fine to just
> > > > > > use "unordered" here. We didn't see any users confused about
> > > > > > OutputMode#UNORDERED.
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <
> jingsonglee0@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Thanks Lincoln for your proposal.
> > > > > > >
> > > > > > > Take a look at `strategy: fixed-delay delay: duration, e.g.,
> 10s
> > > > > > > max-attempts: integer, e.g., 3`.
> > > > > > >
> > > > > > > Are these options only for async? It looks like normal lookups
> > work
> > > > > too?
> > > > > > >
> > > > > > > One thing is: most of the lookup functions seem to be
> synchronous
> > > > now?
> > > > > > > There are not so many asynchronous ones?
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong
> > > > > > >
> > > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
> > > lincoln.86xy@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > Considering the new common table option 'lookup.max-retries'
> > > > proposed
> > > > > > in
> > > > > > > > FLIP-221[1] which is commonly used for exception handling in
> > > > > connector
> > > > > > > > implementation, we should clearly distinguish
> > ASYNC_LOOKUP_RETRY
> > > > from
> > > > > > it
> > > > > > > to
> > > > > > > > avoid confusing users.
> > > > > > > >
> > > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
> > > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict
> it
> > to
> > > > > > support
> > > > > > > > retries only for lookup misses and no longer include
> exceptions
> > > > (for
> > > > > > sql
> > > > > > > > connectors, let the connector implementer decide how to
> handle
> > > > > > exceptions
> > > > > > > > since there are various kinds of retryable exceptions and can
> > not
> > > > > retry
> > > > > > > > ones).
> > > > > > > >
> > > > > > > > The FLIP[2] has been updated.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > > > > > > [2]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Lincoln Lee
> > > > > > > >
> > > > > > > >
> > > > > > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四 18:24写道:
> > > > > > > >
> > > > > > > > > Dear Flink developers,
> > > > > > > > >
> > > > > > > > > I would like to open a discussion on FLIP 234 [1] to
> support
> > > > > > retryable
> > > > > > > > > lookup join to solve delayed updates issue, as a pre-work
> for
> > > > this
> > > > > > > > > solution, we proposed FLIP-232[2] which adds a generic
> retry
> > > > > support
> > > > > > > for
> > > > > > > > > Async I/O.
> > > > > > > > > We prefer to offer this retry capability via query hints,
> > > similar
> > > > > to
> > > > > > > new
> > > > > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> > > > > > > > >
> > > > > > > > > This feature is backwards compatible and transparently to
> > > > > connectors.
> > > > > > > For
> > > > > > > > > existing connectors which implements AsyncTableFunction,
> can
> > > > easily
> > > > > > > > enable
> > > > > > > > > async retry via the new join hint.
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > > > > > [2]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > > > > > > > [3]
> > > > > https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > > > > > > > [4]
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Lincoln Lee
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Lincoln Lee <li...@gmail.com>.
Hi Jingsong,

Thanks for your feedback!

Yes, the existing HBase connector use an option 'lookup.async' to control
its lookup source implementations that exposed to the planner, however it's
a private option for the HBase connector only, so it will not affect the
common API.

And as discussed in the mailing thread of FLIP-221[1], we got a consensus
that do not make it as a common option. It's better making decisions at the
query level when a connector has both capabilities.

So if everything goes well, we should discuss it whether to deprecate
the 'lookup.async'
or not for HBase connector after the hint been done.

This will be mentioned in the Compatibility part of this FLIP[2].

WDYT?

[1]: https://lists.apache.org/thread/v76g8v1o9sjdho9kbzlgjyv38l2oynox
[2]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems?src=contextnavpagetreemode

Best,
Lincoln Lee


Jingsong Li <ji...@gmail.com> 于2022年6月1日周三 14:11写道:

> Hi Lincoln,
>
> The unified lookup hint is what I want.
>
> And I like 'async'='true|false' option.
>
> But there is a compatibility issue, as I remember if async is currently
> controlled by connector, and this may also require some API changes?
>
> We need to have a clear story for the connector combined with this option:
> - only sync connector
> - only async connector
> - both async and sync connector
>
> Best,
> Jingsong
>
> On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <li...@gmail.com>
> wrote:
>
> > Thanks Jark for your quick response and the consensus!
> >
> > And I will update the FLIP after Jingsong or other developers confirm
> that
> > there is no problem.
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
> >
> > > Thanks for the update.
> > >
> > > The unified lookup hint looks good to me.
> > > And thanks for explaining ALLOW_UNORDERED.
> > >
> > > Best,
> > > Jark
> > >
> > > On Mon, 30 May 2022 at 15:31, Lincoln Lee <li...@gmail.com>
> > wrote:
> > >
> > > > Hi Jark & Jingsong,
> > > >
> > > > Thanks for your feedback!
> > > >
> > > > 1.) support retry on sync lookup
> > > > I also agree with supporting it, this will be useful for connectors
> > that
> > > > don't have asynchronous lookup implementations and can also solve the
> > > ASYNC
> > > > non-target problem to some extent(because the retrying is blocking
> for
> > > sync
> > > > lookup, and may accumulate delay, but it maybe acceptable for the
> case
> > > that
> > > > most or all data want to do a delayed lookup).
> > > >
> > > > For the api perspective, we can do some unification. Let's think of
> the
> > > > whole user story for lookup join:
> > > >
> > > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by
> > different
> > > > hint option values: 'async'='true|false'
> > > >
> > > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the
> > > > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'
> > > >
> > > > we can use one single hint LOOKUP with different hint options
> > > > ('async'='true|false', 'miss-retry'='true|false') to cover all
> related
> > > > functionalities.
> > > > Compared to multiple hints with different subsets of functionality, a
> > > > single hint may be easier for users to understand and use, and
> specific
> > > > parameters can be quickly found through documentation
> > > >
> > > > the support matrix will be:
> > > > lookup support async retry
> > > > sync w/o retry N N
> > > > sync w/ retry N Y
> > > > async w/o retry Y N
> > > > async w/ retry Y Y
> > > >
> > > > and the available hint options for each mode:
> > > > mode support hint options
> > > > async async'='true'
> > > > 'output-mode'='ordere|allow-unordered'
> > > > 'capacity'='100'
> > > > 'timeout'='180s'
> > > > retry miss-retry'='true'
> > > > 'retry-strategy'='fixed-delay'
> > > > 'delay'='10s'
> > > > 'max-attempts'='3'
> > > >
> > > >
> > > > 2.) 'allow-unordered' vs 'unordered' for
> > > > 'table.exec.async-lookup.output-mode'
> > > > Yes, make it align with DataStream Api maybe more intuitive, but
> > there's
> > > > some difference in table layer that makes the 'allow-unordered'
> > > meaningful:
> > > > updates in the pipeline need to be processed in order,
> ALLOW_UNORDERED
> > > > means if users allow unordered result, it will attempt to use
> > > > AsyncDataStream.OutputMode.UNORDERED when it does not affect the
> > > > correctness of the result, otherwise ORDERED will be still used.
> > > >
> > > > Another choice is that when the user specifies unordered mode,
> planner
> > > > throws an error when it finds that it may affect correctness. But
> this
> > is
> > > > not user-friendly and is not consistent with the customary treatment
> of
> > > > invalid query hints(best effort).
> > > >
> > > > I opened a pr https://github.com/apache/flink/pull/19759 for the new
> > > > option
> > > > 'table.exec.async-lookup.output-mode' and also a discussion on
> > > FLINK-27625:
> > > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the changes
> > > were
> > > > relatively minor, no new flip was created)
> > > >
> > > > If we can reach a consensus on the single unified hint, e.g., LOOKUP,
> > > then
> > > > FLINK-27625 can be covered.
> > > >
> > > > WDYT?
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
> > > >
> > > > > Hi Lincoln,
> > > > >
> > > > > Delayed Dim Join is a frequently requested feature, it's exciting
> to
> > > see
> > > > > this feature is on the road.
> > > > > The FLIP looks good to me in general. I only left some minor
> > comments.
> > > > >
> > > > > 1) support retry for sync lookup
> > > > > I'm also fine with the idea proposed by Jingsong. But this doesn't
> > > > conflict
> > > > > with the FLIP and can
> > > > > be future work. It would be great if we can determine the APIs
> first.
> > > > >
> > > > > 1) "allow-unordered" => "unordered"
> > > > > I would prefer the "unordered" output mode rather than
> > > "allow-unordered".
> > > > > Because this fully aligns with the DataStream behaviors and avoids
> > > > > confusion on the differences.
> > > > > I understand the purpose that adding a "allow" prefix here, but I
> > think
> > > > the
> > > > > semantic is fine to just
> > > > > use "unordered" here. We didn't see any users confused about
> > > > > OutputMode#UNORDERED.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > >
> > > > >
> > > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <ji...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Thanks Lincoln for your proposal.
> > > > > >
> > > > > > Take a look at `strategy: fixed-delay delay: duration, e.g., 10s
> > > > > > max-attempts: integer, e.g., 3`.
> > > > > >
> > > > > > Are these options only for async? It looks like normal lookups
> work
> > > > too?
> > > > > >
> > > > > > One thing is: most of the lookup functions seem to be synchronous
> > > now?
> > > > > > There are not so many asynchronous ones?
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
> > lincoln.86xy@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Considering the new common table option 'lookup.max-retries'
> > > proposed
> > > > > in
> > > > > > > FLIP-221[1] which is commonly used for exception handling in
> > > > connector
> > > > > > > implementation, we should clearly distinguish
> ASYNC_LOOKUP_RETRY
> > > from
> > > > > it
> > > > > > to
> > > > > > > avoid confusing users.
> > > > > > >
> > > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
> > > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict it
> to
> > > > > support
> > > > > > > retries only for lookup misses and no longer include exceptions
> > > (for
> > > > > sql
> > > > > > > connectors, let the connector implementer decide how to handle
> > > > > exceptions
> > > > > > > since there are various kinds of retryable exceptions and can
> not
> > > > retry
> > > > > > > ones).
> > > > > > >
> > > > > > > The FLIP[2] has been updated.
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > > > > > [2]
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Lincoln Lee
> > > > > > >
> > > > > > >
> > > > > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四 18:24写道:
> > > > > > >
> > > > > > > > Dear Flink developers,
> > > > > > > >
> > > > > > > > I would like to open a discussion on FLIP 234 [1] to support
> > > > > retryable
> > > > > > > > lookup join to solve delayed updates issue, as a pre-work for
> > > this
> > > > > > > > solution, we proposed FLIP-232[2] which adds a generic retry
> > > > support
> > > > > > for
> > > > > > > > Async I/O.
> > > > > > > > We prefer to offer this retry capability via query hints,
> > similar
> > > > to
> > > > > > new
> > > > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> > > > > > > >
> > > > > > > > This feature is backwards compatible and transparently to
> > > > connectors.
> > > > > > For
> > > > > > > > existing connectors which implements AsyncTableFunction, can
> > > easily
> > > > > > > enable
> > > > > > > > async retry via the new join hint.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > > > > [2]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > > > > > > [3]
> > > > https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > > > > > > [4]
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Lincoln Lee
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Jingsong Li <ji...@gmail.com>.
Hi Lincoln,

The unified lookup hint is what I want.

And I like 'async'='true|false' option.

But there is a compatibility issue, as I remember if async is currently
controlled by connector, and this may also require some API changes?

We need to have a clear story for the connector combined with this option:
- only sync connector
- only async connector
- both async and sync connector

Best,
Jingsong

On Mon, May 30, 2022 at 3:55 PM Lincoln Lee <li...@gmail.com> wrote:

> Thanks Jark for your quick response and the consensus!
>
> And I will update the FLIP after Jingsong or other developers confirm that
> there is no problem.
>
> Best,
> Lincoln Lee
>
>
> Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:
>
> > Thanks for the update.
> >
> > The unified lookup hint looks good to me.
> > And thanks for explaining ALLOW_UNORDERED.
> >
> > Best,
> > Jark
> >
> > On Mon, 30 May 2022 at 15:31, Lincoln Lee <li...@gmail.com>
> wrote:
> >
> > > Hi Jark & Jingsong,
> > >
> > > Thanks for your feedback!
> > >
> > > 1.) support retry on sync lookup
> > > I also agree with supporting it, this will be useful for connectors
> that
> > > don't have asynchronous lookup implementations and can also solve the
> > ASYNC
> > > non-target problem to some extent(because the retrying is blocking for
> > sync
> > > lookup, and may accumulate delay, but it maybe acceptable for the case
> > that
> > > most or all data want to do a delayed lookup).
> > >
> > > For the api perspective, we can do some unification. Let's think of the
> > > whole user story for lookup join:
> > >
> > > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by
> different
> > > hint option values: 'async'='true|false'
> > >
> > > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the
> > > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'
> > >
> > > we can use one single hint LOOKUP with different hint options
> > > ('async'='true|false', 'miss-retry'='true|false') to cover all related
> > > functionalities.
> > > Compared to multiple hints with different subsets of functionality, a
> > > single hint may be easier for users to understand and use, and specific
> > > parameters can be quickly found through documentation
> > >
> > > the support matrix will be:
> > > lookup support async retry
> > > sync w/o retry N N
> > > sync w/ retry N Y
> > > async w/o retry Y N
> > > async w/ retry Y Y
> > >
> > > and the available hint options for each mode:
> > > mode support hint options
> > > async async'='true'
> > > 'output-mode'='ordere|allow-unordered'
> > > 'capacity'='100'
> > > 'timeout'='180s'
> > > retry miss-retry'='true'
> > > 'retry-strategy'='fixed-delay'
> > > 'delay'='10s'
> > > 'max-attempts'='3'
> > >
> > >
> > > 2.) 'allow-unordered' vs 'unordered' for
> > > 'table.exec.async-lookup.output-mode'
> > > Yes, make it align with DataStream Api maybe more intuitive, but
> there's
> > > some difference in table layer that makes the 'allow-unordered'
> > meaningful:
> > > updates in the pipeline need to be processed in order, ALLOW_UNORDERED
> > > means if users allow unordered result, it will attempt to use
> > > AsyncDataStream.OutputMode.UNORDERED when it does not affect the
> > > correctness of the result, otherwise ORDERED will be still used.
> > >
> > > Another choice is that when the user specifies unordered mode, planner
> > > throws an error when it finds that it may affect correctness. But this
> is
> > > not user-friendly and is not consistent with the customary treatment of
> > > invalid query hints(best effort).
> > >
> > > I opened a pr https://github.com/apache/flink/pull/19759 for the new
> > > option
> > > 'table.exec.async-lookup.output-mode' and also a discussion on
> > FLINK-27625:
> > > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the changes
> > were
> > > relatively minor, no new flip was created)
> > >
> > > If we can reach a consensus on the single unified hint, e.g., LOOKUP,
> > then
> > > FLINK-27625 can be covered.
> > >
> > > WDYT?
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
> > >
> > > > Hi Lincoln,
> > > >
> > > > Delayed Dim Join is a frequently requested feature, it's exciting to
> > see
> > > > this feature is on the road.
> > > > The FLIP looks good to me in general. I only left some minor
> comments.
> > > >
> > > > 1) support retry for sync lookup
> > > > I'm also fine with the idea proposed by Jingsong. But this doesn't
> > > conflict
> > > > with the FLIP and can
> > > > be future work. It would be great if we can determine the APIs first.
> > > >
> > > > 1) "allow-unordered" => "unordered"
> > > > I would prefer the "unordered" output mode rather than
> > "allow-unordered".
> > > > Because this fully aligns with the DataStream behaviors and avoids
> > > > confusion on the differences.
> > > > I understand the purpose that adding a "allow" prefix here, but I
> think
> > > the
> > > > semantic is fine to just
> > > > use "unordered" here. We didn't see any users confused about
> > > > OutputMode#UNORDERED.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > >
> > > > On Fri, 27 May 2022 at 12:58, Jingsong Li <ji...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks Lincoln for your proposal.
> > > > >
> > > > > Take a look at `strategy: fixed-delay delay: duration, e.g., 10s
> > > > > max-attempts: integer, e.g., 3`.
> > > > >
> > > > > Are these options only for async? It looks like normal lookups work
> > > too?
> > > > >
> > > > > One thing is: most of the lookup functions seem to be synchronous
> > now?
> > > > > There are not so many asynchronous ones?
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <
> lincoln.86xy@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Considering the new common table option 'lookup.max-retries'
> > proposed
> > > > in
> > > > > > FLIP-221[1] which is commonly used for exception handling in
> > > connector
> > > > > > implementation, we should clearly distinguish ASYNC_LOOKUP_RETRY
> > from
> > > > it
> > > > > to
> > > > > > avoid confusing users.
> > > > > >
> > > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
> > > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict it to
> > > > support
> > > > > > retries only for lookup misses and no longer include exceptions
> > (for
> > > > sql
> > > > > > connectors, let the connector implementer decide how to handle
> > > > exceptions
> > > > > > since there are various kinds of retryable exceptions and can not
> > > retry
> > > > > > ones).
> > > > > >
> > > > > > The FLIP[2] has been updated.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > > > > [2]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Lincoln Lee
> > > > > >
> > > > > >
> > > > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四 18:24写道:
> > > > > >
> > > > > > > Dear Flink developers,
> > > > > > >
> > > > > > > I would like to open a discussion on FLIP 234 [1] to support
> > > > retryable
> > > > > > > lookup join to solve delayed updates issue, as a pre-work for
> > this
> > > > > > > solution, we proposed FLIP-232[2] which adds a generic retry
> > > support
> > > > > for
> > > > > > > Async I/O.
> > > > > > > We prefer to offer this retry capability via query hints,
> similar
> > > to
> > > > > new
> > > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> > > > > > >
> > > > > > > This feature is backwards compatible and transparently to
> > > connectors.
> > > > > For
> > > > > > > existing connectors which implements AsyncTableFunction, can
> > easily
> > > > > > enable
> > > > > > > async retry via the new join hint.
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > > > [2]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > > > > > [3]
> > > https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > > > > > [4]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > > > > >
> > > > > > > Best,
> > > > > > > Lincoln Lee
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Lincoln Lee <li...@gmail.com>.
Thanks Jark for your quick response and the consensus!

And I will update the FLIP after Jingsong or other developers confirm that
there is no problem.

Best,
Lincoln Lee


Jark Wu <im...@gmail.com> 于2022年5月30日周一 15:49写道:

> Thanks for the update.
>
> The unified lookup hint looks good to me.
> And thanks for explaining ALLOW_UNORDERED.
>
> Best,
> Jark
>
> On Mon, 30 May 2022 at 15:31, Lincoln Lee <li...@gmail.com> wrote:
>
> > Hi Jark & Jingsong,
> >
> > Thanks for your feedback!
> >
> > 1.) support retry on sync lookup
> > I also agree with supporting it, this will be useful for connectors that
> > don't have asynchronous lookup implementations and can also solve the
> ASYNC
> > non-target problem to some extent(because the retrying is blocking for
> sync
> > lookup, and may accumulate delay, but it maybe acceptable for the case
> that
> > most or all data want to do a delayed lookup).
> >
> > For the api perspective, we can do some unification. Let's think of the
> > whole user story for lookup join:
> >
> > ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by different
> > hint option values: 'async'='true|false'
> >
> > ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the
> > LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'
> >
> > we can use one single hint LOOKUP with different hint options
> > ('async'='true|false', 'miss-retry'='true|false') to cover all related
> > functionalities.
> > Compared to multiple hints with different subsets of functionality, a
> > single hint may be easier for users to understand and use, and specific
> > parameters can be quickly found through documentation
> >
> > the support matrix will be:
> > lookup support async retry
> > sync w/o retry N N
> > sync w/ retry N Y
> > async w/o retry Y N
> > async w/ retry Y Y
> >
> > and the available hint options for each mode:
> > mode support hint options
> > async async'='true'
> > 'output-mode'='ordere|allow-unordered'
> > 'capacity'='100'
> > 'timeout'='180s'
> > retry miss-retry'='true'
> > 'retry-strategy'='fixed-delay'
> > 'delay'='10s'
> > 'max-attempts'='3'
> >
> >
> > 2.) 'allow-unordered' vs 'unordered' for
> > 'table.exec.async-lookup.output-mode'
> > Yes, make it align with DataStream Api maybe more intuitive, but there's
> > some difference in table layer that makes the 'allow-unordered'
> meaningful:
> > updates in the pipeline need to be processed in order, ALLOW_UNORDERED
> > means if users allow unordered result, it will attempt to use
> > AsyncDataStream.OutputMode.UNORDERED when it does not affect the
> > correctness of the result, otherwise ORDERED will be still used.
> >
> > Another choice is that when the user specifies unordered mode, planner
> > throws an error when it finds that it may affect correctness. But this is
> > not user-friendly and is not consistent with the customary treatment of
> > invalid query hints(best effort).
> >
> > I opened a pr https://github.com/apache/flink/pull/19759 for the new
> > option
> > 'table.exec.async-lookup.output-mode' and also a discussion on
> FLINK-27625:
> > add query hint 'ASYNC_LOOKUP' for async lookup join(Since the changes
> were
> > relatively minor, no new flip was created)
> >
> > If we can reach a consensus on the single unified hint, e.g., LOOKUP,
> then
> > FLINK-27625 can be covered.
> >
> > WDYT?
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
> >
> > > Hi Lincoln,
> > >
> > > Delayed Dim Join is a frequently requested feature, it's exciting to
> see
> > > this feature is on the road.
> > > The FLIP looks good to me in general. I only left some minor comments.
> > >
> > > 1) support retry for sync lookup
> > > I'm also fine with the idea proposed by Jingsong. But this doesn't
> > conflict
> > > with the FLIP and can
> > > be future work. It would be great if we can determine the APIs first.
> > >
> > > 1) "allow-unordered" => "unordered"
> > > I would prefer the "unordered" output mode rather than
> "allow-unordered".
> > > Because this fully aligns with the DataStream behaviors and avoids
> > > confusion on the differences.
> > > I understand the purpose that adding a "allow" prefix here, but I think
> > the
> > > semantic is fine to just
> > > use "unordered" here. We didn't see any users confused about
> > > OutputMode#UNORDERED.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > >
> > > On Fri, 27 May 2022 at 12:58, Jingsong Li <ji...@gmail.com>
> > wrote:
> > >
> > > > Thanks Lincoln for your proposal.
> > > >
> > > > Take a look at `strategy: fixed-delay delay: duration, e.g., 10s
> > > > max-attempts: integer, e.g., 3`.
> > > >
> > > > Are these options only for async? It looks like normal lookups work
> > too?
> > > >
> > > > One thing is: most of the lookup functions seem to be synchronous
> now?
> > > > There are not so many asynchronous ones?
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <lincoln.86xy@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Considering the new common table option 'lookup.max-retries'
> proposed
> > > in
> > > > > FLIP-221[1] which is commonly used for exception handling in
> > connector
> > > > > implementation, we should clearly distinguish ASYNC_LOOKUP_RETRY
> from
> > > it
> > > > to
> > > > > avoid confusing users.
> > > > >
> > > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
> > > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict it to
> > > support
> > > > > retries only for lookup misses and no longer include exceptions
> (for
> > > sql
> > > > > connectors, let the connector implementer decide how to handle
> > > exceptions
> > > > > since there are various kinds of retryable exceptions and can not
> > retry
> > > > > ones).
> > > > >
> > > > > The FLIP[2] has been updated.
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > >
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > > >
> > > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四 18:24写道:
> > > > >
> > > > > > Dear Flink developers,
> > > > > >
> > > > > > I would like to open a discussion on FLIP 234 [1] to support
> > > retryable
> > > > > > lookup join to solve delayed updates issue, as a pre-work for
> this
> > > > > > solution, we proposed FLIP-232[2] which adds a generic retry
> > support
> > > > for
> > > > > > Async I/O.
> > > > > > We prefer to offer this retry capability via query hints, similar
> > to
> > > > new
> > > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> > > > > >
> > > > > > This feature is backwards compatible and transparently to
> > connectors.
> > > > For
> > > > > > existing connectors which implements AsyncTableFunction, can
> easily
> > > > > enable
> > > > > > async retry via the new join hint.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > > [2]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > > > > [3]
> > https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > > > > [4]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > > > >
> > > > > > Best,
> > > > > > Lincoln Lee
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Jark Wu <im...@gmail.com>.
Thanks for the update.

The unified lookup hint looks good to me.
And thanks for explaining ALLOW_UNORDERED.

Best,
Jark

On Mon, 30 May 2022 at 15:31, Lincoln Lee <li...@gmail.com> wrote:

> Hi Jark & Jingsong,
>
> Thanks for your feedback!
>
> 1.) support retry on sync lookup
> I also agree with supporting it, this will be useful for connectors that
> don't have asynchronous lookup implementations and can also solve the ASYNC
> non-target problem to some extent(because the retrying is blocking for sync
> lookup, and may accumulate delay, but it maybe acceptable for the case that
> most or all data want to do a delayed lookup).
>
> For the api perspective, we can do some unification. Let's think of the
> whole user story for lookup join:
>
> ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by different
> hint option values: 'async'='true|false'
>
> ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the
> LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'
>
> we can use one single hint LOOKUP with different hint options
> ('async'='true|false', 'miss-retry'='true|false') to cover all related
> functionalities.
> Compared to multiple hints with different subsets of functionality, a
> single hint may be easier for users to understand and use, and specific
> parameters can be quickly found through documentation
>
> the support matrix will be:
> lookup support async retry
> sync w/o retry N N
> sync w/ retry N Y
> async w/o retry Y N
> async w/ retry Y Y
>
> and the available hint options for each mode:
> mode support hint options
> async async'='true'
> 'output-mode'='ordere|allow-unordered'
> 'capacity'='100'
> 'timeout'='180s'
> retry miss-retry'='true'
> 'retry-strategy'='fixed-delay'
> 'delay'='10s'
> 'max-attempts'='3'
>
>
> 2.) 'allow-unordered' vs 'unordered' for
> 'table.exec.async-lookup.output-mode'
> Yes, make it align with DataStream Api maybe more intuitive, but there's
> some difference in table layer that makes the 'allow-unordered' meaningful:
> updates in the pipeline need to be processed in order, ALLOW_UNORDERED
> means if users allow unordered result, it will attempt to use
> AsyncDataStream.OutputMode.UNORDERED when it does not affect the
> correctness of the result, otherwise ORDERED will be still used.
>
> Another choice is that when the user specifies unordered mode, planner
> throws an error when it finds that it may affect correctness. But this is
> not user-friendly and is not consistent with the customary treatment of
> invalid query hints(best effort).
>
> I opened a pr https://github.com/apache/flink/pull/19759 for the new
> option
> 'table.exec.async-lookup.output-mode' and also a discussion on FLINK-27625:
> add query hint 'ASYNC_LOOKUP' for async lookup join(Since the changes were
> relatively minor, no new flip was created)
>
> If we can reach a consensus on the single unified hint, e.g., LOOKUP,  then
> FLINK-27625 can be covered.
>
> WDYT?
>
> Best,
> Lincoln Lee
>
>
> Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:
>
> > Hi Lincoln,
> >
> > Delayed Dim Join is a frequently requested feature, it's exciting to see
> > this feature is on the road.
> > The FLIP looks good to me in general. I only left some minor comments.
> >
> > 1) support retry for sync lookup
> > I'm also fine with the idea proposed by Jingsong. But this doesn't
> conflict
> > with the FLIP and can
> > be future work. It would be great if we can determine the APIs first.
> >
> > 1) "allow-unordered" => "unordered"
> > I would prefer the "unordered" output mode rather than "allow-unordered".
> > Because this fully aligns with the DataStream behaviors and avoids
> > confusion on the differences.
> > I understand the purpose that adding a "allow" prefix here, but I think
> the
> > semantic is fine to just
> > use "unordered" here. We didn't see any users confused about
> > OutputMode#UNORDERED.
> >
> > Best,
> > Jark
> >
> >
> >
> > On Fri, 27 May 2022 at 12:58, Jingsong Li <ji...@gmail.com>
> wrote:
> >
> > > Thanks Lincoln for your proposal.
> > >
> > > Take a look at `strategy: fixed-delay delay: duration, e.g., 10s
> > > max-attempts: integer, e.g., 3`.
> > >
> > > Are these options only for async? It looks like normal lookups work
> too?
> > >
> > > One thing is: most of the lookup functions seem to be synchronous now?
> > > There are not so many asynchronous ones?
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <li...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Considering the new common table option 'lookup.max-retries' proposed
> > in
> > > > FLIP-221[1] which is commonly used for exception handling in
> connector
> > > > implementation, we should clearly distinguish ASYNC_LOOKUP_RETRY from
> > it
> > > to
> > > > avoid confusing users.
> > > >
> > > > To do so, the name ASYNC_LOOKUP_RETRY can change to
> > > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict it to
> > support
> > > > retries only for lookup misses and no longer include exceptions (for
> > sql
> > > > connectors, let the connector implementer decide how to handle
> > exceptions
> > > > since there are various kinds of retryable exceptions and can not
> retry
> > > > ones).
> > > >
> > > > The FLIP[2] has been updated.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > > [2]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > >
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四 18:24写道:
> > > >
> > > > > Dear Flink developers,
> > > > >
> > > > > I would like to open a discussion on FLIP 234 [1] to support
> > retryable
> > > > > lookup join to solve delayed updates issue, as a pre-work for this
> > > > > solution, we proposed FLIP-232[2] which adds a generic retry
> support
> > > for
> > > > > Async I/O.
> > > > > We prefer to offer this retry capability via query hints, similar
> to
> > > new
> > > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> > > > >
> > > > > This feature is backwards compatible and transparently to
> connectors.
> > > For
> > > > > existing connectors which implements AsyncTableFunction, can easily
> > > > enable
> > > > > async retry via the new join hint.
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > > [2]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > > > [3]
> https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > > > [4]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Lincoln Lee <li...@gmail.com>.
Hi Jark & Jingsong,

Thanks for your feedback!

1.) support retry on sync lookup
I also agree with supporting it, this will be useful for connectors that
don't have asynchronous lookup implementations and can also solve the ASYNC
non-target problem to some extent(because the retrying is blocking for sync
lookup, and may accumulate delay, but it maybe acceptable for the case that
most or all data want to do a delayed lookup).

For the api perspective, we can do some unification. Let's think of the
whole user story for lookup join:

ASYNC_LOOKUP vs SYNC_LOOKUP  can share a common one: LOOKUP by different
hint option values: 'async'='true|false'

ASYNC_LOOKUP_MISS_RETRY vs SYNC_LOOKUP_MISS_RETRY can share the
LOOKUP_MISS_RETRY with hint option: 'miss-retry'='true|false'

we can use one single hint LOOKUP with different hint options
('async'='true|false', 'miss-retry'='true|false') to cover all related
functionalities.
Compared to multiple hints with different subsets of functionality, a
single hint may be easier for users to understand and use, and specific
parameters can be quickly found through documentation

the support matrix will be:
lookup support async retry
sync w/o retry N N
sync w/ retry N Y
async w/o retry Y N
async w/ retry Y Y

and the available hint options for each mode:
mode support hint options
async async'='true'
'output-mode'='ordere|allow-unordered'
'capacity'='100'
'timeout'='180s'
retry miss-retry'='true'
'retry-strategy'='fixed-delay'
'delay'='10s'
'max-attempts'='3'


2.) 'allow-unordered' vs 'unordered' for
'table.exec.async-lookup.output-mode'
Yes, make it align with DataStream Api maybe more intuitive, but there's
some difference in table layer that makes the 'allow-unordered' meaningful:
updates in the pipeline need to be processed in order, ALLOW_UNORDERED
means if users allow unordered result, it will attempt to use
AsyncDataStream.OutputMode.UNORDERED when it does not affect the
correctness of the result, otherwise ORDERED will be still used.

Another choice is that when the user specifies unordered mode, planner
throws an error when it finds that it may affect correctness. But this is
not user-friendly and is not consistent with the customary treatment of
invalid query hints(best effort).

I opened a pr https://github.com/apache/flink/pull/19759 for the new option
'table.exec.async-lookup.output-mode' and also a discussion on FLINK-27625:
add query hint 'ASYNC_LOOKUP' for async lookup join(Since the changes were
relatively minor, no new flip was created)

If we can reach a consensus on the single unified hint, e.g., LOOKUP,  then
FLINK-27625 can be covered.

WDYT?

Best,
Lincoln Lee


Jark Wu <im...@gmail.com> 于2022年5月27日周五 21:04写道:

> Hi Lincoln,
>
> Delayed Dim Join is a frequently requested feature, it's exciting to see
> this feature is on the road.
> The FLIP looks good to me in general. I only left some minor comments.
>
> 1) support retry for sync lookup
> I'm also fine with the idea proposed by Jingsong. But this doesn't conflict
> with the FLIP and can
> be future work. It would be great if we can determine the APIs first.
>
> 1) "allow-unordered" => "unordered"
> I would prefer the "unordered" output mode rather than "allow-unordered".
> Because this fully aligns with the DataStream behaviors and avoids
> confusion on the differences.
> I understand the purpose that adding a "allow" prefix here, but I think the
> semantic is fine to just
> use "unordered" here. We didn't see any users confused about
> OutputMode#UNORDERED.
>
> Best,
> Jark
>
>
>
> On Fri, 27 May 2022 at 12:58, Jingsong Li <ji...@gmail.com> wrote:
>
> > Thanks Lincoln for your proposal.
> >
> > Take a look at `strategy: fixed-delay delay: duration, e.g., 10s
> > max-attempts: integer, e.g., 3`.
> >
> > Are these options only for async? It looks like normal lookups work too?
> >
> > One thing is: most of the lookup functions seem to be synchronous now?
> > There are not so many asynchronous ones?
> >
> > Best,
> > Jingsong
> >
> > On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <li...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > Considering the new common table option 'lookup.max-retries' proposed
> in
> > > FLIP-221[1] which is commonly used for exception handling in connector
> > > implementation, we should clearly distinguish ASYNC_LOOKUP_RETRY from
> it
> > to
> > > avoid confusing users.
> > >
> > > To do so, the name ASYNC_LOOKUP_RETRY can change to
> > > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict it to
> support
> > > retries only for lookup misses and no longer include exceptions (for
> sql
> > > connectors, let the connector implementer decide how to handle
> exceptions
> > > since there are various kinds of retryable exceptions and can not retry
> > > ones).
> > >
> > > The FLIP[2] has been updated.
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > > [2]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > >
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四 18:24写道:
> > >
> > > > Dear Flink developers,
> > > >
> > > > I would like to open a discussion on FLIP 234 [1] to support
> retryable
> > > > lookup join to solve delayed updates issue, as a pre-work for this
> > > > solution, we proposed FLIP-232[2] which adds a generic retry support
> > for
> > > > Async I/O.
> > > > We prefer to offer this retry capability via query hints, similar to
> > new
> > > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> > > >
> > > > This feature is backwards compatible and transparently to connectors.
> > For
> > > > existing connectors which implements AsyncTableFunction, can easily
> > > enable
> > > > async retry via the new join hint.
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > > [2]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > > [3] https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > > [4]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Jark Wu <im...@gmail.com>.
Hi Lincoln,

Delayed Dim Join is a frequently requested feature, it's exciting to see
this feature is on the road.
The FLIP looks good to me in general. I only left some minor comments.

1) support retry for sync lookup
I'm also fine with the idea proposed by Jingsong. But this doesn't conflict
with the FLIP and can
be future work. It would be great if we can determine the APIs first.

1) "allow-unordered" => "unordered"
I would prefer the "unordered" output mode rather than "allow-unordered".
Because this fully aligns with the DataStream behaviors and avoids
confusion on the differences.
I understand the purpose that adding a "allow" prefix here, but I think the
semantic is fine to just
use "unordered" here. We didn't see any users confused about
OutputMode#UNORDERED.

Best,
Jark



On Fri, 27 May 2022 at 12:58, Jingsong Li <ji...@gmail.com> wrote:

> Thanks Lincoln for your proposal.
>
> Take a look at `strategy: fixed-delay delay: duration, e.g., 10s
> max-attempts: integer, e.g., 3`.
>
> Are these options only for async? It looks like normal lookups work too?
>
> One thing is: most of the lookup functions seem to be synchronous now?
> There are not so many asynchronous ones?
>
> Best,
> Jingsong
>
> On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <li...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Considering the new common table option 'lookup.max-retries' proposed in
> > FLIP-221[1] which is commonly used for exception handling in connector
> > implementation, we should clearly distinguish ASYNC_LOOKUP_RETRY from it
> to
> > avoid confusing users.
> >
> > To do so, the name ASYNC_LOOKUP_RETRY can change to
> > ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict it to support
> > retries only for lookup misses and no longer include exceptions (for sql
> > connectors, let the connector implementer decide how to handle exceptions
> > since there are various kinds of retryable exceptions and can not retry
> > ones).
> >
> > The FLIP[2] has been updated.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> >
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Lincoln Lee <li...@gmail.com> 于2022年5月19日周四 18:24写道:
> >
> > > Dear Flink developers,
> > >
> > > I would like to open a discussion on FLIP 234 [1] to support retryable
> > > lookup join to solve delayed updates issue, as a pre-work for this
> > > solution, we proposed FLIP-232[2] which adds a generic retry support
> for
> > > Async I/O.
> > > We prefer to offer this retry capability via query hints, similar to
> new
> > > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> > >
> > > This feature is backwards compatible and transparently to connectors.
> For
> > > existing connectors which implements AsyncTableFunction, can easily
> > enable
> > > async retry via the new join hint.
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > > [2]
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > > [3] https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > > [4]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> > >
> > > Best,
> > > Lincoln Lee
> > >
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Jingsong Li <ji...@gmail.com>.
Thanks Lincoln for your proposal.

Take a look at `strategy: fixed-delay delay: duration, e.g., 10s
max-attempts: integer, e.g., 3`.

Are these options only for async? It looks like normal lookups work too?

One thing is: most of the lookup functions seem to be synchronous now?
There are not so many asynchronous ones?

Best,
Jingsong

On Tue, May 24, 2022 at 11:48 AM Lincoln Lee <li...@gmail.com> wrote:

> Hi all,
>
> Considering the new common table option 'lookup.max-retries' proposed in
> FLIP-221[1] which is commonly used for exception handling in connector
> implementation, we should clearly distinguish ASYNC_LOOKUP_RETRY from it to
> avoid confusing users.
>
> To do so, the name ASYNC_LOOKUP_RETRY can change to
> ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict it to support
> retries only for lookup misses and no longer include exceptions (for sql
> connectors, let the connector implementer decide how to handle exceptions
> since there are various kinds of retryable exceptions and can not retry
> ones).
>
> The FLIP[2] has been updated.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
>
>
> Best,
> Lincoln Lee
>
>
> Lincoln Lee <li...@gmail.com> 于2022年5月19日周四 18:24写道:
>
> > Dear Flink developers,
> >
> > I would like to open a discussion on FLIP 234 [1] to support retryable
> > lookup join to solve delayed updates issue, as a pre-work for this
> > solution, we proposed FLIP-232[2] which adds a generic retry support for
> > Async I/O.
> > We prefer to offer this retry capability via query hints, similar to new
> > join hints proposed in FLINK-27625[3] & FLIP-204[4].
> >
> > This feature is backwards compatible and transparently to connectors. For
> > existing connectors which implements AsyncTableFunction, can easily
> enable
> > async retry via the new join hint.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> > [2]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > [3] https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> > [4]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
> >
> > Best,
> > Lincoln Lee
> >
>

Re: [DISCUSS] FLIP-234: Support Retryable Lookup Join To Solve Delayed Updates Issue In External Systems

Posted by Lincoln Lee <li...@gmail.com>.
Hi all,

Considering the new common table option 'lookup.max-retries' proposed in
FLIP-221[1] which is commonly used for exception handling in connector
implementation, we should clearly distinguish ASYNC_LOOKUP_RETRY from it to
avoid confusing users.

To do so, the name ASYNC_LOOKUP_RETRY can change to
ASYNC_LOOKUP_MISS_RETRY,  and as the name implies, restrict it to support
retries only for lookup misses and no longer include exceptions (for sql
connectors, let the connector implementer decide how to handle exceptions
since there are various kinds of retryable exceptions and can not retry
ones).

The FLIP[2] has been updated.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-221%3A+Abstraction+for+lookup+source+cache+and+metric
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems


Best,
Lincoln Lee


Lincoln Lee <li...@gmail.com> 于2022年5月19日周四 18:24写道:

> Dear Flink developers,
>
> I would like to open a discussion on FLIP 234 [1] to support retryable
> lookup join to solve delayed updates issue, as a pre-work for this
> solution, we proposed FLIP-232[2] which adds a generic retry support for
> Async I/O.
> We prefer to offer this retry capability via query hints, similar to new
> join hints proposed in FLINK-27625[3] & FLIP-204[4].
>
> This feature is backwards compatible and transparently to connectors. For
> existing connectors which implements AsyncTableFunction, can easily enable
> async retry via the new join hint.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> [3] https://lists.apache.org/thread/jm9kg33wk9z2bvo2b0g5bp3n5kfj6qv8
> [4]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-204:+Introduce+Hash+Lookup+Join
>
> Best,
> Lincoln Lee
>