You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Dawid Wysakowicz <dw...@apache.org> on 2020/04/01 13:12:10 UTC

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Hi,

Few comments from my side:

1. Regarding the motivation:

I think the example with changing the update-mode is not a good one. In
the long term this should be done with EMIT CHANGELOG (discussed in
FLIP-105).

Nitpicking: I would mention that it is rather for debugging/ad-hoc
solution. I think this should not be a recommended way for production
use cases as it bypasses the Catalog, which should be the source of truth.

2. I could not understand how the additional options will be passed to
the TableSourceFactory. Could you elaborate a bit more on that? I see
there is a Context interface that gives the options. But cannot find a
way to get the context itself in the factory. Moreover I think it would
make more sense to have rather a push based approach here. Something
like applyOptions(ReadableConfig) method.

3. As for the concerns Jingsong raised in the voting thread. I think it
is not a big problem, but I agree this should be also described. I
disagree with "Connector don't know format information in TableFactory
before obtains real properties, so it can not list any format
`supportedHintOptions`".

When a factory is instantiated it has access to the CatalogTable,
therefore it has access to all the original properties. In turn it knows
the original format and can call FormatFactory#supportedHintOptions().

The only case when this would not work would be if we allow changing the
format of the Table (e.g. from avro to parquet), which does not sound
like a good idea to me. I think this feature should not end up as a way
to declare a whole table inline in a SQL query, but should rather be a
simple way for debugging queries. We should not end up with an extreme
example where we do:

|SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
'format.type' = 'json', ....) */|

4. SQL Hints syntax.

I think the k and v in the hint_item should be QUOTED_STRING (not sure
if it is equivalent to string_literal). I think we should not use
simple_identifier because this implies that we cannot use e.g. any SQL
keywords. Anyway it has nothing to do with identifiers. If I am not
mistaken it is also how the options in the CREATE statement are implemented.

What is the purpose of the remaining hint_item: hint_name(hint_opt [
,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a feeling
it does also suggests to support the whole Apache Calcite hint system
without specifying that explicitly. Is the intention of the FLIP to
support choosing e.g. JOIN strategies through hints already? If it is so
it should be mentioned in the FLIP, imo.

5. I think something does not work around the supportedHintOptions and
wildcards. How do you want to represent a wildcard key as a
ConfigOption? I am not sure about that, just a though, maybe it make
sense to have rather Set<String> supportedHintOptionKeys()?

Best,

Dawid

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Kurt Young <yk...@gmail.com>.
+1, LGTM

Best,
Kurt


On Wed, Apr 8, 2020 at 10:27 AM Jark Wu <im...@gmail.com> wrote:

> Thanks for the summary Danny. +1 to the new proposal.
>
> I have a minor concern about the global configuration
> `table.optimizer.dynamic-table-options.enabled`, does it belong to
> optimizer?
> From my point of view, it is just an API to set table options and uses
> Calcite in the implementation.
> I'm also thinking about what's the name of other configurations, e.g
> time-zone, code-gen length, state ttl.
> Should they prefix with "optimizer" or "exec" or something else or nothing?
>
> Best,
> Jark
>
> On Tue, 7 Apr 2020 at 23:17, Timo Walther <tw...@apache.org> wrote:
>
> > Thanks for the update Danny. +1 from my side.
> >
> > Regards,
> > Timo
> >
> >
> > On 07.04.20 13:25, Danny Chan wrote:
> > > Hi, every ~
> > >
> > > It seems that we all agree to drop the idea for white/black list for
> each
> > > connector, and have a global config option to default disable this
> > feature.
> > >
> > > I have also discussed with Timo and Jark about the interface
> > > TableSourceTable.Context.getExecutionOptions and finally we decide to
> > > introduce a new interface CatalogTable#copy(Map<String, String>) to
> > support
> > > re-generate the table with new table options.
> > >
> > > So let me summarize the current design broadly again:
> > >
> > >     - Use the syntax /*+ OPTIONS('k1' = 'v1', 'k2' = 'v2') to describe
> > the
> > >     dynamic table options
> > >     - There is no constraint on which option key can be used in the
> > OPTIONS,
> > >     that means, any option key is allowed, the factory would to the
> > validation
> > >     work finally
> > >     - Introduce method CatalogTable#copy, we use this method to
> > regenerate a
> > >     new CatalogTable to find a table factory and creates table
> > source/sink
> > >     - There is a global config option to default disable this feature
> (if
> > >     user uses OPTIONS, an exception throws to tell open the option)
> > >
> > > I have updated the WIKI
> > > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL
> > >,
> > > look forward to your suggestions ~
> > >
> > > Jark Wu <im...@gmail.com> 于2020年4月7日周二 上午11:24写道:
> > >
> > >> I'm fine to disable this feature by default and avoid
> > >> whitelisting/blacklisting. This simplifies a lot of things.
> > >>
> > >> Regarding to TableSourceFactory#Context#getExecutionOptions, do we
> > really
> > >> need this interface?
> > >> Should the connector factory be aware of the properties is merged with
> > >> hints or not?
> > >> What's the problem if we always get properties from
> > >> `CatalogTable#getProperties`?
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> On Tue, 7 Apr 2020 at 10:39, Kurt Young <yk...@gmail.com> wrote:
> > >>
> > >>> Sounds like a reasonable compromise, disabling this feature by
> default
> > >> is a
> > >>> way to protect
> > >>> the vulnerability, and we can simplify the design quite a lot. We can
> > >>> gather some users'
> > >>> feedback to see whether further protections are necessary in the
> > future.
> > >>>
> > >>> Best,
> > >>> Kurt
> > >>>
> > >>>
> > >>> On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <tw...@apache.org>
> > wrote:
> > >>>
> > >>>> I agree with Aljoscha. The length of this thread shows that this is
> > >>>> highly controversal. I think nobody really likes this feature 100%
> but
> > >>>> we could not find a better solution. I would consider it as a
> > >>>> nice-to-have improvement during a notebook/debugging session.
> > >>>>
> > >>>> I would accept avoiding whitelisting/blacklisting if the feature is
> > >>>> disabled by default. And we make the merged properties available in
> a
> > >>>> separate TableSourceFactory#Context#getExecutionOptions as Danny
> > >>> proposed.
> > >>>>
> > >>>> What do you think?
> > >>>>
> > >>>> Thanks,
> > >>>> Timo
> > >>>>
> > >>>>
> > >>>> On 06.04.20 09:59, Aljoscha Krettek wrote:
> > >>>>> The reason I'm saying it should be disabled by default is that this
> > >>> uses
> > >>>>> hint syntax, and hints should really not change query semantics.
> > >>>>>
> > >>>>> I'm quite strongly against hints that change query semantics, but
> if
> > >> we
> > >>>>> disable this by default I would be (reluctantly) OK with the
> feature.
> > >>>>> Companies that create deployments or set up the SQL environment for
> > >>>>> users can enable the feature if they want.
> > >>>>>
> > >>>>> But yes, I also agree that we don't need whitelisting/blacklisting,
> > >>>>> which makes this a lot easier to do.
> > >>>>>
> > >>>>> Best,
> > >>>>> Aljoscha
> > >>>>>
> > >>>>> On 06.04.20 04:27, Danny Chan wrote:
> > >>>>>> Hi, everyone ~
> > >>>>>>
> > >>>>>> @Aljoscha @Timo
> > >>>>>>
> > >>>>>>> I think we're designing ourselves into ever more complicated
> > >> corners
> > >>>>>> here
> > >>>>>>
> > >>>>>> I kindly agree that, personally didn't see strong reasons why we
> > >>>>>> should limit on each connector properties:
> > >>>>>>
> > >>>>>> • we can define any table options for CREATE TABLE, why we treat
> the
> > >>>>>> dynamic options differently, we never consider any security
> problems
> > >>>>>> when create table, we should not either for dynamic table options
> > >>>>>> • If we do not have whitelist properties or blacklist properties,
> > >> the
> > >>>>>> table source creation work would be much easier, just used the
> > >> merged
> > >>>>>> options. There is no need to modify each connector to decide which
> > >>>>>> options could be overridden and how we merge them(the merge work
> is
> > >>>>>> redundant).
> > >>>>>> • @Timo, how about we support another interface
> > >>>>>> `TableSourceFactory#Context.getExecutionOptions`, we always use
> this
> > >>>>>> interface to get the options to create our table source. There is
> no
> > >>>>>> need to copy the catalog table itselt, we just need to generate
> our
> > >>>>>> Context correctly.
> > >>>>>> • @Aljoscha I agree to have a global config option, but I disagree
> > >> to
> > >>>>>> default disable it, a global default config would break the user
> > >>>>>> experience too much, especially when user want to modify the
> options
> > >>>>>> in a ad-hoc way.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> I suggest to remove `TableSourceFactory#supportedHintOptions` or
> > >>>>>> `TableSourceFactory#forbiddenHintOptions` based on the fact that
> we
> > >>>>>> does not have black/white list for CREATE TABLE at all at lease
> for
> > >>>>>> current codebase.
> > >>>>>>
> > >>>>>>
> > >>>>>> @Timo (i have replied offline but allows to represent it here
> again)
> > >>>>>>
> > >>>>>> The `TableSourceFactory#supportedHintOptions` doesn't work well
> for
> > >> 3
> > >>>>>> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> > >>>>>> 1. For key with wildcard, like connector.property.* , use a
> > >> blacklist
> > >>>>>> make us have the ability to disable some of the keys under that,
> > >> i.e.
> > >>>>>> connector.property.key1 , a whitelist can only match with prefix
> > >>>>>>
> > >>>>>> 2. We want the connectors to have the ability to disable format
> type
> > >>>>>> switch format.type but allows all the other properties, e.g.
> > >> format.*
> > >>>>>> without format.type(let's call it SET_B), if we use the whitelist,
> > >> we
> > >>>>>> have to enumerate all the specific format keys start with format
> > >>>>>> (SET_B), but with the old connector factories, we have no idea
> what
> > >>>>>> specific format keys it supports(there is either a format.* or
> > >>> nothing).
> > >>>>>>
> > >>>>>> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the
> > >>>>>> blacklist and whitelist has the same expressiveness, use blacklist
> > >>>>>> makes the code not too verbose to enumerate all the duplicate keys
> > >>>>>> with #supportedKeys .(Not very strong reason, but i think as a
> > >>>>>> connector developer, it makes sense)
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Danny Chan
> > >>>>>> 在 2020年4月3日 +0800 PM11:28,Timo Walther <tw...@apache.org>,写道:
> > >>>>>>> Hi everyone,
> > >>>>>>>
> > >>>>>>> @Aljoscha: I disagree with your approach because a `Catalog` can
> > >>> return
> > >>>>>>> a custom factory that is not using any properties. The hinting
> must
> > >>> be
> > >>>>>>> transparent to a factory. We should NOT modify the metadata
> > >>>>>>> `CatalogTable` at any point in time after the catalog.
> > >>>>>>>
> > >>>>>>> @Danny, @Jingsong: How about we stick to the original design that
> > >> we
> > >>>>>>> wanted to vote on but use:
> > >>>>>>>
> > >>>>>>> Set<String> supportedHintProperties()
> > >>>>>>>
> > >>>>>>> This fits better to the old factory design. And for the new
> FLIP-95
> > >>>>>>> factories we will use `ConfigOption` and provide good utilities
> for
> > >>>>>>> merging with hints etc.
> > >>>>>>>
> > >>>>>>> We can allow `"format.*"` in `supportedHintProperties()` to allow
> > >>>>>>> hinting in formats.
> > >>>>>>>
> > >>>>>>> What do you think?
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Timo
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 02.04.20 16:24, Aljoscha Krettek wrote:
> > >>>>>>>> I think we're designing ourselves into ever more complicated
> > >> corners
> > >>>>>>>> here. Maybe we need to take a step back and reconsider. What
> would
> > >>> you
> > >>>>>>>> think about this (somewhat) simpler proposal:
> > >>>>>>>>
> > >>>>>>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
> > >>>>>>>> CONNECTOR_PROPERTIES, depending on what naming we want to have
> for
> > >>>> this
> > >>>>>>>> in the future. This will simply overwrite all connector
> > >> properties,
> > >>>> the
> > >>>>>>>> table factories don't know about hints but simply work with the
> > >>>>>>>> properties that they are given
> > >>>>>>>>
> > >>>>>>>> - this special hint is disabled by default and can be activated
> > >>> with a
> > >>>>>>>> global option "foo.bazzle.connector-hints" (or something like
> > >> this)
> > >>>>>>>> which has a warning that describes that this can change query
> > >>>> semantics
> > >>>>>>>> etc.
> > >>>>>>>>
> > >>>>>>>> That's it. This makes connector implementations a lot easier
> while
> > >>>>>>>> still
> > >>>>>>>> allowing inline configuration.
> > >>>>>>>>
> > >>>>>>>> I still don't like using hint syntax at all for this, because I
> > >>>>>>>> strongly
> > >>>>>>>> maintain that hints should not change query syntax. In general
> > >> using
> > >>>>>>>> hints should be kept to a minimum because they usually point to
> > >>>>>>>> shortcomings in the system.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Aljoscha
> > >>>>>>>>
> > >>>>>>>> On 02.04.20 06:06, Jingsong Li wrote:
> > >>>>>>>>> Hi Dawid,
> > >>>>>>>>>
> > >>>>>>>>>> When a factory is instantiated it has access to the
> > >> CatalogTable,
> > >>>>>>>>> therefore it has access to all the original properties. In turn
> > >> it
> > >>>>>>>>> knows
> > >>>>>>>>> the original format and can call
> > >>>> FormatFactory#supportedHintOptions().
> > >>>>>>>>>
> > >>>>>>>>> Factory can only get CatalogTable when creating source or sink,
> > >>>>>>>>> right? IIUC, TableFactory may be stateless too.
> > >>>>>>>>> When invoking SourceFactory#supportedHintOptions(), it can not
> > >>>>>>>>> get CatalogTable, so it is impossible to create FormatFactory?
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Jingsong Lee
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <
> yuzhao.cyz@gmail.com
> > >>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
> > >>>>>>>>>>
> > >>>>>>>>>> 1. Regarding the motivation:
> > >>>>>>>>>>
> > >>>>>>>>>> I agree it's not a good suggested way based on the fact that
> we
> > >>> have
> > >>>>>>>>>> better solution, but i think we can support override that as
> > >> long
> > >>>>>>>>>> as it
> > >>>>>>>>>> exists as one of the the table options. I would remove if from
> > >> the
> > >>>>>>>>>> motication part.
> > >>>>>>>>>>
> > >>>>>>>>>> 2. The options passes around during sql-to-rel conversion,
> right
> > >>>>>>>>>> after we
> > >>>>>>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink),
> > >> this
> > >>>> is
> > >>>>>>>>>> indeed a push way method at least in the RelOptTable layer,
> then
> > >>>>>>>>>> we hand
> > >>>>>>>>>> over the options to TableSourceFactory with our own context,
> > >> which
> > >>>> is
> > >>>>>>>>>> fine
> > >>>>>>>>>> becuase TableSourceFactory#Context is the contact to pass
> around
> > >>>>>>>>>> these
> > >>>>>>>>>> table-about variables.
> > >>>>>>>>>>
> > >>>>>>>>>> 3. "We should not end up with an extreme example where we can
> > >>>>>>>>>> change the
> > >>>>>>>>>> connector type", i totally agree that, and i have listed the
> > >>>>>>>>>> "connector.type" as forbidden attribute in the WIKI. As for
> the
> > >>>>>>>>>> format, i
> > >>>>>>>>>> think the connector itself can/should control whether to
> > >> override
> > >>>> the
> > >>>>>>>>>> "format.type", that is one of the reason i change the
> > >>>>>>>>>> TableSourceFactory#supportedHintOpitons to
> > >>>>>>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we
> can
> > >>>>>>>>>> limit the
> > >>>>>>>>>> format keys we want conveniently.
> > >>>>>>>>>>
> > >>>>>>>>>> 4. SQL Hints syntax.
> > >>>>>>>>>>
> > >>>>>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
> > >> (not
> > >>>>>>>>>>> sure
> > >>>>>>>>>> if it is equivalent to string_literal).
> > >>>>>>>>>>
> > >>>>>>>>>> I disagree, we at least should keep sync with our DDL: use the
> > >>>> string
> > >>>>>>>>>> literal as the key. We did also support the simple identifier
> > >>>> because
> > >>>>>>>>>> this
> > >>>>>>>>>> is the common hint syntax from Calcite, it does not hurt
> > >> anything
> > >>>> for
> > >>>>>>>>>> the
> > >>>>>>>>>> OPTIONS hint, the unsupported keys would validate fails.(If
> you
> > >>>> think
> > >>>>>>>>>> that
> > >>>>>>>>>> may cause some confuse, i can make the syntax pluggable for
> each
> > >>>>>>>>>> hint in
> > >>>>>>>>>> CALCITE 1.23)
> > >>>>>>>>>>
> > >>>>>>>>>> We only supports OPTIONS hint in the FLIP, and i have changed
> > >> the
> > >>>>>>>>>> title to
> > >>>>>>>>>> "Supports dynamic table options", would make it more clear in
> > >> the
> > >>>>>>>>>> WIKI.
> > >>>>>>>>>>
> > >>>>>>>>>> 5. Yes, we also have this concerns from our offline
> discussion,
> > >>>>>>>>>> that is
> > >>>>>>>>>> one of the reason, why i change the
> > >>>>>>>>>> TableSourceFactory#supportedHintOpitons
> > >>>>>>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose
> > >>>>>>>>>> Set<String>
> > >>>>>>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
> > >>>>>>>>>>
> > >>>>>>>>>>      - The wildcard is still not descriptive, we can still not
> > >>>>>>>>>> forbidden one
> > >>>>>>>>>> of the properties among the wildcard properties, we can not
> > >> enable
> > >>>> or
> > >>>>>>>>>> disable them totally
> > >>>>>>>>>>      - ConfigOption is our new structure for keys, and it does
> > >> not
> > >>>>>>>>>> support
> > >>>>>>>>>> wildcard yet.
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Danny Chan
> > >>>>>>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
> > >>>>>>>>>> <dw...@apache.org>,写道:
> > >>>>>>>>>>> Hi,
> > >>>>>>>>>>> Few comments from my side:
> > >>>>>>>>>>> 1. Regarding the motivation:
> > >>>>>>>>>>> I think the example with changing the update-mode is not a
> good
> > >>>>>>>>>>> one. In
> > >>>>>>>>>> the long term this should be done with EMIT CHANGELOG
> (discussed
> > >>> in
> > >>>>>>>>>> FLIP-105).
> > >>>>>>>>>>> Nitpicking: I would mention that it is rather for
> > >>> debugging/ad-hoc
> > >>>>>>>>>> solution. I think this should not be a recommended way for
> > >>>> production
> > >>>>>>>>>> use
> > >>>>>>>>>> cases as it bypasses the Catalog, which should be the source
> of
> > >>>>>>>>>> truth.
> > >>>>>>>>>>> 2. I could not understand how the additional options will be
> > >>>>>>>>>>> passed to
> > >>>>>>>>>> the TableSourceFactory. Could you elaborate a bit more on
> that?
> > >> I
> > >>>> see
> > >>>>>>>>>> there
> > >>>>>>>>>> is a Context interface that gives the options. But cannot
> find a
> > >>> way
> > >>>>>>>>>> to get
> > >>>>>>>>>> the context itself in the factory. Moreover I think it would
> > >> make
> > >>>>>>>>>> more
> > >>>>>>>>>> sense to have rather a push based approach here. Something
> like
> > >>>>>>>>>> applyOptions(ReadableConfig) method.
> > >>>>>>>>>>> 3. As for the concerns Jingsong raised in the voting thread.
> I
> > >>>>>>>>>>> think it
> > >>>>>>>>>> is not a big problem, but I agree this should be also
> > >> described. I
> > >>>>>>>>>> disagree
> > >>>>>>>>>> with "Connector don't know format information in TableFactory
> > >>> before
> > >>>>>>>>>> obtains real properties, so it can not list any format
> > >>>>>>>>>> `supportedHintOptions`".
> > >>>>>>>>>>> When a factory is instantiated it has access to the
> > >> CatalogTable,
> > >>>>>>>>>> therefore it has access to all the original properties. In
> turn
> > >> it
> > >>>>>>>>>> knows
> > >>>>>>>>>> the original format and can call
> > >>>>>>>>>> FormatFactory#supportedHintOptions().
> > >>>>>>>>>>> The only case when this would not work would be if we allow
> > >>>> changing
> > >>>>>>>>>>> the
> > >>>>>>>>>> format of the Table (e.g. from avro to parquet), which does
> not
> > >>>> sound
> > >>>>>>>>>> like
> > >>>>>>>>>> a good idea to me. I think this feature should not end up as a
> > >> way
> > >>>> to
> > >>>>>>>>>> declare a whole table inline in a SQL query, but should rather
> > >> be
> > >>> a
> > >>>>>>>>>> simple
> > >>>>>>>>>> way for debugging queries. We should not end up with an
> extreme
> > >>>>>>>>>> example
> > >>>>>>>>>> where we do:
> > >>>>>>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka',
> ...,
> > >>>>>>>>>> 'format.type' = 'json', ....) */
> > >>>>>>>>>>> 4. SQL Hints syntax.
> > >>>>>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
> > >> (not
> > >>>>>>>>>>> sure
> > >>>>>>>>>> if it is equivalent to string_literal). I think we should not
> > >> use
> > >>>>>>>>>> simple_identifier because this implies that we cannot use e.g.
> > >> any
> > >>>>>>>>>> SQL
> > >>>>>>>>>> keywords. Anyway it has nothing to do with identifiers. If I
> am
> > >>> not
> > >>>>>>>>>> mistaken it is also how the options in the CREATE statement
> are
> > >>>>>>>>>> implemented.
> > >>>>>>>>>>> What is the purpose of the remaining hint_item:
> > >>> hint_name(hint_opt
> > >>>> [
> > >>>>>>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I
> got a
> > >>>>>>>>>> feeling it
> > >>>>>>>>>> does also suggests to support the whole Apache Calcite hint
> > >> system
> > >>>>>>>>>> without
> > >>>>>>>>>> specifying that explicitly. Is the intention of the FLIP to
> > >>> support
> > >>>>>>>>>> choosing e.g. JOIN strategies through hints already? If it is
> so
> > >>> it
> > >>>>>>>>>> should
> > >>>>>>>>>> be mentioned in the FLIP, imo.
> > >>>>>>>>>>> 5. I think something does not work around the
> > >>>>>>>>>>> supportedHintOptions and
> > >>>>>>>>>> wildcards. How do you want to represent a wildcard key as a
> > >>>>>>>>>> ConfigOption? I
> > >>>>>>>>>> am not sure about that, just a though, maybe it make sense to
> > >> have
> > >>>>>>>>>> rather
> > >>>>>>>>>> Set<String> supportedHintOptionKeys()?
> > >>>>>>>>>>> Best,
> > >>>>>>>>>>> Dawid
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >
>

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Danny Chan <da...@apache.org>.
Hi everyone ~

Thanks for the feedback, i would start a new vote again if there are no new
input objections after 24 hours ~

Best,
Danny

Jark Wu <im...@gmail.com> 于2020年4月8日周三 下午7:19写道:

> `table.dynamic-table-options.enabled` and `TableConfigOptions` sounds good
> to me.
>
> Best,
> Jark
>
> On Wed, 8 Apr 2020 at 18:59, Danny Chan <yu...@gmail.com> wrote:
>
> > `table.dynamic-table-options.enabled` seems fine to me, I would make a
> new
> > `TableConfigOptions` class and put the config option there ~
> >
> > What do you think about the new class to put ?
> >
> > Best,
> > Danny Chan
> > 在 2020年4月8日 +0800 PM5:33,dev@flink.apache.org,写道:
> > >
> > > `table.dynamic-table-options.enabled`
> >
>

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Jark Wu <im...@gmail.com>.
`table.dynamic-table-options.enabled` and `TableConfigOptions` sounds good
to me.

Best,
Jark

On Wed, 8 Apr 2020 at 18:59, Danny Chan <yu...@gmail.com> wrote:

> `table.dynamic-table-options.enabled` seems fine to me, I would make a new
> `TableConfigOptions` class and put the config option there ~
>
> What do you think about the new class to put ?
>
> Best,
> Danny Chan
> 在 2020年4月8日 +0800 PM5:33,dev@flink.apache.org,写道:
> >
> > `table.dynamic-table-options.enabled`
>

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Danny Chan <yu...@gmail.com>.
`table.dynamic-table-options.enabled` seems fine to me, I would make a new `TableConfigOptions` class and put the config option there ~

What do you think about the new class to put ?

Best,
Danny Chan
在 2020年4月8日 +0800 PM5:33,dev@flink.apache.org,写道:
>
> `table.dynamic-table-options.enabled`

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Aljoscha Krettek <al...@apache.org>.
On 08.04.20 04:27, Jark Wu wrote:

> I have a minor concern about the global configuration
> `table.optimizer.dynamic-table-options.enabled`, does it belong to
> optimizer?
>  From my point of view, it is just an API to set table options and uses
> Calcite in the implementation.
> I'm also thinking about what's the name of other configurations, e.g
> time-zone, code-gen length, state ttl.
> Should they prefix with "optimizer" or "exec" or something else or nothing?

I agree that this is probably not the right option name. Could we just 
have `table.dynamic-table-options.enabled`?

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Jark Wu <im...@gmail.com>.
Thanks for the summary Danny. +1 to the new proposal.

I have a minor concern about the global configuration
`table.optimizer.dynamic-table-options.enabled`, does it belong to
optimizer?
From my point of view, it is just an API to set table options and uses
Calcite in the implementation.
I'm also thinking about what's the name of other configurations, e.g
time-zone, code-gen length, state ttl.
Should they prefix with "optimizer" or "exec" or something else or nothing?

Best,
Jark

On Tue, 7 Apr 2020 at 23:17, Timo Walther <tw...@apache.org> wrote:

> Thanks for the update Danny. +1 from my side.
>
> Regards,
> Timo
>
>
> On 07.04.20 13:25, Danny Chan wrote:
> > Hi, every ~
> >
> > It seems that we all agree to drop the idea for white/black list for each
> > connector, and have a global config option to default disable this
> feature.
> >
> > I have also discussed with Timo and Jark about the interface
> > TableSourceTable.Context.getExecutionOptions and finally we decide to
> > introduce a new interface CatalogTable#copy(Map<String, String>) to
> support
> > re-generate the table with new table options.
> >
> > So let me summarize the current design broadly again:
> >
> >     - Use the syntax /*+ OPTIONS('k1' = 'v1', 'k2' = 'v2') to describe
> the
> >     dynamic table options
> >     - There is no constraint on which option key can be used in the
> OPTIONS,
> >     that means, any option key is allowed, the factory would to the
> validation
> >     work finally
> >     - Introduce method CatalogTable#copy, we use this method to
> regenerate a
> >     new CatalogTable to find a table factory and creates table
> source/sink
> >     - There is a global config option to default disable this feature (if
> >     user uses OPTIONS, an exception throws to tell open the option)
> >
> > I have updated the WIKI
> > <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL
> >,
> > look forward to your suggestions ~
> >
> > Jark Wu <im...@gmail.com> 于2020年4月7日周二 上午11:24写道:
> >
> >> I'm fine to disable this feature by default and avoid
> >> whitelisting/blacklisting. This simplifies a lot of things.
> >>
> >> Regarding to TableSourceFactory#Context#getExecutionOptions, do we
> really
> >> need this interface?
> >> Should the connector factory be aware of the properties is merged with
> >> hints or not?
> >> What's the problem if we always get properties from
> >> `CatalogTable#getProperties`?
> >>
> >> Best,
> >> Jark
> >>
> >> On Tue, 7 Apr 2020 at 10:39, Kurt Young <yk...@gmail.com> wrote:
> >>
> >>> Sounds like a reasonable compromise, disabling this feature by default
> >> is a
> >>> way to protect
> >>> the vulnerability, and we can simplify the design quite a lot. We can
> >>> gather some users'
> >>> feedback to see whether further protections are necessary in the
> future.
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <tw...@apache.org>
> wrote:
> >>>
> >>>> I agree with Aljoscha. The length of this thread shows that this is
> >>>> highly controversal. I think nobody really likes this feature 100% but
> >>>> we could not find a better solution. I would consider it as a
> >>>> nice-to-have improvement during a notebook/debugging session.
> >>>>
> >>>> I would accept avoiding whitelisting/blacklisting if the feature is
> >>>> disabled by default. And we make the merged properties available in a
> >>>> separate TableSourceFactory#Context#getExecutionOptions as Danny
> >>> proposed.
> >>>>
> >>>> What do you think?
> >>>>
> >>>> Thanks,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 06.04.20 09:59, Aljoscha Krettek wrote:
> >>>>> The reason I'm saying it should be disabled by default is that this
> >>> uses
> >>>>> hint syntax, and hints should really not change query semantics.
> >>>>>
> >>>>> I'm quite strongly against hints that change query semantics, but if
> >> we
> >>>>> disable this by default I would be (reluctantly) OK with the feature.
> >>>>> Companies that create deployments or set up the SQL environment for
> >>>>> users can enable the feature if they want.
> >>>>>
> >>>>> But yes, I also agree that we don't need whitelisting/blacklisting,
> >>>>> which makes this a lot easier to do.
> >>>>>
> >>>>> Best,
> >>>>> Aljoscha
> >>>>>
> >>>>> On 06.04.20 04:27, Danny Chan wrote:
> >>>>>> Hi, everyone ~
> >>>>>>
> >>>>>> @Aljoscha @Timo
> >>>>>>
> >>>>>>> I think we're designing ourselves into ever more complicated
> >> corners
> >>>>>> here
> >>>>>>
> >>>>>> I kindly agree that, personally didn't see strong reasons why we
> >>>>>> should limit on each connector properties:
> >>>>>>
> >>>>>> • we can define any table options for CREATE TABLE, why we treat the
> >>>>>> dynamic options differently, we never consider any security problems
> >>>>>> when create table, we should not either for dynamic table options
> >>>>>> • If we do not have whitelist properties or blacklist properties,
> >> the
> >>>>>> table source creation work would be much easier, just used the
> >> merged
> >>>>>> options. There is no need to modify each connector to decide which
> >>>>>> options could be overridden and how we merge them(the merge work is
> >>>>>> redundant).
> >>>>>> • @Timo, how about we support another interface
> >>>>>> `TableSourceFactory#Context.getExecutionOptions`, we always use this
> >>>>>> interface to get the options to create our table source. There is no
> >>>>>> need to copy the catalog table itselt, we just need to generate our
> >>>>>> Context correctly.
> >>>>>> • @Aljoscha I agree to have a global config option, but I disagree
> >> to
> >>>>>> default disable it, a global default config would break the user
> >>>>>> experience too much, especially when user want to modify the options
> >>>>>> in a ad-hoc way.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> I suggest to remove `TableSourceFactory#supportedHintOptions` or
> >>>>>> `TableSourceFactory#forbiddenHintOptions` based on the fact that we
> >>>>>> does not have black/white list for CREATE TABLE at all at lease for
> >>>>>> current codebase.
> >>>>>>
> >>>>>>
> >>>>>> @Timo (i have replied offline but allows to represent it here again)
> >>>>>>
> >>>>>> The `TableSourceFactory#supportedHintOptions` doesn't work well for
> >> 3
> >>>>>> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> >>>>>> 1. For key with wildcard, like connector.property.* , use a
> >> blacklist
> >>>>>> make us have the ability to disable some of the keys under that,
> >> i.e.
> >>>>>> connector.property.key1 , a whitelist can only match with prefix
> >>>>>>
> >>>>>> 2. We want the connectors to have the ability to disable format type
> >>>>>> switch format.type but allows all the other properties, e.g.
> >> format.*
> >>>>>> without format.type(let's call it SET_B), if we use the whitelist,
> >> we
> >>>>>> have to enumerate all the specific format keys start with format
> >>>>>> (SET_B), but with the old connector factories, we have no idea what
> >>>>>> specific format keys it supports(there is either a format.* or
> >>> nothing).
> >>>>>>
> >>>>>> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the
> >>>>>> blacklist and whitelist has the same expressiveness, use blacklist
> >>>>>> makes the code not too verbose to enumerate all the duplicate keys
> >>>>>> with #supportedKeys .(Not very strong reason, but i think as a
> >>>>>> connector developer, it makes sense)
> >>>>>>
> >>>>>> Best,
> >>>>>> Danny Chan
> >>>>>> 在 2020年4月3日 +0800 PM11:28,Timo Walther <tw...@apache.org>,写道:
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> @Aljoscha: I disagree with your approach because a `Catalog` can
> >>> return
> >>>>>>> a custom factory that is not using any properties. The hinting must
> >>> be
> >>>>>>> transparent to a factory. We should NOT modify the metadata
> >>>>>>> `CatalogTable` at any point in time after the catalog.
> >>>>>>>
> >>>>>>> @Danny, @Jingsong: How about we stick to the original design that
> >> we
> >>>>>>> wanted to vote on but use:
> >>>>>>>
> >>>>>>> Set<String> supportedHintProperties()
> >>>>>>>
> >>>>>>> This fits better to the old factory design. And for the new FLIP-95
> >>>>>>> factories we will use `ConfigOption` and provide good utilities for
> >>>>>>> merging with hints etc.
> >>>>>>>
> >>>>>>> We can allow `"format.*"` in `supportedHintProperties()` to allow
> >>>>>>> hinting in formats.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>>
> >>>>>>> On 02.04.20 16:24, Aljoscha Krettek wrote:
> >>>>>>>> I think we're designing ourselves into ever more complicated
> >> corners
> >>>>>>>> here. Maybe we need to take a step back and reconsider. What would
> >>> you
> >>>>>>>> think about this (somewhat) simpler proposal:
> >>>>>>>>
> >>>>>>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
> >>>>>>>> CONNECTOR_PROPERTIES, depending on what naming we want to have for
> >>>> this
> >>>>>>>> in the future. This will simply overwrite all connector
> >> properties,
> >>>> the
> >>>>>>>> table factories don't know about hints but simply work with the
> >>>>>>>> properties that they are given
> >>>>>>>>
> >>>>>>>> - this special hint is disabled by default and can be activated
> >>> with a
> >>>>>>>> global option "foo.bazzle.connector-hints" (or something like
> >> this)
> >>>>>>>> which has a warning that describes that this can change query
> >>>> semantics
> >>>>>>>> etc.
> >>>>>>>>
> >>>>>>>> That's it. This makes connector implementations a lot easier while
> >>>>>>>> still
> >>>>>>>> allowing inline configuration.
> >>>>>>>>
> >>>>>>>> I still don't like using hint syntax at all for this, because I
> >>>>>>>> strongly
> >>>>>>>> maintain that hints should not change query syntax. In general
> >> using
> >>>>>>>> hints should be kept to a minimum because they usually point to
> >>>>>>>> shortcomings in the system.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Aljoscha
> >>>>>>>>
> >>>>>>>> On 02.04.20 06:06, Jingsong Li wrote:
> >>>>>>>>> Hi Dawid,
> >>>>>>>>>
> >>>>>>>>>> When a factory is instantiated it has access to the
> >> CatalogTable,
> >>>>>>>>> therefore it has access to all the original properties. In turn
> >> it
> >>>>>>>>> knows
> >>>>>>>>> the original format and can call
> >>>> FormatFactory#supportedHintOptions().
> >>>>>>>>>
> >>>>>>>>> Factory can only get CatalogTable when creating source or sink,
> >>>>>>>>> right? IIUC, TableFactory may be stateless too.
> >>>>>>>>> When invoking SourceFactory#supportedHintOptions(), it can not
> >>>>>>>>> get CatalogTable, so it is impossible to create FormatFactory?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Jingsong Lee
> >>>>>>>>>
> >>>>>>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yuzhao.cyz@gmail.com
> >>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
> >>>>>>>>>>
> >>>>>>>>>> 1. Regarding the motivation:
> >>>>>>>>>>
> >>>>>>>>>> I agree it's not a good suggested way based on the fact that we
> >>> have
> >>>>>>>>>> better solution, but i think we can support override that as
> >> long
> >>>>>>>>>> as it
> >>>>>>>>>> exists as one of the the table options. I would remove if from
> >> the
> >>>>>>>>>> motication part.
> >>>>>>>>>>
> >>>>>>>>>> 2. The options passes around during sql-to-rel conversion, right
> >>>>>>>>>> after we
> >>>>>>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink),
> >> this
> >>>> is
> >>>>>>>>>> indeed a push way method at least in the RelOptTable layer, then
> >>>>>>>>>> we hand
> >>>>>>>>>> over the options to TableSourceFactory with our own context,
> >> which
> >>>> is
> >>>>>>>>>> fine
> >>>>>>>>>> becuase TableSourceFactory#Context is the contact to pass around
> >>>>>>>>>> these
> >>>>>>>>>> table-about variables.
> >>>>>>>>>>
> >>>>>>>>>> 3. "We should not end up with an extreme example where we can
> >>>>>>>>>> change the
> >>>>>>>>>> connector type", i totally agree that, and i have listed the
> >>>>>>>>>> "connector.type" as forbidden attribute in the WIKI. As for the
> >>>>>>>>>> format, i
> >>>>>>>>>> think the connector itself can/should control whether to
> >> override
> >>>> the
> >>>>>>>>>> "format.type", that is one of the reason i change the
> >>>>>>>>>> TableSourceFactory#supportedHintOpitons to
> >>>>>>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
> >>>>>>>>>> limit the
> >>>>>>>>>> format keys we want conveniently.
> >>>>>>>>>>
> >>>>>>>>>> 4. SQL Hints syntax.
> >>>>>>>>>>
> >>>>>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
> >> (not
> >>>>>>>>>>> sure
> >>>>>>>>>> if it is equivalent to string_literal).
> >>>>>>>>>>
> >>>>>>>>>> I disagree, we at least should keep sync with our DDL: use the
> >>>> string
> >>>>>>>>>> literal as the key. We did also support the simple identifier
> >>>> because
> >>>>>>>>>> this
> >>>>>>>>>> is the common hint syntax from Calcite, it does not hurt
> >> anything
> >>>> for
> >>>>>>>>>> the
> >>>>>>>>>> OPTIONS hint, the unsupported keys would validate fails.(If you
> >>>> think
> >>>>>>>>>> that
> >>>>>>>>>> may cause some confuse, i can make the syntax pluggable for each
> >>>>>>>>>> hint in
> >>>>>>>>>> CALCITE 1.23)
> >>>>>>>>>>
> >>>>>>>>>> We only supports OPTIONS hint in the FLIP, and i have changed
> >> the
> >>>>>>>>>> title to
> >>>>>>>>>> "Supports dynamic table options", would make it more clear in
> >> the
> >>>>>>>>>> WIKI.
> >>>>>>>>>>
> >>>>>>>>>> 5. Yes, we also have this concerns from our offline discussion,
> >>>>>>>>>> that is
> >>>>>>>>>> one of the reason, why i change the
> >>>>>>>>>> TableSourceFactory#supportedHintOpitons
> >>>>>>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose
> >>>>>>>>>> Set<String>
> >>>>>>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
> >>>>>>>>>>
> >>>>>>>>>>      - The wildcard is still not descriptive, we can still not
> >>>>>>>>>> forbidden one
> >>>>>>>>>> of the properties among the wildcard properties, we can not
> >> enable
> >>>> or
> >>>>>>>>>> disable them totally
> >>>>>>>>>>      - ConfigOption is our new structure for keys, and it does
> >> not
> >>>>>>>>>> support
> >>>>>>>>>> wildcard yet.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Danny Chan
> >>>>>>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
> >>>>>>>>>> <dw...@apache.org>,写道:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>> Few comments from my side:
> >>>>>>>>>>> 1. Regarding the motivation:
> >>>>>>>>>>> I think the example with changing the update-mode is not a good
> >>>>>>>>>>> one. In
> >>>>>>>>>> the long term this should be done with EMIT CHANGELOG (discussed
> >>> in
> >>>>>>>>>> FLIP-105).
> >>>>>>>>>>> Nitpicking: I would mention that it is rather for
> >>> debugging/ad-hoc
> >>>>>>>>>> solution. I think this should not be a recommended way for
> >>>> production
> >>>>>>>>>> use
> >>>>>>>>>> cases as it bypasses the Catalog, which should be the source of
> >>>>>>>>>> truth.
> >>>>>>>>>>> 2. I could not understand how the additional options will be
> >>>>>>>>>>> passed to
> >>>>>>>>>> the TableSourceFactory. Could you elaborate a bit more on that?
> >> I
> >>>> see
> >>>>>>>>>> there
> >>>>>>>>>> is a Context interface that gives the options. But cannot find a
> >>> way
> >>>>>>>>>> to get
> >>>>>>>>>> the context itself in the factory. Moreover I think it would
> >> make
> >>>>>>>>>> more
> >>>>>>>>>> sense to have rather a push based approach here. Something like
> >>>>>>>>>> applyOptions(ReadableConfig) method.
> >>>>>>>>>>> 3. As for the concerns Jingsong raised in the voting thread. I
> >>>>>>>>>>> think it
> >>>>>>>>>> is not a big problem, but I agree this should be also
> >> described. I
> >>>>>>>>>> disagree
> >>>>>>>>>> with "Connector don't know format information in TableFactory
> >>> before
> >>>>>>>>>> obtains real properties, so it can not list any format
> >>>>>>>>>> `supportedHintOptions`".
> >>>>>>>>>>> When a factory is instantiated it has access to the
> >> CatalogTable,
> >>>>>>>>>> therefore it has access to all the original properties. In turn
> >> it
> >>>>>>>>>> knows
> >>>>>>>>>> the original format and can call
> >>>>>>>>>> FormatFactory#supportedHintOptions().
> >>>>>>>>>>> The only case when this would not work would be if we allow
> >>>> changing
> >>>>>>>>>>> the
> >>>>>>>>>> format of the Table (e.g. from avro to parquet), which does not
> >>>> sound
> >>>>>>>>>> like
> >>>>>>>>>> a good idea to me. I think this feature should not end up as a
> >> way
> >>>> to
> >>>>>>>>>> declare a whole table inline in a SQL query, but should rather
> >> be
> >>> a
> >>>>>>>>>> simple
> >>>>>>>>>> way for debugging queries. We should not end up with an extreme
> >>>>>>>>>> example
> >>>>>>>>>> where we do:
> >>>>>>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
> >>>>>>>>>> 'format.type' = 'json', ....) */
> >>>>>>>>>>> 4. SQL Hints syntax.
> >>>>>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
> >> (not
> >>>>>>>>>>> sure
> >>>>>>>>>> if it is equivalent to string_literal). I think we should not
> >> use
> >>>>>>>>>> simple_identifier because this implies that we cannot use e.g.
> >> any
> >>>>>>>>>> SQL
> >>>>>>>>>> keywords. Anyway it has nothing to do with identifiers. If I am
> >>> not
> >>>>>>>>>> mistaken it is also how the options in the CREATE statement are
> >>>>>>>>>> implemented.
> >>>>>>>>>>> What is the purpose of the remaining hint_item:
> >>> hint_name(hint_opt
> >>>> [
> >>>>>>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
> >>>>>>>>>> feeling it
> >>>>>>>>>> does also suggests to support the whole Apache Calcite hint
> >> system
> >>>>>>>>>> without
> >>>>>>>>>> specifying that explicitly. Is the intention of the FLIP to
> >>> support
> >>>>>>>>>> choosing e.g. JOIN strategies through hints already? If it is so
> >>> it
> >>>>>>>>>> should
> >>>>>>>>>> be mentioned in the FLIP, imo.
> >>>>>>>>>>> 5. I think something does not work around the
> >>>>>>>>>>> supportedHintOptions and
> >>>>>>>>>> wildcards. How do you want to represent a wildcard key as a
> >>>>>>>>>> ConfigOption? I
> >>>>>>>>>> am not sure about that, just a though, maybe it make sense to
> >> have
> >>>>>>>>>> rather
> >>>>>>>>>> Set<String> supportedHintOptionKeys()?
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Dawid
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>
>

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Timo Walther <tw...@apache.org>.
Thanks for the update Danny. +1 from my side.

Regards,
Timo


On 07.04.20 13:25, Danny Chan wrote:
> Hi, every ~
> 
> It seems that we all agree to drop the idea for white/black list for each
> connector, and have a global config option to default disable this feature.
> 
> I have also discussed with Timo and Jark about the interface
> TableSourceTable.Context.getExecutionOptions and finally we decide to
> introduce a new interface CatalogTable#copy(Map<String, String>) to support
> re-generate the table with new table options.
> 
> So let me summarize the current design broadly again:
> 
>     - Use the syntax /*+ OPTIONS('k1' = 'v1', 'k2' = 'v2') to describe the
>     dynamic table options
>     - There is no constraint on which option key can be used in the OPTIONS,
>     that means, any option key is allowed, the factory would to the validation
>     work finally
>     - Introduce method CatalogTable#copy, we use this method to regenerate a
>     new CatalogTable to find a table factory and creates table source/sink
>     - There is a global config option to default disable this feature (if
>     user uses OPTIONS, an exception throws to tell open the option)
> 
> I have updated the WIKI
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL>,
> look forward to your suggestions ~
> 
> Jark Wu <im...@gmail.com> 于2020年4月7日周二 上午11:24写道:
> 
>> I'm fine to disable this feature by default and avoid
>> whitelisting/blacklisting. This simplifies a lot of things.
>>
>> Regarding to TableSourceFactory#Context#getExecutionOptions, do we really
>> need this interface?
>> Should the connector factory be aware of the properties is merged with
>> hints or not?
>> What's the problem if we always get properties from
>> `CatalogTable#getProperties`?
>>
>> Best,
>> Jark
>>
>> On Tue, 7 Apr 2020 at 10:39, Kurt Young <yk...@gmail.com> wrote:
>>
>>> Sounds like a reasonable compromise, disabling this feature by default
>> is a
>>> way to protect
>>> the vulnerability, and we can simplify the design quite a lot. We can
>>> gather some users'
>>> feedback to see whether further protections are necessary in the future.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <tw...@apache.org> wrote:
>>>
>>>> I agree with Aljoscha. The length of this thread shows that this is
>>>> highly controversal. I think nobody really likes this feature 100% but
>>>> we could not find a better solution. I would consider it as a
>>>> nice-to-have improvement during a notebook/debugging session.
>>>>
>>>> I would accept avoiding whitelisting/blacklisting if the feature is
>>>> disabled by default. And we make the merged properties available in a
>>>> separate TableSourceFactory#Context#getExecutionOptions as Danny
>>> proposed.
>>>>
>>>> What do you think?
>>>>
>>>> Thanks,
>>>> Timo
>>>>
>>>>
>>>> On 06.04.20 09:59, Aljoscha Krettek wrote:
>>>>> The reason I'm saying it should be disabled by default is that this
>>> uses
>>>>> hint syntax, and hints should really not change query semantics.
>>>>>
>>>>> I'm quite strongly against hints that change query semantics, but if
>> we
>>>>> disable this by default I would be (reluctantly) OK with the feature.
>>>>> Companies that create deployments or set up the SQL environment for
>>>>> users can enable the feature if they want.
>>>>>
>>>>> But yes, I also agree that we don't need whitelisting/blacklisting,
>>>>> which makes this a lot easier to do.
>>>>>
>>>>> Best,
>>>>> Aljoscha
>>>>>
>>>>> On 06.04.20 04:27, Danny Chan wrote:
>>>>>> Hi, everyone ~
>>>>>>
>>>>>> @Aljoscha @Timo
>>>>>>
>>>>>>> I think we're designing ourselves into ever more complicated
>> corners
>>>>>> here
>>>>>>
>>>>>> I kindly agree that, personally didn't see strong reasons why we
>>>>>> should limit on each connector properties:
>>>>>>
>>>>>> • we can define any table options for CREATE TABLE, why we treat the
>>>>>> dynamic options differently, we never consider any security problems
>>>>>> when create table, we should not either for dynamic table options
>>>>>> • If we do not have whitelist properties or blacklist properties,
>> the
>>>>>> table source creation work would be much easier, just used the
>> merged
>>>>>> options. There is no need to modify each connector to decide which
>>>>>> options could be overridden and how we merge them(the merge work is
>>>>>> redundant).
>>>>>> • @Timo, how about we support another interface
>>>>>> `TableSourceFactory#Context.getExecutionOptions`, we always use this
>>>>>> interface to get the options to create our table source. There is no
>>>>>> need to copy the catalog table itselt, we just need to generate our
>>>>>> Context correctly.
>>>>>> • @Aljoscha I agree to have a global config option, but I disagree
>> to
>>>>>> default disable it, a global default config would break the user
>>>>>> experience too much, especially when user want to modify the options
>>>>>> in a ad-hoc way.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I suggest to remove `TableSourceFactory#supportedHintOptions` or
>>>>>> `TableSourceFactory#forbiddenHintOptions` based on the fact that we
>>>>>> does not have black/white list for CREATE TABLE at all at lease for
>>>>>> current codebase.
>>>>>>
>>>>>>
>>>>>> @Timo (i have replied offline but allows to represent it here again)
>>>>>>
>>>>>> The `TableSourceFactory#supportedHintOptions` doesn't work well for
>> 3
>>>>>> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
>>>>>> 1. For key with wildcard, like connector.property.* , use a
>> blacklist
>>>>>> make us have the ability to disable some of the keys under that,
>> i.e.
>>>>>> connector.property.key1 , a whitelist can only match with prefix
>>>>>>
>>>>>> 2. We want the connectors to have the ability to disable format type
>>>>>> switch format.type but allows all the other properties, e.g.
>> format.*
>>>>>> without format.type(let's call it SET_B), if we use the whitelist,
>> we
>>>>>> have to enumerate all the specific format keys start with format
>>>>>> (SET_B), but with the old connector factories, we have no idea what
>>>>>> specific format keys it supports(there is either a format.* or
>>> nothing).
>>>>>>
>>>>>> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the
>>>>>> blacklist and whitelist has the same expressiveness, use blacklist
>>>>>> makes the code not too verbose to enumerate all the duplicate keys
>>>>>> with #supportedKeys .(Not very strong reason, but i think as a
>>>>>> connector developer, it makes sense)
>>>>>>
>>>>>> Best,
>>>>>> Danny Chan
>>>>>> 在 2020年4月3日 +0800 PM11:28,Timo Walther <tw...@apache.org>,写道:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> @Aljoscha: I disagree with your approach because a `Catalog` can
>>> return
>>>>>>> a custom factory that is not using any properties. The hinting must
>>> be
>>>>>>> transparent to a factory. We should NOT modify the metadata
>>>>>>> `CatalogTable` at any point in time after the catalog.
>>>>>>>
>>>>>>> @Danny, @Jingsong: How about we stick to the original design that
>> we
>>>>>>> wanted to vote on but use:
>>>>>>>
>>>>>>> Set<String> supportedHintProperties()
>>>>>>>
>>>>>>> This fits better to the old factory design. And for the new FLIP-95
>>>>>>> factories we will use `ConfigOption` and provide good utilities for
>>>>>>> merging with hints etc.
>>>>>>>
>>>>>>> We can allow `"format.*"` in `supportedHintProperties()` to allow
>>>>>>> hinting in formats.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Timo
>>>>>>>
>>>>>>>
>>>>>>> On 02.04.20 16:24, Aljoscha Krettek wrote:
>>>>>>>> I think we're designing ourselves into ever more complicated
>> corners
>>>>>>>> here. Maybe we need to take a step back and reconsider. What would
>>> you
>>>>>>>> think about this (somewhat) simpler proposal:
>>>>>>>>
>>>>>>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
>>>>>>>> CONNECTOR_PROPERTIES, depending on what naming we want to have for
>>>> this
>>>>>>>> in the future. This will simply overwrite all connector
>> properties,
>>>> the
>>>>>>>> table factories don't know about hints but simply work with the
>>>>>>>> properties that they are given
>>>>>>>>
>>>>>>>> - this special hint is disabled by default and can be activated
>>> with a
>>>>>>>> global option "foo.bazzle.connector-hints" (or something like
>> this)
>>>>>>>> which has a warning that describes that this can change query
>>>> semantics
>>>>>>>> etc.
>>>>>>>>
>>>>>>>> That's it. This makes connector implementations a lot easier while
>>>>>>>> still
>>>>>>>> allowing inline configuration.
>>>>>>>>
>>>>>>>> I still don't like using hint syntax at all for this, because I
>>>>>>>> strongly
>>>>>>>> maintain that hints should not change query syntax. In general
>> using
>>>>>>>> hints should be kept to a minimum because they usually point to
>>>>>>>> shortcomings in the system.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Aljoscha
>>>>>>>>
>>>>>>>> On 02.04.20 06:06, Jingsong Li wrote:
>>>>>>>>> Hi Dawid,
>>>>>>>>>
>>>>>>>>>> When a factory is instantiated it has access to the
>> CatalogTable,
>>>>>>>>> therefore it has access to all the original properties. In turn
>> it
>>>>>>>>> knows
>>>>>>>>> the original format and can call
>>>> FormatFactory#supportedHintOptions().
>>>>>>>>>
>>>>>>>>> Factory can only get CatalogTable when creating source or sink,
>>>>>>>>> right? IIUC, TableFactory may be stateless too.
>>>>>>>>> When invoking SourceFactory#supportedHintOptions(), it can not
>>>>>>>>> get CatalogTable, so it is impossible to create FormatFactory?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jingsong Lee
>>>>>>>>>
>>>>>>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yuzhao.cyz@gmail.com
>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
>>>>>>>>>>
>>>>>>>>>> 1. Regarding the motivation:
>>>>>>>>>>
>>>>>>>>>> I agree it's not a good suggested way based on the fact that we
>>> have
>>>>>>>>>> better solution, but i think we can support override that as
>> long
>>>>>>>>>> as it
>>>>>>>>>> exists as one of the the table options. I would remove if from
>> the
>>>>>>>>>> motication part.
>>>>>>>>>>
>>>>>>>>>> 2. The options passes around during sql-to-rel conversion, right
>>>>>>>>>> after we
>>>>>>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink),
>> this
>>>> is
>>>>>>>>>> indeed a push way method at least in the RelOptTable layer, then
>>>>>>>>>> we hand
>>>>>>>>>> over the options to TableSourceFactory with our own context,
>> which
>>>> is
>>>>>>>>>> fine
>>>>>>>>>> becuase TableSourceFactory#Context is the contact to pass around
>>>>>>>>>> these
>>>>>>>>>> table-about variables.
>>>>>>>>>>
>>>>>>>>>> 3. "We should not end up with an extreme example where we can
>>>>>>>>>> change the
>>>>>>>>>> connector type", i totally agree that, and i have listed the
>>>>>>>>>> "connector.type" as forbidden attribute in the WIKI. As for the
>>>>>>>>>> format, i
>>>>>>>>>> think the connector itself can/should control whether to
>> override
>>>> the
>>>>>>>>>> "format.type", that is one of the reason i change the
>>>>>>>>>> TableSourceFactory#supportedHintOpitons to
>>>>>>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
>>>>>>>>>> limit the
>>>>>>>>>> format keys we want conveniently.
>>>>>>>>>>
>>>>>>>>>> 4. SQL Hints syntax.
>>>>>>>>>>
>>>>>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
>> (not
>>>>>>>>>>> sure
>>>>>>>>>> if it is equivalent to string_literal).
>>>>>>>>>>
>>>>>>>>>> I disagree, we at least should keep sync with our DDL: use the
>>>> string
>>>>>>>>>> literal as the key. We did also support the simple identifier
>>>> because
>>>>>>>>>> this
>>>>>>>>>> is the common hint syntax from Calcite, it does not hurt
>> anything
>>>> for
>>>>>>>>>> the
>>>>>>>>>> OPTIONS hint, the unsupported keys would validate fails.(If you
>>>> think
>>>>>>>>>> that
>>>>>>>>>> may cause some confuse, i can make the syntax pluggable for each
>>>>>>>>>> hint in
>>>>>>>>>> CALCITE 1.23)
>>>>>>>>>>
>>>>>>>>>> We only supports OPTIONS hint in the FLIP, and i have changed
>> the
>>>>>>>>>> title to
>>>>>>>>>> "Supports dynamic table options", would make it more clear in
>> the
>>>>>>>>>> WIKI.
>>>>>>>>>>
>>>>>>>>>> 5. Yes, we also have this concerns from our offline discussion,
>>>>>>>>>> that is
>>>>>>>>>> one of the reason, why i change the
>>>>>>>>>> TableSourceFactory#supportedHintOpitons
>>>>>>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose
>>>>>>>>>> Set<String>
>>>>>>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
>>>>>>>>>>
>>>>>>>>>>      - The wildcard is still not descriptive, we can still not
>>>>>>>>>> forbidden one
>>>>>>>>>> of the properties among the wildcard properties, we can not
>> enable
>>>> or
>>>>>>>>>> disable them totally
>>>>>>>>>>      - ConfigOption is our new structure for keys, and it does
>> not
>>>>>>>>>> support
>>>>>>>>>> wildcard yet.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Danny Chan
>>>>>>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
>>>>>>>>>> <dw...@apache.org>,写道:
>>>>>>>>>>> Hi,
>>>>>>>>>>> Few comments from my side:
>>>>>>>>>>> 1. Regarding the motivation:
>>>>>>>>>>> I think the example with changing the update-mode is not a good
>>>>>>>>>>> one. In
>>>>>>>>>> the long term this should be done with EMIT CHANGELOG (discussed
>>> in
>>>>>>>>>> FLIP-105).
>>>>>>>>>>> Nitpicking: I would mention that it is rather for
>>> debugging/ad-hoc
>>>>>>>>>> solution. I think this should not be a recommended way for
>>>> production
>>>>>>>>>> use
>>>>>>>>>> cases as it bypasses the Catalog, which should be the source of
>>>>>>>>>> truth.
>>>>>>>>>>> 2. I could not understand how the additional options will be
>>>>>>>>>>> passed to
>>>>>>>>>> the TableSourceFactory. Could you elaborate a bit more on that?
>> I
>>>> see
>>>>>>>>>> there
>>>>>>>>>> is a Context interface that gives the options. But cannot find a
>>> way
>>>>>>>>>> to get
>>>>>>>>>> the context itself in the factory. Moreover I think it would
>> make
>>>>>>>>>> more
>>>>>>>>>> sense to have rather a push based approach here. Something like
>>>>>>>>>> applyOptions(ReadableConfig) method.
>>>>>>>>>>> 3. As for the concerns Jingsong raised in the voting thread. I
>>>>>>>>>>> think it
>>>>>>>>>> is not a big problem, but I agree this should be also
>> described. I
>>>>>>>>>> disagree
>>>>>>>>>> with "Connector don't know format information in TableFactory
>>> before
>>>>>>>>>> obtains real properties, so it can not list any format
>>>>>>>>>> `supportedHintOptions`".
>>>>>>>>>>> When a factory is instantiated it has access to the
>> CatalogTable,
>>>>>>>>>> therefore it has access to all the original properties. In turn
>> it
>>>>>>>>>> knows
>>>>>>>>>> the original format and can call
>>>>>>>>>> FormatFactory#supportedHintOptions().
>>>>>>>>>>> The only case when this would not work would be if we allow
>>>> changing
>>>>>>>>>>> the
>>>>>>>>>> format of the Table (e.g. from avro to parquet), which does not
>>>> sound
>>>>>>>>>> like
>>>>>>>>>> a good idea to me. I think this feature should not end up as a
>> way
>>>> to
>>>>>>>>>> declare a whole table inline in a SQL query, but should rather
>> be
>>> a
>>>>>>>>>> simple
>>>>>>>>>> way for debugging queries. We should not end up with an extreme
>>>>>>>>>> example
>>>>>>>>>> where we do:
>>>>>>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
>>>>>>>>>> 'format.type' = 'json', ....) */
>>>>>>>>>>> 4. SQL Hints syntax.
>>>>>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
>> (not
>>>>>>>>>>> sure
>>>>>>>>>> if it is equivalent to string_literal). I think we should not
>> use
>>>>>>>>>> simple_identifier because this implies that we cannot use e.g.
>> any
>>>>>>>>>> SQL
>>>>>>>>>> keywords. Anyway it has nothing to do with identifiers. If I am
>>> not
>>>>>>>>>> mistaken it is also how the options in the CREATE statement are
>>>>>>>>>> implemented.
>>>>>>>>>>> What is the purpose of the remaining hint_item:
>>> hint_name(hint_opt
>>>> [
>>>>>>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
>>>>>>>>>> feeling it
>>>>>>>>>> does also suggests to support the whole Apache Calcite hint
>> system
>>>>>>>>>> without
>>>>>>>>>> specifying that explicitly. Is the intention of the FLIP to
>>> support
>>>>>>>>>> choosing e.g. JOIN strategies through hints already? If it is so
>>> it
>>>>>>>>>> should
>>>>>>>>>> be mentioned in the FLIP, imo.
>>>>>>>>>>> 5. I think something does not work around the
>>>>>>>>>>> supportedHintOptions and
>>>>>>>>>> wildcards. How do you want to represent a wildcard key as a
>>>>>>>>>> ConfigOption? I
>>>>>>>>>> am not sure about that, just a though, maybe it make sense to
>> have
>>>>>>>>>> rather
>>>>>>>>>> Set<String> supportedHintOptionKeys()?
>>>>>>>>>>> Best,
>>>>>>>>>>> Dawid
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
> 


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Danny Chan <da...@apache.org>.
Hi, every ~

It seems that we all agree to drop the idea for white/black list for each
connector, and have a global config option to default disable this feature.

I have also discussed with Timo and Jark about the interface
TableSourceTable.Context.getExecutionOptions and finally we decide to
introduce a new interface CatalogTable#copy(Map<String, String>) to support
re-generate the table with new table options.

So let me summarize the current design broadly again:

   - Use the syntax /*+ OPTIONS('k1' = 'v1', 'k2' = 'v2') to describe the
   dynamic table options
   - There is no constraint on which option key can be used in the OPTIONS,
   that means, any option key is allowed, the factory would to the validation
   work finally
   - Introduce method CatalogTable#copy, we use this method to regenerate a
   new CatalogTable to find a table factory and creates table source/sink
   - There is a global config option to default disable this feature (if
   user uses OPTIONS, an exception throws to tell open the option)

I have updated the WIKI
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL>,
look forward to your suggestions ~

Jark Wu <im...@gmail.com> 于2020年4月7日周二 上午11:24写道:

> I'm fine to disable this feature by default and avoid
> whitelisting/blacklisting. This simplifies a lot of things.
>
> Regarding to TableSourceFactory#Context#getExecutionOptions, do we really
> need this interface?
> Should the connector factory be aware of the properties is merged with
> hints or not?
> What's the problem if we always get properties from
> `CatalogTable#getProperties`?
>
> Best,
> Jark
>
> On Tue, 7 Apr 2020 at 10:39, Kurt Young <yk...@gmail.com> wrote:
>
> > Sounds like a reasonable compromise, disabling this feature by default
> is a
> > way to protect
> > the vulnerability, and we can simplify the design quite a lot. We can
> > gather some users'
> > feedback to see whether further protections are necessary in the future.
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <tw...@apache.org> wrote:
> >
> > > I agree with Aljoscha. The length of this thread shows that this is
> > > highly controversal. I think nobody really likes this feature 100% but
> > > we could not find a better solution. I would consider it as a
> > > nice-to-have improvement during a notebook/debugging session.
> > >
> > > I would accept avoiding whitelisting/blacklisting if the feature is
> > > disabled by default. And we make the merged properties available in a
> > > separate TableSourceFactory#Context#getExecutionOptions as Danny
> > proposed.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Timo
> > >
> > >
> > > On 06.04.20 09:59, Aljoscha Krettek wrote:
> > > > The reason I'm saying it should be disabled by default is that this
> > uses
> > > > hint syntax, and hints should really not change query semantics.
> > > >
> > > > I'm quite strongly against hints that change query semantics, but if
> we
> > > > disable this by default I would be (reluctantly) OK with the feature.
> > > > Companies that create deployments or set up the SQL environment for
> > > > users can enable the feature if they want.
> > > >
> > > > But yes, I also agree that we don't need whitelisting/blacklisting,
> > > > which makes this a lot easier to do.
> > > >
> > > > Best,
> > > > Aljoscha
> > > >
> > > > On 06.04.20 04:27, Danny Chan wrote:
> > > >> Hi, everyone ~
> > > >>
> > > >> @Aljoscha @Timo
> > > >>
> > > >>> I think we're designing ourselves into ever more complicated
> corners
> > > >> here
> > > >>
> > > >> I kindly agree that, personally didn't see strong reasons why we
> > > >> should limit on each connector properties:
> > > >>
> > > >> • we can define any table options for CREATE TABLE, why we treat the
> > > >> dynamic options differently, we never consider any security problems
> > > >> when create table, we should not either for dynamic table options
> > > >> • If we do not have whitelist properties or blacklist properties,
> the
> > > >> table source creation work would be much easier, just used the
> merged
> > > >> options. There is no need to modify each connector to decide which
> > > >> options could be overridden and how we merge them(the merge work is
> > > >> redundant).
> > > >> • @Timo, how about we support another interface
> > > >> `TableSourceFactory#Context.getExecutionOptions`, we always use this
> > > >> interface to get the options to create our table source. There is no
> > > >> need to copy the catalog table itselt, we just need to generate our
> > > >> Context correctly.
> > > >> • @Aljoscha I agree to have a global config option, but I disagree
> to
> > > >> default disable it, a global default config would break the user
> > > >> experience too much, especially when user want to modify the options
> > > >> in a ad-hoc way.
> > > >>
> > > >>
> > > >>
> > > >> I suggest to remove `TableSourceFactory#supportedHintOptions` or
> > > >> `TableSourceFactory#forbiddenHintOptions` based on the fact that we
> > > >> does not have black/white list for CREATE TABLE at all at lease for
> > > >> current codebase.
> > > >>
> > > >>
> > > >> @Timo (i have replied offline but allows to represent it here again)
> > > >>
> > > >> The `TableSourceFactory#supportedHintOptions` doesn't work well for
> 3
> > > >> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> > > >> 1. For key with wildcard, like connector.property.* , use a
> blacklist
> > > >> make us have the ability to disable some of the keys under that,
> i.e.
> > > >> connector.property.key1 , a whitelist can only match with prefix
> > > >>
> > > >> 2. We want the connectors to have the ability to disable format type
> > > >> switch format.type but allows all the other properties, e.g.
> format.*
> > > >> without format.type(let's call it SET_B), if we use the whitelist,
> we
> > > >> have to enumerate all the specific format keys start with format
> > > >> (SET_B), but with the old connector factories, we have no idea what
> > > >> specific format keys it supports(there is either a format.* or
> > nothing).
> > > >>
> > > >> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the
> > > >> blacklist and whitelist has the same expressiveness, use blacklist
> > > >> makes the code not too verbose to enumerate all the duplicate keys
> > > >> with #supportedKeys .(Not very strong reason, but i think as a
> > > >> connector developer, it makes sense)
> > > >>
> > > >> Best,
> > > >> Danny Chan
> > > >> 在 2020年4月3日 +0800 PM11:28,Timo Walther <tw...@apache.org>,写道:
> > > >>> Hi everyone,
> > > >>>
> > > >>> @Aljoscha: I disagree with your approach because a `Catalog` can
> > return
> > > >>> a custom factory that is not using any properties. The hinting must
> > be
> > > >>> transparent to a factory. We should NOT modify the metadata
> > > >>> `CatalogTable` at any point in time after the catalog.
> > > >>>
> > > >>> @Danny, @Jingsong: How about we stick to the original design that
> we
> > > >>> wanted to vote on but use:
> > > >>>
> > > >>> Set<String> supportedHintProperties()
> > > >>>
> > > >>> This fits better to the old factory design. And for the new FLIP-95
> > > >>> factories we will use `ConfigOption` and provide good utilities for
> > > >>> merging with hints etc.
> > > >>>
> > > >>> We can allow `"format.*"` in `supportedHintProperties()` to allow
> > > >>> hinting in formats.
> > > >>>
> > > >>> What do you think?
> > > >>>
> > > >>> Regards,
> > > >>> Timo
> > > >>>
> > > >>>
> > > >>> On 02.04.20 16:24, Aljoscha Krettek wrote:
> > > >>>> I think we're designing ourselves into ever more complicated
> corners
> > > >>>> here. Maybe we need to take a step back and reconsider. What would
> > you
> > > >>>> think about this (somewhat) simpler proposal:
> > > >>>>
> > > >>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
> > > >>>> CONNECTOR_PROPERTIES, depending on what naming we want to have for
> > > this
> > > >>>> in the future. This will simply overwrite all connector
> properties,
> > > the
> > > >>>> table factories don't know about hints but simply work with the
> > > >>>> properties that they are given
> > > >>>>
> > > >>>> - this special hint is disabled by default and can be activated
> > with a
> > > >>>> global option "foo.bazzle.connector-hints" (or something like
> this)
> > > >>>> which has a warning that describes that this can change query
> > > semantics
> > > >>>> etc.
> > > >>>>
> > > >>>> That's it. This makes connector implementations a lot easier while
> > > >>>> still
> > > >>>> allowing inline configuration.
> > > >>>>
> > > >>>> I still don't like using hint syntax at all for this, because I
> > > >>>> strongly
> > > >>>> maintain that hints should not change query syntax. In general
> using
> > > >>>> hints should be kept to a minimum because they usually point to
> > > >>>> shortcomings in the system.
> > > >>>>
> > > >>>> Best,
> > > >>>> Aljoscha
> > > >>>>
> > > >>>> On 02.04.20 06:06, Jingsong Li wrote:
> > > >>>>> Hi Dawid,
> > > >>>>>
> > > >>>>>> When a factory is instantiated it has access to the
> CatalogTable,
> > > >>>>> therefore it has access to all the original properties. In turn
> it
> > > >>>>> knows
> > > >>>>> the original format and can call
> > > FormatFactory#supportedHintOptions().
> > > >>>>>
> > > >>>>> Factory can only get CatalogTable when creating source or sink,
> > > >>>>> right? IIUC, TableFactory may be stateless too.
> > > >>>>> When invoking SourceFactory#supportedHintOptions(), it can not
> > > >>>>> get CatalogTable, so it is impossible to create FormatFactory?
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Jingsong Lee
> > > >>>>>
> > > >>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yuzhao.cyz@gmail.com
> >
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
> > > >>>>>>
> > > >>>>>> 1. Regarding the motivation:
> > > >>>>>>
> > > >>>>>> I agree it's not a good suggested way based on the fact that we
> > have
> > > >>>>>> better solution, but i think we can support override that as
> long
> > > >>>>>> as it
> > > >>>>>> exists as one of the the table options. I would remove if from
> the
> > > >>>>>> motication part.
> > > >>>>>>
> > > >>>>>> 2. The options passes around during sql-to-rel conversion, right
> > > >>>>>> after we
> > > >>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink),
> this
> > > is
> > > >>>>>> indeed a push way method at least in the RelOptTable layer, then
> > > >>>>>> we hand
> > > >>>>>> over the options to TableSourceFactory with our own context,
> which
> > > is
> > > >>>>>> fine
> > > >>>>>> becuase TableSourceFactory#Context is the contact to pass around
> > > >>>>>> these
> > > >>>>>> table-about variables.
> > > >>>>>>
> > > >>>>>> 3. "We should not end up with an extreme example where we can
> > > >>>>>> change the
> > > >>>>>> connector type", i totally agree that, and i have listed the
> > > >>>>>> "connector.type" as forbidden attribute in the WIKI. As for the
> > > >>>>>> format, i
> > > >>>>>> think the connector itself can/should control whether to
> override
> > > the
> > > >>>>>> "format.type", that is one of the reason i change the
> > > >>>>>> TableSourceFactory#supportedHintOpitons to
> > > >>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
> > > >>>>>> limit the
> > > >>>>>> format keys we want conveniently.
> > > >>>>>>
> > > >>>>>> 4. SQL Hints syntax.
> > > >>>>>>
> > > >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
> (not
> > > >>>>>>> sure
> > > >>>>>> if it is equivalent to string_literal).
> > > >>>>>>
> > > >>>>>> I disagree, we at least should keep sync with our DDL: use the
> > > string
> > > >>>>>> literal as the key. We did also support the simple identifier
> > > because
> > > >>>>>> this
> > > >>>>>> is the common hint syntax from Calcite, it does not hurt
> anything
> > > for
> > > >>>>>> the
> > > >>>>>> OPTIONS hint, the unsupported keys would validate fails.(If you
> > > think
> > > >>>>>> that
> > > >>>>>> may cause some confuse, i can make the syntax pluggable for each
> > > >>>>>> hint in
> > > >>>>>> CALCITE 1.23)
> > > >>>>>>
> > > >>>>>> We only supports OPTIONS hint in the FLIP, and i have changed
> the
> > > >>>>>> title to
> > > >>>>>> "Supports dynamic table options", would make it more clear in
> the
> > > >>>>>> WIKI.
> > > >>>>>>
> > > >>>>>> 5. Yes, we also have this concerns from our offline discussion,
> > > >>>>>> that is
> > > >>>>>> one of the reason, why i change the
> > > >>>>>> TableSourceFactory#supportedHintOpitons
> > > >>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose
> > > >>>>>> Set<String>
> > > >>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
> > > >>>>>>
> > > >>>>>>     - The wildcard is still not descriptive, we can still not
> > > >>>>>> forbidden one
> > > >>>>>> of the properties among the wildcard properties, we can not
> enable
> > > or
> > > >>>>>> disable them totally
> > > >>>>>>     - ConfigOption is our new structure for keys, and it does
> not
> > > >>>>>> support
> > > >>>>>> wildcard yet.
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Danny Chan
> > > >>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
> > > >>>>>> <dw...@apache.org>,写道:
> > > >>>>>>> Hi,
> > > >>>>>>> Few comments from my side:
> > > >>>>>>> 1. Regarding the motivation:
> > > >>>>>>> I think the example with changing the update-mode is not a good
> > > >>>>>>> one. In
> > > >>>>>> the long term this should be done with EMIT CHANGELOG (discussed
> > in
> > > >>>>>> FLIP-105).
> > > >>>>>>> Nitpicking: I would mention that it is rather for
> > debugging/ad-hoc
> > > >>>>>> solution. I think this should not be a recommended way for
> > > production
> > > >>>>>> use
> > > >>>>>> cases as it bypasses the Catalog, which should be the source of
> > > >>>>>> truth.
> > > >>>>>>> 2. I could not understand how the additional options will be
> > > >>>>>>> passed to
> > > >>>>>> the TableSourceFactory. Could you elaborate a bit more on that?
> I
> > > see
> > > >>>>>> there
> > > >>>>>> is a Context interface that gives the options. But cannot find a
> > way
> > > >>>>>> to get
> > > >>>>>> the context itself in the factory. Moreover I think it would
> make
> > > >>>>>> more
> > > >>>>>> sense to have rather a push based approach here. Something like
> > > >>>>>> applyOptions(ReadableConfig) method.
> > > >>>>>>> 3. As for the concerns Jingsong raised in the voting thread. I
> > > >>>>>>> think it
> > > >>>>>> is not a big problem, but I agree this should be also
> described. I
> > > >>>>>> disagree
> > > >>>>>> with "Connector don't know format information in TableFactory
> > before
> > > >>>>>> obtains real properties, so it can not list any format
> > > >>>>>> `supportedHintOptions`".
> > > >>>>>>> When a factory is instantiated it has access to the
> CatalogTable,
> > > >>>>>> therefore it has access to all the original properties. In turn
> it
> > > >>>>>> knows
> > > >>>>>> the original format and can call
> > > >>>>>> FormatFactory#supportedHintOptions().
> > > >>>>>>> The only case when this would not work would be if we allow
> > > changing
> > > >>>>>>> the
> > > >>>>>> format of the Table (e.g. from avro to parquet), which does not
> > > sound
> > > >>>>>> like
> > > >>>>>> a good idea to me. I think this feature should not end up as a
> way
> > > to
> > > >>>>>> declare a whole table inline in a SQL query, but should rather
> be
> > a
> > > >>>>>> simple
> > > >>>>>> way for debugging queries. We should not end up with an extreme
> > > >>>>>> example
> > > >>>>>> where we do:
> > > >>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
> > > >>>>>> 'format.type' = 'json', ....) */
> > > >>>>>>> 4. SQL Hints syntax.
> > > >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING
> (not
> > > >>>>>>> sure
> > > >>>>>> if it is equivalent to string_literal). I think we should not
> use
> > > >>>>>> simple_identifier because this implies that we cannot use e.g.
> any
> > > >>>>>> SQL
> > > >>>>>> keywords. Anyway it has nothing to do with identifiers. If I am
> > not
> > > >>>>>> mistaken it is also how the options in the CREATE statement are
> > > >>>>>> implemented.
> > > >>>>>>> What is the purpose of the remaining hint_item:
> > hint_name(hint_opt
> > > [
> > > >>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
> > > >>>>>> feeling it
> > > >>>>>> does also suggests to support the whole Apache Calcite hint
> system
> > > >>>>>> without
> > > >>>>>> specifying that explicitly. Is the intention of the FLIP to
> > support
> > > >>>>>> choosing e.g. JOIN strategies through hints already? If it is so
> > it
> > > >>>>>> should
> > > >>>>>> be mentioned in the FLIP, imo.
> > > >>>>>>> 5. I think something does not work around the
> > > >>>>>>> supportedHintOptions and
> > > >>>>>> wildcards. How do you want to represent a wildcard key as a
> > > >>>>>> ConfigOption? I
> > > >>>>>> am not sure about that, just a though, maybe it make sense to
> have
> > > >>>>>> rather
> > > >>>>>> Set<String> supportedHintOptionKeys()?
> > > >>>>>>> Best,
> > > >>>>>>> Dawid
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>
> > > >>
> > >
> > >
> >
>

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Jark Wu <im...@gmail.com>.
I'm fine to disable this feature by default and avoid
whitelisting/blacklisting. This simplifies a lot of things.

Regarding to TableSourceFactory#Context#getExecutionOptions, do we really
need this interface?
Should the connector factory be aware of the properties is merged with
hints or not?
What's the problem if we always get properties from
`CatalogTable#getProperties`?

Best,
Jark

On Tue, 7 Apr 2020 at 10:39, Kurt Young <yk...@gmail.com> wrote:

> Sounds like a reasonable compromise, disabling this feature by default is a
> way to protect
> the vulnerability, and we can simplify the design quite a lot. We can
> gather some users'
> feedback to see whether further protections are necessary in the future.
>
> Best,
> Kurt
>
>
> On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <tw...@apache.org> wrote:
>
> > I agree with Aljoscha. The length of this thread shows that this is
> > highly controversal. I think nobody really likes this feature 100% but
> > we could not find a better solution. I would consider it as a
> > nice-to-have improvement during a notebook/debugging session.
> >
> > I would accept avoiding whitelisting/blacklisting if the feature is
> > disabled by default. And we make the merged properties available in a
> > separate TableSourceFactory#Context#getExecutionOptions as Danny
> proposed.
> >
> > What do you think?
> >
> > Thanks,
> > Timo
> >
> >
> > On 06.04.20 09:59, Aljoscha Krettek wrote:
> > > The reason I'm saying it should be disabled by default is that this
> uses
> > > hint syntax, and hints should really not change query semantics.
> > >
> > > I'm quite strongly against hints that change query semantics, but if we
> > > disable this by default I would be (reluctantly) OK with the feature.
> > > Companies that create deployments or set up the SQL environment for
> > > users can enable the feature if they want.
> > >
> > > But yes, I also agree that we don't need whitelisting/blacklisting,
> > > which makes this a lot easier to do.
> > >
> > > Best,
> > > Aljoscha
> > >
> > > On 06.04.20 04:27, Danny Chan wrote:
> > >> Hi, everyone ~
> > >>
> > >> @Aljoscha @Timo
> > >>
> > >>> I think we're designing ourselves into ever more complicated corners
> > >> here
> > >>
> > >> I kindly agree that, personally didn't see strong reasons why we
> > >> should limit on each connector properties:
> > >>
> > >> • we can define any table options for CREATE TABLE, why we treat the
> > >> dynamic options differently, we never consider any security problems
> > >> when create table, we should not either for dynamic table options
> > >> • If we do not have whitelist properties or blacklist properties, the
> > >> table source creation work would be much easier, just used the merged
> > >> options. There is no need to modify each connector to decide which
> > >> options could be overridden and how we merge them(the merge work is
> > >> redundant).
> > >> • @Timo, how about we support another interface
> > >> `TableSourceFactory#Context.getExecutionOptions`, we always use this
> > >> interface to get the options to create our table source. There is no
> > >> need to copy the catalog table itselt, we just need to generate our
> > >> Context correctly.
> > >> • @Aljoscha I agree to have a global config option, but I disagree to
> > >> default disable it, a global default config would break the user
> > >> experience too much, especially when user want to modify the options
> > >> in a ad-hoc way.
> > >>
> > >>
> > >>
> > >> I suggest to remove `TableSourceFactory#supportedHintOptions` or
> > >> `TableSourceFactory#forbiddenHintOptions` based on the fact that we
> > >> does not have black/white list for CREATE TABLE at all at lease for
> > >> current codebase.
> > >>
> > >>
> > >> @Timo (i have replied offline but allows to represent it here again)
> > >>
> > >> The `TableSourceFactory#supportedHintOptions` doesn't work well for 3
> > >> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> > >> 1. For key with wildcard, like connector.property.* , use a blacklist
> > >> make us have the ability to disable some of the keys under that, i.e.
> > >> connector.property.key1 , a whitelist can only match with prefix
> > >>
> > >> 2. We want the connectors to have the ability to disable format type
> > >> switch format.type but allows all the other properties, e.g. format.*
> > >> without format.type(let's call it SET_B), if we use the whitelist, we
> > >> have to enumerate all the specific format keys start with format
> > >> (SET_B), but with the old connector factories, we have no idea what
> > >> specific format keys it supports(there is either a format.* or
> nothing).
> > >>
> > >> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the
> > >> blacklist and whitelist has the same expressiveness, use blacklist
> > >> makes the code not too verbose to enumerate all the duplicate keys
> > >> with #supportedKeys .(Not very strong reason, but i think as a
> > >> connector developer, it makes sense)
> > >>
> > >> Best,
> > >> Danny Chan
> > >> 在 2020年4月3日 +0800 PM11:28,Timo Walther <tw...@apache.org>,写道:
> > >>> Hi everyone,
> > >>>
> > >>> @Aljoscha: I disagree with your approach because a `Catalog` can
> return
> > >>> a custom factory that is not using any properties. The hinting must
> be
> > >>> transparent to a factory. We should NOT modify the metadata
> > >>> `CatalogTable` at any point in time after the catalog.
> > >>>
> > >>> @Danny, @Jingsong: How about we stick to the original design that we
> > >>> wanted to vote on but use:
> > >>>
> > >>> Set<String> supportedHintProperties()
> > >>>
> > >>> This fits better to the old factory design. And for the new FLIP-95
> > >>> factories we will use `ConfigOption` and provide good utilities for
> > >>> merging with hints etc.
> > >>>
> > >>> We can allow `"format.*"` in `supportedHintProperties()` to allow
> > >>> hinting in formats.
> > >>>
> > >>> What do you think?
> > >>>
> > >>> Regards,
> > >>> Timo
> > >>>
> > >>>
> > >>> On 02.04.20 16:24, Aljoscha Krettek wrote:
> > >>>> I think we're designing ourselves into ever more complicated corners
> > >>>> here. Maybe we need to take a step back and reconsider. What would
> you
> > >>>> think about this (somewhat) simpler proposal:
> > >>>>
> > >>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
> > >>>> CONNECTOR_PROPERTIES, depending on what naming we want to have for
> > this
> > >>>> in the future. This will simply overwrite all connector properties,
> > the
> > >>>> table factories don't know about hints but simply work with the
> > >>>> properties that they are given
> > >>>>
> > >>>> - this special hint is disabled by default and can be activated
> with a
> > >>>> global option "foo.bazzle.connector-hints" (or something like this)
> > >>>> which has a warning that describes that this can change query
> > semantics
> > >>>> etc.
> > >>>>
> > >>>> That's it. This makes connector implementations a lot easier while
> > >>>> still
> > >>>> allowing inline configuration.
> > >>>>
> > >>>> I still don't like using hint syntax at all for this, because I
> > >>>> strongly
> > >>>> maintain that hints should not change query syntax. In general using
> > >>>> hints should be kept to a minimum because they usually point to
> > >>>> shortcomings in the system.
> > >>>>
> > >>>> Best,
> > >>>> Aljoscha
> > >>>>
> > >>>> On 02.04.20 06:06, Jingsong Li wrote:
> > >>>>> Hi Dawid,
> > >>>>>
> > >>>>>> When a factory is instantiated it has access to the CatalogTable,
> > >>>>> therefore it has access to all the original properties. In turn it
> > >>>>> knows
> > >>>>> the original format and can call
> > FormatFactory#supportedHintOptions().
> > >>>>>
> > >>>>> Factory can only get CatalogTable when creating source or sink,
> > >>>>> right? IIUC, TableFactory may be stateless too.
> > >>>>> When invoking SourceFactory#supportedHintOptions(), it can not
> > >>>>> get CatalogTable, so it is impossible to create FormatFactory?
> > >>>>>
> > >>>>> Best,
> > >>>>> Jingsong Lee
> > >>>>>
> > >>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yu...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
> > >>>>>>
> > >>>>>> 1. Regarding the motivation:
> > >>>>>>
> > >>>>>> I agree it's not a good suggested way based on the fact that we
> have
> > >>>>>> better solution, but i think we can support override that as long
> > >>>>>> as it
> > >>>>>> exists as one of the the table options. I would remove if from the
> > >>>>>> motication part.
> > >>>>>>
> > >>>>>> 2. The options passes around during sql-to-rel conversion, right
> > >>>>>> after we
> > >>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink), this
> > is
> > >>>>>> indeed a push way method at least in the RelOptTable layer, then
> > >>>>>> we hand
> > >>>>>> over the options to TableSourceFactory with our own context, which
> > is
> > >>>>>> fine
> > >>>>>> becuase TableSourceFactory#Context is the contact to pass around
> > >>>>>> these
> > >>>>>> table-about variables.
> > >>>>>>
> > >>>>>> 3. "We should not end up with an extreme example where we can
> > >>>>>> change the
> > >>>>>> connector type", i totally agree that, and i have listed the
> > >>>>>> "connector.type" as forbidden attribute in the WIKI. As for the
> > >>>>>> format, i
> > >>>>>> think the connector itself can/should control whether to override
> > the
> > >>>>>> "format.type", that is one of the reason i change the
> > >>>>>> TableSourceFactory#supportedHintOpitons to
> > >>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
> > >>>>>> limit the
> > >>>>>> format keys we want conveniently.
> > >>>>>>
> > >>>>>> 4. SQL Hints syntax.
> > >>>>>>
> > >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING (not
> > >>>>>>> sure
> > >>>>>> if it is equivalent to string_literal).
> > >>>>>>
> > >>>>>> I disagree, we at least should keep sync with our DDL: use the
> > string
> > >>>>>> literal as the key. We did also support the simple identifier
> > because
> > >>>>>> this
> > >>>>>> is the common hint syntax from Calcite, it does not hurt anything
> > for
> > >>>>>> the
> > >>>>>> OPTIONS hint, the unsupported keys would validate fails.(If you
> > think
> > >>>>>> that
> > >>>>>> may cause some confuse, i can make the syntax pluggable for each
> > >>>>>> hint in
> > >>>>>> CALCITE 1.23)
> > >>>>>>
> > >>>>>> We only supports OPTIONS hint in the FLIP, and i have changed the
> > >>>>>> title to
> > >>>>>> "Supports dynamic table options", would make it more clear in the
> > >>>>>> WIKI.
> > >>>>>>
> > >>>>>> 5. Yes, we also have this concerns from our offline discussion,
> > >>>>>> that is
> > >>>>>> one of the reason, why i change the
> > >>>>>> TableSourceFactory#supportedHintOpitons
> > >>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose
> > >>>>>> Set<String>
> > >>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
> > >>>>>>
> > >>>>>>     - The wildcard is still not descriptive, we can still not
> > >>>>>> forbidden one
> > >>>>>> of the properties among the wildcard properties, we can not enable
> > or
> > >>>>>> disable them totally
> > >>>>>>     - ConfigOption is our new structure for keys, and it does not
> > >>>>>> support
> > >>>>>> wildcard yet.
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Danny Chan
> > >>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
> > >>>>>> <dw...@apache.org>,写道:
> > >>>>>>> Hi,
> > >>>>>>> Few comments from my side:
> > >>>>>>> 1. Regarding the motivation:
> > >>>>>>> I think the example with changing the update-mode is not a good
> > >>>>>>> one. In
> > >>>>>> the long term this should be done with EMIT CHANGELOG (discussed
> in
> > >>>>>> FLIP-105).
> > >>>>>>> Nitpicking: I would mention that it is rather for
> debugging/ad-hoc
> > >>>>>> solution. I think this should not be a recommended way for
> > production
> > >>>>>> use
> > >>>>>> cases as it bypasses the Catalog, which should be the source of
> > >>>>>> truth.
> > >>>>>>> 2. I could not understand how the additional options will be
> > >>>>>>> passed to
> > >>>>>> the TableSourceFactory. Could you elaborate a bit more on that? I
> > see
> > >>>>>> there
> > >>>>>> is a Context interface that gives the options. But cannot find a
> way
> > >>>>>> to get
> > >>>>>> the context itself in the factory. Moreover I think it would make
> > >>>>>> more
> > >>>>>> sense to have rather a push based approach here. Something like
> > >>>>>> applyOptions(ReadableConfig) method.
> > >>>>>>> 3. As for the concerns Jingsong raised in the voting thread. I
> > >>>>>>> think it
> > >>>>>> is not a big problem, but I agree this should be also described. I
> > >>>>>> disagree
> > >>>>>> with "Connector don't know format information in TableFactory
> before
> > >>>>>> obtains real properties, so it can not list any format
> > >>>>>> `supportedHintOptions`".
> > >>>>>>> When a factory is instantiated it has access to the CatalogTable,
> > >>>>>> therefore it has access to all the original properties. In turn it
> > >>>>>> knows
> > >>>>>> the original format and can call
> > >>>>>> FormatFactory#supportedHintOptions().
> > >>>>>>> The only case when this would not work would be if we allow
> > changing
> > >>>>>>> the
> > >>>>>> format of the Table (e.g. from avro to parquet), which does not
> > sound
> > >>>>>> like
> > >>>>>> a good idea to me. I think this feature should not end up as a way
> > to
> > >>>>>> declare a whole table inline in a SQL query, but should rather be
> a
> > >>>>>> simple
> > >>>>>> way for debugging queries. We should not end up with an extreme
> > >>>>>> example
> > >>>>>> where we do:
> > >>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
> > >>>>>> 'format.type' = 'json', ....) */
> > >>>>>>> 4. SQL Hints syntax.
> > >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING (not
> > >>>>>>> sure
> > >>>>>> if it is equivalent to string_literal). I think we should not use
> > >>>>>> simple_identifier because this implies that we cannot use e.g. any
> > >>>>>> SQL
> > >>>>>> keywords. Anyway it has nothing to do with identifiers. If I am
> not
> > >>>>>> mistaken it is also how the options in the CREATE statement are
> > >>>>>> implemented.
> > >>>>>>> What is the purpose of the remaining hint_item:
> hint_name(hint_opt
> > [
> > >>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
> > >>>>>> feeling it
> > >>>>>> does also suggests to support the whole Apache Calcite hint system
> > >>>>>> without
> > >>>>>> specifying that explicitly. Is the intention of the FLIP to
> support
> > >>>>>> choosing e.g. JOIN strategies through hints already? If it is so
> it
> > >>>>>> should
> > >>>>>> be mentioned in the FLIP, imo.
> > >>>>>>> 5. I think something does not work around the
> > >>>>>>> supportedHintOptions and
> > >>>>>> wildcards. How do you want to represent a wildcard key as a
> > >>>>>> ConfigOption? I
> > >>>>>> am not sure about that, just a though, maybe it make sense to have
> > >>>>>> rather
> > >>>>>> Set<String> supportedHintOptionKeys()?
> > >>>>>>> Best,
> > >>>>>>> Dawid
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >>
> >
> >
>

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Kurt Young <yk...@gmail.com>.
Sounds like a reasonable compromise, disabling this feature by default is a
way to protect
the vulnerability, and we can simplify the design quite a lot. We can
gather some users'
feedback to see whether further protections are necessary in the future.

Best,
Kurt


On Mon, Apr 6, 2020 at 11:49 PM Timo Walther <tw...@apache.org> wrote:

> I agree with Aljoscha. The length of this thread shows that this is
> highly controversal. I think nobody really likes this feature 100% but
> we could not find a better solution. I would consider it as a
> nice-to-have improvement during a notebook/debugging session.
>
> I would accept avoiding whitelisting/blacklisting if the feature is
> disabled by default. And we make the merged properties available in a
> separate TableSourceFactory#Context#getExecutionOptions as Danny proposed.
>
> What do you think?
>
> Thanks,
> Timo
>
>
> On 06.04.20 09:59, Aljoscha Krettek wrote:
> > The reason I'm saying it should be disabled by default is that this uses
> > hint syntax, and hints should really not change query semantics.
> >
> > I'm quite strongly against hints that change query semantics, but if we
> > disable this by default I would be (reluctantly) OK with the feature.
> > Companies that create deployments or set up the SQL environment for
> > users can enable the feature if they want.
> >
> > But yes, I also agree that we don't need whitelisting/blacklisting,
> > which makes this a lot easier to do.
> >
> > Best,
> > Aljoscha
> >
> > On 06.04.20 04:27, Danny Chan wrote:
> >> Hi, everyone ~
> >>
> >> @Aljoscha @Timo
> >>
> >>> I think we're designing ourselves into ever more complicated corners
> >> here
> >>
> >> I kindly agree that, personally didn't see strong reasons why we
> >> should limit on each connector properties:
> >>
> >> • we can define any table options for CREATE TABLE, why we treat the
> >> dynamic options differently, we never consider any security problems
> >> when create table, we should not either for dynamic table options
> >> • If we do not have whitelist properties or blacklist properties, the
> >> table source creation work would be much easier, just used the merged
> >> options. There is no need to modify each connector to decide which
> >> options could be overridden and how we merge them(the merge work is
> >> redundant).
> >> • @Timo, how about we support another interface
> >> `TableSourceFactory#Context.getExecutionOptions`, we always use this
> >> interface to get the options to create our table source. There is no
> >> need to copy the catalog table itselt, we just need to generate our
> >> Context correctly.
> >> • @Aljoscha I agree to have a global config option, but I disagree to
> >> default disable it, a global default config would break the user
> >> experience too much, especially when user want to modify the options
> >> in a ad-hoc way.
> >>
> >>
> >>
> >> I suggest to remove `TableSourceFactory#supportedHintOptions` or
> >> `TableSourceFactory#forbiddenHintOptions` based on the fact that we
> >> does not have black/white list for CREATE TABLE at all at lease for
> >> current codebase.
> >>
> >>
> >> @Timo (i have replied offline but allows to represent it here again)
> >>
> >> The `TableSourceFactory#supportedHintOptions` doesn't work well for 3
> >> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> >> 1. For key with wildcard, like connector.property.* , use a blacklist
> >> make us have the ability to disable some of the keys under that, i.e.
> >> connector.property.key1 , a whitelist can only match with prefix
> >>
> >> 2. We want the connectors to have the ability to disable format type
> >> switch format.type but allows all the other properties, e.g. format.*
> >> without format.type(let's call it SET_B), if we use the whitelist, we
> >> have to enumerate all the specific format keys start with format
> >> (SET_B), but with the old connector factories, we have no idea what
> >> specific format keys it supports(there is either a format.* or nothing).
> >>
> >> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the
> >> blacklist and whitelist has the same expressiveness, use blacklist
> >> makes the code not too verbose to enumerate all the duplicate keys
> >> with #supportedKeys .(Not very strong reason, but i think as a
> >> connector developer, it makes sense)
> >>
> >> Best,
> >> Danny Chan
> >> 在 2020年4月3日 +0800 PM11:28,Timo Walther <tw...@apache.org>,写道:
> >>> Hi everyone,
> >>>
> >>> @Aljoscha: I disagree with your approach because a `Catalog` can return
> >>> a custom factory that is not using any properties. The hinting must be
> >>> transparent to a factory. We should NOT modify the metadata
> >>> `CatalogTable` at any point in time after the catalog.
> >>>
> >>> @Danny, @Jingsong: How about we stick to the original design that we
> >>> wanted to vote on but use:
> >>>
> >>> Set<String> supportedHintProperties()
> >>>
> >>> This fits better to the old factory design. And for the new FLIP-95
> >>> factories we will use `ConfigOption` and provide good utilities for
> >>> merging with hints etc.
> >>>
> >>> We can allow `"format.*"` in `supportedHintProperties()` to allow
> >>> hinting in formats.
> >>>
> >>> What do you think?
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>> On 02.04.20 16:24, Aljoscha Krettek wrote:
> >>>> I think we're designing ourselves into ever more complicated corners
> >>>> here. Maybe we need to take a step back and reconsider. What would you
> >>>> think about this (somewhat) simpler proposal:
> >>>>
> >>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
> >>>> CONNECTOR_PROPERTIES, depending on what naming we want to have for
> this
> >>>> in the future. This will simply overwrite all connector properties,
> the
> >>>> table factories don't know about hints but simply work with the
> >>>> properties that they are given
> >>>>
> >>>> - this special hint is disabled by default and can be activated with a
> >>>> global option "foo.bazzle.connector-hints" (or something like this)
> >>>> which has a warning that describes that this can change query
> semantics
> >>>> etc.
> >>>>
> >>>> That's it. This makes connector implementations a lot easier while
> >>>> still
> >>>> allowing inline configuration.
> >>>>
> >>>> I still don't like using hint syntax at all for this, because I
> >>>> strongly
> >>>> maintain that hints should not change query syntax. In general using
> >>>> hints should be kept to a minimum because they usually point to
> >>>> shortcomings in the system.
> >>>>
> >>>> Best,
> >>>> Aljoscha
> >>>>
> >>>> On 02.04.20 06:06, Jingsong Li wrote:
> >>>>> Hi Dawid,
> >>>>>
> >>>>>> When a factory is instantiated it has access to the CatalogTable,
> >>>>> therefore it has access to all the original properties. In turn it
> >>>>> knows
> >>>>> the original format and can call
> FormatFactory#supportedHintOptions().
> >>>>>
> >>>>> Factory can only get CatalogTable when creating source or sink,
> >>>>> right? IIUC, TableFactory may be stateless too.
> >>>>> When invoking SourceFactory#supportedHintOptions(), it can not
> >>>>> get CatalogTable, so it is impossible to create FormatFactory?
> >>>>>
> >>>>> Best,
> >>>>> Jingsong Lee
> >>>>>
> >>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yu...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
> >>>>>>
> >>>>>> 1. Regarding the motivation:
> >>>>>>
> >>>>>> I agree it's not a good suggested way based on the fact that we have
> >>>>>> better solution, but i think we can support override that as long
> >>>>>> as it
> >>>>>> exists as one of the the table options. I would remove if from the
> >>>>>> motication part.
> >>>>>>
> >>>>>> 2. The options passes around during sql-to-rel conversion, right
> >>>>>> after we
> >>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink), this
> is
> >>>>>> indeed a push way method at least in the RelOptTable layer, then
> >>>>>> we hand
> >>>>>> over the options to TableSourceFactory with our own context, which
> is
> >>>>>> fine
> >>>>>> becuase TableSourceFactory#Context is the contact to pass around
> >>>>>> these
> >>>>>> table-about variables.
> >>>>>>
> >>>>>> 3. "We should not end up with an extreme example where we can
> >>>>>> change the
> >>>>>> connector type", i totally agree that, and i have listed the
> >>>>>> "connector.type" as forbidden attribute in the WIKI. As for the
> >>>>>> format, i
> >>>>>> think the connector itself can/should control whether to override
> the
> >>>>>> "format.type", that is one of the reason i change the
> >>>>>> TableSourceFactory#supportedHintOpitons to
> >>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
> >>>>>> limit the
> >>>>>> format keys we want conveniently.
> >>>>>>
> >>>>>> 4. SQL Hints syntax.
> >>>>>>
> >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING (not
> >>>>>>> sure
> >>>>>> if it is equivalent to string_literal).
> >>>>>>
> >>>>>> I disagree, we at least should keep sync with our DDL: use the
> string
> >>>>>> literal as the key. We did also support the simple identifier
> because
> >>>>>> this
> >>>>>> is the common hint syntax from Calcite, it does not hurt anything
> for
> >>>>>> the
> >>>>>> OPTIONS hint, the unsupported keys would validate fails.(If you
> think
> >>>>>> that
> >>>>>> may cause some confuse, i can make the syntax pluggable for each
> >>>>>> hint in
> >>>>>> CALCITE 1.23)
> >>>>>>
> >>>>>> We only supports OPTIONS hint in the FLIP, and i have changed the
> >>>>>> title to
> >>>>>> "Supports dynamic table options", would make it more clear in the
> >>>>>> WIKI.
> >>>>>>
> >>>>>> 5. Yes, we also have this concerns from our offline discussion,
> >>>>>> that is
> >>>>>> one of the reason, why i change the
> >>>>>> TableSourceFactory#supportedHintOpitons
> >>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose
> >>>>>> Set<String>
> >>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
> >>>>>>
> >>>>>>     - The wildcard is still not descriptive, we can still not
> >>>>>> forbidden one
> >>>>>> of the properties among the wildcard properties, we can not enable
> or
> >>>>>> disable them totally
> >>>>>>     - ConfigOption is our new structure for keys, and it does not
> >>>>>> support
> >>>>>> wildcard yet.
> >>>>>>
> >>>>>> Best,
> >>>>>> Danny Chan
> >>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
> >>>>>> <dw...@apache.org>,写道:
> >>>>>>> Hi,
> >>>>>>> Few comments from my side:
> >>>>>>> 1. Regarding the motivation:
> >>>>>>> I think the example with changing the update-mode is not a good
> >>>>>>> one. In
> >>>>>> the long term this should be done with EMIT CHANGELOG (discussed in
> >>>>>> FLIP-105).
> >>>>>>> Nitpicking: I would mention that it is rather for debugging/ad-hoc
> >>>>>> solution. I think this should not be a recommended way for
> production
> >>>>>> use
> >>>>>> cases as it bypasses the Catalog, which should be the source of
> >>>>>> truth.
> >>>>>>> 2. I could not understand how the additional options will be
> >>>>>>> passed to
> >>>>>> the TableSourceFactory. Could you elaborate a bit more on that? I
> see
> >>>>>> there
> >>>>>> is a Context interface that gives the options. But cannot find a way
> >>>>>> to get
> >>>>>> the context itself in the factory. Moreover I think it would make
> >>>>>> more
> >>>>>> sense to have rather a push based approach here. Something like
> >>>>>> applyOptions(ReadableConfig) method.
> >>>>>>> 3. As for the concerns Jingsong raised in the voting thread. I
> >>>>>>> think it
> >>>>>> is not a big problem, but I agree this should be also described. I
> >>>>>> disagree
> >>>>>> with "Connector don't know format information in TableFactory before
> >>>>>> obtains real properties, so it can not list any format
> >>>>>> `supportedHintOptions`".
> >>>>>>> When a factory is instantiated it has access to the CatalogTable,
> >>>>>> therefore it has access to all the original properties. In turn it
> >>>>>> knows
> >>>>>> the original format and can call
> >>>>>> FormatFactory#supportedHintOptions().
> >>>>>>> The only case when this would not work would be if we allow
> changing
> >>>>>>> the
> >>>>>> format of the Table (e.g. from avro to parquet), which does not
> sound
> >>>>>> like
> >>>>>> a good idea to me. I think this feature should not end up as a way
> to
> >>>>>> declare a whole table inline in a SQL query, but should rather be a
> >>>>>> simple
> >>>>>> way for debugging queries. We should not end up with an extreme
> >>>>>> example
> >>>>>> where we do:
> >>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
> >>>>>> 'format.type' = 'json', ....) */
> >>>>>>> 4. SQL Hints syntax.
> >>>>>>> I think the k and v in the hint_item should be QUOTED_STRING (not
> >>>>>>> sure
> >>>>>> if it is equivalent to string_literal). I think we should not use
> >>>>>> simple_identifier because this implies that we cannot use e.g. any
> >>>>>> SQL
> >>>>>> keywords. Anyway it has nothing to do with identifiers. If I am not
> >>>>>> mistaken it is also how the options in the CREATE statement are
> >>>>>> implemented.
> >>>>>>> What is the purpose of the remaining hint_item: hint_name(hint_opt
> [
> >>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
> >>>>>> feeling it
> >>>>>> does also suggests to support the whole Apache Calcite hint system
> >>>>>> without
> >>>>>> specifying that explicitly. Is the intention of the FLIP to support
> >>>>>> choosing e.g. JOIN strategies through hints already? If it is so it
> >>>>>> should
> >>>>>> be mentioned in the FLIP, imo.
> >>>>>>> 5. I think something does not work around the
> >>>>>>> supportedHintOptions and
> >>>>>> wildcards. How do you want to represent a wildcard key as a
> >>>>>> ConfigOption? I
> >>>>>> am not sure about that, just a though, maybe it make sense to have
> >>>>>> rather
> >>>>>> Set<String> supportedHintOptionKeys()?
> >>>>>>> Best,
> >>>>>>> Dawid
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>
>
>

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Timo Walther <tw...@apache.org>.
I agree with Aljoscha. The length of this thread shows that this is 
highly controversal. I think nobody really likes this feature 100% but 
we could not find a better solution. I would consider it as a 
nice-to-have improvement during a notebook/debugging session.

I would accept avoiding whitelisting/blacklisting if the feature is 
disabled by default. And we make the merged properties available in a 
separate TableSourceFactory#Context#getExecutionOptions as Danny proposed.

What do you think?

Thanks,
Timo


On 06.04.20 09:59, Aljoscha Krettek wrote:
> The reason I'm saying it should be disabled by default is that this uses 
> hint syntax, and hints should really not change query semantics.
> 
> I'm quite strongly against hints that change query semantics, but if we 
> disable this by default I would be (reluctantly) OK with the feature. 
> Companies that create deployments or set up the SQL environment for 
> users can enable the feature if they want.
> 
> But yes, I also agree that we don't need whitelisting/blacklisting, 
> which makes this a lot easier to do.
> 
> Best,
> Aljoscha
> 
> On 06.04.20 04:27, Danny Chan wrote:
>> Hi, everyone ~
>>
>> @Aljoscha @Timo
>>
>>> I think we're designing ourselves into ever more complicated corners
>> here
>>
>> I kindly agree that, personally didn't see strong reasons why we 
>> should limit on each connector properties:
>>
>> • we can define any table options for CREATE TABLE, why we treat the 
>> dynamic options differently, we never consider any security problems 
>> when create table, we should not either for dynamic table options
>> • If we do not have whitelist properties or blacklist properties, the 
>> table source creation work would be much easier, just used the merged 
>> options. There is no need to modify each connector to decide which 
>> options could be overridden and how we merge them(the merge work is 
>> redundant).
>> • @Timo, how about we support another interface 
>> `TableSourceFactory#Context.getExecutionOptions`, we always use this 
>> interface to get the options to create our table source. There is no 
>> need to copy the catalog table itselt, we just need to generate our 
>> Context correctly.
>> • @Aljoscha I agree to have a global config option, but I disagree to 
>> default disable it, a global default config would break the user 
>> experience too much, especially when user want to modify the options 
>> in a ad-hoc way.
>>
>>
>>
>> I suggest to remove `TableSourceFactory#supportedHintOptions` or 
>> `TableSourceFactory#forbiddenHintOptions` based on the fact that we 
>> does not have black/white list for CREATE TABLE at all at lease for 
>> current codebase.
>>
>>
>> @Timo (i have replied offline but allows to represent it here again)
>>
>> The `TableSourceFactory#supportedHintOptions` doesn't work well for 3 
>> reasons compared to `TableSourceFactory#forbiddenHintOptions`:
>> 1. For key with wildcard, like connector.property.* , use a blacklist 
>> make us have the ability to disable some of the keys under that, i.e. 
>> connector.property.key1 , a whitelist can only match with prefix
>>
>> 2. We want the connectors to have the ability to disable format type 
>> switch format.type but allows all the other properties, e.g. format.* 
>> without format.type(let's call it SET_B), if we use the whitelist, we 
>> have to enumerate all the specific format keys start with format 
>> (SET_B), but with the old connector factories, we have no idea what 
>> specific format keys it supports(there is either a format.* or nothing).
>>
>> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the 
>> blacklist and whitelist has the same expressiveness, use blacklist 
>> makes the code not too verbose to enumerate all the duplicate keys 
>> with #supportedKeys .(Not very strong reason, but i think as a 
>> connector developer, it makes sense)
>>
>> Best,
>> Danny Chan
>> 在 2020年4月3日 +0800 PM11:28,Timo Walther <tw...@apache.org>,写道:
>>> Hi everyone,
>>>
>>> @Aljoscha: I disagree with your approach because a `Catalog` can return
>>> a custom factory that is not using any properties. The hinting must be
>>> transparent to a factory. We should NOT modify the metadata
>>> `CatalogTable` at any point in time after the catalog.
>>>
>>> @Danny, @Jingsong: How about we stick to the original design that we
>>> wanted to vote on but use:
>>>
>>> Set<String> supportedHintProperties()
>>>
>>> This fits better to the old factory design. And for the new FLIP-95
>>> factories we will use `ConfigOption` and provide good utilities for
>>> merging with hints etc.
>>>
>>> We can allow `"format.*"` in `supportedHintProperties()` to allow
>>> hinting in formats.
>>>
>>> What do you think?
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 02.04.20 16:24, Aljoscha Krettek wrote:
>>>> I think we're designing ourselves into ever more complicated corners
>>>> here. Maybe we need to take a step back and reconsider. What would you
>>>> think about this (somewhat) simpler proposal:
>>>>
>>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
>>>> CONNECTOR_PROPERTIES, depending on what naming we want to have for this
>>>> in the future. This will simply overwrite all connector properties, the
>>>> table factories don't know about hints but simply work with the
>>>> properties that they are given
>>>>
>>>> - this special hint is disabled by default and can be activated with a
>>>> global option "foo.bazzle.connector-hints" (or something like this)
>>>> which has a warning that describes that this can change query semantics
>>>> etc.
>>>>
>>>> That's it. This makes connector implementations a lot easier while 
>>>> still
>>>> allowing inline configuration.
>>>>
>>>> I still don't like using hint syntax at all for this, because I 
>>>> strongly
>>>> maintain that hints should not change query syntax. In general using
>>>> hints should be kept to a minimum because they usually point to
>>>> shortcomings in the system.
>>>>
>>>> Best,
>>>> Aljoscha
>>>>
>>>> On 02.04.20 06:06, Jingsong Li wrote:
>>>>> Hi Dawid,
>>>>>
>>>>>> When a factory is instantiated it has access to the CatalogTable,
>>>>> therefore it has access to all the original properties. In turn it 
>>>>> knows
>>>>> the original format and can call FormatFactory#supportedHintOptions().
>>>>>
>>>>> Factory can only get CatalogTable when creating source or sink,
>>>>> right? IIUC, TableFactory may be stateless too.
>>>>> When invoking SourceFactory#supportedHintOptions(), it can not
>>>>> get CatalogTable, so it is impossible to create FormatFactory?
>>>>>
>>>>> Best,
>>>>> Jingsong Lee
>>>>>
>>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yu...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
>>>>>>
>>>>>> 1. Regarding the motivation:
>>>>>>
>>>>>> I agree it's not a good suggested way based on the fact that we have
>>>>>> better solution, but i think we can support override that as long 
>>>>>> as it
>>>>>> exists as one of the the table options. I would remove if from the
>>>>>> motication part.
>>>>>>
>>>>>> 2. The options passes around during sql-to-rel conversion, right
>>>>>> after we
>>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink), this is
>>>>>> indeed a push way method at least in the RelOptTable layer, then 
>>>>>> we hand
>>>>>> over the options to TableSourceFactory with our own context, which is
>>>>>> fine
>>>>>> becuase TableSourceFactory#Context is the contact to pass around 
>>>>>> these
>>>>>> table-about variables.
>>>>>>
>>>>>> 3. "We should not end up with an extreme example where we can 
>>>>>> change the
>>>>>> connector type", i totally agree that, and i have listed the
>>>>>> "connector.type" as forbidden attribute in the WIKI. As for the
>>>>>> format, i
>>>>>> think the connector itself can/should control whether to override the
>>>>>> "format.type", that is one of the reason i change the
>>>>>> TableSourceFactory#supportedHintOpitons to
>>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
>>>>>> limit the
>>>>>> format keys we want conveniently.
>>>>>>
>>>>>> 4. SQL Hints syntax.
>>>>>>
>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING (not 
>>>>>>> sure
>>>>>> if it is equivalent to string_literal).
>>>>>>
>>>>>> I disagree, we at least should keep sync with our DDL: use the string
>>>>>> literal as the key. We did also support the simple identifier because
>>>>>> this
>>>>>> is the common hint syntax from Calcite, it does not hurt anything for
>>>>>> the
>>>>>> OPTIONS hint, the unsupported keys would validate fails.(If you think
>>>>>> that
>>>>>> may cause some confuse, i can make the syntax pluggable for each 
>>>>>> hint in
>>>>>> CALCITE 1.23)
>>>>>>
>>>>>> We only supports OPTIONS hint in the FLIP, and i have changed the
>>>>>> title to
>>>>>> "Supports dynamic table options", would make it more clear in the 
>>>>>> WIKI.
>>>>>>
>>>>>> 5. Yes, we also have this concerns from our offline discussion, 
>>>>>> that is
>>>>>> one of the reason, why i change the
>>>>>> TableSourceFactory#supportedHintOpitons
>>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose 
>>>>>> Set<String>
>>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
>>>>>>
>>>>>>     - The wildcard is still not descriptive, we can still not
>>>>>> forbidden one
>>>>>> of the properties among the wildcard properties, we can not enable or
>>>>>> disable them totally
>>>>>>     - ConfigOption is our new structure for keys, and it does not 
>>>>>> support
>>>>>> wildcard yet.
>>>>>>
>>>>>> Best,
>>>>>> Danny Chan
>>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
>>>>>> <dw...@apache.org>,写道:
>>>>>>> Hi,
>>>>>>> Few comments from my side:
>>>>>>> 1. Regarding the motivation:
>>>>>>> I think the example with changing the update-mode is not a good 
>>>>>>> one. In
>>>>>> the long term this should be done with EMIT CHANGELOG (discussed in
>>>>>> FLIP-105).
>>>>>>> Nitpicking: I would mention that it is rather for debugging/ad-hoc
>>>>>> solution. I think this should not be a recommended way for production
>>>>>> use
>>>>>> cases as it bypasses the Catalog, which should be the source of 
>>>>>> truth.
>>>>>>> 2. I could not understand how the additional options will be 
>>>>>>> passed to
>>>>>> the TableSourceFactory. Could you elaborate a bit more on that? I see
>>>>>> there
>>>>>> is a Context interface that gives the options. But cannot find a way
>>>>>> to get
>>>>>> the context itself in the factory. Moreover I think it would make 
>>>>>> more
>>>>>> sense to have rather a push based approach here. Something like
>>>>>> applyOptions(ReadableConfig) method.
>>>>>>> 3. As for the concerns Jingsong raised in the voting thread. I 
>>>>>>> think it
>>>>>> is not a big problem, but I agree this should be also described. I
>>>>>> disagree
>>>>>> with "Connector don't know format information in TableFactory before
>>>>>> obtains real properties, so it can not list any format
>>>>>> `supportedHintOptions`".
>>>>>>> When a factory is instantiated it has access to the CatalogTable,
>>>>>> therefore it has access to all the original properties. In turn it 
>>>>>> knows
>>>>>> the original format and can call 
>>>>>> FormatFactory#supportedHintOptions().
>>>>>>> The only case when this would not work would be if we allow changing
>>>>>>> the
>>>>>> format of the Table (e.g. from avro to parquet), which does not sound
>>>>>> like
>>>>>> a good idea to me. I think this feature should not end up as a way to
>>>>>> declare a whole table inline in a SQL query, but should rather be a
>>>>>> simple
>>>>>> way for debugging queries. We should not end up with an extreme 
>>>>>> example
>>>>>> where we do:
>>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
>>>>>> 'format.type' = 'json', ....) */
>>>>>>> 4. SQL Hints syntax.
>>>>>>> I think the k and v in the hint_item should be QUOTED_STRING (not 
>>>>>>> sure
>>>>>> if it is equivalent to string_literal). I think we should not use
>>>>>> simple_identifier because this implies that we cannot use e.g. any 
>>>>>> SQL
>>>>>> keywords. Anyway it has nothing to do with identifiers. If I am not
>>>>>> mistaken it is also how the options in the CREATE statement are
>>>>>> implemented.
>>>>>>> What is the purpose of the remaining hint_item: hint_name(hint_opt [
>>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
>>>>>> feeling it
>>>>>> does also suggests to support the whole Apache Calcite hint system
>>>>>> without
>>>>>> specifying that explicitly. Is the intention of the FLIP to support
>>>>>> choosing e.g. JOIN strategies through hints already? If it is so it
>>>>>> should
>>>>>> be mentioned in the FLIP, imo.
>>>>>>> 5. I think something does not work around the 
>>>>>>> supportedHintOptions and
>>>>>> wildcards. How do you want to represent a wildcard key as a
>>>>>> ConfigOption? I
>>>>>> am not sure about that, just a though, maybe it make sense to have
>>>>>> rather
>>>>>> Set<String> supportedHintOptionKeys()?
>>>>>>> Best,
>>>>>>> Dawid
>>>>>>
>>>>>
>>>>>
>>>
>>


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Aljoscha Krettek <al...@apache.org>.
The reason I'm saying it should be disabled by default is that this uses 
hint syntax, and hints should really not change query semantics.

I'm quite strongly against hints that change query semantics, but if we 
disable this by default I would be (reluctantly) OK with the feature. 
Companies that create deployments or set up the SQL environment for 
users can enable the feature if they want.

But yes, I also agree that we don't need whitelisting/blacklisting, 
which makes this a lot easier to do.

Best,
Aljoscha

On 06.04.20 04:27, Danny Chan wrote:
> Hi, everyone ~
> 
> @Aljoscha @Timo
> 
>> I think we're designing ourselves into ever more complicated corners
> here
> 
> I kindly agree that, personally didn't see strong reasons why we should limit on each connector properties:
> 
> • we can define any table options for CREATE TABLE, why we treat the dynamic options differently, we never consider any security problems when create table, we should not either for dynamic table options
> • If we do not have whitelist properties or blacklist properties, the table source creation work would be much easier, just used the merged options. There is no need to modify each connector to decide which options could be overridden and how we merge them(the merge work is redundant).
> • @Timo, how about we support another interface `TableSourceFactory#Context.getExecutionOptions`, we always use this interface to get the options to create our table source. There is no need to copy the catalog table itselt, we just need to generate our Context correctly.
> • @Aljoscha I agree to have a global config option, but I disagree to default disable it, a global default config would break the user experience too much, especially when user want to modify the options in a ad-hoc way.
> 
> 
> 
> I suggest to remove `TableSourceFactory#supportedHintOptions` or `TableSourceFactory#forbiddenHintOptions` based on the fact that we does not have black/white list for CREATE TABLE at all at lease for current codebase.
> 
> 
> @Timo (i have replied offline but allows to represent it here again)
> 
> The `TableSourceFactory#supportedHintOptions` doesn't work well for 3 reasons compared to `TableSourceFactory#forbiddenHintOptions`:
> 1. For key with wildcard, like connector.property.* , use a blacklist make us have the ability to disable some of the keys under that, i.e. connector.property.key1 , a whitelist can only match with prefix
> 
> 2. We want the connectors to have the ability to disable format type switch format.type but allows all the other properties, e.g. format.* without format.type(let's call it SET_B), if we use the whitelist, we have to enumerate all the specific format keys start with format (SET_B), but with the old connector factories, we have no idea what specific format keys it supports(there is either a format.* or nothing).
> 
> 3. Except the cases for 1 and 2, for normal keys(no wildcard), the blacklist and whitelist has the same expressiveness, use blacklist makes the code not too verbose to enumerate all the duplicate keys with #supportedKeys .(Not very strong reason, but i think as a connector developer, it makes sense)
> 
> Best,
> Danny Chan
> 在 2020年4月3日 +0800 PM11:28,Timo Walther <tw...@apache.org>,写道:
>> Hi everyone,
>>
>> @Aljoscha: I disagree with your approach because a `Catalog` can return
>> a custom factory that is not using any properties. The hinting must be
>> transparent to a factory. We should NOT modify the metadata
>> `CatalogTable` at any point in time after the catalog.
>>
>> @Danny, @Jingsong: How about we stick to the original design that we
>> wanted to vote on but use:
>>
>> Set<String> supportedHintProperties()
>>
>> This fits better to the old factory design. And for the new FLIP-95
>> factories we will use `ConfigOption` and provide good utilities for
>> merging with hints etc.
>>
>> We can allow `"format.*"` in `supportedHintProperties()` to allow
>> hinting in formats.
>>
>> What do you think?
>>
>> Regards,
>> Timo
>>
>>
>> On 02.04.20 16:24, Aljoscha Krettek wrote:
>>> I think we're designing ourselves into ever more complicated corners
>>> here. Maybe we need to take a step back and reconsider. What would you
>>> think about this (somewhat) simpler proposal:
>>>
>>> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
>>> CONNECTOR_PROPERTIES, depending on what naming we want to have for this
>>> in the future. This will simply overwrite all connector properties, the
>>> table factories don't know about hints but simply work with the
>>> properties that they are given
>>>
>>> - this special hint is disabled by default and can be activated with a
>>> global option "foo.bazzle.connector-hints" (or something like this)
>>> which has a warning that describes that this can change query semantics
>>> etc.
>>>
>>> That's it. This makes connector implementations a lot easier while still
>>> allowing inline configuration.
>>>
>>> I still don't like using hint syntax at all for this, because I strongly
>>> maintain that hints should not change query syntax. In general using
>>> hints should be kept to a minimum because they usually point to
>>> shortcomings in the system.
>>>
>>> Best,
>>> Aljoscha
>>>
>>> On 02.04.20 06:06, Jingsong Li wrote:
>>>> Hi Dawid,
>>>>
>>>>> When a factory is instantiated it has access to the CatalogTable,
>>>> therefore it has access to all the original properties. In turn it knows
>>>> the original format and can call FormatFactory#supportedHintOptions().
>>>>
>>>> Factory can only get CatalogTable when creating source or sink,
>>>> right? IIUC, TableFactory may be stateless too.
>>>> When invoking SourceFactory#supportedHintOptions(), it can not
>>>> get CatalogTable, so it is impossible to create FormatFactory?
>>>>
>>>> Best,
>>>> Jingsong Lee
>>>>
>>>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yu...@gmail.com> wrote:
>>>>
>>>>> Hi, Dawid, thanks so much for the detail suggestions ~
>>>>>
>>>>> 1. Regarding the motivation:
>>>>>
>>>>> I agree it's not a good suggested way based on the fact that we have
>>>>> better solution, but i think we can support override that as long as it
>>>>> exists as one of the the table options. I would remove if from the
>>>>> motication part.
>>>>>
>>>>> 2. The options passes around during sql-to-rel conversion, right
>>>>> after we
>>>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink), this is
>>>>> indeed a push way method at least in the RelOptTable layer, then we hand
>>>>> over the options to TableSourceFactory with our own context, which is
>>>>> fine
>>>>> becuase TableSourceFactory#Context is the contact to pass around these
>>>>> table-about variables.
>>>>>
>>>>> 3. "We should not end up with an extreme example where we can change the
>>>>> connector type", i totally agree that, and i have listed the
>>>>> "connector.type" as forbidden attribute in the WIKI. As for the
>>>>> format, i
>>>>> think the connector itself can/should control whether to override the
>>>>> "format.type", that is one of the reason i change the
>>>>> TableSourceFactory#supportedHintOpitons to
>>>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
>>>>> limit the
>>>>> format keys we want conveniently.
>>>>>
>>>>> 4. SQL Hints syntax.
>>>>>
>>>>>> I think the k and v in the hint_item should be QUOTED_STRING (not sure
>>>>> if it is equivalent to string_literal).
>>>>>
>>>>> I disagree, we at least should keep sync with our DDL: use the string
>>>>> literal as the key. We did also support the simple identifier because
>>>>> this
>>>>> is the common hint syntax from Calcite, it does not hurt anything for
>>>>> the
>>>>> OPTIONS hint, the unsupported keys would validate fails.(If you think
>>>>> that
>>>>> may cause some confuse, i can make the syntax pluggable for each hint in
>>>>> CALCITE 1.23)
>>>>>
>>>>> We only supports OPTIONS hint in the FLIP, and i have changed the
>>>>> title to
>>>>> "Supports dynamic table options", would make it more clear in the WIKI.
>>>>>
>>>>> 5. Yes, we also have this concerns from our offline discussion, that is
>>>>> one of the reason, why i change the
>>>>> TableSourceFactory#supportedHintOpitons
>>>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose Set<String>
>>>>> supportedHintOptionKeys() with wildcard for 2 reasons:
>>>>>
>>>>>     - The wildcard is still not descriptive, we can still not
>>>>> forbidden one
>>>>> of the properties among the wildcard properties, we can not enable or
>>>>> disable them totally
>>>>>     - ConfigOption is our new structure for keys, and it does not support
>>>>> wildcard yet.
>>>>>
>>>>> Best,
>>>>> Danny Chan
>>>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
>>>>> <dw...@apache.org>,写道:
>>>>>> Hi,
>>>>>> Few comments from my side:
>>>>>> 1. Regarding the motivation:
>>>>>> I think the example with changing the update-mode is not a good one. In
>>>>> the long term this should be done with EMIT CHANGELOG (discussed in
>>>>> FLIP-105).
>>>>>> Nitpicking: I would mention that it is rather for debugging/ad-hoc
>>>>> solution. I think this should not be a recommended way for production
>>>>> use
>>>>> cases as it bypasses the Catalog, which should be the source of truth.
>>>>>> 2. I could not understand how the additional options will be passed to
>>>>> the TableSourceFactory. Could you elaborate a bit more on that? I see
>>>>> there
>>>>> is a Context interface that gives the options. But cannot find a way
>>>>> to get
>>>>> the context itself in the factory. Moreover I think it would make more
>>>>> sense to have rather a push based approach here. Something like
>>>>> applyOptions(ReadableConfig) method.
>>>>>> 3. As for the concerns Jingsong raised in the voting thread. I think it
>>>>> is not a big problem, but I agree this should be also described. I
>>>>> disagree
>>>>> with "Connector don't know format information in TableFactory before
>>>>> obtains real properties, so it can not list any format
>>>>> `supportedHintOptions`".
>>>>>> When a factory is instantiated it has access to the CatalogTable,
>>>>> therefore it has access to all the original properties. In turn it knows
>>>>> the original format and can call FormatFactory#supportedHintOptions().
>>>>>> The only case when this would not work would be if we allow changing
>>>>>> the
>>>>> format of the Table (e.g. from avro to parquet), which does not sound
>>>>> like
>>>>> a good idea to me. I think this feature should not end up as a way to
>>>>> declare a whole table inline in a SQL query, but should rather be a
>>>>> simple
>>>>> way for debugging queries. We should not end up with an extreme example
>>>>> where we do:
>>>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
>>>>> 'format.type' = 'json', ....) */
>>>>>> 4. SQL Hints syntax.
>>>>>> I think the k and v in the hint_item should be QUOTED_STRING (not sure
>>>>> if it is equivalent to string_literal). I think we should not use
>>>>> simple_identifier because this implies that we cannot use e.g. any SQL
>>>>> keywords. Anyway it has nothing to do with identifiers. If I am not
>>>>> mistaken it is also how the options in the CREATE statement are
>>>>> implemented.
>>>>>> What is the purpose of the remaining hint_item: hint_name(hint_opt [
>>>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
>>>>> feeling it
>>>>> does also suggests to support the whole Apache Calcite hint system
>>>>> without
>>>>> specifying that explicitly. Is the intention of the FLIP to support
>>>>> choosing e.g. JOIN strategies through hints already? If it is so it
>>>>> should
>>>>> be mentioned in the FLIP, imo.
>>>>>> 5. I think something does not work around the supportedHintOptions and
>>>>> wildcards. How do you want to represent a wildcard key as a
>>>>> ConfigOption? I
>>>>> am not sure about that, just a though, maybe it make sense to have
>>>>> rather
>>>>> Set<String> supportedHintOptionKeys()?
>>>>>> Best,
>>>>>> Dawid
>>>>>
>>>>
>>>>
>>
> 


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Danny Chan <yu...@gmail.com>.
Hi, everyone ~

@Aljoscha @Timo

> I think we're designing ourselves into ever more complicated corners
here

I kindly agree that, personally didn't see strong reasons why we should limit on each connector properties:

• we can define any table options for CREATE TABLE, why we treat the dynamic options differently, we never consider any security problems when create table, we should not either for dynamic table options
• If we do not have whitelist properties or blacklist properties, the table source creation work would be much easier, just used the merged options. There is no need to modify each connector to decide which options could be overridden and how we merge them(the merge work is redundant).
• @Timo, how about we support another interface `TableSourceFactory#Context.getExecutionOptions`, we always use this interface to get the options to create our table source. There is no need to copy the catalog table itselt, we just need to generate our Context correctly.
• @Aljoscha I agree to have a global config option, but I disagree to default disable it, a global default config would break the user experience too much, especially when user want to modify the options in a ad-hoc way.



I suggest to remove `TableSourceFactory#supportedHintOptions` or `TableSourceFactory#forbiddenHintOptions` based on the fact that we does not have black/white list for CREATE TABLE at all at lease for current codebase.


@Timo (i have replied offline but allows to represent it here again)

The `TableSourceFactory#supportedHintOptions` doesn't work well for 3 reasons compared to `TableSourceFactory#forbiddenHintOptions`:
1. For key with wildcard, like connector.property.* , use a blacklist make us have the ability to disable some of the keys under that, i.e. connector.property.key1 , a whitelist can only match with prefix

2. We want the connectors to have the ability to disable format type switch format.type but allows all the other properties, e.g. format.* without format.type(let's call it SET_B), if we use the whitelist, we have to enumerate all the specific format keys start with format (SET_B), but with the old connector factories, we have no idea what specific format keys it supports(there is either a format.* or nothing).

3. Except the cases for 1 and 2, for normal keys(no wildcard), the blacklist and whitelist has the same expressiveness, use blacklist makes the code not too verbose to enumerate all the duplicate keys with #supportedKeys .(Not very strong reason, but i think as a connector developer, it makes sense)

Best,
Danny Chan
在 2020年4月3日 +0800 PM11:28,Timo Walther <tw...@apache.org>,写道:
> Hi everyone,
>
> @Aljoscha: I disagree with your approach because a `Catalog` can return
> a custom factory that is not using any properties. The hinting must be
> transparent to a factory. We should NOT modify the metadata
> `CatalogTable` at any point in time after the catalog.
>
> @Danny, @Jingsong: How about we stick to the original design that we
> wanted to vote on but use:
>
> Set<String> supportedHintProperties()
>
> This fits better to the old factory design. And for the new FLIP-95
> factories we will use `ConfigOption` and provide good utilities for
> merging with hints etc.
>
> We can allow `"format.*"` in `supportedHintProperties()` to allow
> hinting in formats.
>
> What do you think?
>
> Regards,
> Timo
>
>
> On 02.04.20 16:24, Aljoscha Krettek wrote:
> > I think we're designing ourselves into ever more complicated corners
> > here. Maybe we need to take a step back and reconsider. What would you
> > think about this (somewhat) simpler proposal:
> >
> > - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
> > CONNECTOR_PROPERTIES, depending on what naming we want to have for this
> > in the future. This will simply overwrite all connector properties, the
> > table factories don't know about hints but simply work with the
> > properties that they are given
> >
> > - this special hint is disabled by default and can be activated with a
> > global option "foo.bazzle.connector-hints" (or something like this)
> > which has a warning that describes that this can change query semantics
> > etc.
> >
> > That's it. This makes connector implementations a lot easier while still
> > allowing inline configuration.
> >
> > I still don't like using hint syntax at all for this, because I strongly
> > maintain that hints should not change query syntax. In general using
> > hints should be kept to a minimum because they usually point to
> > shortcomings in the system.
> >
> > Best,
> > Aljoscha
> >
> > On 02.04.20 06:06, Jingsong Li wrote:
> > > Hi Dawid,
> > >
> > > > When a factory is instantiated it has access to the CatalogTable,
> > > therefore it has access to all the original properties. In turn it knows
> > > the original format and can call FormatFactory#supportedHintOptions().
> > >
> > > Factory can only get CatalogTable when creating source or sink,
> > > right? IIUC, TableFactory may be stateless too.
> > > When invoking SourceFactory#supportedHintOptions(), it can not
> > > get CatalogTable, so it is impossible to create FormatFactory?
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yu...@gmail.com> wrote:
> > >
> > > > Hi, Dawid, thanks so much for the detail suggestions ~
> > > >
> > > > 1. Regarding the motivation:
> > > >
> > > > I agree it's not a good suggested way based on the fact that we have
> > > > better solution, but i think we can support override that as long as it
> > > > exists as one of the the table options. I would remove if from the
> > > > motication part.
> > > >
> > > > 2. The options passes around during sql-to-rel conversion, right
> > > > after we
> > > > generate the RelOptTable (CatalogSourceTable#toRel in Flink), this is
> > > > indeed a push way method at least in the RelOptTable layer, then we hand
> > > > over the options to TableSourceFactory with our own context, which is
> > > > fine
> > > > becuase TableSourceFactory#Context is the contact to pass around these
> > > > table-about variables.
> > > >
> > > > 3. "We should not end up with an extreme example where we can change the
> > > > connector type", i totally agree that, and i have listed the
> > > > "connector.type" as forbidden attribute in the WIKI. As for the
> > > > format, i
> > > > think the connector itself can/should control whether to override the
> > > > "format.type", that is one of the reason i change the
> > > > TableSourceFactory#supportedHintOpitons to
> > > > TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can
> > > > limit the
> > > > format keys we want conveniently.
> > > >
> > > > 4. SQL Hints syntax.
> > > >
> > > > > I think the k and v in the hint_item should be QUOTED_STRING (not sure
> > > > if it is equivalent to string_literal).
> > > >
> > > > I disagree, we at least should keep sync with our DDL: use the string
> > > > literal as the key. We did also support the simple identifier because
> > > > this
> > > > is the common hint syntax from Calcite, it does not hurt anything for
> > > > the
> > > > OPTIONS hint, the unsupported keys would validate fails.(If you think
> > > > that
> > > > may cause some confuse, i can make the syntax pluggable for each hint in
> > > > CALCITE 1.23)
> > > >
> > > > We only supports OPTIONS hint in the FLIP, and i have changed the
> > > > title to
> > > > "Supports dynamic table options", would make it more clear in the WIKI.
> > > >
> > > > 5. Yes, we also have this concerns from our offline discussion, that is
> > > > one of the reason, why i change the
> > > > TableSourceFactory#supportedHintOpitons
> > > > to TableSourceFactory#forbiddenHintOpitons, i do not choose Set<String>
> > > > supportedHintOptionKeys() with wildcard for 2 reasons:
> > > >
> > > >    - The wildcard is still not descriptive, we can still not
> > > > forbidden one
> > > > of the properties among the wildcard properties, we can not enable or
> > > > disable them totally
> > > >    - ConfigOption is our new structure for keys, and it does not support
> > > > wildcard yet.
> > > >
> > > > Best,
> > > > Danny Chan
> > > > 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz
> > > > <dw...@apache.org>,写道:
> > > > > Hi,
> > > > > Few comments from my side:
> > > > > 1. Regarding the motivation:
> > > > > I think the example with changing the update-mode is not a good one. In
> > > > the long term this should be done with EMIT CHANGELOG (discussed in
> > > > FLIP-105).
> > > > > Nitpicking: I would mention that it is rather for debugging/ad-hoc
> > > > solution. I think this should not be a recommended way for production
> > > > use
> > > > cases as it bypasses the Catalog, which should be the source of truth.
> > > > > 2. I could not understand how the additional options will be passed to
> > > > the TableSourceFactory. Could you elaborate a bit more on that? I see
> > > > there
> > > > is a Context interface that gives the options. But cannot find a way
> > > > to get
> > > > the context itself in the factory. Moreover I think it would make more
> > > > sense to have rather a push based approach here. Something like
> > > > applyOptions(ReadableConfig) method.
> > > > > 3. As for the concerns Jingsong raised in the voting thread. I think it
> > > > is not a big problem, but I agree this should be also described. I
> > > > disagree
> > > > with "Connector don't know format information in TableFactory before
> > > > obtains real properties, so it can not list any format
> > > > `supportedHintOptions`".
> > > > > When a factory is instantiated it has access to the CatalogTable,
> > > > therefore it has access to all the original properties. In turn it knows
> > > > the original format and can call FormatFactory#supportedHintOptions().
> > > > > The only case when this would not work would be if we allow changing
> > > > > the
> > > > format of the Table (e.g. from avro to parquet), which does not sound
> > > > like
> > > > a good idea to me. I think this feature should not end up as a way to
> > > > declare a whole table inline in a SQL query, but should rather be a
> > > > simple
> > > > way for debugging queries. We should not end up with an extreme example
> > > > where we do:
> > > > > SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
> > > > 'format.type' = 'json', ....) */
> > > > > 4. SQL Hints syntax.
> > > > > I think the k and v in the hint_item should be QUOTED_STRING (not sure
> > > > if it is equivalent to string_literal). I think we should not use
> > > > simple_identifier because this implies that we cannot use e.g. any SQL
> > > > keywords. Anyway it has nothing to do with identifiers. If I am not
> > > > mistaken it is also how the options in the CREATE statement are
> > > > implemented.
> > > > > What is the purpose of the remaining hint_item: hint_name(hint_opt [
> > > > ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a
> > > > feeling it
> > > > does also suggests to support the whole Apache Calcite hint system
> > > > without
> > > > specifying that explicitly. Is the intention of the FLIP to support
> > > > choosing e.g. JOIN strategies through hints already? If it is so it
> > > > should
> > > > be mentioned in the FLIP, imo.
> > > > > 5. I think something does not work around the supportedHintOptions and
> > > > wildcards. How do you want to represent a wildcard key as a
> > > > ConfigOption? I
> > > > am not sure about that, just a though, maybe it make sense to have
> > > > rather
> > > > Set<String> supportedHintOptionKeys()?
> > > > > Best,
> > > > > Dawid
> > > >
> > >
> > >
>

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Timo Walther <tw...@apache.org>.
Hi everyone,

@Aljoscha: I disagree with your approach because a `Catalog` can return 
a custom factory that is not using any properties. The hinting must be 
transparent to a factory. We should NOT modify the metadata 
`CatalogTable` at any point in time after the catalog.

@Danny, @Jingsong: How about we stick to the original design that we 
wanted to vote on but use:

Set<String> supportedHintProperties()

This fits better to the old factory design. And for the new FLIP-95 
factories we will use `ConfigOption` and provide good utilities for 
merging with hints etc.

We can allow `"format.*"` in `supportedHintProperties()` to allow 
hinting in formats.

What do you think?

Regards,
Timo


On 02.04.20 16:24, Aljoscha Krettek wrote:
> I think we're designing ourselves into ever more complicated corners 
> here. Maybe we need to take a step back and reconsider. What would you 
> think about this (somewhat) simpler proposal:
> 
> - introduce a hint called CONNECTOR_OPTIONS(k=v,...). or 
> CONNECTOR_PROPERTIES, depending on what naming we want to have for this 
> in the future. This will simply overwrite all connector properties, the 
> table factories don't know about hints but simply work with the 
> properties that they are given
> 
> - this special hint is disabled by default and can be activated with a 
> global option "foo.bazzle.connector-hints" (or something like this) 
> which has a warning that describes that this can change query semantics 
> etc.
> 
> That's it. This makes connector implementations a lot easier while still 
> allowing inline configuration.
> 
> I still don't like using hint syntax at all for this, because I strongly 
> maintain that hints should not change query syntax. In general using 
> hints should be kept to a minimum because they usually point to 
> shortcomings in the system.
> 
> Best,
> Aljoscha
> 
> On 02.04.20 06:06, Jingsong Li wrote:
>> Hi Dawid,
>>
>>> When a factory is instantiated it has access to the CatalogTable,
>> therefore it has access to all the original properties. In turn it knows
>> the original format and can call FormatFactory#supportedHintOptions().
>>
>> Factory can only get CatalogTable when creating source or sink,
>> right? IIUC, TableFactory may be stateless too.
>> When invoking SourceFactory#supportedHintOptions(), it can not
>> get CatalogTable, so it is impossible to create FormatFactory?
>>
>> Best,
>> Jingsong Lee
>>
>> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yu...@gmail.com> wrote:
>>
>>> Hi, Dawid, thanks so much for the detail suggestions ~
>>>
>>> 1. Regarding the motivation:
>>>
>>> I agree it's not a good suggested way based on the fact that we have
>>> better solution, but i think we can support override that as long as it
>>> exists as one of the the table options. I would remove if from the
>>> motication part.
>>>
>>> 2. The options passes around during sql-to-rel conversion, right 
>>> after we
>>> generate the RelOptTable (CatalogSourceTable#toRel in Flink), this is
>>> indeed a push way method at least in the RelOptTable layer, then we hand
>>> over the options to TableSourceFactory with our own context, which is 
>>> fine
>>> becuase TableSourceFactory#Context is the contact to pass around these
>>> table-about variables.
>>>
>>> 3. "We should not end up with an extreme example where we can change the
>>> connector type", i totally agree that, and i have listed the
>>> "connector.type" as forbidden attribute in the WIKI. As for the 
>>> format, i
>>> think the connector itself can/should control whether to override the
>>> "format.type", that is one of the reason i change the
>>> TableSourceFactory#supportedHintOpitons to
>>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can 
>>> limit the
>>> format keys we want conveniently.
>>>
>>> 4. SQL Hints syntax.
>>>
>>>> I think the k and v in the hint_item should be QUOTED_STRING (not sure
>>> if it is equivalent to string_literal).
>>>
>>> I disagree, we at least should keep sync with our DDL: use the string
>>> literal as the key. We did also support the simple identifier because 
>>> this
>>> is the common hint syntax from Calcite, it does not hurt anything for 
>>> the
>>> OPTIONS hint, the unsupported keys would validate fails.(If you think 
>>> that
>>> may cause some confuse, i can make the syntax pluggable for each hint in
>>> CALCITE 1.23)
>>>
>>> We only supports OPTIONS hint in the FLIP, and i have changed the 
>>> title to
>>> "Supports dynamic table options", would make it more clear in the WIKI.
>>>
>>> 5. Yes, we also have this concerns from our offline discussion, that is
>>> one of the reason, why i change the 
>>> TableSourceFactory#supportedHintOpitons
>>> to TableSourceFactory#forbiddenHintOpitons, i do not choose Set<String>
>>> supportedHintOptionKeys() with wildcard for 2 reasons:
>>>
>>>    - The wildcard is still not descriptive, we can still not 
>>> forbidden one
>>> of the properties among the wildcard properties, we can not enable or
>>> disable them totally
>>>    - ConfigOption is our new structure for keys, and it does not support
>>> wildcard yet.
>>>
>>> Best,
>>> Danny Chan
>>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz 
>>> <dw...@apache.org>,写道:
>>>> Hi,
>>>> Few comments from my side:
>>>> 1. Regarding the motivation:
>>>> I think the example with changing the update-mode is not a good one. In
>>> the long term this should be done with EMIT CHANGELOG (discussed in
>>> FLIP-105).
>>>> Nitpicking: I would mention that it is rather for debugging/ad-hoc
>>> solution. I think this should not be a recommended way for production 
>>> use
>>> cases as it bypasses the Catalog, which should be the source of truth.
>>>> 2. I could not understand how the additional options will be passed to
>>> the TableSourceFactory. Could you elaborate a bit more on that? I see 
>>> there
>>> is a Context interface that gives the options. But cannot find a way 
>>> to get
>>> the context itself in the factory. Moreover I think it would make more
>>> sense to have rather a push based approach here. Something like
>>> applyOptions(ReadableConfig) method.
>>>> 3. As for the concerns Jingsong raised in the voting thread. I think it
>>> is not a big problem, but I agree this should be also described. I 
>>> disagree
>>> with "Connector don't know format information in TableFactory before
>>> obtains real properties, so it can not list any format
>>> `supportedHintOptions`".
>>>> When a factory is instantiated it has access to the CatalogTable,
>>> therefore it has access to all the original properties. In turn it knows
>>> the original format and can call FormatFactory#supportedHintOptions().
>>>> The only case when this would not work would be if we allow changing 
>>>> the
>>> format of the Table (e.g. from avro to parquet), which does not sound 
>>> like
>>> a good idea to me. I think this feature should not end up as a way to
>>> declare a whole table inline in a SQL query, but should rather be a 
>>> simple
>>> way for debugging queries. We should not end up with an extreme example
>>> where we do:
>>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
>>> 'format.type' = 'json', ....) */
>>>> 4. SQL Hints syntax.
>>>> I think the k and v in the hint_item should be QUOTED_STRING (not sure
>>> if it is equivalent to string_literal). I think we should not use
>>> simple_identifier because this implies that we cannot use e.g. any SQL
>>> keywords. Anyway it has nothing to do with identifiers. If I am not
>>> mistaken it is also how the options in the CREATE statement are 
>>> implemented.
>>>> What is the purpose of the remaining hint_item: hint_name(hint_opt [
>>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a 
>>> feeling it
>>> does also suggests to support the whole Apache Calcite hint system 
>>> without
>>> specifying that explicitly. Is the intention of the FLIP to support
>>> choosing e.g. JOIN strategies through hints already? If it is so it 
>>> should
>>> be mentioned in the FLIP, imo.
>>>> 5. I think something does not work around the supportedHintOptions and
>>> wildcards. How do you want to represent a wildcard key as a 
>>> ConfigOption? I
>>> am not sure about that, just a though, maybe it make sense to have 
>>> rather
>>> Set<String> supportedHintOptionKeys()?
>>>> Best,
>>>> Dawid
>>>
>>
>>


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Aljoscha Krettek <al...@apache.org>.
I think we're designing ourselves into ever more complicated corners 
here. Maybe we need to take a step back and reconsider. What would you 
think about this (somewhat) simpler proposal:

- introduce a hint called CONNECTOR_OPTIONS(k=v,...). or 
CONNECTOR_PROPERTIES, depending on what naming we want to have for this 
in the future. This will simply overwrite all connector properties, the 
table factories don't know about hints but simply work with the 
properties that they are given

- this special hint is disabled by default and can be activated with a 
global option "foo.bazzle.connector-hints" (or something like this) 
which has a warning that describes that this can change query semantics etc.

That's it. This makes connector implementations a lot easier while still 
allowing inline configuration.

I still don't like using hint syntax at all for this, because I strongly 
maintain that hints should not change query syntax. In general using 
hints should be kept to a minimum because they usually point to 
shortcomings in the system.

Best,
Aljoscha

On 02.04.20 06:06, Jingsong Li wrote:
> Hi Dawid,
> 
>> When a factory is instantiated it has access to the CatalogTable,
> therefore it has access to all the original properties. In turn it knows
> the original format and can call FormatFactory#supportedHintOptions().
> 
> Factory can only get CatalogTable when creating source or sink,
> right? IIUC, TableFactory may be stateless too.
> When invoking SourceFactory#supportedHintOptions(), it can not
> get CatalogTable, so it is impossible to create FormatFactory?
> 
> Best,
> Jingsong Lee
> 
> On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yu...@gmail.com> wrote:
> 
>> Hi, Dawid, thanks so much for the detail suggestions ~
>>
>> 1. Regarding the motivation:
>>
>> I agree it's not a good suggested way based on the fact that we have
>> better solution, but i think we can support override that as long as it
>> exists as one of the the table options. I would remove if from the
>> motication part.
>>
>> 2. The options passes around during sql-to-rel conversion, right after we
>> generate the RelOptTable (CatalogSourceTable#toRel in Flink), this is
>> indeed a push way method at least in the RelOptTable layer, then we hand
>> over the options to TableSourceFactory with our own context, which is fine
>> becuase TableSourceFactory#Context is the contact to pass around these
>> table-about variables.
>>
>> 3. "We should not end up with an extreme example where we can change the
>> connector type", i totally agree that, and i have listed the
>> "connector.type" as forbidden attribute in the WIKI. As for the format, i
>> think the connector itself can/should control whether to override the
>> "format.type", that is one of the reason i change the
>> TableSourceFactory#supportedHintOpitons to
>> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can limit the
>> format keys we want conveniently.
>>
>> 4. SQL Hints syntax.
>>
>>> I think the k and v in the hint_item should be QUOTED_STRING (not sure
>> if it is equivalent to string_literal).
>>
>> I disagree, we at least should keep sync with our DDL: use the string
>> literal as the key. We did also support the simple identifier because this
>> is the common hint syntax from Calcite, it does not hurt anything for the
>> OPTIONS hint, the unsupported keys would validate fails.(If you think that
>> may cause some confuse, i can make the syntax pluggable for each hint in
>> CALCITE 1.23)
>>
>> We only supports OPTIONS hint in the FLIP, and i have changed the title to
>> "Supports dynamic table options", would make it more clear in the WIKI.
>>
>> 5. Yes, we also have this concerns from our offline discussion, that is
>> one of the reason, why i change the TableSourceFactory#supportedHintOpitons
>> to TableSourceFactory#forbiddenHintOpitons, i do not choose Set<String>
>> supportedHintOptionKeys() with wildcard for 2 reasons:
>>
>>    - The wildcard is still not descriptive, we can still not forbidden one
>> of the properties among the wildcard properties, we can not enable or
>> disable them totally
>>    - ConfigOption is our new structure for keys, and it does not support
>> wildcard yet.
>>
>> Best,
>> Danny Chan
>> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz <dw...@apache.org>,写道:
>>> Hi,
>>> Few comments from my side:
>>> 1. Regarding the motivation:
>>> I think the example with changing the update-mode is not a good one. In
>> the long term this should be done with EMIT CHANGELOG (discussed in
>> FLIP-105).
>>> Nitpicking: I would mention that it is rather for debugging/ad-hoc
>> solution. I think this should not be a recommended way for production use
>> cases as it bypasses the Catalog, which should be the source of truth.
>>> 2. I could not understand how the additional options will be passed to
>> the TableSourceFactory. Could you elaborate a bit more on that? I see there
>> is a Context interface that gives the options. But cannot find a way to get
>> the context itself in the factory. Moreover I think it would make more
>> sense to have rather a push based approach here. Something like
>> applyOptions(ReadableConfig) method.
>>> 3. As for the concerns Jingsong raised in the voting thread. I think it
>> is not a big problem, but I agree this should be also described. I disagree
>> with "Connector don't know format information in TableFactory before
>> obtains real properties, so it can not list any format
>> `supportedHintOptions`".
>>> When a factory is instantiated it has access to the CatalogTable,
>> therefore it has access to all the original properties. In turn it knows
>> the original format and can call FormatFactory#supportedHintOptions().
>>> The only case when this would not work would be if we allow changing the
>> format of the Table (e.g. from avro to parquet), which does not sound like
>> a good idea to me. I think this feature should not end up as a way to
>> declare a whole table inline in a SQL query, but should rather be a simple
>> way for debugging queries. We should not end up with an extreme example
>> where we do:
>>> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
>> 'format.type' = 'json', ....) */
>>> 4. SQL Hints syntax.
>>> I think the k and v in the hint_item should be QUOTED_STRING (not sure
>> if it is equivalent to string_literal). I think we should not use
>> simple_identifier because this implies that we cannot use e.g. any SQL
>> keywords. Anyway it has nothing to do with identifiers. If I am not
>> mistaken it is also how the options in the CREATE statement are implemented.
>>> What is the purpose of the remaining hint_item: hint_name(hint_opt [
>> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a feeling it
>> does also suggests to support the whole Apache Calcite hint system without
>> specifying that explicitly. Is the intention of the FLIP to support
>> choosing e.g. JOIN strategies through hints already? If it is so it should
>> be mentioned in the FLIP, imo.
>>> 5. I think something does not work around the supportedHintOptions and
>> wildcards. How do you want to represent a wildcard key as a ConfigOption? I
>> am not sure about that, just a though, maybe it make sense to have rather
>> Set<String> supportedHintOptionKeys()?
>>> Best,
>>> Dawid
>>
> 
> 


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Jingsong Li <ji...@gmail.com>.
Hi Dawid,

> When a factory is instantiated it has access to the CatalogTable,
therefore it has access to all the original properties. In turn it knows
the original format and can call FormatFactory#supportedHintOptions().

Factory can only get CatalogTable when creating source or sink,
right? IIUC, TableFactory may be stateless too.
When invoking SourceFactory#supportedHintOptions(), it can not
get CatalogTable, so it is impossible to create FormatFactory?

Best,
Jingsong Lee

On Wed, Apr 1, 2020 at 10:05 PM Danny Chan <yu...@gmail.com> wrote:

> Hi, Dawid, thanks so much for the detail suggestions ~
>
> 1. Regarding the motivation:
>
> I agree it's not a good suggested way based on the fact that we have
> better solution, but i think we can support override that as long as it
> exists as one of the the table options. I would remove if from the
> motication part.
>
> 2. The options passes around during sql-to-rel conversion, right after we
> generate the RelOptTable (CatalogSourceTable#toRel in Flink), this is
> indeed a push way method at least in the RelOptTable layer, then we hand
> over the options to TableSourceFactory with our own context, which is fine
> becuase TableSourceFactory#Context is the contact to pass around these
> table-about variables.
>
> 3. "We should not end up with an extreme example where we can change the
> connector type", i totally agree that, and i have listed the
> "connector.type" as forbidden attribute in the WIKI. As for the format, i
> think the connector itself can/should control whether to override the
> "format.type", that is one of the reason i change the
> TableSourceFactory#supportedHintOpitons to
> TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can limit the
> format keys we want conveniently.
>
> 4. SQL Hints syntax.
>
> > I think the k and v in the hint_item should be QUOTED_STRING (not sure
> if it is equivalent to string_literal).
>
> I disagree, we at least should keep sync with our DDL: use the string
> literal as the key. We did also support the simple identifier because this
> is the common hint syntax from Calcite, it does not hurt anything for the
> OPTIONS hint, the unsupported keys would validate fails.(If you think that
> may cause some confuse, i can make the syntax pluggable for each hint in
> CALCITE 1.23)
>
> We only supports OPTIONS hint in the FLIP, and i have changed the title to
> "Supports dynamic table options", would make it more clear in the WIKI.
>
> 5. Yes, we also have this concerns from our offline discussion, that is
> one of the reason, why i change the TableSourceFactory#supportedHintOpitons
> to TableSourceFactory#forbiddenHintOpitons, i do not choose Set<String>
> supportedHintOptionKeys() with wildcard for 2 reasons:
>
>   - The wildcard is still not descriptive, we can still not forbidden one
> of the properties among the wildcard properties, we can not enable or
> disable them totally
>   - ConfigOption is our new structure for keys, and it does not support
> wildcard yet.
>
> Best,
> Danny Chan
> 在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz <dw...@apache.org>,写道:
> > Hi,
> > Few comments from my side:
> > 1. Regarding the motivation:
> > I think the example with changing the update-mode is not a good one. In
> the long term this should be done with EMIT CHANGELOG (discussed in
> FLIP-105).
> > Nitpicking: I would mention that it is rather for debugging/ad-hoc
> solution. I think this should not be a recommended way for production use
> cases as it bypasses the Catalog, which should be the source of truth.
> > 2. I could not understand how the additional options will be passed to
> the TableSourceFactory. Could you elaborate a bit more on that? I see there
> is a Context interface that gives the options. But cannot find a way to get
> the context itself in the factory. Moreover I think it would make more
> sense to have rather a push based approach here. Something like
> applyOptions(ReadableConfig) method.
> > 3. As for the concerns Jingsong raised in the voting thread. I think it
> is not a big problem, but I agree this should be also described. I disagree
> with "Connector don't know format information in TableFactory before
> obtains real properties, so it can not list any format
> `supportedHintOptions`".
> > When a factory is instantiated it has access to the CatalogTable,
> therefore it has access to all the original properties. In turn it knows
> the original format and can call FormatFactory#supportedHintOptions().
> > The only case when this would not work would be if we allow changing the
> format of the Table (e.g. from avro to parquet), which does not sound like
> a good idea to me. I think this feature should not end up as a way to
> declare a whole table inline in a SQL query, but should rather be a simple
> way for debugging queries. We should not end up with an extreme example
> where we do:
> > SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
> 'format.type' = 'json', ....) */
> > 4. SQL Hints syntax.
> > I think the k and v in the hint_item should be QUOTED_STRING (not sure
> if it is equivalent to string_literal). I think we should not use
> simple_identifier because this implies that we cannot use e.g. any SQL
> keywords. Anyway it has nothing to do with identifiers. If I am not
> mistaken it is also how the options in the CREATE statement are implemented.
> > What is the purpose of the remaining hint_item: hint_name(hint_opt [
> ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a feeling it
> does also suggests to support the whole Apache Calcite hint system without
> specifying that explicitly. Is the intention of the FLIP to support
> choosing e.g. JOIN strategies through hints already? If it is so it should
> be mentioned in the FLIP, imo.
> > 5. I think something does not work around the supportedHintOptions and
> wildcards. How do you want to represent a wildcard key as a ConfigOption? I
> am not sure about that, just a though, maybe it make sense to have rather
> Set<String> supportedHintOptionKeys()?
> > Best,
> > Dawid
>


-- 
Best, Jingsong Lee

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Posted by Danny Chan <yu...@gmail.com>.
Hi, Dawid, thanks so much for the detail suggestions ~

1. Regarding the motivation:

I agree it's not a good suggested way based on the fact that we have better solution, but i think we can support override that as long as it exists as one of the the table options. I would remove if from the motication part.

2. The options passes around during sql-to-rel conversion, right after we generate the RelOptTable (CatalogSourceTable#toRel in Flink), this is indeed a push way method at least in the RelOptTable layer, then we hand over the options to TableSourceFactory with our own context, which is fine becuase TableSourceFactory#Context is the contact to pass around these table-about variables.

3. "We should not end up with an extreme example where we can change the connector type", i totally agree that, and i have listed the "connector.type" as forbidden attribute in the WIKI. As for the format, i think the connector itself can/should control whether to override the "format.type", that is one of the reason i change the TableSourceFactory#supportedHintOpitons to
TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can limit the format keys we want conveniently.

4. SQL Hints syntax.

> I think the k and v in the hint_item should be QUOTED_STRING (not sure if it is equivalent to string_literal).

I disagree, we at least should keep sync with our DDL: use the string literal as the key. We did also support the simple identifier because this is the common hint syntax from Calcite, it does not hurt anything for the OPTIONS hint, the unsupported keys would validate fails.(If you think that may cause some confuse, i can make the syntax pluggable for each hint in CALCITE 1.23)

We only supports OPTIONS hint in the FLIP, and i have changed the title to "Supports dynamic table options", would make it more clear in the WIKI.

5. Yes, we also have this concerns from our offline discussion, that is one of the reason, why i change the TableSourceFactory#supportedHintOpitons to TableSourceFactory#forbiddenHintOpitons, i do not choose Set<String> supportedHintOptionKeys() with wildcard for 2 reasons:

  - The wildcard is still not descriptive, we can still not forbidden one of the properties among the wildcard properties, we can not enable or disable them totally
  - ConfigOption is our new structure for keys, and it does not support wildcard yet.

Best,
Danny Chan
在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz <dw...@apache.org>,写道:
> Hi,
> Few comments from my side:
> 1. Regarding the motivation:
> I think the example with changing the update-mode is not a good one. In the long term this should be done with EMIT CHANGELOG (discussed in FLIP-105).
> Nitpicking: I would mention that it is rather for debugging/ad-hoc solution. I think this should not be a recommended way for production use cases as it bypasses the Catalog, which should be the source of truth.
> 2. I could not understand how the additional options will be passed to the TableSourceFactory. Could you elaborate a bit more on that? I see there is a Context interface that gives the options. But cannot find a way to get the context itself in the factory. Moreover I think it would make more sense to have rather a push based approach here. Something like applyOptions(ReadableConfig) method.
> 3. As for the concerns Jingsong raised in the voting thread. I think it is not a big problem, but I agree this should be also described. I disagree with "Connector don't know format information in TableFactory before obtains real properties, so it can not list any format `supportedHintOptions`".
> When a factory is instantiated it has access to the CatalogTable, therefore it has access to all the original properties. In turn it knows the original format and can call FormatFactory#supportedHintOptions().
> The only case when this would not work would be if we allow changing the format of the Table (e.g. from avro to parquet), which does not sound like a good idea to me. I think this feature should not end up as a way to declare a whole table inline in a SQL query, but should rather be a simple way for debugging queries. We should not end up with an extreme example where we do:
> SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ..., 'format.type' = 'json', ....) */
> 4. SQL Hints syntax.
> I think the k and v in the hint_item should be QUOTED_STRING (not sure if it is equivalent to string_literal). I think we should not use simple_identifier because this implies that we cannot use e.g. any SQL keywords. Anyway it has nothing to do with identifiers. If I am not mistaken it is also how the options in the CREATE statement are implemented.
> What is the purpose of the remaining hint_item: hint_name(hint_opt [ ,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a feeling it does also suggests to support the whole Apache Calcite hint system without specifying that explicitly. Is the intention of the FLIP to support choosing e.g. JOIN strategies through hints already? If it is so it should be mentioned in the FLIP, imo.
> 5. I think something does not work around the supportedHintOptions and wildcards. How do you want to represent a wildcard key as a ConfigOption? I am not sure about that, just a though, maybe it make sense to have rather Set<String> supportedHintOptionKeys()?
> Best,
> Dawid