Posted to dev@flink.apache.org by Jark Wu <im...@gmail.com> on 2019/10/18 10:28:23 UTC

[DISCUSS] FLIP-80: Expression String Serializable and Deserializable

Hi everyone,

I would like to start a discussion[1] about how to make Expression string
serializable and deserializable. Expression is the general interface for
all kinds of expressions in the Flink Table API & SQL; it represents a logical
tree for producing a computation result. In FLIP-66[2] and FLIP-70[3], we
introduced watermark and computed column syntax in DDL. The watermark
strategy and computed columns are both represented as Expressions. In order
to persist watermark and computed column information in the catalog, we need to
figure out how to persist and restore an Expression.

FLIP-80:
https://docs.google.com/document/d/1LxPEzbPuEVWNixb1L_USv0gFgjRMgoZuMsAecS_XvdE/edit?usp=sharing

Thanks for any feedback!

Best,
Jark

[1]:
https://docs.google.com/document/d/1LxPEzbPuEVWNixb1L_USv0gFgjRMgoZuMsAecS_XvdE/edit?usp=sharing
[2]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+time+attribute+in+SQL+DDL
[3]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-70%3A+Flink+SQL+Computed+Column+Design

Re: [DISCUSS] FLIP-80: Expression String Serializable and Deserializable

Posted by Jark Wu <im...@gmail.com>.
Thank you all. I will then update the design doc, which should be pretty
simple now.

Best,
Jark


Re: [DISCUSS] FLIP-80: Expression String Serializable and Deserializable

Posted by Jingsong Li <ji...@gmail.com>.
Thanks Jark and Timo for your explanations.

+1 for using SQL strings. I just took a look at Calcite's unparse; it is
really complex.

Jark, yes, the only thing needed by ExpressionConverter is a RelBuilder.

Best,
Jingsong Lee


Re: [DISCUSS] FLIP-80: Expression String Serializable and Deserializable

Posted by Timo Walther <tw...@apache.org>.
Hi Jark,

+1 for your suggestion. I think serializing all expressions as SQL strings
will simplify the design a lot and avoid duplicate parser code. Initially,
I had concerns that there might be expressions in the future that cannot be
represented in SQL. But currently I cannot come up with a counterexample.

Table operations will be a different topic that will require a custom 
string syntax. But expressions as SQL expressions sounds good to me.

@Jingsong: Jark is right. Table API expression strings are outdated and 
error-prone. They will be removed at some point.

Regards,
Timo



Re: [DISCUSS] FLIP-80: Expression String Serializable and Deserializable

Posted by Jark Wu <im...@gmail.com>.
Thanks Jingsong,

As discussed in the Java Expression DSL thread[1], we are planning to drop the Java Expression string API.
So I don't think we plan to unify #1 and #2. In the future, we may have only the SQL parser to parse string expressions.
The only thing to consider is whether all resolved expressions can be converted to SqlNode.
AFAIK, after expression resolution, all expressions can currently be converted to SqlNodes.

Best,
Jark

[1]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduction-of-a-Table-API-Java-Expression-DSL-td27787.html


Re: [DISCUSS] FLIP-80: Expression String Serializable and Deserializable

Posted by Jingsong Li <ji...@gmail.com>.
Thanks Jark for your proposal.

If we introduce a new kind of string representation for expressions, we will
then have three string representations:
1. Java expression string API. We have PlannerExpressionParser to parse
strings into Expressions.
2. SQL string representation. As you said, we can use Calcite classes to
parse and unparse.
3. A new kind of string representation for serialization.

From this point of view, I prefer not to introduce a new kind of string
representation, to reduce complexity.

There are some differences between #1 and #2:
- method invocation: "f0.substring(1, f7)" and "SUBSTRING(f0, 1, f7)"
- bigint literal: "1L" and "cast(1 as BIGINT)"

Right now these are two completely independent sets. Can we unify #1 and #2
into one set, so that we all use one parser?

Best,
Jingsong Lee
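
To illustrate the two surface-syntax differences listed above, here is a
rough, self-contained Java sketch. The class `ExpressionDialectSketch` and
its `toSqlStyle` method are hypothetical names made up for this example;
they are not part of PlannerExpressionParser, Flink, or Calcite. The sketch
only handles the two exact shapes from the list (a single method call on a
field, and a long literal) and says nothing about whether the two dialects
agree on semantics.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: rewrites two Java-expression-string shapes
// into their SQL-style spelling. Not a real parser.
final class ExpressionDialectSketch {

    // receiver.method(args), e.g. f0.substring(1, f7)
    private static final Pattern METHOD_CALL =
            Pattern.compile("^(\\w+)\\.(\\w+)\\((.*)\\)$");

    // Java-style long literal, e.g. 1L
    private static final Pattern LONG_LITERAL = Pattern.compile("^(-?\\d+)[lL]$");

    static String toSqlStyle(String javaStyle) {
        String s = javaStyle.trim();

        Matcher literal = LONG_LITERAL.matcher(s);
        if (literal.matches()) {
            // "1L" becomes an explicit cast in SQL syntax.
            return "CAST(" + literal.group(1) + " AS BIGINT)";
        }

        Matcher call = METHOD_CALL.matcher(s);
        if (call.matches()) {
            // The receiver of the method call becomes the first function argument.
            String receiver = call.group(1);
            String function = call.group(2).toUpperCase(Locale.ROOT);
            String args = call.group(3).trim();
            return function + "(" + receiver + (args.isEmpty() ? "" : ", " + args) + ")";
        }

        // Anything else is passed through unchanged.
        return s;
    }

    public static void main(String[] args) {
        System.out.println(toSqlStyle("f0.substring(1, f7)")); // SUBSTRING(f0, 1, f7)
        System.out.println(toSqlStyle("1L"));                  // CAST(1 AS BIGINT)
    }
}
```

The point of the sketch is only that the same logical expression has two
unrelated spellings today, which is why a third format would add to the
confusion.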


Re: [DISCUSS] FLIP-80: Expression String Serializable and Deserializable

Posted by Jark Wu <im...@gmail.com>.
Hi Timo,

I think it's a good idea to use `SqlParser#parseExpression()` to parse
literals.
That means the string format of literals is SQL compatible.
After some discussion with Kurt, we wondered: why not go one step further
and convert the whole expression to SQL format?

For example, the expression from your mail would be converted to:

`cat`.`db`.`func`(`cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')

There are some benefits to this:
0) the string representation is more readable and can be typed manually
more easily.
1) the string format is SQL syntax, not a custom one, which means it can be
integrated by third-party projects.
2) we can reuse Calcite's SqlParser to parse the string and SqlNode#unparse to
generate it, which avoids introducing duplicate code and a custom
parser.
3) no compatibility problems.

Regarding how an Expression can be converted into a SQL string, I think we
can leverage some Calcite utilities:

ResolvedExpression ---(ExpressionConverter)---> RexNode
----(RexToSqlNodeConverter)---> SqlNode --> SqlNode#unparse()

What do you think?

Best,
Jark
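
To make the target format concrete, here is a minimal, self-contained Java
sketch that renders a small expression tree as the SQL string shown in the
example above. The class `SqlStringSketch` and its `field`, `timestamp`, and
`call` helpers are hypothetical stand-ins invented for illustration; they are
not Flink or Calcite APIs. A real implementation would presumably go through
ExpressionConverter, RexToSqlNodeConverter, and SqlNode#unparse as outlined
in the pipeline, rather than building strings by hand.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical stand-in types for illustration; NOT Flink or Calcite classes.
final class SqlStringSketch {

    // A node that knows how to render itself as a SQL expression string.
    interface Expr {
        String toSql();
    }

    // Quote one identifier part with backticks, doubling any embedded backtick.
    static String quote(String part) {
        return "`" + part.replace("`", "``") + "`";
    }

    // Join a qualified name such as cat.db.f0 into `cat`.`db`.`f0`.
    static String qualified(String... parts) {
        return Arrays.stream(parts)
                .map(SqlStringSketch::quote)
                .collect(Collectors.joining("."));
    }

    // Fully qualified field reference.
    static Expr field(String... path) {
        return () -> qualified(path);
    }

    // TIMESTAMP literal; single quotes inside the value are doubled.
    static Expr timestamp(String value) {
        return () -> "TIMESTAMP '" + value.replace("'", "''") + "'";
    }

    // Call of a fully qualified function with already-built arguments.
    static Expr call(String[] functionPath, Expr... args) {
        return () -> qualified(functionPath)
                + Arrays.stream(args)
                        .map(Expr::toSql)
                        .collect(Collectors.joining(", ", "(", ")"));
    }

    public static void main(String[] args) {
        Expr expr = call(
                new String[] {"cat", "db", "func"},
                field("cat", "db", "f0"),
                timestamp("2019-10-21 12:12:12"));
        // prints: `cat`.`db`.`func`(`cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')
        System.out.println(expr.toSql());
    }
}
```

Because every identifier is fully qualified and quoted, the resulting string
does not depend on the current catalog or database when it is parsed back,
which is one reason a SQL string looks suitable for persisting in a catalog.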

On Mon, 21 Oct 2019 at 22:08, Timo Walther <tw...@apache.org> wrote:

> Hi Jark,
>
> thanks for the proposal. This is a great effort to finalize the new API
> design.
>
> I'm wondering if we could simply use the SQL parser like
> `org.apache.calcite.sql.parser.SqlParser#parseExpression(..)` to parse
> an expression that contains only literals. This would avoid any
> discussion as the syntax is already defined by the SQL standard. And it
> would also make it very unlikely that we ever need a format version.
>
> For example:
>
> CALL('FUNC', FIELD('f0'), VALUE('TIMESTAMP(3)', TIMESTAMP '2019-10-21 12:12:12'))
>
> Or even further if the SQL parser allows that:
>
> CALL('FUNC', `cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')
>
> I would find it confusing if we used different representations for
> literals such as intervals and timestamps in the properties. This would
> also reduce code duplication as we reuse logic for parsing identifiers etc.
>
> What do you think?
>
> Regards,
> Timo
>
>
> On 18.10.19 12:28, Jark Wu wrote:
> > Hi everyone,
> >
> > I would like to start a discussion[1] about how to make Expression string
> > serializable and deserializable. Expression is the general interface for
> > all kinds of expressions in Flink Table API & SQL, it represents a logical
> > tree for producing a computation result. In FLIP-66[2] and FLIP-70[3], we
> > introduced watermark and computed column syntax in DDL. The watermark
> > strategy and computed column are both represented in Expression. In order
> > to persist watermark and computed column information in catalog, we need to
> > figure out how to persist and restore Expression.
> >
> > FLIP-80:
> > https://docs.google.com/document/d/1LxPEzbPuEVWNixb1L_USv0gFgjRMgoZuMsAecS_XvdE/edit?usp=sharing
> >
> > Thanks for any feedback!
> >
> > Best,
> > Jark
> >
> > [1]:
> > https://docs.google.com/document/d/1LxPEzbPuEVWNixb1L_USv0gFgjRMgoZuMsAecS_XvdE/edit?usp=sharing
> > [2]:
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+time+attribute+in+SQL+DDL
> > [3]:
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-70%3A+Flink+SQL+Computed+Column+Design
> >
>
>

Re: [DISUCSS] FLIP-80: Expression String Serializable and Deserializable

Posted by Timo Walther <tw...@apache.org>.
Hi Jark,

thanks for the proposal. This is a great effort to finalize the new API 
design.

I'm wondering if we could simply use the SQL parser like
`org.apache.calcite.sql.parser.SqlParser#parseExpression(..)` to parse
an expression that contains only literals. This would avoid any
discussion as the syntax is already defined by the SQL standard. And it
would also make it very unlikely that we ever need a format version.

For example:

CALL('FUNC', FIELD('f0'), VALUE('TIMESTAMP(3)', TIMESTAMP '2019-10-21 12:12:12'))

Or even further if the SQL parser allows that:

CALL('FUNC', `cat`.`db`.`f0`, TIMESTAMP '2019-10-21 12:12:12')

I would find it confusing if we used different representations for
literals such as intervals and timestamps in the properties. This would
also reduce code duplication as we reuse logic for parsing identifiers etc.

What do you think?

Regards,
Timo


On 18.10.19 12:28, Jark Wu wrote:
> Hi everyone,
>
> I would like to start a discussion[1] about how to make Expression string
> serializable and deserializable. Expression is the general interface for
> all kinds of expressions in Flink Table API & SQL, it represents a logical
> tree for producing a computation result. In FLIP-66[2] and FLIP-70[3], we
> introduced watermark and computed column syntax in DDL. The watermark
> strategy and computed column are both represented in Expression. In order
> to persist watermark and computed column information in catalog, we need to
> figure out how to persist and restore Expression.
>
> FLIP-80:
> https://docs.google.com/document/d/1LxPEzbPuEVWNixb1L_USv0gFgjRMgoZuMsAecS_XvdE/edit?usp=sharing
>
> Thanks for any feedback!
>
> Best,
> Jark
>
> [1]:
> https://docs.google.com/document/d/1LxPEzbPuEVWNixb1L_USv0gFgjRMgoZuMsAecS_XvdE/edit?usp=sharing
> [2]:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+time+attribute+in+SQL+DDL
> [3]:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-70%3A+Flink+SQL+Computed+Column+Design
>