You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Gavin Ray <ra...@gmail.com> on 2022/03/18 15:50:11 UTC

Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Sorry to beat a dead horse here, but I'm one of those weirdos that gets a
lot of use out of Calcite's JSON operators.
Calcite's JSON implementation is broken for queries that have more than one
depth of JSON object/array calls.

The reason is because the operator calls "jsonize()", which parses the
(presumably) JVM object as a JSON string
This works for something like Map<String, String>, but if you have
Map<String, Map<String, String>>, what happens is this:

JSON_OBJECT(
  foo: 1,
  bar: JSON_OBJECT(
     qux: 2
  )
)

The parse happens inside-out, so first we get the innermost object parsed,
which gives:

{ "qux": 2 }

But -- as a string! This is important!
Now, when we parse the next object, we get this:

{ "foo": 1, "bar": \"{ \"qux\": 2 }\" }

This is because the object with "qux" isn't an object, but a string value
Which to be valid JSON, needs to have its quotes and braces escaped

Definitely not what you want, and the value isn't usable =(

Since there is no state/context/stack (that I can tell) when the parse
function is called,
how might it be possible to write something to the effect of:

"Analyze the query, and if the number of JSON operations is greater than
one,
only call 'jsonize()' on the outer-most parse/object."

https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95

Re: Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Posted by Benchao Li <li...@apache.org>.
Glad to know there is already a issue tracking this, I will follow the
discussions there.

Gavin Ray <ra...@gmail.com> 于2022年3月22日周二 22:17写道:

> Stamatis, how were you using FORMAT JSON?
>
> One thing I found is that you have to put it on every JSON-producing
> function
>
> On Tue, Mar 22, 2022 at 4:25 AM Stamatis Zampetakis <za...@gmail.com>
> wrote:
>
> > I left some comments under CALCITE-4989 with respect to FORMAT JSON etc.
> If
> > it really solves the problem can someone update the JIRA accordingly?
> >
> > Best,
> > Stamatis
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-4989
> >
> > On Tue, Mar 22, 2022 at 3:56 AM Gavin Ray <ra...@gmail.com> wrote:
> >
> > > Wow, really appreciate you taking the time to explain how it works.
> > >
> > > Now just need to modify my PR to get this one-line change into the SQL
> > > parser so that it accepts sub-expressions in JSON queries
> > > (the other changes in that PR are invalid and should be dropped)
> > >
> > >
> > >
> >
> https://github.com/apache/calcite/pull/2730/files#diff-e873041549333502af52ece8a1b34301ae5a059ff4719e9bddbaef48929e7047
> > >
> > > One thing that seems odd though -- looking at
> > > "SqlJsonValueExpressionOperator" the definition seems to imply "FORMAT
> > > JSON":
> > >
> > >
> >
> https://github.com/apache/calcite/blob/170035fd97df1afdd0c0a499f63e0ca1606f7837/core/src/main/java/org/apache/calcite/sql/fun/SqlJsonValueExpressionOperator.java#L33
> > >
> > >   public SqlJsonValueExpressionOperator() {
> > >     super("FORMAT JSON", SqlKind.JSON_VALUE_EXPRESSION, 28,
> > >         ReturnTypes.explicit(SqlTypeName.ANY)
> > >             .andThen(SqlTypeTransforms.TO_NULLABLE),
> > >         null, OperandTypes.CHARACTER);
> > >   }
> > >
> > > Could it be possible that manually adding it FORMAT JSON makes it apply
> > > twice, or I wonder why it's not doing this by default?
> > >
> > >
> > >
> > > On Mon, Mar 21, 2022 at 8:06 PM Benchao Li <li...@apache.org>
> wrote:
> > >
> > > > Gavin,
> > > >
> > > > "FORMAT JSON" is an operator, it converts the json string into an
> > > internal
> > > > `JsonValueContext`[1],
> > > > and the `JsonValueContext` holds the deserialized json object which
> > will
> > > be
> > > > further used by
> > > > the following json functions.
> > > >
> > > > "FORMAT JSON" is just one choice of "JSON output clause", which is
> > stated
> > > > in the standard as:
> > > > <JSON output clause> ::=
> > > >   RETURNING <data type>
> > > >       [ FORMAT <JSON representation> ]
> > > > <JSON representation> ::=
> > > >     JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ]
> > > >   | <implementation-defined JSON representation option>
> > > >
> > > > Other options could be "FORMAT AVRO", "FORMAT BSON" (which is not
> > > > implemented yet in calcite).
> > > >
> > > > "JSON output clause" only controls the serialization format for json
> > > > function's output,
> > > > and "JSON FORMAT" should be the default. Hence the behavior without
> > > "FORMAT
> > > > JSON"
> > > > should be corrected IMHO.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L101
> > > >
> > > > Gavin Ray <ra...@gmail.com> 于2022年3月22日周二 03:48写道:
> > > >
> > > > > Benchao,
> > > > >
> > > > > I owe you a beer, thank you!!
> > > > > Using FORMAT JSON fixes everything -- it just has to be specified
> on
> > > > every
> > > > > JSON function.
> > > > >
> > > > > See images below if curious:
> > > > >
> > > > > Without FORMAT JSON:
> > > > > https://i.imgur.com/ekgFXqe.png
> > > > >
> > > > > With FORMAT JSON:
> > > > > https://i.imgur.com/8lHIqUZ.png
> > > > >
> > > > > This is fantastic news =D
> > > > > Now I am just curious how the "FORMAT JSON" modifier works.
> > > > >
> > > > > On Sun, Mar 20, 2022 at 10:42 PM Gavin Ray <ra...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hot damn, you're right -- that doesn't look like it's
> > double-escaped!
> > > > > > I will try to this out ASAP and post an update here, thank you
> for
> > > the
> > > > > tip
> > > > > > =)
> > > > > >
> > > > > >
> > > > > > On Sun, Mar 20, 2022 at 8:51 PM Benchao Li <libenchao@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > >> Hi Gavin,
> > > > > >>
> > > > > >> Have you ever tried 'FORMAT JSON' like this[1], the tests show
> > that
> > > > the
> > > > > >> result of 'format json' is what you want.
> > > > > >>
> > > > > >> [1]
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/master/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L4381
> > > > > >>
> > > > > >>
> > > > > >> Gavin Ray <ra...@gmail.com> 于2022年3月19日周六 05:19写道:
> > > > > >>
> > > > > >> > That sounds very reasonable to me
> > > > > >> >
> > > > > >> > I don't know the Calcite codebase as well as other folks --
> and
> > > > > >> certainly
> > > > > >> > not nearly as well as you
> > > > > >> > Where would be the place to put such a thing/the overall
> > approach?
> > > > > >> >
> > > > > >> > On Fri, Mar 18, 2022 at 1:53 PM Julian Hyde <
> > > jhyde.apache@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > I think you’re proposing making the JSON_ functions smarter
> at
> > > > > >> runtime.
> > > > > >> > My
> > > > > >> > > general philosophy is to have the smarts at prepare time and
> > > make
> > > > > the
> > > > > >> > > runtime operators dumb. I think that philosophy can be
> applied
> > > > here.
> > > > > >> Some
> > > > > >> > > extra logic would kick in when preparing a query that has
> > JSON_
> > > > > >> > functions,
> > > > > >> > > and that logic would fold together multiple nested JSON_
> > > > functions.
> > > > > >> > Perhaps
> > > > > >> > > we need to add a new (internal) JSON_ function that can do
> all
> > > of
> > > > > the
> > > > > >> > steps.
> > > > > >> > >
> > > > > >> > > Julian
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > > On Mar 18, 2022, at 8:50 AM, Gavin Ray <
> > ray.gavin97@gmail.com
> > > >
> > > > > >> wrote:
> > > > > >> > > >
> > > > > >> > > > Sorry to beat a dead horse here, but I'm one of those
> > weirdos
> > > > that
> > > > > >> > gets a
> > > > > >> > > > lot of use out of Calcite's JSON operators.
> > > > > >> > > > Calcite's JSON implementation is broken for queries that
> > have
> > > > more
> > > > > >> than
> > > > > >> > > one
> > > > > >> > > > depth of JSON object/array calls.
> > > > > >> > > >
> > > > > >> > > > The reason is because the operator calls "jsonize()",
> which
> > > > parses
> > > > > >> the
> > > > > >> > > > (presumably) JVM object as a JSON string
> > > > > >> > > > This works for something like Map<String, String>, but if
> > you
> > > > have
> > > > > >> > > > Map<String, Map<String, String>>, what happens is this:
> > > > > >> > > >
> > > > > >> > > > JSON_OBJECT(
> > > > > >> > > >  foo: 1,
> > > > > >> > > >  bar: JSON_OBJECT(
> > > > > >> > > >     qux: 2
> > > > > >> > > >  )
> > > > > >> > > > )
> > > > > >> > > >
> > > > > >> > > > The parse happens inside-out, so first we get the
> innermost
> > > > object
> > > > > >> > > parsed,
> > > > > >> > > > which gives:
> > > > > >> > > >
> > > > > >> > > > { "qux": 2 }
> > > > > >> > > >
> > > > > >> > > > But -- as a string! This is important!
> > > > > >> > > > Now, when we parse the next object, we get this:
> > > > > >> > > >
> > > > > >> > > > { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
> > > > > >> > > >
> > > > > >> > > > This is because the object with "qux" isn't an object,
> but a
> > > > > string
> > > > > >> > value
> > > > > >> > > > Which to be valid JSON, needs to have its quotes and
> braces
> > > > > escaped
> > > > > >> > > >
> > > > > >> > > > Definitely not what you want, and the value isn't usable
> =(
> > > > > >> > > >
> > > > > >> > > > Since there is no state/context/stack (that I can tell)
> when
> > > the
> > > > > >> parse
> > > > > >> > > > function is called,
> > > > > >> > > > how might it be possible to write something to the effect
> > of:
> > > > > >> > > >
> > > > > >> > > > "Analyze the query, and if the number of JSON operations
> is
> > > > > greater
> > > > > >> > than
> > > > > >> > > > one,
> > > > > >> > > > only call 'jsonize()' on the outer-most parse/object."
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >>
> > > > > >> Best,
> > > > > >> Benchao Li
> > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best,
> > > > Benchao Li
> > > >
> > >
> >
>


-- 

Best,
Benchao Li

Re: Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Posted by Gavin Ray <ra...@gmail.com>.
Stamatis, how were you using FORMAT JSON?

One thing I found is that you have to put it on every JSON-producing
function

On Tue, Mar 22, 2022 at 4:25 AM Stamatis Zampetakis <za...@gmail.com>
wrote:

> I left some comments under CALCITE-4989 with respect to FORMAT JSON etc. If
> it really solves the problem can someone update the JIRA accordingly?
>
> Best,
> Stamatis
>
> [1] https://issues.apache.org/jira/browse/CALCITE-4989
>
> On Tue, Mar 22, 2022 at 3:56 AM Gavin Ray <ra...@gmail.com> wrote:
>
> > Wow, really appreciate you taking the time to explain how it works.
> >
> > Now just need to modify my PR to get this one-line change into the SQL
> > parser so that it accepts sub-expressions in JSON queries
> > (the other changes in that PR are invalid and should be dropped)
> >
> >
> >
> https://github.com/apache/calcite/pull/2730/files#diff-e873041549333502af52ece8a1b34301ae5a059ff4719e9bddbaef48929e7047
> >
> > One thing that seems odd though -- looking at
> > "SqlJsonValueExpressionOperator" the definition seems to imply "FORMAT
> > JSON":
> >
> >
> https://github.com/apache/calcite/blob/170035fd97df1afdd0c0a499f63e0ca1606f7837/core/src/main/java/org/apache/calcite/sql/fun/SqlJsonValueExpressionOperator.java#L33
> >
> >   public SqlJsonValueExpressionOperator() {
> >     super("FORMAT JSON", SqlKind.JSON_VALUE_EXPRESSION, 28,
> >         ReturnTypes.explicit(SqlTypeName.ANY)
> >             .andThen(SqlTypeTransforms.TO_NULLABLE),
> >         null, OperandTypes.CHARACTER);
> >   }
> >
> > Could it be possible that manually adding it FORMAT JSON makes it apply
> > twice, or I wonder why it's not doing this by default?
> >
> >
> >
> > On Mon, Mar 21, 2022 at 8:06 PM Benchao Li <li...@apache.org> wrote:
> >
> > > Gavin,
> > >
> > > "FORMAT JSON" is an operator, it converts the json string into an
> > internal
> > > `JsonValueContext`[1],
> > > and the `JsonValueContext` holds the deserialized json object which
> will
> > be
> > > further used by
> > > the following json functions.
> > >
> > > "FORMAT JSON" is just one choice of "JSON output clause", which is
> stated
> > > in the standard as:
> > > <JSON output clause> ::=
> > >   RETURNING <data type>
> > >       [ FORMAT <JSON representation> ]
> > > <JSON representation> ::=
> > >     JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ]
> > >   | <implementation-defined JSON representation option>
> > >
> > > Other options could be "FORMAT AVRO", "FORMAT BSON" (which is not
> > > implemented yet in calcite).
> > >
> > > "JSON output clause" only controls the serialization format for json
> > > function's output,
> > > and "JSON FORMAT" should be the default. Hence the behavior without
> > "FORMAT
> > > JSON"
> > > should be corrected IMHO.
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L101
> > >
> > > Gavin Ray <ra...@gmail.com> 于2022年3月22日周二 03:48写道:
> > >
> > > > Benchao,
> > > >
> > > > I owe you a beer, thank you!!
> > > > Using FORMAT JSON fixes everything -- it just has to be specified on
> > > every
> > > > JSON function.
> > > >
> > > > See images below if curious:
> > > >
> > > > Without FORMAT JSON:
> > > > https://i.imgur.com/ekgFXqe.png
> > > >
> > > > With FORMAT JSON:
> > > > https://i.imgur.com/8lHIqUZ.png
> > > >
> > > > This is fantastic news =D
> > > > Now I am just curious how the "FORMAT JSON" modifier works.
> > > >
> > > > On Sun, Mar 20, 2022 at 10:42 PM Gavin Ray <ra...@gmail.com>
> > > wrote:
> > > >
> > > > > Hot damn, you're right -- that doesn't look like it's
> double-escaped!
> > > > > I will try to this out ASAP and post an update here, thank you for
> > the
> > > > tip
> > > > > =)
> > > > >
> > > > >
> > > > > On Sun, Mar 20, 2022 at 8:51 PM Benchao Li <li...@apache.org>
> > > wrote:
> > > > >
> > > > >> Hi Gavin,
> > > > >>
> > > > >> Have you ever tried 'FORMAT JSON' like this[1], the tests show
> that
> > > the
> > > > >> result of 'format json' is what you want.
> > > > >>
> > > > >> [1]
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/master/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L4381
> > > > >>
> > > > >>
> > > > >> Gavin Ray <ra...@gmail.com> 于2022年3月19日周六 05:19写道:
> > > > >>
> > > > >> > That sounds very reasonable to me
> > > > >> >
> > > > >> > I don't know the Calcite codebase as well as other folks -- and
> > > > >> certainly
> > > > >> > not nearly as well as you
> > > > >> > Where would be the place to put such a thing/the overall
> approach?
> > > > >> >
> > > > >> > On Fri, Mar 18, 2022 at 1:53 PM Julian Hyde <
> > jhyde.apache@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> >
> > > > >> > > I think you’re proposing making the JSON_ functions smarter at
> > > > >> runtime.
> > > > >> > My
> > > > >> > > general philosophy is to have the smarts at prepare time and
> > make
> > > > the
> > > > >> > > runtime operators dumb. I think that philosophy can be applied
> > > here.
> > > > >> Some
> > > > >> > > extra logic would kick in when preparing a query that has
> JSON_
> > > > >> > functions,
> > > > >> > > and that logic would fold together multiple nested JSON_
> > > functions.
> > > > >> > Perhaps
> > > > >> > > we need to add a new (internal) JSON_ function that can do all
> > of
> > > > the
> > > > >> > steps.
> > > > >> > >
> > > > >> > > Julian
> > > > >> > >
> > > > >> > >
> > > > >> > > > On Mar 18, 2022, at 8:50 AM, Gavin Ray <
> ray.gavin97@gmail.com
> > >
> > > > >> wrote:
> > > > >> > > >
> > > > >> > > > Sorry to beat a dead horse here, but I'm one of those
> weirdos
> > > that
> > > > >> > gets a
> > > > >> > > > lot of use out of Calcite's JSON operators.
> > > > >> > > > Calcite's JSON implementation is broken for queries that
> have
> > > more
> > > > >> than
> > > > >> > > one
> > > > >> > > > depth of JSON object/array calls.
> > > > >> > > >
> > > > >> > > > The reason is because the operator calls "jsonize()", which
> > > parses
> > > > >> the
> > > > >> > > > (presumably) JVM object as a JSON string
> > > > >> > > > This works for something like Map<String, String>, but if
> you
> > > have
> > > > >> > > > Map<String, Map<String, String>>, what happens is this:
> > > > >> > > >
> > > > >> > > > JSON_OBJECT(
> > > > >> > > >  foo: 1,
> > > > >> > > >  bar: JSON_OBJECT(
> > > > >> > > >     qux: 2
> > > > >> > > >  )
> > > > >> > > > )
> > > > >> > > >
> > > > >> > > > The parse happens inside-out, so first we get the innermost
> > > object
> > > > >> > > parsed,
> > > > >> > > > which gives:
> > > > >> > > >
> > > > >> > > > { "qux": 2 }
> > > > >> > > >
> > > > >> > > > But -- as a string! This is important!
> > > > >> > > > Now, when we parse the next object, we get this:
> > > > >> > > >
> > > > >> > > > { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
> > > > >> > > >
> > > > >> > > > This is because the object with "qux" isn't an object, but a
> > > > string
> > > > >> > value
> > > > >> > > > Which to be valid JSON, needs to have its quotes and braces
> > > > escaped
> > > > >> > > >
> > > > >> > > > Definitely not what you want, and the value isn't usable =(
> > > > >> > > >
> > > > >> > > > Since there is no state/context/stack (that I can tell) when
> > the
> > > > >> parse
> > > > >> > > > function is called,
> > > > >> > > > how might it be possible to write something to the effect
> of:
> > > > >> > > >
> > > > >> > > > "Analyze the query, and if the number of JSON operations is
> > > > greater
> > > > >> > than
> > > > >> > > > one,
> > > > >> > > > only call 'jsonize()' on the outer-most parse/object."
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >> --
> > > > >>
> > > > >> Best,
> > > > >> Benchao Li
> > > > >>
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
> >
>

Re: Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Posted by Stamatis Zampetakis <za...@gmail.com>.
I left some comments under CALCITE-4989 with respect to FORMAT JSON etc. If
it really solves the problem can someone update the JIRA accordingly?

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/CALCITE-4989

On Tue, Mar 22, 2022 at 3:56 AM Gavin Ray <ra...@gmail.com> wrote:

> Wow, really appreciate you taking the time to explain how it works.
>
> Now just need to modify my PR to get this one-line change into the SQL
> parser so that it accepts sub-expressions in JSON queries
> (the other changes in that PR are invalid and should be dropped)
>
>
> https://github.com/apache/calcite/pull/2730/files#diff-e873041549333502af52ece8a1b34301ae5a059ff4719e9bddbaef48929e7047
>
> One thing that seems odd though -- looking at
> "SqlJsonValueExpressionOperator" the definition seems to imply "FORMAT
> JSON":
>
> https://github.com/apache/calcite/blob/170035fd97df1afdd0c0a499f63e0ca1606f7837/core/src/main/java/org/apache/calcite/sql/fun/SqlJsonValueExpressionOperator.java#L33
>
>   public SqlJsonValueExpressionOperator() {
>     super("FORMAT JSON", SqlKind.JSON_VALUE_EXPRESSION, 28,
>         ReturnTypes.explicit(SqlTypeName.ANY)
>             .andThen(SqlTypeTransforms.TO_NULLABLE),
>         null, OperandTypes.CHARACTER);
>   }
>
> Could it be possible that manually adding it FORMAT JSON makes it apply
> twice, or I wonder why it's not doing this by default?
>
>
>
> On Mon, Mar 21, 2022 at 8:06 PM Benchao Li <li...@apache.org> wrote:
>
> > Gavin,
> >
> > "FORMAT JSON" is an operator, it converts the json string into an
> internal
> > `JsonValueContext`[1],
> > and the `JsonValueContext` holds the deserialized json object which will
> be
> > further used by
> > the following json functions.
> >
> > "FORMAT JSON" is just one choice of "JSON output clause", which is stated
> > in the standard as:
> > <JSON output clause> ::=
> >   RETURNING <data type>
> >       [ FORMAT <JSON representation> ]
> > <JSON representation> ::=
> >     JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ]
> >   | <implementation-defined JSON representation option>
> >
> > Other options could be "FORMAT AVRO", "FORMAT BSON" (which is not
> > implemented yet in calcite).
> >
> > "JSON output clause" only controls the serialization format for json
> > function's output,
> > and "JSON FORMAT" should be the default. Hence the behavior without
> "FORMAT
> > JSON"
> > should be corrected IMHO.
> >
> > [1]
> >
> >
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L101
> >
> > Gavin Ray <ra...@gmail.com> 于2022年3月22日周二 03:48写道:
> >
> > > Benchao,
> > >
> > > I owe you a beer, thank you!!
> > > Using FORMAT JSON fixes everything -- it just has to be specified on
> > every
> > > JSON function.
> > >
> > > See images below if curious:
> > >
> > > Without FORMAT JSON:
> > > https://i.imgur.com/ekgFXqe.png
> > >
> > > With FORMAT JSON:
> > > https://i.imgur.com/8lHIqUZ.png
> > >
> > > This is fantastic news =D
> > > Now I am just curious how the "FORMAT JSON" modifier works.
> > >
> > > On Sun, Mar 20, 2022 at 10:42 PM Gavin Ray <ra...@gmail.com>
> > wrote:
> > >
> > > > Hot damn, you're right -- that doesn't look like it's double-escaped!
> > > > I will try to this out ASAP and post an update here, thank you for
> the
> > > tip
> > > > =)
> > > >
> > > >
> > > > On Sun, Mar 20, 2022 at 8:51 PM Benchao Li <li...@apache.org>
> > wrote:
> > > >
> > > >> Hi Gavin,
> > > >>
> > > >> Have you ever tried 'FORMAT JSON' like this[1], the tests show that
> > the
> > > >> result of 'format json' is what you want.
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/calcite/blob/master/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L4381
> > > >>
> > > >>
> > > >> Gavin Ray <ra...@gmail.com> 于2022年3月19日周六 05:19写道:
> > > >>
> > > >> > That sounds very reasonable to me
> > > >> >
> > > >> > I don't know the Calcite codebase as well as other folks -- and
> > > >> certainly
> > > >> > not nearly as well as you
> > > >> > Where would be the place to put such a thing/the overall approach?
> > > >> >
> > > >> > On Fri, Mar 18, 2022 at 1:53 PM Julian Hyde <
> jhyde.apache@gmail.com
> > >
> > > >> > wrote:
> > > >> >
> > > >> > > I think you’re proposing making the JSON_ functions smarter at
> > > >> runtime.
> > > >> > My
> > > >> > > general philosophy is to have the smarts at prepare time and
> make
> > > the
> > > >> > > runtime operators dumb. I think that philosophy can be applied
> > here.
> > > >> Some
> > > >> > > extra logic would kick in when preparing a query that has JSON_
> > > >> > functions,
> > > >> > > and that logic would fold together multiple nested JSON_
> > functions.
> > > >> > Perhaps
> > > >> > > we need to add a new (internal) JSON_ function that can do all
> of
> > > the
> > > >> > steps.
> > > >> > >
> > > >> > > Julian
> > > >> > >
> > > >> > >
> > > >> > > > On Mar 18, 2022, at 8:50 AM, Gavin Ray <ray.gavin97@gmail.com
> >
> > > >> wrote:
> > > >> > > >
> > > >> > > > Sorry to beat a dead horse here, but I'm one of those weirdos
> > that
> > > >> > gets a
> > > >> > > > lot of use out of Calcite's JSON operators.
> > > >> > > > Calcite's JSON implementation is broken for queries that have
> > more
> > > >> than
> > > >> > > one
> > > >> > > > depth of JSON object/array calls.
> > > >> > > >
> > > >> > > > The reason is because the operator calls "jsonize()", which
> > parses
> > > >> the
> > > >> > > > (presumably) JVM object as a JSON string
> > > >> > > > This works for something like Map<String, String>, but if you
> > have
> > > >> > > > Map<String, Map<String, String>>, what happens is this:
> > > >> > > >
> > > >> > > > JSON_OBJECT(
> > > >> > > >  foo: 1,
> > > >> > > >  bar: JSON_OBJECT(
> > > >> > > >     qux: 2
> > > >> > > >  )
> > > >> > > > )
> > > >> > > >
> > > >> > > > The parse happens inside-out, so first we get the innermost
> > object
> > > >> > > parsed,
> > > >> > > > which gives:
> > > >> > > >
> > > >> > > > { "qux": 2 }
> > > >> > > >
> > > >> > > > But -- as a string! This is important!
> > > >> > > > Now, when we parse the next object, we get this:
> > > >> > > >
> > > >> > > > { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
> > > >> > > >
> > > >> > > > This is because the object with "qux" isn't an object, but a
> > > string
> > > >> > value
> > > >> > > > Which to be valid JSON, needs to have its quotes and braces
> > > escaped
> > > >> > > >
> > > >> > > > Definitely not what you want, and the value isn't usable =(
> > > >> > > >
> > > >> > > > Since there is no state/context/stack (that I can tell) when
> the
> > > >> parse
> > > >> > > > function is called,
> > > >> > > > how might it be possible to write something to the effect of:
> > > >> > > >
> > > >> > > > "Analyze the query, and if the number of JSON operations is
> > > greater
> > > >> > than
> > > >> > > > one,
> > > >> > > > only call 'jsonize()' on the outer-most parse/object."
> > > >> > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >> Best,
> > > >> Benchao Li
> > > >>
> > > >
> > >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >
>

Re: Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Posted by Gavin Ray <ra...@gmail.com>.
Wow, really appreciate you taking the time to explain how it works.

Now just need to modify my PR to get this one-line change into the SQL
parser so that it accepts sub-expressions in JSON queries
(the other changes in that PR are invalid and should be dropped)

https://github.com/apache/calcite/pull/2730/files#diff-e873041549333502af52ece8a1b34301ae5a059ff4719e9bddbaef48929e7047

One thing that seems odd though -- looking at
"SqlJsonValueExpressionOperator" the definition seems to imply "FORMAT
JSON":
https://github.com/apache/calcite/blob/170035fd97df1afdd0c0a499f63e0ca1606f7837/core/src/main/java/org/apache/calcite/sql/fun/SqlJsonValueExpressionOperator.java#L33

  public SqlJsonValueExpressionOperator() {
    super("FORMAT JSON", SqlKind.JSON_VALUE_EXPRESSION, 28,
        ReturnTypes.explicit(SqlTypeName.ANY)
            .andThen(SqlTypeTransforms.TO_NULLABLE),
        null, OperandTypes.CHARACTER);
  }

Could it be possible that manually adding it FORMAT JSON makes it apply
twice, or I wonder why it's not doing this by default?



On Mon, Mar 21, 2022 at 8:06 PM Benchao Li <li...@apache.org> wrote:

> Gavin,
>
> "FORMAT JSON" is an operator, it converts the json string into an internal
> `JsonValueContext`[1],
> and the `JsonValueContext` holds the deserialized json object which will be
> further used by
> the following json functions.
>
> "FORMAT JSON" is just one choice of "JSON output clause", which is stated
> in the standard as:
> <JSON output clause> ::=
>   RETURNING <data type>
>       [ FORMAT <JSON representation> ]
> <JSON representation> ::=
>     JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ]
>   | <implementation-defined JSON representation option>
>
> Other options could be "FORMAT AVRO", "FORMAT BSON" (which is not
> implemented yet in calcite).
>
> "JSON output clause" only controls the serialization format for json
> function's output,
> and "JSON FORMAT" should be the default. Hence the behavior without "FORMAT
> JSON"
> should be corrected IMHO.
>
> [1]
>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L101
>
> Gavin Ray <ra...@gmail.com> 于2022年3月22日周二 03:48写道:
>
> > Benchao,
> >
> > I owe you a beer, thank you!!
> > Using FORMAT JSON fixes everything -- it just has to be specified on
> every
> > JSON function.
> >
> > See images below if curious:
> >
> > Without FORMAT JSON:
> > https://i.imgur.com/ekgFXqe.png
> >
> > With FORMAT JSON:
> > https://i.imgur.com/8lHIqUZ.png
> >
> > This is fantastic news =D
> > Now I am just curious how the "FORMAT JSON" modifier works.
> >
> > On Sun, Mar 20, 2022 at 10:42 PM Gavin Ray <ra...@gmail.com>
> wrote:
> >
> > > Hot damn, you're right -- that doesn't look like it's double-escaped!
> > > I will try to this out ASAP and post an update here, thank you for the
> > tip
> > > =)
> > >
> > >
> > > On Sun, Mar 20, 2022 at 8:51 PM Benchao Li <li...@apache.org>
> wrote:
> > >
> > >> Hi Gavin,
> > >>
> > >> Have you ever tried 'FORMAT JSON' like this[1], the tests show that
> the
> > >> result of 'format json' is what you want.
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://github.com/apache/calcite/blob/master/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L4381
> > >>
> > >>
> > >> Gavin Ray <ra...@gmail.com> 于2022年3月19日周六 05:19写道:
> > >>
> > >> > That sounds very reasonable to me
> > >> >
> > >> > I don't know the Calcite codebase as well as other folks -- and
> > >> certainly
> > >> > not nearly as well as you
> > >> > Where would be the place to put such a thing/the overall approach?
> > >> >
> > >> > On Fri, Mar 18, 2022 at 1:53 PM Julian Hyde <jhyde.apache@gmail.com
> >
> > >> > wrote:
> > >> >
> > >> > > I think you’re proposing making the JSON_ functions smarter at
> > >> runtime.
> > >> > My
> > >> > > general philosophy is to have the smarts at prepare time and make
> > the
> > >> > > runtime operators dumb. I think that philosophy can be applied
> here.
> > >> Some
> > >> > > extra logic would kick in when preparing a query that has JSON_
> > >> > functions,
> > >> > > and that logic would fold together multiple nested JSON_
> functions.
> > >> > Perhaps
> > >> > > we need to add a new (internal) JSON_ function that can do all of
> > the
> > >> > steps.
> > >> > >
> > >> > > Julian
> > >> > >
> > >> > >
> > >> > > > On Mar 18, 2022, at 8:50 AM, Gavin Ray <ra...@gmail.com>
> > >> wrote:
> > >> > > >
> > >> > > > Sorry to beat a dead horse here, but I'm one of those weirdos
> that
> > >> > gets a
> > >> > > > lot of use out of Calcite's JSON operators.
> > >> > > > Calcite's JSON implementation is broken for queries that have
> more
> > >> than
> > >> > > one
> > >> > > > depth of JSON object/array calls.
> > >> > > >
> > >> > > > The reason is because the operator calls "jsonize()", which
> parses
> > >> the
> > >> > > > (presumably) JVM object as a JSON string
> > >> > > > This works for something like Map<String, String>, but if you
> have
> > >> > > > Map<String, Map<String, String>>, what happens is this:
> > >> > > >
> > >> > > > JSON_OBJECT(
> > >> > > >  foo: 1,
> > >> > > >  bar: JSON_OBJECT(
> > >> > > >     qux: 2
> > >> > > >  )
> > >> > > > )
> > >> > > >
> > >> > > > The parse happens inside-out, so first we get the innermost
> object
> > >> > > parsed,
> > >> > > > which gives:
> > >> > > >
> > >> > > > { "qux": 2 }
> > >> > > >
> > >> > > > But -- as a string! This is important!
> > >> > > > Now, when we parse the next object, we get this:
> > >> > > >
> > >> > > > { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
> > >> > > >
> > >> > > > This is because the object with "qux" isn't an object, but a
> > string
> > >> > value
> > >> > > > Which to be valid JSON, needs to have its quotes and braces
> > escaped
> > >> > > >
> > >> > > > Definitely not what you want, and the value isn't usable =(
> > >> > > >
> > >> > > > Since there is no state/context/stack (that I can tell) when the
> > >> parse
> > >> > > > function is called,
> > >> > > > how might it be possible to write something to the effect of:
> > >> > > >
> > >> > > > "Analyze the query, and if the number of JSON operations is
> > greater
> > >> > than
> > >> > > > one,
> > >> > > > only call 'jsonize()' on the outer-most parse/object."
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95
> > >> > >
> > >> > >
> > >> >
> > >>
> > >>
> > >> --
> > >>
> > >> Best,
> > >> Benchao Li
> > >>
> > >
> >
>
>
> --
>
> Best,
> Benchao Li
>

Re: Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Posted by Benchao Li <li...@apache.org>.
Gavin,

"FORMAT JSON" is an operator, it converts the json string into an internal
`JsonValueContext`[1],
and the `JsonValueContext` holds the deserialized json object which will be
further used by
the following json functions.

"FORMAT JSON" is just one choice of "JSON output clause", which is stated
in the standard as:
<JSON output clause> ::=
  RETURNING <data type>
      [ FORMAT <JSON representation> ]
<JSON representation> ::=
    JSON [ ENCODING { UTF8 | UTF16 | UTF32 } ]
  | <implementation-defined JSON representation option>

Other options could be "FORMAT AVRO", "FORMAT BSON" (which is not
implemented yet in calcite).

"JSON output clause" only controls the serialization format for json
function's output,
and "JSON FORMAT" should be the default. Hence the behavior without "FORMAT
JSON"
should be corrected IMHO.

[1]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L101

Gavin Ray <ra...@gmail.com> 于2022年3月22日周二 03:48写道:

> Benchao,
>
> I owe you a beer, thank you!!
> Using FORMAT JSON fixes everything -- it just has to be specified on every
> JSON function.
>
> See images below if curious:
>
> Without FORMAT JSON:
> https://i.imgur.com/ekgFXqe.png
>
> With FORMAT JSON:
> https://i.imgur.com/8lHIqUZ.png
>
> This is fantastic news =D
> Now I am just curious how the "FORMAT JSON" modifier works.
>
> On Sun, Mar 20, 2022 at 10:42 PM Gavin Ray <ra...@gmail.com> wrote:
>
> > Hot damn, you're right -- that doesn't look like it's double-escaped!
> > I will try to this out ASAP and post an update here, thank you for the
> tip
> > =)
> >
> >
> > On Sun, Mar 20, 2022 at 8:51 PM Benchao Li <li...@apache.org> wrote:
> >
> >> Hi Gavin,
> >>
> >> Have you ever tried 'FORMAT JSON' like this[1], the tests show that the
> >> result of 'format json' is what you want.
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/calcite/blob/master/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L4381
> >>
> >>
> >> Gavin Ray <ra...@gmail.com> 于2022年3月19日周六 05:19写道:
> >>
> >> > That sounds very reasonable to me
> >> >
> >> > I don't know the Calcite codebase as well as other folks -- and
> >> certainly
> >> > not nearly as well as you
> >> > Where would be the place to put such a thing/the overall approach?
> >> >
> >> > On Fri, Mar 18, 2022 at 1:53 PM Julian Hyde <jh...@gmail.com>
> >> > wrote:
> >> >
> >> > > I think you’re proposing making the JSON_ functions smarter at
> >> runtime.
> >> > My
> >> > > general philosophy is to have the smarts at prepare time and make
> the
> >> > > runtime operators dumb. I think that philosophy can be applied here.
> >> Some
> >> > > extra logic would kick in when preparing a query that has JSON_
> >> > functions,
> >> > > and that logic would fold together multiple nested JSON_ functions.
> >> > Perhaps
> >> > > we need to add a new (internal) JSON_ function that can do all of
> the
> >> > steps.
> >> > >
> >> > > Julian
> >> > >
> >> > >
> >> > > > On Mar 18, 2022, at 8:50 AM, Gavin Ray <ra...@gmail.com>
> >> wrote:
> >> > > >
> >> > > > Sorry to beat a dead horse here, but I'm one of those weirdos that
> >> > gets a
> >> > > > lot of use out of Calcite's JSON operators.
> >> > > > Calcite's JSON implementation is broken for queries that have more
> >> than
> >> > > one
> >> > > > depth of JSON object/array calls.
> >> > > >
> >> > > > The reason is because the operator calls "jsonize()", which parses
> >> the
> >> > > > (presumably) JVM object as a JSON string
> >> > > > This works for something like Map<String, String>, but if you have
> >> > > > Map<String, Map<String, String>>, what happens is this:
> >> > > >
> >> > > > JSON_OBJECT(
> >> > > >  foo: 1,
> >> > > >  bar: JSON_OBJECT(
> >> > > >     qux: 2
> >> > > >  )
> >> > > > )
> >> > > >
> >> > > > The parse happens inside-out, so first we get the innermost object
> >> > > parsed,
> >> > > > which gives:
> >> > > >
> >> > > > { "qux": 2 }
> >> > > >
> >> > > > But -- as a string! This is important!
> >> > > > Now, when we parse the next object, we get this:
> >> > > >
> >> > > > { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
> >> > > >
> >> > > > This is because the object with "qux" isn't an object, but a
> string
> >> > value
> >> > > > Which to be valid JSON, needs to have its quotes and braces
> escaped
> >> > > >
> >> > > > Definitely not what you want, and the value isn't usable =(
> >> > > >
> >> > > > Since there is no state/context/stack (that I can tell) when the
> >> parse
> >> > > > function is called,
> >> > > > how might it be possible to write something to the effect of:
> >> > > >
> >> > > > "Analyze the query, and if the number of JSON operations is
> greater
> >> > than
> >> > > > one,
> >> > > > only call 'jsonize()' on the outer-most parse/object."
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
> >> > > >
> >> > >
> >> >
> >>
> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95
> >> > >
> >> > >
> >> >
> >>
> >>
> >> --
> >>
> >> Best,
> >> Benchao Li
> >>
> >
>


-- 

Best,
Benchao Li

Re: Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Posted by Gavin Ray <ra...@gmail.com>.
Benchao,

I owe you a beer, thank you!!
Using FORMAT JSON fixes everything -- it just has to be specified on every
JSON function.

See images below if curious:

Without FORMAT JSON:
https://i.imgur.com/ekgFXqe.png

With FORMAT JSON:
https://i.imgur.com/8lHIqUZ.png

This is fantastic news =D
Now I am just curious how the "FORMAT JSON" modifier works.

On Sun, Mar 20, 2022 at 10:42 PM Gavin Ray <ra...@gmail.com> wrote:

> Hot damn, you're right -- that doesn't look like it's double-escaped!
> I will try to this out ASAP and post an update here, thank you for the tip
> =)
>
>
> On Sun, Mar 20, 2022 at 8:51 PM Benchao Li <li...@apache.org> wrote:
>
>> Hi Gavin,
>>
>> Have you ever tried 'FORMAT JSON' like this[1], the tests show that the
>> result of 'format json' is what you want.
>>
>> [1]
>>
>> https://github.com/apache/calcite/blob/master/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L4381
>>
>>
>> Gavin Ray <ra...@gmail.com> 于2022年3月19日周六 05:19写道:
>>
>> > That sounds very reasonable to me
>> >
>> > I don't know the Calcite codebase as well as other folks -- and
>> certainly
>> > not nearly as well as you
>> > Where would be the place to put such a thing/the overall approach?
>> >
>> > On Fri, Mar 18, 2022 at 1:53 PM Julian Hyde <jh...@gmail.com>
>> > wrote:
>> >
>> > > I think you’re proposing making the JSON_ functions smarter at
>> runtime.
>> > My
>> > > general philosophy is to have the smarts at prepare time and make the
>> > > runtime operators dumb. I think that philosophy can be applied here.
>> Some
>> > > extra logic would kick in when preparing a query that has JSON_
>> > functions,
>> > > and that logic would fold together multiple nested JSON_ functions.
>> > Perhaps
>> > > we need to add a new (internal) JSON_ function that can do all of the
>> > steps.
>> > >
>> > > Julian
>> > >
>> > >
>> > > > On Mar 18, 2022, at 8:50 AM, Gavin Ray <ra...@gmail.com>
>> wrote:
>> > > >
>> > > > Sorry to beat a dead horse here, but I'm one of those weirdos that
>> > gets a
>> > > > lot of use out of Calcite's JSON operators.
>> > > > Calcite's JSON implementation is broken for queries that have more
>> than
>> > > one
>> > > > depth of JSON object/array calls.
>> > > >
>> > > > The reason is because the operator calls "jsonize()", which parses
>> the
>> > > > (presumably) JVM object as a JSON string
>> > > > This works for something like Map<String, String>, but if you have
>> > > > Map<String, Map<String, String>>, what happens is this:
>> > > >
>> > > > JSON_OBJECT(
>> > > >  foo: 1,
>> > > >  bar: JSON_OBJECT(
>> > > >     qux: 2
>> > > >  )
>> > > > )
>> > > >
>> > > > The parse happens inside-out, so first we get the innermost object
>> > > parsed,
>> > > > which gives:
>> > > >
>> > > > { "qux": 2 }
>> > > >
>> > > > But -- as a string! This is important!
>> > > > Now, when we parse the next object, we get this:
>> > > >
>> > > > { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
>> > > >
>> > > > This is because the object with "qux" isn't an object, but a string
>> > value
>> > > > Which to be valid JSON, needs to have its quotes and braces escaped
>> > > >
>> > > > Definitely not what you want, and the value isn't usable =(
>> > > >
>> > > > Since there is no state/context/stack (that I can tell) when the
>> parse
>> > > > function is called,
>> > > > how might it be possible to write something to the effect of:
>> > > >
>> > > > "Analyze the query, and if the number of JSON operations is greater
>> > than
>> > > > one,
>> > > > only call 'jsonize()' on the outer-most parse/object."
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
>> > > >
>> > >
>> >
>> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95
>> > >
>> > >
>> >
>>
>>
>> --
>>
>> Best,
>> Benchao Li
>>
>

Re: Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Posted by Gavin Ray <ra...@gmail.com>.
Hot damn, you're right -- that doesn't look like it's double-escaped!
I will try to this out ASAP and post an update here, thank you for the tip
=)


On Sun, Mar 20, 2022 at 8:51 PM Benchao Li <li...@apache.org> wrote:

> Hi Gavin,
>
> Have you ever tried 'FORMAT JSON' like this[1], the tests show that the
> result of 'format json' is what you want.
>
> [1]
>
> https://github.com/apache/calcite/blob/master/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L4381
>
>
> Gavin Ray <ra...@gmail.com> 于2022年3月19日周六 05:19写道:
>
> > That sounds very reasonable to me
> >
> > I don't know the Calcite codebase as well as other folks -- and certainly
> > not nearly as well as you
> > Where would be the place to put such a thing/the overall approach?
> >
> > On Fri, Mar 18, 2022 at 1:53 PM Julian Hyde <jh...@gmail.com>
> > wrote:
> >
> > > I think you’re proposing making the JSON_ functions smarter at runtime.
> > My
> > > general philosophy is to have the smarts at prepare time and make the
> > > runtime operators dumb. I think that philosophy can be applied here.
> Some
> > > extra logic would kick in when preparing a query that has JSON_
> > functions,
> > > and that logic would fold together multiple nested JSON_ functions.
> > Perhaps
> > > we need to add a new (internal) JSON_ function that can do all of the
> > steps.
> > >
> > > Julian
> > >
> > >
> > > > On Mar 18, 2022, at 8:50 AM, Gavin Ray <ra...@gmail.com>
> wrote:
> > > >
> > > > Sorry to beat a dead horse here, but I'm one of those weirdos that
> > gets a
> > > > lot of use out of Calcite's JSON operators.
> > > > Calcite's JSON implementation is broken for queries that have more
> than
> > > one
> > > > depth of JSON object/array calls.
> > > >
> > > > The reason is because the operator calls "jsonize()", which parses
> the
> > > > (presumably) JVM object as a JSON string
> > > > This works for something like Map<String, String>, but if you have
> > > > Map<String, Map<String, String>>, what happens is this:
> > > >
> > > > JSON_OBJECT(
> > > >  foo: 1,
> > > >  bar: JSON_OBJECT(
> > > >     qux: 2
> > > >  )
> > > > )
> > > >
> > > > The parse happens inside-out, so first we get the innermost object
> > > parsed,
> > > > which gives:
> > > >
> > > > { "qux": 2 }
> > > >
> > > > But -- as a string! This is important!
> > > > Now, when we parse the next object, we get this:
> > > >
> > > > { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
> > > >
> > > > This is because the object with "qux" isn't an object, but a string
> > value
> > > > Which to be valid JSON, needs to have its quotes and braces escaped
> > > >
> > > > Definitely not what you want, and the value isn't usable =(
> > > >
> > > > Since there is no state/context/stack (that I can tell) when the
> parse
> > > > function is called,
> > > > how might it be possible to write something to the effect of:
> > > >
> > > > "Analyze the query, and if the number of JSON operations is greater
> > than
> > > > one,
> > > > only call 'jsonize()' on the outer-most parse/object."
> > > >
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
> > > >
> > >
> >
> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95
> > >
> > >
> >
>
>
> --
>
> Best,
> Benchao Li
>

Re: Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Posted by Benchao Li <li...@apache.org>.
Hi Gavin,

Have you ever tried 'FORMAT JSON' like this[1], the tests show that the
result of 'format json' is what you want.

[1]
https://github.com/apache/calcite/blob/master/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L4381


Gavin Ray <ra...@gmail.com> 于2022年3月19日周六 05:19写道:

> That sounds very reasonable to me
>
> I don't know the Calcite codebase as well as other folks -- and certainly
> not nearly as well as you
> Where would be the place to put such a thing/the overall approach?
>
> On Fri, Mar 18, 2022 at 1:53 PM Julian Hyde <jh...@gmail.com>
> wrote:
>
> > I think you’re proposing making the JSON_ functions smarter at runtime.
> My
> > general philosophy is to have the smarts at prepare time and make the
> > runtime operators dumb. I think that philosophy can be applied here. Some
> > extra logic would kick in when preparing a query that has JSON_
> functions,
> > and that logic would fold together multiple nested JSON_ functions.
> Perhaps
> > we need to add a new (internal) JSON_ function that can do all of the
> steps.
> >
> > Julian
> >
> >
> > > On Mar 18, 2022, at 8:50 AM, Gavin Ray <ra...@gmail.com> wrote:
> > >
> > > Sorry to beat a dead horse here, but I'm one of those weirdos that
> gets a
> > > lot of use out of Calcite's JSON operators.
> > > Calcite's JSON implementation is broken for queries that have more than
> > one
> > > depth of JSON object/array calls.
> > >
> > > The reason is because the operator calls "jsonize()", which parses the
> > > (presumably) JVM object as a JSON string
> > > This works for something like Map<String, String>, but if you have
> > > Map<String, Map<String, String>>, what happens is this:
> > >
> > > JSON_OBJECT(
> > >  foo: 1,
> > >  bar: JSON_OBJECT(
> > >     qux: 2
> > >  )
> > > )
> > >
> > > The parse happens inside-out, so first we get the innermost object
> > parsed,
> > > which gives:
> > >
> > > { "qux": 2 }
> > >
> > > But -- as a string! This is important!
> > > Now, when we parse the next object, we get this:
> > >
> > > { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
> > >
> > > This is because the object with "qux" isn't an object, but a string
> value
> > > Which to be valid JSON, needs to have its quotes and braces escaped
> > >
> > > Definitely not what you want, and the value isn't usable =(
> > >
> > > Since there is no state/context/stack (that I can tell) when the parse
> > > function is called,
> > > how might it be possible to write something to the effect of:
> > >
> > > "Analyze the query, and if the number of JSON operations is greater
> than
> > > one,
> > > only call 'jsonize()' on the outer-most parse/object."
> > >
> > >
> >
> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
> > >
> >
> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95
> >
> >
>


-- 

Best,
Benchao Li

Re: Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Posted by Gavin Ray <ra...@gmail.com>.
That sounds very reasonable to me

I don't know the Calcite codebase as well as other folks -- and certainly
not nearly as well as you
Where would be the place to put such a thing/the overall approach?

On Fri, Mar 18, 2022 at 1:53 PM Julian Hyde <jh...@gmail.com> wrote:

> I think you’re proposing making the JSON_ functions smarter at runtime. My
> general philosophy is to have the smarts at prepare time and make the
> runtime operators dumb. I think that philosophy can be applied here. Some
> extra logic would kick in when preparing a query that has JSON_ functions,
> and that logic would fold together multiple nested JSON_ functions. Perhaps
> we need to add a new (internal) JSON_ function that can do all of the steps.
>
> Julian
>
>
> > On Mar 18, 2022, at 8:50 AM, Gavin Ray <ra...@gmail.com> wrote:
> >
> > Sorry to beat a dead horse here, but I'm one of those weirdos that gets a
> > lot of use out of Calcite's JSON operators.
> > Calcite's JSON implementation is broken for queries that have more than
> one
> > depth of JSON object/array calls.
> >
> > The reason is because the operator calls "jsonize()", which parses the
> > (presumably) JVM object as a JSON string
> > This works for something like Map<String, String>, but if you have
> > Map<String, Map<String, String>>, what happens is this:
> >
> > JSON_OBJECT(
> >  foo: 1,
> >  bar: JSON_OBJECT(
> >     qux: 2
> >  )
> > )
> >
> > The parse happens inside-out, so first we get the innermost object
> parsed,
> > which gives:
> >
> > { "qux": 2 }
> >
> > But -- as a string! This is important!
> > Now, when we parse the next object, we get this:
> >
> > { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
> >
> > This is because the object with "qux" isn't an object, but a string value
> > Which to be valid JSON, needs to have its quotes and braces escaped
> >
> > Definitely not what you want, and the value isn't usable =(
> >
> > Since there is no state/context/stack (that I can tell) when the parse
> > function is called,
> > how might it be possible to write something to the effect of:
> >
> > "Analyze the query, and if the number of JSON operations is greater than
> > one,
> > only call 'jsonize()' on the outer-most parse/object."
> >
> >
> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
> >
> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95
>
>

Re: Context-aware/stateful SQL operator functions? (IE adding a stack for number of times an operator has been called in query)

Posted by Julian Hyde <jh...@gmail.com>.
I think you’re proposing making the JSON_ functions smarter at runtime. My general philosophy is to have the smarts at prepare time and make the runtime operators dumb. I think that philosophy can be applied here. Some extra logic would kick in when preparing a query that has JSON_ functions, and that logic would fold together multiple nested JSON_ functions. Perhaps we need to add a new (internal) JSON_ function that can do all of the steps.

Julian


> On Mar 18, 2022, at 8:50 AM, Gavin Ray <ra...@gmail.com> wrote:
> 
> Sorry to beat a dead horse here, but I'm one of those weirdos that gets a
> lot of use out of Calcite's JSON operators.
> Calcite's JSON implementation is broken for queries that have more than one
> depth of JSON object/array calls.
> 
> The reason is because the operator calls "jsonize()", which parses the
> (presumably) JVM object as a JSON string
> This works for something like Map<String, String>, but if you have
> Map<String, Map<String, String>>, what happens is this:
> 
> JSON_OBJECT(
>  foo: 1,
>  bar: JSON_OBJECT(
>     qux: 2
>  )
> )
> 
> The parse happens inside-out, so first we get the innermost object parsed,
> which gives:
> 
> { "qux": 2 }
> 
> But -- as a string! This is important!
> Now, when we parse the next object, we get this:
> 
> { "foo": 1, "bar": \"{ \"qux\": 2 }\" }
> 
> This is because the object with "qux" isn't an object, but a string value
> Which to be valid JSON, needs to have its quotes and braces escaped
> 
> Definitely not what you want, and the value isn't usable =(
> 
> Since there is no state/context/stack (that I can tell) when the parse
> function is called,
> how might it be possible to write something to the effect of:
> 
> "Analyze the query, and if the number of JSON operations is greater than
> one,
> only call 'jsonize()' on the outer-most parse/object."
> 
> https://github.com/apache/calcite/blob/8d21c3f2f0b75d788e70bbeea9746695f2fde552/core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java#L1886-L1891
> https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/runtime/JsonFunctions.java#L93-L95