You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Soumyadeep Mukhopadhyay <so...@gmail.com> on 2023/06/03 09:45:27 UTC
Implementing EXPLODE operator
Hello all,
I was wondering if there's any virtue in creating an operator like UNNEST
by having a similar implementation like that of SqlUnnestOperator.
(SqlExplodeOperator maybe)
The functionalities are close but there are nuances that I have not
discovered yet.
My approach was kind of a hack :
Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator name
as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
.nullable(true).build() in the inferReturnType to "$explode" -> then check
if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call, false).
NOTE: I have also tried creating a new kind EXPLODE and then followed the
above step, both of them work.
As Julian had pointed out about semantics and how everything within Calcite
should be in Calcite's dialect, I feel there could be antipatterns in my
approach.
Could any of you please point me to where I might be going wrong? How could
I make this more robust?
(If any of you are wondering why I am doing this then this is what I want
to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
"SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect, like
Spark for example)
Thank you for your time!
Best,
Soumyadeep.
RE: Implementing EXPLODE operator
Posted by mb...@gmail.com.
This seems to be the same as UNNEST WITH ORDINALITY, in Calcite since 2016.
https://issues.apache.org/jira/browse/CALCITE-854
Mihai
-----Original Message-----
From: Soumyadeep Mukhopadhyay
Sent: Saturday, June 03, 2023 10:42 AM
To: dev@calcite.apache.org
Subject: Re: Implementing EXPLODE operator
My apologies for not providing the context. The following links may provide a better background :
Vertica
https://docs.vertica.com/12.0.x/en/sql-reference/functions/data-type-specific-functions/collection-functions/explode/
and Spark
https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html
use the function EXPLODE and the way it is supposed to work is convert a single row with an array or set to multiple rows, the number of rows shall be equal to the number of items in the array / set (and also hashmaps in the case of Spark).
For Snowflake the equivalent would be
https://docs.snowflake.com/en/sql-reference/functions/flatten.
They are variations of the existing UNNEST. So if I have to implement them shall I inherit from the existing SqlUnnestOperator?
Thanks again for your inputs!
Soumyadeep.
On Sat, 3 Jun 2023 at 10:43 PM, Julian Hyde <jh...@gmail.com> wrote:
> I don’t know what EXPLODE is and you didn’t define it. UNNEST is in
> the SQL standard.
>
> Julian
>
> > On Jun 3, 2023, at 09:41, Soumyadeep Mukhopadhyay
> > <so...@gmail.com>
> wrote:
> >
> > Actually I am hoping to contribute, but only if it doesn’t seem
> redundant.
> > That’s why I wanted to understand if the effort would even make sense.
> Was
> > EXPLODE not included by design? Would you recommend against adding it?
> >
> > Soumyadeep.
> >
> >
> >
> >> On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde <jh...@gmail.com>
> wrote:
> >>
> >> I can’t tell whether you are intending to contribute EXPLODE or do
> >> it
> on a
> >> private branch. You don’t make a case for what EXPLODE could do, so
> >> I presume the latter.
> >>
> >> UNNEST is a unique function. Its implementation is (unfortunately
> >> but
> >> necessarily) spread over many files. Copy-pasting it to make a new
> function
> >> seems like a bad idea, because you would be multiplying a mess. I
> >> don’t know what you’re trying to achieve but I would lean on the
> >> existing facilities (UDFs, table functions, adding syntactic sugar if necessary).
> >>
> >> Julian
> >>
> >>> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
> >> soumyamyb95@gmail.com> wrote:
> >>>
> >>> Hello all,
> >>>
> >>> I was wondering if there's any virtue in creating an operator like
> UNNEST
> >>> by having a similar implementation like that of SqlUnnestOperator.
> >>> (SqlExplodeOperator maybe)
> >>>
> >>> The functionalities are close but there are nuances that I have
> >>> not discovered yet.
> >>>
> >>> My approach was kind of a hack :
> >>> Clone the class SqlUnnestOperator -> rename the
> >>> SqlFunctionalOperator
> >> name
> >>> as "EXPLODE" and rename return builder.add("$unnest",
> >>> SqlTypeName.ANY)
> >>> .nullable(true).build() in the inferReturnType to "$explode" ->
> >>> then
> >> check
> >>> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
> >>> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call,
> false).
> >>> NOTE: I have also tried creating a new kind EXPLODE and then
> >>> followed
> the
> >>> above step, both of them work.
> >>>
> >>> As Julian had pointed out about semantics and how everything
> >>> within
> >> Calcite
> >>> should be in Calcite's dialect, I feel there could be antipatterns
> >>> in
> my
> >>> approach.
> >>> Could any of you please point me to where I might be going wrong?
> >>> How
> >> could
> >>> I make this more robust?
> >>>
> >>> (If any of you are wondering why I am doing this then this is what
> >>> I
> want
> >>> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be
> >>> written as "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an
> >>> appropriate dialect,
> like
> >>> Spark for example)
> >>>
> >>> Thank you for your time!
> >>>
> >>> Best,
> >>> Soumyadeep.
> >>
>
Re: Implementing EXPLODE operator
Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.
On a side note, is the following implementation acceptable, as a part of a
dialect's unparse call?
@Override public void unparseCall(SqlWriter writer, SqlCall call,
int leftPrec, int rightPrec) {
switch (call.getKind()) {
case FLOOR:
...
break;
case OTHER_FUNCTION:
if (call.getOperator().toString().equalsIgnoreCase("explode")) {
SqlUtil.unparseFunctionSyntax(SqlStdOperatorTable.UNNEST_WITH_ORDINALITY,
writer,
call, false);
return;
}
break;
case UNNEST:
writer.keyword("EXPLODE");
call.operand(0).unparseWithParentheses(writer, leftPrec, rightPrec, true);
return;
default:
super.unparseCall(writer, call, leftPrec, rightPrec);
}
Soumyadeep.
On Sat, Jun 3, 2023 at 11:12 PM Soumyadeep Mukhopadhyay <
soumyamyb95@gmail.com> wrote:
> My apologies for not providing the context. The following links may
> provide a better background :
> Vertica
>
> https://docs.vertica.com/12.0.x/en/sql-reference/functions/data-type-specific-functions/collection-functions/explode/
> and Spark
>
> https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html
> use the function EXPLODE and the way it is supposed to work is convert a
> single row with an array or set to multiple rows, the number of rows shall
> be equal to the number of items in the array / set (and also hashmaps in
> the case of Spark).
> For Snowflake the equivalent would be
> https://docs.snowflake.com/en/sql-reference/functions/flatten.
> They are variations of the existing UNNEST. So if I have to implement them
> shall I inherit from the existing SqlUnnestOperator?
>
> Thanks again for your inputs!
>
> Soumyadeep.
>
> On Sat, 3 Jun 2023 at 10:43 PM, Julian Hyde <jh...@gmail.com>
> wrote:
>
>> I don’t know what EXPLODE is and you didn’t define it. UNNEST is in the
>> SQL standard.
>>
>> Julian
>>
>> > On Jun 3, 2023, at 09:41, Soumyadeep Mukhopadhyay <
>> soumyamyb95@gmail.com> wrote:
>> >
>> > Actually I am hoping to contribute, but only if it doesn’t seem
>> redundant.
>> > That’s why I wanted to understand if the effort would even make sense.
>> Was
>> > EXPLODE not included by design? Would you recommend against adding it?
>> >
>> > Soumyadeep.
>> >
>> >
>> >
>> >> On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde <jh...@gmail.com>
>> wrote:
>> >>
>> >> I can’t tell whether you are intending to contribute EXPLODE or do it
>> on a
>> >> private branch. You don’t make a case for what EXPLODE could do, so I
>> >> presume the latter.
>> >>
>> >> UNNEST is a unique function. Its implementation is (unfortunately but
>> >> necessarily) spread over many files. Copy-pasting it to make a new
>> function
>> >> seems like a bad idea, because you would be multiplying a mess. I don’t
>> >> know what you’re trying to achieve but I would lean on the existing
>> >> facilities (UDFs, table functions, adding syntactic sugar if
>> necessary).
>> >>
>> >> Julian
>> >>
>> >>> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
>> >> soumyamyb95@gmail.com> wrote:
>> >>>
>> >>> Hello all,
>> >>>
>> >>> I was wondering if there's any virtue in creating an operator like
>> UNNEST
>> >>> by having a similar implementation like that of SqlUnnestOperator.
>> >>> (SqlExplodeOperator maybe)
>> >>>
>> >>> The functionalities are close but there are nuances that I have not
>> >>> discovered yet.
>> >>>
>> >>> My approach was kind of a hack :
>> >>> Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator
>> >> name
>> >>> as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
>> >>> .nullable(true).build() in the inferReturnType to "$explode" -> then
>> >> check
>> >>> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
>> >>> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call,
>> false).
>> >>> NOTE: I have also tried creating a new kind EXPLODE and then followed
>> the
>> >>> above step, both of them work.
>> >>>
>> >>> As Julian had pointed out about semantics and how everything within
>> >> Calcite
>> >>> should be in Calcite's dialect, I feel there could be antipatterns in
>> my
>> >>> approach.
>> >>> Could any of you please point me to where I might be going wrong? How
>> >> could
>> >>> I make this more robust?
>> >>>
>> >>> (If any of you are wondering why I am doing this then this is what I
>> want
>> >>> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
>> >>> "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect,
>> like
>> >>> Spark for example)
>> >>>
>> >>> Thank you for your time!
>> >>>
>> >>> Best,
>> >>> Soumyadeep.
>> >>
>>
>
Re: Implementing EXPLODE operator
Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.
My apologies for not providing the context. The following links may provide
a better background :
Vertica
https://docs.vertica.com/12.0.x/en/sql-reference/functions/data-type-specific-functions/collection-functions/explode/
and Spark
https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html
use the function EXPLODE and the way it is supposed to work is convert a
single row with an array or set to multiple rows, the number of rows shall
be equal to the number of items in the array / set (and also hashmaps in
the case of Spark).
For Snowflake the equivalent would be
https://docs.snowflake.com/en/sql-reference/functions/flatten.
They are variations of the existing UNNEST. So if I have to implement them
shall I inherit from the existing SqlUnnestOperator?
Thanks again for your inputs!
Soumyadeep.
On Sat, 3 Jun 2023 at 10:43 PM, Julian Hyde <jh...@gmail.com> wrote:
> I don’t know what EXPLODE is and you didn’t define it. UNNEST is in the
> SQL standard.
>
> Julian
>
> > On Jun 3, 2023, at 09:41, Soumyadeep Mukhopadhyay <so...@gmail.com>
> wrote:
> >
> > Actually I am hoping to contribute, but only if it doesn’t seem
> redundant.
> > That’s why I wanted to understand if the effort would even make sense.
> Was
> > EXPLODE not included by design? Would you recommend against adding it?
> >
> > Soumyadeep.
> >
> >
> >
> >> On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde <jh...@gmail.com>
> wrote:
> >>
> >> I can’t tell whether you are intending to contribute EXPLODE or do it
> on a
> >> private branch. You don’t make a case for what EXPLODE could do, so I
> >> presume the latter.
> >>
> >> UNNEST is a unique function. Its implementation is (unfortunately but
> >> necessarily) spread over many files. Copy-pasting it to make a new
> function
> >> seems like a bad idea, because you would be multiplying a mess. I don’t
> >> know what you’re trying to achieve but I would lean on the existing
> >> facilities (UDFs, table functions, adding syntactic sugar if necessary).
> >>
> >> Julian
> >>
> >>> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
> >> soumyamyb95@gmail.com> wrote:
> >>>
> >>> Hello all,
> >>>
> >>> I was wondering if there's any virtue in creating an operator like
> UNNEST
> >>> by having a similar implementation like that of SqlUnnestOperator.
> >>> (SqlExplodeOperator maybe)
> >>>
> >>> The functionalities are close but there are nuances that I have not
> >>> discovered yet.
> >>>
> >>> My approach was kind of a hack :
> >>> Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator
> >> name
> >>> as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
> >>> .nullable(true).build() in the inferReturnType to "$explode" -> then
> >> check
> >>> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
> >>> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call,
> false).
> >>> NOTE: I have also tried creating a new kind EXPLODE and then followed
> the
> >>> above step, both of them work.
> >>>
> >>> As Julian had pointed out about semantics and how everything within
> >> Calcite
> >>> should be in Calcite's dialect, I feel there could be antipatterns in
> my
> >>> approach.
> >>> Could any of you please point me to where I might be going wrong? How
> >> could
> >>> I make this more robust?
> >>>
> >>> (If any of you are wondering why I am doing this then this is what I
> want
> >>> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
> >>> "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect,
> like
> >>> Spark for example)
> >>>
> >>> Thank you for your time!
> >>>
> >>> Best,
> >>> Soumyadeep.
> >>
>
Re: Implementing EXPLODE operator
Posted by Julian Hyde <jh...@gmail.com>.
I don’t know what EXPLODE is and you didn’t define it. UNNEST is in the SQL standard.
Julian
> On Jun 3, 2023, at 09:41, Soumyadeep Mukhopadhyay <so...@gmail.com> wrote:
>
> Actually I am hoping to contribute, but only if it doesn’t seem redundant.
> That’s why I wanted to understand if the effort would even make sense. Was
> EXPLODE not included by design? Would you recommend against adding it?
>
> Soumyadeep.
>
>
>
>> On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde <jh...@gmail.com> wrote:
>>
>> I can’t tell whether you are intending to contribute EXPLODE or do it on a
>> private branch. You don’t make a case for what EXPLODE could do, so I
>> presume the latter.
>>
>> UNNEST is a unique function. Its implementation is (unfortunately but
>> necessarily) spread over many files. Copy-pasting it to make a new function
>> seems like a bad idea, because you would be multiplying a mess. I don’t
>> know what you’re trying to achieve but I would lean on the existing
>> facilities (UDFs, table functions, adding syntactic sugar if necessary).
>>
>> Julian
>>
>>> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
>> soumyamyb95@gmail.com> wrote:
>>>
>>> Hello all,
>>>
>>> I was wondering if there's any virtue in creating an operator like UNNEST
>>> by having a similar implementation like that of SqlUnnestOperator.
>>> (SqlExplodeOperator maybe)
>>>
>>> The functionalities are close but there are nuances that I have not
>>> discovered yet.
>>>
>>> My approach was kind of a hack :
>>> Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator
>> name
>>> as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
>>> .nullable(true).build() in the inferReturnType to "$explode" -> then
>> check
>>> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
>>> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call, false).
>>> NOTE: I have also tried creating a new kind EXPLODE and then followed the
>>> above step, both of them work.
>>>
>>> As Julian had pointed out about semantics and how everything within
>> Calcite
>>> should be in Calcite's dialect, I feel there could be antipatterns in my
>>> approach.
>>> Could any of you please point me to where I might be going wrong? How
>> could
>>> I make this more robust?
>>>
>>> (If any of you are wondering why I am doing this then this is what I want
>>> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
>>> "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect, like
>>> Spark for example)
>>>
>>> Thank you for your time!
>>>
>>> Best,
>>> Soumyadeep.
>>
Re: Implementing EXPLODE operator
Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.
Actually I am hoping to contribute, but only if it doesn’t seem redundant.
That’s why I wanted to understand if the effort would even make sense. Was
EXPLODE not included by design? Would you recommend against adding it?
Soumyadeep.
On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde <jh...@gmail.com> wrote:
> I can’t tell whether you are intending to contribute EXPLODE or do it on a
> private branch. You don’t make a case for what EXPLODE could do, so I
> presume the latter.
>
> UNNEST is a unique function. Its implementation is (unfortunately but
> necessarily) spread over many files. Copy-pasting it to make a new function
> seems like a bad idea, because you would be multiplying a mess. I don’t
> know what you’re trying to achieve but I would lean on the existing
> facilities (UDFs, table functions, adding syntactic sugar if necessary).
>
> Julian
>
> > On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
> soumyamyb95@gmail.com> wrote:
> >
> > Hello all,
> >
> > I was wondering if there's any virtue in creating an operator like UNNEST
> > by having a similar implementation like that of SqlUnnestOperator.
> > (SqlExplodeOperator maybe)
> >
> > The functionalities are close but there are nuances that I have not
> > discovered yet.
> >
> > My approach was kind of a hack :
> > Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator
> name
> > as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
> > .nullable(true).build() in the inferReturnType to "$explode" -> then
> check
> > if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
> > unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call, false).
> > NOTE: I have also tried creating a new kind EXPLODE and then followed the
> > above step, both of them work.
> >
> > As Julian had pointed out about semantics and how everything within
> Calcite
> > should be in Calcite's dialect, I feel there could be antipatterns in my
> > approach.
> > Could any of you please point me to where I might be going wrong? How
> could
> > I make this more robust?
> >
> > (If any of you are wondering why I am doing this then this is what I want
> > to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
> > "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect, like
> > Spark for example)
> >
> > Thank you for your time!
> >
> > Best,
> > Soumyadeep.
>
Re: Implementing EXPLODE operator
Posted by Julian Hyde <jh...@gmail.com>.
I can’t tell whether you are intending to contribute EXPLODE or do it on a private branch. You don’t make a case for what EXPLODE could do, so I presume the latter.
UNNEST is a unique function. Its implementation is (unfortunately but necessarily) spread over many files. Copy-pasting it to make a new function seems like a bad idea, because you would be multiplying a mess. I don’t know what you’re trying to achieve but I would lean on the existing facilities (UDFs, table functions, adding syntactic sugar if necessary).
Julian
> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <so...@gmail.com> wrote:
>
> Hello all,
>
> I was wondering if there's any virtue in creating an operator like UNNEST
> by having a similar implementation like that of SqlUnnestOperator.
> (SqlExplodeOperator maybe)
>
> The functionalities are close but there are nuances that I have not
> discovered yet.
>
> My approach was kind of a hack :
> Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator name
> as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
> .nullable(true).build() in the inferReturnType to "$explode" -> then check
> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call, false).
> NOTE: I have also tried creating a new kind EXPLODE and then followed the
> above step, both of them work.
>
> As Julian had pointed out about semantics and how everything within Calcite
> should be in Calcite's dialect, I feel there could be antipatterns in my
> approach.
> Could any of you please point me to where I might be going wrong? How could
> I make this more robust?
>
> (If any of you are wondering why I am doing this then this is what I want
> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
> "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect, like
> Spark for example)
>
> Thank you for your time!
>
> Best,
> Soumyadeep.