You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Soumyadeep Mukhopadhyay <so...@gmail.com> on 2023/06/03 09:45:27 UTC

Implementing EXPLODE operator

Hello all,

I was wondering if there's any virtue in creating an operator like UNNEST
by having a similar implementation like that of SqlUnnestOperator.
(SqlExplodeOperator maybe)

The functionalities are close but there are nuances that I have not
discovered yet.

My approach was kind of a hack :
Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator name
as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
.nullable(true).build() in the inferReturnType to "$explode" -> then check
if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call, false).
NOTE: I have also tried creating a new kind EXPLODE and then followed the
above step, both of them work.

As Julian had pointed out about semantics and how everything within Calcite
should be in Calcite's dialect, I feel there could be antipatterns in my
approach.
Could any of you please point me to where I might be going wrong? How could
I make this more robust?

(If any of you are wondering why I am doing this then this is what I want
to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
"SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect, like
Spark for example)

Thank you for your time!

Best,
Soumyadeep.

RE: Implementing EXPLODE operator

Posted by mb...@gmail.com.
This seems to be the same as UNNEST WITH ORDINALITY, in Calcite since 2016.
https://issues.apache.org/jira/browse/CALCITE-854

Mihai

-----Original Message-----
From: Soumyadeep Mukhopadhyay 
Sent: Saturday, June 03, 2023 10:42 AM
To: dev@calcite.apache.org
Subject: Re: Implementing EXPLODE operator

My apologies for not providing the context. The following links may provide a better background :
Vertica
https://docs.vertica.com/12.0.x/en/sql-reference/functions/data-type-specific-functions/collection-functions/explode/
and Spark
https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html
use the function EXPLODE and the way it is supposed to work is convert a single row with an array or set to multiple rows, the number of rows shall be equal to the number of items in the array / set (and also hashmaps in the case of Spark).
For Snowflake the equivalent would be
https://docs.snowflake.com/en/sql-reference/functions/flatten.
They are variations of the existing UNNEST. So if I have to implement them shall I inherit from the existing SqlUnnestOperator?

Thanks again for your inputs!

Soumyadeep.

On Sat, 3 Jun 2023 at 10:43 PM, Julian Hyde <jh...@gmail.com> wrote:

> I don’t know what EXPLODE is and you didn’t define it. UNNEST is in 
> the SQL standard.
>
> Julian
>
> > On Jun 3, 2023, at 09:41, Soumyadeep Mukhopadhyay 
> > <so...@gmail.com>
> wrote:
> >
> > Actually I am hoping to contribute, but only if it doesn’t seem
> redundant.
> > That’s why I wanted to understand if the effort would even make sense.
> Was
> > EXPLODE not included by design? Would you recommend against adding it?
> >
> > Soumyadeep.
> >
> >
> >
> >> On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde <jh...@gmail.com>
> wrote:
> >>
> >> I can’t tell whether you are intending to contribute EXPLODE or do 
> >> it
> on a
> >> private branch. You don’t make a case for what EXPLODE could do, so 
> >> I presume the latter.
> >>
> >> UNNEST is a unique function. Its implementation is (unfortunately 
> >> but
> >> necessarily) spread over many files. Copy-pasting it to make a new
> function
> >> seems like a bad idea, because you would be multiplying a mess. I 
> >> don’t know what you’re trying to achieve but I would lean on the 
> >> existing facilities (UDFs, table functions, adding syntactic sugar if necessary).
> >>
> >> Julian
> >>
> >>> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
> >> soumyamyb95@gmail.com> wrote:
> >>>
> >>> Hello all,
> >>>
> >>> I was wondering if there's any virtue in creating an operator like
> UNNEST
> >>> by having a similar implementation like that of SqlUnnestOperator.
> >>> (SqlExplodeOperator maybe)
> >>>
> >>> The functionalities are close but there are nuances that I have 
> >>> not discovered yet.
> >>>
> >>> My approach was kind of a hack :
> >>> Clone the class SqlUnnestOperator -> rename the 
> >>> SqlFunctionalOperator
> >> name
> >>> as "EXPLODE" and rename return builder.add("$unnest", 
> >>> SqlTypeName.ANY)
> >>> .nullable(true).build() in the inferReturnType to "$explode" -> 
> >>> then
> >> check
> >>> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
> >>> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call,
> false).
> >>> NOTE: I have also tried creating a new kind EXPLODE and then 
> >>> followed
> the
> >>> above step, both of them work.
> >>>
> >>> As Julian had pointed out about semantics and how everything 
> >>> within
> >> Calcite
> >>> should be in Calcite's dialect, I feel there could be antipatterns 
> >>> in
> my
> >>> approach.
> >>> Could any of you please point me to where I might be going wrong? 
> >>> How
> >> could
> >>> I make this more robust?
> >>>
> >>> (If any of you are wondering why I am doing this then this is what 
> >>> I
> want
> >>> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be 
> >>> written as "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an 
> >>> appropriate dialect,
> like
> >>> Spark for example)
> >>>
> >>> Thank you for your time!
> >>>
> >>> Best,
> >>> Soumyadeep.
> >>
>


Re: Implementing EXPLODE operator

Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.
On a side note, is the following implementation acceptable, as a part of a
dialect's unparse call?

@Override public void unparseCall(SqlWriter writer, SqlCall call,
    int leftPrec, int rightPrec) {
  switch (call.getKind()) {
  case FLOOR:
    ...
    break;
  case OTHER_FUNCTION:
    if (call.getOperator().toString().equalsIgnoreCase("explode")) {
      SqlUtil.unparseFunctionSyntax(SqlStdOperatorTable.UNNEST_WITH_ORDINALITY,
writer,
          call, false);
      return;
    }
    break;
  case UNNEST:
    writer.keyword("EXPLODE");
    call.operand(0).unparseWithParentheses(writer, leftPrec, rightPrec, true);
    return;
  default:
    super.unparseCall(writer, call, leftPrec, rightPrec);
  }

Soumyadeep.

On Sat, Jun 3, 2023 at 11:12 PM Soumyadeep Mukhopadhyay <
soumyamyb95@gmail.com> wrote:

> My apologies for not providing the context. The following links may
> provide a better background :
> Vertica
>
> https://docs.vertica.com/12.0.x/en/sql-reference/functions/data-type-specific-functions/collection-functions/explode/
> and Spark
>
> https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html
> use the function EXPLODE and the way it is supposed to work is convert a
> single row with an array or set to multiple rows, the number of rows shall
> be equal to the number of items in the array / set (and also hashmaps in
> the case of Spark).
> For Snowflake the equivalent would be
> https://docs.snowflake.com/en/sql-reference/functions/flatten.
> They are variations of the existing UNNEST. So if I have to implement them
> shall I inherit from the existing SqlUnnestOperator?
>
> Thanks again for your inputs!
>
> Soumyadeep.
>
> On Sat, 3 Jun 2023 at 10:43 PM, Julian Hyde <jh...@gmail.com>
> wrote:
>
>> I don’t know what EXPLODE is and you didn’t define it. UNNEST is in the
>> SQL standard.
>>
>> Julian
>>
>> > On Jun 3, 2023, at 09:41, Soumyadeep Mukhopadhyay <
>> soumyamyb95@gmail.com> wrote:
>> >
>> > Actually I am hoping to contribute, but only if it doesn’t seem
>> redundant.
>> > That’s why I wanted to understand if the effort would even make sense.
>> Was
>> > EXPLODE not included by design? Would you recommend against adding it?
>> >
>> > Soumyadeep.
>> >
>> >
>> >
>> >> On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde <jh...@gmail.com>
>> wrote:
>> >>
>> >> I can’t tell whether you are intending to contribute EXPLODE or do it
>> on a
>> >> private branch. You don’t make a case for what EXPLODE could do, so I
>> >> presume the latter.
>> >>
>> >> UNNEST is a unique function. Its implementation is (unfortunately but
>> >> necessarily) spread over many files. Copy-pasting it to make a new
>> function
>> >> seems like a bad idea, because you would be multiplying a mess. I don’t
>> >> know what you’re trying to achieve but I would lean on the existing
>> >> facilities (UDFs, table functions, adding syntactic sugar if
>> necessary).
>> >>
>> >> Julian
>> >>
>> >>> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
>> >> soumyamyb95@gmail.com> wrote:
>> >>>
>> >>> Hello all,
>> >>>
>> >>> I was wondering if there's any virtue in creating an operator like
>> UNNEST
>> >>> by having a similar implementation like that of SqlUnnestOperator.
>> >>> (SqlExplodeOperator maybe)
>> >>>
>> >>> The functionalities are close but there are nuances that I have not
>> >>> discovered yet.
>> >>>
>> >>> My approach was kind of a hack :
>> >>> Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator
>> >> name
>> >>> as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
>> >>> .nullable(true).build() in the inferReturnType to "$explode" -> then
>> >> check
>> >>> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
>> >>> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call,
>> false).
>> >>> NOTE: I have also tried creating a new kind EXPLODE and then followed
>> the
>> >>> above step, both of them work.
>> >>>
>> >>> As Julian had pointed out about semantics and how everything within
>> >> Calcite
>> >>> should be in Calcite's dialect, I feel there could be antipatterns in
>> my
>> >>> approach.
>> >>> Could any of you please point me to where I might be going wrong? How
>> >> could
>> >>> I make this more robust?
>> >>>
>> >>> (If any of you are wondering why I am doing this then this is what I
>> want
>> >>> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
>> >>> "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect,
>> like
>> >>> Spark for example)
>> >>>
>> >>> Thank you for your time!
>> >>>
>> >>> Best,
>> >>> Soumyadeep.
>> >>
>>
>

Re: Implementing EXPLODE operator

Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.
My apologies for not providing the context. The following links may provide
a better background :
Vertica
https://docs.vertica.com/12.0.x/en/sql-reference/functions/data-type-specific-functions/collection-functions/explode/
and Spark
https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.explode.html
use the function EXPLODE and the way it is supposed to work is convert a
single row with an array or set to multiple rows, the number of rows shall
be equal to the number of items in the array / set (and also hashmaps in
the case of Spark).
For Snowflake the equivalent would be
https://docs.snowflake.com/en/sql-reference/functions/flatten.
They are variations of the existing UNNEST. So if I have to implement them
shall I inherit from the existing SqlUnnestOperator?

Thanks again for your inputs!

Soumyadeep.

On Sat, 3 Jun 2023 at 10:43 PM, Julian Hyde <jh...@gmail.com> wrote:

> I don’t know what EXPLODE is and you didn’t define it. UNNEST is in the
> SQL standard.
>
> Julian
>
> > On Jun 3, 2023, at 09:41, Soumyadeep Mukhopadhyay <so...@gmail.com>
> wrote:
> >
> > Actually I am hoping to contribute, but only if it doesn’t seem
> redundant.
> > That’s why I wanted to understand if the effort would even make sense.
> Was
> > EXPLODE not included by design? Would you recommend against adding it?
> >
> > Soumyadeep.
> >
> >
> >
> >> On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde <jh...@gmail.com>
> wrote:
> >>
> >> I can’t tell whether you are intending to contribute EXPLODE or do it
> on a
> >> private branch. You don’t make a case for what EXPLODE could do, so I
> >> presume the latter.
> >>
> >> UNNEST is a unique function. Its implementation is (unfortunately but
> >> necessarily) spread over many files. Copy-pasting it to make a new
> function
> >> seems like a bad idea, because you would be multiplying a mess. I don’t
> >> know what you’re trying to achieve but I would lean on the existing
> >> facilities (UDFs, table functions, adding syntactic sugar if necessary).
> >>
> >> Julian
> >>
> >>> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
> >> soumyamyb95@gmail.com> wrote:
> >>>
> >>> Hello all,
> >>>
> >>> I was wondering if there's any virtue in creating an operator like
> UNNEST
> >>> by having a similar implementation like that of SqlUnnestOperator.
> >>> (SqlExplodeOperator maybe)
> >>>
> >>> The functionalities are close but there are nuances that I have not
> >>> discovered yet.
> >>>
> >>> My approach was kind of a hack :
> >>> Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator
> >> name
> >>> as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
> >>> .nullable(true).build() in the inferReturnType to "$explode" -> then
> >> check
> >>> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
> >>> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call,
> false).
> >>> NOTE: I have also tried creating a new kind EXPLODE and then followed
> the
> >>> above step, both of them work.
> >>>
> >>> As Julian had pointed out about semantics and how everything within
> >> Calcite
> >>> should be in Calcite's dialect, I feel there could be antipatterns in
> my
> >>> approach.
> >>> Could any of you please point me to where I might be going wrong? How
> >> could
> >>> I make this more robust?
> >>>
> >>> (If any of you are wondering why I am doing this then this is what I
> want
> >>> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
> >>> "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect,
> like
> >>> Spark for example)
> >>>
> >>> Thank you for your time!
> >>>
> >>> Best,
> >>> Soumyadeep.
> >>
>

Re: Implementing EXPLODE operator

Posted by Julian Hyde <jh...@gmail.com>.
I don’t know what EXPLODE is and you didn’t define it. UNNEST is in the SQL standard. 

Julian

> On Jun 3, 2023, at 09:41, Soumyadeep Mukhopadhyay <so...@gmail.com> wrote:
> 
> Actually I am hoping to contribute, but only if it doesn’t seem redundant.
> That’s why I wanted to understand if the effort would even make sense. Was
> EXPLODE not included by design? Would you recommend against adding it?
> 
> Soumyadeep.
> 
> 
> 
>> On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde <jh...@gmail.com> wrote:
>> 
>> I can’t tell whether you are intending to contribute EXPLODE or do it on a
>> private branch. You don’t make a case for what EXPLODE could do, so I
>> presume the latter.
>> 
>> UNNEST is a unique function. Its implementation is (unfortunately but
>> necessarily) spread over many files. Copy-pasting it to make a new function
>> seems like a bad idea, because you would be multiplying a mess. I don’t
>> know what you’re trying to achieve but I would lean on the existing
>> facilities (UDFs, table functions, adding syntactic sugar if necessary).
>> 
>> Julian
>> 
>>> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
>> soumyamyb95@gmail.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> I was wondering if there's any virtue in creating an operator like UNNEST
>>> by having a similar implementation like that of SqlUnnestOperator.
>>> (SqlExplodeOperator maybe)
>>> 
>>> The functionalities are close but there are nuances that I have not
>>> discovered yet.
>>> 
>>> My approach was kind of a hack :
>>> Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator
>> name
>>> as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
>>> .nullable(true).build() in the inferReturnType to "$explode" -> then
>> check
>>> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
>>> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call, false).
>>> NOTE: I have also tried creating a new kind EXPLODE and then followed the
>>> above step, both of them work.
>>> 
>>> As Julian had pointed out about semantics and how everything within
>> Calcite
>>> should be in Calcite's dialect, I feel there could be antipatterns in my
>>> approach.
>>> Could any of you please point me to where I might be going wrong? How
>> could
>>> I make this more robust?
>>> 
>>> (If any of you are wondering why I am doing this then this is what I want
>>> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
>>> "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect, like
>>> Spark for example)
>>> 
>>> Thank you for your time!
>>> 
>>> Best,
>>> Soumyadeep.
>> 

Re: Implementing EXPLODE operator

Posted by Soumyadeep Mukhopadhyay <so...@gmail.com>.
Actually I am hoping to contribute, but only if it doesn’t seem redundant.
That’s why I wanted to understand if the effort would even make sense. Was
EXPLODE not included by design? Would you recommend against adding it?

Soumyadeep.



On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde <jh...@gmail.com> wrote:

> I can’t tell whether you are intending to contribute EXPLODE or do it on a
> private branch. You don’t make a case for what EXPLODE could do, so I
> presume the latter.
>
> UNNEST is a unique function. Its implementation is (unfortunately but
> necessarily) spread over many files. Copy-pasting it to make a new function
> seems like a bad idea, because you would be multiplying a mess. I don’t
> know what you’re trying to achieve but I would lean on the existing
> facilities (UDFs, table functions, adding syntactic sugar if necessary).
>
> Julian
>
> > On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
> soumyamyb95@gmail.com> wrote:
> >
> > Hello all,
> >
> > I was wondering if there's any virtue in creating an operator like UNNEST
> > by having a similar implementation like that of SqlUnnestOperator.
> > (SqlExplodeOperator maybe)
> >
> > The functionalities are close but there are nuances that I have not
> > discovered yet.
> >
> > My approach was kind of a hack :
> > Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator
> name
> > as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
> > .nullable(true).build() in the inferReturnType to "$explode" -> then
> check
> > if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
> > unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call, false).
> > NOTE: I have also tried creating a new kind EXPLODE and then followed the
> > above step, both of them work.
> >
> > As Julian had pointed out about semantics and how everything within
> Calcite
> > should be in Calcite's dialect, I feel there could be antipatterns in my
> > approach.
> > Could any of you please point me to where I might be going wrong? How
> could
> > I make this more robust?
> >
> > (If any of you are wondering why I am doing this then this is what I want
> > to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
> > "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect, like
> > Spark for example)
> >
> > Thank you for your time!
> >
> > Best,
> > Soumyadeep.
>

Re: Implementing EXPLODE operator

Posted by Julian Hyde <jh...@gmail.com>.
I can’t tell whether you are intending to contribute EXPLODE or do it on a private branch. You don’t make a case for what EXPLODE could do, so I presume the latter. 

UNNEST is a unique function. Its implementation is (unfortunately but necessarily) spread over many files. Copy-pasting it to make a new function seems like a bad idea, because you would be multiplying a mess. I don’t know what you’re trying to achieve but I would lean on the existing facilities (UDFs, table functions, adding syntactic sugar if necessary).

Julian

> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <so...@gmail.com> wrote:
> 
> Hello all,
> 
> I was wondering if there's any virtue in creating an operator like UNNEST
> by having a similar implementation like that of SqlUnnestOperator.
> (SqlExplodeOperator maybe)
> 
> The functionalities are close but there are nuances that I have not
> discovered yet.
> 
> My approach was kind of a hack :
> Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator name
> as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
> .nullable(true).build() in the inferReturnType to "$explode" -> then check
> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call, false).
> NOTE: I have also tried creating a new kind EXPLODE and then followed the
> above step, both of them work.
> 
> As Julian had pointed out about semantics and how everything within Calcite
> should be in Calcite's dialect, I feel there could be antipatterns in my
> approach.
> Could any of you please point me to where I might be going wrong? How could
> I make this more robust?
> 
> (If any of you are wondering why I am doing this then this is what I want
> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
> "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect, like
> Spark for example)
> 
> Thank you for your time!
> 
> Best,
> Soumyadeep.