Posted to user@spark.apache.org by Guillaume Masse <ma...@narrative.io> on 2023/03/28 18:59:11 UTC

spark.catalog.listFunctions type signatures

Hi,

I'm using Apache Calcite to run some SQL transformations on Apache Spark
SQL statements. I would like to extract the type signatures of the functions
returned by spark.catalog.listFunctions so that I can register them in
Calcite with their proper signatures.

From the API, I can get the function name and the fully qualified class
name, but unfortunately the type signature is not present. Would there be a
way to use reflection to extract it? For example:

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L424

Ideally, it would be convenient to get the type signature
from org.apache.spark.sql.catalog.Function itself when available.


-- 
Guillaume Massé
[Gee-OHM]
(马赛卫)

Re: spark.catalog.listFunctions type signatures

Posted by Guillaume Masse <ma...@narrative.io>.
Hi Jacek,

Thanks for the hints. I would prefer to have the information available
statically rather than having to build a logical plan.

I'm using Apache Calcite to build SQL expressions, which I then feed to
Spark to run, so the pipeline goes like this:

initial query in SQL (from the user) +
schema definition (from db) +
UDF definitions (Spark + custom libraries such as Sedona)
=>
calcite query plan ( + transformations from business logic)
=>
SQL (with Spark Dialect)

In my case, the UDF definitions are an input that I would get from whatever
is loaded in Spark. That's why it's more convenient to have the information
available statically. (Bonus: it would also be helpful for generating
documentation: https://spark.apache.org/docs/latest/api/sql/index.html)

For example, this is how the type signature of acos is defined in Calcite:
https://github.com/apache/calcite/blob/5c7be55ffee836366dcc7fefb6adfc0b8c47465f/core/src/main/java/org/apache/calcite/sql/fun/SqlStdOperatorTable.java#L1692-L1696
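
To make the target of the pipeline concrete, here is a minimal sketch of a
Calcite function definition that carries an explicit signature, written in
Scala. It is not copied from the linked lines; the object name
SparkFunctions is hypothetical, and the signature (one numeric argument,
nullable DOUBLE result) is an assumption made for illustration.

```scala
import org.apache.calcite.sql.{SqlFunction, SqlFunctionCategory, SqlKind}
import org.apache.calcite.sql.`type`.{OperandTypes, ReturnTypes}

// Hypothetical holder for functions mirrored from Spark into Calcite.
object SparkFunctions {
  // Assumed signature: ACOS(<numeric>) returning a nullable DOUBLE.
  val Acos: SqlFunction = new SqlFunction(
    "ACOS",
    SqlKind.OTHER_FUNCTION,
    ReturnTypes.DOUBLE_NULLABLE, // return type inference
    null,                        // no operand type inference
    OperandTypes.NUMERIC,        // operand type checker: a single numeric argument
    SqlFunctionCategory.NUMERIC
  )
}
```

The operand type checker and the return type inference are the two pieces
that would have to be derived from Spark for every function, which is why a
static source for the signatures would be convenient.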


On Tue, Mar 28, 2023 at 3:27 PM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Interesting question indeed!
>
> The closest I could get would be to use lookupFunctionBuilder(name:
> FunctionIdentifier): Option[FunctionBuilder] [1] and then extract the
> dataType from T in `type FunctionBuilder = Seq[Expression] => T`, where T
> is either Expression (regular functions) or LogicalPlan (table-valued
> functions). Expression has dataType, while LogicalPlan has output
> (or outputAttributes).
>
> HTH
>
> Let us know how you're doing.
>
> BTW, can you describe how you are "using Apache Calcite to run some SQL
> transformations on Apache Spark SQL statements"?
>
> [1]
> https://github.com/apache/spark/blob/e60ce3e85081ca8bb247aeceb2681faf6a59a056/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L91
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> <https://twitter.com/jaceklaskowski>
>
>

Re: spark.catalog.listFunctions type signatures

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

Interesting question indeed!

The closest I could get would be to use lookupFunctionBuilder(name:
FunctionIdentifier): Option[FunctionBuilder] [1] and then extract the
dataType from T in `type FunctionBuilder = Seq[Expression] => T`, where T
is either Expression (regular functions) or LogicalPlan (table-valued
functions). Expression has dataType, while LogicalPlan has output
(or outputAttributes).
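
A minimal sketch of that approach for a regular (non table-valued) built-in
function could look like the following. It relies on Catalyst internals,
which are not a stable public API, and it assumes that placeholder
arguments of the right arity are already known. The object name
FunctionTypes and the helper returnTypeOf are hypothetical.

```scala
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
import org.apache.spark.sql.types.DataType

object FunctionTypes {
  // Look up the builder of a built-in function, apply it to placeholder
  // arguments, and read the data type of the resulting expression.
  // Returns None when no built-in function is registered under that name.
  def returnTypeOf(name: String, args: Seq[Expression]): Option[DataType] =
    FunctionRegistry.builtin
      .lookupFunctionBuilder(FunctionIdentifier(name))
      .map(builder => builder(args).dataType)

  def main(argv: Array[String]): Unit = {
    // The literal is only a placeholder; the arity has to match the function.
    println(returnTypeOf("acos", Seq(Literal(1.0))))
  }
}
```

Note that this only yields the return type for one particular choice of
arguments. Some expressions additionally declare the argument types they
expect through the ExpectsInputTypes trait (inputTypes), which is not shown
here.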

HTH

Let us know how you're doing.

BTW, can you describe how you are "using Apache Calcite to run some SQL
transformations on Apache Spark SQL statements"?

[1]
https://github.com/apache/spark/blob/e60ce3e85081ca8bb247aeceb2681faf6a59a056/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala#L91

Pozdrawiam,
Jacek Laskowski
----
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski

<https://twitter.com/jaceklaskowski>

