Posted to user@livy.apache.org by Sergii Mikhtoniuk <mi...@gmail.com> on 2019/08/11 22:14:20 UTC

Pre-registering UDTs / UDFs in Livy session

Hi,

I'm currently using Livy in two different contexts:
- from Jupyter notebooks
- from SqlLine/Beeline CLI over Thrift/JDBC connection.

The data I work with includes GIS data, so it is sometimes necessary to register
custom (GeoSpark) geometry UDTs and UDFs in the Spark session.

For the Jupyter notebook case I was able to simply add a custom step to my
Jupyter kernel that registers the UDTs after the session is created, but I
don't know how to achieve the same in the JDBC client scenario.

Is there any extension mechanism in Livy or Spark that would execute
custom code on session init, or automatically discover and register
UDFs/UDTs?

As I understand from https://issues.apache.org/jira/browse/SPARK-7768 the
UDT mechanism is still in flux, but perhaps there's a better solution than
to fork Livy to add my custom registration code.

Any pointers are much appreciated.

- Sergii

Re: Pre-registering UDTs / UDFs in Livy session

Posted by Sergii Mikhtoniuk <mi...@gmail.com>.
Thanks a lot for the suggestion, Saisai.

I was able to successfully hook into the onApplicationStart event, but
immediately faced another issue.

The listener facility in Spark is asynchronous, so it dispatches events
on a background thread. Naturally, events don't include references to the
SparkSession, and since the active SparkSession is a thread-local variable,
there seems to be no way to obtain it from the listener.

This is not an issue for UDTRegistration, as it maintains global state,
but UDFRegistration is an instance that lives in the SessionState of the
SparkSession, so I'm out of luck there.
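
The difference can be illustrated with a minimal JDK-only sketch. No Spark
classes are involved and every name in it is hypothetical: `ACTIVE_SESSION`
merely plays the role of a thread-local active session, `GLOBAL_REGISTRY` the
role of process-wide state such as the one behind UDTRegistration, and the
single-thread executor stands in for an asynchronous listener thread.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalListenerDemo {
    // Hypothetical stand-in for a per-thread "active session".
    static final ThreadLocal<String> ACTIVE_SESSION = new ThreadLocal<>();
    // Hypothetical stand-in for global, process-wide registration state.
    static final Map<String, String> GLOBAL_REGISTRY = new ConcurrentHashMap<>();

    /** Runs a fake listener callback on a background thread and returns what it can see. */
    static String[] observeFromListenerThread() {
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        Callable<String[]> callback = () -> new String[] {
            ACTIVE_SESSION.get(),            // thread-local: never set on the dispatcher thread
            GLOBAL_REGISTRY.get("Geometry")  // global map: visible from any thread
        };
        try {
            return dispatcher.submit(callback).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            dispatcher.shutdown();
        }
    }

    public static void main(String[] args) {
        // The "session" is created and activated on the main thread only.
        ACTIVE_SESSION.set("session-1");
        GLOBAL_REGISTRY.put("Geometry", "GeometryUDT");

        String[] seen = observeFromListenerThread();
        System.out.println("active session seen by listener: " + seen[0]);
        System.out.println("global registration seen by listener: " + seen[1]);
    }
}
```

In this sketch the callback sees the global registration but gets null for
the thread-local value, which is the same shape as the problem described
above: global state is reachable from the listener thread, per-session state
is not.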

- Sergii

On Sun, Aug 11, 2019 at 7:44 PM Saisai Shao <sa...@gmail.com> wrote:

> Unfortunately there is no mechanism on the Livy side to inject custom code
> when a session is started. I think you can add some code on the Spark side:
> Spark has a listener interface, "SparkListener", which has an
> `onApplicationStart` hook that is called immediately after the application
> is started. You can take a look at SparkListener.
>
> Thanks
> Saisai

Re: Pre-registering UDTs / UDFs in Livy session

Posted by Saisai Shao <sa...@gmail.com>.
Unfortunately there is no mechanism on the Livy side to inject custom code
when a session is started. I think you can add some code on the Spark side:
Spark has a listener interface, "SparkListener", which has an
`onApplicationStart` hook that is called immediately after the application
is started. You can take a look at SparkListener.

Thanks
Saisai
