Posted to user@spark.apache.org by Lakshmi Nivedita <kl...@gmail.com> on 2020/09/30 11:58:11 UTC
[Spark SQL] does pyspark udf support spark.sql inside def
Here is a Spark UDF structure as an example:

def sampl_fn(x):
    return spark.sql("select count(Id) from sample where Id = {}".format(x))

spark.udf.register("sampl_fn", sampl_fn)

spark.sql("select Id, sampl_fn(Id) from example")
Thanks in advance for the help
--
k.Lakshmi Nivedita
Re: [Spark SQL] does pyspark udf support spark.sql inside def
Posted by Lakshmi Nivedita <kl...@gmail.com>.
Sure, will do that. I am using Impala in pyspark to retrieve the data.

Table A schema:
date1     bigint
date2     bigint
ctry      string

Sample data for table A:
date1       date2       ctry
22-12-2012  06-01-2013  IN

Table B schema:
holidate  bigint
holiday   string   -- 0 means holiday, 1 means working
country   string

Sample data for table B:
holidate    holiday  country
25-12-2012  0        IN
01-01-2013  0        IN
Thanks
Nivedita
On Thu, Oct 1, 2020 at 9:25 AM Amit Joshi <ma...@gmail.com> wrote:
> Can you pls post the schema of both the tables.
Re: [Spark SQL] does pyspark udf support spark.sql inside def
Posted by Amit Joshi <ma...@gmail.com>.
Can you please post the schemas of both tables?
Re: [Spark SQL] does pyspark udf support spark.sql inside def
Posted by Lakshmi Nivedita <kl...@gmail.com>.
Thank you for the clarification. I would like to know how I can proceed with
this kind of scenario in pyspark.

I need to subtract the number of holidays from the total number of days, in
pyspark, using dataframes. The dates date1 and date2 are in one table, and
the holidays are in another table.

df1 = select date1, date2, ctry,
      unixtimestamp(date2 - date1) as totalnumberofdays - df2.holidays
      from table A;

df2 = select count(holidate)
      from table B
      where holidate >= A.date1
        and holidate <= A.date2
        and country = A.ctry

Except country, no other column is a unique key.
On Wed, Sep 30, 2020 at 6:05 PM Sean Owen <sr...@gmail.com> wrote:
> No, you can't use the SparkSession from within a function executed by
> Spark tasks.
Re: [Spark SQL] does pyspark udf support spark.sql inside def
Posted by Sean Owen <sr...@gmail.com>.
No, you can't use the SparkSession from within a function executed by Spark
tasks.