Posted to user@spark.apache.org by Lakshmi Nivedita <kl...@gmail.com> on 2020/09/30 11:58:11 UTC
[Spark SQL] does pyspark udf support spark.sql inside def
Here is a Spark UDF structure as an example:

def sampl_fn(x):
    return spark.sql("select count(Id) from sample where Id = {}".format(x))

spark.udf.register("sampl_fn", sampl_fn)

spark.sql("select Id, sampl_fn(Id) from example")
Thanks in advance for the help
--
k.Lakshmi Nivedita
Re: [Spark SQL] does pyspark udf support spark.sql inside def
Posted by Lakshmi Nivedita <kl...@gmail.com>.
Sure, will do that. I am using Impala in pyspark to retrieve the data.

Table A schema:
date1     bigint
date2     bigint
ctry      string

Sample data for table A:
date1       date2       ctry
22-12-2012  06-01-2013  IN

Table B schema:
holidate  bigint
holiday   string   -- 0 means holiday, 1 means working
country   string

Sample data for table B:
holidate    holiday  country
25-12-2012  0        IN
01-01-2013  0        IN
Thanks
Nivedita
On Thu, Oct 1, 2020 at 9:25 AM Amit Joshi <ma...@gmail.com> wrote:
> Can you pls post the schema of both the tables.
Re: [Spark SQL] does pyspark udf support spark.sql inside def
Posted by Amit Joshi <ma...@gmail.com>.
Can you please post the schemas of both tables?
Re: [Spark SQL] does pyspark udf support spark.sql inside def
Posted by Lakshmi Nivedita <kl...@gmail.com>.
Thank you for the clarification. I would like to know how I can proceed with
this kind of scenario in pyspark.

I need to subtract the number of holidays from the total number of days, in
pyspark, using dataframes. The dates date1 and date2 are in one table, and
the holidays are in another table.

df1 = select date1, date2, ctry,
      unixtimestamp(date2 - date1) as totalnumberofdays - df2.holidays
      from table A;

df2 = select count(holidate)
      from table B
      where holidate >= A.date1
        and holidate <= A.date2
        and country = A.ctry

Except country, no other column is a unique key.
On Wed, Sep 30, 2020 at 6:05 PM Sean Owen <sr...@gmail.com> wrote:
> No, you can't use the SparkSession from within a function executed by
> Spark tasks.
Re: [Spark SQL] does pyspark udf support spark.sql inside def
Posted by Sean Owen <sr...@gmail.com>.
No, you can't use the SparkSession from within a function executed by Spark
tasks.