Posted to user@spark.apache.org by brightsparc <br...@gmail.com> on 2015/10/12 02:14:27 UTC

Handling expiring state in UDF

Hi,

I have created a Python UDF that calls an API requiring an expiring OAuth
token, which must be refreshed every 600 seconds; that interval is longer
than any given stage.

Due to the nature of threads and local state, if I use a global variable,
the variable regularly goes out of scope.
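
Roughly what the UDF looks like now (a sketch; fetch_oauth_token and
call_api stand in for my real client code):

token = fetch_oauth_token()  # fetched once, at module level, on the driver

def my_udf(a):
    # fails intermittently once the token expires or the global is no
    # longer in scope on the executor
    return call_api(a, token=token)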

I looked into using a broadcast variable, but broadcasts don't support
expiring/refreshing the value. I then looked into setLocalProperty and
getLocalProperty on the SparkContext, but these can't be accessed within a
UDF.

Is there a recommended way to handle this scenario in PySpark?

Thanks,
Julian. 





Re: Handling expiring state in UDF

Posted by Davies Liu <da...@databricks.com>.
Could you try this?

my_token = None

def my_udf(a):
    global my_token
    if my_token is None:
        my_token = ...  # create the token here
    ...  # do something with the token

In this way, a new token will be created for each PySpark task.
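
Since the token in the question also expires every 600 seconds, the same
pattern can be extended with a timestamp check so the token is refreshed
inside long-running tasks as well. A minimal sketch, assuming hypothetical
fetch_oauth_token() and call_api() helpers (neither is a real library call):

import time

TOKEN_TTL = 600  # seconds, per the question above

_token = None
_token_fetched_at = 0.0

def _get_token():
    # Return a cached token, refreshing it once it is older than TOKEN_TTL.
    global _token, _token_fetched_at
    now = time.time()
    if _token is None or now - _token_fetched_at >= TOKEN_TTL:
        _token = fetch_oauth_token()  # hypothetical: your OAuth client call
        _token_fetched_at = now
    return _token

def my_udf(a):
    return call_api(a, token=_get_token())  # hypothetical API helper

Because the globals live per Python worker process, each executor refreshes
its own token independently; the function can then be wrapped with
pyspark.sql.functions.udf as usual.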

