Posted to user@spark.apache.org by Chang Chen <ba...@gmail.com> on 2019/11/12 01:48:07 UTC

Is RDD thread safe?

Hi all

I have a case where I need to cache a source RDD and then create different
DataFrames from it in different threads to speed up queries.

I know that SparkSession is thread safe
(https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure
whether an RDD is thread safe or not.

Thanks
Chang

Re: Is RDD thread safe?

Posted by Chang Chen <ba...@gmail.com>.
I need to cache the DataFrame to speed up queries. In that case, the two
queries may both run the DAG simultaneously before the data has actually
been cached.
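
To illustrate the race described above, here is a toy model in plain Python.
It is not Spark code: LazyCache, expensive_source and run_two_queries are
hypothetical names invented for this sketch, and the 0.2 second sleep merely
stands in for a slow scan. It assumes a cache that is filled only when the
first read completes, so two threads that start on a cold cache both do the
expensive work, while filling the cache once before starting the threads
avoids the repeat.

```python
import threading
import time


class LazyCache:
    """Toy stand-in for a lazily cached dataset (not Spark code).

    Nothing is stored until some caller has finished computing the value.
    """

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._ready = False
        self._count_lock = threading.Lock()
        self.compute_calls = 0  # how many times the expensive work ran

    def get(self):
        if self._ready:
            return self._value  # cache hit: no recomputation
        with self._count_lock:
            self.compute_calls += 1
        value = self._compute()  # cache miss: run the expensive work
        self._value, self._ready = value, True
        return value


def expensive_source():
    time.sleep(0.2)  # simulate a slow scan of the source data
    return list(range(10))


def run_two_queries(cache):
    """Run two 'queries' (sum and len) against the cache from two threads."""
    results = []
    results_lock = threading.Lock()

    def query(fn):
        answer = fn(cache.get())
        with results_lock:
            results.append(answer)

    threads = [threading.Thread(target=query, args=(fn,)) for fn in (sum, len)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


# Cold cache: both threads miss and both run the expensive computation.
cold = LazyCache(expensive_source)
cold_results = run_two_queries(cold)

# Warm cache: fill it once on the calling thread, then start the threads.
warm = LazyCache(expensive_source)
warm.get()
warm_results = run_two_queries(warm)

print("cold:", cold_results, "computations:", cold.compute_calls)
print("warm:", warm_results, "computations:", warm.compute_calls)
```

Both runs return the same answers; only the number of computations differs.
In Spark terms, the analogous step would presumably be to run an action such
as count() on the cached RDD or DataFrame before submitting the concurrent
queries, since cache() by itself is lazy.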

Sonal Goyal <so...@gmail.com> wrote on Tue, Nov 19, 2019 at 9:46 PM:

> The RDD or DataFrame is distributed and partitioned by Spark so as to
> leverage all your workers (CPUs) effectively. So the DataFrame operations
> are already running in parallel, each on a partition of the data.
> Why do you want to use threading here?

Re: Is RDD thread safe?

Posted by Sonal Goyal <so...@gmail.com>.
The RDD or DataFrame is distributed and partitioned by Spark so as to
leverage all your workers (CPUs) effectively. So the DataFrame operations
are already running in parallel, each on a partition of the data.
Why do you want to use threading here?

Thanks,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>



