Posted to user@spark.apache.org by Younes Naguib <Yo...@tritondigital.com> on 2015/10/26 19:17:10 UTC

Broadcast table

Hi all,

I use the Thrift server, and I cache a table using "cache table mytab".
Is there any SQL to broadcast it too?

Thanks
Younes Naguib
Triton Digital | 1440 Ste-Catherine W., Suite 1200 | Montreal, QC  H3G 1R8
Tel.: +1 514 448 4037 x2688 | Tel.: +1 866 448 4037 x2688 | younes.naguib@tritondigital.com <ma...@streamtheworld.com>

Re: Broadcast table

Posted by Deenar Toraskar <de...@gmail.com>.

1) If you are using the Thrift server, any cached table is cached for all
sessions (I am not sure whether this was your question).
2) If you want to ensure that the smaller table in the join is replicated
to all nodes, you can use the broadcast hint from
org.apache.spark.sql.functions:

left.join(broadcast(right), "joinKey")

See https://issues.apache.org/jira/browse/SPARK-8300 for details.
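
To make concrete what replicating the smaller table to all nodes buys you,
here is a minimal sketch in plain Python (not Spark code). It assumes the
large table is already split into partitions and that the small table fits
in memory. The function name broadcast_hash_join and the sample rows are
hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of the large side against a full copy of the small side."""
    # Build the lookup table once. In a cluster, this is the part that
    # would be shipped to every node.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    for partition in large_partitions:
        # Each partition is joined locally, so no rows of the large side
        # have to move between nodes.
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
    return joined


# Hypothetical sample data: two partitions of a large table, one small table.
left = [
    [{"joinKey": 1, "amount": 10}, {"joinKey": 2, "amount": 20}],
    [{"joinKey": 1, "amount": 30}, {"joinKey": 3, "amount": 40}],
]
right = [{"joinKey": 1, "name": "a"}, {"joinKey": 2, "name": "b"}]

result = broadcast_hash_join(left, right, "joinKey")
```

The row with joinKey 3 has no match on the small side and is dropped, as in
an inner join. The point of the pattern is that only the small side is
copied around; the large side stays where it is.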

Deenar

Re: Broadcast table

Posted by Jags Ramnarayanan <jr...@pivotal.io>.

If you are using Spark SQL and joining two DataFrames, the optimizer will
automatically broadcast the smaller table when its estimated size is below
spark.sql.autoBroadcastJoinThreshold (you can raise this setting if the
default is too small).

Otherwise, in code, you can collect any RDD to the driver and broadcast it
using the SparkContext.broadcast method.
http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
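
As a rough illustration of the size-based rule, the sketch below models the
decision in plain Python. It is a simplified model, not Spark's planner
code: the function pick_join_strategy and its return strings are
hypothetical. It assumes the setting involved is
spark.sql.autoBroadcastJoinThreshold, with a default of 10 MB and -1
meaning that automatic broadcasting is disabled.

```python
# Default of spark.sql.autoBroadcastJoinThreshold: 10 MB, expressed in bytes.
DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024


def pick_join_strategy(left_bytes, right_bytes, threshold=DEFAULT_THRESHOLD_BYTES):
    """Return which side (if any) a simple size-based rule would broadcast."""
    # A negative threshold turns automatic broadcasting off.
    if threshold < 0:
        return "shuffle"
    # Look only at the smaller of the two estimated table sizes.
    smaller_side, smaller_bytes = min(
        (("left", left_bytes), ("right", right_bytes)), key=lambda side: side[1]
    )
    if smaller_bytes <= threshold:
        return "broadcast " + smaller_side
    return "shuffle"


GB = 1024 ** 3
MB = 1024 ** 2

small_case = pick_join_strategy(5 * GB, 2 * MB)
too_big_case = pick_join_strategy(5 * GB, 50 * MB)
raised_case = pick_join_strategy(5 * GB, 50 * MB, threshold=100 * MB)
disabled_case = pick_join_strategy(5 * GB, 2 * MB, threshold=-1)
```

With the default threshold, a 50 MB table is not broadcast automatically;
raising the threshold to 100 MB changes that, which is the "configure the
size" knob mentioned above.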

-- Jags
(www.snappydata.io)
