Posted to user@spark.apache.org by Saravanan Subramanian <to...@yahoo.com.INVALID> on 2016/07/14 21:32:09 UTC

Maximum Size of Reference Look Up Table in Spark

Hello All,
I am designing real-time data enhancement services using Spark Streaming. As part of this, I have to look up some reference data while processing the incoming stream.
I have the following questions:
1) What is the maximum size of a lookup table / variable that can be stored as a broadcast variable?
2) What is the impact on cluster performance if I store 10 GB of data in a broadcast variable?
Any suggestions and thoughts are welcome.
Thanks,
Saravanan S.

Re: Maximum Size of Reference Look Up Table in Spark

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

I have never worked on a project that required it.

Jacek

On 15 Jul 2016 5:31 p.m., "Saravanan Subramanian" <to...@yahoo.com>
wrote:

> Hello Jacek,
>
> Have you seen any practical limitations or performance degradation
> when using more than 10 GB of broadcast cache?
>
> Thanks,
> Saravanan S.

Re: Maximum Size of Reference Look Up Table in Spark

Posted by Saravanan Subramanian <to...@yahoo.com.INVALID>.
Hello Jacek,
Have you seen any practical limitations or performance degradation when using more than 10 GB of broadcast cache?
Thanks,
Saravanan S.


Re: Maximum Size of Reference Look Up Table in Spark

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

My understanding is that the maximum size of a broadcast is
Long.MAX_VALUE (plus somewhat more, since the data is encoded to save
space, esp. for Catalyst-driven Datasets).

Ad 2. Before tasks can access the broadcast variable, it has to be sent
across the network, which may be too slow to be acceptable.
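
To put rough numbers on point 2, here is a back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: the 1 Gbit/s link speed and the 20 executors are hypothetical, the function names are made up for this example, and the estimate ignores serialization, compression and any peer-to-peer sharing of broadcast blocks. It only shows why a 10 GB broadcast is costly both to ship and to hold in memory.

```python
# Back-of-the-envelope estimate only; all numbers below are hypothetical
# assumptions, not measurements from a real Spark cluster.

GIB = 1024 ** 3


def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Lower bound on the time for one executor to receive the broadcast."""
    return size_bytes * 8 / link_bits_per_second


def cluster_footprint_bytes(size_bytes: int, num_executors: int) -> int:
    """Memory used cluster-wide if every executor holds its own copy."""
    return size_bytes * num_executors


broadcast_size = 10 * GIB  # the 10 GB lookup table from the question
link_speed = 1e9           # assumed 1 Gbit/s network link
executors = 20             # assumed number of executors

seconds = transfer_seconds(broadcast_size, link_speed)
footprint_gib = cluster_footprint_bytes(broadcast_size, executors) / GIB

print(f"per-executor transfer: at least {seconds:.0f} s")
print(f"cluster-wide memory for copies: {footprint_gib:.0f} GiB")
```

Under these assumptions each executor needs well over a minute just to receive the data before its first task can use it, and the cluster as a whole has to reserve memory for one copy per executor.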


Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org