Posted to user@spark.apache.org by YaoPau <jo...@gmail.com> on 2014/11/17 18:08:36 UTC

How to broadcast a textFile?

I have a 1-million-row file that I'd like to read from my edge node and then
send a copy of it into the memory of each Hadoop machine, so that I can run
JOINs in my Spark Streaming code.

I see examples in the docs of how to use broadcast() for a simple array,
but what about when the data is in a textFile?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-broadcast-a-textFile-tp19083.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: How to broadcast a textFile?

Posted by YaoPau <jo...@gmail.com>.
OK, then I'd still need to write the code (within my Spark Streaming code, I'm
guessing) to convert my text file into an object like a HashMap before
broadcasting.

How can I make sure only the HashMap is broadcast, while all the
pre-processing to create the HashMap is performed only once?
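Roughly: anything in the driver program that is not inside a function passed to an RDD or DStream operation runs once per application, and sc.broadcast ships only the object handed to it. Below is a minimal PySpark sketch under stated assumptions: the lookup file is tab-separated with the key in the first column and sits on the driver's local disk, incoming records look like "key,payload", and the socket source and 10-second batch interval are placeholders. The names build_lookup, enrich and start_stream are hypothetical, not part of Spark.

```python
# Sketch: build the lookup dict once on the driver, broadcast it, and join
# each streaming record against it on the executors.


def build_lookup(lines):
    """Driver-side pre-processing: parse 'key<TAB>value' lines into a dict."""
    lookup = {}
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key:  # skip lines with an empty key, e.g. blank lines
            lookup[key] = value
    return lookup


def enrich(record, lookup):
    """Map-side join of one 'key,payload' record; None when the key is unknown."""
    key, _, payload = record.partition(",")
    return key, payload, lookup.get(key)


def start_stream(sc, lookup_path, host, port):
    """Wire it together; `sc` is an existing pyspark.SparkContext."""
    from pyspark.streaming import StreamingContext

    # Runs once, on the driver, before any batch is processed.
    with open(lookup_path) as f:
        bc = sc.broadcast(build_lookup(f))

    ssc = StreamingContext(sc, 10)  # 10-second batches (arbitrary choice)
    records = ssc.socketTextStream(host, port)  # placeholder source
    # The lambda captures the small `bc` handle, not the dict itself.
    records.map(lambda r: enrich(r, bc.value)).pprint()
    ssc.start()
    ssc.awaitTermination()
```

Each task serializes only the bc handle; the dict itself travels through Spark's broadcast mechanism and is read on the executors via bc.value. Code inside a function passed to transform() or foreachRDD() runs on the driver once per batch, so the pre-processing should stay outside such functions.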



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-broadcast-a-textFile-tp19083p19094.html


Re: How to broadcast a textFile?

Posted by Andy Twigg <an...@gmail.com>.
Broadcast copies arbitrary objects, so you could read the file into an object
such as an array of lines and then broadcast that.

Andy
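A minimal PySpark sketch of that suggestion, assuming sc is an existing SparkContext; the function names and the on_hdfs switch are illustrative, not part of any Spark API. A local file on the edge node is read with plain Python on the driver, while a file already in HDFS is pulled back to the driver with sc.textFile(...).collect(). Either way only the resulting list is broadcast, and it has to fit in driver and executor memory.

```python
# Sketch: read the file into a list of lines on the driver, then broadcast it.
# `sc` is assumed to be an existing pyspark.SparkContext.


def strip_lines(raw_lines):
    """Drop trailing newlines; accepts any iterable of strings, such as an open file."""
    return [line.rstrip("\n") for line in raw_lines]


def read_lines(path):
    """Read a text file on the driver's local disk (e.g. the edge node)."""
    with open(path) as f:
        return strip_lines(f)


def broadcast_lines(sc, path, on_hdfs=False):
    """Broadcast the file's lines as one Python list."""
    if on_hdfs:
        # Already in HDFS: let Spark read it, then collect it to the driver.
        lines = sc.textFile(path).collect()
    else:
        lines = read_lines(path)
    return sc.broadcast(lines)
```

For JOINs, broadcasting a dict keyed on the join column is usually a better fit than a raw list, since each lookup becomes a hash lookup instead of a scan over 1 million lines.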
