Posted to user@spark.apache.org by goi cto <go...@gmail.com> on 2014/02/04 12:15:08 UTC
Looking for resources on Map/Reduce concepts
Hi,
I am a newbie with Spark and Scala and am trying to find my way around.
I am looking for resources to learn more (if possible, by example) about how
to program with the map and reduce functions.
Any good recommendations?
(I did the getting-started guides on the site but still don't feel
comfortable with them.)
--
Eran | CTO
Re: Looking for resources on Map/Reduce concepts
Posted by Mayur Rustagi <ma...@gmail.com>.
It sounds like it is Scala you are looking to learn. I would suggest you look at
this:
http://www.brunton-spall.co.uk/post/2011/12/02/map-map-and-flatmap-in-scala/
Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi
On Tue, Feb 4, 2014 at 8:27 PM, goi cto <go...@gmail.com> wrote:
> Thanks all for the info provided.
> One of the things I noticed is that both the map and reduce functions receive
> a function which is applied to all elements
> (map: return a new distributed dataset formed by passing each element of
> the source through a function func)
>
> Q1: All the examples I have seen so far use a very simple function as func, e.g.
> (line => line.split(" ")). Are there any examples or cases where a more complex
> function is needed? If so, what is the syntax for this?
>
You can create your own functions (complex ones) and call them in map, e.g.
(line => customFunc(line)). Since two elements of the list cannot interact in
map, nothing more complex than a per-element function is possible.
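To make that concrete, here is a small sketch in plain Scala collections (an RDD's map has the same shape; normalize is a hypothetical helper, not Spark API):

```scala
// A named function can be as complex as you like, as long as it maps
// one input element to one output element.
def normalize(line: String): String =
  line.trim.toLowerCase.replaceAll("\\s+", " ")

val lines = List("  Hello   World ", "SPARK  is fast")

// These two calls are equivalent:
val cleaned  = lines.map(line => normalize(line))
val cleaned2 = lines.map(normalize)
// cleaned == List("hello world", "spark is fast")
```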
>
>
> Q2: What is the difference between map and flatMap? When should I use which?
>
Use flatMap when your function converts each element into a collection and you
want to flatten all of those collections into a single one.
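A minimal plain-Scala illustration of the difference (RDDs behave analogously):

```scala
val lines = List("to be", "or not")

// map keeps one output element per input element, so here you get
// a nested structure: one list of words per input line.
val nested = lines.map(_.split(" ").toList)
// nested == List(List("to", "be"), List("or", "not"))

// flatMap flattens those per-element collections into one list.
val words = lines.flatMap(_.split(" "))
// words == List("to", "be", "or", "not")
```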
>
> Q3: reduceByKey((a, b) => a + b) -> Here again, this example was used in
> the word count sample. I understand it takes the value arguments of the K,V
> pairs and performs the function on them, e.g. +, but what do a and b
> represent? What if my value is not an integer?
>
+ is overloaded in most cases, so on Strings it will mean concatenation, on
sets it will mean union, etc. For your own class you can define + (or pass any
other combining function) to merge two values your way.
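In other words, a and b are simply two values that share the same key, so with Set values (a, b) => a ++ b gives a per-key union. A plain-Scala stand-in for reduceByKey, written here only to illustrate the semantics (it is not Spark code):

```scala
// Stand-in with the same per-key merging semantics as Spark's reduceByKey:
// values of pairs sharing a key are combined pairwise with f.
def reduceByKey[K, V](pairs: List[(K, V)])(f: (V, V) => V): Map[K, V] =
  pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

val tags = List(("fruit", Set("apple")), ("veg", Set("leek")), ("fruit", Set("pear")))

// Values are Sets, not integers; the combining function is set union.
val merged = reduceByKey(tags)((a, b) => a ++ b)
// merged("fruit") == Set("apple", "pear")
```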
>
> Thanks
> Eran
>
>
>
> On Tue, Feb 4, 2014 at 1:19 PM, Akhil Das <ak...@mobipulse.in> wrote:
>
>> From the Spark download page, you may download a prebuilt package. If you
>> download source package, build it against the hadoop version that you have.
>>
>> You can open Spark's interactive shell in standalone local mode like
>> this, by issuing the ./spark-shell command inside the Spark directory:
>>
>> akhld@akhldz:/data/spark-0.8.0$ ./spark-shell
>>
>> Now you can run a word count example in the shell, taking input from HDFS
>> and writing output back to HDFS.
>>
>> scala> val file =
>> sc.textFile("hdfs://bigmaster:54310/sampledata/textbook.txt")
>>
>> scala> val count = file.flatMap(line => line.split(" ")).map(word =>
>> (word, 1)).reduceByKey(_ + _)
>>
>> scala> count.saveAsTextFile("hdfs://bigmaster:54310/sampledata/wcout")
>>
>> You may find similar information over here:
>> http://sprism.blogspot.in/2012/11/lightning-fast-wordcount-using-spark.html
>>
>>
>> On Tue, Feb 4, 2014 at 4:45 PM, goi cto <go...@gmail.com> wrote:
>>
>>> Hi,
>>> I am a newbie with spark and scala and trying to get around.
>>> I am looking for resources to learn more (if possible, by example) on
>>> how to program with map & reduce functions.
>>> Any good recommendations?
>>> (I did the getting started guides on the site but still don't feel
>>> comfortable with that)...
>>>
>>> --
>>> Eran | CTO
>>>
>>
>>
>>
>> --
>> Thanks
>> Best Regards
>>
>
>
>
> --
> Eran | CTO
>
Re: Looking for resources on Map/Reduce concepts
Posted by goi cto <go...@gmail.com>.
Thanks all for the info provided.
One of the things I noticed is that both the map and reduce functions receive
a function which is applied to all elements
(map: return a new distributed dataset formed by passing each element of
the source through a function func).
Q1: All the examples I have seen so far use a very simple function as func, e.g.
(line => line.split(" ")). Are there any examples or cases where a more complex
function is needed? If so, what is the syntax for this?
Q2: What is the difference between map and flatMap? When should I use which?
Q3: reduceByKey((a, b) => a + b) -> Here again, this example was used in
the word count sample. I understand it takes the value arguments of the K,V
pairs and performs the function on them, e.g. +, but what do a and b
represent? What if my value is not an integer?
Thanks
Eran
On Tue, Feb 4, 2014 at 1:19 PM, Akhil Das <ak...@mobipulse.in> wrote:
> From the Spark download page, you may download a prebuilt package. If you
> download source package, build it against the hadoop version that you have.
>
> You can open Spark's interactive shell in standalone local mode like this,
> by issuing the ./spark-shell command inside the Spark directory:
>
> akhld@akhldz:/data/spark-0.8.0$ ./spark-shell
>
> Now you can run a word count example in the shell, taking input from HDFS
> and writing output back to HDFS.
>
> scala> val file =
> sc.textFile("hdfs://bigmaster:54310/sampledata/textbook.txt")
>
> scala> val count = file.flatMap(line => line.split(" ")).map(word =>
> (word, 1)).reduceByKey(_ + _)
>
> scala> count.saveAsTextFile("hdfs://bigmaster:54310/sampledata/wcout")
>
> You may find similar information over here:
> http://sprism.blogspot.in/2012/11/lightning-fast-wordcount-using-spark.html
>
>
> On Tue, Feb 4, 2014 at 4:45 PM, goi cto <go...@gmail.com> wrote:
>
>> Hi,
>> I am a newbie with spark and scala and trying to get around.
>> I am looking for resources to learn more (if possible, by example) on how
>> to program with map & reduce functions.
>> Any good recommendations?
>> (I did the getting started guides on the site but still don't feel
>> comfortable with that)...
>>
>> --
>> Eran | CTO
>>
>
>
>
> --
> Thanks
> Best Regards
>
--
Eran | CTO
Re: Looking for resources on Map/Reduce concepts
Posted by Akhil Das <ak...@mobipulse.in>.
From the Spark download page, you may download a prebuilt package. If you
download the source package, build it against the Hadoop version that you have.
You can open Spark's interactive shell in standalone local mode like this,
by issuing the ./spark-shell command inside the Spark directory:
akhld@akhldz:/data/spark-0.8.0$ ./spark-shell
Now you can run a word count example in the shell, taking input from HDFS
and writing output back to HDFS.
scala> val file =
sc.textFile("hdfs://bigmaster:54310/sampledata/textbook.txt")
scala> val count = file.flatMap(line => line.split(" ")).map(word =>
(word, 1)).reduceByKey(_ + _)
scala> count.saveAsTextFile("hdfs://bigmaster:54310/sampledata/wcout")
You may find similar information over here:
http://sprism.blogspot.in/2012/11/lightning-fast-wordcount-using-spark.html
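If you don't have an HDFS cluster handy, the same pipeline can be sketched over an in-memory list in plain Scala to see what each step produces (groupBy plus sum stands in for reduceByKey here):

```scala
val lines = List("to be or", "not to be")

val counts = lines
  .flatMap(_.split(" "))                         // split lines into words
  .map(word => (word, 1))                        // pair each word with 1
  .groupBy(_._1)                                 // group the pairs by word
  .map { case (w, ps) => (w, ps.map(_._2).sum) } // sum the 1s per word
// counts("to") == 2, counts("be") == 2, counts("or") == 1
```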
On Tue, Feb 4, 2014 at 4:45 PM, goi cto <go...@gmail.com> wrote:
> Hi,
> I am a newbie with spark and scala and trying to get around.
> I am looking for resources to learn more (if possible, by example) on how
> to program with map & reduce functions.
> Any good recommendations?
> (I did the getting started guides on the site but still don't feel
> comfortable with that)...
>
> --
> Eran | CTO
>
--
Thanks
Best Regards
Re: Looking for resources on Map/Reduce concepts
Posted by Ognen Duzlevski <og...@nengoiksvelzud.com>.
Eran,
How much do you know in general about Map/Reduce? Do you know anything?
If not, this is a VERY nice explanation of what MR is about at a very
high conceptual level:
http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/
However, going from the above link to implementing something
semi-complicated on Spark is a long journey ;)
I do not know any Hadoop, but I have a sneaking suspicion that most of the
Spark primitives are mirror copies of how Hadoop does these things.
Figuring out what to use when, and why, will be your biggest challenge.
People will tell you to look at the word count example (like they already
have), and that's nice, but that's only the tip of the iceberg. To make Spark
really useful and appealing to a broader audience, it should have a set of
documents describing how to do the whole "map/reduce thing" with its own
primitives.
Good luck,
Ognen
On 2/4/14, 5:15 AM, goi cto wrote:
> Hi,
> I am a newbie with spark and scala and trying to get around.
> I am looking for resources to learn more (if possible, by example) on
> how to program with map & reduce functions.
> Any good recommendations?
> (I did the getting started guides on the site but still don't feel
> comfortable with that)...
>
> --
> Eran | CTO