Posted to user@spark.apache.org by goi cto <go...@gmail.com> on 2014/02/04 12:15:08 UTC

Looking for resources on Map\Reduce concepts

Hi,
I am a newbie with Spark and Scala and am trying to find my way around.
I am looking for resources to learn more (if possible, by example) about
how to program with the map & reduce functions.
Any good recommendations?
(I did the getting started guides on the site but still don't feel
comfortable with them...)

-- 
Eran | CTO

Re: Looking for resources on Map\Reduce concepts

Posted by Mayur Rustagi <ma...@gmail.com>.
It is Scala you are looking to learn. I would suggest you take a look at
this:
http://www.brunton-spall.co.uk/post/2011/12/02/map-map-and-flatmap-in-scala/
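
As a quick taste of what that post covers, here is map vs. flatMap on plain
Scala collections (the outputs shown are illustrative REPL results):

scala> List(1, 2, 3).map(x => x * 2)
res0: List[Int] = List(2, 4, 6)

scala> List("a b", "c").flatMap(s => s.split(" "))
res1: List[String] = List(a, b, c)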

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi



On Tue, Feb 4, 2014 at 8:27 PM, goi cto <go...@gmail.com> wrote:

> Thanks all for the info provided.
> One of the things I noticed is that both the map and reduce functions
> receive a function which is applied to every element
> (map: Return a new distributed dataset formed by passing each element of
> the source through a function func).
>
> Q1: All the examples I have seen so far use a very simple function as
> func, e.g. (line => line.split(" ")). Are there examples or cases where a
> more complex function is needed? If so, what is the syntax for this?
>
You can create your own functions (complex ones) and call them in map,
e.g. (line => customFunc(line)). Since two elements of the dataset cannot
interact inside map, nothing more complex than a per-element transformation
is possible.
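
For example (a sketch; parseLine and the comma-separated input format here
are made up), you can define a multi-line function and pass it to map:

scala> def parseLine(line: String): (String, Int) = {
     |   val fields = line.split(",")
     |   (fields(0).trim, fields(1).trim.toInt)   // e.g. "user42, 7" -> ("user42", 7)
     | }

scala> val pairs = sc.textFile("data.csv").map(line => parseLine(line))
scala> // equivalently: sc.textFile("data.csv").map(parseLine)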

>
>
> Q2: What is the difference between map and flatMap? When should I use which?
>
Use flatMap when your function converts each element into a collection and
you want all of those collections flattened into a single dataset.
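
A quick way to see the difference in the shell (illustrative):

scala> val lines = sc.parallelize(Seq("to be", "or not"))
scala> lines.map(line => line.split(" ")).collect()
res0: Array[Array[String]] = Array(Array(to, be), Array(or, not))

scala> lines.flatMap(line => line.split(" ")).collect()
res1: Array[String] = Array(to, be, or, not)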

>
> Q3: reduceByKey((a, b) => a + b) -> Here again, this example was used in
> the word count sample. I understand it takes the value part of each (K, V)
> pair and performs the function on them, e.g. +, but what do a and b
> represent? What if my value is not an integer?
>
a and b are just two values that share the same key: reduceByKey keeps
combining pairs of values per key until one value per key remains, so the
function should be associative (and in practice commutative). The values do
not have to be integers, and the function does not have to be +. + happens
to be defined on many types (on String it means concatenation, for example),
and in your own class you can define a + method, or pass any other combining
function, as you like.
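
For instance, with Set values you can take unions (a sketch; ++ is set
union, which is commutative and associative):

scala> val pairs = sc.parallelize(Seq(("k", Set(1)), ("k", Set(2)), ("j", Set(3))))
scala> pairs.reduceByKey((a, b) => a ++ b).collect()
// Array((k,Set(1, 2)), (j,Set(3))) -- ordering may vary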

>
> Thanks
> Eran
> --
> Eran | CTO
>

Re: Looking for resources on Map\Reduce concepts

Posted by goi cto <go...@gmail.com>.
Thanks all for the info provided.
One of the things I noticed is that both the map and reduce functions
receive a function which is applied to every element
(map: Return a new distributed dataset formed by passing each element of
the source through a function func).

Q1: All the examples I have seen so far use a very simple function as
func, e.g. (line => line.split(" ")). Are there examples or cases where a
more complex function is needed? If so, what is the syntax for this?

Q2: What is the difference between map and flatMap? When should I use which?

Q3: reduceByKey((a, b) => a + b) -> Here again, this example was used in
the word count sample. I understand it takes the value part of each (K, V)
pair and performs the function on them, e.g. +, but what do a and b
represent? What if my value is not an integer?

Thanks
Eran



-- 
Eran | CTO

Re: Looking for resources on Map\Reduce concepts

Posted by Akhil Das <ak...@mobipulse.in>.
From the Spark download page, you may download a prebuilt package. If you
download the source package, build it against the Hadoop version that you have.

You can open Spark's interactive shell in standalone local mode by issuing
the ./spark-shell command inside the Spark directory:

akhld@akhldz:/data/spark-0.8.0$ ./spark-shell

Now you can run a word count example in the shell, taking input from HDFS
and writing output back to HDFS.

scala> val file = sc.textFile("hdfs://bigmaster:54310/sampledata/textbook.txt")

scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

scala> count.saveAsTextFile("hdfs://bigmaster:54310/sampledata/wcout")
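
If you do not have an HDFS cluster handy, the same pipeline works on a
local file (the path below is just an example):

scala> val file = sc.textFile("/tmp/textbook.txt")
scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> count.take(5).foreach(println)
// prints the first few (word, count) pairs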

You may find similar information over here:
http://sprism.blogspot.in/2012/11/lightning-fast-wordcount-using-spark.html



-- 
Thanks
Best Regards

Re: Looking for resources on Map\Reduce concepts

Posted by Ognen Duzlevski <og...@nengoiksvelzud.com>.
Eran,

How much do you know in general about Map/Reduce? Do you know anything? 
If not, this is a VERY nice explanation of what MR is about at a very 
high conceptual level: 
http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/
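
To connect that article's picture to code, here is a toy simulation of the
two phases on plain Scala collections (an illustration of the model only,
not how Spark or Hadoop are implemented):

val docs = List("the cat", "the dog")
// "map" phase: emit a (key, value) pair for every word
val mapped = docs.flatMap(_.split(" ")).map(word => (word, 1))
// "shuffle" groups the pairs by key; "reduce" combines the values per key
val reduced = mapped.groupBy(_._1).mapValues(_.map(_._2).sum)
// reduced contains: the -> 2, cat -> 1, dog -> 1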

However, going from the above link to implementing something 
semi-complicated on Spark is a long journey ;)

I do not know any Hadoop, but I have a sneaking suspicion that most of the
Spark primitives are mirror copies of how Hadoop does these things.
Figuring out what to use when, and why, will be your biggest challenge.
People will tell you to look at the word count example (like they already
have), and that's nice, but it's only the tip of the iceberg. To make Spark
really useful and appealing to broader audiences, it should have a set of
documents describing how to do the whole "map/reduce thing" with its own
primitives.

Good luck,
Ognen

