Posted to mapreduce-user@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2012/06/16 00:46:27 UTC

Streaming in mapreduce

Hi,

Hadoop MapReduce can be used for streaming. But what does streaming mean from the point of view of MapReduce? To me, streaming means video and audio data.

Why does MapReduce support streaming?

Can anyone give me an example of why one would use streaming in MapReduce?

Thanks,
Pedro

Re: Streaming in mapreduce

Posted by swathi v <sw...@gmail.com>.
Hi Pedro,

Adding to the response of *Bejoy*,

   - Hadoop streaming lets the user run arbitrary programs written in other
   languages, such as Ruby or Python, as a job’s map and reduce methods.
   - Streaming provides the ability to use external programs as any of the
   job’s mapper, combiner, or reducer.
   - The job is a traditional MapReduce job, with the framework handling
   input splitting, scheduling map tasks, scheduling input split pairs to run,
   shuffling and sorting map outputs, scheduling reduce tasks to run, and then
   writing the reduce output to the Hadoop Distributed File System (HDFS).
   - The framework handles a streaming job like any other MapReduce job.
   - The job might specify that an executable be used as the map processor
   and as the reduce processor.
   - Each task will start an instance of the applicable executable and
   write an applicable representation of the input key/value pairs to the
   executable’s standard input.
   - The standard output of the executable is parsed as textual key/value
   pairs.
   - The executable run for the reduce task is given one input line for
   each value in the reduce value iterator, composed of the key and that
   value.
   - This link explains the same:
   http://wiki.apache.org/hadoop/HadoopStreaming (the same link given in
   the response by Ruslan Al-Fakikh)
   - *EXAMPLE:* The Hadoop Core distribution provides a Jython example
   MapReduce application in *src/examples/python/WordCount.py*
   - FYI: There are libraries available for C++. The C++ interface lends
   itself to use with the Simplified Wrapper and Interface Generator (SWIG)
   to generate other language interfaces. The usage of *Hadoop Pipes* and an
   example are here: http://wiki.apache.org/hadoop/C++WordCount
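
To make the points above concrete, here is a minimal word-count sketch in Python of what a streaming mapper and reducer might look like. It is only an illustration: the function names and the "map"/"reduce" command-line argument are my own assumptions, not part of Hadoop. The one thing taken from the description above is the contract: lines arrive on standard input, and tab-separated key/value lines are written to standard output.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line per word read from the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive lines that share a key.

    Assumes the input is already sorted by key, which is what the
    framework's shuffle/sort phase arranges before the reducer runs.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention: "wordcount.py map" or "wordcount.py reduce".
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

Run locally, the reducer only gives correct totals if its input is sorted by key first; in a real job the framework does that sort between the map and reduce phases.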

Hope you find this useful.  :)
Thank You.

On Sat, Jun 16, 2012 at 3:00 PM, Bejoy KS <be...@gmail.com> wrote:

> Hi Pedro
>
> In simple terms, the Streaming API is used in Hadoop when your mapper or
> reducer is written in a language other than Java, say Ruby or Python.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.


-- 
- Regards,
Swathi.V. ,
Software Developer
Blog URL :http://femgeekz.blogspot.in

Re: Streaming in mapreduce

Posted by Bejoy KS <be...@gmail.com>.
Hi Pedro

In simple terms, the Streaming API is used in Hadoop when your mapper or reducer is written in a language other than Java, say Ruby or Python.


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Pedro Costa <ps...@gmail.com>
Date: Sat, 16 Jun 2012 10:23:20 
To: mapreduce-user@hadoop.apache.org<ma...@hadoop.apache.org>
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Re: Streaming in mapreduce

I still don't get why Hadoop streaming is useful. If I have map and reduce functions defined in shell scripts, as in the pipeline below, why should I use Hadoop?
cat someInputFile | shellMapper.sh | shellReducer.sh > someOutputFile




Re: Streaming in mapreduce

Posted by Harsh J <ha...@cloudera.com>.
Hey Pedro,

Your script will not run across all nodes, nor will it read data-local
blocks. Hadoop streaming allows you to achieve that. I agree the name may
be a bit confusing: the 'streaming' part comes from the way it
'streams'/'pipes' data into and out of a newly launched process (your
script, written in any preferred language) that it takes care of
executing and terminating across your TaskTrackers, giving you MR in
your own language, as opposed to the Java API.
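
As a rough illustration of the 'pipes' part, the Python sketch below launches a child process, writes records to its standard input, and parses its standard output as tab-separated key/value pairs. It only imitates the idea on one machine and is not how Hadoop itself is implemented: the inline mapper source, the `run_task` helper, and the two sample "splits" are all made up for the example.

```python
import subprocess
import sys

# Stand-in for a user's mapper script (hypothetical): upper-cases each
# input line and emits "key<TAB>value", with the line length as the value.
MAPPER_SOURCE = r'''
import sys
for line in sys.stdin:
    text = line.rstrip("\n")
    print(f"{text.upper()}\t{len(text)}")
'''


def run_task(records):
    """Launch the mapper as a child process and pipe one split through it."""
    result = subprocess.run(
        [sys.executable, "-c", MAPPER_SOURCE],
        input="".join(f"{r}\n" for r in records),
        capture_output=True,
        text=True,
        check=True,
    )
    # Parse the child's standard output as tab-separated key/value pairs.
    return [tuple(line.split("\t", 1)) for line in result.stdout.splitlines()]


if __name__ == "__main__":
    # Two "splits", each handled by its own process, as separate tasks would be.
    for split in (["hadoop", "streaming"], ["mapreduce"]):
        print(run_task(split))
```

The difference from a plain shell pipeline is where those processes run: with Hadoop streaming, the framework starts one such process per task on the nodes that hold the data, rather than a single process on one machine.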

On Sat, Jun 16, 2012 at 2:53 PM, Pedro Costa <ps...@gmail.com> wrote:
> I still don't get why Hadoop streaming is useful. If I have map and reduce
> functions defined in shell scripts, as in the pipeline below, why should I
> use Hadoop?
>
> cat someInputFile | shellMapper.sh | shellReducer.sh > someOutputFile



-- 
Harsh J

Re: Streaming in mapreduce

Posted by Pedro Costa <ps...@gmail.com>.
I still don't get why Hadoop streaming is useful. If I have map and reduce functions defined in shell scripts, as in the pipeline below, why should I use Hadoop?
cat someInputFile | shellMapper.sh | shellReducer.sh > someOutputFile


On 16/06/2012, at 01:21, Ruslan Al-Fakikh <me...@gmail.com> wrote:

> Hi Pedro,
> 
> You can find it here
> http://wiki.apache.org/hadoop/HadoopStreaming
> 
> Thanks

Re: Streaming in mapreduce

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
Hi Pedro,

You can find it here
http://wiki.apache.org/hadoop/HadoopStreaming

Thanks

On Sat, Jun 16, 2012 at 2:46 AM, Pedro Costa <ps...@gmail.com> wrote:
> Hi,
>
> Hadoop MapReduce can be used for streaming. But what does streaming mean from the point of view of MapReduce? To me, streaming means video and audio data.
>
> Why does MapReduce support streaming?
>
> Can anyone give me an example of why one would use streaming in MapReduce?
>
> Thanks,
> Pedro