Posted to mapreduce-user@hadoop.apache.org by Radhe Radhe <ra...@live.com> on 2014/03/05 09:08:52 UTC

Streaming data access in HDFS: Design Feature

Hello All,

Can anyone please explain what we mean by Streaming data access in HDFS?

Data is usually copied to HDFS, where it is split into blocks across the DataNodes.
Say, for example, I have an input file of 10240 MB (10 GB) and a block size of 64 MB. Then there will be 160 blocks.
These blocks will be distributed across the DataNodes.
The Mappers will then read data from these DataNodes with data locality in mind (i.e. blocks local to a DataNode will be read by the map tasks running on that DataNode).
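
The block arithmetic above can be sketched in a few lines of Java. This is only an illustration: BlockLayoutSketch and blockRanges are hypothetical names, not Hadoop classes, and the sketch ignores replication and block placement entirely. It just lists the byte range each 64 MB block would cover.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): lists the byte range each block would cover.
class BlockLayoutSketch {
    static final long MB = 1024L * 1024L;

    /** Returns one {start, endExclusive} pair per block, in file order. */
    static List<long[]> blockRanges(long fileBytes, long blockBytes) {
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < fileBytes; start += blockBytes) {
            // The last block may be shorter than blockBytes.
            long end = Math.min(start + blockBytes, fileBytes);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<long[]> ranges = blockRanges(10240 * MB, 64 * MB);
        System.out.println("blocks: " + ranges.size());
        long[] last = ranges.get(ranges.size() - 1);
        System.out.println("last block: [" + last[0] + ", " + last[1] + ")");
    }
}
```

For the 10240 MB file and 64 MB block size in the question, the list has 160 entries.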

Can you please point out where "Streaming data access in HDFS" comes into the picture here?

Thanks,
RR
 		 	   		  

RE: Streaming data access in HDFS: Design Feature

Posted by Nirmal Kumar <ni...@impetus.co.in>.
All,

IMHO, streaming data access means reading data continuously, rather than in small packets/chunks as in a traditional FS, where every small block can incur seek time.
You can think of it as a spout that emits water continuously.
Since HDFS data blocks are very large compared to those of a traditional FS, far less time is spent seeking relative to transferring data.
For an input file of 10240 MB (10 GB) and a block size of 64 MB, we will have just 160 blocks.
Now consider the same 10 GB file in a traditional FS where the block size is 512 bytes. That makes 20971520 blocks, and the seek time for each comes into the picture.
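
To put rough numbers on that comparison, here is a minimal sketch. The 10 ms seek time and 100 MB/s transfer rate are assumed round figures, not measurements, and the model pessimistically charges one seek per block (a traditional FS reading a contiguous file would do much better). SeekOverheadSketch is a hypothetical name.

```java
// Illustrative only: the seek time and transfer rate below are assumed round numbers,
// not measurements, and the model charges one seek per block (a worst case).
class SeekOverheadSketch {
    static final double SEEK_SECONDS = 0.010;            // assumed 10 ms per seek
    static final double TRANSFER_BYTES_PER_SEC = 100e6;  // assumed 100 MB/s

    /** Number of blocks needed to hold the file (ceiling division). */
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Total time spent seeking if every block costs one seek. */
    static double seekSeconds(long fileBytes, long blockBytes) {
        return blockCount(fileBytes, blockBytes) * SEEK_SECONDS;
    }

    /** Time to transfer the whole file at the assumed rate. */
    static double transferSeconds(long fileBytes) {
        return fileBytes / TRANSFER_BYTES_PER_SEC;
    }

    public static void main(String[] args) {
        long file = 10240L * 1024 * 1024;
        for (long block : new long[] {64L * 1024 * 1024, 512L}) {
            System.out.printf("block=%d B: %d blocks, seek %.1f s, transfer %.1f s%n",
                    block, blockCount(file, block), seekSeconds(file, block),
                    transferSeconds(file));
        }
    }
}
```

Under these assumptions the 160 seeks cost about 1.6 s against roughly 107 s of transfer, while 20971520 seeks would cost about 209715 s, which is the point being made about large blocks.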

With HDFS, the NameNode keeps an in-memory directory of where all the blocks of a file (plus their replicas) are stored across the DataNodes in the cluster.
For reading, the client code asks the NameNode for the list of blocks and then reads them sequentially, block by block (large blocks, ~64 MB).
The data is thus "streamed" off the hard drive, staying close to the maximum I/O rate.
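
A streaming read of this kind boils down to consuming a stream front to back without seeking backwards. The sketch below uses an in-memory stream as stand-in data so that it runs without Hadoop; since FSDataInputStream (returned by FileSystem.open) is an ordinary java.io.InputStream, the same loop would apply to it. SequentialReadSketch and drain are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch of a streaming read: consume the stream front to back, never seeking backwards.
class SequentialReadSketch {
    /** Reads the stream to the end in bufferSize chunks and returns the byte count. */
    static long drain(InputStream in, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        try {
            // read() returns -1 once the end of the stream is reached
            while ((n = in.read(buffer)) != -1) {
                total += n;   // a real job would process buffer[0..n) here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in data; with Hadoop on the classpath the stream would come
        // from FileSystem.open(path) instead.
        InputStream in = new ByteArrayInputStream(new byte[1_000_000]);
        System.out.println("bytes read: " + drain(in, 64 * 1024));
    }
}
```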

Let me know your opinion, and correct me if I am wrong.

Thanks,
-Nirmal

From: Nitin Pawar [mailto:nitinpawar432@gmail.com]
Sent: Wednesday, March 05, 2014 2:37 PM
To: user@hadoop.apache.org
Subject: Re: Streaming data access in HDFS: Design Feature

Hadoop Streaming allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. In other words, you do not need to learn Java programming to write a simple MapReduce program.

Streaming data access in HDFS, on the other hand, is totally different. When the MapReduce framework reads/writes data from/to HDFS blocks, it is done through byte streams. Bytes are always appended to the end of a stream, and byte streams are guaranteed to be stored in the order written.
The following code snippet shows how the stream data is written to HDFS. If you want to understand more of it, you can look at the codebase for any file format, such as the SequenceFile format.
Hope this helps a bit.

===
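import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.hadoop.fs.FSDataOutputStream;

// fileSystem (FileSystem), path (Path) and source (local file name) are set up elsewhere.
// Create a new file and write data to it.
try (FSDataOutputStream out = fileSystem.create(path);
     InputStream in = new BufferedInputStream(new FileInputStream(new File(source)))) {
    byte[] b = new byte[1024];
    int numBytes;
    // read() returns -1 at end of stream
    while ((numBytes = in.read(b)) != -1) {
        out.write(b, 0, numBytes);
    }
}   // both streams are closed here, even if the copy fails

// Close the file system handle
fileSystem.close();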
===

On Wed, Mar 5, 2014 at 2:25 PM, Radhe Radhe <ra...@live.com> wrote:
Hi Nitin,

I believe Hadoop Streaming is different from Streaming Data Access in HDFS.

We usually copy the data into HDFS and then the MR application reads the data through Map and Reduce tasks.
I need to be clear about WHAT is done, and HOW, in Streaming Data Access in HDFS.

Thanks,
RR

________________________________
Date: Wed, 5 Mar 2014 14:17:24 +0530

Subject: Re: Streaming data access in HDFS: Design Feature
From: nitinpawar432@gmail.com
To: user@hadoop.apache.org

Are you asking "why is data read/written from/to HDFS blocks via the MapReduce framework in a streaming manner?"

On Wed, Mar 5, 2014 at 2:05 PM, Radhe Radhe <ra...@live.com> wrote:
Hi Shashwat,

This is an excerpt from Hadoop: The Definitive Guide by Tom White:
Hadoop Streaming
Hadoop provides an API to MapReduce that allows you to write your map and reduce
functions in languages other than Java. Hadoop Streaming uses Unix standard streams
as the interface between Hadoop and your program, so you can use any language that
can read standard input and write to standard output to write your MapReduce
program.
Streaming is naturally suited for text processing (although, as of version 0.21.0, it can
handle binary streams, too), and when used in text mode, it has a line-oriented view of
data. Map input data is passed over standard input to your map function, which processes
it line by line and writes lines to standard output. A map output key-value pair
is written as a single tab-delimited line. Input to the reduce function is in the same
format—a tab-separated key-value pair—passed over standard input. The reduce function
reads lines from standard input, which the framework guarantees are sorted by
key, and writes its results to standard output.
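
For illustration, the stdin/stdout contract described in the excerpt could look like the following word-count style mapper. This is a hypothetical sketch, not code from the book: StreamingMapperSketch and mapLine are made-up names, and it only shows the line-in, tab-delimited-line-out shape that Hadoop Streaming expects from a mapper.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical word-count style mapper for Hadoop Streaming: lines in on stdin,
// tab-delimited key-value lines out on stdout.
class StreamingMapperSketch {
    /** Turns one input line into "word<TAB>1" output lines. */
    static List<String> mapLine(String line) {
        List<String> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(word + "\t1");   // key and value separated by a single tab
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        BufferedReader stdin = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        // Process the input line by line, as the framework feeds it over standard input.
        while ((line = stdin.readLine()) != null) {
            for (String kv : mapLine(line)) {
                System.out.println(kv);
            }
        }
    }
}
```

A reducer would follow the same pattern, reading the sorted tab-separated lines from standard input and summing the values per key.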

I think this is not what I am asking for.

Thanks.
-RR
________________________________
From: dwivedishashwat@gmail.com
Date: Wed, 5 Mar 2014 13:47:09 +0530
Subject: Re: Streaming data access in HDFS: Design Feature
To: user@hadoop.apache.org
CC: radhe.krishna.radhe@live.com

Streaming means processing data as it comes into HDFS; Hadoop Streaming enables Hadoop to receive data using executables of different types.

I hope you have already read this: http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming


Warm Regards_∞_
Shashwat Shriparv


On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe <ra...@live.com> wrote:
Hello All,

Can anyone please explain what we mean by Streaming data access in HDFS?

Data is usually copied to HDFS, where it is split into blocks across the DataNodes.
Say, for example, I have an input file of 10240 MB (10 GB) and a block size of 64 MB. Then there will be 160 blocks.
These blocks will be distributed across the DataNodes.
The Mappers will then read data from these DataNodes with data locality in mind (i.e. blocks local to a DataNode will be read by the map tasks running on that DataNode).

Can you please point out where "Streaming data access in HDFS" comes into the picture here?

Thanks,
RR




--
Nitin Pawar



--
Nitin Pawar


Re: Streaming data access in HDFS: Design Feature

Posted by Nitin Pawar <ni...@gmail.com>.
Hadoop Streaming allows you to create and run Map/Reduce jobs with any
executable or script as the mapper and/or the reducer. In other words, you
do not need to learn Java programming to write a simple MapReduce
program.

Streaming data access in HDFS, on the other hand, is totally different.
When the MapReduce framework reads/writes data from/to HDFS blocks, it is
done through byte streams. Bytes are always appended to the end of a stream,
and byte streams are guaranteed to be stored in the order written.
The following code snippet shows how the stream data is written to HDFS. If
you want to understand more of it, you can look at the codebase for any
file format, such as the SequenceFile format.
Hope this helps a bit.

===
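import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.hadoop.fs.FSDataOutputStream;

// fileSystem (FileSystem), path (Path) and source (local file name) are set up elsewhere.
// Create a new file and write data to it.
try (FSDataOutputStream out = fileSystem.create(path);
     InputStream in = new BufferedInputStream(new FileInputStream(new File(source)))) {
    byte[] b = new byte[1024];
    int numBytes;
    // read() returns -1 at end of stream
    while ((numBytes = in.read(b)) != -1) {
        out.write(b, 0, numBytes);
    }
}   // both streams are closed here, even if the copy fails

// Close the file system handle
fileSystem.close();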
===
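
To try the same copy loop without a cluster, here is a self-contained sketch that uses in-memory streams in place of the local file and the HDFS output stream. It is only an approximation of the snippet above: StreamCopySketch and copy are hypothetical names, and ByteArrayOutputStream stands in for FSDataOutputStream (both are plain java.io.OutputStream subclasses). It also checks that the bytes arrive in the order they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Same copy loop as the snippet above, run against in-memory streams so it needs no Hadoop.
class StreamCopySketch {
    /** Copies in to out in bufferSize chunks and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) {
        byte[] b = new byte[bufferSize];
        long total = 0;
        int numBytes;
        try {
            while ((numBytes = in.read(b)) != -1) {
                out.write(b, 0, numBytes);   // bytes are appended in the order they are read
                total += numBytes;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = "streamed in order".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 4);
        System.out.println(copied + " bytes, order preserved: "
                + Arrays.equals(data, sink.toByteArray()));
    }
}
```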


On Wed, Mar 5, 2014 at 2:25 PM, Radhe Radhe <ra...@live.com> wrote:

> Hi Nitin,
>
> I believe *Hadoop Streaming* is different from *Streaming Data Access* in
> HDFS.
>
> We usually copy the data into HDFS and then the MR application reads the
> data through Map and Reduce tasks.
> I need to be clear about WHAT is done, and HOW, in *Streaming Data Access* in
> HDFS.
>
> Thanks,
> RR
>
>
> ------------------------------
> Date: Wed, 5 Mar 2014 14:17:24 +0530
>
> Subject: Re: Streaming data access in HDFS: Design Feature
> From: nitinpawar432@gmail.com
> To: user@hadoop.apache.org
>
>
> Are you asking "why is data read/written from/to HDFS blocks via the
> MapReduce framework in a streaming manner?"
>
>
> On Wed, Mar 5, 2014 at 2:05 PM, Radhe Radhe <ra...@live.com> wrote:
>
> Hi Shashwat,
>
> This is an excerpt from Hadoop: The Definitive Guide by Tom White:
> Hadoop Streaming
> Hadoop provides an API to MapReduce that allows you to write your map and
> reduce
> functions in languages *other than Java*. Hadoop Streaming uses Unix
> standard streams
> as the interface between Hadoop and your program, *so you can use any
> language that can read standard input and write to standard output to
> write your MapReduce program*.
> Streaming is naturally suited for text processing (although, as of version
> 0.21.0, it can
> handle binary streams, too), and when used in text mode, it has a
> line-oriented view of
> data. Map input data is passed over standard input to your map function,
> which processes
> it line by line and writes lines to standard output. A map output
> key-value pair
> is written as a single tab-delimited line. Input to the reduce function is
> in the same
> format—a tab-separated key-value pair—passed over standard input. The
> reduce function
> reads lines from standard input, which the framework guarantees are sorted
> by
> key, and writes its results to standard output.
>
> I think this is not what I am asking for.
>
> Thanks.
> -RR
>
> ------------------------------
> From: dwivedishashwat@gmail.com
> Date: Wed, 5 Mar 2014 13:47:09 +0530
> Subject: Re: Streaming data access in HDFS: Design Feature
> To: user@hadoop.apache.org
> CC: radhe.krishna.radhe@live.com
>
>
> Streaming means processing data as it comes into HDFS; Hadoop Streaming
> enables Hadoop to receive data using executables of different types.
>
> I hope you have already read this:
> http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming
>
>
> *Warm Regards_**∞_*
> * Shashwat Shriparv*
>
>
>
> On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe <ra...@live.com> wrote:
>
> Hello All,
>
> Can anyone please explain what we mean by *Streaming data access in HDFS*?
>
> Data is usually copied to HDFS, where it is split into blocks across the
> DataNodes.
> Say, for example, I have an input file of 10240 MB (10 GB) and a
> block size of 64 MB. Then there will be 160 blocks.
> These blocks will be distributed across the DataNodes.
> The Mappers will then read data from these DataNodes with the *data
> locality feature* in mind (i.e. blocks local to a DataNode will be read by
> the map tasks running on that DataNode).
>
> Can you please point out where "Streaming data access in HDFS" comes
> into the picture here?
>
> Thanks,
> RR
>
>
>
>
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar

Re: Streaming data access in HDFS: Design Feature

Posted by Nitin Pawar <ni...@gmail.com>.
Hadoop Streaming allows you to create and run Map/Reduce jobs with any
executable or script as the mapper and/or the reducer. In other words, you
do not need to learn Java programming to write a simple MapReduce
program.

Streaming data access in HDFS, on the other hand, is totally different.
When the MapReduce framework reads data from or writes data to HDFS blocks,
it does so through byte streams. Bytes are always appended to the end of a
stream, and byte streams are guaranteed to be stored in the order written.
The following code snippet shows how stream data is written to HDFS. If you
want to understand more of it, you can look at the codebase for any
file format, such as the SequenceFile format.
Hope this helps a bit.

===
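// Create a new file in HDFS and copy a local file into it.
// Assumes fileSystem is an open org.apache.hadoop.fs.FileSystem, path is an
// org.apache.hadoop.fs.Path, and source is the name of a local file.
    FSDataOutputStream out = fileSystem.create(path);
    InputStream in = new BufferedInputStream(new FileInputStream(
        new File(source)));

    byte[] b = new byte[1024];
    int numBytes = 0;
    // read() returns -1 once the end of the local file is reached
    while ((numBytes = in.read(b)) != -1) {
        out.write(b, 0, numBytes);
    }

    // Close the streams and the file system handle
    in.close();
    out.close();
    fileSystem.close();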
===
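
The read side follows the same pattern. Below is a minimal sketch using only
java.io, so it runs without a cluster. The class name StreamingReadSketch and
the method drain are made up for illustration, and the ByteArrayInputStream
is only a stand-in: with Hadoop on the classpath, the stream would instead
come from fileSystem.open(path), which returns an FSDataInputStream. The
point is that the data is consumed front to back in one pass, with no seek
between reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamingReadSketch {

    // Reads the stream from start to end in fixed-size chunks and returns
    // the total number of bytes seen. Each read continues where the
    // previous one stopped; bufferSize must be greater than zero.
    static long drain(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an HDFS stream (assumption: on a cluster this
        // would be fileSystem.open(path) instead of an in-memory array).
        byte[] data = new byte[10_000];
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println("bytes read: " + drain(in, 4096));
        }
    }
}
```

A larger buffer means fewer read calls for the same amount of data, which is
roughly the same trade-off that large HDFS blocks make at the disk level.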


On Wed, Mar 5, 2014 at 2:25 PM, Radhe Radhe <ra...@live.com> wrote:

> Hi Nitin,
>
> I believe *Hadoop Streaming* is different from *Streaming Data Access* in
> HDFS.
>
> We usually copy the data in HDFS and then the MR application reads the
> data through Map and Reduce tasks.
> I need to clear about WHAT and HOW is done in *Streaming Data Access* in
> HDFS.
>
> Thanks,
> RR
>
>
> ------------------------------
> Date: Wed, 5 Mar 2014 14:17:24 +0530
>
> Subject: Re: Streaming data access in HDFS: Design Feature
> From: nitinpawar432@gmail.com
> To: user@hadoop.apache.org
>
>
> are you asking "why data read/write from/to hdfs blocks via mapreduce
> framework  is done in streaming manner?"
>
>
> On Wed, Mar 5, 2014 at 2:05 PM, Radhe Radhe <ra...@live.com>wrote:
>
> Hi Shashwat,
>
> This is an excerpt from Hadoop The Definitive Guide--Tom White
> Hadoop Streaming
> Hadoop provides an API to MapReduce that allows you to write your map and
> reduce functions in languages *other than Java*. Hadoop Streaming uses Unix
> standard streams as the interface between Hadoop and your program, *so you
> can use any language that can read standard input and write to standard
> output to write your MapReduce program*.
> Streaming is naturally suited for text processing (although, as of version
> 0.21.0, it can handle binary streams, too), and when used in text mode, it
> has a line-oriented view of data. Map input data is passed over standard
> input to your map function, which processes it line by line and writes
> lines to standard output. A map output key-value pair is written as a
> single tab-delimited line. Input to the reduce function is in the same
> format, a tab-separated key-value pair, passed over standard input. The
> reduce function reads lines from standard input, which the framework
> guarantees are sorted by key, and writes its results to standard output.
>
> I think this is not what I am asking for.
>
> Thanks.
> -RR
>
> ------------------------------
> From: dwivedishashwat@gmail.com
> Date: Wed, 5 Mar 2014 13:47:09 +0530
> Subject: Re: Streaming data access in HDFS: Design Feature
> To: user@hadoop.apache.org
> CC: radhe.krishna.radhe@live.com
>
>
> Streaming means process it as its coming to HDFS, like where in hadoop
> this hadoop streaming enable hadoop to receive data using executable of
> different types
>
> i hope you have already read this :
> http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming
>
>
> *Warm Regards_**∞_*
> * Shashwat Shriparv*
>
>
>
> On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe <ra...@live.com>wrote:
>
> Hello All,
>
> Can anyone please explain what we mean by *Streaming data access in HDFS*.
>
> Data is usually copied to HDFS and in HDFS the data is splitted across
> DataNodes in blocks.
> Say for example, I have an input file of 10240 MB(10 GB) in size and a
> block size of 64 MB. Then there will be 160 blocks.
> These blocks will be distributed across DataNodes in blocks.
> Now the Mappers will read data from these DataNodes keeping the *data
> locality feature* in mind(i.e. blocks local to a DataNode will be read by
> the map tasks running in that DataNode).
>
> Can you please point me where is the "Streaming data access in HDFS" is
> coming into picture here?
>
> Thanks,
> RR
>
>
>
>
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar

RE: Streaming data access in HDFS: Design Feature

Posted by Radhe Radhe <ra...@live.com>.
Hi Nitin,

I believe Hadoop Streaming is different from Streaming Data Access in HDFS.

We usually copy the data into HDFS, and then the MR application reads the data through Map and Reduce tasks.
What I need to understand is WHAT is done in Streaming Data Access in HDFS and HOW it is done.

Thanks,
RR


Date: Wed, 5 Mar 2014 14:17:24 +0530
Subject: Re: Streaming data access in HDFS: Design Feature
From: nitinpawar432@gmail.com
To: user@hadoop.apache.org

are you asking "why data read/write from/to hdfs blocks via mapreduce framework  is done in streaming manner?" 

On Wed, Mar 5, 2014 at 2:05 PM, Radhe Radhe <ra...@live.com> wrote:

Hi Shashwat,

This is an excerpt from Hadoop The Definitive Guide--Tom White
Hadoop Streaming
Hadoop provides an API to MapReduce that allows you to write your map and reduce
functions in languages other than Java. Hadoop Streaming uses Unix standard streams
as the interface between Hadoop and your program, so you can use any language that
can read standard input and write to standard output to write your MapReduce
program.
Streaming is naturally suited for text processing (although, as of version 0.21.0, it can
handle binary streams, too), and when used in text mode, it has a line-oriented view of
data. Map input data is passed over standard input to your map function, which processes
it line by line and writes lines to standard output. A map output key-value pair
is written as a single tab-delimited line. Input to the reduce function is in the same
format, a tab-separated key-value pair, passed over standard input. The reduce function
reads lines from standard input, which the framework guarantees are sorted by
key, and writes its results to standard output.

I think this is not what I am asking for.

Thanks.
-RR

From: dwivedishashwat@gmail.com
Date: Wed, 5 Mar 2014 13:47:09 +0530
Subject: Re: Streaming data access in HDFS: Design Feature
To: user@hadoop.apache.org
CC: radhe.krishna.radhe@live.com

Streaming means process it as its coming to HDFS, like where in hadoop this hadoop streaming enable hadoop to receive data using executable of different types

i hope you have already read this : http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming

Warm Regards_∞_
Shashwat Shriparv

On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe <ra...@live.com> wrote:

Hello All,

Can anyone please explain what we mean by Streaming data access in HDFS.

Data is usually copied to HDFS and in HDFS the data is splitted across DataNodes in blocks.
Say for example, I have an input file of 10240 MB(10 GB) in size and a block size of 64 MB. Then there will be 160 blocks.
These blocks will be distributed across DataNodes in blocks.
Now the Mappers will read data from these DataNodes keeping the data locality feature in mind(i.e. blocks local to a DataNode will be read by the map tasks running in that DataNode).

Can you please point me where is the "Streaming data access in HDFS" is coming into picture here?

Thanks,
RR

-- 
Nitin Pawar

 		 	   		  

Re: Streaming data access in HDFS: Design Feature

Posted by Nitin Pawar <ni...@gmail.com>.
Are you asking "why is data read from and written to HDFS blocks via the
MapReduce framework in a streaming manner?"


On Wed, Mar 5, 2014 at 2:05 PM, Radhe Radhe <ra...@live.com> wrote:

> Hi Shashwat,
>
> This is an excerpt from Hadoop The Definitive Guide--Tom White
> Hadoop Streaming
> Hadoop provides an API to MapReduce that allows you to write your map and
> reduce functions in languages *other than Java*. Hadoop Streaming uses Unix
> standard streams as the interface between Hadoop and your program, *so you
> can use any language that can read standard input and write to standard
> output to write your MapReduce program*.
> Streaming is naturally suited for text processing (although, as of version
> 0.21.0, it can handle binary streams, too), and when used in text mode, it
> has a line-oriented view of data. Map input data is passed over standard
> input to your map function, which processes it line by line and writes
> lines to standard output. A map output key-value pair is written as a
> single tab-delimited line. Input to the reduce function is in the same
> format, a tab-separated key-value pair, passed over standard input. The
> reduce function reads lines from standard input, which the framework
> guarantees are sorted by key, and writes its results to standard output.
>
> I think this is not what I am asking for.
>
> Thanks.
> -RR
>
> ------------------------------
> From: dwivedishashwat@gmail.com
> Date: Wed, 5 Mar 2014 13:47:09 +0530
> Subject: Re: Streaming data access in HDFS: Design Feature
> To: user@hadoop.apache.org
> CC: radhe.krishna.radhe@live.com
>
>
> Streaming means process it as its coming to HDFS, like where in hadoop
> this hadoop streaming enable hadoop to receive data using executable of
> different types
>
> i hope you have already read this :
> http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming
>
>
> *Warm Regards_**∞_*
> * Shashwat Shriparv*
>
>
>
> On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe <ra...@live.com>wrote:
>
> Hello All,
>
> Can anyone please explain what we mean by *Streaming data access in HDFS*.
>
> Data is usually copied to HDFS and in HDFS the data is splitted across
> DataNodes in blocks.
> Say for example, I have an input file of 10240 MB(10 GB) in size and a
> block size of 64 MB. Then there will be 160 blocks.
> These blocks will be distributed across DataNodes in blocks.
> Now the Mappers will read data from these DataNodes keeping the *data
> locality feature* in mind(i.e. blocks local to a DataNode will be read by
> the map tasks running in that DataNode).
>
> Can you please point me where is the "Streaming data access in HDFS" is
> coming into picture here?
>
> Thanks,
> RR
>
>
>


-- 
Nitin Pawar

Re: Streaming data access in HDFS: Design Feature

Posted by Nitin Pawar <ni...@gmail.com>.
are you asking "why data read/write from/to hdfs blocks via mapreduce
framework  is done in streaming manner?"


On Wed, Mar 5, 2014 at 2:05 PM, Radhe Radhe <ra...@live.com>wrote:

> Hi Shashwat,
>
> This is an excerpt from Hadoop: The Definitive Guide by Tom White:
>
> Hadoop Streaming
> Hadoop provides an API to MapReduce that allows you to write your map and
> reduce functions in languages *other than Java*. Hadoop Streaming uses Unix
> standard streams as the interface between Hadoop and your program, *so you
> can use any language that can read standard input and write to standard
> output to write your MapReduce program*.
> Streaming is naturally suited for text processing (although, as of version
> 0.21.0, it can handle binary streams, too), and when used in text mode, it
> has a line-oriented view of data. Map input data is passed over standard
> input to your map function, which processes it line by line and writes
> lines to standard output. A map output key-value pair is written as a
> single tab-delimited line. Input to the reduce function is in the same
> format—a tab-separated key-value pair—passed over standard input. The
> reduce function reads lines from standard input, which the framework
> guarantees are sorted by key, and writes its results to standard output.
>
> I don't think this is what I am asking about.
>
> Thanks.
> -RR
>
> ------------------------------
> From: dwivedishashwat@gmail.com
> Date: Wed, 5 Mar 2014 13:47:09 +0530
> Subject: Re: Streaming data access in HDFS: Design Feature
> To: user@hadoop.apache.org
> CC: radhe.krishna.radhe@live.com
>
>
> Streaming means processing data as it comes into HDFS; Hadoop Streaming,
> for example, lets Hadoop process data using executables of different
> types.
>
> I hope you have already read this:
> http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming
>
>
> Warm Regards ∞
> Shashwat Shriparv
>
>
>
> On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe <ra...@live.com> wrote:
>
> Hello All,
>
> Can anyone please explain what we mean by *Streaming data access in HDFS*?
>
> Data is usually copied to HDFS, where it is split into blocks across
> DataNodes.
> Say, for example, I have an input file of 10240 MB (10 GB) and a
> block size of 64 MB. Then there will be 160 blocks.
> These blocks will be distributed across the DataNodes.
> Now the Mappers will read data from these DataNodes, keeping the *data
> locality feature* in mind (i.e. blocks local to a DataNode will be read by
> the map tasks running on that DataNode).
>
> Can you please point out where "Streaming data access in HDFS" comes
> into the picture here?
>
> Thanks,
> RR
>
>
>


-- 
Nitin Pawar

RE: Streaming data access in HDFS: Design Feature

Posted by Radhe Radhe <ra...@live.com>.
Hi Shashwat,

This is an excerpt from Hadoop: The Definitive Guide by Tom White:
Hadoop Streaming
Hadoop provides an API to MapReduce that allows you to write your map and reduce
functions in languages other than Java. Hadoop Streaming uses Unix standard streams
as the interface between Hadoop and your program, so you can use any language that
can read standard input and write to standard output to write your MapReduce
program.
Streaming is naturally suited for text processing (although, as of version 0.21.0, it can
handle binary streams, too), and when used in text mode, it has a line-oriented view of
data. Map input data is passed over standard input to your map function, which processes
it line by line and writes lines to standard output. A map output key-value pair
is written as a single tab-delimited line. Input to the reduce function is in the same
format—a tab-separated key-value pair—passed over standard input. The reduce function
reads lines from standard input, which the framework guarantees are sorted by
key, and writes its results to standard output.

I don't think this is what I am asking about.

Thanks.
-RR
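
To make the Hadoop Streaming interface from the excerpt above concrete, here is a minimal word-count sketch in Python. It is only an illustration of the stdin/stdout contract (tab-delimited key-value lines, reducer input sorted by key), not code from the book; the function names and the "map"/"reduce" command-line argument are assumptions made for this example.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one tab-delimited "word<TAB>1" line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Sum the counts per key; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Hypothetical entry point: the first argument selects the role of the script.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

Saved under a hypothetical name such as wc.py, the same contract can be tried locally with a shell pipeline like `cat input.txt | python wc.py map | sort | python wc.py reduce`, where `sort` stands in for the sorting the framework performs between the map and reduce steps.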

From: dwivedishashwat@gmail.com
Date: Wed, 5 Mar 2014 13:47:09 +0530
Subject: Re: Streaming data access in HDFS: Design Feature
To: user@hadoop.apache.org
CC: radhe.krishna.radhe@live.com

Streaming means processing data as it comes into HDFS; Hadoop Streaming, for example, lets Hadoop process data using executables of different types.

I hope you have already read this: http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming

Warm Regards ∞
Shashwat Shriparv

On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe <ra...@live.com> wrote:

Hello All,

Can anyone please explain what we mean by Streaming data access in HDFS?

Data is usually copied to HDFS, where it is split into blocks across DataNodes.
Say, for example, I have an input file of 10240 MB (10 GB) and a block size of 64 MB. Then there will be 160 blocks.
These blocks will be distributed across the DataNodes.
Now the Mappers will read data from these DataNodes, keeping the data locality feature in mind (i.e. blocks local to a DataNode will be read by the map tasks running on that DataNode).

Can you please point out where "Streaming data access in HDFS" comes into the picture here?

Thanks,
RR

Re: Streaming data access in HDFS: Design Feature

Posted by shashwat shriparv <dw...@gmail.com>.
Streaming means processing data as it comes into HDFS; Hadoop Streaming, for
example, lets Hadoop process data using executables of different types.

I hope you have already read this:
http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming


Warm Regards ∞
Shashwat Shriparv



On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe <ra...@live.com> wrote:

> Hello All,
>
> Can anyone please explain what we mean by *Streaming data access in HDFS*?
>
> Data is usually copied to HDFS, where it is split into blocks across
> DataNodes.
> Say, for example, I have an input file of 10240 MB (10 GB) and a
> block size of 64 MB. Then there will be 160 blocks.
> These blocks will be distributed across the DataNodes.
> Now the Mappers will read data from these DataNodes, keeping the *data
> locality feature* in mind (i.e. blocks local to a DataNode will be read by
> the map tasks running on that DataNode).
>
> Can you please point out where "Streaming data access in HDFS" comes
> into the picture here?
>
> Thanks,
> RR
>
