Posted to hdfs-user@hadoop.apache.org by Sundeep Kambhampati <ka...@cse.ohio-state.edu> on 2013/01/26 16:49:13 UTC

Difference between HDFS and local filesystem

Hi Users,
I am fairly new to MapReduce programming and I am trying to understand
the integration between MapReduce and HDFS.
I understand that MapReduce can use HDFS for data access. But is it
possible to run MapReduce programs without using HDFS at all?
HDFS does file replication and partitioning. But if I use the following
command to run the example MaxTemperature

  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature 
file:///usr/local/ncdcinput/sample.txt file:///usr/local/out4

instead of

  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature 
usr/local/ncdcinput/sample.txt usr/local/out4     <-- this one uses the
HDFS file system.

With the first command, the job reads its input from and writes its
output to the local file system when I run in pseudo-distributed mode.
Since it is a single node, there is no problem of non-local data.
What happens in fully distributed mode? Will the files be copied to
other machines, or will it throw errors? Will the files be replicated
and partitioned for running MapReduce if I use the local file system?
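
To make sure I am reading this right, here is my rough mental model of
how the path scheme picks the filesystem (just a sketch against the
FileSystem API; the class name is mine, the paths are from my command
above):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class SchemeDemo {
      public static void main(String[] args) throws Exception {
          // Picks up core-site.xml etc. from the classpath.
          Configuration conf = new Configuration();
          // A bare path resolves against the configured default filesystem
          // (fs.default.name), which is hdfs://... in pseudo-distributed mode.
          FileSystem byDefault =
              new Path("usr/local/ncdcinput/sample.txt").getFileSystem(conf);
          // An explicit file:// scheme always selects the local filesystem,
          // no matter what the configured default is.
          FileSystem local =
              new Path("file:///usr/local/ncdcinput/sample.txt").getFileSystem(conf);
          System.out.println(byDefault.getUri() + " vs " + local.getUri());
      }
  }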

Can someone please explain.

Regards
Sundeep





Re: Difference between HDFS and local filesystem

Posted by Harsh J <ha...@cloudera.com>.
The local filesystem has no sense of being 'distributed'. If you run
Hadoop in distributed mode over file:// (the local FS), then unless the
file:// paths being used are themselves on a distributed mount (such as
an NFS export), your jobs will fail their tasks on every node where the
referenced files cannot be found.

Essentially, for distributed operation, MR relies on a distributed file
system, and the local filesystem is the opposite of that.
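
To see why, here is a quick sketch you could run on each node (the class
name is made up; the path is the one from your mail):

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class LocalCheck {
      public static void main(String[] args) throws Exception {
          // file:// always resolves against the disk of the node running
          // this code, so each task node sees its own, different filesystem.
          FileSystem localFs =
              FileSystem.get(URI.create("file:///"), new Configuration());
          Path input = new Path("/usr/local/ncdcinput/sample.txt");
          // True only on nodes that happen to hold a copy; a task scheduled
          // on any other node cannot open its input split and fails.
          System.out.println(input + " present on this node: "
              + localFs.exists(input));
      }
  }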

On Sat, Jan 26, 2013 at 9:19 PM, Sundeep Kambhampati
<ka...@cse.ohio-state.edu> wrote:
> Hi Users,
> I am fairly new to MapReduce programming and I am trying to understand
> the integration between MapReduce and HDFS.
> I understand that MapReduce can use HDFS for data access. But is it
> possible to run MapReduce programs without using HDFS at all?
> HDFS does file replication and partitioning. But if I use the following
> command to run the example MaxTemperature
>
>  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
> file:///usr/local/ncdcinput/sample.txt file:///usr/local/out4
>
> instead of
>
>  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
> usr/local/ncdcinput/sample.txt usr/local/out4     <-- this one uses the
> HDFS file system.
>
> With the first command, the job reads its input from and writes its
> output to the local file system when I run in pseudo-distributed mode.
> Since it is a single node, there is no problem of non-local data.
> What happens in fully distributed mode? Will the files be copied to
> other machines, or will it throw errors? Will the files be replicated
> and partitioned for running MapReduce if I use the local file system?
>
> Can someone please explain.
>
> Regards
> Sundeep
>
>
>
>



-- 
Harsh J

Re: Difference between HDFS and local filesystem

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Sundeep,

      As Harsh said, it doesn't make much sense to use MR with the
native FS. If you really want to leverage the power of Hadoop, you should
use the MR+HDFS combo, as "divide and rule" is Hadoop's strength. It's a
distributed system where each component gets its own piece of work to do
in parallel with the other components, unlike the grid computing paradigm,
where several machines work on the same piece together by sharing
resources like memory and so forth.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Sat, Jan 26, 2013 at 10:16 PM, Preethi Vinayak Ponangi <
vinayakponangi@gmail.com> wrote:

> Yes. It's possible to use your local file system instead of HDFS. As you
> said, it doesn't really matter when you are running a pseudo-distributed
> cluster. This is generally fine if your dataset is fairly small. The place
> where HDFS access really shines is if your file is huge, generally several
> TB or PB. That is when individual mappers can access different partitions
> of the data on different nodes, improving performance.
>
> In fully distributed mode, your data gets partitioned and stored on
> several different nodes in HDFS.
> But when you use local data, the data is neither replicated nor
> partitioned; it's just like accessing a single file.
>
>
> On Sat, Jan 26, 2013 at 9:49 AM, Sundeep Kambhampati <
> kambhamp@cse.ohio-state.edu> wrote:
>
>> Hi Users,
>> I am fairly new to MapReduce programming and I am trying to understand
>> the integration between MapReduce and HDFS.
>> I understand that MapReduce can use HDFS for data access. But is it
>> possible to run MapReduce programs without using HDFS at all?
>> HDFS does file replication and partitioning. But if I use the following
>> command to run the example MaxTemperature
>>
>>  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
>> file:///usr/local/ncdcinput/sample.txt file:///usr/local/out4
>>
>> instead of
>>
>>  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
>> usr/local/ncdcinput/sample.txt usr/local/out4     <-- this one uses the
>> HDFS file system.
>>
>> With the first command, the job reads its input from and writes its
>> output to the local file system when I run in pseudo-distributed mode.
>> Since it is a single node, there is no problem of non-local data.
>> What happens in fully distributed mode? Will the files be copied to
>> other machines, or will it throw errors? Will the files be replicated
>> and partitioned for running MapReduce if I use the local file system?
>>
>> Can someone please explain.
>>
>> Regards
>> Sundeep
>>
>>
>>
>>
>>
>

Re: Difference between HDFS and local filesystem

Posted by Preethi Vinayak Ponangi <vi...@gmail.com>.
Yes. It's possible to use your local file system instead of HDFS. As you
said, it doesn't really matter when you are running a pseudo-distributed
cluster. This is generally fine if your dataset is fairly small. The place
where HDFS access really shines is if your file is huge, generally several
TB or PB. That is when individual mappers can access different partitions
of the data on different nodes, improving performance.

In fully distributed mode, your data gets partitioned and stored on
several different nodes in HDFS.
But when you use local data, the data is neither replicated nor
partitioned; it's just like accessing a single file.
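
If you want to see that partitioning, you can list a file's block
locations once it is in HDFS. A rough sketch (the namenode URI and the
path are placeholders for your cluster):

  import java.net.URI;
  import java.util.Arrays;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ShowBlocks {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(
              URI.create("hdfs://namenode:9000"), new Configuration());
          FileStatus status =
              fs.getFileStatus(new Path("/user/sundeep/ncdcinput/sample.txt"));
          // One BlockLocation per partition (block) of the file, listing
          // every datanode that holds a replica of that block.
          for (BlockLocation block :
                  fs.getFileBlockLocations(status, 0, status.getLen())) {
              System.out.println("bytes " + block.getOffset() + "-"
                  + (block.getOffset() + block.getLength()) + " on "
                  + Arrays.toString(block.getHosts()));
          }
      }
  }

With the default replication factor of 3, each block is reported on three
hosts, and MapReduce tries to schedule each mapper on one of them.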

On Sat, Jan 26, 2013 at 9:49 AM, Sundeep Kambhampati <
kambhamp@cse.ohio-state.edu> wrote:

> Hi Users,
> I am fairly new to MapReduce programming and I am trying to understand
> the integration between MapReduce and HDFS.
> I understand that MapReduce can use HDFS for data access. But is it
> possible to run MapReduce programs without using HDFS at all?
> HDFS does file replication and partitioning. But if I use the following
> command to run the example MaxTemperature
>
>  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
> file:///usr/local/ncdcinput/sample.txt file:///usr/local/out4
>
> instead of
>
>  bin/hadoop jar /usr/local/hadoop/maxtemp.jar MaxTemperature
> usr/local/ncdcinput/sample.txt usr/local/out4     <-- this one uses the
> HDFS file system.
>
> With the first command, the job reads its input from and writes its
> output to the local file system when I run in pseudo-distributed mode.
> Since it is a single node, there is no problem of non-local data.
> What happens in fully distributed mode? Will the files be copied to
> other machines, or will it throw errors? Will the files be replicated
> and partitioned for running MapReduce if I use the local file system?
>
> Can someone please explain.
>
> Regards
> Sundeep
>
>
>
>
>
