Posted to common-user@hadoop.apache.org by "Agarwal, Nikhil" <Ni...@netapp.com> on 2013/05/31 08:37:08 UTC

MapReduce on Local FileSystem

Hi,

Is it possible to run MapReduce on multiple nodes using the local file system (file:///) ?
I am able to run it in a single-node setup, but in a multi-node setup the "slave" nodes are not able to access the "jobtoken" file, which is present in the hadoop.tmp.dir on the "master" node.
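
For reference, here is a minimal sketch of the configuration in question: the default filesystem pointed at the local disk in core-site.xml (the property is fs.default.name on Hadoop 1.x and fs.defaultFS on 2.x):

    <!-- core-site.xml: use the node-local filesystem instead of HDFS (sketch) -->
    <configuration>
      <property>
        <name>fs.default.name</name>  <!-- fs.defaultFS on Hadoop 2.x -->
        <value>file:///</value>
      </property>
    </configuration>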

Please let me know if it is possible to do this.

Thanks & Regards,
Nikhil

Re: MapReduce on Local FileSystem

Posted by Kun Ling <lk...@gmail.com>.
Hi Agarwal,
   I once had similar questions and ran some experiments. Here is my
experience:
1. Applications layered over MR, like HBase and Hive, which do not need to
submit additional files to HDFS, worked well with file:/// in my tests,
without any problem.

2. Plain MR applications, like TeraSort, run into problems with file:///
alone: MR keeps some job-control files on the shared FileSystem and others
on the local file system, tracked in a single list that it looks up later.
Using file:/// everywhere makes the shared FS look the same as the local
filesystem, while in fact they are two different kinds of filesystem with
different path-conversion rules.

For the second issue, you can create a new shared-filesystem class by
deriving from the existing org.apache.hadoop.fs.FileSystem. I have created
such a repository with an example filesystem class implementation (
https://github.com/Lingcc/hadoop-lingccfs ); I hope it is helpful to you.
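
As a rough illustration of the idea (this is not the code in that repository; the class and scheme names below are made up), one way to get a distinct shared-filesystem class without implementing every abstract method of FileSystem is to wrap RawLocalFileSystem in a FilterFileSystem and register it under its own scheme:

    import java.net.URI;

    import org.apache.hadoop.fs.FilterFileSystem;
    import org.apache.hadoop.fs.RawLocalFileSystem;

    // Sketch: a "shared" filesystem backed by each node's local disk, but
    // registered under its own scheme so that MR's bookkeeping can tell it
    // apart from plain file:///.
    public class SharedLocalFileSystem extends FilterFileSystem {
        public SharedLocalFileSystem() {
            super(new RawLocalFileSystem());  // delegate all I/O to the local FS
        }

        @Override
        public URI getUri() {
            return URI.create("sharedfs:///");  // hypothetical scheme
        }
    }

Registering it with fs.sharedfs.impl = SharedLocalFileSystem in core-site.xml should then let paths like sharedfs:///data resolve through this class, while real local paths keep their file:/// semantics.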


yours,
Ling Kun.




On Fri, May 31, 2013 at 2:37 PM, Agarwal, Nikhil
<Ni...@netapp.com> wrote:

>  Hi,
>
> Is it possible to run MapReduce on *multiple nodes* using the local file
> system (file:///) ?
>
> I am able to run it in a single-node setup, but in a multi-node setup the
> “slave” nodes are not able to access the “jobtoken” file, which is present
> in the hadoop.tmp.dir on the “master” node.
>
> Please let me know if it is possible to do this.
>
> Thanks & Regards,
>
> Nikhil
>
>



-- 
http://www.lingcc.com

Re: MapReduce on Local FileSystem

Posted by 王洪军 <wa...@gmail.com>.
Ingesting the data into HDFS is slow because it needs a JVM process, but if
you don't use HDFS, you can't benefit from its features. Without HDFS, the
big data will not be split and distributed; I think the initial JVM time is
affordable if the data is big, and Hadoop is not a good choice if the data
is small.
file:// refers to node-local data, with no distribution; other TaskTrackers
can't reach it until you copy it to every node where a TaskTracker resides.


2013/5/31 Harsh J <ha...@cloudera.com>

> Then why not simply run with Write Replication Factor set to 1?
>
> On Fri, May 31, 2013 at 12:54 PM, Agarwal, Nikhil
> <Ni...@netapp.com> wrote:
> > Hi,
> >
> >
> >
> > Thank you for your reply. One simple answer is to reduce the time taken
> > to ingest the data into HDFS.
> >
> >
> >
> > Regards,
> >
> > Nikhil
> >
> >
> >
> > From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
> > Sent: Friday, May 31, 2013 12:50 PM
> > To: <us...@hadoop.apache.org>
> > Cc: user@hadoop.apache.org
> >
> >
> > Subject: Re: MapReduce on Local FileSystem
> >
> >
> >
> > Basic question. Why would you want to do that? Also I think the MapR
> > Hadoop distribution has an NFS-mountable HDFS
> >
> > Sanjay
> >
> > Sent from my iPhone
> >
> >
> > On May 30, 2013, at 11:37 PM, "Agarwal, Nikhil" <
> Nikhil.Agarwal@netapp.com>
> > wrote:
> >
> > Hi,
> >
> >
> >
> > Is it possible to run MapReduce on multiple nodes using the local file
> > system (file:///) ?
> >
> > I am able to run it in a single-node setup, but in a multi-node setup the
> > “slave” nodes are not able to access the “jobtoken” file, which is present
> > in the hadoop.tmp.dir on the “master” node.
> >
> >
> >
> > Please let me know if it is possible to do this.
> >
> >
> >
> > Thanks & Regards,
> >
> > Nikhil
> >
> >
> >
>
>
>
> --
> Harsh J
>

Re: MapReduce on Local FileSystem

Posted by Harsh J <ha...@cloudera.com>.
Then why not simply run with Write Replication Factor set to 1?
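
As a sketch of what that means in config terms (dfs.replication in hdfs-site.xml controls how many copies of each block HDFS writes):

    <!-- hdfs-site.xml: write each block once instead of the default three -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>

It can also be changed per file after the fact, e.g. hadoop fs -setrep 1 <path>, without touching the cluster config.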

On Fri, May 31, 2013 at 12:54 PM, Agarwal, Nikhil
<Ni...@netapp.com> wrote:
> Hi,
>
>
>
> Thank you for your reply. One simple answer is to reduce the time taken
> to ingest the data into HDFS.
>
>
>
> Regards,
>
> Nikhil
>
>
>
> From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
> Sent: Friday, May 31, 2013 12:50 PM
> To: <us...@hadoop.apache.org>
> Cc: user@hadoop.apache.org
>
>
> Subject: Re: MapReduce on Local FileSystem
>
>
>
> Basic question. Why would you want to do that? Also I think the MapR Hadoop
> distribution has an NFS-mountable HDFS
>
> Sanjay
>
> Sent from my iPhone
>
>
> On May 30, 2013, at 11:37 PM, "Agarwal, Nikhil" <Ni...@netapp.com>
> wrote:
>
> Hi,
>
>
>
> Is it possible to run MapReduce on multiple nodes using the local file
> system (file:///) ?
>
> I am able to run it in a single-node setup, but in a multi-node setup the
> “slave” nodes are not able to access the “jobtoken” file, which is present in
> the hadoop.tmp.dir on the “master” node.
>
>
>
> Please let me know if it is possible to do this.
>
>
>
> Thanks & Regards,
>
> Nikhil
>
>
>



-- 
Harsh J

Re: MapReduce on Local FileSystem

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Hi
What's the data volume per hour or per day you are looking to put into HDFS?

For dumping source data into HDFS there are a few options

Option 1
=======
Have parallel threads dumping raw data into HDFS from your source

Option 2
=======
Design what your objects will look like and write code to convert the raw input files into SequenceFiles, then dump those into HDFS (a rough sketch follows)
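
Something along these lines, as a minimal sketch (this uses the classic Hadoop 1.x-era SequenceFile.createWriter signature; the paths and the LongWritable/Text key-value choice are only for illustration):

    import java.io.BufferedReader;
    import java.io.FileReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Sketch: read a raw local text file and write it out as a SequenceFile
    // of (byte offset, line) records on the default filesystem (HDFS).
    public class RawToSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);  // HDFS, per core-site.xml
            Path out = new Path(args[1]);          // e.g. /data/raw.seq
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, out, LongWritable.class, Text.class);
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                long offset = 0;
                String line;
                while ((line = in.readLine()) != null) {
                    writer.append(new LongWritable(offset), new Text(line));
                    offset += line.length() + 1;   // +1 for the newline
                }
            } finally {
                writer.close();
            }
        }
    }

Compression (e.g. SequenceFile.CompressionType.BLOCK via the longer createWriter overloads) is worth adding if the raw input is bulky.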

The community may have more options; it depends on your use case

Regards
sanjay


From: "Agarwal, Nikhil" <Ni...@netapp.com>
Reply-To: user@hadoop.apache.org
Date: Friday, May 31, 2013 12:24 AM
To: user@hadoop.apache.org
Subject: RE: MapReduce on Local FileSystem

Hi,

Thank you for your reply. One simple answer is to reduce the time taken to ingest the data into HDFS.

Regards,
Nikhil

From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
Sent: Friday, May 31, 2013 12:50 PM
To: <us...@hadoop.apache.org>
Cc: user@hadoop.apache.org
Subject: Re: MapReduce on Local FileSystem

Basic question. Why would you want to do that? Also I think the MapR Hadoop distribution has an NFS-mountable HDFS
Sanjay

Sent from my iPhone

On May 30, 2013, at 11:37 PM, "Agarwal, Nikhil" <Ni...@netapp.com> wrote:
Hi,

Is it possible to run MapReduce on multiple nodes using the local file system (file:///) ?
I am able to run it in a single-node setup, but in a multi-node setup the “slave” nodes are not able to access the “jobtoken” file, which is present in the hadoop.tmp.dir on the “master” node.

Please let me know if it is possible to do this.

Thanks & Regards,
Nikhil


RE: MapReduce on Local FileSystem

Posted by "Agarwal, Nikhil" <Ni...@netapp.com>.
Hi,

Thank you for your reply. One simple answer is to reduce the time taken to ingest the data into HDFS.

Regards,
Nikhil

From: Sanjay Subramanian [mailto:Sanjay.Subramanian@wizecommerce.com]
Sent: Friday, May 31, 2013 12:50 PM
To: <us...@hadoop.apache.org>
Cc: user@hadoop.apache.org
Subject: Re: MapReduce on Local FileSystem

Basic question. Why would you want to do that? Also I think the MapR Hadoop distribution has an NFS-mountable HDFS
Sanjay

Sent from my iPhone

On May 30, 2013, at 11:37 PM, "Agarwal, Nikhil" <Ni...@netapp.com> wrote:
Hi,

Is it possible to run MapReduce on multiple nodes using the local file system (file:///) ?
I am able to run it in a single-node setup, but in a multi-node setup the "slave" nodes are not able to access the "jobtoken" file, which is present in the hadoop.tmp.dir on the "master" node.

Please let me know if it is possible to do this.

Thanks & Regards,
Nikhil


Re: MapReduce on Local FileSystem

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Basic question. Why would you want to do that? Also I think the MapR Hadoop distribution has an NFS-mountable HDFS
Sanjay

Sent from my iPhone

On May 30, 2013, at 11:37 PM, "Agarwal, Nikhil" <Ni...@netapp.com> wrote:

Hi,

Is it possible to run MapReduce on multiple nodes using the local file system (file:///) ?
I am able to run it in a single-node setup, but in a multi-node setup the “slave” nodes are not able to access the “jobtoken” file, which is present in the hadoop.tmp.dir on the “master” node.

Please let me know if it is possible to do this.

Thanks & Regards,
Nikhil


Re: MapReduce on Local FileSystem

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Yeah, I meant an NFS mount.

thanks,
Rahul


On Fri, May 31, 2013 at 12:42 PM, Agarwal, Nikhil <Nikhil.Agarwal@netapp.com
> wrote:

>  Hi Rahul,
>
> Can you please explain what you mean by “filer directory mounted to
> all the DN”? Do you mean an NFS mount? If yes, then I want to avoid
> an NFS mount. With an NFS mount it is possible to do it.
>
> Thanks & Regards,
>
> Nikhil
>
> *From:* Rahul Bhattacharjee [mailto:rahul.rec.dgp@gmail.com]
> *Sent:* Friday, May 31, 2013 12:33 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: MapReduce on Local FileSystem
>
> Just a hunch. You can have a filer directory mounted on all the DNs, and
> then file:/// should be usable in a distributed fashion. (Just a guess)
>
> Thanks,
>
> Rahul
>
> On Fri, May 31, 2013 at 12:07 PM, Agarwal, Nikhil <
> Nikhil.Agarwal@netapp.com> wrote:
>
> Hi,
>
> Is it possible to run MapReduce on *multiple nodes* using the local file
> system (file:///) ?
>
> I am able to run it in a single-node setup, but in a multi-node setup the
> “slave” nodes are not able to access the “jobtoken” file, which is present
> in the hadoop.tmp.dir on the “master” node.
>
> Please let me know if it is possible to do this.
>
> Thanks & Regards,
>
> Nikhil
>

RE: MapReduce on Local FileSystem

Posted by "Agarwal, Nikhil" <Ni...@netapp.com>.
Hi Rahul,

Can you please explain what you mean by “filer directory mounted to all the DN”? Do you mean an NFS mount? If yes, then I want to avoid an NFS mount. With an NFS mount it is possible to do it.

Thanks & Regards,
Nikhil

From: Rahul Bhattacharjee [mailto:rahul.rec.dgp@gmail.com]
Sent: Friday, May 31, 2013 12:33 PM
To: user@hadoop.apache.org
Subject: Re: MapReduce on Local FileSystem

Just a hunch. You can have a filer directory mounted on all the DNs, and then file:/// should be usable in a distributed fashion. (Just a guess)
Thanks,
Rahul

On Fri, May 31, 2013 at 12:07 PM, Agarwal, Nikhil <Ni...@netapp.com> wrote:
Hi,

Is it possible to run MapReduce on multiple nodes using the local file system (file:///) ?
I am able to run it in a single-node setup, but in a multi-node setup the “slave” nodes are not able to access the “jobtoken” file, which is present in the hadoop.tmp.dir on the “master” node.

Please let me know if it is possible to do this.

Thanks & Regards,
Nikhil


Re: MapReduce on Local FileSystem

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Just a hunch. You can have a filer directory mounted on all the DNs, and
then file:/// should be usable in a distributed fashion. (Just a guess)

Thanks,
Rahul


On Fri, May 31, 2013 at 12:07 PM, Agarwal, Nikhil <Nikhil.Agarwal@netapp.com
> wrote:

>  Hi,
>
> Is it possible to run MapReduce on *multiple nodes* using the local file
> system (file:///) ?
>
> I am able to run it in a single-node setup, but in a multi-node setup the
> “slave” nodes are not able to access the “jobtoken” file, which is present
> in the hadoop.tmp.dir on the “master” node.
>
> Please let me know if it is possible to do this.
>
> Thanks & Regards,
>
> Nikhil
>
