Posted to common-user@hadoop.apache.org by Jay Vyas <ja...@gmail.com> on 2014/02/04 03:33:15 UTC

DistCp: Is it guaranteed to work for any two URI schemes?

Hi folks:

I've been thinking about the AWS S3DistCp class and am wondering: is
DistCp built to work between any two Hadoop file system classes?

Or is it implicitly built mainly to copy between two HDFS file
systems?

I haven't found many examples online with different URI schemes.

With emerging HDFS alternatives, I'd be interested in ways to optimize I/O
between different filesystems using DistCp.

-- 
Jay Vyas
http://jayunit100.blogspot.com

RE: DistCp: Is it guaranteed to work for any two URI schemes?

Posted by java8964 <ja...@hotmail.com>.
As Harsh pointed out, as long as the underlying DFS implements all of the file system APIs that Hadoop requires, DistCp should work. One caveat: all the required libraries (including any configuration files) need to be on the classpath if they are not already available on the cluster at runtime.
Just as the S3 file system works fine with DistCp, our project used DistCp to copy terabytes of data between CFS (the Cassandra DFS) and HDFS when we migrated that data to HDFS.
Yong

> From: harsh@cloudera.com
> Date: Tue, 4 Feb 2014 19:03:15 +0530
> Subject: Re: DistCp: Is it guaranteed to work for any two URI schemes?
> To: user@hadoop.apache.org
> 
> Overall, the whole DistCp utility is devoid of any HDFS-specific items,
> but does have some (mostly skippable) checks pertaining to FS-level
> features such as permissions, checksums, etc. It should and does work
> with any valid URI scheme that the libraries understand to be valid
> FSes today.
> 
> -- 
> Harsh J
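
To make the cross-filesystem case above concrete, here is a minimal sketch of how such a copy might be launched. It only assembles the argument list for `hadoop distcp` and checks that both endpoints carry a URI scheme; it does not run anything. The host names, port, and paths are hypothetical, and the `cfs://` scheme is an assumption about how the Cassandra file system is addressed.

```python
from urllib.parse import urlparse


def distcp_command(source, target, extra_flags=()):
    """Build an argv list for `hadoop distcp` between two filesystem URIs.

    Both URIs must name a scheme (hdfs, s3a, cfs, ...), because the scheme
    is what selects the filesystem implementation on each side of the copy.
    """
    for uri in (source, target):
        if not urlparse(uri).scheme:
            raise ValueError(f"URI has no scheme: {uri!r}")
    return ["hadoop", "distcp", *extra_flags, source, target]


def schemes(source, target):
    """Return the (source, target) schemes, e.g. to log which FSes are involved."""
    return urlparse(source).scheme, urlparse(target).scheme


# Hypothetical endpoints: a Cassandra-backed filesystem and an HDFS namenode.
command = distcp_command("cfs://cassandra-host/data", "hdfs://namenode:8020/data")
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where the `hadoop` launcher and the client libraries for both filesystems are available.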
 		 	   		  

Re: DistCp: Is it guaranteed to work for any two URI schemes?

Posted by Harsh J <ha...@cloudera.com>.
Overall, the whole DistCp utility is devoid of any HDFS-specific items,
but does have some (mostly skippable) checks pertaining to FS-level
features such as permissions, checksums, etc. It should and does work
with any valid URI scheme that the libraries understand to be valid
FSes today.

On Tue, Feb 4, 2014 at 8:03 AM, Jay Vyas <ja...@gmail.com> wrote:
> Hi folks:
>
> I've been thinking about the AWS S3DistCp class and am wondering: is DistCp
> built to work between any two Hadoop file system classes?
>
> Or is it implicitly built mainly to copy between two HDFS file
> systems?
>
> I haven't found many examples online with different URI schemes.
>
> With emerging HDFS alternatives, I'd be interested in ways to optimize I/O
> between different filesystems using DistCp.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com



-- 
Harsh J
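
As an illustration of the "mostly skippable" checks mentioned above, the following sketch picks DistCp options for a copy between dissimilar filesystems. It assumes the standard DistCp switches `-update`, `-skipcrccheck`, and `-p[rbugp]`; to my understanding DistCp only accepts `-skipcrccheck` together with `-update`, so the helper enforces that. The function name, bucket, and paths are hypothetical.

```python
def distcp_flags(update=False, skip_crc=False, preserve=""):
    """Return DistCp option flags for a copy between two filesystems.

    preserve is a string of status letters to keep on the target:
    r (replication), b (block size), u (user), g (group), p (permission).
    Leaving it empty skips attribute preservation entirely.
    """
    unknown = set(preserve) - set("rbugp")
    if unknown:
        raise ValueError(f"unsupported preserve letters: {sorted(unknown)}")
    if skip_crc and not update:
        # Assumption: DistCp rejects -skipcrccheck unless -update is also given.
        raise ValueError("-skipcrccheck must be combined with -update")
    flags = []
    if update:
        flags.append("-update")
    if skip_crc:
        flags.append("-skipcrccheck")
    if preserve:
        flags.append("-p" + preserve)
    return flags


# HDFS to an object store: checksums are generally not comparable across
# the two, and ownership or permissions are not preserved.
flags = distcp_flags(update=True, skip_crc=True)
command = ["hadoop", "distcp", *flags,
           "hdfs://namenode:8020/logs", "s3a://example-bucket/logs"]
```

For an HDFS-to-HDFS copy, the same helper could be called with `preserve="ugp"` and without `skip_crc`, keeping the checks that both sides support.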
