Posted to common-user@hadoop.apache.org by prasenjit mukherjee <pr...@gmail.com> on 2010/01/23 17:57:19 UTC

distributing hadoop push

I have hundreds of large files (~100 MB each) in a /mnt/ location which is
shared by all my Hadoop nodes. I was wondering if I could directly use "hadoop
distcp file:///mnt/data/tr* /input" to parallelize/distribute the Hadoop push.
The push is indeed becoming a bottleneck for me, and any help in this
regard is greatly appreciated. Currently I am using "hadoop fs -moveFromLocal
..." and it is taking too much time.

-Thanks,
Prasen

Re: distributing hadoop push

Posted by Jason Venner <ja...@gmail.com>.
You can indeed use file:/// URLs when the mount point is shared.
Expect extreme I/O load on the machines hosting that mount point ;)
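
A minimal sketch of the invocation discussed above, using the paths from the
thread (/mnt/data and /input); the -m value is an arbitrary example, not a
recommendation, and this has not been run against a real cluster:

```shell
# Run the copy as a MapReduce job so multiple nodes pull files
# from the shared mount into HDFS in parallel.
# -m caps the number of map tasks; with hundreds of ~100 MB files,
# more maps means more concurrent readers hitting the mount.
# Quoting the glob keeps the local shell from trying to expand it;
# DistCp expands it against the file:// filesystem itself.
hadoop distcp -m 20 "file:///mnt/data/tr*" /input
```

Note the caveat above still applies: every map task reads from the same
shared mount, so the server exporting /mnt/ remains the throughput ceiling
no matter how many maps you run.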

On Sat, Jan 23, 2010 at 8:57 AM, prasenjit mukherjee
<pr...@gmail.com>wrote:

> I have hundreds of large files (~100 MB each) in a /mnt/ location which is
> shared by all my Hadoop nodes. I was wondering if I could directly use
> "hadoop
> distcp file:///mnt/data/tr* /input" to parallelize/distribute the Hadoop push.
> The push is indeed becoming a bottleneck for me, and any help in this
> regard is greatly appreciated. Currently I am using "hadoop fs -moveFromLocal
> ..." and it is taking too much time.
>
> -Thanks,
> Prasen
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals