Posted to common-user@hadoop.apache.org by Miles Osborne <mi...@inf.ed.ac.uk> on 2008/02/28 11:43:31 UTC
Cross-data centre DFS communication?
Currently, we have the following setup:
--cluster A, running Nutch: small RAM per node
--cluster B, just running Hadoop: lots of RAM per node
At some point in the future we will want cluster B to talk to cluster A, and
ideally this should be DFS-to-DFS
Is this possible? Or do we need to do something like:
Cluster A --> Unix filesystem --> Cluster B
via hadoop dfs -cat / -put operations etc
Thanks
Miles
--
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.
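For concreteness, the staged copy Miles is asking about might look like the following sketch. All paths are hypothetical, and it assumes a host (or an intermediate transfer step) that can reach both clusters:

```shell
# Manual staging through a Unix filesystem (illustrative paths only).
# On a node with access to cluster A, pull the data to local disk:
hadoop dfs -get /crawl/segments /tmp/staging
# ... move /tmp/staging to a host that can reach cluster B (e.g. scp/rsync) ...
# On a node with access to cluster B, push it into that cluster's DFS:
hadoop dfs -put /tmp/staging /crawl/segments
```

This works, but it serializes the copy through one machine's disk, which is what the distcp answers below the question avoid.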
Re: Cross-data centre DFS communication?
Posted by Steve Sapovits <ss...@invitemedia.com>.
Owen O'Malley wrote:
> Sure, the info server on the name node of HDFS has a read-only interface
> that lists directories in xml and allows the client to read files over
> http. There is a FileSystem implementation that provides the client side
> interface to the xml/http access.
>
> To use it, you need a path with hftp as the protocol:
> hadoop distcp hftp://namenode1:50070/foo/bar hdfs://namenode2:8020/foo
Very useful. Thanks.
--
Steve Sapovits
Invite Media - http://www.invitemedia.com
ssapovits@invitemedia.com
Re: Cross-data centre DFS communication?
Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Feb 28, 2008, at 8:20 AM, Steve Sapovits wrote:
> Can you further explain the hftp part of this? I'm not familiar
> with that. We have a similar need to go cross-data center.
Sure, the info server on the name node of HDFS has a read-only
interface that lists directories in xml and allows the client to read
files over http. There is a FileSystem implementation that provides
the client side interface to the xml/http access.
To use it, you need a path with hftp as the protocol:
hadoop distcp hftp://namenode1:50070/foo/bar hdfs://namenode2:8020/foo
> In an earlier post it
> was suggested that there was no map/reduce model for that so this
> sounds more like what we're looking for.
It isn't a good idea to run map/reduce jobs across clusters, so you
usually need to copy the data locally.
-- Owen
Re: Cross-data centre DFS communication?
Posted by Steve Sapovits <ss...@invitemedia.com>.
Owen O'Malley wrote:
> To copy between clusters, there is a tool called distcp. Look at
> "bin/hadoop distcp". It runs a map/reduce job that copies a group of
> files. It can also be used to copy between versions of hadoop, if the
> source file system is hftp, which uses xml to read hdfs.
Can you further explain the hftp part of this? I'm not familiar with that.
We have a similar need to go cross-data center. In an earlier post it
was suggested that there was no map/reduce model for that so this
sounds more like what we're looking for.
Re: Cross-data centre DFS communication?
Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Feb 28, 2008, at 2:43 AM, Miles Osborne wrote:
> Currently, we have the following setup:
>
> --cluster A, running Nutch: small RAM per node
>
> --cluster B, just running Hadoop: lots of RAM per node
>
> At some point in the future we will want cluster B to talk to
> cluster A, and
> ideally this should be DFS-to-DFS
>
> Is this possible? Or do we need to do something like:
>
> Cluster A --> Unix filesystem --> Cluster B
>
> via hadoop dfs -cat / -put operations etc
To copy between clusters, there is a tool called distcp. Look at "bin/hadoop distcp". It runs a map/reduce job that copies a group of files. It can also be used to copy between versions of hadoop, if the source file system is hftp, which uses xml to read hdfs.
-- Owen
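The distcp-over-hftp approach described in this thread can be sketched as follows. The namenode hostnames and paths are placeholders; the ports shown (50070 for the namenode's info server, 8020 for HDFS) are the customary defaults and may differ in a given deployment:

```shell
# Run from destination cluster B. The source on cluster A is read over
# hftp, the namenode's read-only HTTP/XML interface, which also tolerates
# differing Hadoop versions between the two clusters.
hadoop distcp hftp://namenodeA:50070/crawl/segments \
              hdfs://namenodeB:8020/crawl/segments
```

Because distcp runs as a map/reduce job, the copy is spread across the destination cluster's task nodes rather than funneled through a single host.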