Posted to user@hbase.apache.org by Lex Toumbourou <le...@scrunch.com> on 2016/05/29 04:23:50 UTC

Network connectivity for CopyTable between clusters

Hi all,

I'm trying to run a large CopyTable job between clusters in entirely
different datacenters, and I need to determine what network connectivity
is required.

As per the Cloudera blog post about CopyTable, I understand that the
network should be such that "MR TaskTrackers can access all the HBase and
ZK nodes in the destination cluster." So in practice that means the
source TaskTrackers should be able to access:

* ZooKeeper on port 2181
* the Master on its RPC port (16000)
* the RegionServers on their RPC ports (16020)
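
The list above could be checked from each source worker node before the job
is launched. Below is a minimal sketch, assuming Python 3 is available on
those nodes. The function names are made up for illustration, and the port
numbers are simply the defaults listed above; a destination cluster that
overrides them would need different values.

```python
import socket

# Default ports taken from the list above (assumption: the destination
# cluster has not overridden them in its configuration).
DEFAULT_PORTS = {"zookeeper": 2181, "master": 16000, "regionserver": 16020}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable(endpoints, timeout=3.0):
    """Return the (host, port) pairs from endpoints that could not be reached."""
    return [(h, p) for h, p in endpoints if not can_connect(h, p, timeout)]
```

Calling unreachable() with one (host, port) pair per destination ZooKeeper,
Master, and RegionServer node should return an empty list on every source
worker if the firewall rules are complete. A successful TCP connect only
shows the port is open; it does not prove the HBase RPC itself works.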

Anything else I need to configure here? Does Hadoop on the source need to
talk directly to the destination Hadoop, etc.?

Also, what's unclear to me is what I should be doing with DNS. I'm guessing
that the source cluster needs to be able to resolve the hostnames of the
remote RegionServers and Master nodes as stored in ZooKeeper. Anything else
I need to configure here?
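
The DNS requirement can be sanity-checked in a similar way. This is only a
sketch, assuming Python 3 on the source nodes; the function name is
hypothetical, and the hostnames passed in would have to be the names the
destination cluster actually registers, which are not given in this thread.

```python
import socket


def unresolvable(hostnames):
    """Return the hostnames that this machine cannot resolve to an address."""
    failed = []
    for name in hostnames:
        try:
            # Uses the local resolver configuration (DNS, /etc/hosts, etc.).
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed
```

An empty result on every source worker would suggest the remote names
resolve there; any name returned needs a DNS record or hosts-file entry.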

Thanks for your time!

-- 
Lex Toumbourou
Lead engineer at scrunch.com <http://scrunch.com/>

Re: Network connectivity for CopyTable between clusters

Posted by Lex Toumbourou <le...@scrunch.com>.
Great. Thank you, St.Ack.

On 30 May 2016 at 08:57, Stack <st...@duboce.net> wrote:

> Yeah. The source HBase client is doing puts against the remote cluster, so
> it needs to be able to read the remote hbase:meta table and then address
> whatever RegionServer it finds there in the destination cluster.
>
> St.Ack

-- 
Lex Toumbourou
Lead engineer at scrunch.com <http://scrunch.com/>

Re: Network connectivity for CopyTable between clusters

Posted by Stack <st...@duboce.net>.
On Sat, May 28, 2016 at 9:23 PM, Lex Toumbourou <le...@scrunch.com> wrote:

> Hi all,
>
> I'm trying to run a large CopyTable job between clusters in entirely
> different datacenters, and I need to determine what network connectivity
> is required.
>
> As per the Cloudera blog post about CopyTable, I understand that the
> network should be such that "MR TaskTrackers can access all the HBase and
> ZK nodes in the destination cluster." So in practice that means the
> source TaskTrackers should be able to access:
>
> * ZooKeeper on port 2181
> * the Master on its RPC port (16000)
> * the RegionServers on their RPC ports (16020)
>
>
You'd have access to the UIs?


> Anything else I need to configure here? Does Hadoop on the source need to
> talk directly to the destination Hadoop, etc.?
>
>
Looking at the code, it looks like it is just the source MR tasks doing bulk
mutations against the remote cluster.
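
For context, CopyTable is pointed at the destination through its --peer.adr
option, whose value is the remote ZooKeeper quorum, client port, and parent
znode joined by colons. A small sketch that assembles that command line
follows. The helper names, ZooKeeper hostnames, and table name are
placeholders for illustration, and /hbase is only the stock default parent
znode, so a cluster configured differently would need its own value.

```python
def peer_address(zk_hosts, zk_port=2181, znode_parent="/hbase"):
    """Build the quorum:port:znode string that --peer.adr expects."""
    return "{}:{}:{}".format(",".join(zk_hosts), zk_port, znode_parent)


def copytable_command(table, zk_hosts, zk_port=2181, znode_parent="/hbase"):
    """Return the argv list for a CopyTable run against a remote cluster."""
    return [
        "hbase",
        "org.apache.hadoop.hbase.mapreduce.CopyTable",
        "--peer.adr=" + peer_address(zk_hosts, zk_port, znode_parent),
        table,
    ]


# Hypothetical destination quorum and table name, for illustration only.
print(" ".join(copytable_command("my_table",
                                 ["zk1.example.com", "zk2.example.com"])))
```

Since the map tasks write through a normal HBase client built from that
peer address, the tasks only need the destination's ZooKeeper and HBase
endpoints, which matches the reading of the code above.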



> Also, what's unclear to me is what I should be doing with DNS. I'm guessing
> that the source cluster needs to be able to resolve the hostnames of the
> remote RegionServers and Master nodes as stored in ZooKeeper. Anything else
> I need to configure here?
>
>
Yeah. The source HBase client is doing puts against the remote cluster, so
it needs to be able to read the remote hbase:meta table and then address
whatever RegionServer it finds there in the destination cluster.

St.Ack



> Thanks for your time!
>
> --
> Lex Toumbourou
> Lead engineer at scrunch.com <http://scrunch.com/>
>