Posted to dev@hbase.apache.org by Jimmy Xiang <jx...@cloudera.com> on 2014/03/14 20:22:07 UTC

ClusterConnection package private

In implementing HBASE-10569 (co-locating meta and master), I ran into an
issue with the connections.

The issue is that ClusterConnection is package private (on purpose). I have
to create an adapter (see the patch here https://reviews.apache.org/r/19198/)
so that I can override some of the logic. Because meta and master are on
the same JVM, I'd like to bypass the network/RPC layer when master tries to
scan the meta table, or assign the meta region, and when the same
regionserver tries to send reports to the master.
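
The adapter idea described above can be sketched roughly as follows. This is
a minimal illustration in Python, not the code in the review request: every
class and method name here (Connection, ConnectionAdapter, RpcStub, LocalStub,
MetaService, get_client, scan_meta) is hypothetical and is not part of the
HBase API.

```python
class MetaService:
    """Hypothetical in-process service that hosts the meta region."""

    def scan_meta(self):
        return "local"


class RpcStub:
    """Hypothetical client that would normally go over the network."""

    def __init__(self, server_name):
        self.server_name = server_name

    def scan_meta(self):
        # A real stub would issue an RPC here; the sketch only tags the result.
        return f"rpc:{self.server_name}"


class LocalStub:
    """Hypothetical client that calls the co-located service directly."""

    def __init__(self, service):
        self._service = service

    def scan_meta(self):
        # Plain method call, so the network/RPC layer is bypassed.
        return self._service.scan_meta()


class Connection:
    """Hypothetical stand-in for the wrapped connection."""

    def get_client(self, server_name):
        return RpcStub(server_name)


class ConnectionAdapter:
    """Delegates to the wrapped connection except for the local server."""

    def __init__(self, wrapped, local_server_name, local_service):
        self._wrapped = wrapped
        self._local_server_name = local_server_name
        self._local_service = local_service

    def get_client(self, server_name):
        if server_name == self._local_server_name:
            return LocalStub(self._local_service)
        return self._wrapped.get_client(server_name)


adapter = ConnectionAdapter(Connection(), "master-1", MetaService())
local_result = adapter.get_client("master-1").scan_meta()
remote_result = adapter.get_client("rs-2").scan_meta()
```

Only the lookup for the local server name is overridden; every other call
falls through to the wrapped connection unchanged.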

I was wondering what we can do here. Is it a good solution to create an
adapter? That's kind of similar to making the connection public, right?

Thanks,
Jimmy

Re: ClusterConnection package private

Posted by Enis Söztutar <en...@gmail.com>.
I was in favor of co-locating, because we had the "meta is one region" for
so long, our regions are big, and we did not spend much time on master
redesign. However, in an ideal case, we should be going with the splittable
meta design from BT, and shoot for regions being sized around hdfs block
size (128 / 512M) and having millions of regions. The reason we currently
get away with a single meta region is that our regions can be 10-20GB, so
100K regions would be enough to address 1-2 PB of data. It seems clear that we
do not want two state machines, one in master, and one in meta per region
which can diverge and make AM the hell that it is today. One way to ease
this is to move meta into master and ensure master in-memory == meta. The
other way would be to make master stateless and meta the only authoritative
source. I would vote for the latter.
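
The sizing arithmetic above can be checked with a short sketch. It assumes
decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and takes "128 / 512M"
to mean 128 MB and 512 MB regions; the helper names are hypothetical.

```python
MB = 10**6
GB = 10**9
PB = 10**15


def addressable_bytes(region_count, region_size_bytes):
    """Total data covered by region_count regions of the given size."""
    return region_count * region_size_bytes


def regions_needed(total_bytes, region_size_bytes):
    """Number of regions needed to cover total_bytes (ceiling division)."""
    return -(-total_bytes // region_size_bytes)


# 100K regions of 10-20 GB each cover 1-2 PB.
low = addressable_bytes(100_000, 10 * GB)
high = addressable_bytes(100_000, 20 * GB)

# The same 1 PB at HDFS-block-sized regions needs millions of regions.
small_regions = regions_needed(1 * PB, 128 * MB)
medium_regions = regions_needed(1 * PB, 512 * MB)

print(low // PB, high // PB, small_regions, medium_regions)
```

Under these assumptions, 1 PB at 128 MB per region comes to roughly 7.8
million regions, which is why a single meta region stops being enough.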

Coming to the ClusterConnection, I thought that CoprocessorHConnection is
kind of similar. It should be fine to have an in-process ClusterConnection
implementation.

Enis


On Fri, Mar 14, 2014 at 3:23 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> Taking advantage of region replicas will require the indirection and
> potential network hop. Could be a "short-circuit" local read optimization
> is possible, but I don't think it is worth it for scanning meta.
>
> On Friday, March 14, 2014, Stack <st...@duboce.net> wrote:
>
> > On Fri, Mar 14, 2014 at 1:22 PM, Jimmy Xiang <jxiang@cloudera.com>
> > wrote:
> >
> > > That means there will be many small meta regions. If we just have one
> > > instance of each region, that should help. But we are moving towards HA
> > > regions, right?
> > >
> > >
> > Even if the region is 'HA', there will be an indirection.
> >
> > So question stands, should we do this direct route at all?  There is a big
> > advantage?
> >
> > St.Ack
> >
>

Re: ClusterConnection package private

Posted by Nick Dimiduk <nd...@gmail.com>.
Taking advantage of region replicas will require the indirection and
potential network hop. Could be a "short-circuit" local read optimization
is possible, but I don't think it is worth it for scanning meta.

On Friday, March 14, 2014, Stack <st...@duboce.net> wrote:

> On Fri, Mar 14, 2014 at 1:22 PM, Jimmy Xiang <jxiang@cloudera.com>
> wrote:
>
> > That means there will be many small meta regions. If we just have one
> > instance of each region, that should help. But we are moving towards HA
> > regions, right?
> >
> >
> Even if the region is 'HA', there will be an indirection.
>
> So question stands, should we do this direct route at all?  There is a big
> advantage?
>
> St.Ack
>

Re: ClusterConnection package private

Posted by Stack <st...@duboce.net>.
On Fri, Mar 14, 2014 at 1:22 PM, Jimmy Xiang <jx...@cloudera.com> wrote:

> That means there will be many small meta regions. If we just have one
> instance of each region, that should help. But we are moving towards HA
> regions, right?
>
>
Even if the region is 'HA', there will be an indirection.

So question stands, should we do this direct route at all?  There is a big
advantage?

St.Ack

Re: ClusterConnection package private

Posted by Jimmy Xiang <jx...@cloudera.com>.
That means there will be many small meta regions. If we just have one
instance of each region, that should help. But we are moving towards HA
regions, right?


On Fri, Mar 14, 2014 at 12:31 PM, Stack <st...@duboce.net> wrote:

> On Fri, Mar 14, 2014 at 12:22 PM, Jimmy Xiang <jx...@cloudera.com> wrote:
>
> > In implementing HBASE-10569 (co-locating meta and master), I run into some
> > issue with the connections.
> >
> > The issue is that ClusterConnection is package private (on purpose). I have
> > to create an adapter (see the patch here
> > https://reviews.apache.org/r/19198/)
> > so that I can override some of the logic. Because meta and master are on
> > the same JVM, I'd like to bypass the network/RPC layer when master tries to
> > scan the meta table, or assign the meta region, and when the same
> > regionserver tries to send reports to the master.
> >
> > I was wondering what we can do here. Is it a good solution to create an
> > adapter? That's kind of similar to make the connection public, right?
> >
>
>
> One thought I was having this morning about your fancy patch Jimmy is that
> making it so the master regionserver has the meta region only might not be
> the way to go.  Rather than have a single 'special' meta region, we might
> want to distribute it around the cluster -- i.e. let it split (like the
> accumulo fellows do) -- so that when meta is offline, it is less of a body
> blow.
>
> That would mean that though the meta was on the same server as the master,
> you'd access it as you would any other region.
>
> So, I'm asking if we should be going the above route at all?
>
> St.Ack
>

Re: ClusterConnection package private

Posted by Stack <st...@duboce.net>.
On Fri, Mar 14, 2014 at 12:22 PM, Jimmy Xiang <jx...@cloudera.com> wrote:

> In implementing HBASE-10569 (co-locating meta and master), I run into some
> issue with the connections.
>
> The issue is that ClusterConnection is package private (on purpose). I have
> to create an adapter (see the patch here
> https://reviews.apache.org/r/19198/)
> so that I can override some of the logic. Because meta and master are on
> the same JVM, I'd like to bypass the network/RPC layer when master tries to
> scan the meta table, or assign the meta region, and when the same
> regionserver tries to send reports to the master.
>
> I was wondering what we can do here. Is it a good solution to create an
> adapter? That's kind of similar to make the connection public, right?
>


One thought I was having this morning about your fancy patch Jimmy is that
making it so the master regionserver has the meta region only might not be
the way to go.  Rather than have a single 'special' meta region, we might
want to distribute it around the cluster -- i.e. let it split (like the
Accumulo fellows do) -- so that when meta is offline, it is less of a body
blow.

That would mean that though the meta was on the same server as the master,
you'd access it as you would any other region.

So, I'm asking if we should be going the above route at all?

St.Ack