Posted to user@hbase.apache.org by donmai <do...@gmail.com> on 2015/08/25 18:30:10 UTC

Using HBase with a shared filesystem (gluster, nfs, s3, etc)

Hi,

I'm curious about how exactly region movement works with regard to data
transfer. To my understanding from the docs, on an HDFS-backed cluster a
region movement / transition involves changing things in meta only; all
data movement for locality is handled by HDFS. In the case where rootdir is
a shared file system, there shouldn't be any data movement with a region
reassignment, correct? I'm running into performance issues where region
assignment takes a very long time and I'm trying to figure out why.

Thanks!

Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)

Posted by donmai <do...@gmail.com>.

Got it, thanks!
On Aug 27, 2015 10:14 AM, "Ted Yu" <yu...@gmail.com> wrote:

> I looked at commit history of StochasticLoadBalancer.java.
> 0.98.10 should have most of the recent fixes.
>
> Can you capture a few jstacks while the load balancer does the computation
> and pastebin them?
>
> Please start a new thread since recent discussion is no longer about shared
> filesystem.
>
> If you can take a look at TestStochasticLoadBalancer and add a test which
> reproduces what you saw, that would help us troubleshoot.
>
> Cheers

Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)

Posted by Ted Yu <yu...@gmail.com>.

I looked at commit history of StochasticLoadBalancer.java.
0.98.10 should have most of the recent fixes.

Can you capture a few jstacks while the load balancer does the computation
and pastebin them?

Please start a new thread since recent discussion is no longer about shared
filesystem.

If you can take a look at TestStochasticLoadBalancer and add a test which
reproduces what you saw, that would help us troubleshoot.

Cheers
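For anyone following along, here is a minimal sketch of one way to grab several thread dumps in a row while the balancer is busy. It assumes the JDK's jstack tool is on the PATH and that you already know the master's process id; the function names, the dump count and the interval are made up for illustration and are not part of HBase. The "BalancerChore" filter is based only on the thread name visible in the log line quoted in this thread.

```python
import subprocess
import time


def run_jstack(pid):
    """Run the JDK's jstack tool against a JVM process id and return its output."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def capture_dumps(pid, count=3, interval_s=10.0, runner=run_jstack, sleep=time.sleep):
    """Collect `count` thread dumps, pausing `interval_s` seconds between them.

    `runner` and `sleep` are injectable so the loop can be exercised without a JVM.
    """
    dumps = []
    for i in range(count):
        dumps.append(runner(pid))
        if i < count - 1:
            sleep(interval_s)
    return dumps


def balancer_threads(dump):
    """Return the thread blocks of one dump that mention BalancerChore.

    jstack separates threads with blank lines, so split on those.
    """
    return [block for block in dump.split("\n\n") if "BalancerChore" in block]


# Example use (hypothetical pid), run on the master host:
#   for dump in capture_dumps(12345, count=3, interval_s=10.0):
#       print("\n\n".join(balancer_threads(dump)))
```

Comparing the BalancerChore stack across consecutive dumps should show whether the thread is sitting in the same frames the whole time, which is the part worth pasting.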

On Thu, Aug 27, 2015 at 6:56 AM, donmai <do...@gmail.com> wrote:

> Very unbalanced due to the addition of a few nodes at 0 regions each. When
> I ran balancer in hbase shell without these nodes and a balanced cluster
> (+- 3 regions per node), balancer ran very quickly, around 3 seconds.

Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)

Posted by donmai <do...@gmail.com>.

Very unbalanced due to the addition of a few nodes at 0 regions each. When
I ran balancer in hbase shell without these nodes and a balanced cluster
(+- 3 regions per node), balancer ran very quickly, around 3 seconds.
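To put rough numbers on "very unbalanced", here is a small sketch of a slop-style check. This is a simplified model and not the balancer's actual code: the band floor(avg * (1 - slop)) to ceil(avg * (1 + slop)), the 0.2 slop value and the region counts are all assumptions chosen for illustration.

```python
import math


def outside_slop(region_counts, slop=0.2):
    """Return the servers whose region count falls outside the slop band.

    region_counts: dict of server name -> number of regions (hypothetical input).
    slop: tolerance as a fraction of the average; 0.2 is an assumed value.
    """
    avg = sum(region_counts.values()) / len(region_counts)
    low = math.floor(avg * (1 - slop))
    high = math.ceil(avg * (1 + slop))
    return {s: n for s, n in region_counts.items() if n < low or n > high}


# Hypothetical cluster: four loaded servers within +-3 regions of each other.
loaded = {"rs1": 100, "rs2": 103, "rs3": 97, "rs4": 101}
# The same cluster after adding two empty servers.
with_new = dict(loaded, new1=0, new2=0)

print(outside_slop(loaded))
print(outside_slop(with_new))
```

With these made-up counts, nothing is outside the band before the new servers join, but afterwards every server is: the empty ones sit below it and the loaded ones sit above it, because the average drops.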

On Thu, Aug 27, 2015 at 9:50 AM, Ted Yu <yu...@gmail.com> wrote:

> How balanced are the table regions in your cluster?
>
> Cheers

Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)

Posted by Ted Yu <yu...@gmail.com>.

How balanced are the table regions in your cluster?

Cheers

On Thu, Aug 27, 2015 at 6:15 AM, donmai <do...@gmail.com> wrote:

> I figured out the issue: region movement itself wasn't what was taking a
> while; the balancer is the thing taking forever:
>
> 2015-08-27 12:50:13,582 DEBUG [hostname,60000,1440642755872-BalancerChore]
> balancer.StochasticLoadBalancer: Could not find a better load balance
> plan.  Tried 0 different configurations in 2211294ms, and did not find
> anything with a computed cost less than 54.18640355329625
>
> After waiting for half an hour to an hour, only one region is ever moved by
> the balancer and this process is repeated. I'm using default settings with
> regard to slop / overall balancing...any idea why it's taking so long?
> Thanks!

Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)

Posted by donmai <do...@gmail.com>.

I figured out the issue: region movement itself wasn't what was taking a
while; the balancer is the thing taking forever:

2015-08-27 12:50:13,582 DEBUG [hostname,60000,1440642755872-BalancerChore]
balancer.StochasticLoadBalancer: Could not find a better load balance
plan.  Tried 0 different configurations in 2211294ms, and did not find
anything with a computed cost less than 54.18640355329625

After waiting for half an hour to an hour, only one region is ever moved by
the balancer and this process is repeated. I'm using default settings with
regard to slop / overall balancing...any idea why it's taking so long?
Thanks!
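For reference, a quick sketch that pulls the numbers out of that log line and converts the elapsed time to minutes. The regular expression and the field names are my own, written against the message text exactly as pasted above; they are not HBase identifiers.

```python
import re

# Matches the "could not find a better plan" message as it appears in this thread.
PATTERN = re.compile(
    r"Tried (?P<tried>\d+) different configurations in (?P<elapsed_ms>\d+)ms, "
    r"and did not find anything with a computed cost less than (?P<cost>[\d.]+)"
)


def parse_balancer_line(line):
    """Return (tried, elapsed_minutes, cost), or None if the line does not match."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(line.split())
    m = PATTERN.search(flat)
    if m is None:
        return None
    return (
        int(m.group("tried")),
        int(m.group("elapsed_ms")) / 60000.0,
        float(m.group("cost")),
    )


LOG = """2015-08-27 12:50:13,582 DEBUG [hostname,60000,1440642755872-BalancerChore]
balancer.StochasticLoadBalancer: Could not find a better load balance
plan.  Tried 0 different configurations in 2211294ms, and did not find
anything with a computed cost less than 54.18640355329625"""

tried, minutes, cost = parse_balancer_line(LOG)
print(f"tried={tried} elapsed={minutes:.1f} min")
```

2211294 ms works out to roughly 36.9 minutes, which lines up with the half hour to an hour per balancer run, and the tried count is 0 for that whole period.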

On Tue, Aug 25, 2015 at 4:46 PM, anil gupta <an...@gmail.com> wrote:

> AFAIK, region movement does not move the region's data on the
> (distributed) filesystem. It should only update HBase metadata.
> Did you check disk I/O stats during region movement?

Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)

Posted by anil gupta <an...@gmail.com>.

AFAIK, region movement does not move the region's data on the
(distributed) filesystem. It should only update HBase metadata.
Did you check disk I/O stats during region movement?

On Tue, Aug 25, 2015 at 10:40 AM, Ted Yu <yu...@gmail.com> wrote:

> Please see http://hbase.apache.org/book.html#regions.arch.assignment



-- 
Thanks & Regards,
Anil Gupta

Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)

Posted by Ted Yu <yu...@gmail.com>.

Please see http://hbase.apache.org/book.html#regions.arch.assignment

On Tue, Aug 25, 2015 at 10:37 AM, donmai <do...@gmail.com> wrote:

> NFS
> 0.98.10
> Will get to you as soon as I am able, on travel
>
> Is my general understanding correct, though, that there shouldn't be any
> data movement from a region reassignment?

Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)

Posted by donmai <do...@gmail.com>.

NFS
0.98.10
Will get to you as soon as I am able, on travel

Is my general understanding correct, though, that there shouldn't be any
data movement from a region reassignment?

On Tue, Aug 25, 2015 at 12:40 PM, Ted Yu <yu...@gmail.com> wrote:

> Can you give a bit more information:
>
> which filesystem you use
> which hbase release you use
> master log snippet for the long region assignment
>
> Thanks

Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)

Posted by Ted Yu <yu...@gmail.com>.

Can you give a bit more information:

which filesystem you use
which hbase release you use
master log snippet for the long region assignment

Thanks

On Tue, Aug 25, 2015 at 9:30 AM, donmai <do...@gmail.com> wrote:

> Hi,
>
> I'm curious about how exactly region movement works with regard to data
> transfer. To my understanding from the docs, on an HDFS-backed cluster a
> region movement / transition involves changing things in meta only; all
> data movement for locality is handled by HDFS. In the case where rootdir is
> a shared file system, there shouldn't be any data movement with a region
> reassignment, correct? I'm running into performance issues where region
> assignment takes a very long time and I'm trying to figure out why.
>
> Thanks!
>