Posted to user@hbase.apache.org by Eugeny Morozov <em...@griddynamics.com> on 2012/12/18 09:01:55 UTC

Many scanner opening

Hello!

We recently ran into an issue: the more map tasks have completed, the
longer each additional map task takes to complete.

In our architecture we have two scanners that read the table. The first one,
the 'outer' scanner, reads the table and filters some rowkeys. These rowkeys
are used as a filter for the second, 'internal' scanner. Thus we constantly
open 'internal' scanners with different filters.

As an additional symptom, we see that our cluster is practically idle:
there is no CPU load, no disk load, no network traffic, etc. Usually that
means we are waiting on some locks, but I'm not sure.

I would appreciate any ideas or suggestions to understand the case.
Thank you in advance.
-- 
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
emorozov@griddynamics.com

Re: Many scanner opening

Posted by Eugeny Morozov <em...@griddynamics.com>.
Lars,

We tried, but I didn't know there was such a contention issue.
We have two different column families. The first one contains data that is
partially used as a filter, and the actual data lives in the second column
family.

So the outer scanner (the first one) goes through the table and selects the
keys that contain the required data. These keys are then passed to the inner
(second) scanner.
BTW, the second scanner uses FuzzyRowFilter:
http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/

We have a pretty small cluster, only 18 mappers, but it looks like that's
enough to get contention =)
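
For readers unfamiliar with FuzzyRowFilter, its matching rule can be sketched
roughly as follows. This is a simplified model in Python, not the HBase
implementation: the function name, the sample keys, and the handling of keys
shorter than the pattern are assumptions made for illustration. The mask
convention (0 = the byte is fixed and must match, 1 = the byte may be
anything) follows the filter's documented behaviour. The real filter also
gives the scanner hints about the next row to seek to, which this sketch does
not model.

```python
def fuzzy_match(row_key: bytes, pattern: bytes, mask: bytes) -> bool:
    """Return True if row_key agrees with pattern at every fixed position.

    mask[i] == 0: byte i is fixed and must equal pattern[i].
    mask[i] == 1: byte i is not fixed and may hold any value.
    """
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    # Assumption of this sketch: a key shorter than the pattern never matches.
    if len(row_key) < len(pattern):
        return False
    return all(m == 1 or k == p for k, p, m in zip(row_key, pattern, mask))


# Hypothetical row keys of the form <4-byte id>_<event type>.
pattern = b"????_login"
mask = bytes([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])  # first four bytes are free

matches = [key for key in (b"0001_login", b"0002_click", b"0003_login")
           if fuzzy_match(key, pattern, mask)]
```

Here `matches` keeps only the keys whose fixed part equals `_login`,
regardless of the four leading bytes.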



Re: Many scanner opening

Posted by lars hofhansl <lh...@yahoo.com>.
Cool.

You probably made it less likely that your scanners will scan the same HFile in parallel.

-- Lars




Re: Many scanner opening

Posted by Eugeny Morozov <em...@griddynamics.com>.
Lars,

Cool stuff! Thanks a lot! I'm not sure I can apply the patch, because we're
using CDH-4.1.1, but increasing the size of the internal scanner does the
trick: it decreased the number of scanners.
At least temporarily, that's good enough.

Thanks!
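
The effect of this workaround on the number of scanner opens can be sketched
as below. This is an illustrative model in Python, not the code from the
thread: it assumes that "increasing the size of the internal scanner" means
letting each inner scanner cover a batch of keys from the outer scanner
instead of a single key. The names batch_keys and count_scanner_opens, and the
key and batch counts, are hypothetical.

```python
from typing import Iterable, Iterator, List


def batch_keys(keys: Iterable[bytes], batch_size: int) -> Iterator[List[bytes]]:
    """Group keys from the outer scanner into lists of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[bytes] = []
    for key in keys:
        batch.append(key)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def count_scanner_opens(num_keys: int, batch_size: int) -> int:
    """Count inner scanners opened if each batch is served by one scanner."""
    keys = (str(i).encode() for i in range(num_keys))
    return sum(1 for _ in batch_keys(keys, batch_size))


opens_per_key = count_scanner_opens(1000, 1)    # one inner scanner per key
opens_batched = count_scanner_opens(1000, 50)   # one inner scanner per 50 keys
```

Under this assumption, fewer and larger inner scans mean fewer scanners open
at the same time, which would make it less likely that several of them hit
the same file concurrently.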


Re: Many scanner opening

Posted by lars hofhansl <lh...@yahoo.com>.
You might have run into HBASE-7336.
(Not available in any official release, yet)

If you're using 0.94 (and probably 0.92) you can just apply this patch (it's safe and simple).




Re: Many scanner opening

Posted by Michael Segel <mi...@hotmail.com>.
I'd suggest looking into a schema design change.
