Posted to user@accumulo.apache.org by ameet kini <am...@gmail.com> on 2012/09/25 20:22:18 UTC

number of query threads for batch scanner

I have a table with 4 tablets on a given tablet server. Depending on the
numQueryThreads parameter below, I see a varying number of maximum
concurrent scans on that table. This maximum number varies from 1 to 3
(i.e., some values for numQueryThreads result in a maximum of 1 concurrent
scan, some values result in 2 concurrent scans, etc.). Can someone shed
light on the relationship between numQueryThreads and the number of
concurrent scans?

public BatchScanner createBatchScanner(String tableName,
                                       Authorizations authorizations,
                                       int numQueryThreads)

A follow-on question: what is the general rule of thumb for setting
numQueryThreads? Should it be set to the number of hosted tablets expected
to be consumed by that BatchScanner? Should it be the number of tablet
servers expected to be hit by that BatchScanner? Something else?

Thanks,
Ameet

Re: number of query threads for batch scanner

Posted by Eric Newton <er...@gmail.com>.
The threads used by the batch scanner are largely used for spreading
I/O across different servers.

If you have 50 matching ranges, and they are on 25 machines, and you
have 10 threads, you won't get much parallelism on any one server.

If you have 50 matching ranges, and they are on 2 machines, and you
have 10 threads, you will get parallel queries on each server.

But if you need parallelism on your tablet server because your data
seems to be uneven (4 tablets on one server, but 1 each on 10 other
servers) perhaps you need a different balancing strategy.

-Eric

On Wed, Sep 26, 2012 at 9:19 AM, ameet kini <am...@gmail.com> wrote:
>
> So I decided to try something different, and changed my splitting policy.
> This ended up with more tablets per tablet server. Interestingly, this
> bumped up my maximum concurrent scans on that tablet server. With about 19
> tablets, I was able to go up to 6 concurrent scans, which ended up using all
> my cores - happy! And I didn't change my numQueryThreads parameter from the
> already very high number.
>
> But that leaves me wondering whether the maximum number of concurrent scans
> on a given tablet server is related to the number of tablets hit by that
> scan on the tablet server. If true, that is interesting, and not what I'd
> expected. Given that the underlying files are immutable, I'm not sure why
> there can't be, say, 4 concurrent scans on 1 tablet if there were 4 cores
> free to host those scans. What I'm seeing, as described above, is I need to
> further split my tablet into > 4 tablets in order to have 4 concurrent
> scans.
>
> Ameet

Re: number of query threads for batch scanner

Posted by ameet kini <am...@gmail.com>.
So I decided to try something different, and changed my splitting policy.
This ended up with more tablets per tablet server. Interestingly, this
bumped up my maximum concurrent scans on that tablet server. With about 19
tablets, I was able to go up to 6 concurrent scans, which ended up using
all my cores - happy! And I didn't change my numQueryThreads parameter from
the already very high number.

But that leaves me wondering whether the maximum number of concurrent scans
on a given tablet server is related to the number of tablets hit by that
scan on the tablet server. If true, that is interesting, and not what I'd
expected. Given that the underlying files are immutable, I'm not sure why
there can't be, say, 4 concurrent scans on 1 tablet if there were 4 cores
free to host those scans. What I'm seeing, as described above, is I need to
further split my tablet into > 4 tablets in order to have 4 concurrent
scans.

Ameet


On Tue, Sep 25, 2012 at 3:23 PM, ameet kini <am...@gmail.com> wrote:

> I should also state the not-so-obvious that my Range spans the entire
> range of the four tablets in question.
>
> Ameet

Re: number of query threads for batch scanner

Posted by ameet kini <am...@gmail.com>.
I should also state the not-so-obvious that my Range spans the entire range
of the four tablets in question.

Ameet

On Tue, Sep 25, 2012 at 3:17 PM, ameet kini <am...@gmail.com> wrote:

> Thanks William.
>
> The issue here is that without knowing how the numQueryThreads translates
> to the number of concurrent scans, I cannot effectively tune that parameter
> to maximize resource usage on the tablet server. What I'm seeing is that
> even though there are four tablets on the tablet server, my number of
> concurrent scans never exceeds 3. This is despite setting numQueryThreads
> to a very high number and having 8 cores on the tablet server. I suspect
> with 3 concurrent scans and no garbage collection happening at that moment,
> most of the cores are sitting idle.
>
> Ameet

Re: number of query threads for batch scanner

Posted by Keith Turner <ke...@deenlo.com>.
On Fri, Sep 28, 2012 at 9:35 AM, ameet kini <am...@gmail.com> wrote:
>
> Thanks Eric and Keith.
>
> Is there any reason why the number of concurrent scans on a given tablet
> server depends on the number of tablets and not the number of cores on that
> tablet server? I'm looking at TabletServerBatchReaderIterator.doLookups.

Not really. RFile has optimizations for seeking forward (ACCUMULO-473
has some numbers from an experiment I did), so the ranges against an
individual tablet are sorted and seeked in order. If you did break
up multiple ranges going to a single tablet, I think it would be best
to sort them and give threads sub-sequences of the sorted list to work
on. This avoids multiple threads reading from the same RFile block
and doing redundant work to decode it. Feel free to open a ticket to
explore this concept.
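
As a rough illustration of the "sort them and give threads sub-sequences"
idea, here is a minimal sketch. It is not Accumulo code: the class and
method names are hypothetical, and range start rows are modeled as plain
Strings rather than Range objects. It sorts the rows and cuts the sorted
list into contiguous slices, one per thread, so that each thread only
ever seeks forward through its own slice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Accumulo. Start rows stand in for Ranges.
class SortedRangePartitioner {

    // Sorts the rows, then cuts the sorted list into at most numThreads
    // contiguous slices whose sizes differ by at most one.
    static List<List<String>> partition(List<String> startRows, int numThreads) {
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be >= 1");
        }
        List<String> sorted = new ArrayList<>(startRows);
        Collections.sort(sorted);

        List<List<String>> slices = new ArrayList<>();
        int n = sorted.size();
        int parts = Math.min(numThreads, n); // never create empty slices
        int from = 0;
        for (int i = 0; i < parts; i++) {
            // The first (n % parts) slices take one extra element.
            int size = n / parts + (i < n % parts ? 1 : 0);
            slices.add(new ArrayList<>(sorted.subList(from, from + size)));
            from += size;
        }
        return slices;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row07", "row02", "row05", "row01", "row09");
        // Each printed slice would be handed to one thread.
        System.out.println(partition(rows, 2));
    }
}
```

Keeping the slices contiguous, rather than dealing rows out round-robin,
is what keeps neighbouring ranges (which are more likely to share an RFile
block) on the same thread.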

>
> Take Keith's example:
>
>  * For 1000 ranges that map to 1 tablet, it will execute 1 concurrent scan.
>
> Say, I had 8 cores on that tablet server and my tablet is large enough to
> warrant 8 concurrent scans. Sure, I can go about and further split my
> tablet, and get 8 concurrent scans - I ended up doing that. But is there any
> reason why 8 concurrent scans can't go against a single tablet? Maybe it's
> difficult to estimate benefits of parallelism at that level, and it's best
> left to users to tune the number of tablets, and base the level of
> parallelism on the number of tablets?
>
> Btw, the shell utility "merge -s <size>" rocks :)
>
> Thanks,
> Ameet

Re: number of query threads for batch scanner

Posted by ameet kini <am...@gmail.com>.
Thanks Eric and Keith.

Is there any reason why the number of concurrent scans on a given tablet
server depends on the number of tablets and not the number of cores on that
tablet server? I'm looking at TabletServerBatchReaderIterator.doLookups.

Take Keith's example:

 * For 1000 ranges that map to 1 tablet, it will execute 1 concurrent scan.

Say, I had 8 cores on that tablet server and my tablet is large enough to
warrant 8 concurrent scans. Sure, I can go about and further split my
tablet, and get 8 concurrent scans - I ended up doing that. But is there
any reason why 8 concurrent scans can't go against a single tablet? Maybe
it's difficult to estimate benefits of parallelism at that level, and it's
best left to users to tune the number of tablets, and base the level of
parallelism on the number of tablets?

Btw, the shell utility "merge -s <size>" rocks :)

Thanks,
Ameet


On Fri, Sep 28, 2012 at 8:04 AM, Keith Turner <ke...@deenlo.com> wrote:

> On Tue, Sep 25, 2012 at 3:17 PM, ameet kini <am...@gmail.com> wrote:
> > Thanks William.
> >
> > The issue here is that without knowing how the numQueryThreads translates to
> > the number of concurrent scans, I cannot effectively tune that parameter to
> > maximize resource usage on the tablet server. What I'm seeing is that even
> > though there are four tablets on the tablet server, my number of concurrent
> > scans never exceeds 3. This is despite setting numQueryThreads to a very
> > high number and having 8 cores on the tablet server. I suspect with 3
> > concurrent scans and no garbage collection happening at that moment, most of
> > the cores are sitting idle.
> >
> > Ameet
>
> The amount of parallelism is determined by how your ranges map to
> tablets. Below are some examples.
>
>  * For one range that maps to 10 tablets on 10 tablet servers, it will
> execute 10 concurrent scans if numQueryThreads is >= 10.
>  * For 1000 ranges that map to 10 tablets on 10 tablet servers, it
> will execute 10 concurrent scans if numQueryThreads is >= 10.
>  * For 1000 ranges that map to 10 tablets on 10 tablet servers, it
> will execute 5 concurrent scans if numQueryThreads is 5.
>  * For 1000 ranges that map to 1 tablet, it will execute 1 concurrent scan.
>
> If you have more query threads than tablet servers, the client code
> will try to execute concurrent scans on a single tablet server.
>
> You can look at TabletServerBatchReaderIterator.doLookups() for the
> details.  In this method it creates QueryTask objects and places them
> on a thread pool.  The size of the thread pool is the user specified
> numQueryThreads.

Re: number of query threads for batch scanner

Posted by Keith Turner <ke...@deenlo.com>.
On Tue, Sep 25, 2012 at 3:17 PM, ameet kini <am...@gmail.com> wrote:
> Thanks William.
>
> The issue here is that without knowing how the numQueryThreads translates to
> the number of concurrent scans, I cannot effectively tune that parameter to
> maximize resource usage on the tablet server. What I'm seeing is that even
> though there are four tablets on the tablet server, my number of concurrent
> scans never exceeds 3. This is despite setting numQueryThreads to a very
> high number and having 8 cores on the tablet server. I suspect with 3
> concurrent scans and no garbage collection happening at that moment, most of
> the cores are sitting idle.
>
> Ameet

The amount of parallelism is determined by how your ranges map to
tablets. Below are some examples.

 * For one range that maps to 10 tablets on 10 tablet servers, it will
execute 10 concurrent scans if numQueryThreads is >= 10.
 * For 1000 ranges that map to 10 tablets on 10 tablet servers, it
will execute 10 concurrent scans if numQueryThreads is >= 10.
 * For 1000 ranges that map to 10 tablets on 10 tablet servers, it
will execute 5 concurrent scans if numQueryThreads is 5.
 * For 1000 ranges that map to 1 tablet, it will execute 1 concurrent scan.

If you have more query threads than tablet servers, the client code
will try to execute concurrent scans on a single tablet server.

You can look at TabletServerBatchReaderIterator.doLookups() for the
details.  In this method it creates QueryTask objects and places them
on a thread pool.  The size of the thread pool is the user specified
numQueryThreads.
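
The four examples above are consistent with a simple rule of thumb: the
number of concurrent scans is capped by both the number of tablets the
ranges map to and by numQueryThreads. The sketch below encodes only that
rule. It is a simplified model with hypothetical names, not the logic in
doLookups(), and it should be read as an upper bound: how tablets are
grouped into tasks may give fewer scans in practice, as the "3 scans for
4 tablets" observation earlier in this thread suggests.

```java
// Simplified model of the examples above; NOT the actual Accumulo client logic.
class ConcurrentScanEstimate {

    // Upper bound on concurrent scans: limited by the number of tablets the
    // ranges map to and by the size of the client-side thread pool.
    static int estimate(int tabletsHit, int numQueryThreads) {
        if (tabletsHit < 0 || numQueryThreads < 1) {
            throw new IllegalArgumentException("tabletsHit >= 0 and numQueryThreads >= 1 required");
        }
        return Math.min(tabletsHit, numQueryThreads);
    }

    public static void main(String[] args) {
        System.out.println(estimate(10, 10)); // ranges on 10 tablets, 10 threads
        System.out.println(estimate(10, 5));  // ranges on 10 tablets, 5 threads
        System.out.println(estimate(1, 50));  // 1000 ranges on 1 tablet
    }
}
```

Note that the number of ranges does not appear in the model at all: in
the examples, 1 range and 1000 ranges over the same 10 tablets give the
same result.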


Re: number of query threads for batch scanner

Posted by ameet kini <am...@gmail.com>.
Thanks William.

The issue here is that without knowing how the numQueryThreads translates
to the number of concurrent scans, I cannot effectively tune that parameter
to maximize resource usage on the tablet server. What I'm seeing is that
even though there are four tablets on the tablet server, my number of
concurrent scans never exceeds 3. This is despite setting numQueryThreads
to a very high number and having 8 cores on the tablet server. I suspect
with 3 concurrent scans and no garbage collection happening at that moment,
most of the cores are sitting idle.

Ameet

On Tue, Sep 25, 2012 at 3:08 PM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> It should really be dependent upon the resources available to the client.
> You can set an arbitrarily high number of threads, but you're still bound
> by the number of parallel operations the CPU can make. I would assume the
> sweet spot is somewhere around that number; try running a small benchmark
> with 2, 4, 8, 16, etc. threads and see where your performance starts to
> level off.

Re: number of query threads for batch scanner

Posted by William Slacum <wi...@accumulo.net>.
It should really be dependent upon the resources available to the client.
You can set an arbitrarily high number of threads, but you're still bound
by the number of parallel operations the CPU can make. I would assume the
sweet spot is somewhere around that number; try running a small benchmark
with 2, 4, 8, 16, etc. threads and see where your performance starts to
level off.
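
A minimal sketch of such a benchmark follows. The harness and its names
are hypothetical; the `scan` callback is an assumption standing in for
"create a BatchScanner with this many query threads, read all results,
close it", which is left to the caller. The workload in main() is a
sleep-based placeholder and says nothing about real scan performance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical timing harness; the scan itself is supplied by the caller.
class QueryThreadSweep {

    // Runs `scan` once per candidate thread count and records the elapsed
    // wall-clock time in milliseconds, keyed by thread count in input order.
    static Map<Integer, Long> sweep(int[] threadCounts, IntConsumer scan) {
        Map<Integer, Long> elapsedMs = new LinkedHashMap<>();
        for (int threads : threadCounts) {
            long start = System.nanoTime();
            scan.accept(threads); // caller passes `threads` as numQueryThreads
            elapsedMs.put(threads, (System.nanoTime() - start) / 1_000_000);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Placeholder workload only: sleeps for less time as threads increase.
        IntConsumer fakeScan = threads -> {
            try {
                Thread.sleep(64L / threads);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        sweep(new int[] {2, 4, 8, 16}, fakeScan)
            .forEach((t, ms) -> System.out.println(t + " threads: " + ms + " ms"));
    }
}
```

For a real measurement it is probably worth repeating each thread count
several times and discarding the first run, since caching on the tablet
servers can make later runs look faster regardless of the thread count.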

On Tue, Sep 25, 2012 at 11:45 AM, ameet kini <am...@gmail.com> wrote:

> Probably worth adding that the table mentioned below has a bunch of
> tablets on other tablet servers as well, which is why I'm using
> BatchScanner. I'm just not sure how the numQueryThreads relates to the
> number of concurrent scans on a given tablet server.
>
> Thanks

Re: number of query threads for batch scanner

Posted by ameet kini <am...@gmail.com>.
Probably worth adding that the table mentioned below has a bunch of tablets
on other tablet servers as well, which is why I'm using BatchScanner. I'm
just not sure how the numQueryThreads relates to the number of concurrent
scans on a given tablet server.

Thanks

On Tue, Sep 25, 2012 at 2:22 PM, ameet kini <am...@gmail.com> wrote:

>
> I have a table with 4 tablets on a given tablet server. Depending on the
> numQueryThreads parameter below, I see a varying number of maximum
> concurrent scans on that table. This maximum number varies from 1 to 3
> (i.e., some values for numQueryThreads result in a maximum of 1 concurrent
> scan, some values result in 2 concurrent scans, etc.). Can someone shed
> light on the relationship between numQueryThreads and the number of
> concurrent scans?
>
> public BatchScanner createBatchScanner(String tableName,
>                                        Authorizations authorizations,
>                                        int numQueryThreads)
>
> A follow-on question: what is the general rule of thumb for setting
> numQueryThreads? Should it be set to the number of hosted tablets expected
> to be consumed by that BatchScanner? Should it be the number of tablet
> servers expected to be hit by that BatchScanner? Something else?
>
> Thanks,
> Ameet
>
>
>