Posted to user@accumulo.apache.org by Sven Hodapp <sv...@scai.fraunhofer.de> on 2016/08/24 13:22:19 UTC

Accumulo Seek performance

Hi there,

currently we're experimenting with a two-node Accumulo cluster (two tablet servers) set up for document storage.
These documents are decomposed down to the sentence level.

Now I'm using a BatchScanner to assemble the full document like this:

    val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // ARTIFACTS table currently hosts ~30GB data, ~200M entries on ~45 tablets
    bscan.setRanges(ranges)  // there are like 3000 Range.exact's in the ranges-list
    for (entry <- bscan.asScala) yield {
      val key = entry.getKey()
      val value = entry.getValue()
      // etc.
    }
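
For reference, the same lookups can also be issued one Range at a time with
plain Scanners, which is the alternative the replies below end up measuring.
A minimal sketch, assuming the same instance, auths, ARTIFACTS and ranges as
above:

    // serial alternative: one plain Scanner per Range, no client-side batching
    val assembled = ranges.asScala.flatMap { range =>
      val scanner = instance.createScanner(ARTIFACTS, auths)
      scanner.setRange(range)
      scanner.asScala.map(e => (e.getKey, e.getValue))
    }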

For larger full documents (e.g. 3000 exact ranges), this operation will take about 12 seconds.
But shorter documents are assembled blazing fast...

Is that too much for a BatchScanner / am I misusing the BatchScanner?
Is that a normal time for such a (seek) operation?
Can I do something to get better seek performance?

Note: I have already enabled bloom filtering on that table.

Thank you for any advice!

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hodapp@scai.fraunhofer.de
www.scai.fraunhofer.de

Re: Accumulo Seek performance

Posted by Michael Moss <mi...@gmail.com>.
Setting the log level to trace helps, but overall, the lack of "traditional"
DB metrics has been a huge pain point for us as well.
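
A minimal sketch of bumping that log level from client code, assuming the
log4j 1.x API that Accumulo 1.x clients ship with (the package name to target
is an assumption; editing log4j.properties achieves the same):

    import org.apache.log4j.{Level, Logger}

    // raise the Accumulo client packages to TRACE at runtime
    Logger.getLogger("org.apache.accumulo.core.client").setLevel(Level.TRACE)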

On Wed, Sep 14, 2016 at 10:04 AM, Josh Elser <jo...@gmail.com> wrote:

> Nope! My test harness (the github repo) doesn't show any noticeable
> difference between BatchScanner and Scanner. Would have to do more digging
> with Sven to figure out what's happening.
>
> One takeaway is that the lack of metrics to tell us what is actually
> happening is a major defect, imo.
>
> On Sep 14, 2016 9:53 AM, "Dylan Hutchison" <dh...@cs.washington.edu>
> wrote:
>
>> Do we have a (hopefully reproducible) conclusion from this thread,
>> regarding Scanners and BatchScanners?
>>
>> On Sep 13, 2016 11:17 PM, "Josh Elser" <jo...@gmail.com> wrote:
>>
>>> Yeah, this seems to have been OS X causing me grief.
>>>
>>> Spun up a 3-tserver cluster (on OpenStack, even) and reran the same
>>> experiment. I could not reproduce the issues, even without substantial
>>> config tweaking.
>>>
>>> Josh Elser wrote:
>>>
>>>> I'm playing around with this a little more today and something is
>>>> definitely weird on my local machine. I'm seeing insane spikes in
>>>> performance using Scanners too.
>>>>
>>>> Coupled with Keith's inability to repro this, I am starting to think
>>>> that these are not worthwhile numbers to put weight behind. Something I
>>>> haven't been able to figure out is quite screwy for me.
>>>>
>>>> Josh Elser wrote:
>>>>
>>>>> Sven, et al:
>>>>>
>>>>> So, it would appear that I have been able to reproduce this one (better
>>>>> late than never, I guess...). tl;dr Serially using Scanners to do point
>>>>> lookups instead of a BatchScanner is ~20x faster. This sounds like a
>>>>> pretty serious performance issue to me.
>>>>>
>>>>> Here's a general outline for what I did.
>>>>>
>>>>> * Accumulo 1.8.0
>>>>> * Created a table with 1M rows, each row with 10 columns using YCSB
>>>>> (workloada)
>>>>> * Split the table into 9 tablets
>>>>> * Computed the set of all rows in the table
>>>>>
>>>>> For a number of iterations:
>>>>> * Shuffle this set of rows
>>>>> * Choose the first N rows
>>>>> * Construct an equivalent set of Ranges from the set of Rows, choosing
>>>>> a
>>>>> random column (0-9)
>>>>> * Partition the N rows into X collections
>>>>> * Submit X tasks to query one partition of the N rows (to a thread pool
>>>>> with X fixed threads)
>>>>>
>>>>> I have two implementations of these tasks. One, where all ranges in a
>>>>> partition are executed via one BatchScanner. A second where each range
>>>>> is
>>>>> executed in serial using a Scanner. The numbers speak for themselves.
>>>>>
>>>>> ** BatchScanners **
>>>>> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled
>>>>> all
>>>>> rows
>>>>> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
>>>>> calculated: 3000 ranges found
>>>>> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>>> range partitions using a pool of 6 threads
>>>>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
>>>>> executed in 40178 ms
>>>>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>>> range partitions using a pool of 6 threads
>>>>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
>>>>> executed in 42296 ms
>>>>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>>> range partitions using a pool of 6 threads
>>>>> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
>>>>> executed in 46094 ms
>>>>> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>>> range partitions using a pool of 6 threads
>>>>> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
>>>>> executed in 47704 ms
>>>>> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>>> range partitions using a pool of 6 threads
>>>>> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
>>>>> executed in 49221 ms
>>>>>
>>>>> ** Scanners **
>>>>> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled
>>>>> all
>>>>> rows
>>>>> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
>>>>> calculated: 3000 ranges found
>>>>> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>>> range partitions using a pool of 6 threads
>>>>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
>>>>> executed in 2833 ms
>>>>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>>> range partitions using a pool of 6 threads
>>>>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
>>>>> executed in 2536 ms
>>>>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>>> range partitions using a pool of 6 threads
>>>>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
>>>>> executed in 2150 ms
>>>>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>>> range partitions using a pool of 6 threads
>>>>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
>>>>> executed in 2061 ms
>>>>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>>> range partitions using a pool of 6 threads
>>>>> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
>>>>> executed in 2140 ms
>>>>>
>>>>> Query code is available
>>>>> https://github.com/joshelser/accumulo-range-binning
>>>>>
>>>>> Sven Hodapp wrote:
>>>>>
>>>>>> Hi Keith,
>>>>>>
>>>>>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
>>>>>> amazing differences.
>>>>>> Maybe it's a problem with the table structure? For example it may
>>>>>> happen that one row id (e.g. a sentence) has several thousand column
>>>>>> families. Can this affect the seek performance?
>>>>>>
>>>>>> So for my initial example it has about 3000 row ids to seek, which
>>>>>> will return about 500k entries. If I filter for specific column
>>>>>> families (e.g. a document without annotations) it will return about 5k
>>>>>> entries, but the seek time will only be halved.
>>>>>> Are there too many column families to seek quickly?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Regards,
>>>>>> Sven
>>>>>>
>>>>>>

Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
Nope! My test harness (the github repo) doesn't show any noticeable
difference between BatchScanner and Scanner. Would have to do more digging
with Sven to figure out what's happening.

One takeaway is that the lack of metrics to tell us what is actually
happening is a major defect, imo.

On Sep 14, 2016 9:53 AM, "Dylan Hutchison" <dh...@cs.washington.edu>
wrote:

> Do we have a (hopefully reproducible) conclusion from this thread,
> regarding Scanners and BatchScanners?
>
> On Sep 13, 2016 11:17 PM, "Josh Elser" <jo...@gmail.com> wrote:
>
>> Yeah, this seems to have been OS X causing me grief.
>>
>> Spun up a 3-tserver cluster (on OpenStack, even) and reran the same
>> experiment. I could not reproduce the issues, even without substantial
>> config tweaking.
>>
>> Josh Elser wrote:
>>
>>> I'm playing around with this a little more today and something is
>>> definitely weird on my local machine. I'm seeing insane spikes in
>>> performance using Scanners too.
>>>
>>> Coupled with Keith's inability to repro this, I am starting to think
>>> that these are not worthwhile numbers to put weight behind. Something I
>>> haven't been able to figure out is quite screwy for me.
>>>
>>> Josh Elser wrote:
>>>
>>>> Sven, et al:
>>>>
>>>> So, it would appear that I have been able to reproduce this one (better
>>>> late than never, I guess...). tl;dr Serially using Scanners to do point
>>>> lookups instead of a BatchScanner is ~20x faster. This sounds like a
>>>> pretty serious performance issue to me.
>>>>
>>>> Here's a general outline for what I did.
>>>>
>>>> * Accumulo 1.8.0
>>>> * Created a table with 1M rows, each row with 10 columns using YCSB
>>>> (workloada)
>>>> * Split the table into 9 tablets
>>>> * Computed the set of all rows in the table
>>>>
>>>> For a number of iterations:
>>>> * Shuffle this set of rows
>>>> * Choose the first N rows
>>>> * Construct an equivalent set of Ranges from the set of Rows, choosing a
>>>> random column (0-9)
>>>> * Partition the N rows into X collections
>>>> * Submit X tasks to query one partition of the N rows (to a thread pool
>>>> with X fixed threads)
>>>>
>>>> I have two implementations of these tasks. One, where all ranges in a
>>>> partition are executed via one BatchScanner. A second where each range is
>>>> executed in serial using a Scanner. The numbers speak for themselves.
>>>>
>>>> ** BatchScanners **
>>>> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
>>>> rows
>>>> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
>>>> calculated: 3000 ranges found
>>>> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>> range partitions using a pool of 6 threads
>>>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
>>>> executed in 40178 ms
>>>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>> range partitions using a pool of 6 threads
>>>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
>>>> executed in 42296 ms
>>>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>> range partitions using a pool of 6 threads
>>>> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
>>>> executed in 46094 ms
>>>> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>> range partitions using a pool of 6 threads
>>>> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
>>>> executed in 47704 ms
>>>> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>> range partitions using a pool of 6 threads
>>>> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
>>>> executed in 49221 ms
>>>>
>>>> ** Scanners **
>>>> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
>>>> rows
>>>> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
>>>> calculated: 3000 ranges found
>>>> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>> range partitions using a pool of 6 threads
>>>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
>>>> executed in 2833 ms
>>>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>> range partitions using a pool of 6 threads
>>>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
>>>> executed in 2536 ms
>>>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>> range partitions using a pool of 6 threads
>>>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
>>>> executed in 2150 ms
>>>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>> range partitions using a pool of 6 threads
>>>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
>>>> executed in 2061 ms
>>>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>>> range partitions using a pool of 6 threads
>>>> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
>>>> executed in 2140 ms
>>>>
>>>> Query code is available
>>>> https://github.com/joshelser/accumulo-range-binning
>>>>
>>>> Sven Hodapp wrote:
>>>>
>>>>> Hi Keith,
>>>>>
>>>>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
>>>>> amazing differences.
>>>>> Maybe it's a problem with the table structure? For example it may
>>>>> happen that one row id (e.g. a sentence) has several thousand column
>>>>> families. Can this affect the seek performance?
>>>>>
>>>>> So for my initial example it has about 3000 row ids to seek, which
>>>>> will return about 500k entries. If I filter for specific column
>>>>> families (e.g. a document without annotations) it will return about 5k
>>>>> entries, but the seek time will only be halved.
>>>>> Are there too many column families to seek quickly?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Regards,
>>>>> Sven
>>>>>
>>>>>

Re: Accumulo Seek performance

Posted by Dylan Hutchison <dh...@cs.washington.edu>.
Do we have a (hopefully reproducible) conclusion from this thread,
regarding Scanners and BatchScanners?

On Sep 13, 2016 11:17 PM, "Josh Elser" <jo...@gmail.com> wrote:

> Yeah, this seems to have been OS X causing me grief.
>
> Spun up a 3-tserver cluster (on OpenStack, even) and reran the same
> experiment. I could not reproduce the issues, even without substantial
> config tweaking.
>
> Josh Elser wrote:
>
>> I'm playing around with this a little more today and something is
>> definitely weird on my local machine. I'm seeing insane spikes in
>> performance using Scanners too.
>>
>> Coupled with Keith's inability to repro this, I am starting to think
>> that these are not worthwhile numbers to put weight behind. Something I
>> haven't been able to figure out is quite screwy for me.
>>
>> Josh Elser wrote:
>>
>>> Sven, et al:
>>>
>>> So, it would appear that I have been able to reproduce this one (better
>>> late than never, I guess...). tl;dr Serially using Scanners to do point
>>> lookups instead of a BatchScanner is ~20x faster. This sounds like a
>>> pretty serious performance issue to me.
>>>
>>> Here's a general outline for what I did.
>>>
>>> * Accumulo 1.8.0
>>> * Created a table with 1M rows, each row with 10 columns using YCSB
>>> (workloada)
>>> * Split the table into 9 tablets
>>> * Computed the set of all rows in the table
>>>
>>> For a number of iterations:
>>> * Shuffle this set of rows
>>> * Choose the first N rows
>>> * Construct an equivalent set of Ranges from the set of Rows, choosing a
>>> random column (0-9)
>>> * Partition the N rows into X collections
>>> * Submit X tasks to query one partition of the N rows (to a thread pool
>>> with X fixed threads)
>>>
>>> I have two implementations of these tasks. One, where all ranges in a
>>> partition are executed via one BatchScanner. A second where each range is
>>> executed in serial using a Scanner. The numbers speak for themselves.
>>>
>>> ** BatchScanners **
>>> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
>>> rows
>>> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
>>> calculated: 3000 ranges found
>>> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>> range partitions using a pool of 6 threads
>>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
>>> executed in 40178 ms
>>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>> range partitions using a pool of 6 threads
>>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
>>> executed in 42296 ms
>>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>> range partitions using a pool of 6 threads
>>> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
>>> executed in 46094 ms
>>> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>> range partitions using a pool of 6 threads
>>> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
>>> executed in 47704 ms
>>> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>> range partitions using a pool of 6 threads
>>> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
>>> executed in 49221 ms
>>>
>>> ** Scanners **
>>> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
>>> rows
>>> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
>>> calculated: 3000 ranges found
>>> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>> range partitions using a pool of 6 threads
>>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
>>> executed in 2833 ms
>>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>> range partitions using a pool of 6 threads
>>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
>>> executed in 2536 ms
>>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>> range partitions using a pool of 6 threads
>>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
>>> executed in 2150 ms
>>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>> range partitions using a pool of 6 threads
>>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
>>> executed in 2061 ms
>>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
>>> range partitions using a pool of 6 threads
>>> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
>>> executed in 2140 ms
>>>
>>> Query code is available
>>> https://github.com/joshelser/accumulo-range-binning
>>>
>>> Sven Hodapp wrote:
>>>
>>>> Hi Keith,
>>>>
>>>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
>>>> amazing differences.
>>>> Maybe it's a problem with the table structure? For example it may
>>>> happen that one row id (e.g. a sentence) has several thousand column
>>>> families. Can this affect the seek performance?
>>>>
>>>> So for my initial example it has about 3000 row ids to seek, which
>>>> will return about 500k entries. If I filter for specific column
>>>> families (e.g. a document without annotations) it will return about 5k
>>>> entries, but the seek time will only be halved.
>>>> Are there too many column families to seek quickly?
>>>>
>>>> Thanks!
>>>>
>>>> Regards,
>>>> Sven
>>>>
>>>>

Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
Yeah, this seems to have been OS X causing me grief.

Spun up a 3-tserver cluster (on OpenStack, even) and reran the same
experiment. I could not reproduce the issues, even without substantial 
config tweaking.

Josh Elser wrote:
> I'm playing around with this a little more today and something is
> definitely weird on my local machine. I'm seeing insane spikes in
> performance using Scanners too.
>
> Coupled with Keith's inability to repro this, I am starting to think
> that these are not worthwhile numbers to put weight behind. Something I
> haven't been able to figure out is quite screwy for me.
>
> Josh Elser wrote:
>> Sven, et al:
>>
>> So, it would appear that I have been able to reproduce this one (better
>> late than never, I guess...). tl;dr Serially using Scanners to do point
>> lookups instead of a BatchScanner is ~20x faster. This sounds like a
>> pretty serious performance issue to me.
>>
>> Here's a general outline for what I did.
>>
>> * Accumulo 1.8.0
>> * Created a table with 1M rows, each row with 10 columns using YCSB
>> (workloada)
>> * Split the table into 9 tablets
>> * Computed the set of all rows in the table
>>
>> For a number of iterations:
>> * Shuffle this set of rows
>> * Choose the first N rows
>> * Construct an equivalent set of Ranges from the set of Rows, choosing a
>> random column (0-9)
>> * Partition the N rows into X collections
>> * Submit X tasks to query one partition of the N rows (to a thread pool
>> with X fixed threads)
>>
>> I have two implementations of these tasks. One, where all ranges in a
>> partition are executed via one BatchScanner. A second where each range is
>> executed in serial using a Scanner. The numbers speak for themselves.
>>
>> ** BatchScanners **
>> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
>> rows
>> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
>> calculated: 3000 ranges found
>> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 40178 ms
>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 42296 ms
>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 46094 ms
>> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 47704 ms
>> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 49221 ms
>>
>> ** Scanners **
>> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
>> rows
>> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
>> calculated: 3000 ranges found
>> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 2833 ms
>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 2536 ms
>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 2150 ms
>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 2061 ms
>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
>> executed in 2140 ms
>>
>> Query code is available
>> https://github.com/joshelser/accumulo-range-binning
>>
>> Sven Hodapp wrote:
>>> Hi Keith,
>>>
>>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
>>> amazing differences.
>>> Maybe it's a problem with the table structure? For example it may
>>> happen that one row id (e.g. a sentence) has several thousand column
>>> families. Can this affect the seek performance?
>>>
>>> So for my initial example it has about 3000 row ids to seek, which
>>> will return about 500k entries. If I filter for specific column
>>> families (e.g. a document without annotations) it will return about 5k
>>> entries, but the seek time will only be halved.
>>> Are there too many column families to seek quickly?
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Sven
>>>

Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
I'm playing around with this a little more today and something is 
definitely weird on my local machine. I'm seeing insane spikes in 
performance using Scanners too.

Coupled with Keith's inability to repro this, I am starting to think 
that these are not worthwhile numbers to put weight behind. Something I 
haven't been able to figure out is quite screwy for me.

Josh Elser wrote:
> Sven, et al:
>
> So, it would appear that I have been able to reproduce this one (better
> late than never, I guess...). tl;dr Serially using Scanners to do point
> lookups instead of a BatchScanner is ~20x faster. This sounds like a
> pretty serious performance issue to me.
>
> Here's a general outline for what I did.
>
> * Accumulo 1.8.0
> * Created a table with 1M rows, each row with 10 columns using YCSB
> (workloada)
> * Split the table into 9 tablets
> * Computed the set of all rows in the table
>
> For a number of iterations:
> * Shuffle this set of rows
> * Choose the first N rows
> * Construct an equivalent set of Ranges from the set of Rows, choosing a
> random column (0-9)
> * Partition the N rows into X collections
> * Submit X tasks to query one partition of the N rows (to a thread pool
> with X fixed threads)
>
> I have two implementations of these tasks. One, where all ranges in a
> partition are executed via one BatchScanner. A second where each range is
> executed in serial using a Scanner. The numbers speak for themselves.
>
> ** BatchScanners **
> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 40178 ms
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 42296 ms
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 46094 ms
> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 47704 ms
> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 49221 ms
>
> ** Scanners **
> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2833 ms
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2536 ms
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2150 ms
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2061 ms
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2140 ms
>
> Query code is available https://github.com/joshelser/accumulo-range-binning
>
> Sven Hodapp wrote:
>> Hi Keith,
>>
>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
>> amazing differences.
>> Maybe it's a problem with the table structure? For example it may
>> happen that one row id (e.g. a sentence) has several thousand column
>> families. Can this affect the seek performance?
>>
>> So for my initial example it has about 3000 row ids to seek, which
>> will return about 500k entries. If I filter for specific column
>> families (e.g. a document without annotations) it will return about 5k
>> entries, but the seek time will only be halved.
>> Are there too many column families to seek quickly?
>>
>> Thanks!
>>
>> Regards,
>> Sven
>>

Re: Accumulo Seek performance

Posted by Keith Turner <ke...@deenlo.com>.
Note: I was running a single tserver, datanode, and zookeeper on my workstation.

On Mon, Sep 12, 2016 at 2:02 PM, Keith Turner <ke...@deenlo.com> wrote:
> Josh helped me get up and running w/ YCSB and I am seeing very
> different results. I am going to make a pull req to Josh's GH repo
> to add a Readme w/ what I learned from Josh in IRC.
>
> The link below is the Accumulo config I used for running a local 1.8.0 instance.
>
> https://gist.github.com/keith-turner/4678a0aac2a2a0e240ea5d73285743ab
>
> I created splits user1~ user2~ user3~ user4~ user5~ user6~ user7~
> user8~ user9~ and then compacted the table.
>
> Below is the performance I saw with a single batch scanner (configured
> 1 partition).  The batch scanner has 10 threads.
>
> 2016-09-12 12:36:41,079 [client.ClientConfiguration] WARN : Found no
> client.conf in default paths. Using default client configuration
> values.
> 2016-09-12 12:36:41,428 [joshelser.YcsbBatchScanner] INFO : Connected
> to Accumulo
> 2016-09-12 12:36:41,429 [joshelser.YcsbBatchScanner] INFO : Computing ranges
> 2016-09-12 12:36:48,059 [joshelser.YcsbBatchScanner] INFO : Calculated
> all rows: Found 1000000 rows
> 2016-09-12 12:36:48,096 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
> 2016-09-12 12:36:48,116 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-12 12:36:48,118 [joshelser.YcsbBatchScanner] INFO : Executing
> 1 range partitions using a pool of 1 threads
> 2016-09-12 12:36:49,372 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1252 ms
> 2016-09-12 12:36:49,372 [joshelser.YcsbBatchScanner] INFO : Executing
> 1 range partitions using a pool of 1 threads
> 2016-09-12 12:36:50,561 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1188 ms
> 2016-09-12 12:36:50,561 [joshelser.YcsbBatchScanner] INFO : Executing
> 1 range partitions using a pool of 1 threads
> 2016-09-12 12:36:51,741 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1179 ms
> 2016-09-12 12:36:51,741 [joshelser.YcsbBatchScanner] INFO : Executing
> 1 range partitions using a pool of 1 threads
> 2016-09-12 12:36:52,974 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1233 ms
> 2016-09-12 12:36:52,974 [joshelser.YcsbBatchScanner] INFO : Executing
> 1 range partitions using a pool of 1 threads
> 2016-09-12 12:36:54,146 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1171 ms
>
> Below is the performance I saw with 6 batch scanners. Each batch
> scanner has 10 threads.
>
> 2016-09-12 13:58:21,061 [client.ClientConfiguration] WARN : Found no
> client.conf in default paths. Using default client configuration
> values.
> 2016-09-12 13:58:21,380 [joshelser.YcsbBatchScanner] INFO : Connected
> to Accumulo
> 2016-09-12 13:58:21,381 [joshelser.YcsbBatchScanner] INFO : Computing ranges
> 2016-09-12 13:58:28,571 [joshelser.YcsbBatchScanner] INFO : Calculated
> all rows: Found 1000000 rows
> 2016-09-12 13:58:28,606 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
> 2016-09-12 13:58:28,632 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-12 13:58:28,634 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 13:58:30,273 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1637 ms
> 2016-09-12 13:58:30,273 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 13:58:31,883 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1609 ms
> 2016-09-12 13:58:31,883 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 13:58:33,422 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1539 ms
> 2016-09-12 13:58:33,422 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 13:58:34,994 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1571 ms
> 2016-09-12 13:58:34,994 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 13:58:36,512 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 1517 ms
>
> Below is the performance I saw with 6 threads each using a scanner.
>
> 2016-09-12 14:01:14,972 [client.ClientConfiguration] WARN : Found no
> client.conf in default paths. Using default client configuration
> values.
> 2016-09-12 14:01:15,287 [joshelser.YcsbBatchScanner] INFO : Connected
> to Accumulo
> 2016-09-12 14:01:15,288 [joshelser.YcsbBatchScanner] INFO : Computing ranges
> 2016-09-12 14:01:22,309 [joshelser.YcsbBatchScanner] INFO : Calculated
> all rows: Found 1000000 rows
> 2016-09-12 14:01:22,352 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
> 2016-09-12 14:01:22,373 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-12 14:01:22,376 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 14:01:25,696 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 3318 ms
> 2016-09-12 14:01:25,696 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 14:01:29,001 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 3305 ms
> 2016-09-12 14:01:29,001 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 14:01:31,824 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2822 ms
> 2016-09-12 14:01:31,824 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 14:01:34,207 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2383 ms
> 2016-09-12 14:01:34,207 [joshelser.YcsbBatchScanner] INFO : Executing
> 6 range partitions using a pool of 6 threads
> 2016-09-12 14:01:36,548 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2340 ms
>
> On Sat, Sep 10, 2016 at 6:01 PM, Josh Elser <jo...@gmail.com> wrote:
>> Sven, et al:
>>
>> So, it would appear that I have been able to reproduce this one (better late
>> than never, I guess...). tl;dr Serially using Scanners to do point lookups
>> instead of a BatchScanner is ~20x faster. This sounds like a pretty serious
>> performance issue to me.
>>
>> Here's a general outline for what I did.
>>
>> * Accumulo 1.8.0
>> * Created a table with 1M rows, each row with 10 columns using YCSB
>> (workloada)
>> * Split the table into 9 tablets
>> * Computed the set of all rows in the table
>>
>> For a number of iterations:
>> * Shuffle this set of rows
>> * Choose the first N rows
>> * Construct an equivalent set of Ranges from the set of Rows, choosing a
>> random column (0-9)
>> * Partition the N rows into X collections
>> * Submit X tasks to query one partition of the N rows (to a thread pool with
>> X fixed threads)
>>
>> I have two implementations of these tasks. One, where all ranges in a
>> partition are executed via one BatchScanner. A second where each range is
>> executed in serial using a Scanner. The numbers speak for themselves.
>>
>> ** BatchScanners **
>> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
>> rows
>> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
>> calculated: 3000 ranges found
>> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries executed
>> in 40178 ms
>> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries executed
>> in 42296 ms
>> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries executed
>> in 46094 ms
>> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries executed
>> in 47704 ms
>> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries executed
>> in 49221 ms
>>
>> ** Scanners **
>> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
>> rows
>> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
>> calculated: 3000 ranges found
>> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries executed
>> in 2833 ms
>> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries executed
>> in 2536 ms
>> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries executed
>> in 2150 ms
>> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries executed
>> in 2061 ms
>> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
>> range partitions using a pool of 6 threads
>> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries executed
>> in 2140 ms
>>
>> Query code is available https://github.com/joshelser/accumulo-range-binning
>>
>>
>> Sven Hodapp wrote:
>>>
>>> Hi Keith,
>>>
>>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
>>> amazing differences.
>>> Maybe it's a problem with the table structure? For example it may happen
>>> that one row id (e.g. a sentence) has several thousand column families. Can
>>> this affect the seek performance?
>>>
>>> So for my initial example it has about 3000 row ids to seek, which will
>>> return about 500k entries. If I filter for specific column families (e.g. a
>>> document without annotations) it will return about 5k entries, but the seek
>>> time will only be halved.
>>> Are there too many column families to seek quickly?
>>>
>>> Thanks!
>>>
>>> Regards,
>>> Sven
>>>
>>

Re: Accumulo Seek performance

Posted by Keith Turner <ke...@deenlo.com>.
Josh helped me get up and running w/ YCSB and I am seeing very
different results. I am going to make a pull req to Josh's GH repo
to add a Readme w/ what I learned from Josh in IRC.

The link below is the Accumulo config I used for running a local 1.8.0 instance.

https://gist.github.com/keith-turner/4678a0aac2a2a0e240ea5d73285743ab

I created splits user1~ user2~ user3~ user4~ user5~ user6~ user7~
user8~ user9~ and then compacted the table.
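
A minimal sketch of those two steps through the table operations API, assuming
a Connector named conn and YCSB's default table name "usertable" (the shell's
addsplits and compact commands do the same):

    import java.util.TreeSet
    import org.apache.hadoop.io.Text

    // nine split points, user1~ through user9~, then a full compaction
    val splits = new TreeSet[Text]()
    (1 to 9).foreach(i => splits.add(new Text(s"user$i~")))
    conn.tableOperations().addSplits("usertable", splits)
    // compact(table, startRow, endRow, flush, wait)
    conn.tableOperations().compact("usertable", null, null, true, true)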

Below is the performance I saw with a single batch scanner (configured
1 partition).  The batch scanner has 10 threads.

2016-09-12 12:36:41,079 [client.ClientConfiguration] WARN : Found no
client.conf in default paths. Using default client configuration
values.
2016-09-12 12:36:41,428 [joshelser.YcsbBatchScanner] INFO : Connected
to Accumulo
2016-09-12 12:36:41,429 [joshelser.YcsbBatchScanner] INFO : Computing ranges
2016-09-12 12:36:48,059 [joshelser.YcsbBatchScanner] INFO : Calculated
all rows: Found 1000000 rows
2016-09-12 12:36:48,096 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
2016-09-12 12:36:48,116 [joshelser.YcsbBatchScanner] INFO : All ranges
calculated: 3000 ranges found
2016-09-12 12:36:48,118 [joshelser.YcsbBatchScanner] INFO : Executing
1 range partitions using a pool of 1 threads
2016-09-12 12:36:49,372 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1252 ms
2016-09-12 12:36:49,372 [joshelser.YcsbBatchScanner] INFO : Executing
1 range partitions using a pool of 1 threads
2016-09-12 12:36:50,561 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1188 ms
2016-09-12 12:36:50,561 [joshelser.YcsbBatchScanner] INFO : Executing
1 range partitions using a pool of 1 threads
2016-09-12 12:36:51,741 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1179 ms
2016-09-12 12:36:51,741 [joshelser.YcsbBatchScanner] INFO : Executing
1 range partitions using a pool of 1 threads
2016-09-12 12:36:52,974 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1233 ms
2016-09-12 12:36:52,974 [joshelser.YcsbBatchScanner] INFO : Executing
1 range partitions using a pool of 1 threads
2016-09-12 12:36:54,146 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1171 ms

Below is the performance I saw with 6 batch scanners. Each batch
scanner has 10 threads.

2016-09-12 13:58:21,061 [client.ClientConfiguration] WARN : Found no
client.conf in default paths. Using default client configuration
values.
2016-09-12 13:58:21,380 [joshelser.YcsbBatchScanner] INFO : Connected
to Accumulo
2016-09-12 13:58:21,381 [joshelser.YcsbBatchScanner] INFO : Computing ranges
2016-09-12 13:58:28,571 [joshelser.YcsbBatchScanner] INFO : Calculated
all rows: Found 1000000 rows
2016-09-12 13:58:28,606 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
2016-09-12 13:58:28,632 [joshelser.YcsbBatchScanner] INFO : All ranges
calculated: 3000 ranges found
2016-09-12 13:58:28,634 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 13:58:30,273 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1637 ms
2016-09-12 13:58:30,273 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 13:58:31,883 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1609 ms
2016-09-12 13:58:31,883 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 13:58:33,422 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1539 ms
2016-09-12 13:58:33,422 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 13:58:34,994 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1571 ms
2016-09-12 13:58:34,994 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 13:58:36,512 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 1517 ms

Below is the performance I saw with 6 threads each using a scanner.

2016-09-12 14:01:14,972 [client.ClientConfiguration] WARN : Found no
client.conf in default paths. Using default client configuration
values.
2016-09-12 14:01:15,287 [joshelser.YcsbBatchScanner] INFO : Connected
to Accumulo
2016-09-12 14:01:15,288 [joshelser.YcsbBatchScanner] INFO : Computing ranges
2016-09-12 14:01:22,309 [joshelser.YcsbBatchScanner] INFO : Calculated
all rows: Found 1000000 rows
2016-09-12 14:01:22,352 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
2016-09-12 14:01:22,373 [joshelser.YcsbBatchScanner] INFO : All ranges
calculated: 3000 ranges found
2016-09-12 14:01:22,376 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 14:01:25,696 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 3318 ms
2016-09-12 14:01:25,696 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 14:01:29,001 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 3305 ms
2016-09-12 14:01:29,001 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 14:01:31,824 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2822 ms
2016-09-12 14:01:31,824 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 14:01:34,207 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2383 ms
2016-09-12 14:01:34,207 [joshelser.YcsbBatchScanner] INFO : Executing
6 range partitions using a pool of 6 threads
2016-09-12 14:01:36,548 [joshelser.YcsbBatchScanner] INFO : Queries
executed in 2340 ms

On Sat, Sep 10, 2016 at 6:01 PM, Josh Elser <jo...@gmail.com> wrote:
> Sven, et al:
>
> So, it would appear that I have been able to reproduce this one (better late
> than never, I guess...). tl;dr Serially using Scanners to do point lookups
> instead of a BatchScanner is ~20x faster. This sounds like a pretty serious
> performance issue to me.
>
> Here's a general outline for what I did.
>
> * Accumulo 1.8.0
> * Created a table with 1M rows, each row with 10 columns using YCSB
> (workloada)
> * Split the table into 9 tablets
> * Computed the set of all rows in the table
>
> For a number of iterations:
> * Shuffle this set of rows
> * Choose the first N rows
> * Construct an equivalent set of Ranges from the set of Rows, choosing a
> random column (0-9)
> * Partition the N rows into X collections
> * Submit X tasks to query one partition of the N rows (to a thread pool with
> X fixed threads)
>
> I have two implementations of these tasks. One, where all ranges in a
> partition are executed via one BatchScanner. A second where each range is
> executed in serial using a Scanner. The numbers speak for themselves.
>
> ** BatchScanners **
> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries executed
> in 40178 ms
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries executed
> in 42296 ms
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries executed
> in 46094 ms
> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries executed
> in 47704 ms
> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries executed
> in 49221 ms
>
> ** Scanners **
> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries executed
> in 2833 ms
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries executed
> in 2536 ms
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries executed
> in 2150 ms
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries executed
> in 2061 ms
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries executed
> in 2140 ms
>
> Query code is available https://github.com/joshelser/accumulo-range-binning
>
>
> Sven Hodapp wrote:
>>
>> Hi Keith,
>>
>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
>> amazing differences.
>> Maybe it's a problem with the table structure? For example it may happen
>> that one row id (e.g. a sentence) has several thousand column families. Can
>> this affect the seek performance?
>>
>> So for my initial example it has about 3000 row ids to seek, which will
>> return about 500k entries. If I filter for specific column families (e.g. a
>> document without annotations) it will return about 5k entries, but the seek
>> time will only be halved.
>> Are there too many column families to seek quickly?
>>
>> Thanks!
>>
>> Regards,
>> Sven
>>
>

Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
I don't have enough context to say definitively, but I'd assume earlier 
versions too.

Dan Blum wrote:
> Is this a problem specific to 1.8.0, or is it likely to affect earlier versions?
>
> -----Original Message-----
> From: Josh Elser [mailto:josh.elser@gmail.com]
> Sent: Saturday, September 10, 2016 6:01 PM
> To: user@accumulo.apache.org
> Subject: Re: Accumulo Seek performance
>
> Sven, et al:
>
> So, it would appear that I have been able to reproduce this one (better
> late than never, I guess...). tl;dr Serially using Scanners to do point
> lookups instead of a BatchScanner is ~20x faster. This sounds like a
> pretty serious performance issue to me.
>
> Here's a general outline for what I did.
>
> * Accumulo 1.8.0
> * Created a table with 1M rows, each row with 10 columns using YCSB
> (workloada)
> * Split the table into 9 tablets
> * Computed the set of all rows in the table
>
> For a number of iterations:
> * Shuffle this set of rows
> * Choose the first N rows
> * Construct an equivalent set of Ranges from the set of Rows, choosing a
> random column (0-9)
> * Partition the N rows into X collections
> * Submit X tasks to query one partition of the N rows (to a thread pool
> with X fixed threads)
>
> I have two implementations of these tasks. One, where all ranges in a
> partition are executed via one BatchScanner. A second where each range is
> executed in serial using a Scanner. The numbers speak for themselves.
>
> ** BatchScanners **
> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 40178 ms
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 42296 ms
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 46094 ms
> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 47704 ms
> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 49221 ms
>
> ** Scanners **
> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2833 ms
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2536 ms
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2150 ms
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2061 ms
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2140 ms
>
> Query code is available https://github.com/joshelser/accumulo-range-binning
>
> Sven Hodapp wrote:
>> Hi Keith,
>>
>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no amazing differences.
>> Maybe it's a problem with the table structure? For example it may happen that one row id (e.g. a sentence) has several thousand column families. Can this affect the seek performance?
>>
>> So for my initial example it has about 3000 row ids to seek, which will return about 500k entries. If I filter for specific column families (e.g. a document without annotations) it will return about 5k entries, but the seek time will only be halved.
>> Are there too many column families to seek quickly?
>>
>> Thanks!
>>
>> Regards,
>> Sven
>>
>

RE: Accumulo Seek performance

Posted by Dan Blum <db...@bbn.com>.
Is this a problem specific to 1.8.0, or is it likely to affect earlier versions?

-----Original Message-----
From: Josh Elser [mailto:josh.elser@gmail.com] 
Sent: Saturday, September 10, 2016 6:01 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance

Sven, et al:

So, it would appear that I have been able to reproduce this one (better 
late than never, I guess...). tl;dr Serially using Scanners to do point 
lookups instead of a BatchScanner is ~20x faster. This sounds like a 
pretty serious performance issue to me.

Here's a general outline for what I did.

* Accumulo 1.8.0
* Created a table with 1M rows, each row with 10 columns using YCSB 
(workloada)
* Split the table into 9 tablets
* Computed the set of all rows in the table

For a number of iterations:
* Shuffle this set of rows
* Choose the first N rows
* Construct an equivalent set of Ranges from the set of Rows, choosing a 
random column (0-9)
* Partition the N rows into X collections
* Submit X tasks to query one partition of the N rows (to a thread pool 
with X fixed threads)

I have two implementations of these tasks. One, where all ranges in a 
partition are executed via one BatchScanner. A second where each range is 
executed in serial using a Scanner. The numbers speak for themselves.
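
A condensed sketch of the two task flavors (names here are illustrative; the
actual code is in the repo linked below):

    import scala.collection.JavaConverters._
    import org.apache.accumulo.core.client.Connector
    import org.apache.accumulo.core.data.Range
    import org.apache.accumulo.core.security.Authorizations

    // flavor 1: one BatchScanner services a whole partition of Ranges
    def batchScanPartition(conn: Connector, table: String, auths: Authorizations,
                           partition: java.util.List[Range]): Long = {
      val bs = conn.createBatchScanner(table, auths, 10)
      bs.setRanges(partition)
      val count = bs.asScala.size.toLong // drain all results
      bs.close()
      count
    }

    // flavor 2: each Range in the partition gets its own serial Scanner
    def scanPartition(conn: Connector, table: String, auths: Authorizations,
                      partition: java.util.List[Range]): Long =
      partition.asScala.map { r =>
        val s = conn.createScanner(table, auths)
        s.setRange(r)
        s.asScala.size.toLong
      }.sum

Each partition function is submitted to the fixed pool of X threads and the
whole batch is timed.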

** BatchScanners **
2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all 
rows
2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges 
calculated: 3000 ranges found
2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 40178 ms
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 42296 ms
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 46094 ms
2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 47704 ms
2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 49221 ms

** Scanners **
2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all 
rows
2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges 
calculated: 3000 ranges found
2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2833 ms
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2536 ms
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2150 ms
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2061 ms
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2140 ms

Query code is available https://github.com/joshelser/accumulo-range-binning

Sven Hodapp wrote:
> Hi Keith,
>
> I've tried it with 1, 2 or 10 threads. Unfortunately there were no significant differences.
> Maybe it's a problem with the table structure? For example, it may happen that one row id (e.g. a sentence) has several thousand column families. Can this affect the seek performance?
>
> So for my initial example it has about 3000 row ids to seek, which will return about 500k entries. If I filter for specific column families (e.g. a document without annotations) it will return about 5k entries, but the seek time will only be halved.
> Are there too many column families to seek quickly?
>
> Thanks!
>
> Regards,
> Sven
>
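
(For the filtering Sven describes above: restricting which column families come back happens server-side via fetchColumnFamily on a Scanner or BatchScanner. A minimal sketch, with an illustrative family name:)

    import org.apache.hadoop.io.Text
    bs.fetchColumnFamily(new Text("text")) // "bs" is any ScannerBase; "text" is a made-up family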


Re: Accumulo Seek performance

Posted by Keith Turner <ke...@deenlo.com>.
On Mon, Sep 12, 2016 at 5:50 PM, Adam J. Shook <ad...@gmail.com> wrote:
> As an aside, this is actually pretty relevant to the work I've been doing
> for Presto/Accumulo integration.  It isn't uncommon to have around a million
> exact Ranges (that is, Ranges with a single row ID)  spread across the five
> Presto worker nodes we use for scanning Accumulo.  Right now, these ranges
> get packed into PrestoSplits, 10k ranges per split (an arbitrary number I
> chose), and each split is run in parallel (depending on the overall number
> of splits, they may be queued for execution).
>
> I'm curious to see the query impact of changing it to use a fixed thread
> pool of Scanners over the current BatchScanner implementation.  Maybe I'll
> play around with it sometime soon.

I added a readme to Josh's GH repo w/ the info I learned from Josh on
IRC. So this should make it quicker for others to experiment.

>
> --Adam

Re: Accumulo Seek performance

Posted by "Adam J. Shook" <ad...@gmail.com>.
As an aside, this is actually pretty relevant to the work I've been doing
for Presto/Accumulo integration.  It isn't uncommon to have around a
million exact Ranges (that is, Ranges with a single row ID)  spread across
the five Presto worker nodes we use for scanning Accumulo.  Right now,
these ranges get packed into PrestoSplits, 10k ranges per split (an
arbitrary number I chose), and each split is run in parallel (depending on
the overall number of splits, they may be queued for execution).

I'm curious to see the query impact of changing it to use a fixed thread
pool of Scanners over the current BatchScanner implementation.  Maybe I'll
play around with it sometime soon.

--Adam

On Mon, Sep 12, 2016 at 2:47 PM, Dan Blum <db...@bbn.com> wrote:

> I think the 450 ranges returned a total of about 7.5M entries, but the
> ranges were in fact quite small relative to the size of the table.

RE: Accumulo Seek performance

Posted by Dan Blum <db...@bbn.com>.
I think the 450 ranges returned a total of about 7.5M entries, but the ranges were in fact quite small relative to the size of the table.

-----Original Message-----
From: Josh Elser [mailto:josh.elser@gmail.com] 
Sent: Monday, September 12, 2016 2:43 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance

What does a "large scan" mean here, Dan?

Sven's original problem statement was running many small/pointed Ranges 
(e.g. point lookups). My observation was that BatchScanners were slower 
than running each in a Scanner when using multiple BS's concurrently.


Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
What does a "large scan" mean here, Dan?

Sven's original problem statement was running many small/pointed Ranges 
(e.g. point lookups). My observation was that BatchScanners were slower 
than running each in a Scanner when using multiple BS's concurrently.

Dan Blum wrote:
> I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using Scanners was much slower than using a BatchScanner with 11 threads, by about a 5:1 ratio. There were 450 ranges.

RE: Accumulo Seek performance

Posted by Dan Blum <db...@bbn.com>.
I tested a large scan on a 1.6.2 cluster with 11 tablet servers - using Scanners was much slower than using a BatchScanner with 11 threads, by about a 5:1 ratio. There were 450 ranges.


Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
I had increased the readahead thread pool to 32 (from 16). I had also
increased the minimum thread pool size from 20 to 40. I had 10 tablets
with the data block cache turned on (probably only 256M tho).
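
(Those knobs correspond to tserver properties along these lines; the property names are from the 1.x documentation and the table name is assumed, so verify against your version:)

    config -s tserver.readahead.concurrent.max=32
    config -s tserver.server.threads.minimum=40
    config -s tserver.cache.data.size=256M
    config -t ycsb -s table.cache.block.enable=true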

Each tablet had a single file (manually compacted). Did not observe 
cache rates.

I've been working through this with Keith on IRC this morning too. Found
that a single BatchScanner (one partition) is faster than the Scanner.
Two partitions and things started to slow down.

Two interesting points to still pursue, IMO:

1. I saw that the tserver-side logging for MultiScanSess was near 
identical to the BatchScanner timings
2. The minimum server threads did not seem to be taking effect. Despite 
having the value set to 64, I only saw a few ClientPool threads in a 
jstack after running the test.

Adam Fuchs wrote:
> Sorry, Monday morning poor reading skills, I guess. :)
>
> So, 3000 ranges in 40 seconds with the BatchScanner. In my past
> experience HDFS seeks tend to take something like 10-100ms, and I would
> expect that time to dominate here. With 60 client threads your
> bottleneck should be the readahead pool, which I believe defaults to 16
> threads. If you get perfect index caching then you should be seeing
> something like 3000/16*50ms = 9,375ms. That's in the right ballpark, but
> it assumes no data cache hits. Do you have any idea of how many files
> you had per tablet after the ingest? Do you know what your cache hit
> rate was?
>
> Adam
>
>
> On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser <josh.elser@gmail.com
> <ma...@gmail.com>> wrote:
>
>     5 iterations, figured that would be apparent from the log messages :)
>
>     The code is already posted in my original message.
>
>     Adam Fuchs wrote:
>
>         Josh,
>
>         Two questions:
>
>         1. How many iterations did you do? I would like to see an absolute
>         number of lookups per second to compare against other observations.
>
>         2. Can you post your code somewhere so I can run it?
>
>         Thanks,
>         Adam
>
>
>         On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser
>         <josh.elser@gmail.com <ma...@gmail.com>
>         <mailto:josh.elser@gmail.com <ma...@gmail.com>>> wrote:
>
>              Sven, et al:
>
>              So, it would appear that I have been able to reproduce this
>              one (better late than never, I guess...). tl;dr Serially
>              using Scanners to do point lookups instead of a BatchScanner
>              is ~20x faster. This sounds like a pretty serious
>              performance issue to me.
>
>              Here's a general outline for what I did.
>
>              * Accumulo 1.8.0
>              * Created a table with 1M rows, each row with 10 columns
>                using YCSB (workloada)
>              * Split the table into 9 tablets
>              * Computed the set of all rows in the table
>
>              For a number of iterations:
>              * Shuffle this set of rows
>              * Choose the first N rows
>              * Construct an equivalent set of Ranges from the set of
>                Rows, choosing a random column (0-9)
>              * Partition the N rows into X collections
>              * Submit X tasks to query one partition of the N rows (to a
>                thread pool with X fixed threads)
>
>              I have two implementations of these tasks. One, where all
>              ranges in a partition are executed via one BatchScanner. A
>              second where each range is executed in serial using a
>              Scanner. The numbers speak for themselves.
>
>              ** BatchScanners **
>              2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
>              2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
>              2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>              2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries executed in 40178 ms
>              2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>              2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries executed in 42296 ms
>              2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>              2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries executed in 46094 ms
>              2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>              2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries executed in 47704 ms
>              2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>              2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries executed in 49221 ms
>
>              ** Scanners **
>              2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all rows
>              2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges calculated: 3000 ranges found
>              2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>              2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2833 ms
>              2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>              2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2536 ms
>              2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>              2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2150 ms
>              2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>              2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2061 ms
>              2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6 range partitions using a pool of 6 threads
>              2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries executed in 2140 ms
>
>              Query code is available
>              https://github.com/joshelser/accumulo-range-binning
>
>
>              Sven Hodapp wrote:
>
>                  Hi Keith,
>
>                  I've tried it with 1, 2 or 10 threads. Unfortunately
>                  there were no significant differences.
>                  Maybe it's a problem with the table structure? For
>                  example it may happen that one row id (e.g. a sentence)
>                  has several thousand column families. Can this affect
>                  the seek performance?
>
>                  So my initial example has about 3000 row ids to seek,
>                  which will return about 500k entries. If I filter for
>                  specific column families (e.g. a document without
>                  annotations) it will return about 5k entries, but the
>                  seek time will only be halved.
>                  Are there too many column families to seek quickly?
>
>                  Thanks!
>
>                  Regards,
>                  Sven
>
>
>

Re: Accumulo Seek performance

Posted by Adam Fuchs <af...@apache.org>.
Sorry, Monday morning poor reading skills, I guess. :)

So, 3000 ranges in 40 seconds with the BatchScanner. In my past experience
HDFS seeks tend to take something like 10-100ms, and I would expect that
time to dominate here. With 60 client threads your bottleneck should be the
readahead pool, which I believe defaults to 16 threads. If you get perfect
index caching then you should be seeing something like 3000/16*50ms =
9,375ms. That's in the right ballpark, but it assumes no data cache hits.
Do you have any idea of how many files you had per tablet after the ingest?
Do you know what your cache hit rate was?
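
The arithmetic behind that estimate can be written out as a quick sanity
check (a rough model only; the readahead pool size and per-seek cost are
the assumed values from this thread, not measurements):

    // Back-of-envelope BatchScanner latency when every range triggers an
    // uncached seek. All figures here are assumptions, not measurements.
    object SeekEstimate {
      def main(args: Array[String]): Unit = {
        val ranges           = 3000   // point lookups submitted
        val readaheadThreads = 16     // assumed tserver readahead pool size
        val seekMillis       = 50.0   // assumed cost of one uncached seek
        val estimateMs = ranges.toDouble / readaheadThreads * seekMillis
        println(f"estimated wall time: $estimateMs%.0f ms") // ~9375 ms
      }
    }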

Adam


On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser <jo...@gmail.com> wrote:

> 5 iterations, figured that would be apparent from the log messages :)
>
> The code is already posted in my original message.
>
> Adam Fuchs wrote:
>
>> Josh,
>>
>> Two questions:
>>
>> 1. How many iterations did you do? I would like to see an absolute
>> number of lookups per second to compare against other observations.
>>
>> 2. Can you post your code somewhere so I can run it?
>>
>> Thanks,
>> Adam
>>
>>
>> On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <josh.elser@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>     Sven, et al:
>>
>>     So, it would appear that I have been able to reproduce this one
>>     (better late than never, I guess...). tl;dr Serially using Scanners
>>     to do point lookups instead of a BatchScanner is ~20x faster. This
>>     sounds like a pretty serious performance issue to me.
>>
>>     Here's a general outline for what I did.
>>
>>     * Accumulo 1.8.0
>>     * Created a table with 1M rows, each row with 10 columns using YCSB
>>     (workloada)
>>     * Split the table into 9 tablets
>>     * Computed the set of all rows in the table
>>
>>     For a number of iterations:
>>     * Shuffle this set of rows
>>     * Choose the first N rows
>>     * Construct an equivalent set of Ranges from the set of Rows,
>>     choosing a random column (0-9)
>>     * Partition the N rows into X collections
>>     * Submit X tasks to query one partition of the N rows (to a thread
>>     pool with X fixed threads)
>>
>>     I have two implementations of these tasks. One, where all ranges in
>>     a partition are executed via one BatchScanner. A second where each
>>     range is executed in serial using a Scanner. The numbers speak for
>>     themselves.
>>
>>     ** BatchScanners **
>>     2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled
>>     all rows
>>     2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All
>>     ranges calculated: 3000 ranges found
>>     2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 40178 ms
>>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 42296 ms
>>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 46094 ms
>>     2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 47704 ms
>>     2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 49221 ms
>>
>>     ** Scanners **
>>     2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled
>>     all rows
>>     2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All
>>     ranges calculated: 3000 ranges found
>>     2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 2833 ms
>>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 2536 ms
>>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 2150 ms
>>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 2061 ms
>>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 2140 ms
>>
>>     Query code is available
>>     https://github.com/joshelser/accumulo-range-binning
>>     <https://github.com/joshelser/accumulo-range-binning>
>>
>>
>>     Sven Hodapp wrote:
>>
>>         Hi Keith,
>>
>>         I've tried it with 1, 2 or 10 threads. Unfortunately there were
>>         no significant differences.
>>         Maybe it's a problem with the table structure? For example it
>>         may happen that one row id (e.g. a sentence) has several
>>         thousand column families. Can this affect the seek performance?
>>
>>         So my initial example has about 3000 row ids to seek,
>>         which will return about 500k entries. If I filter for specific
>>         column families (e.g. a document without annotations) it will
>>         return about 5k entries, but the seek time will only be halved.
>>         Are there too many column families to seek quickly?
>>
>>         Thanks!
>>
>>         Regards,
>>         Sven
>>
>>
>>

Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
5 iterations, figured that would be apparent from the log messages :)

The code is already posted in my original message.

Adam Fuchs wrote:
> Josh,
>
> Two questions:
>
> 1. How many iterations did you do? I would like to see an absolute
> number of lookups per second to compare against other observations.
>
> 2. Can you post your code somewhere so I can run it?
>
> Thanks,
> Adam
>
>
> On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <josh.elser@gmail.com
> <ma...@gmail.com>> wrote:
>
>     Sven, et al:
>
>     So, it would appear that I have been able to reproduce this one
>     (better late than never, I guess...). tl;dr Serially using Scanners
>     to do point lookups instead of a BatchScanner is ~20x faster. This
>     sounds like a pretty serious performance issue to me.
>
>     Here's a general outline for what I did.
>
>     * Accumulo 1.8.0
>     * Created a table with 1M rows, each row with 10 columns using YCSB
>     (workloada)
>     * Split the table into 9 tablets
>     * Computed the set of all rows in the table
>
>     For a number of iterations:
>     * Shuffle this set of rows
>     * Choose the first N rows
>     * Construct an equivalent set of Ranges from the set of Rows,
>     choosing a random column (0-9)
>     * Partition the N rows into X collections
>     * Submit X tasks to query one partition of the N rows (to a thread
>     pool with X fixed threads)
>
>     I have two implementations of these tasks. One, where all ranges in
>     a partition are executed via one BatchScanner. A second where each
>     range is executed in serial using a Scanner. The numbers speak for
>     themselves.
>
>     ** BatchScanners **
>     2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled
>     all rows
>     2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All
>     ranges calculated: 3000 ranges found
>     2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 40178 ms
>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 42296 ms
>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 46094 ms
>     2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 47704 ms
>     2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 49221 ms
>
>     ** Scanners **
>     2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled
>     all rows
>     2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All
>     ranges calculated: 3000 ranges found
>     2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2833 ms
>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2536 ms
>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2150 ms
>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2061 ms
>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2140 ms
>
>     Query code is available
>     https://github.com/joshelser/accumulo-range-binning
>     <https://github.com/joshelser/accumulo-range-binning>
>
>
>     Sven Hodapp wrote:
>
>         Hi Keith,
>
>         I've tried it with 1, 2 or 10 threads. Unfortunately there were
>         no significant differences.
>         Maybe it's a problem with the table structure? For example it
>         may happen that one row id (e.g. a sentence) has several
>         thousand column families. Can this affect the seek performance?
>
>         So my initial example has about 3000 row ids to seek,
>         which will return about 500k entries. If I filter for specific
>         column families (e.g. a document without annotations) it will
>         return about 5k entries, but the seek time will only be halved.
>         Are there too many column families to seek quickly?
>
>         Thanks!
>
>         Regards,
>         Sven
>
>

Re: Accumulo Seek performance

Posted by Adam Fuchs <af...@apache.org>.
Josh,

Two questions:

1. How many iterations did you do? I would like to see an absolute number
of lookups per second to compare against other observations.

2. Can you post your code somewhere so I can run it?

Thanks,
Adam


On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <jo...@gmail.com> wrote:

> Sven, et al:
>
> So, it would appear that I have been able to reproduce this one (better
> late than never, I guess...). tl;dr Serially using Scanners to do point
> lookups instead of a BatchScanner is ~20x faster. This sounds like a pretty
> serious performance issue to me.
>
> Here's a general outline for what I did.
>
> * Accumulo 1.8.0
> * Created a table with 1M rows, each row with 10 columns using YCSB
> (workloada)
> * Split the table into 9 tablets
> * Computed the set of all rows in the table
>
> For a number of iterations:
> * Shuffle this set of rows
> * Choose the first N rows
> * Construct an equivalent set of Ranges from the set of Rows, choosing a
> random column (0-9)
> * Partition the N rows into X collections
> * Submit X tasks to query one partition of the N rows (to a thread pool
> with X fixed threads)
>
> I have two implementations of these tasks. One, where all ranges in a
> partition are executed via one BatchScanner. A second where each range is
> executed in serial using a Scanner. The numbers speak for themselves.
>
> ** BatchScanners **
> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 40178 ms
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 42296 ms
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 46094 ms
> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 47704 ms
> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 49221 ms
>
> ** Scanners **
> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2833 ms
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2536 ms
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2150 ms
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2061 ms
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2140 ms
>
> Query code is available https://github.com/joshelser/accumulo-range-binning
>
>
> Sven Hodapp wrote:
>
>> Hi Keith,
>>
>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
>> significant differences.
>> Maybe it's a problem with the table structure? For example it may happen
>> that one row id (e.g. a sentence) has several thousand column families. Can
>> this affect the seek performance?
>>
>> So my initial example has about 3000 row ids to seek, which will
>> return about 500k entries. If I filter for specific column families (e.g. a
>> document without annotations) it will return about 5k entries, but the seek
>> time will only be halved.
>> Are there too many column families to seek quickly?
>>
>> Thanks!
>>
>> Regards,
>> Sven
>>
>>

Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
Keith Turner wrote:
> On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser<jo...@gmail.com>  wrote:
>> >  Good call. I kind of forgot about BatchScanner threads and trying to factor
>> >  those in:). I guess doing one thread in the BatchScanners would be more
>> >  accurate.
>> >
>> >  Although, I only had one TServer, so I don't*think*  there would be any
>> >  difference. I don't believe we have concurrent requests from one
>> >  BatchScanner to one TServer.
>
> There are, if the batch scanner sees it has extra threads and there
> are multiple tablets on the tserver, then it will submit concurrent
> requests to a single tserver.
>

Hrm, curious then. I don't think I was oversaturating the physical 
resources on my laptop, but who knows. I'll see if I can revisit this 
experiment tonight to see if it changes anything. It was very easy to 
get YCSB data up and ingested and then run this tool.

Re: Accumulo Seek performance

Posted by Keith Turner <ke...@deenlo.com>.
On Mon, Sep 12, 2016 at 10:58 AM, Josh Elser <jo...@gmail.com> wrote:
> Good call. I kind of forgot about BatchScanner threads and trying to factor
> those in :). I guess doing one thread in the BatchScanners would be more
> accurate.
>
> Although, I only had one TServer, so I don't *think* there would be any
> difference. I don't believe we have concurrent requests from one
> BatchScanner to one TServer.

There are, if the batch scanner sees it has extra threads and there
are multiple tablets on the tserver, then it will submit concurrent
requests to a single tserver.
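
For experiments where that matters, intra-tserver concurrency can be taken
out of the picture by giving the BatchScanner a single worker thread (a
minimal sketch; the connector, table name, and auths are assumed to exist):

    import org.apache.accumulo.core.client.{BatchScanner, Connector}
    import org.apache.accumulo.core.security.Authorizations

    // One worker thread total: at most one outstanding request at a time.
    def singleThreadedBatchScanner(conn: Connector, table: String,
                                   auths: Authorizations): BatchScanner =
      conn.createBatchScanner(table, auths, 1)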

>
> Dylan Hutchison wrote:
>>
>> Nice setup Josh.  Thank you for putting together the tests.  A few
>> questions:
>>
>> The serial scanner implementation uses 6 threads: one for each thread in
>> the thread pool.
>> The batch scanner implementation uses 60 threads: 10 for each thread in
>> the thread pool, since the BatchScanner was configured with 10 threads
>> and there are 10 (9?) tablets.
>>
>> Isn't 60 threads of communication naturally inefficient?  I wonder if we
>> would see the same performance if we set each BatchScanner to use 1 or 2
>> threads.
>>
>> Maybe this would motivate a /MultiTableBatchScanner/, which maintains a
>> fixed number of threads across any number of concurrent scans, possibly
>> to the same table.
>>
>>
>> On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <josh.elser@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>     Sven, et al:
>>
>>     So, it would appear that I have been able to reproduce this one
>>     (better late than never, I guess...). tl;dr Serially using Scanners
>>     to do point lookups instead of a BatchScanner is ~20x faster. This
>>     sounds like a pretty serious performance issue to me.
>>
>>     Here's a general outline for what I did.
>>
>>     * Accumulo 1.8.0
>>     * Created a table with 1M rows, each row with 10 columns using YCSB
>>     (workloada)
>>     * Split the table into 9 tablets
>>     * Computed the set of all rows in the table
>>
>>     For a number of iterations:
>>     * Shuffle this set of rows
>>     * Choose the first N rows
>>     * Construct an equivalent set of Ranges from the set of Rows,
>>     choosing a random column (0-9)
>>     * Partition the N rows into X collections
>>     * Submit X tasks to query one partition of the N rows (to a thread
>>     pool with X fixed threads)
>>
>>     I have two implementations of these tasks. One, where all ranges in
>>     a partition are executed via one BatchScanner. A second where each
>>     range is executed in serial using a Scanner. The numbers speak for
>>     themselves.
>>
>>     ** BatchScanners **
>>     2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled
>>     all rows
>>     2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All
>>     ranges calculated: 3000 ranges found
>>     2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 40178 ms
>>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 42296 ms
>>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 46094 ms
>>     2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 47704 ms
>>     2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 49221 ms
>>
>>     ** Scanners **
>>     2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled
>>     all rows
>>     2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All
>>     ranges calculated: 3000 ranges found
>>     2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 2833 ms
>>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 2536 ms
>>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 2150 ms
>>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 2061 ms
>>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO :
>>     Executing 6 range partitions using a pool of 6 threads
>>     2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
>>     executed in 2140 ms
>>
>>     Query code is available
>>     https://github.com/joshelser/accumulo-range-binning
>>     <https://github.com/joshelser/accumulo-range-binning>
>>
>>
>>     Sven Hodapp wrote:
>>
>>         Hi Keith,
>>
>>         I've tried it with 1, 2 or 10 threads. Unfortunately there were
>>         no significant differences.
>>         Maybe it's a problem with the table structure? For example it
>>         may happen that one row id (e.g. a sentence) has several
>>         thousand column families. Can this affect the seek performance?
>>
>>         So my initial example has about 3000 row ids to seek,
>>         which will return about 500k entries. If I filter for specific
>>         column families (e.g. a document without annotations) it will
>>         return about 5k entries, but the seek time will only be halved.
>>         Are there too many column families to seek quickly?
>>
>>         Thanks!
>>
>>         Regards,
>>         Sven
>>
>>
>

RE: Accumulo Seek performance

Posted by Dan Blum <db...@bbn.com>.
I am not sure - my recollection is that the 1.6.x code capped the number of threads requested at 1 per tablet (covered by the requested ranges), not 1 per tablet server.

-----Original Message-----
From: Josh Elser [mailto:josh.elser@gmail.com] 
Sent: Monday, September 12, 2016 10:58 AM
To: user@accumulo.apache.org
Subject: Re: Accumulo Seek performance

Good call. I kind of forgot about BatchScanner threads and trying to 
factor those in :). I guess doing one thread in the BatchScanners would 
be more accurate.

Although, I only had one TServer, so I don't *think* there would be any 
difference. I don't believe we have concurrent requests from one 
BatchScanner to one TServer.

Dylan Hutchison wrote:
> Nice setup Josh.  Thank you for putting together the tests.  A few
> questions:
>
> The serial scanner implementation uses 6 threads: one for each thread in
> the thread pool.
> The batch scanner implementation uses 60 threads: 10 for each thread in
> the thread pool, since the BatchScanner was configured with 10 threads
> and there are 10 (9?) tablets.
>
> Isn't 60 threads of communication naturally inefficient?  I wonder if we
> would see the same performance if we set each BatchScanner to use 1 or 2
> threads.
>
> Maybe this would motivate a /MultiTableBatchScanner/, which maintains a
> fixed number of threads across any number of concurrent scans, possibly
> to the same table.
>
>
> On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <josh.elser@gmail.com
> <ma...@gmail.com>> wrote:
>
>     Sven, et al:
>
>     So, it would appear that I have been able to reproduce this one
>     (better late than never, I guess...). tl;dr Serially using Scanners
>     to do point lookups instead of a BatchScanner is ~20x faster. This
>     sounds like a pretty serious performance issue to me.
>
>     Here's a general outline for what I did.
>
>     * Accumulo 1.8.0
>     * Created a table with 1M rows, each row with 10 columns using YCSB
>     (workloada)
>     * Split the table into 9 tablets
>     * Computed the set of all rows in the table
>
>     For a number of iterations:
>     * Shuffle this set of rows
>     * Choose the first N rows
>     * Construct an equivalent set of Ranges from the set of Rows,
>     choosing a random column (0-9)
>     * Partition the N rows into X collections
>     * Submit X tasks to query one partition of the N rows (to a thread
>     pool with X fixed threads)
>
>     I have two implementations of these tasks. One, where all ranges in
>     a partition are executed via one BatchScanner. A second where each
>     range is executed in serial using a Scanner. The numbers speak for
>     themselves.
>
>     ** BatchScanners **
>     2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled
>     all rows
>     2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All
>     ranges calculated: 3000 ranges found
>     2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 40178 ms
>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 42296 ms
>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 46094 ms
>     2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 47704 ms
>     2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 49221 ms
>
>     ** Scanners **
>     2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled
>     all rows
>     2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All
>     ranges calculated: 3000 ranges found
>     2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2833 ms
>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2536 ms
>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2150 ms
>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2061 ms
>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2140 ms
>
>     Query code is available
>     https://github.com/joshelser/accumulo-range-binning
>     <https://github.com/joshelser/accumulo-range-binning>
>
>
>     Sven Hodapp wrote:
>
>         Hi Keith,
>
>         I've tried it with 1, 2 or 10 threads. Unfortunately there were
>         no significant differences.
>         Maybe it's a problem with the table structure? For example it
>         may happen that one row id (e.g. a sentence) has several
>         thousand column families. Can this affect the seek performance?
>
>         So my initial example has about 3000 row ids to seek,
>         which will return about 500k entries. If I filter for specific
>         column families (e.g. a document without annotations) it will
>         return about 5k entries, but the seek time will only be halved.
>         Are there too many column families to seek quickly?
>
>         Thanks!
>
>         Regards,
>         Sven
>
>


Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
Good call. I kind of forgot about BatchScanner threads and trying to 
factor those in :). I guess doing one thread in the BatchScanners would 
be more accurate.

Although, I only had one TServer, so I don't *think* there would be any 
difference. I don't believe we have concurrent requests from one 
BatchScanner to one TServer.

Dylan Hutchison wrote:
> Nice setup Josh.  Thank you for putting together the tests.  A few
> questions:
>
> The serial scanner implementation uses 6 threads: one for each thread in
> the thread pool.
> The batch scanner implementation uses 60 threads: 10 for each thread in
> the thread pool, since the BatchScanner was configured with 10 threads
> and there are 10 (9?) tablets.
>
> Isn't 60 threads of communication naturally inefficient?  I wonder if we
> would see the same performance if we set each BatchScanner to use 1 or 2
> threads.
>
> Maybe this would motivate a /MultiTableBatchScanner/, which maintains a
> fixed number of threads across any number of concurrent scans, possibly
> to the same table.
>
>
> On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <josh.elser@gmail.com
> <ma...@gmail.com>> wrote:
>
>     Sven, et al:
>
>     So, it would appear that I have been able to reproduce this one
>     (better late than never, I guess...). tl;dr Serially using Scanners
>     to do point lookups instead of a BatchScanner is ~20x faster. This
>     sounds like a pretty serious performance issue to me.
>
>     Here's a general outline for what I did.
>
>     * Accumulo 1.8.0
>     * Created a table with 1M rows, each row with 10 columns using YCSB
>     (workloada)
>     * Split the table into 9 tablets
>     * Computed the set of all rows in the table
>
>     For a number of iterations:
>     * Shuffle this set of rows
>     * Choose the first N rows
>     * Construct an equivalent set of Ranges from the set of Rows,
>     choosing a random column (0-9)
>     * Partition the N rows into X collections
>     * Submit X tasks to query one partition of the N rows (to a thread
>     pool with X fixed threads)
>
>     I have two implementations of these tasks. One, where all ranges in
>     a partition are executed via one BatchScanner. A second where each
>     range is executed in serial using a Scanner. The numbers speak for
>     themselves.
>
>     ** BatchScanners **
>     2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled
>     all rows
>     2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All
>     ranges calculated: 3000 ranges found
>     2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 40178 ms
>     2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 42296 ms
>     2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 46094 ms
>     2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 47704 ms
>     2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 49221 ms
>
>     ** Scanners **
>     2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled
>     all rows
>     2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All
>     ranges calculated: 3000 ranges found
>     2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2833 ms
>     2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2536 ms
>     2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2150 ms
>     2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2061 ms
>     2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO :
>     Executing 6 range partitions using a pool of 6 threads
>     2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
>     executed in 2140 ms
>
>     Query code is available
>     https://github.com/joshelser/accumulo-range-binning
>     <https://github.com/joshelser/accumulo-range-binning>
>
>
>     Sven Hodapp wrote:
>
>         Hi Keith,
>
>         I've tried it with 1, 2 or 10 threads. Unfortunately there were
>         no significant differences.
>         Maybe it's a problem with the table structure? For example it
>         may happen that one row id (e.g. a sentence) has several
>         thousand column families. Can this affect the seek performance?
>
>         So my initial example has about 3000 row ids to seek,
>         which will return about 500k entries. If I filter for specific
>         column families (e.g. a document without annotations) it will
>         return about 5k entries, but the seek time will only be halved.
>         Are there too many column families to seek quickly?
>
>         Thanks!
>
>         Regards,
>         Sven
>
>

Re: Accumulo Seek performance

Posted by Dylan Hutchison <dh...@cs.washington.edu>.
Nice setup Josh.  Thank you for putting together the tests.  A few
questions:

The serial scanner implementation uses 6 threads: one for each thread in
the thread pool.
The batch scanner implementation uses 60 threads: 10 for each thread in the
thread pool, since the BatchScanner was configured with 10 threads and
there are 10 (9?) tablets.

Isn't 60 threads of communication naturally inefficient?  I wonder if we
would see the same performance if we set each BatchScanner to use 1 or 2
threads.

Maybe this would motivate a *MultiTableBatchScanner*, which maintains a
fixed number of threads across any number of concurrent scans, possibly to
the same table.
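
Something with that shape can be approximated client-side today (a
hypothetical sketch, not an existing Accumulo API; the method name,
timeout, and pool handling are invented for illustration):

    import java.util.concurrent.Executors
    import org.apache.accumulo.core.client.Connector
    import org.apache.accumulo.core.data.{Key, Range, Value}
    import org.apache.accumulo.core.security.Authorizations
    import scala.collection.JavaConverters._
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration._

    // Runs every Range on a plain Scanner while sharing one fixed pool,
    // regardless of how many tablets or concurrent lookups there are.
    def lookupAll(conn: Connector, table: String, auths: Authorizations,
                  ranges: Seq[Range], threads: Int): Seq[(Key, Value)] = {
      val pool = Executors.newFixedThreadPool(threads)
      implicit val ec: ExecutionContext =
        ExecutionContext.fromExecutorService(pool)
      val futures = ranges.map { r =>
        Future {
          val scanner = conn.createScanner(table, auths)
          scanner.setRange(r)
          scanner.asScala.map(e => (e.getKey, e.getValue)).toList
        }
      }
      try futures.flatMap(f => Await.result(f, 5.minutes))
      finally pool.shutdown()
    }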


On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser <jo...@gmail.com> wrote:

> Sven, et al:
>
> So, it would appear that I have been able to reproduce this one (better
> late than never, I guess...). tl;dr Serially using Scanners to do point
> lookups instead of a BatchScanner is ~20x faster. This sounds like a pretty
> serious performance issue to me.
>
> Here's a general outline for what I did.
>
> * Accumulo 1.8.0
> * Created a table with 1M rows, each row with 10 columns using YCSB
> (workloada)
> * Split the table into 9 tablets
> * Computed the set of all rows in the table
>
> For a number of iterations:
> * Shuffle this set of rows
> * Choose the first N rows
> * Construct an equivalent set of Ranges from the set of Rows, choosing a
> random column (0-9)
> * Partition the N rows into X collections
> * Submit X tasks to query one partition of the N rows (to a thread pool
> with X fixed threads)
>
> I have two implementations of these tasks. One, where all ranges in a
> partition are executed via one BatchScanner. A second where each range is
> executed in serial using a Scanner. The numbers speak for themselves.
>
> ** BatchScanners **
> 2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 40178 ms
> 2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 42296 ms
> 2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 46094 ms
> 2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 47704 ms
> 2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 49221 ms
>
> ** Scanners **
> 2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all
> rows
> 2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges
> calculated: 3000 ranges found
> 2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2833 ms
> 2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2536 ms
> 2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2150 ms
> 2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2061 ms
> 2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6
> range partitions using a pool of 6 threads
> 2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries
> executed in 2140 ms
>
> Query code is available https://github.com/joshelser/accumulo-range-binning
>
>
> Sven Hodapp wrote:
>
>> Hi Keith,
>>
>> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
>> significant differences.
>> Maybe it's a problem with the table structure? For example it may happen
>> that one row id (e.g. a sentence) has several thousand column families. Can
>> this affect the seek performance?
>>
>> So my initial example has about 3000 row ids to seek, which will
>> return about 500k entries. If I filter for specific column families (e.g. a
>> document without annotations) it will return about 5k entries, but the seek
>> time will only be halved.
>> Are there too many column families to seek quickly?
>>
>> Thanks!
>>
>> Regards,
>> Sven
>>
>>

Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
Sven, et al:

So, it would appear that I have been able to reproduce this one (better 
late than never, I guess...). tl;dr Serially using Scanners to do point 
lookups instead of a BatchScanner is ~20x faster. This sounds like a 
pretty serious performance issue to me.

Here's a general outline for what I did.

* Accumulo 1.8.0
* Created a table with 1M rows, each row with 10 columns using YCSB 
(workloada)
* Split the table into 9 tablets
* Computed the set of all rows in the table

For a number of iterations:
* Shuffle this set of rows
* Choose the first N rows
* Construct an equivalent set of Ranges from the set of Rows, choosing a 
random column (0-9)
* Partition the N rows into X collections
* Submit X tasks to query one partition of the N rows (to a thread pool 
with X fixed threads)

I have two implementations of these tasks. One, where all ranges in a 
partition are executed via one BatchScanner. A second where each range is 
executed in serial using a Scanner. The numbers speak for themselves.

** BatchScanners **
2016-09-10 17:51:38,811 [joshelser.YcsbBatchScanner] INFO : Shuffled all 
rows
2016-09-10 17:51:38,843 [joshelser.YcsbBatchScanner] INFO : All ranges 
calculated: 3000 ranges found
2016-09-10 17:51:38,846 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 40178 ms
2016-09-10 17:52:19,025 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 42296 ms
2016-09-10 17:53:01,321 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:53:47,414 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 46094 ms
2016-09-10 17:53:47,415 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:54:35,118 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 47704 ms
2016-09-10 17:54:35,119 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:55:24,339 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 49221 ms

** Scanners **
2016-09-10 17:57:23,867 [joshelser.YcsbBatchScanner] INFO : Shuffled all 
rows
2016-09-10 17:57:23,898 [joshelser.YcsbBatchScanner] INFO : All ranges 
calculated: 3000 ranges found
2016-09-10 17:57:23,903 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2833 ms
2016-09-10 17:57:26,738 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2536 ms
2016-09-10 17:57:29,275 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2150 ms
2016-09-10 17:57:31,425 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2061 ms
2016-09-10 17:57:33,487 [joshelser.YcsbBatchScanner] INFO : Executing 6 
range partitions using a pool of 6 threads
2016-09-10 17:57:35,628 [joshelser.YcsbBatchScanner] INFO : Queries 
executed in 2140 ms

Query code is available https://github.com/joshelser/accumulo-range-binning
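
In outline, the two task styles look roughly like this (a sketch under
assumed names, not the code from the repository above; each task just
drains its scanners and returns an entry count):

    import java.util.concurrent.Callable
    import org.apache.accumulo.core.client.Connector
    import org.apache.accumulo.core.data.Range
    import org.apache.accumulo.core.security.Authorizations
    import scala.collection.JavaConverters._

    // Style 1: all ranges in the partition go to a single BatchScanner.
    class BatchScanTask(conn: Connector, table: String, auths: Authorizations,
                        partition: Seq[Range]) extends Callable[Long] {
      override def call(): Long = {
        val bs = conn.createBatchScanner(table, auths, 10)
        try {
          bs.setRanges(partition.asJava)
          bs.asScala.size.toLong // drain the scanner, counting entries
        } finally bs.close()
      }
    }

    // Style 2: one Scanner per range, executed serially within the task.
    class SerialScanTask(conn: Connector, table: String, auths: Authorizations,
                         partition: Seq[Range]) extends Callable[Long] {
      override def call(): Long = {
        partition.map { r =>
          val s = conn.createScanner(table, auths)
          s.setRange(r)
          s.asScala.size.toLong // drain this one range before the next
        }.sum
      }
    }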

Sven Hodapp wrote:
> Hi Keith,
>
> I've tried it with 1, 2 or 10 threads. Unfortunately there were no significant differences.
> Maybe it's a problem with the table structure? For example it may happen that one row id (e.g. a sentence) has several thousand column families. Can this affect the seek performance?
>
> So my initial example has about 3000 row ids to seek, which will return about 500k entries. If I filter for specific column families (e.g. a document without annotations) it will return about 5k entries, but the seek time will only be halved.
> Are there too many column families to seek quickly?
>
> Thanks!
>
> Regards,
> Sven
>

Re: Accumulo Seek performance

Posted by Dylan Hutchison <dh...@cs.washington.edu>.
Hi Sven,
  Without locality groups, your filtered scan may be reading nearly the
entire table.  The process looks like this:

   1. For each tablet that has one of the 3000 row ids (assuming sufficient
   tablet servers),
      1. *Seek* to the first column family of the first row id out of the
      target row ids in the tablet.
      2. *Read* that row+cf prefix.
      3. Find the next cf (out of the 5k cf's in your filter).
         1. *Read* the next entry and see if it is in the cf.  If it is,
         then you are lucky and go back to step 2.  Repeat this process for 10
         entries (a heuristic number).
         2. If none of the next 10 entries match the cf (or the next row in
         your target ranges), then *seek* to the next target row+cf, as in
         step 1.
      4. Continue until all target row ids in the tablet are scanned.

In the worst case, if the 5k target cf's in your filter are uniformly
spread out among the 500k total cf's (and each row has all 500k cf's, which
is probably not the case in your document-sentence table), then Accumulo
performs 5k seeks per row id, or 5k * 3k rows = 15M seeks, to be divided
among your tablet servers (assuming no significant skew).  You can adjust
this for the actual distribution of column families in your table to get an
idea of how many seeks Accumulo performs.

(On the other hand in the best case, if the 5k target cf's are all clumped
together, then Accumulo need only seek 3k times, or less if some row ids
are consecutive.)
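
Plugging the thread's numbers into that model gives a feel for the scale
(illustrative arithmetic only, built on the assumptions stated above):

    // Seek-count estimate for the model above. The 10-entry readahead
    // heuristic is the one described in step 3; all figures are assumed.
    object SeekCount {
      def main(args: Array[String]): Unit = {
        val rows      = 3000L    // target row ids
        val targetCfs = 5000L    // column families the filter keeps
        val totalCfs  = 500000L  // column families per row (worst case)
        val readahead = 10L      // entries read before falling back to a seek
        val gap = totalCfs / targetCfs // ~100 entries between target cf's
        val seeksPerRow = if (gap > readahead) targetCfs else 1L
        println(s"worst case: ${rows * seeksPerRow} seeks") // 15,000,000
        println(s"best case:  $rows seeks")                 // clumped cf's
      }
    }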

Perhaps others could extend the model by estimating a "seconds/seek"
figure?  If we can estimate this, it would tell you whether your
BatchScanner times are in the right ballpark.  Or it might be sufficient to
compare the number of seeks.
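
On the locality-groups point above, configuring one is a one-time table
operation (a minimal sketch; the group name and column family here are
hypothetical, and the connector is assumed to exist):

    import org.apache.accumulo.core.client.Connector
    import org.apache.hadoop.io.Text
    import scala.collection.JavaConverters._

    // Put the frequently-filtered family in its own locality group so
    // filtered scans can skip unrelated data blocks entirely.
    def groupTextFamily(conn: Connector, table: String): Unit = {
      val groups = Map("docs" -> Set(new Text("text")).asJava).asJava
      conn.tableOperations().setLocalityGroups(table, groups)
      // Compact so existing files are rewritten with the new grouping.
      conn.tableOperations().compact(table, null, null, true, false)
    }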

Cheers, Dylan

On Wed, Aug 31, 2016 at 12:06 AM, Sven Hodapp <
sven.hodapp@scai.fraunhofer.de> wrote:

> Hi Keith,
>
> I've tried it with 1, 2 or 10 threads. Unfortunately there were no
> significant differences.
> Maybe it's a problem with the table structure? For example it may happen
> that one row id (e.g. a sentence) has several thousand column families. Can
> this affect the seek performance?
>
> So my initial example has about 3000 row ids to seek, which will
> return about 500k entries. If I filter for specific column families (e.g. a
> document without annotations) it will return about 5k entries, but the seek
> time will only be halved.
> Are there too many column families to seek quickly?
>
> Thanks!
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hodapp@scai.fraunhofer.de
> www.scai.fraunhofer.de
>
> ----- Original Message -----
> > From: "Keith Turner" <ke...@deenlo.com>
> > To: "user" <us...@accumulo.apache.org>
> > Sent: Monday, 29 August 2016 22:37:32
> > Subject: Re: Accumulo Seek performance
>
> > On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp
> > <sv...@scai.fraunhofer.de> wrote:
> >> Hi there,
> >>
> >> currently we're experimenting with a two node Accumulo cluster (two
> tablet
> >> servers) setup for document storage.
> >> These documents are decomposed down to the sentence level.
> >>
> >> Now I'm using a BatchScanner to assemble the full document like this:
> >>
> >>     val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) //
> ARTIFACTS table
> >>     currently hosts ~30GB data, ~200M entries on ~45 tablets
> >>     bscan.setRanges(ranges)  // there are like 3000 Range.exact's in
> the ranges-list
> >>       for (entry <- bscan.asScala) yield {
> >>         val key = entry.getKey()
> >>         val value = entry.getValue()
> >>         // etc.
> >>       }
> >>
> >> For larger full documents (e.g. 3000 exact ranges), this operation will
> take
> >> about 12 seconds.
> >> But shorter documents are assembled blazing fast...
> >>
> >> Is that too much for a BatchScanner / am I misusing the BatchScanner?
> >> Is that a normal time for such a (seek) operation?
> >> Can I do something to get better seek performance?
> >
> > How many threads did you configure the batch scanner with and did you
> > try varying this?
> >
> >>
> >> Note: I have already enabled bloom filtering on that table.
> >>
> >> Thank you for any advice!
> >>
> >> Regards,
> >> Sven
> >>
> >> --
> >> Sven Hodapp, M.Sc.,
> >> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> >> Department of Bioinformatics
> >> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> >> sven.hodapp@scai.fraunhofer.de
> > > www.scai.fraunhofer.de
>

Re: Accumulo Seek performance

Posted by Sven Hodapp <sv...@scai.fraunhofer.de>.
Hi Keith,

I've tried it with 1, 2 or 10 threads. Unfortunately there were no significant differences.
Maybe it's a problem with the table structure? For example it may happen that one row id (e.g. a sentence) has several thousand column families. Can this affect the seek performance?

So my initial example has about 3000 row ids to seek, which will return about 500k entries. If I filter for specific column families (e.g. a document without annotations) it will return about 5k entries, but the seek time will only be halved.
Are there too many column families to seek quickly?

Thanks!

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hodapp@scai.fraunhofer.de
www.scai.fraunhofer.de

----- Original Message -----
> From: "Keith Turner" <ke...@deenlo.com>
> To: "user" <us...@accumulo.apache.org>
> Sent: Monday, 29 August 2016 22:37:32
> Subject: Re: Accumulo Seek performance

> On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp
> <sv...@scai.fraunhofer.de> wrote:
>> Hi there,
>>
>> currently we're experimenting with a two node Accumulo cluster (two tablet
>> servers) setup for document storage.
>> These documents are decomposed down to the sentence level.
>>
>> Now I'm using a BatchScanner to assemble the full document like this:
>>
>>     val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // ARTIFACTS table
>>     currently hosts ~30GB data, ~200M entries on ~45 tablets
>>     bscan.setRanges(ranges)  // there are like 3000 Range.exact's in the ranges-list
>>       for (entry <- bscan.asScala) yield {
>>         val key = entry.getKey()
>>         val value = entry.getValue()
>>         // etc.
>>       }
>>
>> For larger full documents (e.g. 3000 exact ranges), this operation will take
>> about 12 seconds.
>> But shorter documents are assembled blazing fast...
>>
>> Is that too much for a BatchScanner / am I misusing the BatchScanner?
>> Is that a normal time for such a (seek) operation?
>> Can I do something to get better seek performance?
> 
> How many threads did you configure the batch scanner with and did you
> try varying this?
> 
>>
>> Note: I have already enabled bloom filtering on that table.
>>
>> Thank you for any advice!
>>
>> Regards,
>> Sven
>>
>> --
>> Sven Hodapp, M.Sc.,
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>> Department of Bioinformatics
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>> sven.hodapp@scai.fraunhofer.de
> > www.scai.fraunhofer.de

Re: Accumulo Seek performance

Posted by Keith Turner <ke...@deenlo.com>.
On Wed, Aug 24, 2016 at 9:22 AM, Sven Hodapp
<sv...@scai.fraunhofer.de> wrote:
> Hi there,
>
> currently we're experimenting with a two node Accumulo cluster (two tablet servers) setup for document storage.
> These documents are decomposed down to the sentence level.
>
> Now I'm using a BatchScanner to assemble the full document like this:
>
>     val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // ARTIFACTS table currently hosts ~30GB data, ~200M entries on ~45 tablets
>     bscan.setRanges(ranges)  // there are like 3000 Range.exact's in the ranges-list
>       for (entry <- bscan.asScala) yield {
>         val key = entry.getKey()
>         val value = entry.getValue()
>         // etc.
>       }
>
> For larger full documents (e.g. 3000 exact ranges), this operation will take about 12 seconds.
> But shorter documents are assembled blazing fast...
>
> Is that too much for a BatchScanner / am I misusing the BatchScanner?
> Is that a normal time for such a (seek) operation?
> Can I do something to get a better seek performance?

How many threads did you configure the batch scanner with and did you
try varying this?
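
Something like this might be worth timing (a rough sketch reusing your variable names; the thread counts are just values to try):

    import scala.collection.JavaConverters._

    // time the same lookup with different batch scanner thread counts
    for (numThreads <- Seq(1, 5, 10, 20)) {
      val bscan = instance.createBatchScanner(ARTIFACTS, auths, numThreads)
      bscan.setRanges(ranges)
      val start = System.nanoTime()
      bscan.asScala.foreach(_ => ())  // drain all results
      println(s"$numThreads threads: ${(System.nanoTime() - start) / 1e6} ms")
      bscan.close()
    }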

>
> Note: I have already enabled bloom filtering on that table.
>
> Thank you for any advice!
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hodapp@scai.fraunhofer.de
> www.scai.fraunhofer.de

Re: Accumulo Seek performance

Posted by Sven Hodapp <sv...@scai.fraunhofer.de>.
Hi Dave,

toList will exhaust the iterator, but all 6 iterators will be exhausted concurrently, each within its own Future (http://docs.scala-lang.org/overviews/core/futures.html).

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hodapp@scai.fraunhofer.de
www.scai.fraunhofer.de

----- Original Message -----
> From: dlmarion@comcast.net
> To: "user" <us...@accumulo.apache.org>
> Sent: Thursday, August 25, 2016 16:22:35
> Subject: Re: Accumulo Seek performance

> But does toList exhaust the first iterator() before going to the next?
> 
> - Dave
> 
> 
> ----- Original Message -----
> 
> From: "Sven Hodapp" <sv...@scai.fraunhofer.de>
> To: "user" <us...@accumulo.apache.org>
> Sent: Thursday, August 25, 2016 9:42:00 AM
> Subject: Re: Accumulo Seek performance
> 
> Hi dlmarion,
> 
> toList should also call iterator(), and that is done independently for each
> batch scanner iterator in the context of the Future.
> 
> Regards,
> Sven
> 
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hodapp@scai.fraunhofer.de
> www.scai.fraunhofer.de
> 
> ----- Original Message -----
>> From: dlmarion@comcast.net
>> To: "user" <us...@accumulo.apache.org>
>> Sent: Thursday, August 25, 2016 14:34:39
>> Subject: Re: Accumulo Seek performance
> 
>> Calling BatchScanner.iterator() is what starts the work on the server side. You
>> should do this first for all 6 batch scanners, then iterate over all of them in
>> parallel.
>> 
>> ----- Original Message -----
>> 
>> From: "Sven Hodapp" <sv...@scai.fraunhofer.de>
>> To: "user" <us...@accumulo.apache.org>
>> Sent: Thursday, August 25, 2016 4:53:41 AM
>> Subject: Re: Accumulo Seek performance
>> 
>> Hi,
>> 
>> I've changed the code a little bit, so that it uses a thread pool (via the
>> Future):
>> 
>> val ranges500 = ranges.asScala.grouped(500) // this means 6 BatchScanners will
>> be created
>> 
>> for (ranges <- ranges500) {
>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
>> bscan.setRanges(ranges.asJava)
>> Future {
>> time("mult-scanner") {
>> bscan.asScala.toList // toList forces the iteration of the iterator
>> }
>> }
>> }
>> 
>> Here are the results:
>> 
>> background log: info: mult-scanner time: 4807.289358 ms
>> background log: info: mult-scanner time: 4930.996522 ms
>> background log: info: mult-scanner time: 9510.010808 ms
>> background log: info: mult-scanner time: 11394.152391 ms
>> background log: info: mult-scanner time: 13297.247295 ms
>> background log: info: mult-scanner time: 14032.704837 ms
>> 
>> background log: info: single-scanner time: 15322.624393 ms
>> 
>> Every Future completes independently, but in return every batch scanner iterator
>> needs more time to complete. :(
>> Does this mean the batch scanners aren't really processed in parallel on the server
>> side?
>> Should I reconfigure something? Maybe the tablet servers haven't/can't allocate
>> enough threads or memory? (Each of the two nodes has 8 cores and 64GB memory
>> and storage with ~300MB/s...)
>> 
>> Regards,
>> Sven
>> 
>> --
>> Sven Hodapp, M.Sc.,
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>> Department of Bioinformatics
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>> sven.hodapp@scai.fraunhofer.de
>> www.scai.fraunhofer.de
>> 
>> ----- Original Message -----
>>> From: "Josh Elser" <jo...@gmail.com>
>>> To: "user" <us...@accumulo.apache.org>
>>> Sent: Wednesday, August 24, 2016 18:36:42
>>> Subject: Re: Accumulo Seek performance
>> 
>>> Ahh duh. Bad advice from me in the first place :)
>>> 
>>> Throw 'em in a threadpool locally.
>>> 
>>> dlmarion@comcast.net wrote:
>>>> Doesn't this use the 6 batch scanners serially?
>>>> 
>>>> ------------------------------------------------------------------------
>>>> *From: *"Sven Hodapp" <sv...@scai.fraunhofer.de>
>>>> *To: *"user" <us...@accumulo.apache.org>
>>>> *Sent: *Wednesday, August 24, 2016 11:56:14 AM
>>>> *Subject: *Re: Accumulo Seek performance
>>>> 
>>>> Hi Josh,
>>>> 
>>>> thanks for your reply!
>>>> 
>>>> I've tested your suggestion with an implementation like this:
>>>> 
>>>> val ranges500 = ranges.asScala.grouped(500) // this means 6
>>>> BatchScanners will be created
>>>> 
>>>> time("mult-scanner") {
>>>> for (ranges <- ranges500) {
>>>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
>>>> bscan.setRanges(ranges.asJava)
>>>> for (entry <- bscan.asScala) yield {
>>>> entry.getKey()
>>>> }
>>>> }
>>>> }
>>>> 
>>>> And the result is a bit disappointing:
>>>> 
>>>> background log: info: mult-scanner time: 18064.969281 ms
>>>> background log: info: single-scanner time: 6527.482383 ms
>>>> 
>>>> Am I doing something wrong here?
>>>> 
>>>> 
>>>> Regards,
>>>> Sven
>>>> 
>>>> --
>>>> Sven Hodapp, M.Sc.,
>>>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>>>> Department of Bioinformatics
>>>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>>>> sven.hodapp@scai.fraunhofer.de
>>>> www.scai.fraunhofer.de
>>>> 
>>>> ----- Original Message -----
>>>> > From: "Josh Elser" <jo...@gmail.com>
>>>> > To: "user" <us...@accumulo.apache.org>
>>>> > Sent: Wednesday, August 24, 2016 16:33:37
>>>> > Subject: Re: Accumulo Seek performance
>>>> 
>>>> > This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710
>>>> > 
>>>> > I don't feel like 3000 ranges is too many, but this isn't quantitative.
>>>> > 
>>>> > IIRC, the BatchScanner will take each Range you provide, bin each Range
>>>> > to the TabletServer(s) currently hosting the corresponding data, clip
>>>> > (truncate) each Range to match the Tablet boundaries, and then does an
>>>> > RPC to each TabletServer with just the Ranges hosted there.
>>>> > 
>>>> > Inside the TabletServer, it will then have many Ranges, binned by Tablet
>>>> > (KeyExtent, to be precise). This will spawn a
>>>> > org.apache.accumulo.tserver.scan.LookupTask which will start collecting
>>>> > results to send back to the client.
>>>> >
>>>> > The caveat here is that those ranges are processed serially on a
>>>> > TabletServer. Maybe you're swamping one TabletServer with lots of
>>>> > Ranges that it could be processing in parallel.
>>>> > 
>>>> > Could you experiment with using multiple BatchScanners and something
>>>> > like Guava's Iterables.concat to make it appear like one Iterator?
>>>> > 
>>>> > I'm curious if we should put an optimization into the BatchScanner
>>>> > itself to limit the number of ranges we send in one RPC to a
>>>> > TabletServer (e.g. one BatchScanner might open multiple
>>>> > MultiScanSessions to a TabletServer).
>>>> > 
>>>> > Sven Hodapp wrote:
>>>> >> Hi there,
>>>> >> 
>>>> >> currently we're experimenting with a two node Accumulo cluster (two
>>>> tablet
>>>> >> servers) setup for document storage.
>>>> >> These documents are decomposed down to the sentence level.
>>>> >> 
>>>> >> Now I'm using a BatchScanner to assemble the full document like this:
>>>> >> 
>>>> >> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) //
>>>> ARTIFACTS table
>>>> >> currently hosts ~30GB data, ~200M entries on ~45 tablets
>>>> >> bscan.setRanges(ranges) // there are like 3000 Range.exact's in the
>>>> ranges-list
>>>> >> for (entry<- bscan.asScala) yield {
>>>> >> val key = entry.getKey()
>>>> >> val value = entry.getValue()
>>>> >> // etc.
>>>> >> }
>>>> >> 
>>>> >> For larger full documents (e.g. 3000 exact ranges), this operation
>>>> will take
>>>> >> about 12 seconds.
>>>> >> But shorter documents are assembled blazing fast...
>>>> >> 
>>>> >> Is that too much for a BatchScanner / am I misusing the BatchScanner?
>>>> >> Is that a normal time for such a (seek) operation?
>>>> >> Can I do something to get a better seek performance?
>>>> >> 
>>>> >> Note: I have already enabled bloom filtering on that table.
>>>> >> 
>>>> >> Thank you for any advice!
>>>> >> 
>>>> >> Regards,
>>>> >> Sven

Re: Accumulo Seek performance

Posted by dl...@comcast.net.
But does toList exhaust the first iterator() before going to the next? 

- Dave 


----- Original Message -----

From: "Sven Hodapp" <sv...@scai.fraunhofer.de> 
To: "user" <us...@accumulo.apache.org> 
Sent: Thursday, August 25, 2016 9:42:00 AM 
Subject: Re: Accumulo Seek performance 

Hi dlmarion, 

toList should also call iterator(), and that is done independently for each batch scanner iterator in the context of the Future.

Regards, 
Sven 

-- 
Sven Hodapp, M.Sc., 
Fraunhofer Institute for Algorithms and Scientific Computing SCAI, 
Department of Bioinformatics 
Schloss Birlinghoven, 53754 Sankt Augustin, Germany 
sven.hodapp@scai.fraunhofer.de 
www.scai.fraunhofer.de 

----- Original Message -----
> From: dlmarion@comcast.net
> To: "user" <us...@accumulo.apache.org>
> Sent: Thursday, August 25, 2016 14:34:39
> Subject: Re: Accumulo Seek performance

> Calling BatchScanner.iterator() is what starts the work on the server side. You 
> should do this first for all 6 batch scanners, then iterate over all of them in 
> parallel. 
> 
> ----- Original Message ----- 
> 
> From: "Sven Hodapp" <sv...@scai.fraunhofer.de> 
> To: "user" <us...@accumulo.apache.org> 
> Sent: Thursday, August 25, 2016 4:53:41 AM 
> Subject: Re: Accumulo Seek performance 
> 
> Hi, 
> 
> I've changed the code a little bit, so that it uses a thread pool (via the 
> Future): 
> 
> val ranges500 = ranges.asScala.grouped(500) // this means 6 BatchScanners will 
> be created 
> 
> for (ranges <- ranges500) { 
> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2) 
> bscan.setRanges(ranges.asJava) 
> Future { 
> time("mult-scanner") { 
> bscan.asScala.toList // toList forces the iteration of the iterator 
> } 
> } 
> } 
> 
> Here are the results: 
> 
> background log: info: mult-scanner time: 4807.289358 ms 
> background log: info: mult-scanner time: 4930.996522 ms 
> background log: info: mult-scanner time: 9510.010808 ms 
> background log: info: mult-scanner time: 11394.152391 ms 
> background log: info: mult-scanner time: 13297.247295 ms 
> background log: info: mult-scanner time: 14032.704837 ms 
> 
> background log: info: single-scanner time: 15322.624393 ms 
> 
> Every Future completes independently, but in return every batch scanner iterator
> needs more time to complete. :(
> Does this mean the batch scanners aren't really processed in parallel on the server
> side?
> Should I reconfigure something? Maybe the tablet servers haven't/can't allocate
> enough threads or memory? (Each of the two nodes has 8 cores and 64GB memory
> and storage with ~300MB/s...)
> 
> Regards, 
> Sven 
> 
> -- 
> Sven Hodapp, M.Sc., 
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI, 
> Department of Bioinformatics 
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany 
> sven.hodapp@scai.fraunhofer.de 
> www.scai.fraunhofer.de 
> 
> ----- Original Message -----
>> From: "Josh Elser" <jo...@gmail.com>
>> To: "user" <us...@accumulo.apache.org>
>> Sent: Wednesday, August 24, 2016 18:36:42
>> Subject: Re: Accumulo Seek performance
> 
>> Ahh duh. Bad advice from me in the first place :) 
>> 
>> Throw 'em in a threadpool locally. 
>> 
>> dlmarion@comcast.net wrote: 
>>> Doesn't this use the 6 batch scanners serially? 
>>> 
>>> ------------------------------------------------------------------------ 
>>> *From: *"Sven Hodapp" <sv...@scai.fraunhofer.de> 
>>> *To: *"user" <us...@accumulo.apache.org> 
>>> *Sent: *Wednesday, August 24, 2016 11:56:14 AM 
>>> *Subject: *Re: Accumulo Seek performance 
>>> 
>>> Hi Josh, 
>>> 
>>> thanks for your reply! 
>>> 
>>> I've tested your suggestion with an implementation like this:
>>> 
>>> val ranges500 = ranges.asScala.grouped(500) // this means 6 
>>> BatchScanners will be created 
>>> 
>>> time("mult-scanner") { 
>>> for (ranges <- ranges500) { 
>>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1) 
>>> bscan.setRanges(ranges.asJava) 
>>> for (entry <- bscan.asScala) yield { 
>>> entry.getKey() 
>>> } 
>>> } 
>>> } 
>>> 
>>> And the result is a bit disappointing: 
>>> 
>>> background log: info: mult-scanner time: 18064.969281 ms 
>>> background log: info: single-scanner time: 6527.482383 ms 
>>> 
>>> Am I doing something wrong here?
>>> 
>>> 
>>> Regards, 
>>> Sven 
>>> 
>>> -- 
>>> Sven Hodapp, M.Sc., 
>>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI, 
>>> Department of Bioinformatics 
>>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany 
>>> sven.hodapp@scai.fraunhofer.de 
>>> www.scai.fraunhofer.de 
>>> 
>>> ----- Original Message -----
>>> > From: "Josh Elser" <jo...@gmail.com>
>>> > To: "user" <us...@accumulo.apache.org>
>>> > Sent: Wednesday, August 24, 2016 16:33:37
>>> > Subject: Re: Accumulo Seek performance
>>> 
>>> > This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710 
>>> > 
>>> > I don't feel like 3000 ranges is too many, but this isn't quantitative. 
>>> > 
>>> > IIRC, the BatchScanner will take each Range you provide, bin each Range 
>>> > to the TabletServer(s) currently hosting the corresponding data, clip 
>>> > (truncate) each Range to match the Tablet boundaries, and then does an 
>>> > RPC to each TabletServer with just the Ranges hosted there. 
>>> > 
>>> > Inside the TabletServer, it will then have many Ranges, binned by Tablet 
>>> > (KeyExtent, to be precise). This will spawn a 
>>> > org.apache.accumulo.tserver.scan.LookupTask which will start collecting
>>> > results to send back to the client.
>>> >
>>> > The caveat here is that those ranges are processed serially on a
>>> > TabletServer. Maybe you're swamping one TabletServer with lots of
>>> > Ranges that it could be processing in parallel. 
>>> > 
>>> > Could you experiment with using multiple BatchScanners and something 
>>> > like Guava's Iterables.concat to make it appear like one Iterator? 
>>> > 
>>> > I'm curious if we should put an optimization into the BatchScanner 
>>> > itself to limit the number of ranges we send in one RPC to a 
>>> > TabletServer (e.g. one BatchScanner might open multiple 
>>> > MultiScanSessions to a TabletServer). 
>>> > 
>>> > Sven Hodapp wrote: 
>>> >> Hi there, 
>>> >> 
>>> >> currently we're experimenting with a two node Accumulo cluster (two 
>>> tablet 
>>> >> servers) setup for document storage. 
>>> >> These documents are decomposed down to the sentence level.
>>> >> 
>>> >> Now I'm using a BatchScanner to assemble the full document like this: 
>>> >> 
>>> >> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // 
>>> ARTIFACTS table 
>>> >> currently hosts ~30GB data, ~200M entries on ~45 tablets 
>>> >> bscan.setRanges(ranges) // there are like 3000 Range.exact's in the 
>>> ranges-list 
>>> >> for (entry<- bscan.asScala) yield { 
>>> >> val key = entry.getKey() 
>>> >> val value = entry.getValue() 
>>> >> // etc. 
>>> >> } 
>>> >> 
>>> >> For larger full documents (e.g. 3000 exact ranges), this operation 
>>> will take 
>>> >> about 12 seconds. 
>>> >> But shorter documents are assembled blazing fast... 
>>> >> 
>>> >> Is that too much for a BatchScanner / am I misusing the BatchScanner?
>>> >> Is that a normal time for such a (seek) operation? 
>>> >> Can I do something to get a better seek performance? 
>>> >> 
>>> >> Note: I have already enabled bloom filtering on that table. 
>>> >> 
>>> >> Thank you for any advice! 
>>> >> 
>>> >> Regards, 
>>> >> Sven


Re: Accumulo Seek performance

Posted by Sven Hodapp <sv...@scai.fraunhofer.de>.
Hi dlmarion,

toList should also call iterator(), and that is done independently for each batch scanner iterator in the context of the Future.

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hodapp@scai.fraunhofer.de
www.scai.fraunhofer.de

----- Original Message -----
> From: dlmarion@comcast.net
> To: "user" <us...@accumulo.apache.org>
> Sent: Thursday, August 25, 2016 14:34:39
> Subject: Re: Accumulo Seek performance

> Calling BatchScanner.iterator() is what starts the work on the server side. You
> should do this first for all 6 batch scanners, then iterate over all of them in
> parallel.
> 
> ----- Original Message -----
> 
> From: "Sven Hodapp" <sv...@scai.fraunhofer.de>
> To: "user" <us...@accumulo.apache.org>
> Sent: Thursday, August 25, 2016 4:53:41 AM
> Subject: Re: Accumulo Seek performance
> 
> Hi,
> 
> I've changed the code a little bit, so that it uses a thread pool (via the
> Future):
> 
> val ranges500 = ranges.asScala.grouped(500) // this means 6 BatchScanners will
> be created
> 
> for (ranges <- ranges500) {
> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
> bscan.setRanges(ranges.asJava)
> Future {
> time("mult-scanner") {
> bscan.asScala.toList // toList forces the iteration of the iterator
> }
> }
> }
> 
> Here are the results:
> 
> background log: info: mult-scanner time: 4807.289358 ms
> background log: info: mult-scanner time: 4930.996522 ms
> background log: info: mult-scanner time: 9510.010808 ms
> background log: info: mult-scanner time: 11394.152391 ms
> background log: info: mult-scanner time: 13297.247295 ms
> background log: info: mult-scanner time: 14032.704837 ms
> 
> background log: info: single-scanner time: 15322.624393 ms
> 
> Every Future completes independently, but in return every batch scanner iterator
> needs more time to complete. :(
> Does this mean the batch scanners aren't really processed in parallel on the server
> side?
> Should I reconfigure something? Maybe the tablet servers haven't/can't allocate
> enough threads or memory? (Each of the two nodes has 8 cores and 64GB memory
> and storage with ~300MB/s...)
> 
> Regards,
> Sven
> 
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hodapp@scai.fraunhofer.de
> www.scai.fraunhofer.de
> 
> ----- Original Message -----
>> From: "Josh Elser" <jo...@gmail.com>
>> To: "user" <us...@accumulo.apache.org>
>> Sent: Wednesday, August 24, 2016 18:36:42
>> Subject: Re: Accumulo Seek performance
> 
>> Ahh duh. Bad advice from me in the first place :)
>> 
>> Throw 'em in a threadpool locally.
>> 
>> dlmarion@comcast.net wrote:
>>> Doesn't this use the 6 batch scanners serially?
>>> 
>>> ------------------------------------------------------------------------
>>> *From: *"Sven Hodapp" <sv...@scai.fraunhofer.de>
>>> *To: *"user" <us...@accumulo.apache.org>
>>> *Sent: *Wednesday, August 24, 2016 11:56:14 AM
>>> *Subject: *Re: Accumulo Seek performance
>>> 
>>> Hi Josh,
>>> 
>>> thanks for your reply!
>>> 
>>> I've tested your suggestion with an implementation like this:
>>> 
>>> val ranges500 = ranges.asScala.grouped(500) // this means 6
>>> BatchScanners will be created
>>> 
>>> time("mult-scanner") {
>>> for (ranges <- ranges500) {
>>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
>>> bscan.setRanges(ranges.asJava)
>>> for (entry <- bscan.asScala) yield {
>>> entry.getKey()
>>> }
>>> }
>>> }
>>> 
>>> And the result is a bit disappointing:
>>> 
>>> background log: info: mult-scanner time: 18064.969281 ms
>>> background log: info: single-scanner time: 6527.482383 ms
>>> 
>>> Am I doing something wrong here?
>>> 
>>> 
>>> Regards,
>>> Sven
>>> 
>>> --
>>> Sven Hodapp, M.Sc.,
>>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>>> Department of Bioinformatics
>>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>>> sven.hodapp@scai.fraunhofer.de
>>> www.scai.fraunhofer.de
>>> 
>>> ----- Original Message -----
>>> > From: "Josh Elser" <jo...@gmail.com>
>>> > To: "user" <us...@accumulo.apache.org>
>>> > Sent: Wednesday, August 24, 2016 16:33:37
>>> > Subject: Re: Accumulo Seek performance
>>> 
>>> > This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710
>>> > 
>>> > I don't feel like 3000 ranges is too many, but this isn't quantitative.
>>> > 
>>> > IIRC, the BatchScanner will take each Range you provide, bin each Range
>>> > to the TabletServer(s) currently hosting the corresponding data, clip
>>> > (truncate) each Range to match the Tablet boundaries, and then does an
>>> > RPC to each TabletServer with just the Ranges hosted there.
>>> > 
>>> > Inside the TabletServer, it will then have many Ranges, binned by Tablet
>>> > (KeyExtent, to be precise). This will spawn a
>>> > org.apache.accumulo.tserver.scan.LookupTask which will start collecting
>>> > results to send back to the client.
>>> >
>>> > The caveat here is that those ranges are processed serially on a
>>> > TabletServer. Maybe you're swamping one TabletServer with lots of
>>> > Ranges that it could be processing in parallel.
>>> > 
>>> > Could you experiment with using multiple BatchScanners and something
>>> > like Guava's Iterables.concat to make it appear like one Iterator?
>>> > 
>>> > I'm curious if we should put an optimization into the BatchScanner
>>> > itself to limit the number of ranges we send in one RPC to a
>>> > TabletServer (e.g. one BatchScanner might open multiple
>>> > MultiScanSessions to a TabletServer).
>>> > 
>>> > Sven Hodapp wrote:
>>> >> Hi there,
>>> >> 
>>> >> currently we're experimenting with a two node Accumulo cluster (two
>>> tablet
>>> >> servers) setup for document storage.
>>> >> These documents are decomposed down to the sentence level.
>>> >> 
>>> >> Now I'm using a BatchScanner to assemble the full document like this:
>>> >> 
>>> >> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) //
>>> ARTIFACTS table
>>> >> currently hosts ~30GB data, ~200M entries on ~45 tablets
>>> >> bscan.setRanges(ranges) // there are like 3000 Range.exact's in the
>>> ranges-list
>>> >> for (entry<- bscan.asScala) yield {
>>> >> val key = entry.getKey()
>>> >> val value = entry.getValue()
>>> >> // etc.
>>> >> }
>>> >> 
>>> >> For larger full documents (e.g. 3000 exact ranges), this operation
>>> will take
>>> >> about 12 seconds.
>>> >> But shorter documents are assembled blazing fast...
>>> >> 
>>> >> Is that too much for a BatchScanner / am I misusing the BatchScanner?
>>> >> Is that a normal time for such a (seek) operation?
>>> >> Can I do something to get a better seek performance?
>>> >> 
>>> >> Note: I have already enabled bloom filtering on that table.
>>> >> 
>>> >> Thank you for any advice!
>>> >> 
>>> >> Regards,
>>> >> Sven

Re: Accumulo Seek performance

Posted by dl...@comcast.net.
Calling BatchScanner.iterator() is what starts the work on the server side. You should do this first for all 6 batch scanners, then iterate over all of them in parallel. 
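
Something like this (an untested sketch, reusing the ranges500 grouping from your code):

    import scala.collection.JavaConverters._

    // 1. create all the scanners and call iterator() on each one up front;
    //    that is what kicks off the work on the tablet servers
    val scanners = ranges500.map { group =>
      val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
      bscan.setRanges(group.asJava)
      bscan
    }.toList
    val iterators = scanners.map(_.iterator())
    // 2. only now drain them, in parallel
    val results = iterators.par.map(_.asScala.toList).toList
    scanners.foreach(_.close())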

----- Original Message -----

From: "Sven Hodapp" <sv...@scai.fraunhofer.de> 
To: "user" <us...@accumulo.apache.org> 
Sent: Thursday, August 25, 2016 4:53:41 AM 
Subject: Re: Accumulo Seek performance 

Hi, 

I've changed the code a little bit, so that it uses a thread pool (via the Future): 

    val ranges500 = ranges.asScala.grouped(500)  // this means 6 BatchScanners will be created

    for (ranges <- ranges500) {
      val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
      bscan.setRanges(ranges.asJava)
      Future {
        time("mult-scanner") {
          bscan.asScala.toList  // toList forces the iteration of the iterator
        }
      }
    }

Here are the results: 

background log: info: mult-scanner time: 4807.289358 ms 
background log: info: mult-scanner time: 4930.996522 ms 
background log: info: mult-scanner time: 9510.010808 ms 
background log: info: mult-scanner time: 11394.152391 ms 
background log: info: mult-scanner time: 13297.247295 ms 
background log: info: mult-scanner time: 14032.704837 ms 

background log: info: single-scanner time: 15322.624393 ms 

Every Future completes independently, but in return every batch scanner iterator needs more time to complete. :(
Does this mean the batch scanners aren't really processed in parallel on the server side?
Should I reconfigure something? Maybe the tablet servers haven't/can't allocate enough threads or memory? (Each of the two nodes has 8 cores and 64GB memory and storage with ~300MB/s...)

Regards, 
Sven 

-- 
Sven Hodapp, M.Sc., 
Fraunhofer Institute for Algorithms and Scientific Computing SCAI, 
Department of Bioinformatics 
Schloss Birlinghoven, 53754 Sankt Augustin, Germany 
sven.hodapp@scai.fraunhofer.de 
www.scai.fraunhofer.de 

----- Original Message -----
> From: "Josh Elser" <jo...@gmail.com>
> To: "user" <us...@accumulo.apache.org>
> Sent: Wednesday, August 24, 2016 18:36:42
> Subject: Re: Accumulo Seek performance

> Ahh duh. Bad advice from me in the first place :) 
> 
> Throw 'em in a threadpool locally. 
> 
> dlmarion@comcast.net wrote: 
>> Doesn't this use the 6 batch scanners serially? 
>> 
>> ------------------------------------------------------------------------ 
>> *From: *"Sven Hodapp" <sv...@scai.fraunhofer.de> 
>> *To: *"user" <us...@accumulo.apache.org> 
>> *Sent: *Wednesday, August 24, 2016 11:56:14 AM 
>> *Subject: *Re: Accumulo Seek performance 
>> 
>> Hi Josh, 
>> 
>> thanks for your reply! 
>> 
>> I've tested your suggestion with an implementation like this:
>> 
>> val ranges500 = ranges.asScala.grouped(500) // this means 6 
>> BatchScanners will be created 
>> 
>> time("mult-scanner") { 
>> for (ranges <- ranges500) { 
>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1) 
>> bscan.setRanges(ranges.asJava) 
>> for (entry <- bscan.asScala) yield { 
>> entry.getKey() 
>> } 
>> } 
>> } 
>> 
>> And the result is a bit disappointing: 
>> 
>> background log: info: mult-scanner time: 18064.969281 ms 
>> background log: info: single-scanner time: 6527.482383 ms 
>> 
>> Am I doing something wrong here?
>> 
>> 
>> Regards, 
>> Sven 
>> 
>> -- 
>> Sven Hodapp, M.Sc., 
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI, 
>> Department of Bioinformatics 
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany 
>> sven.hodapp@scai.fraunhofer.de 
>> www.scai.fraunhofer.de 
>> 
>> ----- Original Message -----
>> > From: "Josh Elser" <jo...@gmail.com>
>> > To: "user" <us...@accumulo.apache.org>
>> > Sent: Wednesday, August 24, 2016 16:33:37
>> > Subject: Re: Accumulo Seek performance
>> 
>> > This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710 
>> > 
>> > I don't feel like 3000 ranges is too many, but this isn't quantitative. 
>> > 
>> > IIRC, the BatchScanner will take each Range you provide, bin each Range 
>> > to the TabletServer(s) currently hosting the corresponding data, clip 
>> > (truncate) each Range to match the Tablet boundaries, and then does an 
>> > RPC to each TabletServer with just the Ranges hosted there. 
>> > 
>> > Inside the TabletServer, it will then have many Ranges, binned by Tablet 
>> > (KeyExtent, to be precise). This will spawn a 
>> > org.apache.accumulo.tserver.scan.LookupTask which will start collecting
>> > results to send back to the client.
>> >
>> > The caveat here is that those ranges are processed serially on a
>> > TabletServer. Maybe you're swamping one TabletServer with lots of
>> > Ranges that it could be processing in parallel. 
>> > 
>> > Could you experiment with using multiple BatchScanners and something 
>> > like Guava's Iterables.concat to make it appear like one Iterator? 
>> > 
>> > I'm curious if we should put an optimization into the BatchScanner 
>> > itself to limit the number of ranges we send in one RPC to a 
>> > TabletServer (e.g. one BatchScanner might open multiple 
>> > MultiScanSessions to a TabletServer). 
>> > 
>> > Sven Hodapp wrote: 
>> >> Hi there, 
>> >> 
>> >> currently we're experimenting with a two node Accumulo cluster (two 
>> tablet 
>> >> servers) setup for document storage. 
>> >> These documents are decomposed down to the sentence level.
>> >> 
>> >> Now I'm using a BatchScanner to assemble the full document like this: 
>> >> 
>> >> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // 
>> ARTIFACTS table 
>> >> currently hosts ~30GB data, ~200M entries on ~45 tablets 
>> >> bscan.setRanges(ranges) // there are like 3000 Range.exact's in the 
>> ranges-list 
>> >> for (entry<- bscan.asScala) yield { 
>> >> val key = entry.getKey() 
>> >> val value = entry.getValue() 
>> >> // etc. 
>> >> } 
>> >> 
>> >> For larger full documents (e.g. 3000 exact ranges), this operation 
>> will take 
>> >> about 12 seconds. 
>> >> But shorter documents are assembled blazing fast... 
>> >> 
>> >> Is that too much for a BatchScanner / am I misusing the BatchScanner?
>> >> Is that a normal time for such a (seek) operation? 
>> >> Can I do something to get a better seek performance? 
>> >> 
>> >> Note: I have already enabled bloom filtering on that table. 
>> >> 
>> >> Thank you for any advice! 
>> >> 
>> >> Regards, 
>> >> Sven 


Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
Sven,

Strange results. BatchScanners most definitely can be processed in 
parallel by the tabletservers.

There is a dynamically resizing threadpool in the TabletServers that
responds to load on the system. While the pool stays full, it will grow;
while it stays empty, it will shrink.
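
If you want to experiment with that pool, you could try raising its cap (a sketch, assuming "instance" is your Connector and default settings; I'm not certain this pool is actually your bottleneck):

    // tserver.readahead.concurrent.max caps the number of concurrent
    // readahead scan threads per tablet server (defaults to 16);
    // 32 is just a value to try
    instance.instanceOperations().setProperty("tserver.readahead.concurrent.max", "32")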

A few more questions: how many TabletServers do you have and did you run 
this benchmark multiple times in succession to see if the results 
changed? Also, have you tried increasing the number of threads per 
batchscanner to see if that makes a difference?

I might have to try to run a similar test later today. I am curious :)

Sven Hodapp wrote:
> Hi,
>
> I've changed the code a little bit, so that it uses a thread pool (via the Future):
>
>      val ranges500 = ranges.asScala.grouped(500)  // this means 6 BatchScanners will be created
>
>      for (ranges<- ranges500) {
>        val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
>        bscan.setRanges(ranges.asJava)
>        Future {
>          time("mult-scanner") {
>            bscan.asScala.toList  // toList forces the iteration of the iterator
>          }
>        }
>      }
>
> Here are the results:
>
>      background log: info: mult-scanner time: 4807.289358 ms
>      background log: info: mult-scanner time: 4930.996522 ms
>      background log: info: mult-scanner time: 9510.010808 ms
>      background log: info: mult-scanner time: 11394.152391 ms
>      background log: info: mult-scanner time: 13297.247295 ms
>      background log: info: mult-scanner time: 14032.704837 ms
>
>      background log: info: single-scanner time: 15322.624393 ms
>
> Every Future completes independently, but in return every batch scanner iterator needs more time to complete. :(
> Does this mean the batch scanners aren't really processed in parallel on the server side?
> Should I reconfigure something? Maybe the tablet servers haven't/can't allocate enough threads or memory? (Each of the two nodes has 8 cores and 64GB memory and storage with ~300MB/s...)
>
> Regards,
> Sven
>

Re: Accumulo Seek performance

Posted by Sven Hodapp <sv...@scai.fraunhofer.de>.
Hi,

I've changed the code a little bit, so that it uses a thread pool (via the Future):

    val ranges500 = ranges.asScala.grouped(500)  // this means 6 BatchScanners will be created

    for (ranges <- ranges500) {
      val bscan = instance.createBatchScanner(ARTIFACTS, auths, 2)
      bscan.setRanges(ranges.asJava)
      Future {
        time("mult-scanner") {
          bscan.asScala.toList  // toList forces the iteration of the iterator
        }
      }
    }

Here are the results:

    background log: info: mult-scanner time: 4807.289358 ms
    background log: info: mult-scanner time: 4930.996522 ms
    background log: info: mult-scanner time: 9510.010808 ms
    background log: info: mult-scanner time: 11394.152391 ms
    background log: info: mult-scanner time: 13297.247295 ms
    background log: info: mult-scanner time: 14032.704837 ms

    background log: info: single-scanner time: 15322.624393 ms

Every Future completes independently, but in return every batch scanner iterator needs more time to complete. :(
Does this mean the batch scanners aren't really processed in parallel on the server side?
Should I reconfigure something? Maybe the tablet servers haven't/can't allocate enough threads or memory? (Each of the two nodes has 8 cores and 64GB memory and storage with ~300MB/s...)

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hodapp@scai.fraunhofer.de
www.scai.fraunhofer.de

----- Original Message -----
> From: "Josh Elser" <jo...@gmail.com>
> To: "user" <us...@accumulo.apache.org>
> Sent: Wednesday, August 24, 2016 18:36:42
> Subject: Re: Accumulo Seek performance

> Ahh duh. Bad advice from me in the first place :)
> 
> Throw 'em in a threadpool locally.
> 
> dlmarion@comcast.net wrote:
>> Doesn't this use the 6 batch scanners serially?
>>
>> ------------------------------------------------------------------------
>> *From: *"Sven Hodapp" <sv...@scai.fraunhofer.de>
>> *To: *"user" <us...@accumulo.apache.org>
>> *Sent: *Wednesday, August 24, 2016 11:56:14 AM
>> *Subject: *Re: Accumulo Seek performance
>>
>> Hi Josh,
>>
>> thanks for your reply!
>>
>> I've tested your suggestion with an implementation like this:
>>
>> val ranges500 = ranges.asScala.grouped(500) // this means 6
>> BatchScanners will be created
>>
>> time("mult-scanner") {
>> for (ranges <- ranges500) {
>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
>> bscan.setRanges(ranges.asJava)
>> for (entry <- bscan.asScala) yield {
>> entry.getKey()
>> }
>> }
>> }
>>
>> And the result is a bit disappointing:
>>
>> background log: info: mult-scanner time: 18064.969281 ms
>> background log: info: single-scanner time: 6527.482383 ms
>>
>> Am I doing something wrong here?
>>
>>
>> Regards,
>> Sven
>>
>> --
>> Sven Hodapp, M.Sc.,
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>> Department of Bioinformatics
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>> sven.hodapp@scai.fraunhofer.de
>> www.scai.fraunhofer.de
>>
>> ----- Original Message -----
>>  > From: "Josh Elser" <jo...@gmail.com>
>>  > To: "user" <us...@accumulo.apache.org>
>>  > Sent: Wednesday, August 24, 2016 16:33:37
>>  > Subject: Re: Accumulo Seek performance
>>
>>  > This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710
>>  >
>>  > I don't feel like 3000 ranges is too many, but this isn't quantitative.
>>  >
>>  > IIRC, the BatchScanner will take each Range you provide, bin each Range
>>  > to the TabletServer(s) currently hosting the corresponding data, clip
>>  > (truncate) each Range to match the Tablet boundaries, and then does an
>>  > RPC to each TabletServer with just the Ranges hosted there.
>>  >
>>  > Inside the TabletServer, it will then have many Ranges, binned by Tablet
>>  > (KeyExtent, to be precise). This will spawn a
>>  > org.apache.accumulo.tserver.scan.LookupTask which will start collecting
>>  > results to send back to the client.
>>  >
>>  > The caveat here is that those ranges are processed serially on a
>>  > TabletServer. Maybe you're swamping one TabletServer with lots of
>>  > Ranges that it could be processing in parallel.
>>  >
>>  > Could you experiment with using multiple BatchScanners and something
>>  > like Guava's Iterables.concat to make it appear like one Iterator?
>>  >
>>  > I'm curious if we should put an optimization into the BatchScanner
>>  > itself to limit the number of ranges we send in one RPC to a
>>  > TabletServer (e.g. one BatchScanner might open multiple
>>  > MultiScanSessions to a TabletServer).
>>  >
>>  > Sven Hodapp wrote:
>>  >> Hi there,
>>  >>
>>  >> currently we're experimenting with a two node Accumulo cluster (two
>> tablet
>>  >> servers) setup for document storage.
>>  >> These documents are decomposed down to the sentence level.
>>  >>
>>  >> Now I'm using a BatchScanner to assemble the full document like this:
>>  >>
>>  >> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) //
>> ARTIFACTS table
>>  >> currently hosts ~30GB data, ~200M entries on ~45 tablets
>>  >> bscan.setRanges(ranges) // there are like 3000 Range.exact's in the
>> ranges-list
>>  >> for (entry<- bscan.asScala) yield {
>>  >> val key = entry.getKey()
>>  >> val value = entry.getValue()
>>  >> // etc.
>>  >> }
>>  >>
>>  >> For larger full documents (e.g. 3000 exact ranges), this operation
>> will take
>>  >> about 12 seconds.
>>  >> But shorter documents are assembled blazing fast...
>>  >>
>>  >> Is that too much for a BatchScanner / am I misusing the BatchScanner?
>>  >> Is that a normal time for such a (seek) operation?
>>  >> Can I do something to get a better seek performance?
>>  >>
>>  >> Note: I have already enabled bloom filtering on that table.
>>  >>
>>  >> Thank you for any advice!
>>  >>
>>  >> Regards,
>>  >> Sven

Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
Ahh duh. Bad advice from me in the first place :)

Throw 'em in a threadpool locally.
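
E.g., a rough sketch of what I mean (untested; reuses the ranges500 grouping from your snippet, one local thread per batch scanner):

    import java.util.concurrent.{Callable, Executors}
    import scala.collection.JavaConverters._
    import org.apache.accumulo.core.data.{Key, Value}

    val pool = Executors.newFixedThreadPool(6)
    val futures = ranges500.map { group =>
      pool.submit(new Callable[List[java.util.Map.Entry[Key, Value]]] {
        def call() = {
          val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
          bscan.setRanges(group.asJava)
          try bscan.asScala.toList finally bscan.close()
        }
      })
    }.toList
    val results = futures.flatMap(_.get())  // get() blocks until each task is done
    pool.shutdown()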

dlmarion@comcast.net wrote:
> Doesn't this use the 6 batch scanners serially?
>
> ------------------------------------------------------------------------
> *From: *"Sven Hodapp" <sv...@scai.fraunhofer.de>
> *To: *"user" <us...@accumulo.apache.org>
> *Sent: *Wednesday, August 24, 2016 11:56:14 AM
> *Subject: *Re: Accumulo Seek performance
>
> Hi Josh,
>
> thanks for your reply!
>
> I've tested your suggestion with an implementation like this:
>
> val ranges500 = ranges.asScala.grouped(500) // this means 6
> BatchScanners will be created
>
> time("mult-scanner") {
> for (ranges <- ranges500) {
> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
> bscan.setRanges(ranges.asJava)
> for (entry <- bscan.asScala) yield {
> entry.getKey()
> }
> }
> }
>
> And the result is a bit disappointing:
>
> background log: info: mult-scanner time: 18064.969281 ms
> background log: info: single-scanner time: 6527.482383 ms
>
> Am I doing something wrong here?
>
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hodapp@scai.fraunhofer.de
> www.scai.fraunhofer.de
>
> ----- Original Message -----
>  > From: "Josh Elser" <jo...@gmail.com>
>  > To: "user" <us...@accumulo.apache.org>
>  > Sent: Wednesday, August 24, 2016 16:33:37
>  > Subject: Re: Accumulo Seek performance
>
>  > This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710
>  >
>  > I don't feel like 3000 ranges is too many, but this isn't quantitative.
>  >
>  > IIRC, the BatchScanner will take each Range you provide, bin each Range
>  > to the TabletServer(s) currently hosting the corresponding data, clip
>  > (truncate) each Range to match the Tablet boundaries, and then does an
>  > RPC to each TabletServer with just the Ranges hosted there.
>  >
>  > Inside the TabletServer, it will then have many Ranges, binned by Tablet
>  > (KeyExtent, to be precise). This will spawn a
>  > org.apache.accumulo.tserver.scan.LookupTask which will start collecting
>  > results to send back to the client.
>  >
>  > The caveat here is that those ranges are processed serially on a
>  > TabletServer. Maybe you're swamping one TabletServer with lots of
>  > Ranges that it could be processing in parallel.
>  >
>  > Could you experiment with using multiple BatchScanners and something
>  > like Guava's Iterables.concat to make it appear like one Iterator?
>  >
>  > I'm curious if we should put an optimization into the BatchScanner
>  > itself to limit the number of ranges we send in one RPC to a
>  > TabletServer (e.g. one BatchScanner might open multiple
>  > MultiScanSessions to a TabletServer).
>  >
>  > Sven Hodapp wrote:
>  >> Hi there,
>  >>
>  >> currently we're experimenting with a two node Accumulo cluster (two
> tablet
>  >> servers) setup for document storage.
>  >> These documents are decomposed down to the sentence level.
>  >>
>  >> Now I'm using a BatchScanner to assemble the full document like this:
>  >>
>  >> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) //
> ARTIFACTS table
>  >> currently hosts ~30GB data, ~200M entries on ~45 tablets
>  >> bscan.setRanges(ranges) // there are like 3000 Range.exact's in the
> ranges-list
>  >> for (entry<- bscan.asScala) yield {
>  >> val key = entry.getKey()
>  >> val value = entry.getValue()
>  >> // etc.
>  >> }
>  >>
>  >> For larger full documents (e.g. 3000 exact ranges), this operation
> will take
>  >> about 12 seconds.
>  >> But shorter documents are assembled blazing fast...
>  >>
>  >> Is that too much for a BatchScanner / am I misusing the BatchScanner?
>  >> Is that a normal time for such a (seek) operation?
>  >> Can I do something to get a better seek performance?
>  >>
>  >> Note: I have already enabled bloom filtering on that table.
>  >>
>  >> Thank you for any advice!
>  >>
>  >> Regards,
>  >> Sven
>

Re: Accumulo Seek performance

Posted by dl...@comcast.net.
Doesn't this use the 6 batch scanners serially? 

----- Original Message -----

From: "Sven Hodapp" <sv...@scai.fraunhofer.de> 
To: "user" <us...@accumulo.apache.org> 
Sent: Wednesday, August 24, 2016 11:56:14 AM 
Subject: Re: Accumulo Seek performance 

Hi Josh, 

thanks for your reply! 

I've tested your suggestion with an implementation like this:

    val ranges500 = ranges.asScala.grouped(500)  // this means 6 BatchScanners will be created

    time("mult-scanner") {
      for (ranges <- ranges500) {
        val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
        bscan.setRanges(ranges.asJava)
        for (entry <- bscan.asScala) yield {
          entry.getKey()
        }
      }
    }

And the result is a bit disappointing: 

background log: info: mult-scanner time: 18064.969281 ms 
background log: info: single-scanner time: 6527.482383 ms 

Am I doing something wrong here?


Regards, 
Sven 

-- 
Sven Hodapp, M.Sc., 
Fraunhofer Institute for Algorithms and Scientific Computing SCAI, 
Department of Bioinformatics 
Schloss Birlinghoven, 53754 Sankt Augustin, Germany 
sven.hodapp@scai.fraunhofer.de 
www.scai.fraunhofer.de 

----- Original Message -----
> From: "Josh Elser" <jo...@gmail.com>
> To: "user" <us...@accumulo.apache.org>
> Sent: Wednesday, August 24, 2016 16:33:37
> Subject: Re: Accumulo Seek performance

> This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710 
> 
> I don't feel like 3000 ranges is too many, but this isn't quantitative. 
> 
> IIRC, the BatchScanner will take each Range you provide, bin each Range 
> to the TabletServer(s) currently hosting the corresponding data, clip 
> (truncate) each Range to match the Tablet boundaries, and then does an 
> RPC to each TabletServer with just the Ranges hosted there. 
> 
> Inside the TabletServer, it will then have many Ranges, binned by Tablet 
> (KeyExtent, to be precise). This will spawn a 
> org.apache.accumulo.tserver.scan.LookupTask which will start collecting
> results to send back to the client.
>
> The caveat here is that those ranges are processed serially on a
> TabletServer. Maybe you're swamping one TabletServer with lots of
> Ranges that it could be processing in parallel. 
> 
> Could you experiment with using multiple BatchScanners and something 
> like Guava's Iterables.concat to make it appear like one Iterator? 
> 
> I'm curious if we should put an optimization into the BatchScanner 
> itself to limit the number of ranges we send in one RPC to a 
> TabletServer (e.g. one BatchScanner might open multiple 
> MultiScanSessions to a TabletServer). 
> 
> Sven Hodapp wrote: 
>> Hi there, 
>> 
>> currently we're experimenting with a two node Accumulo cluster (two tablet 
>> servers) setup for document storage. 
>> These documents are decomposed down to the sentence level.
>> 
>> Now I'm using a BatchScanner to assemble the full document like this: 
>> 
>> val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // ARTIFACTS table 
>> currently hosts ~30GB data, ~200M entries on ~45 tablets 
>> bscan.setRanges(ranges) // there are like 3000 Range.exact's in the ranges-list 
>> for (entry<- bscan.asScala) yield { 
>> val key = entry.getKey() 
>> val value = entry.getValue() 
>> // etc. 
>> } 
>> 
>> For larger full documents (e.g. 3000 exact ranges), this operation will take 
>> about 12 seconds. 
>> But shorter documents are assembled blazing fast... 
>> 
>> Is that too much for a BatchScanner / am I misusing the BatchScanner?
>> Is that a normal time for such a (seek) operation? 
>> Can I do something to get a better seek performance? 
>> 
>> Note: I have already enabled bloom filtering on that table. 
>> 
>> Thank you for any advice! 
>> 
>> Regards, 
>> Sven 


Re: Accumulo Seek performance

Posted by Sven Hodapp <sv...@scai.fraunhofer.de>.
Hi Josh,

thanks for your reply!

I've tested your suggestion with an implementation like this:

    val ranges500 = ranges.asScala.grouped(500)  // this means 6 BatchScanners will be created

    time("mult-scanner") {
      for (ranges <- ranges500) {
        val bscan = instance.createBatchScanner(ARTIFACTS, auths, 1)
        bscan.setRanges(ranges.asJava)
        for (entry <- bscan.asScala) yield {
          entry.getKey()
        }
      }
    }

And the result is a bit disappointing:

background log: info: mult-scanner time: 18064.969281 ms
background log: info: single-scanner time: 6527.482383 ms

Am I doing something wrong here?


Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hodapp@scai.fraunhofer.de
www.scai.fraunhofer.de

----- Original Message -----
> From: "Josh Elser" <jo...@gmail.com>
> To: "user" <us...@accumulo.apache.org>
> Sent: Wednesday, August 24, 2016 16:33:37
> Subject: Re: Accumulo Seek performance

> This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710
> 
> I don't feel like 3000 ranges is too many, but this isn't quantitative.
> 
> IIRC, the BatchScanner will take each Range you provide, bin each Range
> to the TabletServer(s) currently hosting the corresponding data, clip
> (truncate) each Range to match the Tablet boundaries, and then does an
> RPC to each TabletServer with just the Ranges hosted there.
> 
> Inside the TabletServer, it will then have many Ranges, binned by Tablet
> (KeyExtent, to be precise). This will spawn a
> org.apache.accumulo.tserver.scan.LookupTask which will start collecting
> results to send back to the client.
>
> The caveat here is that those ranges are processed serially on a
> TabletServer. Maybe you're swamping one TabletServer with lots of
> Ranges that it could be processing in parallel.
> 
> Could you experiment with using multiple BatchScanners and something
> like Guava's Iterables.concat to make it appear like one Iterator?
> 
> I'm curious if we should put an optimization into the BatchScanner
> itself to limit the number of ranges we send in one RPC to a
> TabletServer (e.g. one BatchScanner might open multiple
> MultiScanSessions to a TabletServer).
> 
> Sven Hodapp wrote:
>> Hi there,
>>
>> currently we're experimenting with a two node Accumulo cluster (two tablet
>> servers) setup for document storage.
>> These documents are decomposed down to the sentence level.
>>
>> Now I'm using a BatchScanner to assemble the full document like this:
>>
>>      val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // ARTIFACTS table
>>      currently hosts ~30GB data, ~200M entries on ~45 tablets
>>      bscan.setRanges(ranges)  // there are like 3000 Range.exact's in the ranges-list
>>        for (entry<- bscan.asScala) yield {
>>          val key = entry.getKey()
>>          val value = entry.getValue()
>>          // etc.
>>        }
>>
>> For larger full documents (e.g. 3000 exact ranges), this operation will take
>> about 12 seconds.
>> But shorter documents are assembled blazing fast...
>>
>> Is that too much for a BatchScanner / am I misusing the BatchScanner?
>> Is that a normal time for such a (seek) operation?
>> Can I do something to get a better seek performance?
>>
>> Note: I have already enabled bloom filtering on that table.
>>
>> Thank you for any advice!
>>
>> Regards,
>> Sven

Re: Accumulo Seek performance

Posted by Josh Elser <jo...@gmail.com>.
This reminded me of https://issues.apache.org/jira/browse/ACCUMULO-3710

I don't feel like 3000 ranges is too many, but this isn't quantitative.

IIRC, the BatchScanner will take each Range you provide, bin each Range 
to the TabletServer(s) currently hosting the corresponding data, clip 
(truncate) each Range to match the Tablet boundaries, and then does an 
RPC to each TabletServer with just the Ranges hosted there.

Inside the TabletServer, it will then have many Ranges, binned by Tablet 
(KeyExtent, to be precise). This will spawn a 
org.apache.accumulo.tserver.scan.LookupTask which will start collecting
results to send back to the client.

The caveat here is that those ranges are processed serially on a
TabletServer. Maybe you're swamping one TabletServer with lots of
Ranges that it could be processing in parallel.

Could you experiment with using multiple BatchScanners and something 
like Guava's Iterables.concat to make it appear like one Iterator?

I'm curious if we should put an optimization into the BatchScanner 
itself to limit the number of ranges we send in one RPC to a 
TabletServer (e.g. one BatchScanner might open multiple 
MultiScanSessions to a TabletServer).
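
E.g., something like this (an untested sketch using your variable names):

    import com.google.common.collect.Iterables
    import scala.collection.JavaConverters._

    // several scanners, each responsible for a slice of the ranges
    val scanners = ranges.asScala.grouped(500).map { group =>
      val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10)
      bscan.setRanges(group.asJava)
      bscan
    }.toList
    // present them to the caller as one logical iterable
    val combined = Iterables.concat(scanners.asJava)
    for (entry <- combined.asScala) {
      // entry.getKey() / entry.getValue() as before
    }
    scanners.foreach(_.close())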

Sven Hodapp wrote:
> Hi there,
>
> currently we're experimenting with a two node Accumulo cluster (two tablet servers) setup for document storage.
> These documents are decomposed down to the sentence level.
>
> Now I'm using a BatchScanner to assemble the full document like this:
>
>      val bscan = instance.createBatchScanner(ARTIFACTS, auths, 10) // ARTIFACTS table currently hosts ~30GB data, ~200M entries on ~45 tablets
>      bscan.setRanges(ranges)  // there are like 3000 Range.exact's in the ranges-list
>        for (entry<- bscan.asScala) yield {
>          val key = entry.getKey()
>          val value = entry.getValue()
>          // etc.
>        }
>
> For larger full documents (e.g. 3000 exact ranges), this operation will take about 12 seconds.
> But shorter documents are assembled blazing fast...
>
> Is that too much for a BatchScanner / am I misusing the BatchScanner?
> Is that a normal time for such a (seek) operation?
> Can I do something to get a better seek performance?
>
> Note: I have already enabled bloom filtering on that table.
>
> Thank you for any advice!
>
> Regards,
> Sven
>