Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2019/11/20 21:14:38 UTC

[GitHub] [accumulo] keith-turner commented on issue #1440: Give scans control over cache via scan dispatchers #1383

keith-turner commented on issue #1440: Give scans control over cache via scan dispatchers #1383
URL: https://github.com/apache/accumulo/pull/1440#issuecomment-556405223
 
 
   I ran a performance test on a small EC2 cluster to give this a try.  I had the following setup. 
   
    * Table with around 192 million entries
    * 8 tablets
    * 3 tservers (d2.xlarge)
    * 30% of 4G data cache size
    * 6 threads running circular full table scans (each with a different random start) with execution hint `scan_type=background`
    * 16 threads doing random lookups from a set of 8K random rows
   
   For the threads doing random lookups on a subset of the table, their data just fit in the data cache. The full table did not fit in the cache. In this test scenario all of the data was local to each tserver and fit in the OS cache, so a cache miss was not terribly slow. HDFS was also optimized for local data. Even so, cache misses were noticeable from a latency perspective, which is all I cared about. 
   
   I ran three tests, all set up with the following Accumulo shell commands.  These send all scans that set the execution hint `scan_type=background` to a special executor with a single thread.
   
   ```
     createtable dc
     config -s tserver.scan.executors.bge.threads=1
     config -t dc -s table.scan.dispatcher.opts.executor.background=bge
     config -t dc -s table.cache.index.enable=true
     config -t dc -s table.cache.block.enable=true
     config -t dc -s table.file.compress.type=snappy
   ```
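
To make the routing concrete, here is a rough Python sketch of how a hint could be mapped to an executor name using the `executor.background=bge` option set above. It is a toy model, not Accumulo's dispatcher code: the function `pick_executor` is hypothetical, and falling back to an executor named `default` is an assumption.

```python
# Toy model of hint-based scan dispatch; NOT Accumulo's actual implementation.
OPTS_PREFIX = "table.scan.dispatcher.opts."


def pick_executor(table_props, hints, default="default"):
    """Return the executor name for a scan with the given execution hints.

    `default` is an assumed fallback name for scans without a matching hint.
    """
    scan_type = hints.get("scan_type")
    if scan_type is None:
        return default
    # Look for an option of the form executor.<scan_type>=<executor name>.
    return table_props.get(OPTS_PREFIX + "executor." + scan_type, default)


# Mirrors the table option configured in the shell commands above.
table_props = {
    "table.scan.dispatcher.opts.executor.background": "bge",
}

background = pick_executor(table_props, {"scan_type": "background"})
lookup = pick_executor(table_props, {})  # the random lookups set no hint
print(background, lookup)
```

In this sketch only scans carrying `scan_type=background` land on `bge`, so the single-threaded executor throttles the full table scans without affecting the random lookups.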
   
   For one test run I set the following to make scans that set the execution hint `scan_type=background` use opportunistic caching.  This means those scans would use data if it is in the cache, but would never load missing data into the cache.
   
   ```
     config -t dc -s table.scan.dispatcher.opts.cacheUsage.background=opportunistic
   ```
   
   For another test run I set the following to make scans that set the execution hint `scan_type=background` fully use the cache.
   
   ```
     config -t dc -s table.scan.dispatcher.opts.cacheUsage.background=enabled
   ```
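
The difference between the two `cacheUsage` settings can be illustrated with a small simulation. This is a toy sketch under made-up assumptions (an LRU cache, 1000 table blocks, a 100-block hot set that exactly fills the cache); none of the sizes or names come from the test above, and its output is not a benchmark result.

```python
import random
from collections import OrderedDict


class LruCache:
    """Minimal LRU block cache (an assumed eviction policy for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def read(self, block, load_on_miss):
        """Return True on a hit. On a miss, cache the block only if load_on_miss."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        if load_on_miss:
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
        return False


def lookup_hit_rate(background_loads_cache, rounds=200, seed=1):
    """Hit rate of random lookups while a circular background scan runs."""
    rng = random.Random(seed)
    table_blocks = 1000          # whole table; does not fit in the cache
    hot = list(range(100))       # blocks used by random lookups; just fits
    cache = LruCache(capacity=100)
    for block in hot:            # warm the cache with the hot set
        cache.read(block, load_on_miss=True)
    hits = total = 0
    scan_pos = 0
    for _ in range(rounds):
        # Background scan advances 50 blocks, wrapping around the table.
        for _ in range(50):
            cache.read(scan_pos, load_on_miss=background_loads_cache)
            scan_pos = (scan_pos + 1) % table_blocks
        # Random lookups always use the cache fully.
        for _ in range(10):
            hits += cache.read(rng.choice(hot), load_on_miss=True)
            total += 1
    return hits / total


opportunistic = lookup_hit_rate(background_loads_cache=False)
enabled = lookup_hit_rate(background_loads_cache=True)
print(f"opportunistic: {opportunistic:.2f}, enabled: {enabled:.2f}")
```

In this model the opportunistic background scan never changes the cache contents, so the lookups keep hitting their warmed hot set, while the fully caching scan keeps evicting hot blocks and forces the lookups to reload them.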
   
   For the last test run, I just did not run background scans.  I only ran the scans doing random lookups.
   
   Below are the average times for the scans doing random lookups in the three test runs. Had I run in a situation where cache misses had higher latency, I suspect the plot would differ more dramatically. I wish I had run a 4th test where the random lookup threads did not use the cache.  In all three tests the random lookup threads always used the cache.
   
   ![test-results](https://user-images.githubusercontent.com/1268739/69278178-c75a7d00-0baf-11ea-9793-a18dee7b1d1f.png)
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services