Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/06 11:04:52 UTC

[GitHub] [incubator-druid] eranmeir opened a new pull request #7838: Improve IncrementalIndex concurrency scalability

URL: https://github.com/apache/incubator-druid/pull/7838
 
 
   ### Background
   Our work on Oak (see PR #7676) shows that multi-threaded indexing yields significant performance gains, even when not using Oak. In our benchmarks, however, we noticed that ingestion was not scaling as expected with multiple threads.
   
   We traced the threads’ blocking states to two causes:
   1. A monitor in `IncrementalIndex` that synchronized access to `dimensionDescs`
   2. A Read-Write lock in `StringDimensionIndexer`
   
   This PR proposes a solution to the first issue. It is based on the observation that dimension data is updated infrequently, so taking an exclusive lock on every access is wasteful.
   
   ### Summary of changes
   - Shared state is encapsulated in a new class, `DimensionData`, which holds `dimensionDescs`, `dimensionDescsList` and `columnCapabilities`
   - Concurrent threads share an atomic reference to an instance of `DimensionData`
   - Copy-on-write (CoW): only when a thread needs to update the shared state does it copy the instance, update the copy, and then swap the reference atomically.
   - Consistency is maintained when the reference is updated. This simplifies row processing, removes the need for keeping an “overflow” array, and allows fast failure when a row contains duplicate dimensions.
   - New multi-threaded ingestion benchmark: `IndexIngestionMultithreadedBenchmark`
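   
   The copy-on-write scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `DimensionSnapshot` and `DimensionRegistry` are hypothetical names, the snapshot holds only a name-to-index map and a name list rather than the real `DimensionData` fields, and the compare-and-set retry loop is an assumption about how the atomic swap could be done.
   
   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.concurrent.atomic.AtomicReference;

   // Hypothetical, simplified stand-in for DimensionData: an immutable snapshot.
   final class DimensionSnapshot {
       final Map<String, Integer> indexByName; // dimension name -> position
       final List<String> names;               // dimension names in insertion order

       DimensionSnapshot(Map<String, Integer> indexByName, List<String> names) {
           this.indexByName = Collections.unmodifiableMap(indexByName);
           this.names = Collections.unmodifiableList(names);
       }

       static DimensionSnapshot empty() {
           return new DimensionSnapshot(new LinkedHashMap<>(), new ArrayList<>());
       }

       // Returns a new snapshot that also contains `name`; this instance is untouched.
       DimensionSnapshot withDimension(String name) {
           Map<String, Integer> map = new LinkedHashMap<>(indexByName);
           List<String> list = new ArrayList<>(names);
           map.put(name, list.size());
           list.add(name);
           return new DimensionSnapshot(map, list);
       }
   }

   // Hypothetical holder showing the lock-free read path and the CoW update path.
   final class DimensionRegistry {
       private final AtomicReference<DimensionSnapshot> current =
           new AtomicReference<>(DimensionSnapshot.empty());

       // Read path: a single volatile read, no lock.
       DimensionSnapshot snapshot() {
           return current.get();
       }

       // Write path: copy, modify the copy, then compare-and-set.
       // If another thread swapped the reference first, re-read and retry.
       int getOrAddDimension(String name) {
           while (true) {
               DimensionSnapshot seen = current.get();
               Integer existing = seen.indexByName.get(name);
               if (existing != null) {
                   return existing; // common case: dimension already known
               }
               DimensionSnapshot updated = seen.withDimension(name);
               if (current.compareAndSet(seen, updated)) {
                   return updated.indexByName.get(name);
               }
           }
       }
   }
   ```
   
   Because each snapshot is immutable, a thread that has read the reference sees a consistent map and list for the rest of its row processing, even if another thread publishes a newer snapshot in the meantime. The cost is a full copy on every update, which is acceptable only because new dimensions are rare.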
   
   
   For benchmark results, see the attached document:
   [Incremental Index Scaling.pdf](https://github.com/apache/incubator-druid/files/3261231/Incremental.Index.Scaling.pdf)
   
