Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/18 09:40:41 UTC

[GitHub] [incubator-druid] ebortnik commented on issue #7838: Improve IncrementalIndex concurrency scalability

URL: https://github.com/apache/incubator-druid/pull/7838#issuecomment-503029762

@jihoonson, thanks for all the insightful questions and comments. Let me chime in :)

All in all, the motivation for our work is to get data into Druid faster and to make system performance more predictable over time. How? By using Oak as the primary index for data ingestion, and exploiting more of its capabilities over time. Oak is an off-heap data structure, largely immune to GC, which performs much better than the JDK concurrent skip list in microbenchmarks (and we keep improving it). The master plan is:
1. Use the Oak-based Incremental Index in the same context as today (single writer thread, multiple reader threads), and show that ingestion becomes faster with it. As a by-product, Druid will be able to build larger segments and reduce the compaction rate at the system level over time. But for now, we only look at the segment that is being built.
2. Start using the Oak-based index with more writer threads, to let it really shine (the microbenchmarks show that it becomes more attractive as the number of writers scales). This might require more far-reaching changes, whose implications we might not fully appreciate at the moment, so let's discuss.

Technically, the Incremental Index is a complex beast that does many things around the basic fact table management. Therefore, it's nontrivial to show the gains from using Oak immediately. This patch is a step in the direction of goal #2 above - making the whole solution more concurrency-friendly in a way that is independent of the fact table implementation. Doing this will make it easier to merge with the non-blocking Oak code down the road. The benchmark's goal was to show that we don't make things worse, and that performance improves as the number of threads scales.

Does this make more sense?
