Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/28 21:22:46 UTC
[GitHub] [incubator-druid] jon-wei commented on issue #5698: Oak: New Concurrent Key-Value Map

jon-wei commented on issue #5698: Oak: New Concurrent Key-Value Map 
URL: https://github.com/apache/incubator-druid/issues/5698#issuecomment-506881097
 
 
   @sanastas 
   
   Thanks for contributing https://github.com/apache/incubator-druid/pull/7676!
   
   I've been thinking about what the path to potentially merging #7676 would look like.
   
   In Druid, code contributions currently fall into two categories, and each has different requirements for being considered for merging:
     - Core Druid and core extensions
     - Contrib extensions
   
   The requirements for contrib extensions are looser and could roughly be described as "reasonable implementation and potentially useful for some use cases or experimentation". The contrib extensions aren't actively maintained by the Druid committers, and are generally less extensively tested.
   
   It can make sense for a new feature to start out as a contrib extension, and potentially migrate to core as it evolves. Examples of this include the Google Cloud Storage extension and the ORC format extension, which started out as contrib extensions and were recently adopted as core extensions.
   
   For the Oak-based incremental index, this path could make sense as well, but Druid does not currently provide an extension point for incremental index implementations. To open that as an extension point would first involve discussion/consensus on whether it's a good idea to have that extension point, and there would also be significant design thought/implementation work required.
   
   Given those difficulties, I think it makes sense to think about the path to merging the Oak-based incremental index as a core feature. For merging a contribution into core, the requirement is essentially: "Convince Druid committers to take responsibility for and maintain the contribution going forward."
   
   At the highest level, setting aside implementation details, I think it'd be helpful to see a comparison of performance metrics between the Oak incremental index and the existing implementation on a real cluster.
   
   I would try to set up realistic workloads for native batch ingestion and Kafka indexing service ingestion, and gather metrics for the following:
   - Ingestion throughput
   - Query performance (realtime tasks like Kafka indexing service tasks can answer queries)
   - Index persist performance
   
   -------------------------
   
   Separately from the incremental index topic, I wonder if OakMap could be used as part of Druid's GroupBy V2 query. There is a class called `ConcurrentGrouper` which is responsible for grouping/aggregating rows off-heap, with concurrent writes. This sounds like an area where OakMap could potentially be beneficial. If you're interested, that could be another worthwhile avenue for investigation.
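
To make the access pattern concrete, here is a minimal sketch of grouping/aggregating rows with concurrent writers. It is only an illustration: it uses the JDK's on-heap `ConcurrentHashMap` and `LongAdder` as stand-ins, and the class name `ConcurrentGroupingSketch` and method `aggregate` are hypothetical. It does not reflect how `ConcurrentGrouper` is actually implemented, nor OakMap's API, and it does nothing off-heap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; not Druid or Oak code.
public class ConcurrentGroupingSketch {

    // Sums values[i] per keys[i], with `threads` writers updating one shared map.
    public static Map<String, LongAdder> aggregate(String[] keys, long[] values, int threads)
            throws InterruptedException {
        Map<String, LongAdder> sums = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            pool.submit(() -> {
                // Each writer handles a strided slice of the input rows.
                for (int i = offset; i < keys.length; i += threads) {
                    // computeIfAbsent is atomic per key; LongAdder.add is thread-safe.
                    sums.computeIfAbsent(keys[i], k -> new LongAdder()).add(values[i]);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("writers did not finish in time");
        }
        return sums;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] keys = {"a", "b", "a", "c", "b", "a"};
        long[] values = {1, 2, 3, 4, 5, 6};
        Map<String, LongAdder> sums = aggregate(keys, values, 3);
        // Iteration order of the map is not guaranteed.
        sums.forEach((k, v) -> System.out.println(k + " -> " + v.sum()));
    }
}
```

An off-heap concurrent map would be a candidate to replace the shared map in a workload shaped like this one, so a comparison would mainly come down to write contention and memory footprint.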
   
