Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/19 00:36:25 UTC

[GitHub] [druid] wjhypo commented on a change in pull request #11307: [WIP] Add an option to enable bitmap in IncrementalIndex during real time ingestion

wjhypo commented on a change in pull request #11307:
URL: https://github.com/apache/druid/pull/11307#discussion_r752778734



##########
File path: docs/configuration/index.md
##########
@@ -1403,6 +1403,7 @@ Additional peon configs include:
 |`druid.indexer.task.restoreTasksOnRestart`|If true, MiddleManagers will attempt to stop tasks gracefully on shutdown and restore them on restart.|false|
 |`druid.indexer.task.ignoreTimestampSpecForDruidInputSource`|If true, tasks using the [Druid input source](../ingestion/native-batch.md#druid-input-source) will ignore the provided timestampSpec, and will use the `__time` column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.|false|
 |`druid.indexer.server.maxChatRequests`|Maximum number of concurrent requests served by a task's chat handler. Set to 0 to disable limiting.|0|
+|`druid.indexer.task.enableInMemoryBitmap`| If true, stream ingestion will enable in memory bitmap for applicable dimensions when data is still in memory during real time writes before disk persistence triggers. Queries can leverage the bitmaps to avoid a full scan to speed up for this stage of data. |false|

Review comment:
       The in-memory bitmap index applies only to string-typed dimensions. For `__time`, there is only a wrapper with no bitmap index, so that the query engine does not break. I'll document this.
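   To illustrate why the bitmaps let queries skip a full scan of the in-memory rows, here is a minimal sketch of an inverted index for one string dimension. It assumes `java.util.BitSet` as the bitmap type, and the class `InMemoryStringBitmapIndex` and its methods are hypothetical. This is not the PR's `IncrementalIndex` code, which would more likely use Druid's own bitmap abstractions.

   ```java
   import java.util.BitSet;
   import java.util.HashMap;
   import java.util.Map;

   /**
    * Hypothetical sketch (not Druid's actual classes): an in-memory inverted index
    * for a single string dimension, updated as rows are appended before persist.
    */
   class InMemoryStringBitmapIndex {
     // Maps each distinct dimension value to the set of row IDs that contain it.
     private final Map<String, BitSet> bitmaps = new HashMap<>();
     private int numRows = 0;

     /** Appends a row, records its value in the matching bitmap, and returns the row ID. */
     int add(String value) {
       int rowId = numRows++;
       bitmaps.computeIfAbsent(value, k -> new BitSet()).set(rowId);
       return rowId;
     }

     /** Equality filter: looks up matching row IDs directly instead of scanning every row. */
     BitSet rowsEqualTo(String value) {
       BitSet bits = bitmaps.get(value);
       // Return a copy so callers cannot mutate the index.
       return bits == null ? new BitSet() : (BitSet) bits.clone();
     }

     int numRows() {
       return numRows;
     }

     public static void main(String[] args) {
       InMemoryStringBitmapIndex index = new InMemoryStringBitmapIndex();
       index.add("us");
       index.add("eu");
       index.add("us");
       System.out.println(index.rowsEqualTo("us"));   // prints {0, 2}
       System.out.println(index.rowsEqualTo("apac")); // prints {}
     }
   }
   ```

   The trade-off is the one the docs already describe for segments: each row insert now also updates a bitmap, in exchange for filters that touch only matching rows.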
   

##########
File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
##########
@@ -303,16 +322,48 @@ protected IncrementalIndex(
       DimensionHandler handler = DimensionHandlerUtils.getHandlerFromCapabilities(
           dimName,
           capabilities,
-          dimSchema.getMultiValueHandling()
+          dimSchema.getMultiValueHandling(),
+          enableInMemoryBitmap
       );
-      addNewDimension(dimName, handler);
+      DimensionDesc desc = addNewDimension(dimName, handler);
+
+      if (enableInMemoryBitmap && type.equals(ColumnType.STRING)) {

Review comment:
       Good catch!
   
   Per https://github.com/apache/druid/blob/master/docs/ingestion/ingestion-spec.md, dimensions in `dimensionsSpec` have a `createBitmapIndex` property (default `true`), documented as:
   ```
   For string typed dimensions, whether or not bitmap indexes should be created for the column in generated segments. Creating a bitmap index requires more storage, but speeds up certain kinds of filtering (especially equality and prefix filtering). Only supported for string typed dimensions.	
   ```
   
   So bitmap indexes in immutable batch segments currently support only string-typed columns. This PR follows the same design: it still supports only string-typed dimensions, but extends bitmap index support from immutable batch segments to the real-time incremental index. In-memory bitmaps are enabled only if both `createBitmapIndex` and `enableInMemoryBitmap` are true.
   
   I'm not sure it makes sense to support the case where `createBitmapIndex` is false (bitmap indexes disabled in immutable batch segments) while `enableInMemoryBitmap` is true (bitmap indexes enabled in the incremental index). To avoid confusion, in-memory bitmaps are disabled in that case. Let me know if you have other thoughts.
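   A minimal sketch of the resulting rule, for readers following the flag combinations. The helper `useInMemoryBitmap` and the enum `ValueType` are hypothetical stand-ins (the diff itself checks `type.equals(ColumnType.STRING)`); only the flag names `enableInMemoryBitmap` and `createBitmapIndex` come from this PR and the ingestion docs.

   ```java
   /**
    * Hypothetical sketch of the gating rule described above; names are illustrative,
    * not Druid's actual API.
    */
   final class InMemoryBitmapGate {
     // Stand-in for Druid's column types; only STRING matters for this rule.
     enum ValueType { STRING, LONG, FLOAT, DOUBLE, COMPLEX }

     private InMemoryBitmapGate() {}

     /**
      * In-memory bitmaps are built only for string dimensions, and only when both the
      * task-level flag and the per-dimension createBitmapIndex setting are true.
      */
     static boolean useInMemoryBitmap(
         boolean enableInMemoryBitmap, // druid.indexer.task.enableInMemoryBitmap
         boolean createBitmapIndex,    // per-dimension setting in dimensionsSpec
         ValueType type) {
       return enableInMemoryBitmap && createBitmapIndex && type == ValueType.STRING;
     }

     public static void main(String[] args) {
       // Both flags on, string dimension: in-memory bitmap enabled.
       System.out.println(useInMemoryBitmap(true, true, ValueType.STRING));  // true
       // createBitmapIndex off: disabled even though the task-level flag is on.
       System.out.println(useInMemoryBitmap(true, false, ValueType.STRING)); // false
       // Non-string dimension: never indexed.
       System.out.println(useInMemoryBitmap(true, true, ValueType.LONG));    // false
     }
   }
   ```

   Requiring both flags keeps the in-memory behavior consistent with what the persisted segment will eventually contain.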
   
   I'll also document this in detail.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


