You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/25 15:46:09 UTC

[GitHub] [druid] liran-funaro opened a new issue #10321: [Proposal] Design for a Configurable Index Type

liran-funaro opened a new issue #10321:
URL: https://github.com/apache/druid/issues/10321


   ### Motivation
   
   Alternative approaches to indexing events in-memory might have benefits in some workloads, or be irrelevant in other workloads.
   But the current Druid design allows only one (hard-coded) approach for that end; i.e., incremental-index, specifically `OnheapIncrementalIndex`.
   Other implementations such as the `OffheapIncrementalIndex` might also be beneficial in some [workloads](https://github.com/liran-funaro/druid/wiki/Evaluation) (albeit deprecated).
   Furthermore, future approaches such as suggested in PR #10001 (ISSUE #9967), doesn’t even require the notation of incremental-index and can achieve much better resource utilization than the exiting approaches ([source](https://github.com/liran-funaro/druid/wiki/Evaluation)).
   
   But for current and future approaches to be implemented, tested, and embedded into Druid, Druid must allow users to select different indexing approaches (as core or as extensions).
   The purpose of this issue is to present and discuss a design for a configurable index type.
   Afterwhich, we plan to implement the decided (and approved) approach and submit it as a PR.
   
   ### Proposed changes
   
   We propose the following modifications:
   1. Add `IngestionIndex` interface
   1. Add `IngestionIndexBuilder` and `IngestionIndexFactory` interfaces
   1. Add an ingestion knob for tuningConfig
   1. Instantiate an `IngestionIndexBuilder` using the given factory
   
   #### Add `IngestionIndex` interface
   `IngestionIndex` will include all the externally visible API of the current `IncrementalIndex`. For example:
   ```java
   public interface IngestionIndex
   {
     FactsHolder getFacts();
     boolean canAppendRow();
     String getOutOfRowsReason();
     IngestionIndexAddResult add(InputRow row, boolean skipMaxRowsInMemoryCheck) throws IndexSizeExceededException;
     // etc...
   }
   ```
   Note that `IncrementalIndexAddResult` was renamed to `IngestionIndexAddResult` in this example.
   The introduction of the `IngestionIndex` interface incurs many of these small line changes.
   To minimize diff size, we consider avoiding this new interface (see [rationale](#rationale) below).
   
   #### Add `IngestionIndexBuilder` and `IngestionIndexFactory` interfaces
   `IngestionIndexBuilder` will be an abstract class. It will be identical to the existing `IncrementalIndex.Builder` class, except for the `build()` method that will be abstract and will return an `IngestionIndex` instead of `IncrementalIndex`.
   
   Each implementation of `IngestionIndex` will have its own Builder.
   
   `IngestionIndexFactory` will be an extension point. It will have a single method:
   ```java
   @ExtensionPoint
   public interface IngestionIndexBuilderFactory
   {
     IngestionIndexBuilder builder();
   }
   ```
   
   Each implementation of `IngestionIndexFactory` will return the builder of its ingestion-index, and will register as `JsonSubType` via a dedicated `Module`.
   
   #### Add an ingestion knob for tuningConfig
   First, we add a method to `AppenderatorConfig` interface: `IngestionIndexFactory getIngestionIndexFactory()`.
   Then, we add to each of its implementations a constructor parameter:
   ```java
   @JsonProperty("indexType") @Nullable IngestionIndexFactory indexType
   ```
   which will be returned by the `getIngestionIndexFactory()` method.
   
   This will affect all the implementations of `AppenderatorConfig`: `RealtimeTuningConfig`, `RealtimeAppenderatorTuningConfig`, `IndexTask.IndexTuningConfig`, `ParallelIndexTuningConfig`, `HadoopTuningConfig`, `SeekableStreamIndexTaskTuningConfig`, `KafkaIndexTaskTuningConfig`, `KafkaSupervisorTuningConfig`, `KinesisIndexTaskTuningConfig`, `KinesisSupervisorTuningConfig`.
   
   The above will introduce a new configuration knob:
   ```json
   "tuningConfig": {
     "indexType": "onheap"
   }
   ```
   
   #### Instantiate an `IngestionIndexBuilder` using the given factory
   All places that instantiate `IncrementalIndex.Builder`, will be modified to instatiate it from
   ```java
   config.getSchema().getTuningConfig().getIngestionIndexFactory().builder()
   ```
   That is: `Sink`, `InputSourceSampler`, `IndexGeneratorJob` (hadoop).
   
   ### Rationale
   
   We considered a few alternative choices for this design.
   1. All future indexes will inherit from `IncrementalIndex`
   1. Avoiding using factories: “indexType” parameter will be a `String` rather than a factory
   
   
   #### All future indexes will inherit from `IncrementalIndex`
   This approach will reduce the number of modified lines but will force all future implementations to an incremental-index design.
   As noted, for example, #10001 does not need incremental-index to function but it inherits from it to reduce the total diff size.
   We prefer to add an additional interface to avoid this constraint.
   
   #### Avoiding using factories: “indexType” parameter will be a `String` rather than a factory
   An alternative design could be adding another member/method to the existing `IncrementalIndex.Builder`, and adding a “generic” `build()` method:
   ```java
   // ...
   
   public Builder setIncrementalIndexType(final String incrementalIndexType)
   {
     this.incrementalIndexType = incrementalIndexType;
     return this;
   }
   
   // ...
   
   public IncrementalIndex build()
   {
     switch (incrementalIndexType) {
       case "onheap":
         return buildOnheap();
       case "offheap":
         return buildOffheap();
       case "novel-index":
         return buildNovel();
       default:
         throw new IllegalArgumentException("Unsupported incremental index: " + incrementalIndexType);
     }
   }
   ```
   
   Then, add the ingestion configuration as a string, rather than a factory class.
   
   This alternative is simpler and is the one that is currently implemented in #10001.
   But it does not allow extensions to implement their own ingestion index.
   This is especially important with the current effort to support Java-11.
   Currently, #10001 only supports Java-8, but it will support Java-11 it in the future.
   Since extensions allowed lagging behind this effort (e.g., data sketches), we prefer the design that supports index extensions.
   
   ### Operational impact
   
   This change will not affect any existing clusters. It will work seamlessly when the `indexType` parameter is omitted.
   
   ### Future work
   
   This configuration can also affect queries that use the index, such as in `GrouByQueryHelper`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-694417401


   @liran-funaro this proposal has been open for a while and we consider silence as acceptance :slightly_smiling_face:. I think you can go ahead and make the linked PR ready for review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-691769526


   Sorry @liran-funaro, it took a while to get back to this proposal. The proposed changes sound reasonable to me. Having `AppendableIndexBuilder` in tuningConfig with appendableIndex-specific knobs looks especially useful. However, I'm assuming that those knobs will have default values and users usually don't have to worry about them. Is this right?
   
   Regarding `AppendableIndexSpec` interface, do you think it's stable enough or do you perhaps need to make some changes in the near future? If it's the later, `UnstableApi` may be better than making it an `ExtensionPoint` already.
   
   > Similar configuration can be added to queries; e.g., to allow alternative indexing methods in group-by queries (such as in `GroupByQueryHelper`)
   
   `GroupByQueryHelper` is used in only groupBy v1 which is deprecated in favor of groupBy v2. I'm not sure how much it's slower compared to groupBy v2 or how much using Oak can help with its performance. You may want to look into concurrent processing in groupBy v2 more closely such as `ConcurrentGrouper`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] liran-funaro edited a comment on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
liran-funaro edited a comment on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-704079912


   @jihoonson @a2l007 
   As per @Eshcar suggestion, I moved some of the refactorings in the linked PR (#10335) to an independent (much smaller) PR.
   Please checkout out #10478.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] liran-funaro edited a comment on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
liran-funaro edited a comment on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-704079912


   @jihoonson @a2l007 
   As per @Eshcar suggestion, I moved some of the refactorings the linked PR (#10335) to an independent (much smaller) PR.
   Please checkout out #10478.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] liran-funaro commented on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
liran-funaro commented on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-696584945


   Great! The PR (#10335) is now ready for review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] liran-funaro commented on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
liran-funaro commented on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-704079912


   @jihoonson @a2l007 
   As per @Eshcar suggestion, I moved some of the refactorings in this RR to an independent (much smaller) PR.
   Please checkout out #10478.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson closed issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
jihoonson closed issue #10321:
URL: https://github.com/apache/druid/issues/10321


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] liran-funaro commented on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
liran-funaro commented on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-693436288


   @jihoonson Thanks! We hope we will get to evaluate queries in 2021, but for now, ingestion is our main focus.
   
   I updated the proposal as per your suggestion. I also updated the linked WIP PR (#10335).
   
   Do you have any suggestions on how to move this forward?
   Do we need more reviewers for the design/PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] liran-funaro commented on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
liran-funaro commented on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-691852058


   @jihoonson Thank you for taking the time to review this proposal.
   
   >I'm assuming that those knobs will have default values and users usually don't have to worry about them. Is this right?
   
   Yes. They will be `@Nullable` and will be set to the default value in that case. E.g., for `OffheapIncrementalIndex.Spec`:
   ```java
       public Spec(
           final @JsonProperty("bufferSize") @Nullable Integer bufferSize,
           final @JsonProperty("cacheSize") @Nullable Integer cacheSize
       )
       {
         this.bufferSize = bufferSize != null && bufferSize > 0 ? bufferSize : DEFAULT_BUFFER_SIZE;
         this.cacheSize = cacheSize != null && cacheSize > this.bufferSize ? cacheSize : DEFAULT_CACHE_SIZE;
         // ...
       }
   ```
   
   > do you think it's stable enough or do you perhaps need to make some changes in the near future?
   
   Using the `@UnstableApi` seems like a good idea at this point.
   
   > I'm not sure how much it's slower compared to groupBy v2 or how much using Oak can help with its performance. You may want to look into concurrent processing in groupBy v2 more closely such as `ConcurrentGrouper`.
   
   I'll look into that. Improving query performance is on our agenda, and we considered comparing the groupBy v1 with Oak to the groupBy v2 mechanism. We will report our results in a different issue in the future.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] Eshcar commented on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
Eshcar commented on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-703596218


   Reviewed PR #10335 
   I suggest to split it into 2 PRs:
   1) all refactoring replacing method/class names etc - this incur changing >100 files but has no logical/operational affect
   2) the changes that actually add the extension point - this will only change few to dozen files and will make it easier for the reviewer to give feedback
   
   Anyway, this change (as one PR or split into 2 PRs) should not affect the normal execution of ingestion/queries; it is only meant to enable future flexibility in supporting external (pluggable) incremental index implementations.
   I would suggest to adopt it as is.
   
   Would appreciate if others in the community can take a look and support the PR
   @jihoonson @a2l007 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-692390625


   @liran-funaro thanks for your answer.
   
   > I'll look into that. Improving query performance is on our agenda, and we considered comparing the groupBy v1 with Oak to the groupBy v2 mechanism. We will report our results in a different issue in the future.
   
   Sounds good. I'm looking forward to seeing your report. :slightly_smiling_face: 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] liran-funaro commented on issue #10321: [Proposal] Design for a Configurable Index Type

Posted by GitBox <gi...@apache.org>.
liran-funaro commented on issue #10321:
URL: https://github.com/apache/druid/issues/10321#issuecomment-696584945


   Great! The PR (#10335) is now ready for review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org