Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/09/23 07:02:31 UTC

[GitHub] [incubator-druid] jihoonson opened a new pull request #8571: Use hash of Segment IDs instead of a list of explicit segments in auto compaction

URL: https://github.com/apache/incubator-druid/pull/8571
 
 
   ### Description
   
   Currently, when the coordinator issues a compaction task during auto compaction, it explicitly specifies the list of segments to compact. The compaction task uses this list to validate that the given segments are still the most recent ones.
   
   This can produce a very large compaction task spec, potentially larger than ZooKeeper's maximum znode size. To avoid this problem, auto compaction supports the `maxNumSegmentsToCompact` configuration, which limits the number of segments compacted together. However, this approach means that auto compaction cannot compact an interval that contains too many segments.
   
   This PR avoids this issue by using a hash of segment IDs, instead of the list of segments, to validate input segments. It also includes the changes below.
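   
   As a rough illustration of the idea, the sketch below hashes a sorted list of segment IDs and compares the result against a previously recorded hash. It is only a sketch under stated assumptions: the class `SegmentIdHash`, its methods, the choice of SHA-256, and the length-prefixed encoding are all hypothetical and are not taken from this PR.
   
   ```java
   import java.nio.ByteBuffer;
   import java.nio.charset.StandardCharsets;
   import java.security.MessageDigest;
   import java.security.NoSuchAlgorithmException;
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   
   // Hypothetical helper for illustration only; not a class from this PR.
   final class SegmentIdHash {
     private SegmentIdHash() {}
   
     // Returns a hex-encoded SHA-256 digest computed over the sorted segment IDs,
     // so the result does not depend on the order in which IDs are supplied.
     static String of(List<String> segmentIds) {
       List<String> sorted = new ArrayList<>(segmentIds);
       Collections.sort(sorted);
       try {
         MessageDigest digest = MessageDigest.getInstance("SHA-256");
         for (String id : sorted) {
           byte[] bytes = id.getBytes(StandardCharsets.UTF_8);
           // Length-prefix each ID so ["ab", "c"] and ["a", "bc"] hash differently.
           digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
           digest.update(bytes);
         }
         StringBuilder hex = new StringBuilder();
         for (byte b : digest.digest()) {
           hex.append(String.format("%02x", b));
         }
         return hex.toString();
       } catch (NoSuchAlgorithmException e) {
         // SHA-256 must be supported by every Java platform implementation.
         throw new IllegalStateException(e);
       }
     }
   
     // True if the segments currently visible produce the hash recorded earlier.
     static boolean matches(String expectedHash, List<String> currentSegmentIds) {
       return expectedHash.equals(of(currentSegmentIds));
     }
   }
   ```
   
   With this kind of scheme, the value carried in the task spec has a fixed length (64 hex characters here) no matter how many segments fall in the interval, whereas an explicit list grows with the number of segments.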
   
   #### New IOConfig for compaction task
   
   The compaction task now requires an `ioConfig`, in which you can set an `inputSpec`. An example `ioConfig` is:
   
   ```json
     "ioConfig" : {
       "type": "compact",
       "inputSpec": {
         "type": "interval",
         "interval": "2017-01-01/2018-01-01"
       }
     }
   ```
   
   There are currently two types of `inputSpec`: `interval` and `segments`.
   
   ```json
       "inputSpec": {
         "type": "interval",
         "interval": "2017-01-01/2018-01-01"
       }
   ```
   
   ```json
       "inputSpec": {
         "type": "segments",
         "segments": ["segmentId1", "segmentId2", ...]
       }
   ```
   
   #### Using interval inputSpec for auto compaction
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/incubator-druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/incubator-druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   
   
