You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/21 23:33:40 UTC

[GitHub] [druid] maytasm opened a new pull request #10413: Add is_compacted to sys.segments table

maytasm opened a new pull request #10413:
URL: https://github.com/apache/druid/pull/10413


   Add is_compacted to sys.segments table
   
   ### Description
   
   Segments being compacted or not compacted have a big impact on query performance. 
   Currently it is not easy to verify questions like:
   - if performance of query can be improve by compaction 
   - if a performance of query is bad, was it because segments for that interval was not compacted.
   
   This PR expose the information on whether segments is compacted or not via the sys.segments table (so user can do query with slice/dice, group, time filter, etc) by using the lastCompactionState stored within the segment metadata. If a compactionState exist then the segment was compacted. This PR also change storing lastCompactionState as default for manual compaction in addition to auto compaction.
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493235894



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java
##########
@@ -422,6 +422,7 @@ ParallelIndexSupervisorTask newTask(String taskId, ParallelIndexIngestionSpec in
   {
     final Map<String, Object> newContext = new HashMap<>(getContext());
     newContext.put(CTX_KEY_APPENDERATOR_TRACKING_TASK_ID, getId());
+    newContext.put(CompactSegments.STORE_COMPACTION_STATE_KEY, getContextValue(Tasks.STORE_COMPACTION_STATE_KEY, true));

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #10413:
URL: https://github.com/apache/druid/pull/10413#issuecomment-696981469


   Added Release Notes because it changed the default behavior of `Tasks.DEFAULT_STORE_COMPACTION_STATE`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493207542



##########
File path: docs/querying/sql.md
##########
@@ -1086,6 +1086,7 @@ Segments table provides details on all Druid segments, whether they are publishe
 |shardSpec|STRING|The toString of specific `ShardSpec`|
 |dimensions|STRING|The dimensions of the segment|
 |metrics|STRING|The metrics of the segment|
+|last_compaction_state|STRING|The configurations of the compaction task which created this segment. May be null if segment was not created by compaction task.|
 
 For example to retrieve all segments for datasource "wikipedia", use the query:

Review comment:
       Done. It's very basic example but any one with proficient SQL can probably think of better things to query.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #10413:
URL: https://github.com/apache/druid/pull/10413#issuecomment-696981469


   Added Release Notes because it changed the default behavior of `Tasks.DEFAULT_STORE_COMPACTION_STATE`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #10413:
URL: https://github.com/apache/druid/pull/10413#issuecomment-698005053


   I'll make the change with re-write to docs in a separate PR. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm merged pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
maytasm merged pull request #10413:
URL: https://github.com/apache/druid/pull/10413


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493782020



##########
File path: docs/querying/sql.md
##########
@@ -1108,6 +1108,18 @@ GROUP BY 1
 ORDER BY 2 DESC
 ```
 
+If you want to retrieve segment that was compacted (ANY compaction):
+
+```sql
+SELECT * FROM sys.segments WHERE last_compaction_state is not null
+```
+
+or if you want to retrieve segment that was compacted only by a particular compaction spec (such as that of the auto compaction):

Review comment:
       ```suggestion
   Sometimes, you may have updated your compaction spec to a more optimal spec. In these cases you can find which segments are using the latest compaction spec using a query like the one below:
   ```
   
   Can you also add instructions on how to produce the `last_compaction_state` blob from the web console. If I'm looking at the compaction config for a datasource - can I copy something to paste in the where clause here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493031245



##########
File path: core/src/main/java/org/apache/druid/timeline/DataSegment.java
##########
@@ -100,7 +100,7 @@
   /**
    * Stores some configurations of the compaction task which created this segment.
    * This field is filled in the metadata store only when "storeCompactionState" is set true in the context of the
-   * compaction task which is false by default.
+   * compaction task (true by default).

Review comment:
       super nit - to link to where the default is.
   ```suggestion
      * compaction task (true by default). {@link org.apache.druid.indexing.common.task.Tasks#DEFAULT_STORE_COMPACTION_STATE}.
   ```

##########
File path: docs/querying/sql.md
##########
@@ -1086,6 +1086,7 @@ Segments table provides details on all Druid segments, whether they are publishe
 |shardSpec|STRING|The toString of specific `ShardSpec`|
 |dimensions|STRING|The dimensions of the segment|
 |metrics|STRING|The metrics of the segment|
+|last_compaction_state|STRING|The configurations of the compaction task which created this segment. May be null if segment was not created by compaction task.|
 
 For example to retrieve all segments for datasource "wikipedia", use the query:

Review comment:
       The PR description talks about being able to identify performance issues where looking at last_compaction_state can help. Should we add some example sql statements below to explain how to identify the 3 issues outlined in the PR description? 

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java
##########
@@ -311,7 +312,8 @@ public TableType getJdbcTableType()
                 val.isOvershadowed() ? IS_OVERSHADOWED_TRUE : IS_OVERSHADOWED_FALSE,
                 segment.getShardSpec(),
                 segment.getDimensions(),
-                segment.getMetrics()
+                segment.getMetrics(),
+                segment.getLastCompactionState()

Review comment:
       `CompactionState` includes the `PartitionsSpec` and `indexSpec`. 
   
   What should a caller of the API do with these?
   The compaction state appears to be an unbounded json object. Do we foresee any issues around serializing the entire json blob every time we need to call the sys table?

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java
##########
@@ -422,6 +422,7 @@ ParallelIndexSupervisorTask newTask(String taskId, ParallelIndexIngestionSpec in
   {
     final Map<String, Object> newContext = new HashMap<>(getContext());
     newContext.put(CTX_KEY_APPENDERATOR_TRACKING_TASK_ID, getId());
+    newContext.put(CompactSegments.STORE_COMPACTION_STATE_KEY, getContextValue(Tasks.STORE_COMPACTION_STATE_KEY, true));

Review comment:
       I think you should use putIfAbsent. That will allow the user to provide a `STORE_COMPACTION_STATE_KEY` that is different from the default value of true if needed.
   
   Also nit: use a constant for the default value




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493031245



##########
File path: core/src/main/java/org/apache/druid/timeline/DataSegment.java
##########
@@ -100,7 +100,7 @@
   /**
    * Stores some configurations of the compaction task which created this segment.
    * This field is filled in the metadata store only when "storeCompactionState" is set true in the context of the
-   * compaction task which is false by default.
+   * compaction task (true by default).

Review comment:
       super nit - to link to where the default is.
   ```suggestion
      * compaction task (true by default). {@link org.apache.druid.indexing.common.task.Tasks#DEFAULT_STORE_COMPACTION_STATE}.
   ```

##########
File path: docs/querying/sql.md
##########
@@ -1086,6 +1086,7 @@ Segments table provides details on all Druid segments, whether they are publishe
 |shardSpec|STRING|The toString of specific `ShardSpec`|
 |dimensions|STRING|The dimensions of the segment|
 |metrics|STRING|The metrics of the segment|
+|last_compaction_state|STRING|The configurations of the compaction task which created this segment. May be null if segment was not created by compaction task.|
 
 For example to retrieve all segments for datasource "wikipedia", use the query:

Review comment:
       The PR description talks about being able to identify performance issues where looking at last_compaction_state can help. Should we add some example sql statements below to explain how to identify the 3 issues outlined in the PR description? 

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java
##########
@@ -311,7 +312,8 @@ public TableType getJdbcTableType()
                 val.isOvershadowed() ? IS_OVERSHADOWED_TRUE : IS_OVERSHADOWED_FALSE,
                 segment.getShardSpec(),
                 segment.getDimensions(),
-                segment.getMetrics()
+                segment.getMetrics(),
+                segment.getLastCompactionState()

Review comment:
       `CompactionState` includes the `PartitionsSpec` and `indexSpec`. 
   
   What should a caller of the API do with these?
   The compaction state appears to be an unbounded json object. Do we foresee any issues around serializing the entire json blob every time we need to call the sys table?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493205072



##########
File path: core/src/main/java/org/apache/druid/timeline/DataSegment.java
##########
@@ -100,7 +100,7 @@
   /**
    * Stores some configurations of the compaction task which created this segment.
    * This field is filled in the metadata store only when "storeCompactionState" is set true in the context of the
-   * compaction task which is false by default.
+   * compaction task (true by default).

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493085850



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java
##########
@@ -422,6 +422,7 @@ ParallelIndexSupervisorTask newTask(String taskId, ParallelIndexIngestionSpec in
   {
     final Map<String, Object> newContext = new HashMap<>(getContext());
     newContext.put(CTX_KEY_APPENDERATOR_TRACKING_TASK_ID, getId());
+    newContext.put(CompactSegments.STORE_COMPACTION_STATE_KEY, getContextValue(Tasks.STORE_COMPACTION_STATE_KEY, true));

Review comment:
       I think you should use putIfAbsent. That will allow the user to provide a `STORE_COMPACTION_STATE_KEY` that is different from the default value of true if needed.
   
   Also nit: use a constant for the default value




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493779975



##########
File path: docs/querying/sql.md
##########
@@ -1108,6 +1108,18 @@ GROUP BY 1
 ORDER BY 2 DESC
 ```
 
+If you want to retrieve segment that was compacted (ANY compaction):
+
+```sql
+SELECT * FROM sys.segments WHERE last_compaction_state is not null
+```

Review comment:
       Suggested re-write to talk about the problem an admin would be looking to solve more explicitly.
   
   ```suggestion
   If you want to know whether segments in the wikipedia table are not compacted, and therefore may be contributing to query slowness, you can run a query like this to tell you how many segments have not been compacted per interval.
   
   ```sql
   SELECT start, end, count(*) FROM sys.segments WHERE datasource = 'wikipedia' and last_compaction_state is not null group by 1, 2
   ```
   
   If you find that queries that hit these intervals are slow - consider enabling [compaction](../operations/segment-optimization.md) to produce more optimal segments.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493215560



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/schema/SystemSchema.java
##########
@@ -311,7 +312,8 @@ public TableType getJdbcTableType()
                 val.isOvershadowed() ? IS_OVERSHADOWED_TRUE : IS_OVERSHADOWED_FALSE,
                 segment.getShardSpec(),
                 segment.getDimensions(),
-                segment.getMetrics()
+                segment.getMetrics(),
+                segment.getLastCompactionState()

Review comment:
       Both `PartitionsSpec` and `indexSpec` can be helpful in determining the cause of query performance issue. We will also want to know what segments auto compaction will compact as both of those Spec determine that. 'indexSpec' can also impact query performance (see shape shifting PR). 
   
   CompactionState is a small json and is already stored in memory as part of the DataSegment. We are also returning the shardSpec so this should not be a problem. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493909582



##########
File path: docs/querying/sql.md
##########
@@ -1108,6 +1108,18 @@ GROUP BY 1
 ORDER BY 2 DESC
 ```
 
+If you want to retrieve segment that was compacted (ANY compaction):
+
+```sql
+SELECT * FROM sys.segments WHERE last_compaction_state is not null
+```

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #10413: Add last_compaction_state to sys.segments table

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #10413:
URL: https://github.com/apache/druid/pull/10413#discussion_r493909950



##########
File path: docs/querying/sql.md
##########
@@ -1108,6 +1108,18 @@ GROUP BY 1
 ORDER BY 2 DESC
 ```
 
+If you want to retrieve segment that was compacted (ANY compaction):
+
+```sql
+SELECT * FROM sys.segments WHERE last_compaction_state is not null
+```
+
+or if you want to retrieve segment that was compacted only by a particular compaction spec (such as that of the auto compaction):

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org