You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/04/28 02:35:07 UTC

[GitHub] [druid] loquisgon opened a new pull request, #12488: Emit state of replace and append for native batch tasks

loquisgon opened a new pull request, #12488:
URL: https://github.com/apache/druid/pull/12488

   Emit the state of a task, whether it is appending or it is a replace/overwrite. This is to understand the utilization of these modes of batch ingestion.
   
   This PR has:
   - [ X] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon merged pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon merged PR #12488:
URL: https://github.com/apache/druid/pull/12488


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869661721


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|

Review Comment:
   Well, this is to document the batch ingestion mode not the semantics of the method so I am not sure that the explanation for its semantics belongs here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on PR #12488:
URL: https://github.com/apache/druid/pull/12488#issuecomment-1130459366

   I don't understand the LGTM concern about the "new publishedSegmentsAndCommitMetadata". That variable existed already before I added the metric. Maybe it is complaining because the new code explicitly check that the variable is not null?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869604265


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|

Review Comment:
   1 if taskId is batch append job, 0 otherwise?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869619119


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |

Review Comment:
   Looking at code, it doesnt seem that we throw an exception in this case, but rather treat it as APPEND, since we consider a job an append job if `isAppendToExisting` is true. and short-circuit the check to `isDropExisting`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869604265


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|

Review Comment:
   1 if job referred to by taskId is batch append job, 0 otherwise? Always 1 sounds confusing, since it doesn't seem that the value of metric is always 1, correct me if wrong.



##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|

Review Comment:
   1 if job referred to by taskId is batch overwrite job, 0 otherwise? Always 1 sounds confusing, since it doesn't seem that the value of metric is always 1, correct me if wrong.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870762966


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   The invalid combination is enabled by the fact that the `IOConfig` part of the spec exposes both `dropExisting` and `appendToExisting` today. If we were to deprecate those and expose instead `batchProcessingMode` with its three possible values then this invalid state would no longer be possible. But I think that it is better to do this later if we agree it is a good idea.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r871592775


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java:
##########
@@ -422,9 +423,32 @@ public boolean isPerfectRollup()
     return tuningConfig != null && tuningConfig.isForceGuaranteedRollup();
   }
 
+  @VisibleForTesting
+  void emitCompactIngestionModeMetrics(
+      ServiceEmitter emitter,
+      boolean isDropExisting
+  )
+  {
+
+    if (emitter == null) {
+      return;
+    }
+
+    // compact does not support appendToExisting
+    if (isDropExisting) {
+      emitter.emit(buildEvent("compact/replace/count", 1));

Review Comment:
   Where are the dimensions set for this metric?



##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java:
##########
@@ -422,9 +423,32 @@ public boolean isPerfectRollup()
     return tuningConfig != null && tuningConfig.isForceGuaranteedRollup();
   }
 
+  @VisibleForTesting
+  void emitCompactIngestionModeMetrics(
+      ServiceEmitter emitter,
+      boolean isDropExisting
+  )
+  {
+
+    if (emitter == null) {
+      return;
+    }
+
+    // compact does not support appendToExisting
+    if (isDropExisting) {
+      emitter.emit(buildEvent("compact/replace/count", 1));

Review Comment:
   Where are the dimensions set for these metrics?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870644745


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |

Review Comment:
   ```suggestion
   |`isAppendToExisting` | `isDropExisting` | mode |
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869605512


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|

Review Comment:
   1 if job referred to by taskId is batch replace job, 0 otherwise?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r875446646


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   I have removed the batch type from the metric and added `taskIngestionMode` as a dimension. As a result the new metrics are much clean and the code is cleaner as well. Also, now it was simple to add the metrics to streaming as well, take a look.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r878573809


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractTask.java:
##########
@@ -219,4 +269,75 @@ protected boolean isUseMaxMemoryEstimates()
         Tasks.DEFAULT_USE_MAX_MEMORY_ESTIMATES
     );
   }
+
+  protected ServiceMetricEvent.Builder getMetricBuilder()
+  {
+    return metricBuilder;
+  }
+
+  public IngestionMode getIngestionMode()
+  {
+    return ingestionMode;
+  }
+
+  protected static IngestionMode computeIngestionMode(@Nullable CompactionIOConfig ioConfig)
+  {
+    final boolean isAppendToExisting;
+    final boolean isDropExisting;
+
+    if (ioConfig == null) {
+      isAppendToExisting = BatchIOConfig.DEFAULT_APPEND_EXISTING;
+      isDropExisting = BatchIOConfig.DEFAULT_DROP_EXISTING;
+    } else {
+      isAppendToExisting = BatchIOConfig.DEFAULT_APPEND_EXISTING;

Review Comment:
   This is specific to `CompactionIOCOnfig` which does not have a `isAppendToExisting()` flag/method. Added a comment to the code so this is more clear.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869609742


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|

Review Comment:
   Might want to add details here about the creation of tombstones for time-chunks within the date interval specified for ingestion that have no data, if in REPLACE mode, and the absence of this in OVERWRITE mode, if my understanding is correct?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869916750


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|

Review Comment:
   Oh ok, I guess the semantics are described elsewhere? If so should we link to that? not a big deal



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on PR #12488:
URL: https://github.com/apache/druid/pull/12488#issuecomment-1130722334

   This pull request **introduces 1 alert** when merging 8ba3f60d751cf63b9f4aaf65bb51e61e179d132e into 3e8d7a6d9f43f889df7d73dedb2a94e03574615e - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-6b77f56f824459a49153cf071ff3a531ff21635c)
   
   **new alerts:**
   
   * 1 for Dereferenced variable may be null


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
jon-wei commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r878537193


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractTask.java:
##########
@@ -219,4 +269,75 @@ protected boolean isUseMaxMemoryEstimates()
         Tasks.DEFAULT_USE_MAX_MEMORY_ESTIMATES
     );
   }
+
+  protected ServiceMetricEvent.Builder getMetricBuilder()
+  {
+    return metricBuilder;
+  }
+
+  public IngestionMode getIngestionMode()
+  {
+    return ingestionMode;
+  }
+
+  protected static IngestionMode computeIngestionMode(@Nullable CompactionIOConfig ioConfig)

Review Comment:
   Suggest distinguishing these two `computeIngestionMode` methods with different names since they both have nullable parameters



##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractTask.java:
##########
@@ -219,4 +269,75 @@ protected boolean isUseMaxMemoryEstimates()
         Tasks.DEFAULT_USE_MAX_MEMORY_ESTIMATES
     );
   }
+
+  protected ServiceMetricEvent.Builder getMetricBuilder()
+  {
+    return metricBuilder;
+  }
+
+  public IngestionMode getIngestionMode()
+  {
+    return ingestionMode;
+  }
+
+  protected static IngestionMode computeIngestionMode(@Nullable CompactionIOConfig ioConfig)
+  {
+    final boolean isAppendToExisting;
+    final boolean isDropExisting;
+
+    if (ioConfig == null) {
+      isAppendToExisting = BatchIOConfig.DEFAULT_APPEND_EXISTING;
+      isDropExisting = BatchIOConfig.DEFAULT_DROP_EXISTING;
+    } else {
+      isAppendToExisting = BatchIOConfig.DEFAULT_APPEND_EXISTING;

Review Comment:
   How come this uses the default from BatchIOConfig instead of using ioConfig.isAppendToExisting()



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r871587134


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|
+
+## Compaction ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/compact/ovewrite/count`|Count of `1` every time a compaction overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/compact/replace/count`|Count of `1` every time a compaction replace job runs|dataSource, taskId, taskType|Always `1`.|
+
+|`isDropExisting` | mode |

Review Comment:
   Should this table have a description? Or maybe I missed it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869605512


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|

Review Comment:
   1 if job referred to by taskId is batch replace job, 0 otherwise? Always 1 sounds confusing, since it doesn't seem that the value of metric is always 1, correct me if wrong.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869609742


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|

Review Comment:
   Might want to dad details here about the creation of tombstones for timechunks within the date interval specified for ingestion that have no data, if in REPLACE mode, and the absense of this in OVERWRITE mode, if my understanding is correct?



##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|

Review Comment:
   Might want to add details here about the creation of tombstones for timechunks within the date interval specified for ingestion that have no data, if in REPLACE mode, and the absense of this in OVERWRITE mode, if my understanding is correct?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869916750


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|

Review Comment:
   Oh ok, I guess the semantics are described elsewhere?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
jon-wei commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r875163392


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   > In a similar vein, won't these metrics be useful for streaming ingestion as well - Why not make the metric name ingest/count and then the taskType dimension tells you whether or not it is streaming or batch?
   
   I had suggested a batch-specific name when discussing with Agustin since I think the semantics of the metric are batch specific, it's being emitted once per ingest spec posted, and I'm not sure what the equivalent for streaming would be. It's a metric indicating "number of times user did some action with a specific intent" vs. "detail of execution" (like a parallel subtask count for batch or number of streaming subtasks)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r876315264


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|
+
+## Compaction ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/compact/ovewrite/count`|Count of `1` every time a compaction overwrite job runs|dataSource, taskId, taskType|Always `1`.|

Review Comment:
   Removed task type from metric.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869783472


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |

Review Comment:
   Yes it does but it was confusing where. Due to this great comment I decided to fully integrate the concept of `BatchIngestionMode` in the batch ingestion task internals replacing all the confusing tests for append or drop existing etc. There were many changes but they should straightforward to follow. The code is better now, IMO.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870748107


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   The goal of this feature is to gather data to let us know the utilization of the various modes of batch ingest, in particular `REPLACE` thus the categorization.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869682178


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java:
##########
@@ -422,9 +423,40 @@ public boolean isPerfectRollup()
     return tuningConfig != null && tuningConfig.isForceGuaranteedRollup();
   }
 
+  @VisibleForTesting
+  void emitCompactIngestionModeMetrics(
+      ServiceEmitter emitter,
+      boolean isDropExisting
+  )
+  {
+
+    if (emitter == null) {
+      return;
+    }
+
+    BatchIngestionMode mode = getBatchIngestionMode(
+        false, // compact never supports append mode
+        isDropExisting
+    );
+    // compact does not support APPEND
+    switch (mode) {
+      case REPLACE:
+        emitter.emit(buildEvent("compact/replace/count", 1));
+        break;
+      case OVERWRITE:
+        emitter.emit(buildEvent("compact/overwrite/count", 1));
+        break;
+      default:
+        throw new ISE("Invalid compact ingestion mode [%s]", mode);

Review Comment:
   Added an INVALID mode and logging a warning instead when mode is INVALID.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r878573809


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractTask.java:
##########
@@ -219,4 +269,75 @@ protected boolean isUseMaxMemoryEstimates()
         Tasks.DEFAULT_USE_MAX_MEMORY_ESTIMATES
     );
   }
+
+  protected ServiceMetricEvent.Builder getMetricBuilder()
+  {
+    return metricBuilder;
+  }
+
+  public IngestionMode getIngestionMode()
+  {
+    return ingestionMode;
+  }
+
+  protected static IngestionMode computeIngestionMode(@Nullable CompactionIOConfig ioConfig)
+  {
+    final boolean isAppendToExisting;
+    final boolean isDropExisting;
+
+    if (ioConfig == null) {
+      isAppendToExisting = BatchIOConfig.DEFAULT_APPEND_EXISTING;
+      isDropExisting = BatchIOConfig.DEFAULT_DROP_EXISTING;
+    } else {
+      isAppendToExisting = BatchIOConfig.DEFAULT_APPEND_EXISTING;

Review Comment:
   This is specific to `CompactionIOCOnfig` which does not have a `isAppendToExisting()` flag/method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r878573809


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractTask.java:
##########
@@ -219,4 +269,75 @@ protected boolean isUseMaxMemoryEstimates()
         Tasks.DEFAULT_USE_MAX_MEMORY_ESTIMATES
     );
   }
+
+  protected ServiceMetricEvent.Builder getMetricBuilder()
+  {
+    return metricBuilder;
+  }
+
+  public IngestionMode getIngestionMode()
+  {
+    return ingestionMode;
+  }
+
+  protected static IngestionMode computeIngestionMode(@Nullable CompactionIOConfig ioConfig)
+  {
+    final boolean isAppendToExisting;
+    final boolean isDropExisting;
+
+    if (ioConfig == null) {
+      isAppendToExisting = BatchIOConfig.DEFAULT_APPEND_EXISTING;
+      isDropExisting = BatchIOConfig.DEFAULT_DROP_EXISTING;
+    } else {
+      isAppendToExisting = BatchIOConfig.DEFAULT_APPEND_EXISTING;

Review Comment:
   This is specific to `CompactionIOCOnfig` which does not have a `isAppendToExisting` flag.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r878575694


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractTask.java:
##########
@@ -219,4 +269,75 @@ protected boolean isUseMaxMemoryEstimates()
         Tasks.DEFAULT_USE_MAX_MEMORY_ESTIMATES
     );
   }
+
+  protected ServiceMetricEvent.Builder getMetricBuilder()
+  {
+    return metricBuilder;
+  }
+
+  public IngestionMode getIngestionMode()
+  {
+    return ingestionMode;
+  }
+
+  protected static IngestionMode computeIngestionMode(@Nullable CompactionIOConfig ioConfig)

Review Comment:
   Good call. Done in next commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r874026708


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   Yes, I agree too. Working on it now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on PR #12488:
URL: https://github.com/apache/druid/pull/12488#issuecomment-1129579138

   This pull request **introduces 2 alerts** when merging 1622b74d33004378eddc695a67fd3c7ddbfa5c5c into 3e8d7a6d9f43f889df7d73dedb2a94e03574615e - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-f12248a49d11dd5c2793a40190e75277843606e9)
   
   **new alerts:**
   
   * 2 for Dereferenced variable may be null


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870778242


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   The `taskType` is already a dimension (i.e. for native batch types we have: `compact`, `index`, `index_parallel` and for streaming `index_kafka` , etc; for non native `index_hadoop`...) so yeah...maybe I should just drop the 'batch' from the metrics and just use the `indexType` dimension to filter...is this what you are thinking @suneet-s ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on PR #12488:
URL: https://github.com/apache/druid/pull/12488#issuecomment-1133477889

   This pull request **introduces 1 alert** when merging 59aa6d72eac3b789cca7531dff6b807e7c844cb3 into c23622790568e1fca2eeaae308903d3199093ece - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-586f320514d44e730af9793a04b07a1b4d4d5cfd)
   
   **new alerts:**
   
   * 1 for Dereferenced variable may be null


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869618005


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java:
##########
@@ -422,9 +423,40 @@ public boolean isPerfectRollup()
     return tuningConfig != null && tuningConfig.isForceGuaranteedRollup();
   }
 
+  @VisibleForTesting
+  void emitCompactIngestionModeMetrics(
+      ServiceEmitter emitter,
+      boolean isDropExisting
+  )
+  {
+
+    if (emitter == null) {
+      return;
+    }
+
+    BatchIngestionMode mode = getBatchIngestionMode(
+        false, // compact never supports append mode
+        isDropExisting
+    );
+    // compact does not support APPEND
+    switch (mode) {
+      case REPLACE:
+        emitter.emit(buildEvent("compact/replace/count", 1));
+        break;
+      case OVERWRITE:
+        emitter.emit(buildEvent("compact/overwrite/count", 1));
+        break;
+      default:
+        throw new ISE("Invalid compact ingestion mode [%s]", mode);

Review Comment:
   Should we just log error instead of possibly failing ingestion task with this thrown exception?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on PR #12488:
URL: https://github.com/apache/druid/pull/12488#issuecomment-1124155408

   @suneet-s No I have not considered `RequestLogger`, I am not really familiar with that. But I see your point. However, the whole area of enriching, cleaning, the logging, metrics, and alerts for ingestion is an area ripe for improvement. But I would suggest to think a little about a decent starting point on what to do/unify these areas. So for now, I would like to focus on the scope as defined for the work, but I would strongly recommend a more extended approach that would incorporate suggestions like the one you are making.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870761125


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |

Review Comment:
   fix in next commit



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870645701


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|
+
+## Compaction ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/compact/ovewrite/count`|Count of `1` every time a compaction overwrite job runs|dataSource, taskId, taskType|Always `1`.|

Review Comment:
   Won't taskType always be `compact` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869604854


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|

Review Comment:
   1 if job referred to by taskId is batch overwrite job, 0 otherwise?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869604265


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|

Review Comment:
   1 if job referred to by taskId is batch append job, 0 otherwise?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r876316020


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|
+
+## Compaction ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/compact/ovewrite/count`|Count of `1` every time a compaction overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/compact/replace/count`|Count of `1` every time a compaction replace job runs|dataSource, taskId, taskType|Always `1`.|
+
+|`isDropExisting` | mode |

Review Comment:
   @zachjsh  Simplified & cleaned up docs... see PR description or the `metrics.md` changes



##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   @suneet-s I have removed the batch type from the metric and added `taskIngestionMode` as a dimension. As a result the new metrics are much clean and the code is cleaner as well. Also, now it was simple to add the metrics to streaming as well, take a look.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
jon-wei commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r875163392


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   > In a similar vein, won't these metrics be useful for streaming ingestion as well - Why not make the metric name ingest/count and then the taskType dimension tells you whether or not it is streaming or batch?
   
   I had suggested a batch-specific name when discussing with Agustin since I think the semantics of the metric are batch specific, it's being emitted once per ingest spec posted, and I'm not sure what the equivalent for streaming would be. It's a metric indicating "number of times user did some action with a specific intent" vs. "detail of execution" (like a parallel subtask count for batch or number of streaming subtasks). Taking it out for a shared metric name seems fine to me though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
suneet-s commented on PR #12488:
URL: https://github.com/apache/druid/pull/12488#issuecomment-1123213056

   > The goal of this ticket is to emit Druid metrics to be able to assess the utilization of the different flavors of batch ingestion
   
   Since there are many different config flags that might be of interest, and these can change as devs add new config settings. Have you considered extending the `RequestLogger` interface to log ingestion specs as well as the current sql and native queries? This will allow cluster operators to point these logs to another system (maybe even a druid cluster) for running analytics on top of the entire ingestion spec.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870652754


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   In a similar vein, won't these metrics be useful for streaming ingestion as well - Why not make the metric name `ingest/count` and then the `taskType` dimension tells you whether or not it is streaming or batch?
   
   Also, reading these docs, it's not clear to me whether a compaction job will also be counted as a batch job



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870746559


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|
+
+## Compaction ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/compact/ovewrite/count`|Count of `1` every time a compaction overwrite job runs|dataSource, taskId, taskType|Always `1`.|

Review Comment:
   `taskType` yeah for compact is compact but for other batch can be other...so for completeness & consistency it is better to live that dimension there I think



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870745606


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |

Review Comment:
   Nice catch!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870778242


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   The `taskType` is already a dimension (i.e. for native batch types we have: `compact`, `index`, `index_parallel` and for streaming `index_kafka` , etc; for non native `index_hadoop`...) so yeah...maybe I should just drop the 'batch' (and `compact`) from the metric name and just use the `indexType` dimension to filter...is this what you are thinking @suneet-s ?



##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   The `taskType` is already a dimension (i.e. for native batch types we have: `compact`, `index`, `index_parallel` and for streaming `index_kafka` , etc; for non native `index_hadoop`...) so yeah...maybe I should just drop the `batch` (and `compact`) from the metric name and just use the `indexType` dimension to filter...is this what you are thinking @suneet-s ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on PR #12488:
URL: https://github.com/apache/druid/pull/12488#issuecomment-1133408493

   This pull request **introduces 1 alert** when merging 2945222bce706109fd756e071498bf153e0a4d17 into c23622790568e1fca2eeaae308903d3199093ece - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-15de7c2ce46c7f248a73f7424eddc53a3bc51261)
   
   **new alerts:**
   
   * 1 for Dereferenced variable may be null


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r869682178


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java:
##########
@@ -422,9 +423,40 @@ public boolean isPerfectRollup()
     return tuningConfig != null && tuningConfig.isForceGuaranteedRollup();
   }
 
+  @VisibleForTesting
+  void emitCompactIngestionModeMetrics(
+      ServiceEmitter emitter,
+      boolean isDropExisting
+  )
+  {
+
+    if (emitter == null) {
+      return;
+    }
+
+    BatchIngestionMode mode = getBatchIngestionMode(
+        false, // compact never supports append mode
+        isDropExisting
+    );
+    // compact does not support APPEND
+    switch (mode) {
+      case REPLACE:
+        emitter.emit(buildEvent("compact/replace/count", 1));
+        break;
+      case OVERWRITE:
+        emitter.emit(buildEvent("compact/overwrite/count", 1));
+        break;
+      default:
+        throw new ISE("Invalid compact ingestion mode [%s]", mode);

Review Comment:
   Integrating the `BatchIngestionMode` (see comment below) made this not required anymore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r870650751


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   Why did we decide to have a separate metric name for each type vs a single metric name with the type being a dimension?
   
   `ingest/batch/count`, `ingest/batch/segments/count` and a dimension `type` (or even 2 dimensions for `isAppendToExisting` and `isDropExisting` - in this case, you don't need to deal with the error case described on line 243)
   
   ```
   `true` | `true  ` | Invalid combination, exception thrown. |
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r874026115


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|

Review Comment:
   Yes, in the documentation for the flag `isDropExisting`. The actual names `APPEND`, `REPLACE`, and `OVERWRITE` are not formally documented and I am introducing them here. But documenting them is our of scope for this ticket I feel.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
zachjsh commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r871584118


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)

Review Comment:
   I agree with @suneet-s  suggestion to drop the batch and compact from metric name, as the task-type dimension included signifies this, and then the metric can be reused for other ingestion types, not just batch / compact.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a diff in pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
loquisgon commented on code in PR #12488:
URL: https://github.com/apache/druid/pull/12488#discussion_r876314400


##########
docs/operations/metrics.md:
##########
@@ -221,6 +221,41 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`worker/taskSlot/total/count`|Number of total task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 |`worker/taskSlot/used/count`|Number of busy task slots on the reporting worker per emission period. This metric is only available if the WorkerTaskCountStatsMonitor module is included.|category, version.|Varies.|
 
+## Batch ingestion metrics (Native parallel task)
+
+|Metric|Description|Dimensions|Normal Value|
+|------|-----------|----------|------------|
+|`ingest/batch/append/count`|Count of `1` every time a batch ingestion append job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/append/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/ovewrite/count`|Count of `1` every time a batch ingestion overwrite job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/ovewrite/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/count`|Count of `1` every time a batch ingestion replace job runs|dataSource, taskId, taskType|Always `1`.|
+|`ingest/batch/replace/segments/count`|Count of segments created |dataSource, taskId, taskType|At least `1`.|
+|`ingest/batch/replace/tombstones/count`|Count of tombstones created in this replace job|dataSource, taskId, taskType|It can be zero when replace could not find any empty time chunks in the input intervals. It cannot be more than number of segments.|
+
+`APPEND`, `OVERWRITE`, and `REPLACE` are decided using the values
+of the `isAppendToExisting` and `isDropExisting` flags in the
+task's `IOConfig` as follows:
+
+|`isAppendToExisting` | `isdDropExisting` | mode |
+|---------------------|-------------------|------|
+`true` | `false` | `APPEND`|
+`true` | `true  ` | Invalid combination, exception thrown. |
+`false` | `false` | `OVERWRITE` (this is the default for native batch ingestion). |
+`false` | `true` | `REPLACE`|

Review Comment:
   Added some explanation about what the modes mean.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #12488: Emit state of replace and append for native batch tasks

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on PR #12488:
URL: https://github.com/apache/druid/pull/12488#issuecomment-1130448409

   This pull request **introduces 1 alert** when merging c5f139f67a0043723d7bb2d266cfa97591d33d64 into 3e8d7a6d9f43f889df7d73dedb2a94e03574615e - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-7e9358a245231e4c4e1d7ba1db617af203369be5)
   
   **new alerts:**
   
   * 1 for Dereferenced variable may be null


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org