You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "adarshsanjeev (via GitHub)" <gi...@apache.org> on 2023/03/09 11:21:07 UTC

[GitHub] [druid] adarshsanjeev opened a new pull request, #13909: Add segment generator counters to MSQ reports

adarshsanjeev opened a new pull request, #13909:
URL: https://github.com/apache/druid/pull/13909

   Adds segment generator counters as a new type of counters for the MSQ report. These currently tracks the number of rows processed by the segment generator. These could be used to display a more accurate progress bar on the web console for the segment generation stage.
   
   <hr>
   
   This PR has:
   
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] adarshsanjeev commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1473168002

   @vogievetsky I wasn't able to reproduce this locally with the query supplied. I have also rebased with master, do you still see any issues in your build now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "vogievetsky (via GitHub)" <gi...@apache.org>.
vogievetsky commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1134563426


##########
docs/multi-stage-query/api.md:
##########
@@ -528,6 +530,10 @@ The response shows an example report for a query.
               "frames": [
                 73
               ]
+            },
+            "segmentGenerationProgress": {
+              "type": "segmentGenerationProgress",

Review Comment:
   @cryptoe I do not understand your suggestion. Could you please provide an example of a JSON so help illustrate your point?
   
   The current structure seems good to me although maybe I am missing something.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] adarshsanjeev commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1131296561


##########
docs/multi-stage-query/api.md:
##########
@@ -528,6 +530,10 @@ The response shows an example report for a query.
               "frames": [
                 73
               ]
+            },
+            "segmentGenerationProgress": {
+              "type": "segmentGenerationProgress",

Review Comment:
   Yes, I think it would cause a problem since we already use that name. I think ideally we should keep the names unique, to avoid any exceptions, since we cast to the child class after fetching based on channel name. 
   If we want "output" in the name to make it clear, we could rename this to "segmentGenerationOutput"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] adarshsanjeev commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1139961163


##########
server/src/main/java/org/apache/druid/segment/realtime/FireDepartmentMetrics.java:
##########
@@ -113,6 +114,9 @@ public void incrementMergeTimeMillis(long millis)
   {
     mergeTimeMillis.addAndGet(millis);
   }
+  public void incrementMergeRows(long rows) {

Review Comment:
   Added a comment to the variable



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1135041472


##########
docs/multi-stage-query/api.md:
##########
@@ -528,6 +530,10 @@ The response shows an example report for a query.
               "frames": [
                 73
               ]
+            },
+            "segmentGenerationProgress": {
+              "type": "segmentGenerationProgress",

Review Comment:
   I was thinking something like this: 
   ```json
   {
    "1": { 
               "input0": {
                 "type": "channel",
                 "rows": [
                   465346
                 ],
                 "files": [
                   1
                 ],
                 "totalFiles": [
                   1
                 ]
               },
               "output": {
                 "type": "segmentGenerationProgress",
                 "rowsProcessed": 465346 
              }
          }
   }
   
   ```
   
   but I think that would complicate serde stuff since we already have a type:"channel" inside output. 
   
   ```json
   {
   "output":{
           "type": "channel",
           "rows": [465346],
           "files": [1],
      }
   }
   
   ```
   
   I think the correct structure should be to remove the "type" but that should be outside the scope of this PR.
   ```json
   {
   "output":{
     "channel":{
        "foo":"bar"
        }
     "segmentGenerationProgress": {
        "foo1":"bar1"
      }
    }
   }
     
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "vogievetsky (via GitHub)" <gi...@apache.org>.
vogievetsky commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1479784620

   Oh cool, I did not realize we push as we merge! (Yes I am doing local deep storage)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] adarshsanjeev commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1131318645


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/counters/SegmentGenerationProgressCounter.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.counters;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.errorprone.annotations.concurrent.GuardedBy;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Counters for segment generation phase. Created by {@link CounterTracker#segmentGenerationProgress()}.
+ */
+public class SegmentGenerationProgressCounter implements QueryCounter
+{
+  @GuardedBy("this")
+  private long rowsProcessed = 0L;
+
+  public void incrementRowProcessedCount()
+  {
+    synchronized (this) {
+      this.rowsProcessed += 1;
+    }
+  }
+
+  @Override
+  @Nullable
+  public QueryCounterSnapshot snapshot()
+  {
+    synchronized (this) {
+      return new Snapshot(rowsProcessed);
+    }
+  }
+
+  @JsonTypeName("segmentGenerationProgress")
+  public static class Snapshot implements QueryCounterSnapshot
+  {
+    private final long rowsProcessed;
+
+    @JsonCreator
+    public Snapshot(@JsonProperty("rowsProcessed") final long rowsProcessed)
+    {
+      this.rowsProcessed = rowsProcessed;
+    }
+
+    @JsonProperty(value = "rowsProcessed")
+    @JsonInclude(JsonInclude.Include.NON_NULL)

Review Comment:
   Thanks, I missed this. Changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1135041472


##########
docs/multi-stage-query/api.md:
##########
@@ -528,6 +530,10 @@ The response shows an example report for a query.
               "frames": [
                 73
               ]
+            },
+            "segmentGenerationProgress": {
+              "type": "segmentGenerationProgress",

Review Comment:
   I was thinking something like this: 
   ```json
   {
    "1": { 
               "input0": {
                 "type": "channel",
                 "rows": [
                   465346
                 ],
                 "files": [
                   1
                 ],
                 "totalFiles": [
                   1
                 ]
               },
               "output": {
                 "type": "segmentGenerationProgress",
                 "rowsProcessed": 465346 
              }
          }
   }
   
   ```
   
   but I think that would complicate what type of output counter since we have a type:"channel" inside output. 
   
   I think the correct structure should be to remove the "type" but that should be outside the scope of this PR.
   ```json
   {
   "output":{
     "channel":{
        "foo":"bar"
        }
     "segmentGenerationProgress": {
        "foo1":"bar1"
      }
    }
   }
     
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1467447688

   @adarshsanjeev Looks like there is a checkstyle failure. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1135041472


##########
docs/multi-stage-query/api.md:
##########
@@ -528,6 +530,10 @@ The response shows an example report for a query.
               "frames": [
                 73
               ]
+            },
+            "segmentGenerationProgress": {
+              "type": "segmentGenerationProgress",

Review Comment:
   I was thinking something like this: 
   ```json
   {
    "1": { 
               "input0": {
                 "type": "channel",
                 "rows": [
                   465346
                 ],
                 "files": [
                   1
                 ],
                 "totalFiles": [
                   1
                 ]
               },
               "output": {
                 "type": "segmentGenerationProgress",
                 "rowsProcessed": 465346 
              }
          }
   }
   
   ```
   
   but I think they would complicate what type of output counter since we have a type:"channel inside output". 
   
   I think the correct structure should be to remove the "type" but that should be outside the scope of this PR.
   ```json
   {
   "output":{
     "channel":{
        "foo":"bar"
        }
     "segmentGenerationProgress": {
        "foo1":"bar1"
      }
    }
   }
     
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] adarshsanjeev commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1478937914

   I think this might be because we push to deep storage right after we merge, so the time where they are different is pretty narrow, especially since using local disk as deep storage might be cheaper than normal (if that is how you're running it). Persist and push steps have two different executors processing it, so we might see more of a difference there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1131055875


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/counters/SegmentGenerationProgressCounter.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.counters;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.errorprone.annotations.concurrent.GuardedBy;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Counters for segment generation phase. Created by {@link CounterTracker#segmentGenerationProgress()}.
+ */
+public class SegmentGenerationProgressCounter implements QueryCounter
+{
+  @GuardedBy("this")
+  private long rowsProcessed = 0L;
+
+  public void incrementRowProcessedCount()
+  {
+    synchronized (this) {
+      this.rowsProcessed += 1;
+    }
+  }
+
+  @Override
+  @Nullable
+  public QueryCounterSnapshot snapshot()
+  {
+    synchronized (this) {
+      return new Snapshot(rowsProcessed);
+    }
+  }
+
+  @JsonTypeName("segmentGenerationProgress")
+  public static class Snapshot implements QueryCounterSnapshot
+  {
+    private final long rowsProcessed;
+
+    @JsonCreator
+    public Snapshot(@JsonProperty("rowsProcessed") final long rowsProcessed)
+    {
+      this.rowsProcessed = rowsProcessed;
+    }
+
+    @JsonProperty(value = "rowsProcessed")
+    @JsonInclude(JsonInclude.Include.NON_NULL)

Review Comment:
   Since its a long, do we need a non null check?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1131055875


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/counters/SegmentGenerationProgressCounter.java:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.counters;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.errorprone.annotations.concurrent.GuardedBy;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Counters for segment generation phase. Created by {@link CounterTracker#segmentGenerationProgress()}.
+ */
+public class SegmentGenerationProgressCounter implements QueryCounter
+{
+  @GuardedBy("this")
+  private long rowsProcessed = 0L;
+
+  public void incrementRowProcessedCount()
+  {
+    synchronized (this) {
+      this.rowsProcessed += 1;
+    }
+  }
+
+  @Override
+  @Nullable
+  public QueryCounterSnapshot snapshot()
+  {
+    synchronized (this) {
+      return new Snapshot(rowsProcessed);
+    }
+  }
+
+  @JsonTypeName("segmentGenerationProgress")
+  public static class Snapshot implements QueryCounterSnapshot
+  {
+    private final long rowsProcessed;
+
+    @JsonCreator
+    public Snapshot(@JsonProperty("rowsProcessed") final long rowsProcessed)
+    {
+      this.rowsProcessed = rowsProcessed;
+    }
+
+    @JsonProperty(value = "rowsProcessed")
+    @JsonInclude(JsonInclude.Include.NON_NULL)

Review Comment:
   Since its a long, can this ever by null ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1131066213


##########
docs/multi-stage-query/api.md:
##########
@@ -528,6 +530,10 @@ The response shows an example report for a query.
               "frames": [
                 73
               ]
+            },
+            "segmentGenerationProgress": {
+              "type": "segmentGenerationProgress",

Review Comment:
   Should the counter should have the key "output" and then output type is "segmentGenerationProgress"
   Does that interfere with the current metrics ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "vogievetsky (via GitHub)" <gi...@apache.org>.
vogievetsky commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1476846327

   Side note question: should the UI show just `rowsMerged` or should it show `rowsPersisted` also (At this point I think there is no point showing `rowsProcessed` anywhere in the UI)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1139828353


##########
server/src/main/java/org/apache/druid/segment/realtime/appenderator/BatchAppenderator.java:
##########
@@ -785,6 +785,9 @@ private DataSegment mergeAndPush(
         for (FireHydrant fireHydrant : sink) {
           Pair<ReferenceCountingSegment, Closeable> segmentAndCloseable = fireHydrant.getAndIncrementSegment();
           final QueryableIndex queryableIndex = segmentAndCloseable.lhs.asQueryableIndex();
+          if (queryableIndex != null) {

Review Comment:
   Why would the queryable index be empty here. 
   Would it cause problems in the indexes.add() method ?



##########
server/src/main/java/org/apache/druid/segment/realtime/appenderator/BatchAppenderator.java:
##########
@@ -785,6 +785,9 @@ private DataSegment mergeAndPush(
         for (FireHydrant fireHydrant : sink) {
           Pair<ReferenceCountingSegment, Closeable> segmentAndCloseable = fireHydrant.getAndIncrementSegment();
           final QueryableIndex queryableIndex = segmentAndCloseable.lhs.asQueryableIndex();
+          if (queryableIndex != null) {

Review Comment:
   Why would the queryable index be null here. 
   Would it cause problems in the indexes.add() method ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] adarshsanjeev commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1141664956


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/SegmentGeneratorFrameProcessor.java:
##########
@@ -202,7 +211,14 @@ private void addFrame(final Frame frame)
 
             try {
               rowsWritten++;
+              segmentGenerationProgressCounter.incrementRowProcessedCount();
               appenderator.add(segmentIdWithShardSpec, inputRow, null);
+              if (rowsWritten - lastCounterUpdatedRows > 1000) {

Review Comment:
   Changed!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] adarshsanjeev commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1477411756

   I think the rowsPushed metrics makes sense. One thing I want to mention is that the push to deep storage happens in the same function shortly after. So, in a lot of cases, rowsMerged would equal rowsPushed (like in a local deployment). This would probably help with being able to tell from just the counters that the appenderator is stuck in retry pushing the segment to deep storage, so that would be neat!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] adarshsanjeev commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1139970789


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/SegmentGeneratorFrameProcessor.java:
##########
@@ -202,7 +211,14 @@ private void addFrame(final Frame frame)
 
             try {
               rowsWritten++;
+              segmentGenerationProgressCounter.incrementRowProcessedCount();
               appenderator.add(segmentIdWithShardSpec, inputRow, null);
+              if (rowsWritten - lastCounterUpdatedRows > 1000) {

Review Comment:
   I have made the change. I think this would still end up happening to a small extent, if say some rows are not persisted by the time the last rows are processed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "vogievetsky (via GitHub)" <gi...@apache.org>.
vogievetsky commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1477277342

   I second the `rowsPushed` I think the best UI would have 3 lines for the segmentGenerator step:
   
   ```
   Input
   Rows merged
   Rows pushed
   ```
   
   That would be 💯 
   
   Note that I am imagining rows pushed to increment by the number of rows in the segment when it is done pushing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "vogievetsky (via GitHub)" <gi...@apache.org>.
vogievetsky commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1478922093

   I am trying out the latest...
   
   <img width="983" alt="image" src="https://user-images.githubusercontent.com/177816/226806527-7ec54ba3-deb8-48c6-8b33-a518b3e1bc00.png">
   
   maybe I misunderstood something. It appears that `rowsMerged` and `rowsPushed` are changing in tandem. I though that `rowsPushed` would increment only when the segment is pushed to deep storage. so like it would be 0 for a long time and then jump up by segment rows. What am I missing?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] adarshsanjeev commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1139960535


##########
server/src/main/java/org/apache/druid/segment/realtime/appenderator/BatchAppenderator.java:
##########
@@ -785,6 +785,9 @@ private DataSegment mergeAndPush(
         for (FireHydrant fireHydrant : sink) {
           Pair<ReferenceCountingSegment, Closeable> segmentAndCloseable = fireHydrant.getAndIncrementSegment();
           final QueryableIndex queryableIndex = segmentAndCloseable.lhs.asQueryableIndex();
+          if (queryableIndex != null) {

Review Comment:
   I think the getter above would return null if the queryable index is closed. I think we gracefully handle this elsewhere, I haven't changed that part of the code. This is just to avoid throwing an NPE here if we do run into this problem, since we are just reporting metrics here.



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/counters/SegmentGenerationProgressCounter.java:
##########
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.counters;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.errorprone.annotations.concurrent.GuardedBy;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Counters for segment generation phase. Created by {@link CounterTracker#segmentGenerationProgress()}.
+ */
+public class SegmentGenerationProgressCounter implements QueryCounter
+{
+  @GuardedBy("this")
+  private long rowsProcessed = 0L;
+
+  @GuardedBy("this")
+  private long rowsPersisted = 0L;
+
+  @GuardedBy("this")
+  private long rowsMerged = 0L;
+
+  public void incrementRowProcessedCount()

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1141420148


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/SegmentGeneratorFrameProcessor.java:
##########
@@ -202,7 +211,14 @@ private void addFrame(final Frame frame)
 
             try {
               rowsWritten++;
+              segmentGenerationProgressCounter.incrementRowProcessedCount();
               appenderator.add(segmentIdWithShardSpec, inputRow, null);
+              if (rowsWritten - lastCounterUpdatedRows > 1000) {

Review Comment:
   This would be really weird behavior for the end user. 
   Should we create a class that extends FireDepartMetric and then update the taskReport as and when relevant calls are made ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1142019488


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/counters/SegmentGeneratorMetricsWrapper.java:
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.counters;
+
+import org.apache.druid.segment.realtime.FireDepartmentMetrics;
+
+/**
+ * Wrapper around {@link FireDepartmentMetrics} which updates the progress counters while updating its metrics. This
+ * is necessary as the {@link org.apache.druid.segment.realtime.appenderator.BatchAppenderator} used by the
+ * {@link org.apache.druid.msq.indexing.SegmentGeneratorFrameProcessor} is not part of the MSQ extension, and hence,
+ * cannot update the counters used in MSQ reports as it persists and pushes segments to deep storage.
+ */
+public class SegmentGeneratorMetricsWrapper extends FireDepartmentMetrics
+{
+  SegmentGenerationProgressCounter segmentGenerationProgressCounter;

Review Comment:
   Lets make this final and private ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1142828341


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/counters/SegmentGenerationProgressCounter.java:
##########
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.counters;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.errorprone.annotations.concurrent.GuardedBy;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Counters for segment generation phase. Created by {@link CounterTracker#segmentGenerationProgress()}.
+ */
+public class SegmentGenerationProgressCounter implements QueryCounter
+{
+  @GuardedBy("this")
+  private long rowsProcessed = 0L;
+
+  @GuardedBy("this")
+  private long rowsPersisted = 0L;
+
+  @GuardedBy("this")
+  private long rowsMerged = 0L;
+
+  public void incrementRowProcessed(long rowsProcessed)

Review Comment:
   `incrementRowsProcessed` (spelling)



##########
server/src/main/java/org/apache/druid/segment/realtime/appenderator/BatchAppenderator.java:
##########
@@ -785,6 +785,9 @@ private DataSegment mergeAndPush(
         for (FireHydrant fireHydrant : sink) {
           Pair<ReferenceCountingSegment, Closeable> segmentAndCloseable = fireHydrant.getAndIncrementSegment();
           final QueryableIndex queryableIndex = segmentAndCloseable.lhs.asQueryableIndex();
+          if (queryableIndex != null) {
+            metrics.incrementMergeRows(queryableIndex.getNumRows());

Review Comment:
   This isn't the right place to increment the number of rows merged; it's happening _before_ merge so it's really the number of rows that are about-to-be-merged. If you want the number of rows actually merged, the hook point should be after `mergeQueryableIndex` is done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "vogievetsky (via GitHub)" <gi...@apache.org>.
vogievetsky commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1478922615

   Btw also I love the change to rowsMerged, it now lags quite a bit behind input so it really shows work being done!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "vogievetsky (via GitHub)" <gi...@apache.org>.
vogievetsky commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1478342920

   `rowsPushed` would lag behind `rowsMerged` by however much time it takes to push to deep storage which might take some meaningful time in S3 for larger segments. In any case I think it would look great and be very informative.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "vogievetsky (via GitHub)" <gi...@apache.org>.
vogievetsky commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1476577521

   Looking


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm merged pull request #13909: Add segment generator counters to MSQ reports

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm merged PR #13909:
URL: https://github.com/apache/druid/pull/13909


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] adarshsanjeev commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "adarshsanjeev (via GitHub)" <gi...@apache.org>.
adarshsanjeev commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1131296561


##########
docs/multi-stage-query/api.md:
##########
@@ -528,6 +530,10 @@ The response shows an example report for a query.
               "frames": [
                 73
               ]
+            },
+            "segmentGenerationProgress": {
+              "type": "segmentGenerationProgress",

Review Comment:
   Yes, I think it would cause a problem since we already use that name. I think ideally we should keep the names unique, to avoid any exceptions, since we cast to the child class after fetching based on channel name. We could resolve this case, but the code might get more complicated. 
   If we want "output" in the name to make it clear, we could rename this to "segmentGenerationOutput"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1477224046

   In addition to the comments I left above, IMO it'd also be good to add `rowsPushed`, so we can capture the zip-and-push-to-deep-storage step as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] vogievetsky commented on pull request #13909: Add segment generator counters to MSQ reports

Posted by "vogievetsky (via GitHub)" <gi...@apache.org>.
vogievetsky commented on PR #13909:
URL: https://github.com/apache/druid/pull/13909#issuecomment-1466890449

   From an initial look at the API it seems perfectly fine, there is a new counter type that appears on segment generation and reports the progress in rows. Seems straight forward enough. Let me try to integrate this into the console it should only take a tiny amount of time (if not then something is wrong!)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1135041472


##########
docs/multi-stage-query/api.md:
##########
@@ -528,6 +530,10 @@ The response shows an example report for a query.
               "frames": [
                 73
               ]
+            },
+            "segmentGenerationProgress": {
+              "type": "segmentGenerationProgress",

Review Comment:
   I was thinking something like this: 
   ```json
   {
    "1": { 
               "input0": {
                 "type": "channel",
                 "rows": [
                   465346
                 ],
                 "files": [
                   1
                 ],
                 "totalFiles": [
                   1
                 ]
               },
               "output": {
                 "type": "segmentGenerationProgress",
                 "rowsProcessed": 465346 
              }
          }
   }
   
   ```
   
   but I think they would complicate what type of output counter since we have a type:"channel inside output". 
   
   I think the correct structure should be but that should be outside the scope of this PR.
   ```json
   {
   "output":{
     "channel":{
        "foo":"bar"
        }
     "segmentGenerationProgress": {
        "foo1":"bar1"
      }
    }
   }
     
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] cryptoe commented on a diff in pull request #13909: Add segment generator counters to MSQ reports

Posted by "cryptoe (via GitHub)" <gi...@apache.org>.
cryptoe commented on code in PR #13909:
URL: https://github.com/apache/druid/pull/13909#discussion_r1139817347


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/counters/SegmentGenerationProgressCounter.java:
##########
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.msq.counters;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.errorprone.annotations.concurrent.GuardedBy;
+
+import javax.annotation.Nullable;
+import java.util.Objects;
+
+/**
+ * Counters for segment generation phase. Created by {@link CounterTracker#segmentGenerationProgress()}.
+ */
+public class SegmentGenerationProgressCounter implements QueryCounter
+{
+  @GuardedBy("this")
+  private long rowsProcessed = 0L;
+
+  @GuardedBy("this")
+  private long rowsPersisted = 0L;
+
+  @GuardedBy("this")
+  private long rowsMerged = 0L;
+
+  public void incrementRowProcessedCount()

Review Comment:
   Can we take a lock once and increment all three metrics ?



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/SegmentGeneratorFrameProcessor.java:
##########
@@ -202,7 +211,14 @@ private void addFrame(final Frame frame)
 
             try {
               rowsWritten++;
+              segmentGenerationProgressCounter.incrementRowProcessedCount();
               appenderator.add(segmentIdWithShardSpec, inputRow, null);
+              if (rowsWritten - lastCounterUpdatedRows > 1000) {

Review Comment:
   The the current approach, you will miss the last batch rite ?
   So if total rows are 1950, 
   you would only have metrics for the first 1000 rows. 
   
   



##########
server/src/main/java/org/apache/druid/segment/realtime/appenderator/BatchAppenderator.java:
##########
@@ -785,6 +785,9 @@ private DataSegment mergeAndPush(
         for (FireHydrant fireHydrant : sink) {
           Pair<ReferenceCountingSegment, Closeable> segmentAndCloseable = fireHydrant.getAndIncrementSegment();
           final QueryableIndex queryableIndex = segmentAndCloseable.lhs.asQueryableIndex();
+          if (queryableIndex != null) {

Review Comment:
   Why would the queryable index by empty here. 
   Would it cause problems in the indexes.add() method ?



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/SegmentGeneratorFrameProcessor.java:
##########
@@ -202,7 +211,14 @@ private void addFrame(final Frame frame)
 
             try {
               rowsWritten++;
+              segmentGenerationProgressCounter.incrementRowProcessedCount();
               appenderator.add(segmentIdWithShardSpec, inputRow, null);
+              if (rowsWritten - lastCounterUpdatedRows > 1000) {

Review Comment:
   Lets increment stuff on each added row since we are fetching from an atomic long and the slowest operation would be the merging step anyway. 



##########
server/src/main/java/org/apache/druid/segment/realtime/FireDepartmentMetrics.java:
##########
@@ -113,6 +114,9 @@ public void incrementMergeTimeMillis(long millis)
   {
     mergeTimeMillis.addAndGet(millis);
   }
+  public void incrementMergeRows(long rows) {

Review Comment:
   Can you please document what merge rows mean's ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org