You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/06/24 19:16:38 UTC

[GitHub] [nifi] markap14 opened a new pull request, #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…

markap14 opened a new pull request, #6156:
URL: https://github.com/apache/nifi/pull/6156

   … Usage, time reading/writing content repo, process session commit time, etc. Exposed via nifi.sh diagnostics and made configurable via nifi.properties
   
   <!-- Licensed to the Apache Software Foundation (ASF) under one or more -->
   <!-- contributor license agreements.  See the NOTICE file distributed with -->
   <!-- this work for additional information regarding copyright ownership. -->
   <!-- The ASF licenses this file to You under the Apache License, Version 2.0 -->
   <!-- (the "License"); you may not use this file except in compliance with -->
   <!-- the License.  You may obtain a copy of the License at -->
   <!--     http://www.apache.org/licenses/LICENSE-2.0 -->
   <!-- Unless required by applicable law or agreed to in writing, software -->
   <!-- distributed under the License is distributed on an "AS IS" BASIS, -->
   <!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -->
   <!-- See the License for the specific language governing permissions and -->
   <!-- limitations under the License. -->
   
   # Summary
   
   [NIFI-00000](https://issues.apache.org/jira/browse/NIFI-00000)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-00000`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, as such `NIFI-00000`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
     - [ ] JDK 8
     - [ ] JDK 11
     - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] exceptionfactory commented on a diff in pull request #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…

Posted by GitBox <gi...@apache.org>.
exceptionfactory commented on code in PR #6156:
URL: https://github.com/apache/nifi/pull/6156#discussion_r911164597


##########
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/diagnostics/bootstrap/tasks/ProcessorTimingDiagnosticTask.java:
##########
@@ -0,0 +1,228 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.diagnostics.bootstrap.tasks;
+
+import org.apache.nifi.controller.ProcessorNode;
+import org.apache.nifi.controller.flow.FlowManager;
+import org.apache.nifi.controller.repository.FlowFileEvent;
+import org.apache.nifi.controller.repository.FlowFileEventRepository;
+import org.apache.nifi.controller.repository.RepositoryStatusReport;
+import org.apache.nifi.diagnostics.DiagnosticTask;
+import org.apache.nifi.diagnostics.DiagnosticsDumpElement;
+import org.apache.nifi.diagnostics.StandardDiagnosticsDumpElement;
+import org.apache.nifi.processor.DataUnit;
+
+import java.text.DecimalFormat;
+import java.text.NumberFormat;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.TimeUnit;
+
+public class ProcessorTimingDiagnosticTask implements DiagnosticTask {
+    private final FlowFileEventRepository eventRepo;
+    private final FlowManager flowManager;
+
+    //                                                     | Proc ID    | Proc Name  | Proc Type  | Group Name | Proc Secs | CPU Secs  | %CPU used by Proc |
+    private static final String PROCESSOR_TIMING_FORMAT = "| %1$-36.36s | %2$-36.36s | %3$-36.36s | %4$-36.36s | %5$15.15s | %6$27.27s | %7$25.25s | " +
+    //   Read Secs | Write Secs| Commit Sec | GC millis | MB Read    | MB Write   |
+        "%8$16.16s | %9$16.16s | %10$20.20s | %11$13.13s | %12$11.11s | %13$11.11s |";
+
+    public ProcessorTimingDiagnosticTask(final FlowFileEventRepository flowFileEventRepository, final FlowManager flowManager) {
+        this.eventRepo = flowFileEventRepository;
+        this.flowManager = flowManager;
+    }
+
+    @Override
+    public DiagnosticsDumpElement captureDump(final boolean verbose) {
+        final List<String> details = new ArrayList<>();
+
+        final RepositoryStatusReport statusReport = eventRepo.reportTransferEvents(System.currentTimeMillis());
+        final Map<String, FlowFileEvent> eventsByComponentId = statusReport.getReportEntries();
+
+        final List<ProcessorTiming> timings = new ArrayList<>();
+        eventsByComponentId.entrySet().stream()
+            .map(entry -> getTiming(entry.getKey(), entry.getValue()))
+            .filter(Objects::nonNull)
+            .forEach(timings::add); // create ArrayList and add here instead of .collect(toList()) because arraylist allows us to sort
+
+        // Sort based on the Processor CPU time, highest CPU usage first
+        timings.sort(Comparator.comparing(ProcessorTiming::getCpuNanos).reversed());
+
+        final DecimalFormat dataSizeFormat = new DecimalFormat("#,###,###.##");
+        final DecimalFormat percentageFormat = new DecimalFormat("##.##");
+        final NumberFormat secondsFormat = NumberFormat.getInstance();
+
+        long totalCpuNanos = 0L;
+        long totalProcNanos = 0L;
+        long totalReadNanos = 0L;
+        long totalWriteNanos = 0L;
+        long totalSessionCommitNanos = 0L;
+        long totalBytesRead = 0L;
+        long totalBytesWritten = 0L;
+        long totalGcNanos = 0L;
+
+        // Tally totals for all timing elements
+        for (final ProcessorTiming timing : timings) {
+            totalCpuNanos += timing.getCpuNanos();
+            totalProcNanos += timing.getProcessingNanos();
+            totalReadNanos += timing.getReadNanos();
+            totalWriteNanos += timing.getWriteNanos();
+            totalSessionCommitNanos += timing.getSessionCommitNanos();
+            totalBytesRead += timing.getBytesRead();
+            totalBytesWritten += timing.getBytesWritten();
+            totalGcNanos += timing.getGarbageCollectionNanos();
+        }
+
+        if (totalCpuNanos < 1) {
+            details.add("No Processor Timing Diagnostic information has been gathered.");
+            return new StandardDiagnosticsDumpElement("Processor Timing Diagnostics (Stats over last 5 minutes)", details);
+        }
+
+        details.add(String.format(PROCESSOR_TIMING_FORMAT, "Processor ID", "Processor Name", "Processor Type", "Process Group Name", "Processing Secs",
+            "CPU Secs (% time using CPU)", "Pct CPU Time Used by Proc", "Disk Read Secs", "Disk Write Secs", "Session Commit Secs", "GC Millis", "MB Read", "MB Written"));

Review Comment:
   Thanks for the reply @markap14, that's a good point about the potential for larger numbers in column values. In light of the variety of options, and potential for different approaches down the road, leaving it as-is sounds good for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] exceptionfactory commented on a diff in pull request #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…

Posted by GitBox <gi...@apache.org>.
exceptionfactory commented on code in PR #6156:
URL: https://github.com/apache/nifi/pull/6156#discussion_r908767825


##########
nifi-commons/nifi-properties/src/main/java/org/apache/nifi/util/NiFiProperties.java:
##########
@@ -329,6 +329,12 @@ public class NiFiProperties extends ApplicationProperties {
     public static final int DEFAULT_DIAGNOSTICS_ON_SHUTDOWN_MAX_FILE_COUNT = 10;
     public static final String DEFAULT_DIAGNOSTICS_ON_SHUTDOWN_MAX_DIRECTORY_SIZE = "10 MB";
 
+    // performance tracking
+    public static final String TRACK_PERFORMANCE_PERCENTAGE = "nifi.performance.tracking.percentage";
+
+    // performance tracking defaults
+    public static final int DEFAULT_TRACK_PERFORMANCE_PERCENTAGE = 5;

Review Comment:
   Should this value be set to `0` to align with the default setting in `nifi.properties`? That should ensure this feature is disabled by default even in the absence of having the property defined.



##########
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-resources/src/main/resources/conf/nifi.properties:
##########
@@ -345,6 +345,14 @@ nifi.diagnostics.on.shutdown.max.filecount=10
 # The diagnostics folder's maximum permitted size in bytes. If the limit is exceeded, the oldest files are deleted.
 nifi.diagnostics.on.shutdown.max.directory.size=10 MB
 
+# Performance tracking properties
+## Specifies what percentage of the time we should track the amount of time processors are using CPU, reading from/writing to content repo, etc.
+## This can be useful to understand which components are the most expensive and to understand where system bottlenecks may be occurring.
+## The value must be in the range of 0 (inclusive) to 100 (inclusive). A larger value will produce more accurate results, while a smaller value may be
+## less expensive to compute.
+## Results can be obtained by running "nifi.sh diagnostics <filename>" and then inspecting the produced file.
+nifi.performance.tracking.percentage=${nifi.performance.tracking.percentage}

Review Comment:
   It would be helpful to include this same information in the Administrator's Guide, perhaps under `Runtime Monitoring Properties` or `NiFi Diagnostics`. If it is specific to the `diagnostics` command, should the property be named `nifi.diagnostics.performance.tracking.percentage`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] exceptionfactory closed pull request #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…

Posted by GitBox <gi...@apache.org>.
exceptionfactory closed pull request #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…
URL: https://github.com/apache/nifi/pull/6156


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] markap14 commented on a diff in pull request #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…

Posted by GitBox <gi...@apache.org>.
markap14 commented on code in PR #6156:
URL: https://github.com/apache/nifi/pull/6156#discussion_r911070208


##########
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/diagnostics/bootstrap/tasks/ProcessorTimingDiagnosticTask.java:
##########
@@ -0,0 +1,228 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.diagnostics.bootstrap.tasks;
+
+import org.apache.nifi.controller.ProcessorNode;
+import org.apache.nifi.controller.flow.FlowManager;
+import org.apache.nifi.controller.repository.FlowFileEvent;
+import org.apache.nifi.controller.repository.FlowFileEventRepository;
+import org.apache.nifi.controller.repository.RepositoryStatusReport;
+import org.apache.nifi.diagnostics.DiagnosticTask;
+import org.apache.nifi.diagnostics.DiagnosticsDumpElement;
+import org.apache.nifi.diagnostics.StandardDiagnosticsDumpElement;
+import org.apache.nifi.processor.DataUnit;
+
+import java.text.DecimalFormat;
+import java.text.NumberFormat;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.TimeUnit;
+
+public class ProcessorTimingDiagnosticTask implements DiagnosticTask {
+    private final FlowFileEventRepository eventRepo;
+    private final FlowManager flowManager;
+
+    //                                                     | Proc ID    | Proc Name  | Proc Type  | Group Name | Proc Secs | CPU Secs  | %CPU used by Proc |
+    private static final String PROCESSOR_TIMING_FORMAT = "| %1$-36.36s | %2$-36.36s | %3$-36.36s | %4$-36.36s | %5$15.15s | %6$27.27s | %7$25.25s | " +
+    //   Read Secs | Write Secs| Commit Sec | GC millis | MB Read    | MB Write   |
+        "%8$16.16s | %9$16.16s | %10$20.20s | %11$13.13s | %12$11.11s | %13$11.11s |";
+
+    public ProcessorTimingDiagnosticTask(final FlowFileEventRepository flowFileEventRepository, final FlowManager flowManager) {
+        this.eventRepo = flowFileEventRepository;
+        this.flowManager = flowManager;
+    }
+
+    @Override
+    public DiagnosticsDumpElement captureDump(final boolean verbose) {
+        final List<String> details = new ArrayList<>();
+
+        final RepositoryStatusReport statusReport = eventRepo.reportTransferEvents(System.currentTimeMillis());
+        final Map<String, FlowFileEvent> eventsByComponentId = statusReport.getReportEntries();
+
+        final List<ProcessorTiming> timings = new ArrayList<>();
+        eventsByComponentId.entrySet().stream()
+            .map(entry -> getTiming(entry.getKey(), entry.getValue()))
+            .filter(Objects::nonNull)
+            .forEach(timings::add); // create ArrayList and add here instead of .collect(toList()) because arraylist allows us to sort
+
+        // Sort based on the Processor CPU time, highest CPU usage first
+        timings.sort(Comparator.comparing(ProcessorTiming::getCpuNanos).reversed());
+
+        final DecimalFormat dataSizeFormat = new DecimalFormat("#,###,###.##");
+        final DecimalFormat percentageFormat = new DecimalFormat("##.##");
+        final NumberFormat secondsFormat = NumberFormat.getInstance();
+
+        long totalCpuNanos = 0L;
+        long totalProcNanos = 0L;
+        long totalReadNanos = 0L;
+        long totalWriteNanos = 0L;
+        long totalSessionCommitNanos = 0L;
+        long totalBytesRead = 0L;
+        long totalBytesWritten = 0L;
+        long totalGcNanos = 0L;
+
+        // Tally totals for all timing elements
+        for (final ProcessorTiming timing : timings) {
+            totalCpuNanos += timing.getCpuNanos();
+            totalProcNanos += timing.getProcessingNanos();
+            totalReadNanos += timing.getReadNanos();
+            totalWriteNanos += timing.getWriteNanos();
+            totalSessionCommitNanos += timing.getSessionCommitNanos();
+            totalBytesRead += timing.getBytesRead();
+            totalBytesWritten += timing.getBytesWritten();
+            totalGcNanos += timing.getGarbageCollectionNanos();
+        }
+
+        if (totalCpuNanos < 1) {
+            details.add("No Processor Timing Diagnostic information has been gathered.");
+            return new StandardDiagnosticsDumpElement("Processor Timing Diagnostics (Stats over last 5 minutes)", details);
+        }
+
+        details.add(String.format(PROCESSOR_TIMING_FORMAT, "Processor ID", "Processor Name", "Processor Type", "Process Group Name", "Processing Secs",
+            "CPU Secs (% time using CPU)", "Pct CPU Time Used by Proc", "Disk Read Secs", "Disk Write Secs", "Session Commit Secs", "GC Millis", "MB Read", "MB Written"));

Review Comment:
   -1. While I can appreciate the desire for conciseness, I think the readability is significantly hindered by reducing the column names to abbreviations. The reality is that if you're using NiFi, you generally are going to have a reasonably large screen, or the canvas is going to be too cramped anyway. And you can always grab the section out and throw it into a spreadsheet if you needed to. Additionally, the width needs to be fairly wide for some of these columns anyway. So having a heading like "MW" for a column that needs to be sufficiently wide to fit something like "4,820,182.54" is a bit awkward as well, when the heading could be explicit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] markap14 commented on pull request #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…

Posted by GitBox <gi...@apache.org>.
markap14 commented on PR #6156:
URL: https://github.com/apache/nifi/pull/6156#issuecomment-1171442331

   Thanks @exceptionfactory I appreciate you reviewing & working with me to get this merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] markap14 commented on a diff in pull request #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…

Posted by GitBox <gi...@apache.org>.
markap14 commented on code in PR #6156:
URL: https://github.com/apache/nifi/pull/6156#discussion_r908955784


##########
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-resources/src/main/resources/conf/nifi.properties:
##########
@@ -345,6 +345,14 @@ nifi.diagnostics.on.shutdown.max.filecount=10
 # The diagnostics folder's maximum permitted size in bytes. If the limit is exceeded, the oldest files are deleted.
 nifi.diagnostics.on.shutdown.max.directory.size=10 MB
 
+# Performance tracking properties
+## Specifies what percentage of the time we should track the amount of time processors are using CPU, reading from/writing to content repo, etc.
+## This can be useful to understand which components are the most expensive and to understand where system bottlenecks may be occurring.
+## The value must be in the range of 0 (inclusive) to 100 (inclusive). A larger value will produce more accurate results, while a smaller value may be
+## less expensive to compute.
+## Results can be obtained by running "nifi.sh diagnostics <filename>" and then inspecting the produced file.
+nifi.performance.tracking.percentage=${nifi.performance.tracking.percentage}

Review Comment:
   Fair enough, I can update the admin guide. I don't want to mention `diagnostics` in the property name. These are not really related to diagnostics necessarily. That just happens to be the only mechanism by which we can expose them currently. But in the future, I can imagine them being exposed via Reporting Tasks and/or the UI, etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] markap14 commented on a diff in pull request #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…

Posted by GitBox <gi...@apache.org>.
markap14 commented on code in PR #6156:
URL: https://github.com/apache/nifi/pull/6156#discussion_r908953948


##########
nifi-commons/nifi-properties/src/main/java/org/apache/nifi/util/NiFiProperties.java:
##########
@@ -329,6 +329,12 @@ public class NiFiProperties extends ApplicationProperties {
     public static final int DEFAULT_DIAGNOSTICS_ON_SHUTDOWN_MAX_FILE_COUNT = 10;
     public static final String DEFAULT_DIAGNOSTICS_ON_SHUTDOWN_MAX_DIRECTORY_SIZE = "10 MB";
 
+    // performance tracking
+    public static final String TRACK_PERFORMANCE_PERCENTAGE = "nifi.performance.tracking.percentage";
+
+    // performance tracking defaults
+    public static final int DEFAULT_TRACK_PERFORMANCE_PERCENTAGE = 5;

Review Comment:
   Yup, good catch. It should be definitely be 0. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] exceptionfactory commented on a diff in pull request #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…

Posted by GitBox <gi...@apache.org>.
exceptionfactory commented on code in PR #6156:
URL: https://github.com/apache/nifi/pull/6156#discussion_r909034982


##########
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-resources/src/main/resources/conf/nifi.properties:
##########
@@ -345,6 +345,14 @@ nifi.diagnostics.on.shutdown.max.filecount=10
 # The diagnostics folder's maximum permitted size in bytes. If the limit is exceeded, the oldest files are deleted.
 nifi.diagnostics.on.shutdown.max.directory.size=10 MB
 
+# Performance tracking properties
+## Specifies what percentage of the time we should track the amount of time processors are using CPU, reading from/writing to content repo, etc.
+## This can be useful to understand which components are the most expensive and to understand where system bottlenecks may be occurring.
+## The value must be in the range of 0 (inclusive) to 100 (inclusive). A larger value will produce more accurate results, while a smaller value may be
+## less expensive to compute.
+## Results can be obtained by running "nifi.sh diagnostics <filename>" and then inspecting the produced file.
+nifi.performance.tracking.percentage=${nifi.performance.tracking.percentage}

Review Comment:
   Thanks for clarifying the potential uses and updating the admin guide. It seemed like there could be other ways to provide the information based on the implementation, so keeping the property name as defined sounds good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] exceptionfactory commented on a diff in pull request #6156: NIFI-10167: Added advanced timing metrics for processors, such as CPU…

Posted by GitBox <gi...@apache.org>.
exceptionfactory commented on code in PR #6156:
URL: https://github.com/apache/nifi/pull/6156#discussion_r910370087


##########
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/diagnostics/bootstrap/tasks/ProcessorTimingDiagnosticTask.java:
##########
@@ -0,0 +1,228 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nifi.diagnostics.bootstrap.tasks;
+
+import org.apache.nifi.controller.ProcessorNode;
+import org.apache.nifi.controller.flow.FlowManager;
+import org.apache.nifi.controller.repository.FlowFileEvent;
+import org.apache.nifi.controller.repository.FlowFileEventRepository;
+import org.apache.nifi.controller.repository.RepositoryStatusReport;
+import org.apache.nifi.diagnostics.DiagnosticTask;
+import org.apache.nifi.diagnostics.DiagnosticsDumpElement;
+import org.apache.nifi.diagnostics.StandardDiagnosticsDumpElement;
+import org.apache.nifi.processor.DataUnit;
+
+import java.text.DecimalFormat;
+import java.text.NumberFormat;
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.concurrent.TimeUnit;
+
+public class ProcessorTimingDiagnosticTask implements DiagnosticTask {
+    private final FlowFileEventRepository eventRepo;
+    private final FlowManager flowManager;
+
+    //                                                     | Proc ID    | Proc Name  | Proc Type  | Group Name | Proc Secs | CPU Secs  | %CPU used by Proc |
+    private static final String PROCESSOR_TIMING_FORMAT = "| %1$-36.36s | %2$-36.36s | %3$-36.36s | %4$-36.36s | %5$15.15s | %6$27.27s | %7$25.25s | " +
+    //   Read Secs | Write Secs| Commit Sec | GC millis | MB Read    | MB Write   |
+        "%8$16.16s | %9$16.16s | %10$20.20s | %11$13.13s | %12$11.11s | %13$11.11s |";
+
+    public ProcessorTimingDiagnosticTask(final FlowFileEventRepository flowFileEventRepository, final FlowManager flowManager) {
+        this.eventRepo = flowFileEventRepository;
+        this.flowManager = flowManager;
+    }
+
+    @Override
+    public DiagnosticsDumpElement captureDump(final boolean verbose) {
+        final List<String> details = new ArrayList<>();
+
+        final RepositoryStatusReport statusReport = eventRepo.reportTransferEvents(System.currentTimeMillis());
+        final Map<String, FlowFileEvent> eventsByComponentId = statusReport.getReportEntries();
+
+        final List<ProcessorTiming> timings = new ArrayList<>();
+        eventsByComponentId.entrySet().stream()
+            .map(entry -> getTiming(entry.getKey(), entry.getValue()))
+            .filter(Objects::nonNull)
+            .forEach(timings::add); // create ArrayList and add here instead of .collect(toList()) because arraylist allows us to sort
+
+        // Sort based on the Processor CPU time, highest CPU usage first
+        timings.sort(Comparator.comparing(ProcessorTiming::getCpuNanos).reversed());
+
+        final DecimalFormat dataSizeFormat = new DecimalFormat("#,###,###.##");
+        final DecimalFormat percentageFormat = new DecimalFormat("##.##");
+        final NumberFormat secondsFormat = NumberFormat.getInstance();
+
+        long totalCpuNanos = 0L;
+        long totalProcNanos = 0L;
+        long totalReadNanos = 0L;
+        long totalWriteNanos = 0L;
+        long totalSessionCommitNanos = 0L;
+        long totalBytesRead = 0L;
+        long totalBytesWritten = 0L;
+        long totalGcNanos = 0L;
+
+        // Tally totals for all timing elements
+        for (final ProcessorTiming timing : timings) {
+            totalCpuNanos += timing.getCpuNanos();
+            totalProcNanos += timing.getProcessingNanos();
+            totalReadNanos += timing.getReadNanos();
+            totalWriteNanos += timing.getWriteNanos();
+            totalSessionCommitNanos += timing.getSessionCommitNanos();
+            totalBytesRead += timing.getBytesRead();
+            totalBytesWritten += timing.getBytesWritten();
+            totalGcNanos += timing.getGarbageCollectionNanos();
+        }
+
+        if (totalCpuNanos < 1) {
+            details.add("No Processor Timing Diagnostic information has been gathered.");
+            return new StandardDiagnosticsDumpElement("Processor Timing Diagnostics (Stats over last 5 minutes)", details);
+        }
+
+        details.add(String.format(PROCESSOR_TIMING_FORMAT, "Processor ID", "Processor Name", "Processor Type", "Process Group Name", "Processing Secs",
+            "CPU Secs (% time using CPU)", "Pct CPU Time Used by Proc", "Disk Read Secs", "Disk Write Secs", "Session Commit Secs", "GC Millis", "MB Read", "MB Written"));

Review Comment:
   Understanding the need to balance readability with length, the current headers and column width produce very long lines in the diagnostic output, making it difficult to read without a very high resolution screen.
   
   What do you think about using shorter names for most of the numeric fields, and then condensing the column width for those fields? Even the `Processor Type` field could be shortened, but the biggest gain would be for the numeric columns.  Perhaps using acronyms that could be listed prior to the tabular output?
   
   ```
   PS = Processing Seconds
   CSP = CPU Seconds and percent time using CPU
   PCTP = Percent CPU time used by Processor
   DR = Disk Read Seconds
   DW = Disk Write Seconds
   SC = Session Commit Seconds
   GC = Garbage Collection Milliseconds
   MR = Megabytes Read
   MW = Megabytes Written
   ``` 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org