You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2020/07/15 16:23:38 UTC

[GitHub] [hadoop] steveloughran commented on a change in pull request #2069: HADOOP-16830. IOStatistics API.

steveloughran commented on a change in pull request #2069:
URL: https://github.com/apache/hadoop/pull/2069#discussion_r455174555



##########
File path: hadoop-common-project/hadoop-common/src/site/markdown/filesystem/iostatistics.md
##########
@@ -0,0 +1,432 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Statistic collection with the IOStatistics API
+
+```java
+@InterfaceAudience.Public
+@InterfaceStability.Unstable
+```
+
+The `IOStatistics` API is intended to provide statistics on individual IO
+classes -such as input and output streams, *in a standard way which 
+applications can query*
+
+Many filesystem-related classes have implemented statistics gathering
+and provided private/unstable ways to query this, but as they were
+not common across implementations it was unsafe for applications
+to reference these values. Example: `S3AInputStream` and its statistics
+API. This is used in internal tests, but cannot be used downstream in
+applications such as Apache Hive or Apache HBase.
+
+The IOStatistics API is intended to 
+
+1. Be instance specific:, rather than shared across multiple instances
+   of a class, or thread local.
+1. Be public and stable enough to be used by applications.
+1. Be easy to use in applications written in Java, Scala, and, via libhdfs, C/C++
+1. Have foundational interfaces and classes in the `hadoop-common` JAR.
+
+## Core Model
+
+Any class *may* implement `IOStatisticsSource` in order to
+provide statistics.
+
+Wrapper I/O Classes such as `FSDataInputStream` anc `FSDataOutputStream` *should*
+implement the interface and forward it to the wrapped class, if they also
+implement it -and return `null` if they do not.
+
+`IOStatisticsSource` implementations `getIOStatistics()` return an
+instance of `IOStatistics` enumerating the statistics of that specific
+instance.
+
+The `IOStatistics` Interface exports five kinds of statistic:
+
+
+| Category | Type | Description |
+|------|------|-------------|
+| `counter`        | `long`          | a counter which may increase in value; SHOULD BE >= 0 |
+| `gauge`          | `long`          | an arbitrary value which can down as well as up; SHOULD BE >= 0|
+| `minimum`        | `long`          | an minimum value; MAY BE negative |
+| `maximum`        | `long`          | a maximum value;  MAY BE negative |
+| `meanStatistic` | `MeanStatistic` | an arithmetic mean and sample size; mean MAY BE negative|
+
+Four are simple `long` values, with the variations how they are likely to
+change and how they are aggregated.
+
+
+#### Aggregation of Statistic Values
+
+For the different statistic category, the result of `aggregate(x, y)` is
+
+| Category         | Aggregation |
+|------------------|-------------|
+| `counter`        | `min(0, x) + min(0, y)`     |

Review comment:
       yeah, but if something negative has got in I don't want it tainting the results




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org