Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2017/03/10 15:14:04 UTC

[jira] [Commented] (HADOOP-13453) S3Guard: Instrument new functionality with Hadoop metrics.

    [ https://issues.apache.org/jira/browse/HADOOP-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905231#comment-15905231 ] 

Steve Loughran commented on HADOOP-13453:
-----------------------------------------

I'm afraid HADOOP-13914 has just broken the patch, which means, sadly, you get to do the merge. Let's get this in *before* anything else traumatic comes in, so other patches get to suffer next time.

I like what you've done measuring latency as well as counts. I think we could actually do this more broadly. I think the timing should be recorded in a finally clause, though, so that timings for failures get included too. (Side issue: should we count successes and failures separately, with different timings?)
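
A minimal sketch of the finally-based timing suggested above, with successes and failures counted and timed separately. TimedOperationStats and all of its method names are hypothetical and are not part of the patch or of Hadoop; a real version would presumably feed these values into the existing instrumentation rather than hold them in AtomicLong fields.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not a Hadoop class): times an operation and records its outcome. */
final class TimedOperationStats {
  private final AtomicLong successes = new AtomicLong();
  private final AtomicLong failures = new AtomicLong();
  private final AtomicLong successNanos = new AtomicLong();
  private final AtomicLong failureNanos = new AtomicLong();

  /** Runs the operation; the duration is recorded whether it returns or throws. */
  <T> T time(Callable<T> operation) throws Exception {
    long start = System.nanoTime();
    boolean succeeded = false;
    try {
      T result = operation.call();
      succeeded = true;
      return result;
    } finally {
      // The finally block runs on both paths, so failed calls are timed too.
      long elapsed = System.nanoTime() - start;
      if (succeeded) {
        successes.incrementAndGet();
        successNanos.addAndGet(elapsed);
      } else {
        failures.incrementAndGet();
        failureNanos.addAndGet(elapsed);
      }
    }
  }

  long successCount() { return successes.get(); }
  long failureCount() { return failures.get(); }
  long successNanos() { return successNanos.get(); }
  long failureNanos() { return failureNanos.get(); }
}
```

Keeping the two outcomes in separate counters means a burst of fast failures cannot drag down the apparent latency of successful calls.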

I would like to think about how we could avoid having to pass the instrumentation around all the time. Ideally, we could just pass it in as a constructor argument to the metadata store. Alternatively, that store could collect metrics and we could wire it up, but I don't see an easy way to do that in Hadoop metrics (compared to Coda Hale's). The easiest would be just to pass the S3AInstrumentation (or an inner class) down, but currently the metastore interface is not specific to S3A.

If we add an interface for metadata store instrumentation, then S3AInstrumentation can implement it in an inner class, and S3AFS can pass it down during initialization. This would let the metastore do all it wants, with well-defined strings, of course.
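
A rough sketch of that shape, to make the proposal concrete. Every name here is hypothetical: MetastoreInstrumentation is the proposed narrow interface, FsInstrumentation stands in for S3AInstrumentation (whose real API differs), and InMemoryMetastore stands in for a metadata store implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical interface: all a metadata store needs to know about metrics. */
interface MetastoreInstrumentation {
  void incrementCounter(String name, long delta);
}

/** Stand-in for S3AInstrumentation; the real class is not modelled here. */
final class FsInstrumentation {
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

  long counter(String name) {
    AtomicLong value = counters.get(name);
    return value == null ? 0L : value.get();
  }

  /** Inner class that exposes only the metastore-facing view of the counters. */
  final class MetastoreView implements MetastoreInstrumentation {
    @Override
    public void incrementCounter(String name, long delta) {
      counters.computeIfAbsent(name, key -> new AtomicLong()).addAndGet(delta);
    }
  }

  MetastoreInstrumentation metastoreView() {
    return new MetastoreView();
  }
}

/** Stand-in for a metadata store; it depends on the interface, not on S3A. */
final class InMemoryMetastore {
  // Hypothetical metric name, defined once so the string stays consistent.
  static final String PUT_COUNT = "metastore_put_count";

  private final Map<String, String> entries = new ConcurrentHashMap<>();
  private MetastoreInstrumentation instrumentation;

  /** Called by the filesystem during its own initialization. */
  void initialize(MetastoreInstrumentation instrumentation) {
    this.instrumentation = instrumentation;
  }

  void put(String path, String metadata) {
    entries.put(path, metadata);
    instrumentation.incrementCounter(PUT_COUNT, 1);
  }
}
```

The filesystem would call something like store.initialize(instrumentation.metastoreView()) once, after which the store records its own metrics without the instrumentation being passed into each call.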

What do people think?


> S3Guard: Instrument new functionality with Hadoop metrics.
> ----------------------------------------------------------
>
>                 Key: HADOOP-13453
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13453
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Ai Deng
>         Attachments: HADOOP-13453-HADOOP-13345-001.patch, HADOOP-13453-HADOOP-13345-002.patch
>
>
> Provide Hadoop metrics showing operational details of the S3Guard implementation.
> The metrics will be implemented in this ticket:
> ● S3GuardRechecksNthPercentileLatency (MutableQuantiles) - Percentile time spent
> in rechecks attempting to achieve consistency. Repeated for multiple percentile values
> of N. This metric is an indicator of the additional latency cost of running S3A with
> S3Guard.
> ● S3GuardRechecksNumOps (MutableQuantiles) - Number of times a consistency
> recheck was required while attempting to achieve consistency.
> ● S3GuardStoreNthPercentileLatency (MutableQuantiles) - Percentile time spent in
> operations against the consistent store, including both write operations during file system
> mutations and read operations during file system consistency checks. Repeated for
> multiple percentile values of N. This metric is an indicator of latency to the consistent
> store implementation.
> ● S3GuardConsistencyStoreNumOps (MutableQuantiles) - Number of operations
> against the consistent store, including both write operations during file system mutations
> and read operations during file system consistency checks.
> ● S3GuardConsistencyStoreFailures (MutableCounterLong) - Number of failures
> during operations against the consistent store implementation.
> ● S3GuardConsistencyStoreTimeouts (MutableCounterLong) - Number of timeouts
> during operations against the consistent store implementation.
> ● S3GuardInconsistencies (MutableCounterLong) - Count of times S3Guard failed to
> achieve consistency, even after exhausting all rechecks. A high count may indicate
> unexpected out-of-band modification of the S3 bucket contents, such as by an external
> tool that does not make corresponding updates to the consistent store.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
