Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2016/05/18 16:38:13 UTC

[jira] [Updated] (HADOOP-13171) Add StorageStatistics to S3A; instrument some more operations

     [ https://issues.apache.org/jira/browse/HADOOP-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-13171:
------------------------------------
    Attachment: HADOOP-13171-branch-2-001.patch

Patch 001.

This patch is designed to apply to branch-2 plus the HADOOP-13130 error handling; the diff includes all of that work so that it actually applies. The real diff is smaller.

# Adds a new {{S3AStorageStatistics}} class; collects storage statistics.
# Adds {{org.apache.hadoop.fs.s3a.Statistic}}: an enum of statistics supported in both {{S3AInstrumentation}} and {{S3AStorageStatistics}}.
# Isolates the calls in {{S3AFileSystem}} to {{getObjectMetadata()}} and {{listObjects()}} into their own methods, which increment the statistics and instrumentation counters.
# Adds some tests which compare the metrics before and after some of the filesystem operations, making assertions about their values ({{TestS3AFileStatusCost}}), and around operations on directory trees ({{TestS3ADirectoryPerformance}}).

I'd initially tried to bond the metrics and the statistics but backed off; it was complex, because the StorageStatistics lifecycle is not per instance. Instead they are just incremented in the same place.
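
To illustrate the "incremented in the same place" approach, here is a minimal, self-contained sketch. It is not the code in the patch: {{SketchStatistic}}, {{SketchCounters}} and {{SketchFileSystem}} are hypothetical stand-ins, the counter names are invented, and the store calls are placeholders. It only shows one enum keying both a per-instance counter set and a shared one, with each isolated call updating both.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a shared enum of statistics; names are invented.
enum SketchStatistic {
    OBJECT_METADATA_REQUESTS("object_metadata_requests"),
    OBJECT_LIST_REQUESTS("object_list_requests");

    private final String symbol;

    SketchStatistic(String symbol) {
        this.symbol = symbol;
    }

    String getSymbol() {
        return symbol;
    }
}

// Hypothetical counter set, reused for both the per-instance and shared views.
final class SketchCounters {
    private final Map<SketchStatistic, AtomicLong> counters =
        new EnumMap<>(SketchStatistic.class);

    SketchCounters() {
        // Pre-populate so the map is never structurally modified afterwards.
        for (SketchStatistic s : SketchStatistic.values()) {
            counters.put(s, new AtomicLong());
        }
    }

    void increment(SketchStatistic s) {
        counters.get(s).incrementAndGet();
    }

    long get(SketchStatistic s) {
        return counters.get(s).get();
    }
}

// Hypothetical filesystem client: the store calls are placeholders.
final class SketchFileSystem {
    // Shared by every instance, so its lifecycle is not per instance.
    static final SketchCounters STORAGE_STATISTICS = new SketchCounters();

    // Per-instance counters, standing in for the instrumentation.
    final SketchCounters instrumentation = new SketchCounters();

    String getObjectMetadata(String key) {
        incrementStatistic(SketchStatistic.OBJECT_METADATA_REQUESTS);
        return "metadata:" + key; // placeholder for the real store request
    }

    String listObjects(String prefix) {
        incrementStatistic(SketchStatistic.OBJECT_LIST_REQUESTS);
        return "listing:" + prefix; // placeholder for the real store request
    }

    // The one place where both counter sets are updated together.
    private void incrementStatistic(SketchStatistic s) {
        instrumentation.increment(s);
        STORAGE_STATISTICS.increment(s);
    }
}
```

With two instances each making one metadata call, each instance's own counter reads 1 while the shared counter rises by 2, which is the per-instance versus shared split described above.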

No attempt is made at any performance optimisation of {{getFileStatus()}} or the directory listing operations. I did try reordering the operations inside {{getFileStatus()}} to make one fewer call when listing a directory with children (i.e. giving that situation priority), but it didn't work, and it wasn't clear that the optimisation would have delivered much anyway.

What this patch delivers, then, is the StorageStatistics and some tests.

The StorageStatistics can be picked up by any code which handles the new stats. 

The tests will catch any regression in {{getFileStatus()}}, i.e. if more checks start taking place. They will also act as a starting point for anyone looking at tuning the directory/directory tree listing operations.
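
The before/after comparison those tests rely on can be sketched as below. This is an illustration only, not the test code in the patch: {{MetricDiffSketch}} and {{FakeStoreSketch}} are hypothetical names, and the fake store is simply assumed to issue one metadata request per {{getFileStatus()}} call; the real request counts are whatever the tests measure.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper: snapshots a counter, then reports how far it has moved.
final class MetricDiffSketch {
    private final String name;
    private final LongSupplier source;
    private long startingValue;

    MetricDiffSketch(String name, LongSupplier source) {
        this.name = name;
        this.source = source;
        reset();
    }

    // Take a new baseline, e.g. just before the operation under test.
    void reset() {
        startingValue = source.getAsLong();
    }

    long diff() {
        return source.getAsLong() - startingValue;
    }

    // Fail if the counter moved by anything other than the expected amount.
    void assertDiffEquals(long expected) {
        long actual = diff();
        if (actual != expected) {
            throw new AssertionError(
                name + ": expected a change of " + expected + " but got " + actual);
        }
    }
}

// Hypothetical store; ASSUMPTION: one metadata request per getFileStatus() call.
final class FakeStoreSketch {
    final AtomicLong metadataRequests = new AtomicLong();

    boolean getFileStatus(String path) {
        metadataRequests.incrementAndGet();
        return true; // placeholder result
    }
}
```

A test takes the baseline, runs the operation, then asserts on the change; if a later code change added an extra request per call, the assertion would fail, which is the kind of regression these tests are meant to catch.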

Test run against AWS Ireland: in progress

> Add StorageStatistics to S3A; instrument some more operations
> -------------------------------------------------------------
>
>                 Key: HADOOP-13171
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13171
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13171-branch-2-001.patch
>
>
> Add {{StorageStatistics}} support to S3A, collecting the same metrics as the instrumentation, but sharing across all instances.


