Posted to common-issues@hadoop.apache.org by "Mehakmeet Singh (Jira)" <ji...@apache.org> on 2021/03/01 14:26:00 UTC

[jira] [Commented] (HADOOP-17553) FileSystem.close() to optionally log IOStats; save to local dir

    [ https://issues.apache.org/jira/browse/HADOOP-17553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292910#comment-17292910 ] 

Mehakmeet Singh commented on HADOOP-17553:
------------------------------------------

So, a few doubts I had:
 * Creating the JSON file is also optional, right?
 * Are we saving a JSON file on every .close(), and only when debug is on?
 * So, basically, we would keep a map of <FileSystem, IOStats> in FileSystem.java, so that we can relate the IOStats to the filesystem instance in use?
 * Would we have a .json file for every principal and for every job we run? Would that take up a lot of space?
 * {quote}extend IOStatisticsSnapshot with list of <string, string> options for use in annotating saved logs (hostname, principal, jobID, ...). Don’t know how to merge these.
{quote}I didn’t understand this bit. Do you mean using some particular attribute as a key to an IOStatisticsSnapshot instance, so that we can look up the stats for that attribute?
 * Rajesh's point: how, and by whom, will these files be cleaned up?
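To make the per-principal, per-bucket file naming in the quoted description concrete, here is a minimal sketch that builds names of the form iostats-<principal>-<scheme>-<bucket>-<timestamp>-<random>.json and selects files with a glob such as iostats-*-s3a-bucket1-*.json. The class IOStatsFileNames and its methods are hypothetical, not part of Hadoop, and the sketch assumes principal, scheme and bucket names contain no '-', since a dash inside a component would make the glob ambiguous.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper (not a Hadoop API): names and selects saved IOStats files. */
final class IOStatsFileNames {

  private IOStatsFileNames() {
  }

  /** Builds iostats-<principal>-<scheme>-<bucket>-<timestamp>-<random>.json. */
  static String fileName(String principal, String scheme, String bucket, long timestampMillis) {
    // A non-negative random suffix keeps names unique when two filesystems
    // close in the same millisecond on the same host.
    long random = ThreadLocalRandom.current().nextLong(Long.MAX_VALUE);
    return String.format("iostats-%s-%s-%s-%d-%d.json",
        principal, scheme, bucket, timestampMillis, random);
  }

  /** Returns the names matching a glob such as iostats-*-s3a-bucket1-*.json. */
  static List<String> select(List<String> names, String glob) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    List<String> selected = new ArrayList<>();
    for (String name : names) {
      if (matcher.matches(Paths.get(name))) {
        selected.add(name);
      }
    }
    return selected;
  }
}
```

With this layout, one file is written per filesystem instance closed, so the file count grows with the number of principals, jobs and buckets touched; the glob then picks out all principals for one bucket.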

> FileSystem.close() to optionally log IOStats; save to local dir
> ---------------------------------------------------------------
>
>                 Key: HADOOP-17553
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17553
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs, fs/azure, fs/s3
>    Affects Versions: 3.3.1
>            Reporter: Steve Loughran
>            Assignee: Mehakmeet Singh
>            Priority: Major
>
> We could save the IOStats to a local temp dir as JSON (the snapshot is designed to be serializable, even has a test), with a unique name (iostats-stevel-s3a-bucket1-timestamp-random#.json ... etc). 
> We can collect these (Rajesh can, anyway), and then
> * look for load on a specific bucket
> * look what happened at a specific time
> The best bit: the IOStatisticsSnapshot aggregates counters, min/max/mean, so you could merge iostats-*-s3a-bucket1-*.json to get the IOStats of all principals working with a given bucket.
> This will be local, so low cost; low cost enough that we could turn it on in production. All that's needed is collection of the stats from the local hosts (or they write to a shared mounted volume).
> We will need some "hadoop iostats merge" command to take multiple files and merge them all together; print to screen or save to a new file. Straightforward as all the load and merge code is present.
> Needs
> * logging in FS.close
> * new iostats CLI + docs, tests
> * extend IOStatisticsSnapshot with list of <string, string> options for use in annotating saved logs (hostname, principal, jobID, ...). Don't know how to merge these.
> If we are going to add a new context map to the IOStatisticsSnapshot, then we MUST update it before 3.3.1 ships, so as to avoid breaking the serialization format in the next release, especially the Java one.
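The claim that snapshots from many files can simply be merged rests on counters, minimums, maximums and means all being combinable. A minimal sketch of that merge follows, using a hypothetical, simplified StatsSnapshot class rather than Hadoop's IOStatisticsSnapshot. It assumes each mean is stored as a (samples, sum) pair so that merging stays exact. The context map and its merge policy (conflicting values collapse to "*") are purely illustrative, one possible answer to the open "don't know how to merge these" question.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for a statistics snapshot (not Hadoop's class). */
final class StatsSnapshot {
  final Map<String, Long> counters = new TreeMap<>();
  final Map<String, Long> minimums = new TreeMap<>();
  final Map<String, Long> maximums = new TreeMap<>();
  // Each mean is kept as {samples, sum}; merging adds both, so no precision is lost.
  final Map<String, long[]> means = new TreeMap<>();
  // Hypothetical annotations such as hostname, principal, jobID.
  final Map<String, String> context = new TreeMap<>();

  /** Merges another snapshot into this one. */
  void aggregate(StatsSnapshot other) {
    other.counters.forEach((k, v) -> counters.merge(k, v, Long::sum));
    other.minimums.forEach((k, v) -> minimums.merge(k, v, Math::min));
    other.maximums.forEach((k, v) -> maximums.merge(k, v, Math::max));
    // Copy the incoming pair so the merged snapshot never aliases the other's arrays.
    other.means.forEach((k, v) -> means.merge(k, new long[] {v[0], v[1]},
        (a, b) -> new long[] {a[0] + b[0], a[1] + b[1]}));
    // Illustrative policy only: agreeing values survive, conflicting ones become "*".
    other.context.forEach((k, v) -> context.merge(k, v, (a, b) -> a.equals(b) ? a : "*"));
  }

  /** Returns sum / samples for a mean statistic, or 0.0 if there are no samples. */
  double mean(String key) {
    long[] m = means.get(key);
    return (m == null || m[0] == 0) ? 0.0 : (double) m[1] / m[0];
  }
}
```

Keeping (samples, sum) rather than the mean itself is what makes the merge correct: combining a mean of 10 over 3 samples with a mean of 50 over 1 sample gives 80 / 4 = 20, whereas averaging the two means would give 30.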



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
