
Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/03/29 16:14:00 UTC

[jira] [Created] (HADOOP-18190) s3a prefetching streams to collect iostats on prefetching operations

Steve Loughran created HADOOP-18190:
---------------------------------------

             Summary: s3a prefetching streams to collect iostats on prefetching operations
                 Key: HADOOP-18190
                 URL: https://issues.apache.org/jira/browse/HADOOP-18190
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 3.4.0
            Reporter: Steve Loughran




There is a lot more happening in reads, so there is a lot more to collect and publish in IO stats, both for us to view in a summary at the end of processes and to get from the stream while it is active.

Some useful ones would seem to be

counters
* is in memory. Using 0 or 1 here lets aggregated reports count the total number of memory-cached files.
* prefetching operations executed
* errors during prefetching


gauges
* number of blocks in cache
* total size of blocks
* active prefetches
* active memory used
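
A minimal sketch of how the counters and gauges above could be held per stream. The class and method names (PrefetchStreamStats, prefetchStarted, and so on) are hypothetical, not part of S3A or the IOStatistics API, and it uses plain JDK atomics rather than any Hadoop class. The "active memory used" gauge is left out because the issue does not say how it differs from total block size.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-stream holder for prefetch counters and gauges. */
class PrefetchStreamStats {
  // Counters: only ever increase.
  private final AtomicLong prefetchOps = new AtomicLong();
  private final AtomicLong prefetchErrors = new AtomicLong();
  // Gauges: rise and fall with the current state of the stream.
  private final AtomicLong blocksInCache = new AtomicLong();
  private final AtomicLong bytesInCache = new AtomicLong();
  private final AtomicLong activePrefetches = new AtomicLong();

  /** A prefetch was submitted: count it and mark it active. */
  void prefetchStarted() {
    prefetchOps.incrementAndGet();
    activePrefetches.incrementAndGet();
  }

  /** A prefetch finished and its block was added to the cache. */
  void prefetchCompleted(long blockSize) {
    activePrefetches.decrementAndGet();
    blocksInCache.incrementAndGet();
    bytesInCache.addAndGet(blockSize);
  }

  /** A prefetch failed: no block was cached. */
  void prefetchFailed() {
    activePrefetches.decrementAndGet();
    prefetchErrors.incrementAndGet();
  }

  /** A cached block was removed from the cache. */
  void blockEvicted(long blockSize) {
    blocksInCache.decrementAndGet();
    bytesInCache.addAndGet(-blockSize);
  }

  long prefetchOps() { return prefetchOps.get(); }
  long prefetchErrors() { return prefetchErrors.get(); }
  long blocksInCache() { return blocksInCache.get(); }
  long bytesInCache() { return bytesInCache.get(); }
  long activePrefetches() { return activePrefetches.get(); }
}
```

Keeping counters monotonic and gauges bidirectional matters for aggregation: counters can be summed across streams, while gauges are only meaningful as a point-in-time value.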

duration tracking (count/min/max/mean)

* time to fetch a block 
* time queued before the actual fetch begins
* time a reader is blocked waiting for a block fetch to complete
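
As an illustration of what count/min/max/mean tracking involves for each of these durations, here is a small self-contained sketch. DurationSummary is a hypothetical class written for this example; Hadoop's IOStatistics API has its own duration tracking support, which this sketch does not use or attempt to reproduce.

```java
/** Hypothetical running summary of one kind of duration, in milliseconds. */
class DurationSummary {
  private long count;
  private long sum;
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;

  /** Record one completed operation. */
  synchronized void record(long millis) {
    count++;
    sum += millis;
    min = Math.min(min, millis);
    max = Math.max(max, millis);
  }

  synchronized long count() { return count; }

  // With no samples, min/max/mean report 0 rather than the sentinel values.
  synchronized long min() { return count == 0 ? 0 : min; }
  synchronized long max() { return count == 0 ? 0 : max; }
  synchronized double mean() { return count == 0 ? 0.0 : (double) sum / count; }
}
```

One instance per measured duration (block fetch, queue wait, reader blocked) would be enough to show whether time is going into the fetch itself or into waiting for an executor.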


and some info on cache use itself

* number of blocks discarded unread
* number of prefetched blocks later used
* number of backward seeks to a prefetched block
* number of forward seeks to a prefetched block
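
One way the first two cache-use numbers could be combined into a single "wasted prefetch" figure is sketched below. The class and method names are hypothetical, and the calculation assumes every prefetched block ends up either used or discarded unread by the time the numbers are read.

```java
/** Hypothetical helper that derives cache effectiveness from raw counts. */
class PrefetchCacheUse {
  /**
   * Fraction of prefetched blocks that were discarded without being read.
   * Returns 0.0 when nothing has been prefetched yet.
   */
  static double wastedFraction(long discardedUnread, long laterUsed) {
    long total = discardedUnread + laterUsed;
    return total == 0 ? 0.0 : (double) discardedUnread / total;
  }
}
```

For example, 1 block discarded unread and 3 blocks later used gives a wasted fraction of 0.25.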

The key ones I care about are:
# memory consumption
# can we determine whether the cache is working (reads with cache hits) and when it is not (misses, wasted prefetches)
# time blocked on executors

The stats need to be accessible on a stream even after it is closed, and aggregated into the FS. Once we get per-thread stats contexts, we can publish there too and collect in worker threads for reporting in task commits.
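
A rough sketch of that lifecycle, under the assumption that a stream merges its statistics into the filesystem totals once, on close, and keeps its own copy readable afterwards. FsStatsSketch, StatsStreamSketch and the statistic names are all hypothetical; none of this is the S3A implementation.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical filesystem-level totals that streams merge into. */
class FsStatsSketch {
  private final Map<String, Long> totals = new ConcurrentHashMap<>();

  /** Add one stream's values into the filesystem totals. */
  void aggregate(Map<String, Long> streamStats) {
    streamStats.forEach((k, v) -> totals.merge(k, v, Long::sum));
  }

  long get(String key) { return totals.getOrDefault(key, 0L); }
}

/** Hypothetical stream whose statistics outlive close(). */
class StatsStreamSketch implements AutoCloseable {
  private final Map<String, Long> stats = new ConcurrentHashMap<>();
  private final FsStatsSketch fs;
  private boolean closed;

  StatsStreamSketch(FsStatsSketch fs) { this.fs = fs; }

  void increment(String key) { stats.merge(key, 1L, Long::sum); }

  void set(String key, long value) { stats.put(key, value); }

  /** Merge into the filesystem exactly once, however often close() is called. */
  @Override
  public synchronized void close() {
    if (!closed) {
      closed = true;
      fs.aggregate(stats);
    }
  }

  /** Snapshot of this stream's statistics; still valid after close(). */
  Map<String, Long> statistics() { return new TreeMap<>(stats); }
}
```

With this shape, a stream that sets an "in memory" statistic to 1 contributes exactly 1 to the filesystem total, so the aggregated value is the number of memory-cached files.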





--
This message was sent by Atlassian Jira
(v8.20.1#820001)
