Posted to mapreduce-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/07/22 11:16:00 UTC

[jira] [Commented] (MAPREDUCE-7398) Improve TestDFSIO to support different filesystem

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569954#comment-17569954 ] 

Steve Loughran commented on MAPREDUCE-7398:
-------------------------------------------

Moved to the right project; reassigned.

Now, because I want to target Hadoop 3.3.5, there is an extra feature I want here for maximum benchmark utility: collection of IOStatistics from the run.
Once HADOOP-17461 is in (soon!), s3a, and very soon abfs, will collect thread-level IO statistics. This benchmark should also be able to include the serialized IOStatisticsSnapshot in its reports, which would then be accumulated and reported the same way.

# before each map operation: reset the IOStatisticsContext
# at the end of it, take a snapshot as an IOStatisticsSnapshot
# convert to JSON, then to a byte array, and finally to base64 for embedding in the .text file on a single line (yes, this is awful, but the mappers/reducers all send text records)
# aggregate by converting back to snapshots, aggregating with the others, and reserializing
# collect in the main process
# and report with the help of IOStatisticsLogging.ioStatisticsToPrettyString
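The serialize/aggregate steps above can be sketched in plain Java. This is a minimal, self-contained sketch under stated assumptions, not the Hadoop implementation: a hypothetical `StatsLineCodec` class and a plain `Map<String, Long>` of counters stand in for IOStatisticsSnapshot, and a hand-rolled writer/parser for flat `{"name":number}` JSON stands in for the snapshot's real JSON serialization. It only shows the shape of the round trip: JSON, then UTF-8 bytes, then base64 on one line, then decode, merge by summing same-named counters, and re-encode.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for IOStatisticsSnapshot: a sorted map of named
 * long counters. Real snapshots also carry gauges, minimums, maximums and
 * mean statistics, which this sketch ignores.
 */
final class StatsLineCodec {

    /** Render counters as a flat JSON object, e.g. {"op_open":2}. */
    static String toJson(Map<String, Long> counters) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            // Assumption: counter names contain no quotes, commas or colons.
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }

    /** Inverse of toJson, for the flat {"name":number,...} shape only. */
    static Map<String, Long> fromJson(String json) {
        Map<String, Long> counters = new TreeMap<>();
        String body = json.substring(1, json.length() - 1).trim();
        if (body.isEmpty()) {
            return counters;
        }
        for (String pair : body.split(",")) {
            int colon = pair.lastIndexOf(':');
            String quotedName = pair.substring(0, colon).trim();
            counters.put(quotedName.substring(1, quotedName.length() - 1),
                Long.parseLong(pair.substring(colon + 1).trim()));
        }
        return counters;
    }

    /** JSON -> UTF-8 bytes -> base64: one line, safe inside a text record. */
    static String encode(Map<String, Long> counters) {
        byte[] bytes = toJson(counters).getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(bytes);
    }

    /** base64 -> UTF-8 bytes -> JSON -> mutable counter map. */
    static Map<String, Long> decode(String line) {
        byte[] bytes = Base64.getDecoder().decode(line);
        return fromJson(new String(bytes, StandardCharsets.UTF_8));
    }

    /** Decode two records, sum counters with the same name, re-encode. */
    static String aggregate(String left, String right) {
        Map<String, Long> total = decode(left);
        decode(right).forEach((name, value) -> total.merge(name, value, Long::sum));
        return encode(total);
    }

    public static void main(String[] args) {
        Map<String, Long> mapA = new TreeMap<>(Map.of("op_open", 2L, "bytes_read", 4096L));
        Map<String, Long> mapB = new TreeMap<>(Map.of("op_open", 1L, "op_rename", 3L));
        String merged = aggregate(encode(mapA), encode(mapB));
        // prints {bytes_read=4096, op_open=3, op_rename=3}
        System.out.println(decode(merged));
    }
}
```

`Base64.getEncoder()` does not insert line breaks (unlike `getMimeEncoder()`), which is what keeps each record on a single line. In the benchmark itself, the final aggregated snapshot would be printed with IOStatisticsLogging.ioStatisticsToPrettyString, as the comment suggests, rather than `Map.toString()`.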

Am I adding a lot of work? Yes. But I know you Oracle engineers want to add IO stats collection and reporting, if you haven't already. As well as the Hadoop abfs and s3a connectors, Google GCS has done it in their code...

> Improve TestDFSIO to support different filesystem
> -------------------------------------------------
>
>                 Key: MAPREDUCE-7398
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7398
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: benchmarks
>    Affects Versions: 3.3.3
>            Reporter: Vijayakumar Govindasamy
>            Assignee: Vijayakumar Govindasamy
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> TestDFSIO is the tool used for performance benchmarking of the Distributed File System. Recently, support has been added for different file systems in Hadoop, e.g. s3:// (from Amazon) and oci:// (from Oracle).
>  
> The request is to improve the TestDFSIO code to support the newly added filesystems so that it can be used for benchmarking the IOPS and throughput of those filesystems.


