
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/04/13 12:04:40 UTC

[GitHub] [hadoop-ozone] captainzmc opened a new pull request #716: HDDS-3155. Improved ozone client flush implementation to make it faster.

URL: https://github.com/apache/hadoop-ozone/pull/716

## What changes were proposed in this pull request?

When we run an MR job (with 1000 maps) on OzoneFileSystem, the AppMaster pauses for more than 40 minutes after map and reduce have both reached 100%.
`20/03/05 14:43:33 INFO mapreduce.Job: map 100% reduce 100% `
`20/03/05 15:29:52 INFO mapreduce.Job: Job job_1583385253878_0002 completed successfully`
It turns out that the AppMaster writes all task events to the log one by one, calling flush once for each event. This operation is very time-consuming in Ozone.

HDFS currently has two flush methods, flush() and hflush().
flush(): flushes the data from the client buffer into the client packet (dfs.write.packet.size, default 64k). If the packet is not full, it is not sent to the datanode.
hflush(): each invocation sends the data in the buffer to the datanode.

Currently, Ozone's flush behaves more like HDFS's hflush. This PR adds a flush implementation similar to HDFS's flush. It is controlled by ozone.client.stream.buffer.flush.delay (not enabled by default). When it is enabled, a call to flush() checks whether the data in the current buffer is greater than ozone.client.stream.buffer.size. If it is, the data is sent to the datanode; otherwise, it is not sent.
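
The effect of the delayed flush on a workload like the AppMaster's event log can be illustrated with a toy model. This is only a sketch of the rule described above, not the code in this PR: the class `FlushDelaySketch`, its methods, the 100-byte event size and the 4096-byte buffer size are all hypothetical. It also assumes that `close()` always sends whatever is still buffered. Each simulated "send" stands in for one round trip to the datanode.

```java
import java.io.ByteArrayOutputStream;

/** Toy model of a client output stream. All names are hypothetical, not Ozone classes. */
public class FlushDelaySketch {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final int bufferSize;     // stands in for ozone.client.stream.buffer.size
    private final boolean flushDelay; // stands in for ozone.client.stream.buffer.flush.delay
    private int sendCount = 0;        // simulated round trips to the datanode

    public FlushDelaySketch(int bufferSize, boolean flushDelay) {
        this.bufferSize = bufferSize;
        this.flushDelay = flushDelay;
    }

    public void write(byte[] data) {
        buffer.write(data, 0, data.length);
    }

    /** With the delay enabled, send only when the buffered data exceeds bufferSize. */
    public void flush() {
        if (buffer.size() == 0) {
            return; // nothing to send
        }
        if (flushDelay && buffer.size() <= bufferSize) {
            return; // keep buffering, as described in the PR text
        }
        send();
    }

    /** Assumption of this sketch: close() always sends the remaining data. */
    public void close() {
        if (buffer.size() > 0) {
            send();
        }
    }

    private void send() {
        sendCount++;
        buffer.reset();
    }

    public int getSendCount() {
        return sendCount;
    }

    /** Writes `events` records of `eventBytes` each, flushing after every record. */
    public static int simulate(int events, int eventBytes, int bufferSize, boolean flushDelay) {
        FlushDelaySketch stream = new FlushDelaySketch(bufferSize, flushDelay);
        byte[] event = new byte[eventBytes];
        for (int i = 0; i < events; i++) {
            stream.write(event);
            stream.flush();
        }
        stream.close();
        return stream.getSendCount();
    }

    public static void main(String[] args) {
        System.out.println("sends, delay off: " + simulate(1000, 100, 4096, false));
        System.out.println("sends, delay on:  " + simulate(1000, 100, 4096, true));
    }
}
```

In this model, flushing after every small record with the delay disabled costs one send per record, while enabling the delay batches many records into each send. The trade-off is that data below the threshold stays in the client buffer until a later flush or close.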

Testing shows that flush performance improves significantly. The job is no longer blocked; it exits about 1 second after the MR job finishes.
`20/03/25 11:04:04 INFO mapreduce.Job: map 100% reduce 100%`
`20/03/25 11:04:05 INFO mapreduce.Job: Job job_1585104739905_0002 completed successfully`

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-3155

## How was this patch tested?

Run YARN on Ozone and run the TestDFSIO job below with 1000 maps, then check how long the job takes to exit after map and reduce reach 100%.
`hadoop jar /path/of/hadoop-mapreduce-client-jobclient-2.8.5-tests.jar TestDFSIO -write -nrFiles 1000 -fileSize 1KB -resFile /tmp/dfsio-write.out`

Add the following configuration to ozone-site.xml and repeat the above command to compare the exit time.
`<property>`
` <name>ozone.client.stream.buffer.flush.delay</name>`
` <value>true</value>`
`</property>`
