Posted to common-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2009/07/22 21:31:15 UTC

[jira] Updated: (HADOOP-5318) Poor IO Performance due to AtomicLong operations

     [ https://issues.apache.org/jira/browse/HADOOP-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-5318:
--------------------------------

    Attachment: TestWriteConcurrency.java
                Rplot001.png

Attaching a benchmark and the resulting box plots. The benchmark writes 1M pairs of <ByteWritable, ByteWritable> into a SequenceFile with a varying number of threads. The three series are:

green: Before HADOOP-6148's improvement of CRC32
red: With HADOOP-6148's CRC committed (as it is in trunk)
blue: With HADOOP-6148's CRC plus hadoop-5318.txt from this JIRA

The results clearly show that the CRC implementation made most of the difference; adding this patch on top appears to give only a very slight further improvement. Here's the R script I used to generate the plot:

{code}
d.before.crc32 <- read.table(file="results_before_crc32.tsv", header=F)
d.before <- read.table(file="results_before.tsv", header=F)
d.after <- read.table(file="results_after.tsv", header=F)

lims <- c(1000000, 6400000)
log <- "" /* change to "y" to get log scale */

boxplot(V2 ~ V1, d.before.crc32, col="green", ylim=lims, at=(1:8)-0.4, boxwex=0.2, log=log)
boxplot(V2 ~ V1, d.before, col="red", ylim=lims, at=(1:8)-0.2, boxwex=0.2, log=log, add=T)
boxplot(V2 ~ V1, d.after, col="blue", ylim=lims, at=(1:8)+0.2, boxwex=0.2, log=log, add=T)
legend(legend=c("before CRC32", "crc, no 5318", "5318+crc"), fill=c("green", "red", "blue"), x="topright", col="black")
{code}

Ran the benchmarks on:

Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

The machine is a Nehalem box, dual quad-core with hyperthreading (/proc/cpuinfo shows 16 CPUs: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz).
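
For readers who want to see the contention in isolation (without SequenceFile or CRC computation in the picture), a minimal stand-alone sketch along the following lines reproduces the effect this issue describes. The class name and structure are hypothetical (this is not the attached TestWriteConcurrency.java or the patch); it simply contrasts N threads hammering one shared AtomicLong, roughly what per-write statistics updates do, against each thread keeping a private counter that is folded into the shared total once at the end.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class AtomicLongContention {
	private static final int ITERATIONS = 10 * 1000 * 1000;
	private static final AtomicLong total = new AtomicLong();

	public static void main(String[] args) throws InterruptedException {
		final int threads = Integer.parseInt(args[0]);
		// "shared" = bump the AtomicLong on every iteration (contended CAS);
		// anything else = keep a per-thread counter and publish it once.
		final boolean shared = args.length > 1 && "shared".equals(args[1]);

		Thread[] workers = new Thread[threads];
		long start = System.currentTimeMillis();
		for (int t = 0; t < threads; t++) {
			workers[t] = new Thread(new Runnable() {
				public void run() {
					long local = 0;
					for (int i = 0; i < ITERATIONS; i++) {
						if (shared) {
							total.incrementAndGet(); // every thread competes for the same cache line
						} else {
							local++;                 // uncontended thread-local increment
						}
					}
					if (!shared) {
						total.addAndGet(local);      // one contended update per thread instead of per write
					}
				}
			});
			workers[t].start();
		}
		for (Thread w : workers) {
			w.join();
		}
		long elapsed = System.currentTimeMillis() - start;
		System.out.printf("%d threads, %s: %d ms, total=%d%n",
				threads, shared ? "shared AtomicLong" : "per-thread counters", elapsed, total.get());
	}
}
{code}

Run it as, e.g., "java AtomicLongContention 8 shared" versus "java AtomicLongContention 8"; the shared-counter variant should scale noticeably worse as the thread count grows, which is the same shape as the curves above.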

> Poor IO Performance due to AtomicLong operations
> ------------------------------------------------
>
>                 Key: HADOOP-5318
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5318
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>         Environment: 2x quad core xeon linux 64 bit
>            Reporter: Ben Maurer
>            Assignee: Todd Lipcon
>         Attachments: buf.patch, buffer-output.patch, hadoop-5318.txt, hadoop-5318.txt, Rplot001.png, TestWriteConcurrency.java, TestWriteConcurrency.java, TestWriteConcurrency.java
>
>
> The AtomicLong operations used to count file system statistics can cause high levels of contention with multiple threads. The following test demonstrates this by having multiple threads write to different sequence files:
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.ByteWritable;
> import org.apache.hadoop.io.SequenceFile;
> import org.apache.hadoop.io.SequenceFile.Writer;
> import org.apache.hadoop.io.SequenceFile.CompressionType;
> public class Test {
> 	public static void main(String[] args) throws IOException {
> 		final Configuration c = new Configuration();
> 		final FileSystem fs = FileSystem.get(c);
> 		
> 		final int NUM = 1000*1000;
> 		for (int i = 0; i < Integer.valueOf(args[0]); i ++) {
> 			final int ii = i;
> 			new Thread(new Runnable() {
> 				@Override
> 				public void run() {
> 					
> 					try {
> 						Writer f = SequenceFile.createWriter(fs, c, new Path("/test/" + ii ), ByteWritable.class, ByteWritable.class, CompressionType.NONE);
> 						ByteWritable v = new ByteWritable();
> 						
> 						long time = System.currentTimeMillis();
> 						for (int i = 0; i < NUM; i ++)
> 							f.append(v, v);
> 						f.close();
> 						long end = System.currentTimeMillis();
> 						
> 						System.out.printf("%d operations in %d msec. %f/second\n", NUM, end - time, (float)(1000 * NUM)/(end - time));
> 						
> 					} catch (Exception e) {
> 						// the writer thread failed; just report the error
> 						e.printStackTrace();
> 					}
> 					
> 				}
> 			}).start();
> 		}
> 	}
> }
> {code}
> The results of this benchmark are:
> {code}
> ==== 1 threads ====
> 1000000 opartions in 1431 msec. 698812.000000/second
> ==== 2 threads ====
> 1000000 opartions in 3001 msec. 333222.250000/second
> 1000000 opartions in 2985 msec. 335008.375000/second
> ==== 3 threads ====
> 1000000 opartions in 4923 msec. 203128.171875/second
> 1000000 opartions in 4924 msec. 203086.921875/second
> 1000000 opartions in 4981 msec. 200762.906250/second
> ==== 4 threads ====
> 1000000 opartions in 6716 msec. 148898.156250/second
> 1000000 opartions in 7048 msec. 141884.218750/second
> 1000000 opartions in 7342 msec. 136202.671875/second
> 1000000 opartions in 7344 msec. 136165.578125/second
> ==== 5 threads ====
> 1000000 opartions in 10366 msec. 96469.226563/second
> 1000000 opartions in 11085 msec. 90212.000000/second
> 1000000 opartions in 11121 msec. 89919.968750/second
> 1000000 opartions in 11464 msec. 87229.585938/second
> 1000000 opartions in 11538 msec. 86670.132813/second
> ==== 6 threads ====
> 1000000 opartions in 16513 msec. 60558.347656/second
> 1000000 opartions in 17704 msec. 56484.410156/second
> 1000000 opartions in 18219 msec. 54887.753906/second
> 1000000 opartions in 18550 msec. 53908.355469/second
> 1000000 opartions in 18605 msec. 53748.992188/second
> 1000000 opartions in 18663 msec. 53581.953125/second
> ==== 7 threads ====
> 1000000 opartions in 22207 msec. 45030.847656/second
> 1000000 opartions in 23275 msec. 42964.554688/second
> 1000000 opartions in 23484 msec. 42582.183594/second
> 1000000 opartions in 24378 msec. 41020.593750/second
> 1000000 opartions in 24425 msec. 40941.656250/second
> 1000000 opartions in 24533 msec. 40761.421875/second
> 1000000 opartions in 24645 msec. 40576.183594/second
> ==== 8 threads ====
> 1000000 opartions in 26375 msec. 37914.691406/second
> 1000000 opartions in 26420 msec. 37850.113281/second
> 1000000 opartions in 26532 msec. 37690.335938/second
> 1000000 opartions in 26670 msec. 37495.312500/second
> 1000000 opartions in 29772 msec. 33588.605469/second
> 1000000 opartions in 29859 msec. 33490.738281/second
> 1000000 opartions in 30098 msec. 33224.800781/second
> 1000000 opartions in 30082 msec. 33242.468750/second
> {code}
> However, if I comment out the file system statistics increments, the benchmark improves to:
> {code}
> ==== 1 threads ====
> 1000000 opartions in 1194 msec. 837520.937500/second
> ==== 2 threads ====
> 1000000 opartions in 1433 msec. 697836.687500/second
> 1000000 opartions in 1433 msec. 697836.687500/second
> ==== 3 threads ====
> 1000000 opartions in 1643 msec. 608642.750000/second
> 1000000 opartions in 1643 msec. 608642.750000/second
> 1000000 opartions in 1639 msec. 610128.125000/second
> ==== 4 threads ====
> 1000000 opartions in 1886 msec. 530222.687500/second
> 1000000 opartions in 1886 msec. 530222.687500/second
> 1000000 opartions in 1886 msec. 530222.687500/second
> 1000000 opartions in 1899 msec. 526592.937500/second
> ==== 5 threads ====
> 1000000 opartions in 2065 msec. 484261.500000/second
> 1000000 opartions in 2066 msec. 484027.093750/second
> 1000000 opartions in 2067 msec. 483792.937500/second
> 1000000 opartions in 2066 msec. 484027.093750/second
> 1000000 opartions in 2066 msec. 484027.093750/second
> ==== 6 threads ====
> 1000000 opartions in 2151 msec. 464900.031250/second
> 1000000 opartions in 2111 msec. 473709.156250/second
> 1000000 opartions in 2153 msec. 464468.187500/second
> 1000000 opartions in 2114 msec. 473036.906250/second
> 1000000 opartions in 2113 msec. 473260.781250/second
> 1000000 opartions in 2112 msec. 473484.843750/second
> ==== 7 threads ====
> 1000000 opartions in 2368 msec. 422297.312500/second
> 1000000 opartions in 2334 msec. 428449.000000/second
> 1000000 opartions in 2332 msec. 428816.468750/second
> 1000000 opartions in 2330 msec. 429184.562500/second
> 1000000 opartions in 2332 msec. 428816.468750/second
> 1000000 opartions in 2375 msec. 421052.625000/second
> 1000000 opartions in 2394 msec. 417710.937500/second
> ==== 8 threads ====
> 1000000 opartions in 2517 msec. 397298.375000/second
> 1000000 opartions in 2538 msec. 394011.031250/second
> 1000000 opartions in 2538 msec. 394011.031250/second
> 1000000 opartions in 2538 msec. 394011.031250/second
> 1000000 opartions in 2539 msec. 393855.843750/second
> 1000000 opartions in 2614 msec. 382555.468750/second
> 1000000 opartions in 2666 msec. 375093.781250/second
> 1000000 opartions in 2701 msec. 370233.250000/second
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.