Posted to dev@storm.apache.org by revans2 <gi...@git.apache.org> on 2015/10/10 22:55:07 UTC

[GitHub] storm pull request: Disruptor batching v2

Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/765#issuecomment-147124664
  
    I have some new test results.  I did a comparison of several different branches.  I looked at this branch, the upgraded-disruptor branch #750, STORM-855 #694, and apache-master 0.11.0-SNAPSHOT (04cf3f6162ce6fdd1ec13b758222d889dafd5749).  I had to make a few modifications to get my test to work.  I applied the following patch https://gist.github.com/revans2/84301ef0fde0dc4fbe44 to each of the branches.  For STORM-855 I had to modify the test a bit so it would optionally do batching.  In that case batching was enabled on all streams and all spouts and bolts.
    
    I then ran the test at various throughputs (100, 200, 400, 800, 1600, 3200, 6400, 10000, 12800, and 25600, plus a few others while searching for the maximum throughput) and at different batch sizes.
    
    Each test ran for 5 minutes.  Here are the results, sorted by 99th-percentile latency and excluding the tests where the worker could not keep up with the rate.
    
    | 99%-ile ns | 99.9%-ile ns | throughput | branch-batch | mean latency ns | avg service latency ms | std-dev ns |
    |---|---|---|---|---|---|---|
    | 2,613,247 | 4,673,535 | 100 | STORM-855-0 | 2,006,347.25 | 1.26 | 2,675,778.36 |
    | 2,617,343 | 4,423,679 | 200 | STORM-855-0 | 1,991,238.45 | 1.29 | 2,024,687.45 |
    | 2,623,487 | 5,619,711 | 400 | STORM-855-0 | 1,999,926.81 | 1.24 | 1,778,335.92 |
    | 2,627,583 | 4,603,903 | 1600 | STORM-855-0 | 1,971,888.24 | 1.30 | 893,085.40 |
    | 2,635,775 | 8,560,639 | 800 | STORM-855-0 | 2,010,286.65 | 1.35 | 2,134,795.12 |
    | 2,654,207 | 302,252,031 | 3200 | STORM-855-0 | 2,942,360.75 | 2.13 | 16,676,136.60 |
    | 2,684,927 | 124,190,719 | 3200 | batch-v2-1 | 2,154,234.45 | 1.41 | 6,219,057.66 |
    | 2,701,311 | 349,700,095 | 5000 | batch-v2-1 | 2,921,661.67 | 1.78 | 18,274,805.30 |
    | 2,715,647 | 7,356,415 | 100 | storm-base-1 | 2,092,991.53 | 1.30 | 2,447,956.21 |
    | 2,723,839 | 4,587,519 | 400 | storm-base-1 | 2,082,835.21 | 1.31 | 1,978,424.49 |
    | 2,723,839 | 6,049,791 | 100 | dist-upgrade-1 | 2,091,407.68 | 1.31 | 2,222,977.89 |
    | 2,725,887 | 10,403,839 | 1600 | batch-v2-1 | 2,010,694.30 | 1.27 | 2,095,223.90 |
    | 2,725,887 | 4,607,999 | 200 | storm-base-1 | 2,074,784.50 | 1.30 | 1,951,564.93 |
    | 2,727,935 | 4,513,791 | 200 | dist-upgrade-1 | 2,082,025.31 | 1.33 | 2,057,591.08 |
    | 2,729,983 | 4,182,015 | 400 | dist-upgrade-1 | 2,056,282.29 | 1.43 | 862,428.67 |
    | 2,732,031 | 4,632,575 | 800 | storm-base-1 | 2,092,514.39 | 1.27 | 2,231,550.66 |
    | 2,734,079 | 4,472,831 | 800 | dist-upgrade-1 | 2,095,994.08 | 1.28 | 1,870,953.62 |
    | 2,740,223 | 4,192,255 | 200 | batch-v2-1 | 2,011,025.19 | 1.21 | 911,556.19 |
    | 2,742,271 | 4,726,783 | 1600 | storm-base-1 | 2,089,581.40 | 1.35 | 2,410,668.79 |
    | 2,748,415 | 4,444,159 | 400 | batch-v2-1 | 2,055,600.78 | 1.34 | 1,729,257.92 |
    | 2,748,415 | 4,575,231 | 100 | batch-v2-1 | 2,035,920.21 | 1.31 | 1,213,874.52 |
    | 2,754,559 | 16,875,519 | 1600 | dist-upgrade-1 | 2,098,441.13 | 1.35 | 2,279,870.41 |
    | 2,754,559 | 3,969,023 | 800 | batch-v2-1 | 2,026,222.88 | 1.29 | 767,491.71 |
    | 2,793,471 | 53,477,375 | 3200 | storm-base-1 | 2,147,360.05 | 1.42 | 3,668,366.37 |
    | 2,801,663 | 147,062,783 | 3200 | dist-upgrade-1 | 2,358,863.31 | 1.59 | 7,574,577.81 |
    | 13,344,767 | 180,879,359 | 6400 | batch-v2-100 | 11,319,553.69 | 10.62 | 7,777,381.54 |
    | 13,369,343 | 15,122,431 | 3200 | batch-v2-100 | 10,699,832.23 | 10.02 | 1,623,949.38 |
    | 13,418,495 | 15,392,767 | 800 | batch-v2-100 | 10,589,813.17 | 9.86 | 2,439,134.80 |
    | 13,426,687 | 14,680,063 | 400 | batch-v2-100 | 10,738,973.68 | 10.03 | 2,298,229.99 |
    | 13,484,031 | 14,368,767 | 200 | batch-v2-100 | 10,941,653.28 | 10.20 | 2,471,899.43 |
    | 13,508,607 | 14,262,271 | 100 | batch-v2-100 | 11,099,257.68 | 10.35 | 1,658,054.66 |
    | 13,524,991 | 14,376,959 | 1600 | batch-v2-100 | 10,723,471.83 | 10.00 | 1,477,621.07 |
    | 346,554,367 | 977,272,831 | 12800 | batch-v2-100 | 18,596,303.93 | 15.59 | 78,326,501.83 |
    | 710,934,527 | 827,326,463 | 4000 | STORM-855-100 | 351,305,653.90 | 339.28 | 141,283,307.30 |
    | 783,286,271 | 1,268,776,959 | 5000 | STORM-855-100 | 332,417,358.65 | 312.07 | 139,760,316.82 |
    | 888,668,159 | 1,022,361,599 | 3200 | STORM-855-100 | 445,646,342.60 | 431.55 | 179,065,279.65 |
    | 940,048,383 | 1,363,148,799 | 6400 | storm-base-1 | 20,225,300.17 | 17.17 | 134,848,974.52 |
    | 1,043,333,119 | 1,409,286,143 | 10000 | batch-v2-1 | 22,750,840.18 | 6.13 | 146,235,076.73 |
    | 1,209,008,127 | 1,786,773,503 | 6400 | dist-upgrade-1 | 28,588,397.01 | 24.70 | 181,801,409.69 |
    | 1,747,976,191 | 1,946,157,055 | 1600 | STORM-855-100 | 738,741,774.85 | 734.75 | 374,194,675.56 |
    | 2,642,411,519 | 3,124,756,479 | 20000 | batch-v2-100 | 133,706,248.88 | 51.67 | 497,027,226.45 |
    | 3,374,317,567 | 3,892,314,111 | 10000 | dist-upgrade-1 | 141,866,760.39 | 69.39 | 589,014,777.73 |
    | 3,447,717,887 | 3,869,245,439 | 10000 | storm-base-1 | 139,149,514.03 | 56.45 | 609,509,456.98 |
    | 3,456,106,495 | 3,953,131,519 | 22000 | batch-v2-100 | 274,785,584.11 | 93.37 | 743,434,065.83 |
    | 3,512,729,599 | 3,898,605,567 | 800 | STORM-855-100 | 1,354,193,514.47 | 1,361.58 | 779,667,263.64 |
    | 3,963,617,279 | 4,416,602,111 | 5500 | STORM-855-100 | 450,364,286.22 | 415.96 | 575,017,536.40 |
    | 4,185,915,391 | 5,347,737,599 | 4500 | STORM-855-0 | 366,268,233.66 | 259.94 | 995,928,429.75 |
    | 4,919,918,591 | 5,582,618,623 | 6000 | STORM-855-100 | 534,520,242.96 | 497.47 | 758,754,139.61 |
    | 7,071,596,543 | 7,843,348,479 | 400 | STORM-855-100 | 2,652,137,010.52 | 2,630.51 | 1,589,666,333.78 |
    | 14,159,970,303 | 15,653,142,527 | 200 | STORM-855-100 | 5,202,877,719.25 | 5,206.33 | 3,199,275,795.66 |
    | 27,648,851,967 | 31,205,621,759 | 100 | STORM-855-100 | 10,201,124,134.76 | 10,169.37 | 6,289,786,882.10 |
    
    
    I then filtered the list to show the maximum throughput achievable for a given latency, using several different latency metrics.
    
    99th percentile:
    
    | 99%-ile ns | 99.9%-ile ns | throughput | branch-batch | mean latency ns | avg service latency ms | std-dev ns |
    |---|---|---|---|---|---|---|
    | 2,613,247 | 4,673,535 | 100 | STORM-855-0 | 2,006,347.25 | 1.26 | 2,675,778.36 |
    | 2,617,343 | 4,423,679 | 200 | STORM-855-0 | 1,991,238.45 | 1.29 | 2,024,687.45 |
    | 2,623,487 | 5,619,711 | 400 | STORM-855-0 | 1,999,926.81 | 1.24 | 1,778,335.92 |
    | 2,627,583 | 4,603,903 | 1600 | STORM-855-0 | 1,971,888.24 | 1.30 | 893,085.40 |
    | 2,654,207 | 302,252,031 | 3200 | STORM-855-0 | 2,942,360.75 | 2.13 | 16,676,136.60 |
    | 2,701,311 | 349,700,095 | 5000 | batch-v2-1 | 2,921,661.67 | 1.78 | 18,274,805.30 |
    | 13,344,767 | 180,879,359 | 6400 | batch-v2-100 | 11,319,553.69 | 10.62 | 7,777,381.54 |
    | 346,554,367 | 977,272,831 | 12800 | batch-v2-100 | 18,596,303.93 | 15.59 | 78,326,501.83 |
    | 2,642,411,519 | 3,124,756,479 | 20000 | batch-v2-100 | 133,706,248.88 | 51.67 | 497,027,226.45 |
    | 3,456,106,495 | 3,953,131,519 | 22000 | batch-v2-100 | 274,785,584.11 | 93.37 | 743,434,065.83 |
    
    99.9th percentile:
    
    | 99%-ile ns | 99.9%-ile ns | throughput | branch-batch | mean latency ns | avg service latency ms | std-dev ns |
    |---|---|---|---|---|---|---|
    | 2,754,559 | 3,969,023 | 800 | batch-v2-1 | 2,026,222.88 | 1.29 | 767,491.71 |
    | 2,627,583 | 4,603,903 | 1600 | STORM-855-0 | 1,971,888.24 | 1.30 | 893,085.40 |
    | 13,369,343 | 15,122,431 | 3200 | batch-v2-100 | 10,699,832.23 | 10.02 | 1,623,949.38 |
    | 13,344,767 | 180,879,359 | 6400 | batch-v2-100 | 11,319,553.69 | 10.62 | 7,777,381.54 |
    | 346,554,367 | 977,272,831 | 12800 | batch-v2-100 | 18,596,303.93 | 15.59 | 78,326,501.83 |
    | 2,642,411,519 | 3,124,756,479 | 20000 | batch-v2-100 | 133,706,248.88 | 51.67 | 497,027,226.45 |
    | 3,456,106,495 | 3,953,131,519 | 22000 | batch-v2-100 | 274,785,584.11 | 93.37 | 743,434,065.83 |
    
    mean latency:
    
    | 99%-ile ns | 99.9%-ile ns | throughput | branch-batch | mean latency ns | avg service latency ms | std-dev ns |
    |---|---|---|---|---|---|---|
    | 2,627,583 | 4,603,903 | 1600 | STORM-855-0 | 1,971,888.24 | 1.30 | 893,085.40 |
    | 2,793,471 | 53,477,375 | 3200 | storm-base-1 | 2,147,360.05 | 1.42 | 3,668,366.37 |
    | 2,701,311 | 349,700,095 | 5000 | batch-v2-1 | 2,921,661.67 | 1.78 | 18,274,805.30 |
    | 13,344,767 | 180,879,359 | 6400 | batch-v2-100 | 11,319,553.69 | 10.62 | 7,777,381.54 |
    | 346,554,367 | 977,272,831 | 12800 | batch-v2-100 | 18,596,303.93 | 15.59 | 78,326,501.83 |
    | 2,642,411,519 | 3,124,756,479 | 20000 | batch-v2-100 | 133,706,248.88 | 51.67 | 497,027,226.45 |
    | 3,456,106,495 | 3,953,131,519 | 22000 | batch-v2-100 | 274,785,584.11 | 93.37 | 743,434,065.83 |
    
    service latency ms (storm's complete latency):
    
    | 99%-ile ns | 99.9%-ile ns | throughput | branch-batch | mean latency ns | avg service latency ms | std-dev ns |
    |---|---|---|---|---|---|---|
    | 2,740,223 | 4,192,255 | 200 | batch-v2-1 | 2,011,025.19 | 1.21 | 911,556.19 |
    | 2,623,487 | 5,619,711 | 400 | STORM-855-0 | 1,999,926.81 | 1.24 | 1,778,335.92 |
    | 2,725,887 | 10,403,839 | 1600 | batch-v2-1 | 2,010,694.30 | 1.27 | 2,095,223.90 |
    | 2,684,927 | 124,190,719 | 3200 | batch-v2-1 | 2,154,234.45 | 1.41 | 6,219,057.66 |
    | 2,701,311 | 349,700,095 | 5000 | batch-v2-1 | 2,921,661.67 | 1.78 | 18,274,805.30 |
    | 1,043,333,119 | 1,409,286,143 | 10000 | batch-v2-1 | 22,750,840.18 | 6.13 | 146,235,076.73 |
    | 346,554,367 | 977,272,831 | 12800 | batch-v2-100 | 18,596,303.93 | 15.59 | 78,326,501.83 |
    | 2,642,411,519 | 3,124,756,479 | 20000 | batch-v2-100 | 133,706,248.88 | 51.67 | 497,027,226.45 |
    | 3,456,106,495 | 3,953,131,519 | 22000 | batch-v2-100 | 274,785,584.11 | 93.37 | 743,434,065.83 |
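    
    The exact filtering rule is not spelled out here.  One reading that is consistent with the filtered tables is a running maximum: sort the rows by the chosen latency metric and keep a row only if its throughput is higher than that of every row with a lower latency.  The sketch below assumes that reading; the `Result` type and `max_throughput_frontier` function are hypothetical names, not part of the test patch, and the sample rows are copied from the full results table.

```python
from typing import Callable, List, NamedTuple


class Result(NamedTuple):
    p99_ns: int        # 99th-percentile latency in nanoseconds
    throughput: int    # target rate the test was run at
    branch_batch: str  # "<branch>-<batch size>" label from the table


def max_throughput_frontier(
    rows: List[Result], latency: Callable[[Result], float]
) -> List[Result]:
    """Keep a row only if it beats the throughput of every lower-latency row."""
    frontier: List[Result] = []
    best = 0
    # Lowest latency first; on a latency tie, consider the higher throughput first.
    for row in sorted(rows, key=lambda r: (latency(r), -r.throughput)):
        if row.throughput > best:
            frontier.append(row)
            best = row.throughput
    return frontier


# A few rows copied from the full results table (99%-ile ns, throughput, label).
rows = [
    Result(2_613_247, 100, "STORM-855-0"),
    Result(2_627_583, 1600, "STORM-855-0"),
    Result(2_635_775, 800, "STORM-855-0"),
    Result(2_654_207, 3200, "STORM-855-0"),
    Result(2_684_927, 3200, "batch-v2-1"),
    Result(2_701_311, 5000, "batch-v2-1"),
]

frontier = max_throughput_frontier(rows, latency=lambda r: r.p99_ns)
for r in frontier:
    print(r.throughput, r.branch_batch, r.p99_ns)
```

    With these sample rows the 800 row and the batch-v2-1 3200 row drop out, because a row with lower 99%-ile latency already reaches at least that throughput.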
    
    I also looked at roughly the maximum throughput each branch could handle.
    
    | branch-batch | throughput | mean latency ns | 99%-ile latency ns |
    |---|---|---|---|
    | STORM-855-0 | 4,500 | 366,268,233.66 | 4,185,915,391 |
    | STORM-855-100 | 5,500 | 450,364,286.22 | 3,963,617,279 |
    | storm-base-1 | 10,000 | 139,149,514.03 | 3,447,717,887 |
    | dist-upgrade-1 | 10,000 | 141,866,760.39 | 3,374,317,567 |
    | batch-v2-1 | 10,000 | 22,750,840.18 | 1,043,333,119 |
    | batch-v2-100 | 22,000 | 274,785,584.11 | 3,456,106,495 |
    
    I would really like some feedback here, because these numbers seem to contradict the STORM-855 results from my original speed-of-light test.  I don't really like that test, even though I wrote it, because its throughput is limited only by Storm.  With acking disabled it measures the latency when we hit the wall and cannot provide any more throughput, and no one should run in production that way.  When acking is enabled and we are using max spout pending for flow control, the throughput is directly tied to the end-to-end latency.  This too shouldn't be the common case in production, because it means we cannot keep up with the incoming rate and are falling behind.
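    
    The link between max spout pending and latency can be made concrete with Little's law (in-flight tuples = throughput x latency).  The sketch below is only a back-of-the-envelope bound, assuming the spout always has tuples to emit and that the pending cap applies per spout task.  The function name and the pending value of 1000 are made up for illustration; the two latencies are roughly the average service latencies seen in the table for unbatched runs and for batch-v2-100 at low throughput.

```python
def throughput_ceiling(max_spout_pending: int, complete_latency_ms: float) -> float:
    """Upper bound on tuples/second for one spout task, from Little's law.

    At most `max_spout_pending` tuples are in flight and each stays in flight
    for about `complete_latency_ms`, so the rate cannot exceed
    pending / latency.
    """
    if complete_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return max_spout_pending / (complete_latency_ms / 1000.0)


PENDING = 1000  # hypothetical max spout pending setting, not from the test

# About 1.30 ms complete latency: ceiling of roughly 769,000 tuples/s.
low_latency_ceiling = throughput_ceiling(PENDING, 1.30)

# About 10.00 ms complete latency: ceiling of roughly 100,000 tuples/s.
high_latency_ceiling = throughput_ceiling(PENDING, 10.00)

print(f"{low_latency_ceiling:,.0f} vs {high_latency_ceiling:,.0f} tuples/s")
```

    Under those assumptions, any change that raises the complete latency lowers the achievable throughput in the same proportion, which is why such a test ends up measuring latency rather than raw capacity.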
    
    This seems to indicate that the only time STORM-855 makes sense is when looking at the 99%-ile latency at a very low throughput, and even then it only seems to save about 1/20th of a ms over the others.  In other cases it looks like the throughput per host it can support is about 1/2 of that without the change.  This branch, however, has a weakness on the low end: when batching is enabled it is about 12 ms slower.  On the high end, though, it can handle more than 2x the throughput with little change to the latency.  If that 12 ms is important I think we can mitigate it by allowing the batch size to self-adjust on a per-queue basis.
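    
    To illustrate what a self-adjusting batch size could look like, here is a purely hypothetical sketch; it is not taken from this branch.  It assumes each queue can tell whether a batch was flushed because it filled up or because a flush timer fired while it was only partly full, and it uses simple doubling and halving.  The class name, the bounds of 1 and 100, and the policy itself are all assumptions.

```python
class AdaptiveBatchSize:
    """Hypothetical per-queue batch-size controller.

    Grows the batch size while batches fill up (high load, where batching
    pays off) and shrinks it when batches are flushed by the timer while
    partly empty (low load, where waiting only adds latency).
    """

    def __init__(self, min_size: int = 1, max_size: int = 100) -> None:
        if not 1 <= min_size <= max_size:
            raise ValueError("require 1 <= min_size <= max_size")
        self.min_size = min_size
        self.max_size = max_size
        self.size = min_size  # start unbatched to favor latency

    def on_flush(self, filled: bool) -> int:
        """Record one flush and return the batch size to use next."""
        if filled:
            self.size = min(self.size * 2, self.max_size)
        else:
            self.size = max(self.size // 2, self.min_size)
        return self.size


sizer = AdaptiveBatchSize()
# Sustained load: every batch fills, so the size ramps up to the cap.
for _ in range(8):
    sizer.on_flush(filled=True)
print("after sustained load:", sizer.size)
# Load drops: timer-driven flushes shrink the size again.
for _ in range(3):
    sizer.on_flush(filled=False)
print("after load drops:", sizer.size)
```

    A real implementation would also need to decide how the flush timer interacts with the size and how quickly to react, which this sketch leaves out.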
    
    I would really like others to look at my numbers and my test to see if there are issues that I am missing because, like I said, the results seem to contradict the numbers from STORM-855.  The only thing I can think of is that the messaging layer is the bottleneck in the speed-of-light test, which is what that test was intended to stress, and that STORM-855 gives a significant batching advantage there.  If that is the case, then we should look at what STORM-855 is doing around that and try to combine it with the batching we are doing here.
    
    @ptgoetz @d2r @rfarivar @mjsax @kishorvpatil @knusbaum please let me know what you think.

