Posted to dev@geode.apache.org by Donal Evans <do...@vmware.com> on 2021/03/12 19:19:17 UTC

Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

After some investigation, it appears that a degradation has been introduced that causes the P2pPartitionedPutLongBenchmark to fail at an increased rate. Efforts are underway to narrow down the cause to a single commit, but the degradation was definitely introduced in 1.14, so I believe this should be considered a 1.14 release blocker: https://issues.apache.org/jira/browse/GEODE-8950

Re: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

Posted by Owen Nichols <on...@vmware.com>.
Thanks for spotting this and looking into it. Let's keep it on the blocker list and get a better understanding.

On 3/12/21, 11:21 AM, "Nabarun Nag" <nn...@vmware.com> wrote:

    +1
    ________________________________
    From: Donal Evans <do...@vmware.com>
    Sent: Friday, March 12, 2021 11:19 AM
    To: dev@geode.apache.org <de...@geode.apache.org>
    Subject: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

    After some investigation, it appears that a degradation has been introduced that causes the P2pPartitionedPutLongBenchmark to fail at an increased rate. Efforts are underway to narrow down the cause to a single commit, but the degradation was definitely introduced in 1.14, so I believe this should be considered a 1.14 release blocker: https://issues.apache.org/jira/browse/GEODE-8950


Re: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

Posted by Nabarun Nag <nn...@vmware.com>.
+1
________________________________
From: Donal Evans <do...@vmware.com>
Sent: Friday, March 12, 2021 11:19 AM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

After some investigation, it appears that a degradation has been introduced that causes the P2pPartitionedPutLongBenchmark to fail at an increased rate. Efforts are underway to narrow down the cause to a single commit, but the degradation was definitely introduced in 1.14, so I believe this should be considered a 1.14 release blocker: https://issues.apache.org/jira/browse/GEODE-8950

Re: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

Posted by Donal Evans <do...@vmware.com>.
> That 4.6% degradation is within our thresholds, so it is possible this came in well before the first time it was detected.

After doing some bisecting, I've found that Geode SHA 986733ec (committed on Nov 20th, 2020) shows an average degradation of ~1% compared to the 1.13 baseline, whereas e26d7595 (committed on Dec 3rd) shows an average degradation of ~5%, so a performance degradation was definitely introduced somewhere between those two commits.

The fact that these numbers come from an average of 10 runs is also relevant: part of the reason we have a threshold is that we know the test is somewhat noisy. A 5% degradation in a single run is nothing to worry about, but a consistent average degradation of the same size is much more serious.

I'm continuing to bisect down to an individual SHA and hope to pin it down eventually (I've currently narrowed it to a range of about 20 commits), but it's slow going, since I have to run the benchmark multiple times for each candidate.
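
To put the "slow going" in rough numbers: a sketch, assuming a worst-case binary search over the ~20 remaining candidate commits with 10 benchmark runs averaged per probe, as described above. The class and method names are hypothetical and not part of Geode or its benchmark tooling. Reruns to confirm noisy results would add to the total.

```java
/**
 * Hypothetical back-of-the-envelope model of bisection cost, not part of Geode
 * or its benchmark tooling. Assumes a worst-case binary search in which every
 * probe measures one commit using a fixed number of benchmark runs.
 */
final class BisectCost {

  /** Worst-case number of probes needed to isolate one commit among the candidates. */
  static int probesNeeded(int candidates) {
    int probes = 0;
    int remaining = candidates;
    while (remaining > 1) {
      remaining = (remaining + 1) / 2; // worst case: the culprit is in the larger half
      probes++;
    }
    return probes;
  }

  /** Total benchmark runs when each probe is an average of runsPerProbe runs. */
  static int benchmarkRunsNeeded(int candidates, int runsPerProbe) {
    return probesNeeded(candidates) * runsPerProbe;
  }

  public static void main(String[] args) {
    // ~20 candidate commits, 10 runs averaged per data point
    System.out.println(probesNeeded(20));            // 5
    System.out.println(benchmarkRunsNeeded(20, 10)); // 50
  }
}
```

Under these assumptions, isolating a single SHA from about 20 candidates takes 5 probes, or on the order of 50 more benchmark runs.
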
________________________________
From: Jacob Barrett <ja...@vmware.com>
Sent: Friday, March 12, 2021 1:55 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

That 4.6% degradation is within our thresholds, so it is possible this came in well before the first time it was detected.

Some context on the benchmark for those unfamiliar: P2pPartitionedPutLongBenchmark performs a fire hose of puts with Long keys and Long values directly from each of the two servers. This results in a lot of little p2p messages being exchanged between the servers. Presumably 50% of the puts result in a forward to the primary bucket, in addition to the replication message sent for every put. This test can be sensitive to even the smallest change in garbage production/collection, hot methods, locks, etc.

This test models a scenario that is very unlikely in production. I am not sure that constitutes a blocking condition, but it is troubling. I will give a neutral vote on making it a blocker.

> On Mar 12, 2021, at 11:19 AM, Donal Evans <do...@vmware.com> wrote:
>
> After some investigation, it appears that a degradation has been introduced that causes the P2pPartitionedPutLongBenchmark to fail at an increased rate. Efforts are underway to narrow down the cause to a single commit, but the degradation was definitely introduced in 1.14, so I believe this should be considered a 1.14 release blocker: https://issues.apache.org/jira/browse/GEODE-8950


Re: Proposal: Add GEODE-8950 (performance degradation in P2pPartitionedPutLongBenchmark) to 1.14 blockers list

Posted by Jacob Barrett <ja...@vmware.com>.
That 4.6% degradation is within our thresholds, so it is possible this came in well before the first time it was detected.

Some context on the benchmark for those unfamiliar: P2pPartitionedPutLongBenchmark performs a fire hose of puts with Long keys and Long values directly from each of the two servers. This results in a lot of little p2p messages being exchanged between the servers. Presumably 50% of the puts result in a forward to the primary bucket, in addition to the replication message sent for every put. This test can be sensitive to even the smallest change in garbage production/collection, hot methods, locks, etc.
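
The message arithmetic behind "a lot of little p2p messages" can be sketched with a toy model. This is a hypothetical illustration, not Geode's implementation. It assumes two servers, one redundant copy per bucket (each bucket's primary is on one server and its secondary on the other), both servers putting the same keys, a round-robin primary assignment, a simplified key-to-bucket mapping, and acknowledgement replies left uncounted. 113 is Geode's default total bucket count. Under these assumptions, each put costs 1.5 p2p messages on average: half of the puts are forwarded to a remote primary, and every put is replicated.

```java
/**
 * Hypothetical traffic model, NOT Geode's implementation. Assumes two servers,
 * one redundant copy per bucket (primary on one server, secondary on the other),
 * round-robin primary placement, and no counting of reply/ack messages.
 */
final class PutTrafficModel {
  static final int SERVERS = 2;

  /** Server hosting the primary of a bucket (assumed round-robin placement). */
  static int primaryOf(int bucket) {
    return bucket % SERVERS;
  }

  /** Average p2p messages per put when every server puts keys 0..keys-1. */
  static double messagesPerPut(long keys, int totalBuckets) {
    long puts = 0;
    long messages = 0;
    for (int server = 0; server < SERVERS; server++) {
      for (long key = 0; key < keys; key++) {
        int bucket = (int) (key % totalBuckets); // simplified key-to-bucket mapping
        puts++;
        if (primaryOf(bucket) != server) {
          messages++; // forward the put to the primary on the other server
        }
        messages++; // primary replicates the update to the redundant copy
      }
    }
    return (double) messages / puts;
  }

  public static void main(String[] args) {
    System.out.println(messagesPerPut(10_000L, 113)); // 1.5
  }
}
```

Because both servers put the same keys, exactly one of the two puts for each key is forwarded, so the average comes out at exactly 1.5 regardless of how the buckets are split.
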

This test models a scenario that is very unlikely in production. I am not sure that constitutes a blocking condition, but it is troubling. I will give a neutral vote on making it a blocker.

> On Mar 12, 2021, at 11:19 AM, Donal Evans <do...@vmware.com> wrote:
> 
> After some investigation, it appears that a degradation has been introduced that causes the P2pPartitionedPutLongBenchmark to fail at an increased rate. Efforts are underway to narrow down the cause to a single commit, but the degradation was definitely introduced in 1.14, so I believe this should be considered a 1.14 release blocker: https://issues.apache.org/jira/browse/GEODE-8950