Posted to issues@geode.apache.org by "Hale Bales (Jira)" <ji...@apache.org> on 2021/02/27 00:18:00 UTC
[jira] [Comment Edited] (GEODE-8950) Benchmark failure - P2pPartitionedPutLongBenchmark
[ https://issues.apache.org/jira/browse/GEODE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291981#comment-17291981 ]
Hale Bales edited comment on GEODE-8950 at 2/27/21, 12:17 AM:
--------------------------------------------------------------
* the first known CI failure was on 02/04/2021
* we do not have CI history before 02/01/2021
* these failures are occurring both in CI and when run using the scripts
* the test that is failing was added in November of 2020
* running develop against 1.13.0 does not produce consistent benchmark results
* running with a baseline of 1.13.1 does not improve the failure rate
* running 1.13.0 against itself does not produce consistently passing results
* running develop against itself does not produce consistently passing results
* there have been no changes to benchmarks this year (as of Feb 26, 2021)
* there do not appear to be any suspect changes to geode core made this year
** Jake Barrett, Donal Evans, and I have all looked at the commits
** no commits are in the right area of the code
** I have tested every code change that had even the slightest chance of changing the performance of P2pPartitionedPutLongBenchmark
** the changes to dependencies do not seem to have changed the performance
* profiling the test for the following did not produce any useful information:
** cpu usage
** allocations
** locks
* looking at the gfs logs showed that (on a failing run):
** develop did fewer puts than 1.13.0
** develop had less cpu activity
** develop received fewer bytes
** these results are expected for a run where develop had lower throughput than 1.13.0
* this benchmark has a very small payload size
** in the past the performance team saw a high degree of sensitivity in tests with small payloads
conclusions:
* these failures do not appear to be caused by any code change
* these failures do not appear to be caused by any benchmarking change
* these failures do not appear to be caused by any dependency change
* the instability when running the same version/commit against itself suggests that the issue is the per-operation overhead for such a small payload
* there is no data to support that this failure is occurring more often than it did previously
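To illustrate the point about a build being unstable against itself, here is a minimal sketch of how run-to-run noise can be compared with a degradation threshold. The throughput numbers and the 5% threshold are hypothetical, chosen only for illustration; they are not measurements from these runs and not necessarily the threshold the benchmark tooling uses.

```python
from statistics import mean, stdev


def relative_change(baseline, candidate):
    """Relative throughput change of candidate vs. baseline (negative = slower)."""
    return (candidate - baseline) / baseline


def noise_floor(samples):
    """Coefficient of variation across repeated runs of the SAME build."""
    return stdev(samples) / mean(samples)


# Hypothetical ops/sec from five runs of one build (not measured data).
same_build_runs = [100_000, 93_000, 104_000, 96_000, 107_000]
threshold = 0.05  # hypothetical degradation threshold (5%)

cv = noise_floor(same_build_runs)

# Treat the first run as the "baseline" and the rest as "candidates".
flagged = [
    relative_change(same_build_runs[0], run) < -threshold
    for run in same_build_runs[1:]
]

print(f"noise floor: {cv:.3f}, flagged: {flagged}")
```

If the noise floor of a single build is already larger than the threshold, some comparisons will be flagged as degradations even though no code changed.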
proposed next steps:
* keep running this test and keep track of the failure rate
* if the failure rate increases, investigate the peer-to-peer code
* if the failure rate stays the same, comment out the test
* long term, invest time in a significant refactor of the peer-to-peer code
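As a rough sketch of what tracking the failure rate could look like, the snippet below aggregates the three CI runs cited in the issue description (builds 16, 20 and 27). The record layout and the function name are assumptions made for illustration; this is not an existing script in the benchmark tooling.

```python
from fractions import Fraction

# (build number, repeats flagged for P2pPartitionedPutLongBenchmark, total repeats),
# taken from the three CI runs cited in the issue description.
runs = [(16, 3, 5), (20, 1, 5), (27, 4, 5)]


def failure_rate(records):
    """Fraction of repeats flagged as degraded across the given builds."""
    flagged = sum(f for _, f, _ in records)
    total = sum(t for _, _, t in records)
    return Fraction(flagged, total)


overall = failure_rate(runs)
per_build = {build: Fraction(f, t) for build, f, t in runs}

print(f"overall: {overall} ({float(overall):.1%})")
for build, rate in per_build.items():
    print(f"build {build}: {rate}")
```

Appending each new build to the list and recomputing gives a running rate that can be compared over time to decide between the next steps above.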
> Benchmark failure - P2pPartitionedPutLongBenchmark
> --------------------------------------------------
>
> Key: GEODE-8950
> URL: https://issues.apache.org/jira/browse/GEODE-8950
> Project: Geode
> Issue Type: Bug
> Components: benchmarks
> Affects Versions: 1.15.0
> Reporter: Donal Evans
> Assignee: Hale Bales
> Priority: Major
>
> Multiple benchmark failures due to P2pPartitionedPutLongBenchmark have been seen recently.
> This run saw 3 out of the 5 repeats fail due to flagged degradations in P2pPartitionedPutLongBenchmark: [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/16|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/16#L601ed52d:5552]
> This run saw 1 out of the 5 repeats fail due to flagged degradations in P2pPartitionedPutLongBenchmark: [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/20]
> This run saw 4 out of the 5 repeats fail due to flagged degradations in P2pPartitionedPutLongBenchmark: [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/Benchmark_base/builds/27]
> In all the above benchmarks, the other failed runs were due to EOFExceptions rather than flagged degradations.