Posted to issues@geode.apache.org by "Bill Burcham (Jira)" <ji...@apache.org> on 2020/06/25 18:06:00 UTC

[jira] [Updated] (GEODE-8299) when an SNI proxy drops connections Geode benchmarks fail

     [ https://issues.apache.org/jira/browse/GEODE-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Burcham updated GEODE-8299:
--------------------------------
    Description: 
While developing SNI topology support for the benchmarks in the {{geode-benchmarks}} repo, we noticed that when we configured the SNI proxy (HAProxy) with timeouts (e.g. 1 second or 10 seconds), the benchmark code would crash.

To work around the issue for benchmarking purposes, we turned off those timeouts. With the connect, client, and server timeouts left unspecified, HAProxy effectively treats them as infinite, so connections are never timed out.

We haven't yet investigated deeply enough to know whether this is a Geode product issue, an HAProxy issue, or a problem in the benchmark logic, but it seemed prudent to open a ticket so the issue is not swept under the rug.

[~jbarrett] suspects a problem in the Geode client connection pool logic: perhaps when the proxy closes a connection, the corresponding connection in the pool is not removed correctly, or is not removed quickly enough. This is only a suspicion and needs more investigation.
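To make the suspected failure mode concrete, here is a minimal, self-contained Java sketch of one way a pool could avoid handing out connections that the proxy has probably already closed: discard any pooled connection that has been idle longer than a threshold kept below the proxy's timeout. This is not Geode's client pool implementation. The {{IdleEvictingPool}} class and its methods are hypothetical, and whether Geode's pool needs this behavior is exactly what remains to be investigated.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical sketch: NOT Geode's client connection pool.
// Discards pooled connections idle longer than maxIdleMillis, on the
// assumption that the proxy has probably already closed them.
final class IdleEvictingPool<C> {
  // A pooled connection plus the time it was last returned to the pool.
  private static final class Entry<T> {
    final T conn;
    final long lastUsedMillis;

    Entry(T conn, long lastUsedMillis) {
      this.conn = conn;
      this.lastUsedMillis = lastUsedMillis;
    }
  }

  private final Deque<Entry<C>> idle = new ArrayDeque<>();
  private final Supplier<C> factory;
  // Assumption: chosen to be shorter than the proxy's idle timeout (e.g. 1s).
  private final long maxIdleMillis;
  private int discarded = 0;

  IdleEvictingPool(Supplier<C> factory, long maxIdleMillis) {
    this.factory = factory;
    this.maxIdleMillis = maxIdleMillis;
  }

  // Returns the most recently released connection if it is still fresh;
  // stale entries are dropped, and a new connection is created if none remain.
  C borrow(long nowMillis) {
    while (!idle.isEmpty()) {
      Entry<C> e = idle.pollFirst();
      if (nowMillis - e.lastUsedMillis < maxIdleMillis) {
        return e.conn;
      }
      discarded++; // a real pool would also close the underlying socket here
    }
    return factory.get();
  }

  // Returns a connection to the pool, recording when it became idle.
  void release(C conn, long nowMillis) {
    idle.addFirst(new Entry<>(conn, nowMillis));
  }

  int discardedCount() {
    return discarded;
  }

  public static void main(String[] args) {
    IdleEvictingPool<String> pool = new IdleEvictingPool<>(() -> "fresh-conn", 500);
    pool.release("old-conn", 0);
    // Two seconds later the old connection is past maxIdleMillis, so it is replaced.
    System.out.println(pool.borrow(2_000)); // prints fresh-conn
  }
}
{code}

Timestamps are passed in explicitly to keep the sketch deterministic; a real pool would read a clock and close the sockets it discards.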

h2. Reproduction

To see the problem, clone https://github.com/apache/geode-benchmarks and edit {{StartSniProxy.generateHaProxyConfig()}} by un-commenting these lines:

{code}
            // + " timeout client 100s\n"
            // + " timeout connect 100s\n"
            // + " timeout server 100s\n"
{code}

Then set each of them to {{1s}} (one second, not 100 seconds).

Be sure to un-comment the corresponding lines in the unit test {{StartSniProxyTest.generateConfigTest()}} so the test passes.
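After the edit, the generated configuration should contain the three timeout directives set to {{1s}}. The sketch below is hypothetical: {{HaProxyTimeouts.timeoutLines()}} is not part of {{geode-benchmarks}}. It only mirrors the string-concatenation style of the commented-out fragment, so the expected text can be compared against what {{StartSniProxyTest.generateConfigTest()}} checks.

{code:java}
// Hypothetical helper mirroring the string-concatenation style of
// StartSniProxy.generateHaProxyConfig(); this is not the actual repo code.
final class HaProxyTimeouts {
  // Builds the HAProxy "timeout" directives using one shared value, e.g. "1s".
  static String timeoutLines(String value) {
    return " timeout client " + value + "\n"
        + " timeout connect " + value + "\n"
        + " timeout server " + value + "\n";
  }

  public static void main(String[] args) {
    // Prints the three directives as they should appear for this reproduction.
    System.out.print(timeoutLines("1s"));
  }
}
{code}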

Now run the scripts (see the _SNI_ section of the {{README.md}} in that repo):

{code}
./launch_cluster -t anytagname 5
./run_tests.sh -t anytagname -- -PwithWarmup=10 -PwithDuration=30 -PwithSniProxy '--tests=PartitionedGetBenchmark'
{code}

This runs the "get" benchmark against a partitioned region. Once the bug is fixed, you can run the other tests (or all of them).

With this configuration as-is, the test should crash. If it doesn't, increase {{withDuration}} to, say, 60 seconds or more.

> when an SNI proxy drops connections Geode benchmarks fail
> ---------------------------------------------------------
>
>                 Key: GEODE-8299
>                 URL: https://issues.apache.org/jira/browse/GEODE-8299
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>            Reporter: Bill Burcham
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)