Posted to dev@geode.apache.org by Donal Evans <do...@vmware.com> on 2020/11/19 01:04:38 UTC

[PROPOSAL] Change the default value of conserve-sockets to false

Hi Geode dev,

First, from the docs[1], a brief explanation of the purpose of the conserve-sockets property:

"The conserve-sockets setting indicates whether application threads share sockets with other threads or use their own sockets for member communication. This setting has no effect on communication between a server and its clients, but it does control the server’s communication with its peers or a gateway sender’s communication with a gateway receiver."

The current default value for the conserve-sockets property is true, which at first glance makes sense, since in an ideal world, existing sockets could be shared between threads and there would be no need to create and destroy new sockets for each process, which can be somewhat resource-intensive. However, in practice, there are several known issues with using the default setting of true. From the docs[1]:

"For distributed regions, the put operation, and destroy and invalidate for regions and entries, can all be optimized with conserve-sockets set to false. For partitioned regions, setting conserve-sockets to false can improve general throughput.
Note: When you have transactions operating on EMPTY, NORMAL or PARTITION regions, make sure that conserve-sockets is set to false to avoid distributed deadlocks."

and[2]:

"WAN deployments increase the messaging demands on a Geode system. To avoid hangs related to WAN messaging, always set `conserve-sockets=false` for Geode members that participate in a WAN deployment."

Given that it is generally accepted as best practice to set conserve-sockets to false for almost all use cases of Geode beyond the most simple, it would make sense to also change the default value to false, to prevent people from having to encounter a problem, search for the solution, and then change the setting to what is almost always the "correct" value.
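
For context, this is how a member opts out today; a minimal sketch using the Java API (the same property can be set in gemfire.properties, or passed to gfsh as --J=-Dgemfire.conserve-sockets=false):

import java.util.Properties;
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.distributed.ConfigurationProperties;

public class ConserveSocketsExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Explicitly opt out of socket sharing; under this proposal,
    // false would simply become the default.
    props.setProperty(ConfigurationProperties.CONSERVE_SOCKETS, "false");
    Cache cache = new CacheFactory(props).create();
    // ... create regions and run the application ...
    cache.close();
  }
}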

I have done some experimenting to see what it would take to make this proposal a reality, and the changes required are very minimal: only two existing DUnit tests that were previously relying on the default being true need to be modified to explicitly set the value of conserve-sockets.

Any feedback on this proposal would be very welcome, and if the response is positive, I can create a PR with the changes as soon as a decision is reached.

Thanks,
Donal

[1] https://geode.apache.org/docs/guide/113/managing/monitor_tune/performance_controls_controlling_socket_use.html
[2] https://geode.apache.org/docs/guide/113/managing/monitor_tune/sockets_and_gateways.html

Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Owen Nichols <on...@vmware.com>.
I’m not familiar with the inner workings, but your writeup is excellent and makes a compelling case.

It sounds like you are saying that the original motivation for conserve-sockets=true was to improve performance, but in fact it makes performance worse. Do you have some numbers to quantify the magnitude of difference this can make?

It also looks bad that we actually document that Geode may deadlock if you use the default.

To Udo’s concern about changing existing behavior, it sounds like you are saying there is no possible downside to conserve-sockets=false, but has that question been explored exhaustively? Is there ever any circumstance where we would still recommend that a user explicitly set conserve-sockets=true, once false is the default?

Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Xiaojian Zhou <zh...@vmware.com>.
OK, I double-checked; my memory is wrong. It was true as early as 6.0.

From: Xiaojian Zhou <zh...@vmware.com>
Date: Wednesday, December 2, 2020 at 3:29 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false
+1
I think it’s good to change back the default to be false. It was false before.

From: Barrett Oglesby <bo...@vmware.com>
Date: Wednesday, December 2, 2020 at 3:14 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false
I ran a bunch of tests using the long-running-test code where the servers had a mix of conserve-sockets settings, and they all worked ok.

One set of tests had 6 servers - 3 with conserve-sockets=false and 3 with conserve-sockets=true.

Another set of tests had 4 servers - 3 with conserve-sockets=false and 1 with conserve-sockets=true.

In each case, the multi-threaded client did:

- puts
- gets
- destroys
- function updates
- oql queries

One thing I found interesting was that the server where the operation originated dictated which thread was used on the remote server. If the server where the operation originated had conserve-sockets=false, then the remote server used an unshared P2P message reader to process the replication no matter what its own conserve-sockets setting was. And if the server where the operation originated had conserve-sockets=true, then the remote server used a shared P2P message reader to process the replication no matter what its own conserve-sockets setting was.
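
The observer itself isn't included here, but a minimal one producing this log format might look roughly like the following (a sketch; it assumes the internal DistributionMessageObserver hook and its ClusterDistributionManager callbacks, installed on each server with DistributionMessageObserver.setInstance(...)):

import java.util.Arrays;
import org.apache.geode.distributed.internal.ClusterDistributionManager;
import org.apache.geode.distributed.internal.DistributionMessage;
import org.apache.geode.distributed.internal.DistributionMessageObserver;

public class TestDistributionMessageObserver extends DistributionMessageObserver {

  // The thread name prefix (e.g. "P2P message reader ... unshared ordered")
  // is what shows whether a shared or unshared reader handled the message.
  private void log(String operation, DistributionMessage message) {
    System.out.println(Thread.currentThread().getName()
        + ": TestDistributionMessageObserver operation=" + operation
        + "; time=" + System.currentTimeMillis()
        + "; message=" + message
        + "; recipients=" + Arrays.toString(message.getRecipients()));
  }

  @Override
  public void beforeSendMessage(ClusterDistributionManager dm, DistributionMessage message) {
    log("beforeSendMessage", message);
  }

  @Override
  public void beforeProcessMessage(ClusterDistributionManager dm, DistributionMessage message) {
    log("beforeProcessMessage", message);
  }

  @Override
  public void afterProcessMessage(ClusterDistributionManager dm, DistributionMessage message) {
    log("afterProcessMessage", message);
  }
}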

Here is some logging from a DistributionMessageObserver that shows that behavior.

Case 1:

The server (server1) that processes the put operation from the client is primary and has conserve-sockets=false.
The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=true.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 60802 Thread 4: TestDistributionMessageObserver operation=beforeSendMessage; time=1606929894787; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[192.168.1.8(server-conserve-sockets1:58995)<v16>:41002]

2. An unshared P2P message reader in server2 handles the UpdateWithContextMessage even though conserve-sockets=true:

P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: DistributionMessage.schedule msg=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes))
P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606929894809; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: TestDistributionMessageObserver operation=afterProcessMessage; time=1606929894810; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

Case 2:

The server (server1) that processes the put operation from the client is primary and has conserve-sockets=true.
The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=false.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 61474 Thread 1: TestDistributionMessageObserver operation=beforeSendMessage; time=1606932400283; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[192.168.1.8(server1:63224)<v26>:41001]

2. The shared P2P message reader in server2 handles the UpdateWithContextMessage and sends the ReplyMessage even though conserve-sockets=false:

P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606932400295; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server-conserve-sockets1:63240)<v27>:41002; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=beforeSendMessage; time=1606932400296; message=ReplyMessage processorId=42 from null; recipients=[192.168.1.8(server-conserve-sockets1:63240)<v27>:41002]
P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=afterProcessMessage; time=1606932400296; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server-conserve-sockets1:63240)<v27>:41002; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

3. The shared P2P message reader in server1 handles the ReplyMessage:

P2P message reader for 192.168.1.8(server1:63224)<v26>:41001 shared unordered uid=3 local port=47098 remote port=61467: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606932400296; message=ReplyMessage processorId=42 from 192.168.1.8(server1:63224)<v26>:41001; recipients=[null]
P2P message reader for 192.168.1.8(server1:63224)<v26>:41001 shared unordered uid=3 local port=47098 remote port=61467: TestDistributionMessageObserver operation=afterProcessMessage; time=1606932400296; message=ReplyMessage processorId=42 from 192.168.1.8(server1:63224)<v26>:41001; recipients=[null]


________________________________
From: Anthony Baker <ba...@vmware.com>
Sent: Monday, November 23, 2020 2:16 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

Udo, you’re correct that individual servers can set the property independently. I was assuming this is more like the `security-manager` property and others that require all cluster members to be in agreement.

I’m not sure I understand the use case to allow this setting to be per-member. That makes it pretty challenging to reason about what is happening in a cluster when doing root cause analysis. There is even an API to change this value dynamically: https://geode.apache.org/docs/guide/12/managing/monitor_tune/performance_controls_controlling_socket_use.html

…but I’ve only seen that used to make function threads/sockets follow the correct setting.
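
(For reference, the per-thread calls that docs page describes look roughly like this; a sketch, assuming the static methods on org.apache.geode.distributed.DistributedSystem shown there:)

import org.apache.geode.distributed.DistributedSystem;

public final class PerThreadSocketPolicy {
  // Runs the given work with this thread using its own dedicated sockets,
  // overriding conserve-sockets=true for this thread only, then returns
  // the thread's sockets to the shared pool.
  public static void withDedicatedSockets(Runnable work) {
    DistributedSystem.setThreadsSocketPolicy(false);
    try {
      work.run();
    } finally {
      DistributedSystem.releaseThreadsSockets();
    }
  }
}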

Anthony


On Nov 20, 2020, at 11:23 AM, Udo Kohlmeyer <ud...@vmware.com> wrote:

@Anthony I cannot think of a single reason why the server should not start up, even in a rolling upgrade. This setting should not have an effect on the cluster (other than a potentially positive one). Also, if Geode were to enforce this setting across the cluster, then we would have seriously broken our “shared nothing” value here.

Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Xiaojian Zhou <zh...@vmware.com>.
+1
I think it’s good to change back the default to be false. It was false before.

Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Anilkumar Gingade <ag...@vmware.com>.
Barry, Thanks for the detailed response. Very helpful.

Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Barrett Oglesby <bo...@vmware.com>.
Anil, you wrote:

- We need to be thinking about automatic (dynamic) setting of configuration values based on load, resource availability, and service agreements.

It would be cool to eventually remove this property altogether and auto-configure it. Besides the things you mention, another thing that would need to be considered is the features being used; for example, WAN requires conserve-sockets=false. Maybe this discussion should be moved to a different thread so we don't distract from this one.

You also asked:

- Will there be a dedicated channel for communication from the node where conserve-sockets is set to false to the remote nodes?

Since the server doing the op has conserve-sockets=false, an unshared P2P message reader is used on the remote member, which means a dedicated (thread-owned) connection is used.

ConnectionTable.get decides that. Here is a stack for creating a thread-owned sender:

java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1333)
at org.apache.geode.internal.tcp.Connection.<init>(Connection.java:1224)
at org.apache.geode.internal.tcp.Connection.createSender(Connection.java:1025)
at org.apache.geode.internal.tcp.ConnectionTable.getThreadOwnedConnection(ConnectionTable.java:474)
at org.apache.geode.internal.tcp.ConnectionTable.get(ConnectionTable.java:577)
at org.apache.geode.internal.tcp.TCPConduit.getConnection(TCPConduit.java:800)
at org.apache.geode.distributed.internal.direct.DirectChannel.getConnections(DirectChannel.java:452)
at org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:268)
at org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:182)
at org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:511)
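
In outline, the choice works something like this (a simplified, illustrative sketch; this is not the actual ConnectionTable source, and the helper names are hypothetical):

// Illustrative only; not the real ConnectionTable.get implementation.
Connection get(InternalDistributedMember member, boolean preserveOrder) throws IOException {
  // The sending side's conserve-sockets setting (or a per-thread override)
  // picks the connection type; the receiver's own setting plays no part,
  // which matches the mixed-setting behavior observed in the tests above.
  if (threadOwnsResources()) {
    // conserve-sockets=false: dedicated socket, handled on the receiver
    // by an "unshared" P2P message reader
    return getThreadOwnedConnection(member, preserveOrder);
  }
  // conserve-sockets=true: one socket shared by all threads, handled by
  // the "shared" P2P message reader
  return getSharedConnection(member, preserveOrder);
}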

Here are the same use cases with additional logging containing thread-owned Connection creation or shared Connection usage:

Case 1:

The server (server1) that processes the put operation from the client is primary and has conserve-sockets=false.
The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=true.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 60539 Thread 3: TestDistributionMessageObserver operation=beforeSendMessage; time=1607039519049; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[192.168.1.8(server2:54360)<v71>:41002]

2. The ServerConnection thread in server1 creates the thread-owned Connection:

ServerConnection on port 60539 Thread 3: Connection.<init> sender=192.168.1.8(server2:54360)<v71>:41002(uid=10); socket=Socket[addr=/192.168.1.8,port=45823,localport=60595]; time=1607039519050
ServerConnection on port 60539 Thread 3: ConnectionTable.get using threadOwnedConnection=192.168.1.8(server2:54360)<v71>:41002(uid=10); socket=Socket[addr=/192.168.1.8,port=45823,localport=60595]; time=1607039519051

3. A P2P Listener Thread in server2 creates the receiver Connection:

P2P Listener Thread /192.168.1.8:45823: Connection.<init> receiver=null(uid=0); socket=Socket[addr=/192.168.1.8,port=60595,localport=45823]; time=1607039519050

4. The unshared P2P message reader in server2 reads the handshake from server1's Connection:

P2P message reader for 192.168.1.8(server1:54333)<v70>:41001 unshared ordered uid=10 dom #1 local port=45823 remote port=60595: Connection.readHandshakeForReceiver receiver=192.168.1.8(server1:54333)<v70>:41001(uid=10); socket=Socket[addr=/192.168.1.8,port=60595,localport=45823]; time=1607039519050

5. The unshared P2P message reader in server2 handles the UpdateWithContextMessage even though conserve-sockets=true:

P2P message reader for 192.168.1.8(server1:54333)<v70>:41001 unshared ordered uid=10 dom #1 local port=45823 remote port=60595: TestDistributionMessageObserver operation=beforeProcessMessage; time=1607039519051; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[null]
P2P message reader for 192.168.1.8(server1:54333)<v70>:41001 unshared ordered uid=10 dom #1 local port=45823 remote port=60595: TestDistributionMessageObserver operation=afterProcessMessage; time=1607039519052; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[null]

Case 2:

The server (server2) that processes the put operation from the client is primary and has conserve-sockets=true.
The server (server1) that handles the UpdateWithContextMessage has conserve-sockets=false.

1. A ServerConnection thread in server2 sends the UpdateWithContextMessage:

ServerConnection on port 60463 Thread 1: TestDistributionMessageObserver operation=beforeSendMessage; time=1607039137587; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[192.168.1.8(server1:53948)<v67>:41001]

2. The ServerConnection thread in server2 uses the shared Connection to server1:

ServerConnection on port 60463 Thread 1: ConnectionTable.get using sharedConnection=192.168.1.8(server1:53948)<v67>:41001(uid=3); socket=Socket[addr=/192.168.1.8,port=56562,localport=60458]; time=1607039137587

3. The shared P2P message reader in server1 handles the UpdateWithContextMessage and sends the ReplyMessage using the shared Connection to server2 even though conserve-sockets=false:

P2P message reader for 192.168.1.8(server2:53949)<v67>:41002 shared ordered uid=3 local port=56562 remote port=60458: TestDistributionMessageObserver operation=beforeProcessMessage; time=1607039137588; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[null]
P2P message reader for 192.168.1.8(server2:53949)<v67>:41002 shared ordered uid=3 local port=56562 remote port=60458: TestDistributionMessageObserver operation=beforeSendMessage; time=1607039137588; message=ReplyMessage processorId=42 from null; recipients=[192.168.1.8(server2:53949)<v67>:41002]
P2P message reader for 192.168.1.8(server2:53949)<v67>:41002 shared ordered uid=3 local port=56562 remote port=60458: ConnectionTable.get using sharedConnection=192.168.1.8(server2:53949)<v67>:41002(uid=2); socket=Socket[addr=192.168.1.8/192.168.1.8,port=46868,localport=60454]; time=1607039137588
P2P message reader for 192.168.1.8(server2:53949)<v67>:41002 shared ordered uid=3 local port=56562 remote port=60458: TestDistributionMessageObserver operation=afterProcessMessage; time=1607039137589; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[null]

4. The shared P2P message reader in server2 handles the ReplyMessage:

P2P message reader for 192.168.1.8(server1:53948)<v67>:41001 shared unordered uid=2 local port=46868 remote port=60454: TestDistributionMessageObserver operation=beforeProcessMessage; time=1607039137589; message=ReplyMessage processorId=42 from 192.168.1.8(server1:53948)<v67>:41001; recipients=[null]
P2P message reader for 192.168.1.8(server1:53948)<v67>:41001 shared unordered uid=2 local port=46868 remote port=60454: TestDistributionMessageObserver operation=afterProcessMessage; time=1607039137589; message=ReplyMessage processorId=42 from 192.168.1.8(server1:53948)<v67>:41001; recipients=[null]



Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Bruce Schuchardt <br...@vmware.com>.
+1 for having the default be conserve-sockets=false. Any time there has been trouble and conserve-sockets=true is involved, we always suggest changing it to false.


On 12/3/20, 6:58 AM, "Anilkumar Gingade" <ag...@vmware.com> wrote:

    I was conversing with a few of the devs about the need for different settings/configurations for sets of nodes in the cluster depending on business/application needs; for example, a set of nodes serving a different kind of application requirement (data store) than other nodes in the cluster (computation heavy). I am calling this heterogeneous cluster configuration (mostly in large clusters), as opposed to homogeneous cluster configuration (the same config across all nodes). We need to be thinking about both kinds of deployment as business models move more and more towards cloud-based services for the entire org.
    We need to be thinking about automatic (dynamic) setting of configuration values based on load, resource availability, and service agreements. We should plan to take a few of these settings and build logic so that they can be automatically adjusted.

    Sorry for diverting from the actual email thread subject.

    Barry, it’s a great find. Will there be dedicated channel for communication from the node where conserve-socket is set to false to the remote nodes.

     -Anil.

    On 12/2/20, 3:14 PM, "Barrett Oglesby" <bo...@vmware.com> wrote:

        I ran a bunch of tests using the long-running-test code where the servers had a mix of conserve-sockets settings, and they all worked ok.

        One set of tests had 6 servers - 3 with conserve-sockets=false and 3 with conserve-sockets=true.

        Another set of tests had 4 servers - 3 with conserve-sockets=false and 1 with conserve-sockets=true.

        In each case, the multi-threaded client did:

        - puts
        - gets
        - destroys
        - function updates
        - oql queries

        One thing I found interesting was the server where the operation originated dictated which thread was used on the remote server. If the server where the operation originated had conserve-sockets=false, then the remote server used an unshared P2P message reader to process the replication no matter what its conserve-sockets setting was. And if the server where the operation originated had conserve-sockets=true, then the remote server used a shared P2P message reader to process the replication no matter what its conserve-sockets setting was.

        Here is some logging from a DistributionMessageObserver that shows that behavior.

        Case 1:

        The server (server1) that processes the put operation from the client is primary and has conserve-sockets=false.
        The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=true.

        1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

        ServerConnection on port 60802 Thread 4: TestDistributionMessageObserver operation=beforeSendMessage; time=1606929894787; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[192.168.1.8(server-conserve-sockets1:58995)<v16>:41002]

        2. An unshared P2P message reader in server2 handles the UpdateWithContextMessage even though conserve-sockets=true:

        P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: DistributionMessage.schedule msg=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes))
        P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606929894809; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
        P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: TestDistributionMessageObserver operation=afterProcessMessage; time=1606929894810; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

        Case 2:

        The server (server1) that processes the put operation from the client is primary and has conserve-sockets=true.
        The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=false.

        1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

        ServerConnection on port 61474 Thread 1: TestDistributionMessageObserver operation=beforeSendMessage; time=1606932400283; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[192.168.1.8(server1:63224)<v26>:41001]

        2. The shared P2P message reader in server2 handles the UpdateWithContextMessage and sends the ReplyMessage even though conserve-sockets=false:

        P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606932400295; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server-conserve-sockets1:63240)<v27>:41002; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
        P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=beforeSendMessage; time=1606932400296; message=ReplyMessage processorId=42 from null; recipients=[192.168.1.8(server-conserve-sockets1:63240)<v27>:41002]
        P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=afterProcessMessage; time=1606932400296; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server-conserve-sockets1:63240)<v27>:41002; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

        3. The shared P2P message reader in server1 handles the ReplyMessage:

        P2P message reader for 192.168.1.8(server1:63224)<v26>:41001 shared unordered uid=3 local port=47098 remote port=61467: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606932400296; message=ReplyMessage processorId=42 from 192.168.1.8(server1:63224)<v26>:41001; recipients=[null]
        P2P message reader for 192.168.1.8(server1:63224)<v26>:41001 shared unordered uid=3 local port=47098 remote port=61467: TestDistributionMessageObserver operation=afterProcessMessage; time=1606932400296; message=ReplyMessage processorId=42 from 192.168.1.8(server1:63224)<v26>:41001; recipients=[null]


        ________________________________
        From: Anthony Baker <ba...@vmware.com>
        Sent: Monday, November 23, 2020 2:16 PM
        To: dev@geode.apache.org <de...@geode.apache.org>
        Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

        Udo, you’re correct that individual servers can set the property independently. I was assuming this is more like the ’security-manager` property and others that require all cluster members to be in agreement.

        I’m not sure I understand the use case to allow this setting to be per-member. That makes it pretty challenging to reason about what is happening in a cluster when doing root cause analysis. There is even an API to change this value dynamically:  https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgeode.apache.org%2Fdocs%2Fguide%2F12%2Fmanaging%2Fmonitor_tune%2Fperformance_controls_controlling_socket_use.html&amp;data=04%7C01%7Cbruces%40vmware.com%7C179cc117e59f4b8e261808d8979bdb82%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637426042962073778%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=uIIOR5cUPiYH27dyQ%2BpIXUZd6mGmEAJIbfek%2FNAQMx8%3D&amp;reserved=0

        …but I’ve only seen that used to make function threads/sockets follow the correct setting.

        Anthony


        On Nov 20, 2020, at 11:23 AM, Udo Kohlmeyer <ud...@vmware.com>> wrote:

        @Anthony I cannot think of a single reason, why the server should not start up, even in a rolling upgrade. This setting should not have an effect on the cluster (other than potentially positive). Also, if the Geode were to enforce this setting across the cluster, then we have seriously broken our “shared nothing value” here..




Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Anilkumar Gingade <ag...@vmware.com>.
I was conversing with a few of the devs about the requirement for different settings/configurations for sets of nodes in the cluster depending on business/application needs; for example, a set of nodes serving a different kind of application requirement (data store) than other nodes in the cluster (computation heavy). I am calling this heterogeneous cluster configuration (mostly in large clusters) as compared to homogeneous cluster configuration (the same config across all the nodes). We need to be thinking about both kinds of deployment as business models move more and more towards cloud-based services for the entire org.
We need to be thinking about automatic (dynamic) setting of configuration values based on load, resource availability and service agreements. We should plan on taking a few of these settings and building logic by which they can be adjusted automatically.

Sorry for diverting from the actual email thread subject.

Barry, it’s a great find. Will there be a dedicated channel for communication from the node where conserve-sockets is set to false to the remote nodes?

 -Anil.




Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Barrett Oglesby <bo...@vmware.com>.
I ran a bunch of tests using the long-running-test code where the servers had a mix of conserve-sockets settings, and they all worked ok.

One set of tests had 6 servers - 3 with conserve-sockets=false and 3 with conserve-sockets=true.

Another set of tests had 4 servers - 3 with conserve-sockets=false and 1 with conserve-sockets=true.

In each case, the multi-threaded client did:

- puts
- gets
- destroys
- function updates
- OQL queries

One thing I found interesting was that the server where the operation originated dictated which thread was used on the remote server. If the server where the operation originated had conserve-sockets=false, then the remote server used an unshared P2P message reader to process the replication no matter what its own conserve-sockets setting was. And if the server where the operation originated had conserve-sockets=true, then the remote server used a shared P2P message reader to process the replication no matter what its own conserve-sockets setting was.

Here is some logging from a DistributionMessageObserver that shows that behavior.

Case 1:

The server (server1) that processes the put operation from the client is primary and has conserve-sockets=false.
The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=true.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 60802 Thread 4: TestDistributionMessageObserver operation=beforeSendMessage; time=1606929894787; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[192.168.1.8(server-conserve-sockets1:58995)<v16>:41002]

2. An unshared P2P message reader in server2 handles the UpdateWithContextMessage even though conserve-sockets=true:

P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: DistributionMessage.schedule msg=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes))
P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606929894809; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: TestDistributionMessageObserver operation=afterProcessMessage; time=1606929894810; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

Case 2:

The server (server1) that processes the put operation from the client is primary and has conserve-sockets=true.
The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=false.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 61474 Thread 1: TestDistributionMessageObserver operation=beforeSendMessage; time=1606932400283; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[192.168.1.8(server1:63224)<v26>:41001]

2. The shared P2P message reader in server2 handles the UpdateWithContextMessage and sends the ReplyMessage even though conserve-sockets=false:

P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606932400295; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server-conserve-sockets1:63240)<v27>:41002; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=beforeSendMessage; time=1606932400296; message=ReplyMessage processorId=42 from null; recipients=[192.168.1.8(server-conserve-sockets1:63240)<v27>:41002]
P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=afterProcessMessage; time=1606932400296; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server-conserve-sockets1:63240)<v27>:41002; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

3. The shared P2P message reader in server1 handles the ReplyMessage:

P2P message reader for 192.168.1.8(server1:63224)<v26>:41001 shared unordered uid=3 local port=47098 remote port=61467: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606932400296; message=ReplyMessage processorId=42 from 192.168.1.8(server1:63224)<v26>:41001; recipients=[null]
P2P message reader for 192.168.1.8(server1:63224)<v26>:41001 shared unordered uid=3 local port=47098 remote port=61467: TestDistributionMessageObserver operation=afterProcessMessage; time=1606932400296; message=ReplyMessage processorId=42 from 192.168.1.8(server1:63224)<v26>:41001; recipients=[null]
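
For reference, a minimal sketch of the kind of observer used to capture the logging above, assuming Geode's internal DistributionMessageObserver hook and its static setInstance registration; exact method signatures vary between Geode versions:

    import org.apache.geode.distributed.internal.ClusterDistributionManager;
    import org.apache.geode.distributed.internal.DistributionMessage;
    import org.apache.geode.distributed.internal.DistributionMessageObserver;

    // Prints the processing thread's name with each message, so shared vs.
    // unshared P2P message readers can be told apart, as in the output above.
    public class LoggingMessageObserver extends DistributionMessageObserver {
      @Override
      public void beforeProcessMessage(ClusterDistributionManager dm,
          DistributionMessage message) {
        log("beforeProcessMessage", message);
      }

      @Override
      public void afterProcessMessage(ClusterDistributionManager dm,
          DistributionMessage message) {
        log("afterProcessMessage", message);
      }

      private void log(String operation, DistributionMessage message) {
        System.out.println(Thread.currentThread().getName() + ": "
            + getClass().getSimpleName() + " operation=" + operation
            + "; time=" + System.currentTimeMillis()
            + "; message=" + message);
      }
    }

Registering it with DistributionMessageObserver.setInstance(new LoggingMessageObserver()) in each server before the cache is created makes the thread name show whether a shared or unshared reader handled each message.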




Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Anthony Baker <ba...@vmware.com>.
Udo, you’re correct that individual servers can set the property independently. I was assuming this is more like the `security-manager` property and others that require all cluster members to be in agreement.

I’m not sure I understand the use case to allow this setting to be per-member. That makes it pretty challenging to reason about what is happening in a cluster when doing root cause analysis. There is even an API to change this value dynamically:  https://geode.apache.org/docs/guide/12/managing/monitor_tune/performance_controls_controlling_socket_use.html

…but I’ve only seen that used to make function threads/sockets follow the correct setting.
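
A minimal sketch of that dynamic API, assuming the two static methods named on the linked docs page (DistributedSystem.setThreadsSocketPolicy and DistributedSystem.releaseThreadsSockets); the wrapper class is illustrative only:

    import org.apache.geode.distributed.DistributedSystem;

    public class UnsharedSocketScope {
      // Runs a task on dedicated (unshared) sockets even when this member
      // was started with conserve-sockets=true.
      public static void runWithOwnSockets(Runnable task) {
        DistributedSystem.setThreadsSocketPolicy(false);
        try {
          task.run();
        } finally {
          // Hand this thread's dedicated sockets back when done.
          DistributedSystem.releaseThreadsSockets();
        }
      }
    }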

Anthony



Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Udo Kohlmeyer <ud...@vmware.com>.
@Anthony I cannot think of a single reason why the server should not start up, even in a rolling upgrade. This setting should not have an effect on the cluster (other than a potentially positive one). Also, if Geode were to enforce this setting across the cluster, then we have seriously broken our “shared nothing” value here.




Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Xiaojian Zhou <zh...@vmware.com>.
1) Conserve-sockets will only impact p2p connections. If set to false, that means the p2p connections between 2 servers can be created on request, as many as needed.
2) Currently the default setting is true (I don't remember when we changed it from false to true).
3) For rolling upgrade, unfortunately, if server1 is set to true and server2 is set to false, our server start-up will not check the mismatch automatically so far. We would have to add some code to prevent a server with a different setting from joining. And I don't know what the behavior will be in the current mixed-setting environment. It could be an interesting dunit test scenario (see the sketch below).

Regards
Gester
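
A minimal sketch of the mixed-setting dunit scenario Gester describes, assuming Geode's ClusterStartupRule test rule; the names and signatures are from the geode-dunit framework and may differ between versions:

    import java.util.Properties;

    import org.junit.Rule;
    import org.junit.Test;

    import org.apache.geode.test.dunit.rules.ClusterStartupRule;
    import org.apache.geode.test.dunit.rules.MemberVM;

    public class MixedConserveSocketsDUnitTest {
      @Rule
      public ClusterStartupRule cluster = new ClusterStartupRule();

      @Test
      public void serversWithMismatchedConserveSocketsCanOperate() {
        MemberVM locator = cluster.startLocatorVM(0);

        Properties conserveTrue = new Properties();
        conserveTrue.setProperty("conserve-sockets", "true");
        Properties conserveFalse = new Properties();
        conserveFalse.setProperty("conserve-sockets", "false");

        // Start-up currently performs no mismatch check, so both should join.
        cluster.startServerVM(1, conserveTrue, locator.getPort());
        cluster.startServerVM(2, conserveFalse, locator.getPort());

        // ... create a region and drive puts/gets across both servers ...
      }
    }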



Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Xiaojian Zhou <zh...@vmware.com>.
Passing dunit tests is not enough. It might only mean we don't have enough test coverage.

We need to inspect the code to see what the behavior will be when 2 servers are configured with different conserve-sockets settings.



Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Donal Evans <do...@vmware.com>.
Regarding behaviour during rolling upgrade: I created a draft PR with this change to test the feasibility and see what problems, if any, would be caused by tests assuming the default setting to be true. After fixing two DUnit tests that were not explicitly setting the value of conserve-sockets to true, no test failures were observed. I also ran a large suite of proprietary tests that include rolling upgrade and observed no problems there. This doesn't mean that there would definitely be no problems caused by this change, but I can at least say that none of the testing we currently have showed any problems.


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Anthony Baker <ba...@vmware.com>.
Question:  how would this work with a rolling upgrade?  If the user did not set this property and we changed the default I believe that we would prevent the upgraded member from rejoining the cluster.

Of course the user could explicitly set this property as you point out.


Anthony




Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Donal Evans <do...@vmware.com>.
While I agree that the potential impact of having the setting changed out from a user may be high, the cost of addressing that change is very small. All users have to do is explicitly set the conserve-sockets value to true if they were previously using the default and they will be back to where they started with no change in behaviour or resource requirements. This could be as simple as adding a single line to a properties file, which seems like a pretty small inconvenience.
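
For instance, a user who wants to keep today's behaviour after the default flips would just add one line to gemfire.properties (a sketch; conserve-sockets is the documented property name):

    # gemfire.properties
    # Keep the old default: share sockets between application threads.
    conserve-sockets=true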



Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Anthony Baker <ba...@vmware.com>.
I think there are many good reasons to flip the default value for this property. I do question whether requiring a user to allocate new hardware to support the changed resource requirements is appropriate for a minor version bump. In most cases I think that would come as an unwelcome surprise during the upgrade.

Anthony



Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Donal Evans <do...@vmware.com>.
Thank you to everyone who participated in this discussion.

After a quick tally of responses, it looks like there are 6 people in favour of the change happening right now and 3 who are in favour but would prefer it to wait for a major version change. While this isn't a strong consensus, I do think that it demonstrates a solid majority in favour of the proposal as part of a minor version, so I have opened a pull request[1] with the necessary changes and invite everyone to provide feedback on it, particularly with regards to updating/expanding documentation around this change.

[1] https://github.com/apache/geode/pull/5832


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by John Blum <jb...@vmware.com>.
I agree with Dan, here, along with the consensus that false is the better default (in most cases).

So, I simply want to re-iterate the importance of "documentation" in whatever direction we decide. There are, without a doubt, both pros and cons to each configuration arrangement (true or false) where the conserve-sockets setting is concerned. Let's just make sure our old and new users are aware, too.

-j


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Dan Smith <da...@vmware.com>.
I will go ahead and withdraw my objection to this change. Based on some side conversations, at least at VMware it sounds like we don't have customers that are not setting this flag. So the scenario I'm worried about, where a customer upgrades their production cluster and has it crash due to this change, seems less likely. I do agree false is a better default.

I would also be fine waiting until 2.0 to make this change.

-Dan


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Dan Smith <da...@vmware.com>.
Personally, this has caused enough grief in the past (both ways, actually!) that I'd say this is a major version change.
I agree with John. Either value of conserve-sockets can crash or hang your system depending on your use case.

If this was just a matter of slowing down or speeding up performance, I think we could change it. But users that are impacted won't just see their system slow down. It will crash or hang. Potentially only with production sized workloads.

With conserve-sockets=false every thread on the server creates its own sockets to other servers. With N servers that's N sockets per thread. With our default of a max of 800 threads for client connections and a 20 server cluster you are looking at a worst case of 800 * 20 = 16K sending sockets per server, with another 16K receiving sockets and 16K receiving threads. That's before considering function execution threads, WAN receivers, and various other executors we have on the server. Users with too many threads will hit their file descriptor or thread limits. Or they will run out of memory for thread stacks, socket buffers, etc.

-Dan
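
Dan's worst case, restated as a quick back-of-the-envelope sketch (the 800 figure is the default maximum for client connection threads he cites; the 20-server cluster is his example):

    public class WorstCaseSockets {
      public static void main(String[] args) {
        int clientThreads = 800;  // default max client connection threads
        int servers = 20;         // peers each thread may message
        int sending = clientThreads * servers;
        System.out.println(sending + " sending sockets per server, plus "
            + sending + " receiving sockets and reader threads");
      }
    }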


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Donal Evans <do...@vmware.com>.
Just to clarify a comment from Owen, conserve-sockets=true does show improved performance over conserve-sockets=false in certain specific scenarios, but this isn't a hard and fast rule that applies everywhere, even excluding the cases where using conserve-sockets=true can lead to distributed deadlocks. As an example, with the new default setting of false, the UpgradeTest job in the CI pipeline takes about 10% longer, indicating that, at least for the scenarios being tested there, there is some performance impact. That being said, all of the geode-benchmarks tests explicitly set conserve-sockets=false, so from the point of view of what we're actually testing in terms of performance, no impact will be seen due to this change.
________________________________
From: Jacob Barrett <ja...@vmware.com>
Sent: Thursday, November 19, 2020 8:02 AM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

I would argue that it doesn't change the outward behavior of the product. It does change the internal workings of the product. It does improve performance and reliability. As long as changes to the internals don't have an effect on the outward-facing behavior of the product, I see no problem making these internal changes. Yes, this may lead to more resource utilization, but so can (and have) other changes along the way between majors. I would also expect that in the scenarios where this would make the most impact on resources, they are already configured to use this feature.

+1 make the change.


> On Nov 19, 2020, at 1:16 AM, Ju@N <ju...@gmail.com> wrote:
>
> I'm all in for changing the default to *false* but, unfortunately and for
> all the reasons already stated in the thread, I'm hesitant to include this
> change as part of a minor release.
> Best regards.
>
> On Thu, 19 Nov 2020 at 02:48, John Blum <jb...@vmware.com> wrote:
>
>> The downside of conserve-sockets = false is that you are (essentially)
>> back to a Thread|Socket / Request model (though Geode limits this system
>> resource consumption to a degree by the use of Thread Pools in p2p
>> distribution layer) and thus, you can run out of file descriptors (per
>> newly opened Socket) pretty quickly if you are not careful.
>>
>> conserve-sockets set to true limits the use of finite system resources while
>> risking deadlocks (i.e. A -> B -> C -> A), which is also contingent on ACKS
>> (and the infamous ReplyProcessor21; at least at one time, not sure if it is
>> still in play, but probably!).
>>
>> conserve-sockets set to false uses more system resources but avoids
>> deadlocks.
>>
>> If this change is made, I'd minimally make sure to document that users
>> should adjust their (soft & hard) ulimits accordingly, based on use cases,
>> load, etc.
>>
>> Personally, this has caused enough grief in the past (both ways,
>> actually!) that I'd say this is a major version change.
>>
>> -j
>>
>>
>> ________________________________
>> From: Nabarun Nag <nn...@vmware.com>
>> Sent: Wednesday, November 18, 2020 6:09 PM
>> To: dev@geode.apache.org <de...@geode.apache.org>
>> Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to
>> false
>>
>> +1
>>
>>  *   As nearly all of the production use cases need "conserve-sockets" to
>> be set to false, I think we can aim to change the default value to false
>> for the 1.14.0 release.
>>  *   We can highlight this change in the release notes and emails.
>>
>> Regards,
>> Nabarun
>>
>> ________________________________
>> From: Udo Kohlmeyer <ud...@vmware.com>
>> Sent: Wednesday, November 18, 2020 6:00 PM
>> To: dev@geode.apache.org <de...@geode.apache.org>
>> Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to
>> false
>>
>> Hi there Donal,
>>
>> Thank you for raising this. It is not an uncommon request to change the
>> default value of this field.
>>
>> This has been discussed many times in the past. I would LOVE to approve
>> this change, but it would mean that users who don’t set this property
>> might suddenly have its behavior changed. We are not sure what this would
>> mean for these users.
>>
>> That said, there have been very few (if any) complaints about product
>> stability when `conserve-sockets=false` is set.
>>
>> +1 - if we are allowed to make this change outside of a major version
>> change.
>>
>> --Udo
>>
>> From: Donal Evans <do...@vmware.com>
>> Date: Thursday, November 19, 2020 at 12:04 PM
>> To: dev@geode.apache.org <de...@geode.apache.org>
>> Subject: [PROPOSAL] Change the default value of conserve-sockets to false
>> Hi Geode dev,
>>
>> First, from the docs[1], a brief explanation of the purpose of the
>> conserve-sockets property:
>>
>> "The conserve-sockets setting indicates whether application threads share
>> sockets with other threads or use their own sockets for member
>> communication. This setting has no effect on communication between a server
>> and its clients, but it does control the server’s communication with its
>> peers or a gateway sender’s communication with a gateway receiver."
>>
>> The current default value for the conserve-sockets property is true, which
>> at first glance makes sense, since in an ideal world, existing sockets
>> could be shared between threads and there would be no need to create and
>> destroy new sockets for each process, which can be somewhat
>> resource-intensive. However, in practice, there are several known issues
>> with using the default setting of true. From the docs[1]:
>>
>> "For distributed regions, the put operation, and destroy and invalidate
>> for regions and entries, can all be optimized with conserve-sockets set to
>> false. For partitioned regions, setting conserve-sockets to false can
>> improve general throughput.
>> Note: When you have transactions operating on EMPTY, NORMAL or PARTITION
>> regions, make sure that conserve-sockets is set to false to avoid
>> distributed deadlocks."
>>
>> and[2]:
>>
>> "WAN deployments increase the messaging demands on a Geode system. To
>> avoid hangs related to WAN messaging, always set `conserve-sockets=false`
>> for Geode members that participate in a WAN deployment."
>>
>> Given that it is generally accepted as best practice to set
>> conserve-sockets to false for almost all use cases of Geode beyond the most
>> simple, it would make sense to also change the default value to false, to
>> prevent people having to encounter a problem, search for the solution, then
>> change the setting to what is almost always the "correct" value.
>>
>> I have done some experimenting to see what it would take to make this
>> proposal a reality, and the changes required are very minimal, with only
>> two existing DUnit tests that need to be modified to explicitly set the
>> value of conserve-sockets that were previously relying on the default being
>> true.
>>
>> Any feedback on this proposal would be very welcome, and if the response
>> is positive, I can create a PR with the changes as soon as a decision is
>> reached.
>>
>> Thanks,
>> Donal
>>
>> [1]
>> https://geode.apache.org/docs/guide/113/managing/monitor_tune/performance_controls_controlling_socket_use.html
>> [2]
>> https://geode.apache.org/docs/guide/113/managing/monitor_tune/sockets_and_gateways.html
>>
>
>
> --
> Ju@N


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Jacob Barrett <ja...@vmware.com>.
I would argue that it doesn’t change the outward behavior of the product. It does change the internal workings of the product, and it does improve performance and reliability. As long as changes to the internals don’t have an effect on the outward-facing behavior of the product, I see no problem making these internal changes. Yes, this may lead to more resource utilization, but so can (and have) other changes along the way between majors. I would also expect that the deployments where this would make the most impact on resources are already configured to use this feature.

+1 make the change.


> On Nov 19, 2020, at 1:16 AM, Ju@N <ju...@gmail.com> wrote:
> 
> I'm all in for changing the default to *false* but, unfortunately and for
> all the reasons already stated in the thread, I'm hesitant to include this
> change as part of a minor release.
> Best regards.
> 
> On Thu, 19 Nov 2020 at 02:48, John Blum <jb...@vmware.com> wrote:
> 
>> The downside of conserve-sockets = false is that you are (essentially)
>> back to a Thread|Socket / Request model (though Geode limits this system
>> resource consumption to a degree by the use of Thread Pools in the p2p
>> distribution layer) and thus, you can run out of file descriptors (one per
>> newly opened Socket) pretty quickly if you are not careful.
>> 
>> conserve-sockets set to true limits the use of finite system resources
>> while risking deadlocks (i.e. A -> B -> C -> A), which is also contingent
>> on ACKs (and the infamous ReplyProcessor21; at least at one time, not sure
>> if it is still in play, but probably!).
>> 
>> conserve-sockets set to false uses more system resources but avoids
>> deadlocks.
>> 
>> If this change is made, I'd minimally make sure to document that users
>> should adjust their (soft & hard) ulimits accordingly, based on use cases,
>> load, etc.
>> 
>> Personally, this has caused enough grief in the past (both ways,
>> actually!) that I'd say this is a major version change.
>> 
>> -j
>> 
>> 
>> ________________________________
>> From: Nabarun Nag <nn...@vmware.com>
>> Sent: Wednesday, November 18, 2020 6:09 PM
>> To: dev@geode.apache.org <de...@geode.apache.org>
>> Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to
>> false
>> 
>> +1
>> 
>>  *   As nearly all of the production use cases need "conserve-sockets" to
>> be set to false, I think we can aim to change the default value to false
>> for the 1.14.0 release.
>>  *   We can highlight this change in the release notes and emails.
>> 
>> Regards,
>> Nabarun
>> 
>> ________________________________
>> From: Udo Kohlmeyer <ud...@vmware.com>
>> Sent: Wednesday, November 18, 2020 6:00 PM
>> To: dev@geode.apache.org <de...@geode.apache.org>
>> Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to
>> false
>> 
>> Hi there Donal,
>> 
>> Thank you for raising this. It is not an uncommon request to change the
>> default value of this field.
>> 
>> This has been discussed many times in the past. I would LOVE to approve
>> this change, but it would mean that users who don’t set this property
>> might suddenly have its behavior changed. We are not sure what this would
>> mean for these users.
>> 
>> That said, there have been very few (if any) complaints about product
>> stability when `conserve-sockets=false` is set.
>> 
>> +1 - if we are allowed to make this change outside of a major version
>> change.
>> 
>> --Udo
>> 
>> From: Donal Evans <do...@vmware.com>
>> Date: Thursday, November 19, 2020 at 12:04 PM
>> To: dev@geode.apache.org <de...@geode.apache.org>
>> Subject: [PROPOSAL] Change the default value of conserve-sockets to false
>> Hi Geode dev,
>> 
>> First, from the docs[1], a brief explanation of the purpose of the
>> conserve-sockets property:
>> 
>> "The conserve-sockets setting indicates whether application threads share
>> sockets with other threads or use their own sockets for member
>> communication. This setting has no effect on communication between a server
>> and its clients, but it does control the server’s communication with its
>> peers or a gateway sender’s communication with a gateway receiver."
>> 
>> The current default value for the conserve-sockets property is true, which
>> at first glance makes sense, since in an ideal world, existing sockets
>> could be shared between threads and there would be no need to create and
>> destroy new sockets for each process, which can be somewhat
>> resource-intensive. However, in practice, there are several known issues
>> with using the default setting of true. From the docs[1]:
>> 
>> "For distributed regions, the put operation, and destroy and invalidate
>> for regions and entries, can all be optimized with conserve-sockets set to
>> false. For partitioned regions, setting conserve-sockets to false can
>> improve general throughput.
>> Note: When you have transactions operating on EMPTY, NORMAL or PARTITION
>> regions, make sure that conserve-sockets is set to false to avoid
>> distributed deadlocks."
>> 
>> and[2]:
>> 
>> "WAN deployments increase the messaging demands on a Geode system. To
>> avoid hangs related to WAN messaging, always set `conserve-sockets=false`
>> for Geode members that participate in a WAN deployment."
>> 
>> Given that it is generally accepted as best practice to set
>> conserve-sockets to false for almost all use cases of Geode beyond the most
>> simple, it would make sense to also change the default value to false, to
>> prevent people having to encounter a problem, search for the solution, then
>> change the setting to what is almost always the "correct" value.
>> 
>> I have done some experimenting to see what it would take to make this
>> proposal a reality, and the changes required are very minimal, with only
>> two existing DUnit tests that need to be modified to explicitly set the
>> value of conserve-sockets that were previously relying on the default being
>> true.
>> 
>> Any feedback on this proposal would be very welcome, and if the response
>> is positive, I can create a PR with the changes as soon as a decision is
>> reached.
>> 
>> Thanks,
>> Donal
>> 
>> [1]
>> https://geode.apache.org/docs/guide/113/managing/monitor_tune/performance_controls_controlling_socket_use.html
>> [2]
>> https://geode.apache.org/docs/guide/113/managing/monitor_tune/sockets_and_gateways.html
>> 
> 
> 
> -- 
> Ju@N


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by "Ju@N" <ju...@gmail.com>.
I'm all in for changing the default to *false* but, unfortunately and for
all the reasons already stated in the thread, I'm hesitant to include this
change as part of a minor release.
Best regards.

On Thu, 19 Nov 2020 at 02:48, John Blum <jb...@vmware.com> wrote:

> The downside of conserve-sockets = false is that you are (essentially)
> back to a Thread|Socket / Request model (though Geode limits this system
> resource consumption to a degree by the use of Thread Pools in the p2p
> distribution layer) and thus, you can run out of file descriptors (one per
> newly opened Socket) pretty quickly if you are not careful.
>
> conserve-sockets set to true limits the use of finite system resources
> while risking deadlocks (i.e. A -> B -> C -> A), which is also contingent
> on ACKs (and the infamous ReplyProcessor21; at least at one time, not sure
> if it is still in play, but probably!).
>
> conserve-sockets set to false uses more system resources but avoids
> deadlocks.
>
> If this change is made, I'd minimally make sure to document that users
> should adjust their (soft & hard) ulimits accordingly, based on use cases,
> load, etc.
>
> Personally, this has caused enough grief in the past (both ways,
> actually!) that I'd say this is a major version change.
>
> -j
>
>
> ________________________________
> From: Nabarun Nag <nn...@vmware.com>
> Sent: Wednesday, November 18, 2020 6:09 PM
> To: dev@geode.apache.org <de...@geode.apache.org>
> Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to
> false
>
> +1
>
>   *   As nearly all of the production use cases need "conserve-sockets" to
> be set to false, I think we can aim to change the default value to false
> for the 1.14.0 release.
>   *   We can highlight this change in the release notes and emails.
>
> Regards,
> Nabarun
>
> ________________________________
> From: Udo Kohlmeyer <ud...@vmware.com>
> Sent: Wednesday, November 18, 2020 6:00 PM
> To: dev@geode.apache.org <de...@geode.apache.org>
> Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to
> false
>
> Hi there Donal,
>
> Thank you for raising this. It is not an uncommon request to change the
> default value of this field.
>
> This has been discussed many times in the past. I would LOVE to approve
> this change, but it would mean that users who don’t set this property
> might suddenly have its behavior changed. We are not sure what this would
> mean for these users.
>
> That said, there have been very few (if any) complaints about product
> stability when `conserve-sockets=false` is set.
>
> +1 - if we are allowed to make this change outside of a major version
> change.
>
> --Udo
>
> From: Donal Evans <do...@vmware.com>
> Date: Thursday, November 19, 2020 at 12:04 PM
> To: dev@geode.apache.org <de...@geode.apache.org>
> Subject: [PROPOSAL] Change the default value of conserve-sockets to false
> Hi Geode dev,
>
> First, from the docs[1], a brief explanation of the purpose of the
> conserve-sockets property:
>
> "The conserve-sockets setting indicates whether application threads share
> sockets with other threads or use their own sockets for member
> communication. This setting has no effect on communication between a server
> and its clients, but it does control the server’s communication with its
> peers or a gateway sender’s communication with a gateway receiver."
>
> The current default value for the conserve-sockets property is true, which
> at first glance makes sense, since in an ideal world, existing sockets
> could be shared between threads and there would be no need to create and
> destroy new sockets for each process, which can be somewhat
> resource-intensive. However, in practice, there are several known issues
> with using the default setting of true. From the docs[1]:
>
> "For distributed regions, the put operation, and destroy and invalidate
> for regions and entries, can all be optimized with conserve-sockets set to
> false. For partitioned regions, setting conserve-sockets to false can
> improve general throughput.
> Note: When you have transactions operating on EMPTY, NORMAL or PARTITION
> regions, make sure that conserve-sockets is set to false to avoid
> distributed deadlocks."
>
> and[2]:
>
> "WAN deployments increase the messaging demands on a Geode system. To
> avoid hangs related to WAN messaging, always set `conserve-sockets=false`
> for Geode members that participate in a WAN deployment."
>
> Given that it is generally accepted as best practice to set
> conserve-sockets to false for almost all use cases of Geode beyond the most
> simple, it would make sense to also change the default value to false, to
> prevent people having to encounter a problem, search for the solution, then
> change the setting to what is almost always the "correct" value.
>
> I have done some experimenting to see what it would take to make this
> proposal a reality, and the changes required are very minimal, with only
> two existing DUnit tests that need to be modified to explicitly set the
> value of conserve-sockets that were previously relying on the default being
> true.
>
> Any feedback on this proposal would be very welcome, and if the response
> is positive, I can create a PR with the changes as soon as a decision is
> reached.
>
> Thanks,
> Donal
>
> [1]
> https://geode.apache.org/docs/guide/113/managing/monitor_tune/performance_controls_controlling_socket_use.html
> [2]
> https://geode.apache.org/docs/guide/113/managing/monitor_tune/sockets_and_gateways.html
>


-- 
Ju@N

Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by John Blum <jb...@vmware.com>.
The downside of conserve-sockets = false is that you are (essentially) back to a Thread|Socket / Request model (though Geode limits this system resource consumption to a degree by the use of Thread Pools in the p2p distribution layer) and thus, you can run out of file descriptors (one per newly opened Socket) pretty quickly if you are not careful.

conserve-sockets set to true limits the use of finite system resources while risking deadlocks (i.e. A -> B -> C -> A), which is also contingent on ACKs (and the infamous ReplyProcessor21; at least at one time, not sure if it is still in play, but probably!).

conserve-sockets set to false uses more system resources but avoids deadlocks.

If this change is made, I'd minimally make sure to document that users should adjust their (soft & hard) ulimits accordingly, based on use cases, load, etc.
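
As an illustration of that (the numbers below are placeholders, not recommendations; derive them from your own thread and socket estimates):

    # Check the current soft and hard open-file limits for the user running Geode:
    ulimit -Sn
    ulimit -Hn

    # Example /etc/security/limits.conf entries:
    # geode  soft  nofile  65536
    # geode  hard  nofile  65536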

Personally, this has caused enough grief in the past (both ways, actually!) that I'd say this is a major version change.

-j


________________________________
From: Nabarun Nag <nn...@vmware.com>
Sent: Wednesday, November 18, 2020 6:09 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

+1

  *   As nearly all of the production use cases need "conserve-sockets" to be set to false, I think we can aim to change the default value to false for the 1.14.0 release.
  *   We can highlight this change in the release notes and emails.

Regards,
Nabarun

________________________________
From: Udo Kohlmeyer <ud...@vmware.com>
Sent: Wednesday, November 18, 2020 6:00 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

Hi there Donal,

Thank you for raising this. It is not an uncommon request to change the default value of this field.

This has been discussed many times in the past. I would LOVE to approve this change, but it would mean that users who don’t set this property might suddenly have its behavior changed. We are not sure what this would mean for these users.

That said, there have been very few (if any) complaints about product stability when `conserve-sockets=false` is set.

+1 - if we are allowed to make this change outside of a major version change.

--Udo

From: Donal Evans <do...@vmware.com>
Date: Thursday, November 19, 2020 at 12:04 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: [PROPOSAL] Change the default value of conserve-sockets to false
Hi Geode dev,

First, from the docs[1], a brief explanation of the purpose of the conserve-sockets property:

"The conserve-sockets setting indicates whether application threads share sockets with other threads or use their own sockets for member communication. This setting has no effect on communication between a server and its clients, but it does control the server’s communication with its peers or a gateway sender’s communication with a gateway receiver."

The current default value for the conserve-sockets property is true, which at first glance makes sense, since in an ideal world, existing sockets could be shared between threads and there would be no need to create and destroy new sockets for each process, which can be somewhat resource-intensive. However, in practice, there are several known issues with using the default setting of true. From the docs[1]:

"For distributed regions, the put operation, and destroy and invalidate for regions and entries, can all be optimized with conserve-sockets set to false. For partitioned regions, setting conserve-sockets to false can improve general throughput.
Note: When you have transactions operating on EMPTY, NORMAL or PARTITION regions, make sure that conserve-sockets is set to false to avoid distributed deadlocks."

and[2]:

"WAN deployments increase the messaging demands on a Geode system. To avoid hangs related to WAN messaging, always set `conserve-sockets=false` for Geode members that participate in a WAN deployment."

Given that it is generally accepted as best practice to set conserve-sockets to false for almost all use cases of Geode beyond the most simple, it would make sense to also change the default value to false, to prevent people having to encounter a problem, search for the solution, then change the setting to what is almost always the "correct" value.

I have done some experimenting to see what it would take to make this proposal a reality, and the changes required are very minimal, with only two existing DUnit tests that need to be modified to explicitly set the value of conserve-sockets that were previously relying on the default being true.

Any feedback on this proposal would be very welcome, and if the response is positive, I can create a PR with the changes as soon as a decision is reached.

Thanks,
Donal

[1] https://geode.apache.org/docs/guide/113/managing/monitor_tune/performance_controls_controlling_socket_use.html
[2] https://geode.apache.org/docs/guide/113/managing/monitor_tune/sockets_and_gateways.html

Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Nabarun Nag <nn...@vmware.com>.
+1

  *   As nearly all of the production use cases need "conserve-sockets" to be set to false, I think we can aim to change the default value to false for the 1.14.0 release.
  *   We can highlight this change in the release notes and emails.

Regards,
Nabarun

________________________________
From: Udo Kohlmeyer <ud...@vmware.com>
Sent: Wednesday, November 18, 2020 6:00 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

Hi there Donal,

Thank you for raising this. It is not an uncommon request to change the default value of this field.

This has been discussed many times in the past. I would LOVE to approve this change, but it would mean that users who don’t set this property might suddenly have its behavior changed. We are not sure what this would mean for these users.

That said, there have been very few (if any) complaints about product stability when `conserve-sockets=false` is set.

+1 - if we are allowed to make this change outside of a major version change.

--Udo

From: Donal Evans <do...@vmware.com>
Date: Thursday, November 19, 2020 at 12:04 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: [PROPOSAL] Change the default value of conserve-sockets to false
Hi Geode dev,

First, from the docs[1], a brief explanation of the purpose of the conserve-sockets property:

"The conserve-sockets setting indicates whether application threads share sockets with other threads or use their own sockets for member communication. This setting has no effect on communication between a server and its clients, but it does control the server’s communication with its peers or a gateway sender’s communication with a gateway receiver."

The current default value for the conserve-sockets property is true, which at first glance makes sense, since in an ideal world, existing sockets could be shared between threads and there would be no need to create and destroy new sockets for each process, which can be somewhat resource-intensive. However, in practice, there are several known issues with using the default setting of true. From the docs[1]:

"For distributed regions, the put operation, and destroy and invalidate for regions and entries, can all be optimized with conserve-sockets set to false. For partitioned regions, setting conserve-sockets to false can improve general throughput.
Note: When you have transactions operating on EMPTY, NORMAL or PARTITION regions, make sure that conserve-sockets is set to false to avoid distributed deadlocks."

and[2]:

"WAN deployments increase the messaging demands on a Geode system. To avoid hangs related to WAN messaging, always set `conserve-sockets=false` for Geode members that participate in a WAN deployment."

Given that it is generally accepted as best practice to set conserve-sockets to false for almost all use cases of Geode beyond the most simple, it would make sense to also change the default value to false, to prevent people having to encounter a problem, search for the solution, then change the setting to what is almost always the "correct" value.

I have done some experimenting to see what it would take to make this proposal a reality, and the changes required are very minimal, with only two existing DUnit tests that need to be modified to explicitly set the value of conserve-sockets that were previously relying on the default being true.

Any feedback on this proposal would be very welcome, and if the response is positive, I can create a PR with the changes as soon as a decision is reached.

Thanks,
Donal

[1] https://geode.apache.org/docs/guide/113/managing/monitor_tune/performance_controls_controlling_socket_use.html
[2] https://geode.apache.org/docs/guide/113/managing/monitor_tune/sockets_and_gateways.html

Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Udo Kohlmeyer <ud...@vmware.com>.
Hi there Donal,

Thank you for raising this. It is not an uncommon request to change the default value of this field.

This has been discussed many times in the past. I would LOVE to approve this change, but it would mean that users who don’t set this property might suddenly have its behavior changed. We are not sure what this would mean for these users.

That said, there have been very few (if any) complaints about product stability when `conserve-sockets=false` is set.

+1 - if we are allowed to make this change outside of a major version change.

--Udo

From: Donal Evans <do...@vmware.com>
Date: Thursday, November 19, 2020 at 12:04 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: [PROPOSAL] Change the default value of conserve-sockets to false
Hi Geode dev,

First, from the docs[1], a brief explanation of the purpose of the conserve-sockets property:

"The conserve-sockets setting indicates whether application threads share sockets with other threads or use their own sockets for member communication. This setting has no effect on communication between a server and its clients, but it does control the server’s communication with its peers or a gateway sender’s communication with a gateway receiver."

The current default value for the conserve-sockets property is true, which at first glance makes sense, since in an ideal world, existing sockets could be shared between threads and there would be no need to create and destroy new sockets for each process, which can be somewhat resource-intensive. However, in practice, there are several known issues with using the default setting of true. From the docs[1]:

"For distributed regions, the put operation, and destroy and invalidate for regions and entries, can all be optimized with conserve-sockets set to false. For partitioned regions, setting conserve-sockets to false can improve general throughput.
Note: When you have transactions operating on EMPTY, NORMAL or PARTITION regions, make sure that conserve-sockets is set to false to avoid distributed deadlocks."

and[2]:

"WAN deployments increase the messaging demands on a Geode system. To avoid hangs related to WAN messaging, always set `conserve-sockets=false` for Geode members that participate in a WAN deployment."

Given that it is generally accepted as best practice to set conserve-sockets to false for almost all use cases of Geode beyond the most simple, it would make sense to also change the default value to false, to prevent people having to encounter a problem, search for the solution, then change the setting to what is almost always the "correct" value.

I have done some experimenting to see what it would take to make this proposal a reality, and the changes required are very minimal, with only two existing DUnit tests that need to be modified to explicitly set the value of conserve-sockets that were previously relying on the default being true.

Any feedback on this proposal would be very welcome, and if the response is positive, I can create a PR with the changes as soon as a decision is reached.

Thanks,
Donal

[1] https://geode.apache.org/docs/guide/113/managing/monitor_tune/performance_controls_controlling_socket_use.html
[2] https://geode.apache.org/docs/guide/113/managing/monitor_tune/sockets_and_gateways.html