Posted to dev@geode.apache.org by Barrett Oglesby <bo...@vmware.com> on 2020/12/02 23:14:36 UTC

Re: [PROPOSAL] Change the default value of conserve-sockets to false

I ran a bunch of tests using the long-running-test code where the servers had a mix of conserve-sockets settings, and they all worked ok.

One set of tests had 6 servers - 3 with conserve-sockets=false and 3 with conserve-sockets=true.

Another set of tests had 4 servers - 3 with conserve-sockets=false and 1 with conserve-sockets=true.
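For reference, conserve-sockets is a per-member distributed system property, so a mixed cluster like the ones above just means starting each server with its own value. A minimal sketch of one such server (the locator address and the use of command-line args are assumptions for illustration):

import java.util.Properties;
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.server.CacheServer;

public class MixedSocketPolicyServer {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.setProperty("name", args[0]);                 // e.g. "server1" or "server-conserve-sockets1"
    props.setProperty("locators", "localhost[10334]");  // assumed locator address
    props.setProperty("conserve-sockets", args[1]);     // "false" for some members, "true" for others

    Cache cache = new CacheFactory(props).create();
    CacheServer cacheServer = cache.addCacheServer();
    cacheServer.setPort(0);                             // ephemeral port, for illustration only
    cacheServer.start();
  }
}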

In each case, the multi-threaded client did:

- puts
- gets
- destroys
- function updates
- oql queries

One thing I found interesting was that the server where the operation originated dictated which thread was used on the remote server. If the originating server had conserve-sockets=false, the remote server used an unshared P2P message reader to process the replication, no matter what its own conserve-sockets setting was. And if the originating server had conserve-sockets=true, the remote server used a shared P2P message reader to process the replication, again regardless of its own setting.

Here is some logging from a DistributionMessageObserver that shows that behavior.
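For reference, DistributionMessageObserver is an internal Geode test hook, so the hook signatures below are an assumption and vary across versions; this is only a sketch of the kind of observer that could produce logging like the lines that follow:

import org.apache.geode.distributed.internal.ClusterDistributionManager;
import org.apache.geode.distributed.internal.DistributionMessage;
import org.apache.geode.distributed.internal.DistributionMessageObserver;

// Sketch only - the ClusterDistributionManager parameter type is an assumption and
// differs between Geode versions; the log format just mimics the output below.
public class TestDistributionMessageObserver extends DistributionMessageObserver {

  private void log(String operation, DistributionMessage message) {
    System.out.println(Thread.currentThread().getName()
        + ": TestDistributionMessageObserver operation=" + operation
        + "; time=" + System.currentTimeMillis()
        + "; message=" + message);
  }

  @Override
  public void beforeSendMessage(ClusterDistributionManager dm, DistributionMessage message) {
    log("beforeSendMessage", message);
  }

  @Override
  public void beforeProcessMessage(ClusterDistributionManager dm, DistributionMessage message) {
    log("beforeProcessMessage", message);
  }

  @Override
  public void afterProcessMessage(ClusterDistributionManager dm, DistributionMessage message) {
    log("afterProcessMessage", message);
  }
}

It would be installed on each server before the client starts its operations, e.g. DistributionMessageObserver.setInstance(new TestDistributionMessageObserver()).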

Case 1:

The server (server1) that processes the put operation from the client is primary and has conserve-sockets=false.
The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=true.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 60802 Thread 4: TestDistributionMessageObserver operation=beforeSendMessage; time=1606929894787; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[192.168.1.8(server-conserve-sockets1:58995)<v16>:41002]

2. An unshared P2P message reader in server2 handles the UpdateWithContextMessage even though conserve-sockets=true:

P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: DistributionMessage.schedule msg=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes))
P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606929894809; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: TestDistributionMessageObserver operation=afterProcessMessage; time=1606929894810; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

Case 2:

The server (server1) that processes the put operation from the client is primary and has conserve-sockets=true.
The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=false.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 61474 Thread 1: TestDistributionMessageObserver operation=beforeSendMessage; time=1606932400283; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[192.168.1.8(server1:63224)<v26>:41001]

2. The shared P2P message reader in server2 handles the UpdateWithContextMessage and sends the ReplyMessage even though conserve-sockets=false:

P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606932400295; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server-conserve-sockets1:63240)<v27>:41002; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=beforeSendMessage; time=1606932400296; message=ReplyMessage processorId=42 from null; recipients=[192.168.1.8(server-conserve-sockets1:63240)<v27>:41002]
P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=afterProcessMessage; time=1606932400296; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server-conserve-sockets1:63240)<v27>:41002; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

3. The shared P2P message reader in server1 handles the ReplyMessage:

P2P message reader for 192.168.1.8(server1:63224)<v26>:41001 shared unordered uid=3 local port=47098 remote port=61467: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606932400296; message=ReplyMessage processorId=42 from 192.168.1.8(server1:63224)<v26>:41001; recipients=[null]
P2P message reader for 192.168.1.8(server1:63224)<v26>:41001 shared unordered uid=3 local port=47098 remote port=61467: TestDistributionMessageObserver operation=afterProcessMessage; time=1606932400296; message=ReplyMessage processorId=42 from 192.168.1.8(server1:63224)<v26>:41001; recipients=[null]


________________________________
From: Anthony Baker <ba...@vmware.com>
Sent: Monday, November 23, 2020 2:16 PM
To: dev@geode.apache.org <de...@geode.apache.org>
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

Udo, you’re correct that individual servers can set the property independently. I was assuming this is more like the `security-manager` property and others that require all cluster members to be in agreement.

I’m not sure I understand the use case to allow this setting to be per-member. That makes it pretty challenging to reason about what is happening in a cluster when doing root cause analysis. There is even an API to change this value dynamically: https://geode.apache.org/docs/guide/12/managing/monitor_tune/performance_controls_controlling_socket_use.html

…but I’ve only seen that used to make function threads/sockets follow the correct setting.
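The per-thread calls that page documents look roughly like this inside a function (sketch only - the function itself is hypothetical; setThreadsSocketPolicy and releaseThreadsSockets are the static DistributedSystem methods I believe the page is referring to):

import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.distributed.DistributedSystem;

// Hypothetical function that asks for thread-owned sockets while it runs,
// regardless of the member-wide conserve-sockets setting.
public class SocketHungryFunction implements Function<Object> {

  @Override
  public void execute(FunctionContext<Object> context) {
    DistributedSystem.setThreadsSocketPolicy(false);   // this thread wants its own sockets
    try {
      // ... distribution-heavy work would go here ...
      context.getResultSender().lastResult("done");
    } finally {
      DistributedSystem.releaseThreadsSockets();       // give the thread-owned sockets back
    }
  }

  @Override
  public String getId() {
    return "SocketHungryFunction";
  }
}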

Anthony


On Nov 20, 2020, at 11:23 AM, Udo Kohlmeyer <ud...@vmware.com> wrote:

@Anthony I cannot think of a single reason why the server should not start up, even in a rolling upgrade. This setting should not have an effect on the cluster (other than potentially positive). Also, if Geode were to enforce this setting across the cluster, then we would have seriously broken our “shared nothing” value here.


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Xiaojian Zhou <zh...@vmware.com>.
OK, I double-checked; my memory is wrong. It was true as early as 6.0.


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Xiaojian Zhou <zh...@vmware.com>.
+1
I think it’s good to change the default back to false. It was false before.


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Anilkumar Gingade <ag...@vmware.com>.
Barry, Thanks for the detailed response. Very helpful.


Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Barrett Oglesby <bo...@vmware.com>.
Anil, you wrote:

- We need to be thinking about auto setting of configuration values (dynamic) based on the load, resource availability and service agreements.

It would be cool to eventually remove this property altogether and auto-configure it. Besides the things you mention, another thing that would need to be considered is the features being used. For example, WAN requires conserve-sockets=false. This discussion should maybe be moved to a different thread so we don't distract from this one.

You also asked:

- Will there be a dedicated channel for communication from the node where conserve-sockets is set to false to the remote nodes?

Since the server doing the op has conserve-sockets=false, an unshared P2P message reader is used on the remote member, which means a dedicated (thread-owned) connection is used.

ConnectionTable.get decides that. Here is a stack for creating a thread-owned sender:

java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1333)
at org.apache.geode.internal.tcp.Connection.<init>(Connection.java:1224)
at org.apache.geode.internal.tcp.Connection.createSender(Connection.java:1025)
at org.apache.geode.internal.tcp.ConnectionTable.getThreadOwnedConnection(ConnectionTable.java:474)
at org.apache.geode.internal.tcp.ConnectionTable.get(ConnectionTable.java:577)
at org.apache.geode.internal.tcp.TCPConduit.getConnection(TCPConduit.java:800)
at org.apache.geode.distributed.internal.direct.DirectChannel.getConnections(DirectChannel.java:452)
at org.apache.geode.distributed.internal.direct.DirectChannel.sendToMany(DirectChannel.java:268)
at org.apache.geode.distributed.internal.direct.DirectChannel.sendToOne(DirectChannel.java:182)
at org.apache.geode.distributed.internal.direct.DirectChannel.send(DirectChannel.java:511)
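To make that concrete, here is a tiny, purely illustrative model of the choice being made (this is not Geode's implementation - only ConnectionTable.get and getThreadOwnedConnection above are real names; everything in this model is made up to encode the observed behavior):

// Illustrative model only, NOT Geode source. It encodes the behavior described in this
// thread: the sending member's conserve-sockets setting (or a per-thread override such as
// setThreadsSocketPolicy(false)) picks shared vs. thread-owned, and that choice is what
// determines whether the receiver handles the message on its shared or an unshared P2P reader.
public class ConnectionChoiceModel {

  enum ConnectionKind { SHARED, THREAD_OWNED }

  static ConnectionKind choose(boolean senderConservesSockets, boolean threadWantsOwnSockets) {
    if (senderConservesSockets && !threadWantsOwnSockets) {
      return ConnectionKind.SHARED;       // remote side: shared P2P message reader
    }
    return ConnectionKind.THREAD_OWNED;   // remote side: unshared P2P message reader
  }

  public static void main(String[] args) {
    // Case 1 below: originating server has conserve-sockets=false -> thread-owned connection.
    System.out.println("origin conserve-sockets=false -> " + choose(false, false));
    // Case 2 below: originating server has conserve-sockets=true -> shared connection.
    System.out.println("origin conserve-sockets=true  -> " + choose(true, false));
  }
}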

Here are the same use cases with additional logging containing thread-owned Connection creation or shared Connection usage:

Case 1:

The server (server1) that processes the put operation from the client is primary and has conserve-sockets=false.
The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=true.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 60539 Thread 3: TestDistributionMessageObserver operation=beforeSendMessage; time=1607039519049; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[192.168.1.8(server2:54360)<v71>:41002]

2. The ServerConnection thread in server1 creates the thread-owned Connection:

ServerConnection on port 60539 Thread 3: Connection.<init> sender=192.168.1.8(server2:54360)<v71>:41002(uid=10); socket=Socket[addr=/192.168.1.8,port=45823,localport=60595]; time=1607039519050
ServerConnection on port 60539 Thread 3: ConnectionTable.get using threadOwnedConnection=192.168.1.8(server2:54360)<v71>:41002(uid=10); socket=Socket[addr=/192.168.1.8,port=45823,localport=60595]; time=1607039519051

3. A P2P Listener Thread in server2 creates the receiver Connection:

P2P Listener Thread /192.168.1.8:45823: Connection.<init> receiver=null(uid=0); socket=Socket[addr=/192.168.1.8,port=60595,localport=45823]; time=1607039519050

4. The unshared P2P message reader in server2 reads the handshake from server1's Connection:

P2P message reader for 192.168.1.8(server1:54333)<v70>:41001 unshared ordered uid=10 dom #1 local port=45823 remote port=60595: Connection.readHandshakeForReceiver receiver=192.168.1.8(server1:54333)<v70>:41001(uid=10); socket=Socket[addr=/192.168.1.8,port=60595,localport=45823]; time=1607039519050

5. The unshared P2P message reader in server2 handles the UpdateWithContextMessage even though conserve-sockets=true:

P2P message reader for 192.168.1.8(server1:54333)<v70>:41001 unshared ordered uid=10 dom #1 local port=45823 remote port=60595: TestDistributionMessageObserver operation=beforeProcessMessage; time=1607039519051; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[null]
P2P message reader for 192.168.1.8(server1:54333)<v70>:41001 unshared ordered uid=10 dom #1 local port=45823 remote port=60595: TestDistributionMessageObserver operation=afterProcessMessage; time=1607039519052; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[null]

Case 2:

The server (server2) that processes the put operation from the client is primary and has conserve-sockets=true.
The server (server1) that handles the UpdateWithContextMessage has conserve-sockets=false.

1. A ServerConnection thread in server2 sends the UpdateWithContextMessage:

ServerConnection on port 60463 Thread 1: TestDistributionMessageObserver operation=beforeSendMessage; time=1607039137587; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[192.168.1.8(server1:53948)<v67>:41001]

2. The ServerConnection thread in server2 uses the shared Connection to server1:

ServerConnection on port 60463 Thread 1: ConnectionTable.get using sharedConnection=192.168.1.8(server1:53948)<v67>:41001(uid=3); socket=Socket[addr=/192.168.1.8,port=56562,localport=60458]; time=1607039137587

3. The shared P2P message reader in server1 handles the UpdateWithContextMessage and sends the ReplyMessage using the shared Connection to server2 even though conserve-sockets=false:

P2P message reader for 192.168.1.8(server2:53949)<v67>:41002 shared ordered uid=3 local port=56562 remote port=60458: TestDistributionMessageObserver operation=beforeProcessMessage; time=1607039137588; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[null]
P2P message reader for 192.168.1.8(server2:53949)<v67>:41002 shared ordered uid=3 local port=56562 remote port=60458: TestDistributionMessageObserver operation=beforeSendMessage; time=1607039137588; message=ReplyMessage processorId=42 from null; recipients=[192.168.1.8(server2:53949)<v67>:41002]
P2P message reader for 192.168.1.8(server2:53949)<v67>:41002 shared ordered uid=3 local port=56562 remote port=60458: ConnectionTable.get using sharedConnection=192.168.1.8(server2:53949)<v67>:41002(uid=2); socket=Socket[addr=192.168.1.8/192.168.1.8,port=46868,localport=60454]; time=1607039137588
P2P message reader for 192.168.1.8(server2:53949)<v67>:41002 shared ordered uid=3 local port=56562 remote port=60458: TestDistributionMessageObserver operation=afterProcessMessage; time=1607039137589; message=UpdateOperation$UpdateWithContextMessage(...); recipients=[null]

4. The shared P2P message reader in server2 handles the ReplyMessage:

P2P message reader for 192.168.1.8(server1:53948)<v67>:41001 shared unordered uid=2 local port=46868 remote port=60454: TestDistributionMessageObserver operation=beforeProcessMessage; time=1607039137589; message=ReplyMessage processorId=42 from 192.168.1.8(server1:53948)<v67>:41001; recipients=[null]
P2P message reader for 192.168.1.8(server1:53948)<v67>:41001 shared unordered uid=2 local port=46868 remote port=60454: TestDistributionMessageObserver operation=afterProcessMessage; time=1607039137589; message=ReplyMessage processorId=42 from 192.168.1.8(server1:53948)<v67>:41001; recipients=[null]



Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Bruce Schuchardt <br...@vmware.com>.
+1 for having the default be conserve-sockets=false. Any time there has been trouble and conserve-sockets=true is involved, we always suggest changing it to false.


On 12/3/20, 6:58 AM, "Anilkumar Gingade" <ag...@vmware.com> wrote:

    I was conversing with a few of the devs about the requirement for different settings/configurations for sets of nodes in the cluster depending on business/application needs; for example, a set of nodes serving a different kind of application requirement (data store) than other nodes in the cluster (computation heavy). I am calling this a heterogeneous cluster configuration (mostly in large clusters), as compared to a homogeneous cluster (same config across all the nodes). We need to be thinking about both kinds of deployment as business models move more and more toward cloud-based services for the entire org.
    We need to be thinking about automatic (dynamic) setting of configuration values based on load, resource availability, and service agreements. We should plan on taking a few of these settings and building logic so they can be adjusted automatically.

    Sorry for diverting from the actual email thread subject.

    Barry, it’s a great find. Will there be a dedicated channel for communication from the node where conserve-sockets is set to false to the remote nodes?

     -Anil.





Re: [PROPOSAL] Change the default value of conserve-sockets to false

Posted by Anilkumar Gingade <ag...@vmware.com>.
I was conversing with a few of the devs about the requirement for different settings/configurations for sets of nodes in the cluster depending on business/application needs; for example, a set of nodes serving a different kind of application requirement (data store) than other nodes in the cluster (computation heavy). I am calling this heterogeneous cluster configuration (mostly in large clusters), as compared to a homogeneous cluster (same config across all the nodes). We need to be thinking about both kinds of deployment as business models move more and more toward cloud-based services for the entire org.
We also need to be thinking about automatically (dynamically) setting configuration values based on load, resource availability, and service agreements. We should plan to take a few of these settings and build logic so they can be adjusted automatically.

Sorry for diverting from the actual email thread subject.

Barry, it’s a great find. Will there be a dedicated channel for communication from the node where conserve-sockets is set to false to the remote nodes?

 -Anil.
