You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@geode.apache.org by Bruce Schuchardt <bs...@pivotal.io> on 2016/06/13 22:50:33 UTC

Review Request 48665: GEODE-1542 shared/unordered tcp/ip connection times out, initiating suspicion

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48665/
-----------------------------------------------------------

Review request for geode, Hitesh Khamesra, Jianxia Chen, and Udo Kohlmeyer.


Bugs: GEODE-1542
    https://issues.apache.org/jira/browse/GEODE-1542


Repository: geode


Description
-------

This disables timing out of shared/unordered TcpConduit connections.  We don't want them to time out because we are using them to initiate suspect processing on other members.

The ticket also pointed out a problem with the "final check" mechanism in the health monitor.  I tracked that down to improper use of SocketCreator to create the server-socket in GMSHealthMonitor.  It was creating sn SSL socket if SSL is enabled but the client-side of the check uses non-SSL sockets.  I changed the server to use non-SSL sockets as well since no useful information is sent over the final-check TCP/IP connections & they need to be lightweight and fast.

While looking at logs I also found that the heartbeat request sent at the beginning of a final-check had a request-ID even though it's not waiting for a response.  That causes processing of the response to do more work than necessary so I changed it to remove the request-ID from the message.


Diffs
-----

  geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java f27e0b8d238fd4cda3a81a5d1edf199ebeb1c3c7 
  geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java 8dce1a5eb46b26a7b9ecc3e5b538d98e7e9f720e 
  geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/HeartbeatRequestMessage.java 3c08e3383c2bf8f53cca291825ddb45ef69d0184 
  geode-core/src/main/java/com/gemstone/gemfire/internal/SocketCreator.java 367d4a7bbe1334ad393da337030a1d02089cfcaf 
  geode-core/src/main/java/com/gemstone/gemfire/internal/tcp/Connection.java 85e351106285db571ecc3f388bbb11979562c3ed 

Diff: https://reviews.apache.org/r/48665/diff/


Testing
-------

precheckin, SSL integration testing.


Thanks,

Bruce Schuchardt


Re: Review Request 48665: GEODE-1542 shared/unordered tcp/ip connection times out, initiating suspicion

Posted by Bruce Schuchardt <bs...@pivotal.io>.

> On June 14, 2016, 4:47 p.m., Jianxia Chen wrote:
> > geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java, line 566
> > <https://reviews.apache.org/r/48665/diff/1/?file=1417316#file1417316line566>
> >
> >     Do you want to keep the suspect member id in the debug message to help identify who the suspect member is? This would be useful when there are more than one suspect members.

The member's ID and failure detection port have already been logged by the same thread.  There's no reason to keep logging the same information over and over again.


> On June 14, 2016, 4:47 p.m., Jianxia Chen wrote:
> > geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java, line 569
> > <https://reviews.apache.org/r/48665/diff/1/?file=1417316#file1417316line569>
> >
> >     Same as above.

Same comment.  No reason to bloat the log files with redundant information.


- Bruce


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48665/#review137535
-----------------------------------------------------------


On June 13, 2016, 10:50 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48665/
> -----------------------------------------------------------
> 
> (Updated June 13, 2016, 10:50 p.m.)
> 
> 
> Review request for geode, Hitesh Khamesra, Jianxia Chen, and Udo Kohlmeyer.
> 
> 
> Bugs: GEODE-1542
>     https://issues.apache.org/jira/browse/GEODE-1542
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> This disables timing out of shared/unordered TcpConduit connections.  We don't want them to time out because we are using them to initiate suspect processing on other members.
> 
> The ticket also pointed out a problem with the "final check" mechanism in the health monitor.  I tracked that down to improper use of SocketCreator to create the server-socket in GMSHealthMonitor.  It was creating sn SSL socket if SSL is enabled but the client-side of the check uses non-SSL sockets.  I changed the server to use non-SSL sockets as well since no useful information is sent over the final-check TCP/IP connections & they need to be lightweight and fast.
> 
> While looking at logs I also found that the heartbeat request sent at the beginning of a final-check had a request-ID even though it's not waiting for a response.  That causes processing of the response to do more work than necessary so I changed it to remove the request-ID from the message.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java f27e0b8d238fd4cda3a81a5d1edf199ebeb1c3c7 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java 8dce1a5eb46b26a7b9ecc3e5b538d98e7e9f720e 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/HeartbeatRequestMessage.java 3c08e3383c2bf8f53cca291825ddb45ef69d0184 
>   geode-core/src/main/java/com/gemstone/gemfire/internal/SocketCreator.java 367d4a7bbe1334ad393da337030a1d02089cfcaf 
>   geode-core/src/main/java/com/gemstone/gemfire/internal/tcp/Connection.java 85e351106285db571ecc3f388bbb11979562c3ed 
> 
> Diff: https://reviews.apache.org/r/48665/diff/
> 
> 
> Testing
> -------
> 
> precheckin, SSL integration testing.
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>


Re: Review Request 48665: GEODE-1542 shared/unordered tcp/ip connection times out, initiating suspicion

Posted by Jianxia Chen <jc...@pivotal.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48665/#review137535
-----------------------------------------------------------


Fix it, then Ship it!





geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java (line 547)
<https://reviews.apache.org/r/48665/#comment202682>

    Do you want to keep the suspect member id in the debug message to help identify who the suspect member is? This would be useful when there are more than one suspect members.



geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java (line 550)
<https://reviews.apache.org/r/48665/#comment202683>

    Same as above.


- Jianxia Chen


On June 13, 2016, 10:50 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48665/
> -----------------------------------------------------------
> 
> (Updated June 13, 2016, 10:50 p.m.)
> 
> 
> Review request for geode, Hitesh Khamesra, Jianxia Chen, and Udo Kohlmeyer.
> 
> 
> Bugs: GEODE-1542
>     https://issues.apache.org/jira/browse/GEODE-1542
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> This disables timing out of shared/unordered TcpConduit connections.  We don't want them to time out because we are using them to initiate suspect processing on other members.
> 
> The ticket also pointed out a problem with the "final check" mechanism in the health monitor.  I tracked that down to improper use of SocketCreator to create the server-socket in GMSHealthMonitor.  It was creating sn SSL socket if SSL is enabled but the client-side of the check uses non-SSL sockets.  I changed the server to use non-SSL sockets as well since no useful information is sent over the final-check TCP/IP connections & they need to be lightweight and fast.
> 
> While looking at logs I also found that the heartbeat request sent at the beginning of a final-check had a request-ID even though it's not waiting for a response.  That causes processing of the response to do more work than necessary so I changed it to remove the request-ID from the message.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java f27e0b8d238fd4cda3a81a5d1edf199ebeb1c3c7 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java 8dce1a5eb46b26a7b9ecc3e5b538d98e7e9f720e 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/HeartbeatRequestMessage.java 3c08e3383c2bf8f53cca291825ddb45ef69d0184 
>   geode-core/src/main/java/com/gemstone/gemfire/internal/SocketCreator.java 367d4a7bbe1334ad393da337030a1d02089cfcaf 
>   geode-core/src/main/java/com/gemstone/gemfire/internal/tcp/Connection.java 85e351106285db571ecc3f388bbb11979562c3ed 
> 
> Diff: https://reviews.apache.org/r/48665/diff/
> 
> 
> Testing
> -------
> 
> precheckin, SSL integration testing.
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>


Re: Review Request 48665: GEODE-1542 shared/unordered tcp/ip connection times out, initiating suspicion

Posted by Udo Kohlmeyer <uk...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48665/#review137552
-----------------------------------------------------------


Ship it!




Ship It!

- Udo Kohlmeyer


On June 13, 2016, 10:50 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48665/
> -----------------------------------------------------------
> 
> (Updated June 13, 2016, 10:50 p.m.)
> 
> 
> Review request for geode, Hitesh Khamesra, Jianxia Chen, and Udo Kohlmeyer.
> 
> 
> Bugs: GEODE-1542
>     https://issues.apache.org/jira/browse/GEODE-1542
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> This disables timing out of shared/unordered TcpConduit connections.  We don't want them to time out because we are using them to initiate suspect processing on other members.
> 
> The ticket also pointed out a problem with the "final check" mechanism in the health monitor.  I tracked that down to improper use of SocketCreator to create the server-socket in GMSHealthMonitor.  It was creating sn SSL socket if SSL is enabled but the client-side of the check uses non-SSL sockets.  I changed the server to use non-SSL sockets as well since no useful information is sent over the final-check TCP/IP connections & they need to be lightweight and fast.
> 
> While looking at logs I also found that the heartbeat request sent at the beginning of a final-check had a request-ID even though it's not waiting for a response.  That causes processing of the response to do more work than necessary so I changed it to remove the request-ID from the message.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java f27e0b8d238fd4cda3a81a5d1edf199ebeb1c3c7 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java 8dce1a5eb46b26a7b9ecc3e5b538d98e7e9f720e 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/HeartbeatRequestMessage.java 3c08e3383c2bf8f53cca291825ddb45ef69d0184 
>   geode-core/src/main/java/com/gemstone/gemfire/internal/SocketCreator.java 367d4a7bbe1334ad393da337030a1d02089cfcaf 
>   geode-core/src/main/java/com/gemstone/gemfire/internal/tcp/Connection.java 85e351106285db571ecc3f388bbb11979562c3ed 
> 
> Diff: https://reviews.apache.org/r/48665/diff/
> 
> 
> Testing
> -------
> 
> precheckin, SSL integration testing.
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>


Re: Review Request 48665: GEODE-1542 shared/unordered tcp/ip connection times out, initiating suspicion

Posted by Hitesh Khamesra <hk...@pivotal.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48665/#review137534
-----------------------------------------------------------


Ship it!




Ship It!

- Hitesh Khamesra


On June 13, 2016, 10:50 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48665/
> -----------------------------------------------------------
> 
> (Updated June 13, 2016, 10:50 p.m.)
> 
> 
> Review request for geode, Hitesh Khamesra, Jianxia Chen, and Udo Kohlmeyer.
> 
> 
> Bugs: GEODE-1542
>     https://issues.apache.org/jira/browse/GEODE-1542
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> This disables timing out of shared/unordered TcpConduit connections.  We don't want them to time out because we are using them to initiate suspect processing on other members.
> 
> The ticket also pointed out a problem with the "final check" mechanism in the health monitor.  I tracked that down to improper use of SocketCreator to create the server-socket in GMSHealthMonitor.  It was creating sn SSL socket if SSL is enabled but the client-side of the check uses non-SSL sockets.  I changed the server to use non-SSL sockets as well since no useful information is sent over the final-check TCP/IP connections & they need to be lightweight and fast.
> 
> While looking at logs I also found that the heartbeat request sent at the beginning of a final-check had a request-ID even though it's not waiting for a response.  That causes processing of the response to do more work than necessary so I changed it to remove the request-ID from the message.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/fd/GMSHealthMonitor.java f27e0b8d238fd4cda3a81a5d1edf199ebeb1c3c7 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/membership/GMSJoinLeave.java 8dce1a5eb46b26a7b9ecc3e5b538d98e7e9f720e 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/membership/gms/messages/HeartbeatRequestMessage.java 3c08e3383c2bf8f53cca291825ddb45ef69d0184 
>   geode-core/src/main/java/com/gemstone/gemfire/internal/SocketCreator.java 367d4a7bbe1334ad393da337030a1d02089cfcaf 
>   geode-core/src/main/java/com/gemstone/gemfire/internal/tcp/Connection.java 85e351106285db571ecc3f388bbb11979562c3ed 
> 
> Diff: https://reviews.apache.org/r/48665/diff/
> 
> 
> Testing
> -------
> 
> precheckin, SSL integration testing.
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>