Posted to issues@nifi.apache.org by markap14 <gi...@git.apache.org> on 2016/07/22 13:21:27 UTC
[GitHub] nifi pull request #705: NIFI-2360: Leave ZooKeeper running when a node is di...
GitHub user markap14 opened a pull request:
https://github.com/apache/nifi/pull/705
NIFI-2360: Leave ZooKeeper running when a node is disconnected. Do not allow the last node in the cluster to be disconnected. Change ClusterProtocolHeartbeater to use RetryNTimes retry strategy instead of RetryForever because web requests could block on this
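The difference between a bounded retry strategy and an unbounded one can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from this pull request and not the retry policy classes it refers to: the class BoundedRetrySketch, the method retryNTimes, and its parameters maxAttempts and sleepMillis are hypothetical names. The point it shows is that a caller of a bounded retry gets an exception back after a fixed number of attempts, whereas a retry-forever loop would keep the calling thread (for example, one serving a web request) blocked for as long as the failure lasts.

```java
import java.util.concurrent.Callable;

public class BoundedRetrySketch {

    /**
     * Hypothetical helper: runs the task up to maxAttempts times in total,
     * sleeping sleepMillis between failed attempts. If every attempt fails,
     * the last failure is rethrown so the caller is never blocked indefinitely.
     */
    public static <T> T retryNTimes(Callable<T> task, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                // Do not retry after an interrupt; restore the flag and give up.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated operation that fails twice before succeeding.
        String result = retryNTimes(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("not connected yet");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With an unbounded strategy the loop above would have no upper limit on attempt, so a caller waiting on the result would wait until the underlying service came back.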
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/markap14/nifi NIFI-2360
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nifi/pull/705.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #705
----
commit 8ad138d02201d2b593ebb579b51b0c853829b568
Author: Mark Payne <ma...@hotmail.com>
Date: 2016-07-22T13:21:12Z
NIFI-2360: Leave ZooKeeper running when a node is disconnected. Do not allow the last node in the cluster to be disconnected. Change ClusterProtocolHeartbeater to use RetryNTimes retry strategy instead of RetryForever because web requests could block on this
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] nifi issue #705: NIFI-2360: Leave ZooKeeper running when a node is disconnec...
Posted by JPercivall <gi...@git.apache.org>.
Github user JPercivall commented on the issue:
https://github.com/apache/nifi/pull/705
+1
Visually verified the code and did a contrib check. Ran a 3 node secure cluster and tested various combinations of disconnecting, reconnecting, and restarting nodes in order to find corner cases, and it performs as expected. Thanks @markap14, I will merge it in.
---
[GitHub] nifi issue #705: NIFI-2360: Leave ZooKeeper running when a node is disconnec...
Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on the issue:
https://github.com/apache/nifi/pull/705
@JPercivall - interestingly enough, I didn't see this in any of my testing, but when I switched back to the branch I saw it immediately when I started up the cluster. A restart fixed it, but then after restarting the cluster another 10 times or so, I was able to replicate the issue again. So it seems a bit sporadic. I was able to address the issue, though. I have updated the PR.
---
[GitHub] nifi pull request #705: NIFI-2360: Leave ZooKeeper running when a node is di...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/nifi/pull/705
---
[GitHub] nifi issue #705: NIFI-2360: Leave ZooKeeper running when a node is disconnec...
Posted by JPercivall <gi...@git.apache.org>.
Github user JPercivall commented on the issue:
https://github.com/apache/nifi/pull/705
Reproducible error:
3 node cluster
stop 2 of the nodes
restart all three nodes in the cluster
None of them can start up; they hit "Unexpected Error".
In the nifi-app logs I see a lot of:
2016-07-22 18:13:37,643 WARN [Replicate Request Thread-8] o.a.n.c.c.h.r.ThreadPoolRequestReplicator
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.api.client.Client.handle(Client.java:652) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509) ~[jersey-client-1.19.jar:1.19]
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:493) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:687) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_74]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.8.0_74]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[na:1.8.0_74]
at java.net.SocketInputStream.read(SocketInputStream.java:170) ~[na:1.8.0_74]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_74]
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[na:1.8.0_74]
at sun.security.ssl.InputRecord.read(InputRecord.java:503) ~[na:1.8.0_74]
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[na:1.8.0_74]
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[na:1.8.0_74]
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[na:1.8.0_74]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[na:1.8.0_74]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[na:1.8.0_74]
at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_74]
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) ~[na:1.8.0_74]
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) ~[na:1.8.0_74]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) ~[na:1.8.0_74]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) ~[na:1.8.0_74]
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[na:1.8.0_74]
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) ~[na:1.8.0_74]
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153) ~[jersey-client-1.19.jar:1.19]
... 12 common frames omitted
2016-07-22 18:13:37,652 WARN [Replicate Request Thread-7] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/controller/bulletins to localhost:8481 due to {}
com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.api.client.Client.handle(Client.java:652) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.api.client.filter.GZIPContentEncodingFilter.handle(GZIPContentEncodingFilter.java:123) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:509) ~[jersey-client-1.19.jar:1.19]
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator.replicateRequest(ThreadPoolRequestReplicator.java:493) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
at org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator$NodeHttpRequest.run(ThreadPoolRequestReplicator.java:687) ~[nifi-framework-cluster-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_74]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.8.0_74]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[na:1.8.0_74]
at java.net.SocketInputStream.read(SocketInputStream.java:170) ~[na:1.8.0_74]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_74]
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[na:1.8.0_74]
at sun.security.ssl.InputRecord.read(InputRecord.java:503) ~[na:1.8.0_74]
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[na:1.8.0_74]
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[na:1.8.0_74]
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[na:1.8.0_74]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[na:1.8.0_74]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[na:1.8.0_74]
at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[na:1.8.0_74]
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) ~[na:1.8.0_74]
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) ~[na:1.8.0_74]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536) ~[na:1.8.0_74]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441) ~[na:1.8.0_74]
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[na:1.8.0_74]
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338) ~[na:1.8.0_74]
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253) ~[jersey-client-1.19.jar:1.19]
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153) ~[jersey-client-1.19.jar:1.19]
... 12 common frames omitted
---