
Posted to user@cassandra.apache.org by Rajesh Radhakrishnan <Ra...@phe.gov.uk> on 2016/10/24 11:48:05 UTC

Error creating pool to /IP_ADDRESS33:9042 (Proving Cassandra's NO SINGLE point of failure)

Hi,

I have a 3-node Cassandra cluster.
Cassandra version : dsc-cassandra-2.1.5
Python Cassandra Driver : 2.5.1

Running the nodes in Red Hat virtual machines.

Node ip info:
Node 1: IP_ADDRESS219
Node 2: IP_ADDRESS229
Node 3: IP_ADDRESS230

(IP_ADDRESS219 is masked for this email; it stands for something similar to 123.321.123.219.)

Cassandra.yaml configuration details of node1:

listen_address: IP_ADDRESS219
broadcast_address: (commented out)
rpc_address: IP_ADDRESS219
broadcast_rpc_address: (commented out)

The IP address of the node, as reported by the ifconfig command, is IP_ADDRESS219.

With the cluster up and running, when I took Node 3 (IP_ADDRESS230) down, I was still able to connect with CQLSH from IP_ADDRESS219 and IP_ADDRESS229.

But when I was running a Python script that just reads data from Cassandra using the cassandra-python-driver, and I intentionally stopped Node 3 while the script was still running, the script came to a halt with OperationTimedOut: errors={}, last_host=IP_ADDRESS219.

However, if I start the script when Node 3 is already down, it runs and reads data.

So if any node in the cluster goes down during a read operation, it affects the client operation? Has anyone seen a similar situation?

We are trying to establish, or prove, that Cassandra is always on (NO single point of failure). Do you know why this is happening? Thank you.

Kind regards,
Rajesh R

**************************************************************************
The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of Public Health England, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses by Symantec.Cloud, but please re-sweep any attachments before opening or saving. http://www.gov.uk/PHE
**************************************************************************

Re: Error creating pool to /IP_ADDRESS33:9042 (Proving Cassandra's NO SINGLE point of failure)

Posted by Edward Capriolo <ed...@gmail.com>.

I would suggest you look at some existing work
http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html
and attempt to re-create those scenarios and methodologies for failing
nodes and measuring the performance impact.

This would yield faster and more easily verifiable results than starting
from scratch.
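
As a starting point for that kind of measurement, here is a minimal sketch of a probe loop. It is not the methodology from the linked post; the `probe` function and its parameters are hypothetical names chosen for illustration. It calls an operation repeatedly and records latency and failures instead of stopping at the first error, so the effect of taking a node down mid-run shows up in the report.

```python
import time


def probe(operation, requests=100, pause_seconds=0.0):
    """Run operation() repeatedly; record per-call latency and any exception raised."""
    latencies, errors = [], []
    for _ in range(requests):
        started = time.perf_counter()
        try:
            operation()
        except Exception as exc:
            # Keep probing so that a single failure does not end the whole run.
            errors.append(type(exc).__name__)
        else:
            latencies.append(time.perf_counter() - started)
        time.sleep(pause_seconds)
    return {
        "ok": len(latencies),
        "failed": len(errors),
        "max_latency": max(latencies) if latencies else None,
        "error_types": sorted(set(errors)),
    }
```

With the Python driver, `operation` could be something like `lambda: session.execute(query)`, where `session` and `query` are your own objects; stop a node partway through the run and compare the reports.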

On Wed, Oct 26, 2016 at 6:41 AM, Rajesh Radhakrishnan <
Rajesh.Radhakrishnan@phe.gov.uk> wrote:

> Hi Vladimir,
>
> Thank you for the response.
>
> Yes, I added all three node IPs when connecting to the cluster via the
> driver.
>
> It's not a failed operation. The script takes some time to read millions
> of records, and while it is running I intentionally take one node down to
> see how the script reacts.
>
> But the entire script stops with a timeout error. Why?
>
>
> Kind regards,
> Rajesh R
>

RE: Error creating pool to /IP_ADDRESS33:9042 (Proving Cassandra's NO SINGLE point of failure)

Posted by Rajesh Radhakrishnan <Ra...@phe.gov.uk>.

Hi Vladimir,

Thank you for the response.

Yes, I added all three node IPs when connecting to the cluster via the driver.

It's not a failed operation. The script takes some time to read millions of records, and while it is running I intentionally take one node down to see how the script reacts.

But the entire script stops with a timeout error. Why?


Kind regards,
Rajesh R

________________________________
From: Vladimir Yudovin [vladyu@winguzone.com]
Sent: 24 October 2016 17:04
To: user
Subject: Re: Error creating pool to /IP_ADDRESS33:9042 (Proving Cassandra's NO SINGLE point of failure)

Probably the Python driver you are using can't retry a failed operation over a connection to another node. Do you provide all three IPs to the Python driver when connecting?

Best regards, Vladimir Yudovin,
Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.



Re: Error creating pool to /IP_ADDRESS33:9042 (Proving Cassandra's NO SINGLE point of failure)

Posted by Vladimir Yudovin <vl...@winguzone.com>.

Probably the Python driver you are using can't retry a failed operation over a connection to another node. Do you provide all three IPs to the Python driver when connecting?

Best regards, Vladimir Yudovin,

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.
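
To illustrate the point about retrying, here is a minimal sketch, assuming the DataStax Python driver. As far as I know, `OperationTimedOut` is a client-side timeout raised to the caller, so an uncaught one ends the script. The helper names `execute_with_retry` and `read_rows`, the attempt count and the backoff are hypothetical choices, not driver features.

```python
import time


def execute_with_retry(run_query, retry_on, max_attempts=3, backoff_seconds=1.0):
    """Call run_query(); on an exception listed in retry_on, wait and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            time.sleep(backoff_seconds * attempt)


def read_rows(contact_points, query):
    # Imported lazily so the retry helper above can be used without the driver.
    from cassandra import OperationTimedOut
    from cassandra.cluster import Cluster

    # Listing every node means the driver is not tied to one contact point.
    cluster = Cluster(contact_points=contact_points)
    session = cluster.connect()
    try:
        # list() sits inside the retried call so that timeouts raised while
        # iterating over the result are retried as well.
        return execute_with_retry(
            lambda: list(session.execute(query)),
            retry_on=(OperationTimedOut,),
        )
    finally:
        cluster.shutdown()
```

Note that a retry here restarts the whole query and holds every row in memory, so for reads of millions of records you would likely want to split the work into smaller queries and retry each one separately.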
