Posted to user@cassandra.apache.org by Ajay Garg <aj...@gmail.com> on 2015/10/07 17:56:02 UTC

Is replication possible with already existing data?

Hi All.

We have a scenario where, until now, we have been using a plain, simple
single node, with the keyspace created using ::

CREATE KEYSPACE our_db WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '1'}  AND durable_writes = true;


We now plan to introduce replication (in the true sense) in our scheme
of things, but cannot afford to lose any data.
We can, however, take a bit of downtime and do any data-migration if
required (we have already done data-migration once in the past, when
we moved our plain, simple single node from one physical machine to
another).


So,

a)
Is it possible at all to introduce replication in our scenario?
If yes, what needs to be done to NOT LOSE our current existing data?

b)
Also, will "NetworkTopologyStrategy" work in our scenario (since
NetworkTopologyStrategy seems to be more robust)?


Brief pointers to the above will give a huge confidence-boost to our endeavours.


Thanks and Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Bingo !!!

Using "LoadBalancingPolicy" did the trick.
Exactly what was needed !!!
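
For anyone who finds this thread later, the change was along these lines (a
minimal sketch only, assuming the 2.0-era Java driver; the policy parameters,
DC name and hosts below are placeholders rather than our exact values) ::

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class PolicyExample {
    public static void main(String[] args) {
        // Keep DC-aware routing, but also allow up to 2 hosts in each remote
        // data center to be tried when the local DC's nodes are down.
        Cluster cluster = Cluster.builder()
                .addContactPoints("cas11.example.com", "cas21.example.com")
                .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC1", 2))
                .build();
        cluster.connect();
        cluster.close();
    }
}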


Thanks and Regards,
Ajay

On Sun, Oct 25, 2015 at 5:52 PM, Ryan Svihla <rs...@foundev.pro> wrote:

> Ajay,
>
> So it's the default driver behavior to pin requests to the first data
> center it connects to (DCAwareRoundRobin strategy), but let me explain why
> this is.
>
> I think you're thinking about data centers in Cassandra as a unit of
> failure, and while you can have say a rack fail, as you scale up and use
> rack awareness, it's rare you lose a whole "data center" in the sense
> you're thinking about, so let's reset a bit:
>
>    1. If I'm designing a multi-DC architecture, given the nature of
>    latency I will not want my app servers connecting _across_ data centers.
>    2. So since the common desire is not to magically have very high
>    latency requests bleed out to remote data centers, the default behavior of
>    the driver is to pin to the first data center it connects to; you can
>    change this with a different Load Balancing Policy (
>    http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/policies/LoadBalancingPolicy.html
>    )
>    3. However, I generally do NOT advise connecting an app server to
>    another data center: since Cassandra is a masterless architecture you
>    typically have issues that affect nodes rather than an entire data center,
>    and if they do affect an entire data center (say the intra-DC link is
>    down) then it's going to affect your app server as well!
>
> So for new users, I typically just recommend pinning an app server to a DC
> and doing your data-center-level switching further up. You can get more
> advanced and handle bleed out later, but you have to think of latencies.
>
> Final point: rely on repairs for your data consistency; hints are great
> and all, but repair is how you make sure you're in sync.
>
> On Sun, Oct 25, 2015 at 3:10 AM, Ajay Garg <aj...@gmail.com> wrote:
>
>> Some more observations ::
>>
>> a)
>> CAS11 and CAS12 are down, CAS21 and CAS22 up.
>> If I connect via the driver to the cluster using only CAS21 and CAS22 as
>> contact-points, even then the exception occurs.
>>
>> b)
>> CAS11 down, CAS12 up, CAS21 and CAS22 up.
>> If I connect via the driver to the cluster using only CAS21 and CAS22 as
>> contact-points, then connection goes fine.
>>
>> c)
>> CAS11 up, CAS12 down, CAS21 and CAS22 up.
>> If I connect via the driver to the cluster using only CAS21 and CAS22 as
>> contact-points, then connection goes fine.
>>
>>
>> Seems the java-driver is kinda always requiring either one of CAS11 or
>> CAS12 to be up (although the expectation is that the driver must work fine
>> if ANY of the 4 nodes is up).
>>
>>
>> Thoughts, experts !? :)
>>
>>
>>
>> On Sat, Oct 24, 2015 at 9:40 PM, Ajay Garg <aj...@gmail.com>
>> wrote:
>>
>>> Ideas please, on what I may be doing wrong?
>>>
>>> On Sat, Oct 24, 2015 at 5:48 PM, Ajay Garg <aj...@gmail.com>
>>> wrote:
>>>
>>>> Hi All.
>>>>
>>>> I have been doing extensive testing, and replication works fine, even
>>>> if any permutation of CAS11, CAS12, CAS21, CAS22 is downed and brought
>>>> up. Syncing always takes place (obviously, as long as
>>>> continuous-downtime-value does not exceed *max_hint_window_in_ms*).
>>>>
>>>>
>>>> However, things behave weird when I try connecting via DataStax
>>>> Java-Driver.
>>>> I always add the nodes to the cluster in the order ::
>>>>
>>>>                          CAS11, CAS12, CAS21, CAS22
>>>>
>>>> during "cluster.connect" method.
>>>>
>>>>
>>>> Now, following happens ::
>>>>
>>>> a)
>>>> If CAS11 goes down, data is persisted fine (presumably first in CAS12,
>>>> and later replicated to CAS21 and CAS22).
>>>>
>>>> b)
>>>> If CAS11 and CAS12 go down, data is NOT persisted.
>>>> Instead the following exceptions are observed in the Java-Driver ::
>>>>
>>>>
>>>> ##################################################################################
>>>> Exception in thread "main"
>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>> tried for query failed (no host was tried)
>>>>     at
>>>> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
>>>>     at
>>>> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
>>>>     at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
>>>>     at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
>>>>     at
>>>> com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
>>>>     at
>>>> com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
>>>> Caused by:
>>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>>> tried for query failed (no host was tried)
>>>>     at
>>>> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
>>>>     at
>>>> com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
>>>>     at
>>>> com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
>>>>     at
>>>> com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
>>>>     at
>>>> com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
>>>>     at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
>>>>     ... 3 more
>>>>
>>>> ###################################################################################
>>>>
>>>>
>>>> I have already tried ::
>>>>
>>>> 1)
>>>> Increasing driver-read-timeout from 12 seconds to 30 seconds.
>>>>
>>>> 2)
>>>> Increasing driver-connect-timeout from 5 seconds to 30 seconds.
>>>>
>>>> 3)
>>>> I have also confirmed that each of the 4 nodes are telnet-able over
>>>> ports 9042 and 9160 each.
>>>>
>>>>
>>>> Definitely seems to be some driver-issue, since
>>>> data-persistence/replication works perfect (with any permutation) if
>>>> data-persistence is done via "cqlsh".
>>>>
>>>>
>>>> Kindly provide some pointers.
>>>> Ultimately, it is the Java-driver that will be used in production, so
>>>> it is imperative that data-persistence/replication happens for any downing
>>>> of any permutation of node(s).
>>>>
>>>>
>>>> Thanks and Regards,
>>>> Ajay
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
>
> Thanks,
> Ryan Svihla
>
>


-- 
Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Ryan Svihla <rs...@foundev.pro>.
Ajay,

So it's the default driver behavior to pin requests to the first data
center it connects to (DCAwareRoundRobin strategy), but let me explain why
this is.

I think you're thinking about data centers in Cassandra as a unit of
failure, and while you can have say a rack fail, as you scale up and use
rack awareness, it's rare you lose a whole "data center" in the sense
you're thinking about, so let's reset a bit:

   1. If I'm designing a multi-DC architecture, given the nature of
   latency I will not want my app servers connecting _across_ data centers.
   2. So since the common desire is not to magically have very high latency
   requests bleed out to remote data centers, the default behavior of the
   driver is to pin to the first data center it connects to; you can change
   this with a different Load Balancing Policy (
   http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/policies/LoadBalancingPolicy.html
   )
   3. However, I generally do NOT advise connecting an app server to another
   data center: since Cassandra is a masterless architecture you typically
   have issues that affect nodes rather than an entire data center, and if
   they do affect an entire data center (say the intra-DC link is down) then
   it's going to affect your app server as well!

So for new users, I typically just recommend pinning an app server to a DC
and doing your data-center-level switching further up. You can get more
advanced and handle bleed out later, but you have to think of latencies.

Final point: rely on repairs for your data consistency; hints are great and
all, but repair is how you make sure you're in sync.

On Sun, Oct 25, 2015 at 3:10 AM, Ajay Garg <aj...@gmail.com> wrote:

> Some more observations ::
>
> a)
> CAS11 and CAS12 are down, CAS21 and CAS22 up.
> If I connect via the driver to the cluster using only CAS21 and CAS22 as
> contact-points, even then the exception occurs.
>
> b)
> CAS11 down, CAS12 up, CAS21 and CAS22 up.
> If I connect via the driver to the cluster using only CAS21 and CAS22 as
> contact-points, then connection goes fine.
>
> c)
> CAS11 up, CAS12 down, CAS21 and CAS22 up.
> If I connect via the driver to the cluster using only CAS21 and CAS22 as
> contact-points, then connection goes fine.
>
>
> Seems the java-driver is kinda always requiring either one of CAS11 or
> CAS12 to be up (although the expectation is that the driver must work fine
> if ANY of the 4 nodes is up).
>
>
> Thoughts, experts !? :)
>
>
>
> On Sat, Oct 24, 2015 at 9:40 PM, Ajay Garg <aj...@gmail.com> wrote:
>
>> Ideas please, on what I may be doing wrong?
>>
>> On Sat, Oct 24, 2015 at 5:48 PM, Ajay Garg <aj...@gmail.com>
>> wrote:
>>
>>> Hi All.
>>>
>>> I have been doing extensive testing, and replication works fine, even if
>>> any permutation of CAS11, CAS12, CAS21, CAS22 is downed and brought up.
>>> Syncing always takes place (obviously, as long as continuous-downtime-value
>>> does not exceed *max_hint_window_in_ms*).
>>>
>>>
>>> However, things behave weird when I try connecting via DataStax
>>> Java-Driver.
>>> I always add the nodes to the cluster in the order ::
>>>
>>>                          CAS11, CAS12, CAS21, CAS22
>>>
>>> during "cluster.connect" method.
>>>
>>>
>>> Now, following happens ::
>>>
>>> a)
>>> If CAS11 goes down, data is persisted fine (presumably first in CAS12,
>>> and later replicated to CAS21 and CAS22).
>>>
>>> b)
>>> If CAS11 and CAS12 go down, data is NOT persisted.
>>> Instead the following exceptions are observed in the Java-Driver ::
>>>
>>>
>>> ##################################################################################
>>> Exception in thread "main"
>>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>>> tried for query failed (no host was tried)
>>>     at
>>> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
>>>     at
>>> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
>>>     at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
>>>     at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
>>>     at
>>> com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
>>>     at
>>> com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
>>> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
>>> All host(s) tried for query failed (no host was tried)
>>>     at
>>> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
>>>     at
>>> com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
>>>     at
>>> com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
>>>     at
>>> com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
>>>     at
>>> com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
>>>     at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
>>>     ... 3 more
>>>
>>> ###################################################################################
>>>
>>>
>>> I have already tried ::
>>>
>>> 1)
>>> Increasing driver-read-timeout from 12 seconds to 30 seconds.
>>>
>>> 2)
>>> Increasing driver-connect-timeout from 5 seconds to 30 seconds.
>>>
>>> 3)
>>> I have also confirmed that each of the 4 nodes are telnet-able over
>>> ports 9042 and 9160 each.
>>>
>>>
>>> Definitely seems to be some driver-issue, since
>>> data-persistence/replication works perfect (with any permutation) if
>>> data-persistence is done via "cqlsh".
>>>
>>>
>>> Kindly provide some pointers.
>>> Ultimately, it is the Java-driver that will be used in production, so it
>>> is imperative that data-persistence/replication happens for any downing of
>>> any permutation of node(s).
>>>
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 

Thanks,
Ryan Svihla

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Some more observations ::

a)
CAS11 and CAS12 are down, CAS21 and CAS22 up.
If I connect via the driver to the cluster using only CAS21 and CAS22 as
contact-points, even then the exception occurs.

b)
CAS11 down, CAS12 up, CAS21 and CAS22 up.
If I connect via the driver to the cluster using only CAS21 and CAS22 as
contact-points, then connection goes fine.

c)
CAS11 up, CAS12 down, CAS21 and CAS22 up.
If I connect via the driver to the cluster using only CAS21 and CAS22 as
contact-points, then connection goes fine.


It seems the java-driver pretty much always requires either CAS11 or CAS12
to be up (although the expectation is that the driver must work fine if ANY
of the 4 nodes is up).


Thoughts, experts !? :)



On Sat, Oct 24, 2015 at 9:40 PM, Ajay Garg <aj...@gmail.com> wrote:

> Ideas please, on what I may be doing wrong?
>
> On Sat, Oct 24, 2015 at 5:48 PM, Ajay Garg <aj...@gmail.com> wrote:
>
>> Hi All.
>>
>> I have been doing extensive testing, and replication works fine, even if
>> any permutation of CAS11, CAS12, CAS21, CAS22 is downed and brought up.
>> Syncing always takes place (obviously, as long as continuous-downtime-value
>> does not exceed *max_hint_window_in_ms*).
>>
>>
>> However, things behave weird when I try connecting via DataStax
>> Java-Driver.
>> I always add the nodes to the cluster in the order ::
>>
>>                          CAS11, CAS12, CAS21, CAS22
>>
>> during "cluster.connect" method.
>>
>>
>> Now, following happens ::
>>
>> a)
>> If CAS11 goes down, data is persisted fine (presumably first in CAS12,
>> and later replicated to CAS21 and CAS22).
>>
>> b)
>> If CAS11 and CAS12 go down, data is NOT persisted.
>> Instead the following exceptions are observed in the Java-Driver ::
>>
>>
>> ##################################################################################
>> Exception in thread "main"
>> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
>> tried for query failed (no host was tried)
>>     at
>> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
>>     at
>> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
>>     at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
>>     at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
>>     at
>> com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
>>     at
>> com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
>> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
>> All host(s) tried for query failed (no host was tried)
>>     at
>> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
>>     at
>> com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
>>     at
>> com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
>>     at
>> com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
>>     at
>> com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
>>     at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
>>     ... 3 more
>>
>> ###################################################################################
>>
>>
>> I have already tried ::
>>
>> 1)
>> Increasing driver-read-timeout from 12 seconds to 30 seconds.
>>
>> 2)
>> Increasing driver-connect-timeout from 5 seconds to 30 seconds.
>>
>> 3)
>> I have also confirmed that each of the 4 nodes are telnet-able over ports
>> 9042 and 9160 each.
>>
>>
>> Definitely seems to be some driver-issue, since
>> data-persistence/replication works perfect (with any permutation) if
>> data-persistence is done via "cqlsh".
>>
>>
>> Kindly provide some pointers.
>> Ultimately, it is the Java-driver that will be used in production, so it
>> is imperative that data-persistence/replication happens for any downing of
>> any permutation of node(s).
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Ideas please, on what I may be doing wrong?

On Sat, Oct 24, 2015 at 5:48 PM, Ajay Garg <aj...@gmail.com> wrote:

> Hi All.
>
> I have been doing extensive testing, and replication works fine, even if
> any permutation of CAS11, CAS12, CAS21, CAS22 is downed and brought up.
> Syncing always takes place (obviously, as long as continuous-downtime-value
> does not exceed *max_hint_window_in_ms*).
>
>
> However, things behave weird when I try connecting via DataStax
> Java-Driver.
> I always add the nodes to the cluster in the order ::
>
>                          CAS11, CAS12, CAS21, CAS22
>
> during "cluster.connect" method.
>
>
> Now, following happens ::
>
> a)
> If CAS11 goes down, data is persisted fine (presumably first in CAS12, and
> later replicated to CAS21 and CAS22).
>
> b)
> If CAS11 and CAS12 go down, data is NOT persisted.
> Instead the following exceptions are observed in the Java-Driver ::
>
>
> ##################################################################################
> Exception in thread "main"
> com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
> tried for query failed (no host was tried)
>     at
> com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
>     at
> com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
>     at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
>     at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
>     at
> com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
>     at
> com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
> Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
> All host(s) tried for query failed (no host was tried)
>     at
> com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
>     at
> com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
>     at
> com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
>     at
> com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
>     at
> com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
>     at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
>     ... 3 more
>
> ###################################################################################
>
>
> I have already tried ::
>
> 1)
> Increasing driver-read-timeout from 12 seconds to 30 seconds.
>
> 2)
> Increasing driver-connect-timeout from 5 seconds to 30 seconds.
>
> 3)
> I have also confirmed that each of the 4 nodes are telnet-able over ports
> 9042 and 9160 each.
>
>
> Definitely seems to be some driver-issue, since
> data-persistence/replication works perfect (with any permutation) if
> data-persistence is done via "cqlsh".
>
>
> Kindly provide some pointers.
> Ultimately, it is the Java-driver that will be used in production, so it
> is imperative that data-persistence/replication happens for any downing of
> any permutation of node(s).
>
>
> Thanks and Regards,
> Ajay
>



-- 
Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Hi All.

I have been doing extensive testing, and replication works fine, even if
any permutation of CAS11, CAS12, CAS21, CAS22 is downed and brought up.
Syncing always takes place (obviously, as long as the continuous downtime
does not exceed *max_hint_window_in_ms*).


However, things behave weirdly when I try connecting via the DataStax Java-Driver.
I always add the nodes to the cluster in the order ::

                         CAS11, CAS12, CAS21, CAS22

during "cluster.connect" method.
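
For reference, the connection is set up roughly like this (a trimmed-down
sketch of the SimpleClient mentioned in the stack-trace below; the example
host addresses are placeholders, not the real IPs) ::

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SimpleClient {
    private Cluster cluster;
    private Session session;

    public void connect(String... nodes) {
        // Contact points are added in the order CAS11, CAS12, CAS21, CAS22,
        // e.g. connect("10.0.1.1", "10.0.1.2", "10.0.2.1", "10.0.2.2").
        cluster = Cluster.builder()
                .addContactPoints(nodes)
                .build();
        // This is the call that throws NoHostAvailableException below.
        session = cluster.connect();
    }

    public void close() {
        cluster.close();
    }
}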


Now, the following happens ::

a)
If CAS11 goes down, data is persisted fine (presumably first in CAS12, and
later replicated to CAS21 and CAS22).

b)
If CAS11 and CAS12 go down, data is NOT persisted.
Instead the following exceptions are observed in the Java-Driver ::

##################################################################################
Exception in thread "main"
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s)
tried for query failed (no host was tried)
    at
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
    at
com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:258)
    at com.datastax.driver.core.Cluster.connect(Cluster.java:267)
    at com.example.cassandra.SimpleClient.connect(SimpleClient.java:43)
    at
com.example.cassandra.SimpleClientTest.setUp(SimpleClientTest.java:50)
    at com.example.cassandra.SimpleClientTest.main(SimpleClientTest.java:86)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException:
All host(s) tried for query failed (no host was tried)
    at
com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
    at
com.datastax.driver.core.SessionManager.execute(SessionManager.java:446)
    at
com.datastax.driver.core.SessionManager.executeQuery(SessionManager.java:482)
    at
com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:88)
    at
com.datastax.driver.core.AbstractSession.executeAsync(AbstractSession.java:60)
    at com.datastax.driver.core.Cluster.connect(Cluster.java:260)
    ... 3 more
###################################################################################


I have already tried ::

1)
Increasing driver-read-timeout from 12 seconds to 30 seconds.

2)
Increasing driver-connect-timeout from 5 seconds to 30 seconds.

3)
I have also confirmed that each of the 4 nodes is reachable via telnet on
ports 9042 and 9160.


It definitely seems to be some driver issue, since
data-persistence/replication works perfectly (with any permutation) when
data-persistence is done via "cqlsh".


Kindly provide some pointers.
Ultimately, it is the Java-driver that will be used in production, so it is
imperative that data-persistence/replication keeps working when any
permutation of node(s) goes down.


Thanks and Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Thanks Steve and Michael.

Simply uncommenting "initial_token" did the trick !!!

Up to now, I had been evaluating replication for the case where everything is
a clean install.
Will now try my hand at integrating/starting replication with
pre-existing data.


Once again, thanks a ton for all the help guys !!!


Thanks and Regards,
Ajay

On Sat, Oct 24, 2015 at 2:06 AM, Steve Robenalt <sr...@highwire.org>
wrote:

> Hi Ajay,
>
> Please take a look at the cassandra.yaml configuration reference regarding
> initial_token and num_tokens:
>
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__initial_token
>
> This is basically what Michael was referring to in his earlier message.
> Setting an initial token overrode your num_tokens setting on initial
> startup, but after initial startup, the initial token setting is ignored,
> so num_tokens comes into play, attempting to start up with 256 vnodes.
> That's where your error comes from.
>
> It's likely that all of your nodes started up like this since you have the
> same config on all of them (hopefully, you at least changed initial_token
> for each node).
>
> After reviewing the doc on the two sections above, you'll need to decide
> which path to take to recover. You can likely bring the downed node up by
> setting num_tokens to 1 (which you'd need to do on all nodes), in which
> case you're not really running vnodes. Alternately, you can migrate the
> cluster to vnodes:
>
>
> http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configVnodesProduction_t.html
>
> BTW, I recommend carefully reviewing the cassandra.yaml configuration
> reference for ANY change you make from the default. As you've experienced
> here, not all settings are intended to work together.
>
> HTH,
> Steve
>
>
>
> On Fri, Oct 23, 2015 at 12:07 PM, Ajay Garg <aj...@gmail.com>
> wrote:
>
>> Any ideas, please?
>> To repeat, we are using the exact same cassandra-version on all 4 nodes
>> (2.1.10).
>>
>> On Fri, Oct 23, 2015 at 9:43 AM, Ajay Garg <aj...@gmail.com>
>> wrote:
>>
>>> Hi Michael.
>>>
>>> Please find below the contents of cassandra.yaml for CAS11 (the files on
>>> the rest of the three nodes are also exactly the same, except the
>>> "initial_token" and "listen_address" fields) ::
>>>
>>> CAS11 ::
>>>
>>>
>>>
>>> What changes need to be made, so that whenever a downed server comes
>>> back up, the missing data comes back over to it?
>>>
>>> Thanks and Regards,
>>> Ajay
>>>
>>>
>>>
>>> On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler <mi...@pbandjelly.org>
>>> wrote:
>>>
>>>> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>>>>
>>>>> However, CAS11 refuses to come up now.
>>>>> Following is the error in /var/log/cassandra/system.log ::
>>>>>
>>>>>
>>>>> ################################################################
>>>>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>>>>> configuration error
>>>>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>>>>> the number of tokens from 1 to 256
>>>>>
>>>>
>>>> Check your cassandra.yaml - this node has vnodes enabled in the
>>>> configuration when it did not, previously. Check all nodes. Something
>>>> changed. Mixed vnode/non-vnode clusters is bad juju.
>>>>
>>>> --
>>>> Kind regards,
>>>> Michael
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ajay
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Steve Robenalt
> Software Architect
> srobenalt@highwire.org <bz...@highwire.org>
> (office/cell): 916-505-1785
>
> HighWire Press, Inc.
> 425 Broadway St, Redwood City, CA 94063
> www.highwire.org
>
> Technology for Scholarly Communication
>



-- 
Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Steve Robenalt <sr...@highwire.org>.
Hi Ajay,

Please take a look at the cassandra.yaml configuration reference regarding
initial_token and num_tokens:

http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__initial_token

This is basically what Michael was referring to in his earlier message.
Setting an initial token overrode your num_tokens setting on initial
startup, but after initial startup, the initial token setting is ignored,
so num_tokens comes into play, attempting to start up with 256 vnodes.
That's where your error comes from.

It's likely that all of your nodes started up like this since you have the
same config on all of them (hopefully, you at least changed initial_token
for each node).

After reviewing the doc on the two sections above, you'll need to decide
which path to take to recover. You can likely bring the downed node up by
setting num_tokens to 1 (which you'd need to do on all nodes), in which
case you're not really running vnodes. Alternately, you can migrate the
cluster to vnodes:

http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configVnodesProduction_t.html
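
To make the two consistent options concrete, here is a sketch of the relevant
cassandra.yaml lines (the token value is just the one from your posted config;
pick one option per node and keep it the same across restarts):

####################################
# Option A: single-token node (no vnodes); an explicit initial_token goes
# together with num_tokens: 1
num_tokens: 1
initial_token: -9223372036854775808

# Option B: vnodes; let Cassandra assign tokens and leave initial_token unset
num_tokens: 256
# initial_token:
####################################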

BTW, I recommend carefully reviewing the cassandra.yaml configuration
reference for ANY change you make from the default. As you've experienced
here, not all settings are intended to work together.

HTH,
Steve



On Fri, Oct 23, 2015 at 12:07 PM, Ajay Garg <aj...@gmail.com> wrote:

> Any ideas, please?
> To repeat, we are using the exact same cassandra-version on all 4 nodes
> (2.1.10).
>
> On Fri, Oct 23, 2015 at 9:43 AM, Ajay Garg <aj...@gmail.com> wrote:
>
>> Hi Michael.
>>
>> Please find below the contents of cassandra.yaml for CAS11 (the files on
>> the rest of the three nodes are also exactly the same, except the
>> "initial_token" and "listen_address" fields) ::
>>
>> CAS11 ::
>>
>> ####################################
>> cluster_name: 'InstaMsg Cluster'
>> num_tokens: 256
>> initial_token: -9223372036854775808
>> hinted_handoff_enabled: true
>> max_hint_window_in_ms: 10800000 # 3 hours
>> hinted_handoff_throttle_in_kb: 1024
>> max_hints_delivery_threads: 2
>> batchlog_replay_throttle_in_kb: 1024
>> authenticator: AllowAllAuthenticator
>> authorizer: AllowAllAuthorizer
>> permissions_validity_in_ms: 2000
>> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>> data_file_directories:
>>     - /var/lib/cassandra/data
>>
>> commitlog_directory: /var/lib/cassandra/commitlog
>>
>> disk_failure_policy: stop
>> commit_failure_policy: stop
>> key_cache_size_in_mb:
>> key_cache_save_period: 14400
>> row_cache_size_in_mb: 0
>> row_cache_save_period: 0
>> counter_cache_size_in_mb:
>> counter_cache_save_period: 7200
>> saved_caches_directory: /var/lib/cassandra/saved_caches
>> commitlog_sync: periodic
>> commitlog_sync_period_in_ms: 10000
>> commitlog_segment_size_in_mb: 32
>> seed_provider:
>>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>>       parameters:
>>           - seeds: "104.239.200.33,119.9.92.77"
>>
>> concurrent_reads: 32
>> concurrent_writes: 32
>> concurrent_counter_writes: 32
>>
>> memtable_allocation_type: heap_buffers
>>
>> index_summary_capacity_in_mb:
>> index_summary_resize_interval_in_minutes: 60
>> trickle_fsync: false
>> trickle_fsync_interval_in_kb: 10240
>> storage_port: 7000
>> ssl_storage_port: 7001
>> listen_address: 104.239.200.33
>> start_native_transport: true
>> native_transport_port: 9042
>> start_rpc: true
>> rpc_address: localhost
>> rpc_port: 9160
>> rpc_keepalive: true
>>
>> rpc_server_type: sync
>> thrift_framed_transport_size_in_mb: 15
>> incremental_backups: false
>> snapshot_before_compaction: false
>> auto_snapshot: true
>>
>> tombstone_warn_threshold: 1000
>> tombstone_failure_threshold: 100000
>>
>> column_index_size_in_kb: 64
>> batch_size_warn_threshold_in_kb: 5
>>
>> compaction_throughput_mb_per_sec: 16
>> compaction_large_partition_warning_threshold_mb: 100
>>
>> sstable_preemptive_open_interval_in_mb: 50
>>
>> read_request_timeout_in_ms: 5000
>> range_request_timeout_in_ms: 10000
>>
>> write_request_timeout_in_ms: 2000
>> counter_write_request_timeout_in_ms: 5000
>> cas_contention_timeout_in_ms: 1000
>> truncate_request_timeout_in_ms: 60000
>> request_timeout_in_ms: 10000
>> cross_node_timeout: false
>> endpoint_snitch: PropertyFileSnitch
>>
>> dynamic_snitch_update_interval_in_ms: 100
>> dynamic_snitch_reset_interval_in_ms: 600000
>> dynamic_snitch_badness_threshold: 0.1
>>
>> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
>>
>> server_encryption_options:
>>     internode_encryption: none
>>     keystore: conf/.keystore
>>     keystore_password: cassandra
>>     truststore: conf/.truststore
>>     truststore_password: cassandra
>>
>> client_encryption_options:
>>     enabled: false
>>     keystore: conf/.keystore
>>     keystore_password: cassandra
>>
>> internode_compression: all
>> inter_dc_tcp_nodelay: false
>> ####################################
>>
>>
>> What changes need to be made, so that whenever a downed server comes back
>> up, the missing data comes back over to it?
>>
>> Thanks and Regards,
>> Ajay
>>
>>
>>
>> On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler <mi...@pbandjelly.org>
>> wrote:
>>
>>> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>>>
>>>> However, CAS11 refuses to come up now.
>>>> Following is the error in /var/log/cassandra/system.log ::
>>>>
>>>>
>>>> ################################################################
>>>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>>>> configuration error
>>>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>>>> the number of tokens from 1 to 256
>>>>
>>>
>>> Check your cassandra.yaml - this node has vnodes enabled in the
>>> configuration when it did not, previously. Check all nodes. Something
>>> changed. Mixed vnode/non-vnode clusters is bad juju.
>>>
>>> --
>>> Kind regards,
>>> Michael
>>>
>>
>>
>>
>> --
>> Regards,
>> Ajay
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Steve Robenalt
Software Architect
srobenalt@highwire.org <bz...@highwire.org>
(office/cell): 916-505-1785

HighWire Press, Inc.
425 Broadway St, Redwood City, CA 94063
www.highwire.org

Technology for Scholarly Communication

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Any ideas, please?
To repeat, we are using the exact same cassandra-version on all 4 nodes
(2.1.10).

On Fri, Oct 23, 2015 at 9:43 AM, Ajay Garg <aj...@gmail.com> wrote:

> Hi Michael.
>
> Please find below the contents of cassandra.yaml for CAS11 (the files on
> the rest of the three nodes are also exactly the same, except the
> "initial_token" and "listen_address" fields) ::
>
> CAS11 ::
>
> ####################################
> cluster_name: 'InstaMsg Cluster'
> num_tokens: 256
> initial_token: -9223372036854775808
> hinted_handoff_enabled: true
> max_hint_window_in_ms: 10800000 # 3 hours
> hinted_handoff_throttle_in_kb: 1024
> max_hints_delivery_threads: 2
> batchlog_replay_throttle_in_kb: 1024
> authenticator: AllowAllAuthenticator
> authorizer: AllowAllAuthorizer
> permissions_validity_in_ms: 2000
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> data_file_directories:
>     - /var/lib/cassandra/data
>
> commitlog_directory: /var/lib/cassandra/commitlog
>
> disk_failure_policy: stop
> commit_failure_policy: stop
> key_cache_size_in_mb:
> key_cache_save_period: 14400
> row_cache_size_in_mb: 0
> row_cache_save_period: 0
> counter_cache_size_in_mb:
> counter_cache_save_period: 7200
> saved_caches_directory: /var/lib/cassandra/saved_caches
> commitlog_sync: periodic
> commitlog_sync_period_in_ms: 10000
> commitlog_segment_size_in_mb: 32
> seed_provider:
>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>       parameters:
>           - seeds: "104.239.200.33,119.9.92.77"
>
> concurrent_reads: 32
> concurrent_writes: 32
> concurrent_counter_writes: 32
>
> memtable_allocation_type: heap_buffers
>
> index_summary_capacity_in_mb:
> index_summary_resize_interval_in_minutes: 60
> trickle_fsync: false
> trickle_fsync_interval_in_kb: 10240
> storage_port: 7000
> ssl_storage_port: 7001
> listen_address: 104.239.200.33
> start_native_transport: true
> native_transport_port: 9042
> start_rpc: true
> rpc_address: localhost
> rpc_port: 9160
> rpc_keepalive: true
>
> rpc_server_type: sync
> thrift_framed_transport_size_in_mb: 15
> incremental_backups: false
> snapshot_before_compaction: false
> auto_snapshot: true
>
> tombstone_warn_threshold: 1000
> tombstone_failure_threshold: 100000
>
> column_index_size_in_kb: 64
> batch_size_warn_threshold_in_kb: 5
>
> compaction_throughput_mb_per_sec: 16
> compaction_large_partition_warning_threshold_mb: 100
>
> sstable_preemptive_open_interval_in_mb: 50
>
> read_request_timeout_in_ms: 5000
> range_request_timeout_in_ms: 10000
>
> write_request_timeout_in_ms: 2000
> counter_write_request_timeout_in_ms: 5000
> cas_contention_timeout_in_ms: 1000
> truncate_request_timeout_in_ms: 60000
> request_timeout_in_ms: 10000
> cross_node_timeout: false
> endpoint_snitch: PropertyFileSnitch
>
> dynamic_snitch_update_interval_in_ms: 100
> dynamic_snitch_reset_interval_in_ms: 600000
> dynamic_snitch_badness_threshold: 0.1
>
> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
>
> server_encryption_options:
>     internode_encryption: none
>     keystore: conf/.keystore
>     keystore_password: cassandra
>     truststore: conf/.truststore
>     truststore_password: cassandra
>
> client_encryption_options:
>     enabled: false
>     keystore: conf/.keystore
>     keystore_password: cassandra
>
> internode_compression: all
> inter_dc_tcp_nodelay: false
> ####################################
>
>
> What changes need to be made, so that whenever a downed server comes back
> up, the missing data comes back over to it?
>
> Thanks and Regards,
> Ajay
>
>
>
> On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler <mi...@pbandjelly.org>
> wrote:
>
>> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>>
>>> However, CAS11 refuses to come up now.
>>> Following is the error in /var/log/cassandra/system.log ::
>>>
>>>
>>> ################################################################
>>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>>> configuration error
>>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>>> the number of tokens from 1 to 256
>>>
>>
>> Check your cassandra.yaml - this node has vnodes enabled in the
>> configuration when it did not, previously. Check all nodes. Something
>> changed. Mixed vnode/non-vnode clusters is bad juju.
>>
>> --
>> Kind regards,
>> Michael
>>
>
>
>
> --
> Regards,
> Ajay
>



-- 
Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Hi Michael.

Please find below the contents of cassandra.yaml for CAS11 (the files on
the rest of the three nodes are also exactly the same, except the
"initial_token" and "listen_address" fields) ::

CAS11 ::

####################################
cluster_name: 'InstaMsg Cluster'
num_tokens: 256
initial_token: -9223372036854775808
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
permissions_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
    - /var/lib/cassandra/data

commitlog_directory: /var/lib/cassandra/commitlog

disk_failure_policy: stop
commit_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "104.239.200.33,119.9.92.77"

concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32

memtable_allocation_type: heap_buffers

index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: 104.239.200.33
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_address: localhost
rpc_port: 9160
rpc_keepalive: true

rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000

column_index_size_in_kb: 64
batch_size_warn_threshold_in_kb: 5

compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100

sstable_preemptive_open_interval_in_mb: 50

read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000

write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: PropertyFileSnitch

dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1

request_scheduler: org.apache.cassandra.scheduler.NoScheduler

server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra

client_encryption_options:
    enabled: false
    keystore: conf/.keystore
    keystore_password: cassandra

internode_compression: all
inter_dc_tcp_nodelay: false
####################################


What changes need to be made, so that whenever a downed server comes back
up, the missing data comes back over to it?

Thanks and Regards,
Ajay



On Fri, Oct 23, 2015 at 9:05 AM, Michael Shuler <mi...@pbandjelly.org>
wrote:

> On 10/22/2015 10:14 PM, Ajay Garg wrote:
>
>> However, CAS11 refuses to come up now.
>> Following is the error in /var/log/cassandra/system.log ::
>>
>>
>> ################################################################
>> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
>> configuration error
>> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
>> the number of tokens from 1 to 256
>>
>
> Check your cassandra.yaml - this node has vnodes enabled in the
> configuration when it did not, previously. Check all nodes. Something
> changed. Mixed vnode/non-vnode clusters is bad juju.
>
> --
> Kind regards,
> Michael
>



-- 
Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Michael Shuler <mi...@pbandjelly.org>.
On 10/22/2015 10:14 PM, Ajay Garg wrote:
> However, CAS11 refuses to come up now.
> Following is the error in /var/log/cassandra/system.log ::
>
>
> ################################################################
> ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
> configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Cannot change
> the number of tokens from 1 to 256

Check your cassandra.yaml - this node has vnodes enabled in the
configuration when it did not previously. Check all nodes. Something
changed. Mixed vnode/non-vnode clusters are bad juju.

-- 
Kind regards,
Michael

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Hi Carlos.


I set up the following configuration ::

CAS11 and CAS12 in DC1
CAS21 and CAS22 in DC2

a)
Brought all 4 up; replication worked perfectly !!!

b)
Thereafter, downed CAS11 via "sudo service cassandra stop".
Replication continued to work fine on CAS12, CAS21 and CAS22.

c)
Thereafter, upped CAS11 via "sudo service cassandra start".


However, CAS11 refuses to come up now.
Following is the error in /var/log/cassandra/system.log ::


################################################################
ERROR [main] 2015-10-23 03:07:34,242 CassandraDaemon.java:391 - Fatal
configuration error
org.apache.cassandra.exceptions.ConfigurationException: Cannot change the
number of tokens from 1 to 256
        at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:966)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:734)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387)
[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562)
[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651)
[apache-cassandra-2.1.10.jar:2.1.10]
INFO  [StorageServiceShutdownHook] 2015-10-23 03:07:34,271
Gossiper.java:1442 - Announcing shutdown
INFO  [GossipStage:1] 2015-10-23 03:07:34,282 OutboundTcpConnection.java:97
- OutboundTcpConnection using coalescing strategy DISABLED
ERROR [StorageServiceShutdownHook] 2015-10-23 03:07:34,305
CassandraDaemon.java:227 - Exception in thread
Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException: null
        at
org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1624)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1632)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1686)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.service.StorageService.onChange(StorageService.java:1510)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.gms.Gossiper.addLocalApplicationStateInternal(Gossiper.java:1412)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.gms.Gossiper.addLocalApplicationStates(Gossiper.java:1427)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1417)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at org.apache.cassandra.gms.Gossiper.stop(Gossiper.java:1443)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:678)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
~[apache-cassandra-2.1.10.jar:2.1.10]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
################################################################


Ideas?


Thanks and Regards,
Ajay



On Mon, Oct 12, 2015 at 3:46 PM, Carlos Alonso <in...@mrcalonso.com> wrote:

> Yes Ajay, in your particular scenario, after all hints are delivered, both
> CAS11 and CAS12 will have the exact same data.
>
> Cheers!
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 11 October 2015 at 05:21, Ajay Garg <aj...@gmail.com> wrote:
>
>> Thanks a ton Anuja for the help !!!
>>
>> On Fri, Oct 9, 2015 at 12:38 PM, anuja jain <an...@gmail.com> wrote:
>> > Hi Ajay,
>> >
>> >
>> > On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg <aj...@gmail.com>
>> wrote:
>> >>
>> > In this case, it will be the responsibility of APP1 to start connection
>> to
>> > CAS12. On the other hand if your APP1 is connecting to cassandra using
>> Java
>> > driver, you can add multiple contact points(CAS11 and CAS12 here) so
>> that if
>> > CAS11 is down it will directly connect to CAS12.
>>
>> Great .. Java-driver it will be :)
>>
>>
>>
>>
>> >>
>> > In such a case, CAS12 will store hints for the data to be stored on
>> CAS11
>> > (the tokens of which lies within the range of tokens CAS11 holds)  and
>> > whenever CAS11 is up again, the hints will be transferred to it and the
>> data
>> > will be distributed evenly.
>> >
>>
>> Evenly?
>>
>> Should not the data be """EXACTLY""" equal after CAS11 comes back up
>> and the sync/transfer/whatever happens?
>> After all, before CAS11 went down, CAS11 and CAS12 were replicating all
>> data.
>>
>>
>> Once again, thanks for your help.
>> I will be even more grateful if you would help me clear the lingering
>> doubt to second point.
>>
>>
>> Thanks and Regards,
>> Ajay
>>
>
>


-- 
Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Carlos Alonso <in...@mrcalonso.com>.
Yes Ajay, in your particular scenario, after all hints are delivered, both
CAS11 and CAS12 will have the exact same data.

Cheers!

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 11 October 2015 at 05:21, Ajay Garg <aj...@gmail.com> wrote:

> Thanks a ton Anuja for the help !!!
>
> On Fri, Oct 9, 2015 at 12:38 PM, anuja jain <an...@gmail.com> wrote:
> > Hi Ajay,
> >
> >
> > On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg <aj...@gmail.com>
> wrote:
> >>
> > In this case, it will be the responsibility of APP1 to start connection
> to
> > CAS12. On the other hand if your APP1 is connecting to cassandra using
> Java
> > driver, you can add multiple contact points(CAS11 and CAS12 here) so
> that if
> > CAS11 is down it will directly connect to CAS12.
>
> Great .. Java-driver it will be :)
>
>
>
>
> >>
> > In such a case, CAS12 will store hints for the data to be stored on CAS11
> > (the tokens of which lies within the range of tokens CAS11 holds)  and
> > whenever CAS11 is up again, the hints will be transferred to it and the
> data
> > will be distributed evenly.
> >
>
> Evenly?
>
> Should not the data be """EXACTLY""" equal after CAS11 comes back up
> and the sync/transfer/whatever happens?
> After all, before CAS11 went down, CAS11 and CAS12 were replicating all
> data.
>
>
> Once again, thanks for your help.
> I will be even more grateful if you would help me clear the lingering
> doubt to second point.
>
>
> Thanks and Regards,
> Ajay
>

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Thanks a ton Anuja for the help !!!

On Fri, Oct 9, 2015 at 12:38 PM, anuja jain <an...@gmail.com> wrote:
> Hi Ajay,
>
>
> On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg <aj...@gmail.com> wrote:
>>
> In this case, it will be the responsibility of APP1 to start connection to
> CAS12. On the other hand if your APP1 is connecting to cassandra using Java
> driver, you can add multiple contact points(CAS11 and CAS12 here) so that if
> CAS11 is down it will directly connect to CAS12.

Great .. Java-driver it will be :)




>>
> In such a case, CAS12 will store hints for the data to be stored on CAS11
> (the tokens of which lies within the range of tokens CAS11 holds)  and
> whenever CAS11 is up again, the hints will be transferred to it and the data
> will be distributed evenly.
>

Evenly?

Should not the data be """EXACTLY""" equal after CAS11 comes back up
and the sync/transfer/whatever happens?
After all, before CAS11 went down, CAS11 and CAS12 were replicating all data.


Once again, thanks for your help.
I will be even more grateful if you would help me clear the lingering
doubt to second point.


Thanks and Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by anuja jain <an...@gmail.com>.
Hi Ajay,


On Fri, Oct 9, 2015 at 9:00 AM, Ajay Garg <aj...@gmail.com> wrote:

> On Thu, Oct 8, 2015 at 9:47 AM, Ajay Garg <aj...@gmail.com> wrote:
> > Thanks Eric for the reply.
> >
> >
> > On Thu, Oct 8, 2015 at 1:44 AM, Eric Stevens <mi...@gmail.com> wrote:
> >> If you're at 1 node (N=1) and RF=1 now, and you want to go N=3 RF=3, you
> >> ought to be able to increase RF to 3 before bootstrapping your new
> nodes,
> >> with no downtime and no loss of data (even temporary).  Effective RF is
> >> min-bounded by N, so temporarily having RF > N ought to behave as RF =
> N.
> >>
> >> If you're starting at N > RF and you want to increase RF, things get
> >> hairier
> >> if you can't afford temporary consistency issues.
> >>
> >
> > We are ok with temporary consistency issues.
> >
> > Also, I was going through the following articles
> >
> https://10kloc.wordpress.com/2012/12/27/cassandra-chapter-5-data-replication-strategies/
> >
> > and following doubts came up in my mind ::
> >
> >
> > a)
> > Let's say at site-1, Application-Server (APP1) uses the two
> > Cassandra-instances (CAS11 and CAS12), and APP1 generally uses CAS11 for
> all
> > its needs (of course, whatever happens on CAS11, the same is replicated
> to
> > CAS12 at Cassandra-level).
> >
> > Now, if CAS11 goes down, will it be the responsibility of APP1 to
> "detect"
> > this and pick up CAS12 for its needs?
> > Or some automatic Cassandra-magic will happen?
> >
>
In this case, it will be the responsibility of APP1 to start a connection to
CAS12. On the other hand, if your APP1 is connecting to cassandra using the
Java driver, you can add multiple contact points (CAS11 and CAS12 here) so
that if CAS11 is down it will directly connect to CAS12.

> b)
> > In the same above scenario, let's say before CAS11 goes down, the amount
> of
> > data in both CAS11 and CAS12 was "x".
> >
> > After CAS11 goes down, the data is being put in CAS12 only.
> > After some time, CAS11 comes back up.
> >
> > Now, data in CAS11 is still "x", while data in CAS12 is "y" (obviously,
> "y"
> >> "x").
> >
> > Now, will the additional ("y" - "x") data be automatically
> > put/replicated/whatever back in CAS11 through Cassandra?
> > Or it has to be done manually?
> >
>
In such a case, CAS12 will store hints for the data to be stored on CAS11
(the tokens of which lie within the range of tokens CAS11 holds), and
whenever CAS11 is up again, the hints will be transferred to it and the
data will be distributed evenly.


> >
> > If there are easy recommended solutions to above, I am beginning to think
> > that a 2*2 (2 nodes each at 2 data-centres) will be the ideal setup
> > (allowing failures of entire site, or a few nodes on the same site).
> >
> > I am sorry for asking such newbie questions, and I will be grateful if
> these
> > silly questions could be answered by the experts :)
> >
> >
> > Thanks and Regards,
> > Ajay
>
>
>
> --
> Regards,
> Ajay
>

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
On Thu, Oct 8, 2015 at 9:47 AM, Ajay Garg <aj...@gmail.com> wrote:
> Thanks Eric for the reply.
>
>
> On Thu, Oct 8, 2015 at 1:44 AM, Eric Stevens <mi...@gmail.com> wrote:
>> If you're at 1 node (N=1) and RF=1 now, and you want to go N=3 RF=3, you
>> ought to be able to increase RF to 3 before bootstrapping your new nodes,
>> with no downtime and no loss of data (even temporary).  Effective RF is
>> min-bounded by N, so temporarily having RF > N ought to behave as RF = N.
>>
>> If you're starting at N > RF and you want to increase RF, things get
>> hairier
>> if you can't afford temporary consistency issues.
>>
>
> We are ok with temporary consistency issues.
>
> Also, I was going through the following articles
> https://10kloc.wordpress.com/2012/12/27/cassandra-chapter-5-data-replication-strategies/
>
> and following doubts came up in my mind ::
>
>
> a)
> Let's say at site-1, Application-Server (APP1) uses the two
> Cassandra-instances (CAS11 and CAS12), and APP1 generally uses CAS11 for all
> its needs (of course, whatever happens on CAS11, the same is replicated to
> CAS12 at Cassandra-level).
>
> Now, if CAS11 goes down, will it be the responsibility of APP1 to "detect"
> this and pick up CAS12 for its needs?
> Or some automatic Cassandra-magic will happen?
>
>
> b)
> In the same above scenario, let's say before CAS11 goes down, the amount of
> data in both CAS11 and CAS12 was "x".
>
> After CAS11 goes down, the data is being put in CAS12 only.
> After some time, CAS11 comes back up.
>
> Now, data in CAS11 is still "x", while data in CAS12 is "y" (obviously, "y"
>> "x").
>
> Now, will the additional ("y" - "x") data be automatically
> put/replicated/whatever back in CAS11 through Cassandra?
> Or it has to be done manually?
>

Any pointers, please.... ???

>
> If there are easy recommended solutions to above, I am beginning to think
> that a 2*2 (2 nodes each at 2 data-centres) will be the ideal setup
> (allowing failures of entire site, or a few nodes on the same site).
>
> I am sorry for asking such newbie questions, and I will be grateful if these
> silly questions could be answered by the experts :)
>
>
> Thanks and Regards,
> Ajay



-- 
Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Thanks Eric for the reply.


On Thu, Oct 8, 2015 at 1:44 AM, Eric Stevens <mi...@gmail.com> wrote:
> If you're at 1 node (N=1) and RF=1 now, and you want to go N=3 RF=3, you
> ought to be able to increase RF to 3 before bootstrapping your new nodes,
> with no downtime and no loss of data (even temporary).  Effective RF is
> min-bounded by N, so temporarily having RF > N ought to behave as RF = N.
>
> If you're starting at N > RF and you want to increase RF, things get
> hairier
> if you can't afford temporary consistency issues.
>

We are ok with temporary consistency issues.

Also, I was going through the following article
https://10kloc.wordpress.com/2012/12/27/cassandra-chapter-5-data-replication-strategies/

and the following doubts came up in my mind ::


a)
Let's say at site-1, Application-Server (APP1) uses the two
Cassandra-instances (CAS11 and CAS12), and APP1 generally uses CAS11 for
all its needs (of course, whatever happens on CAS11, the same is replicated
to CAS12 at Cassandra-level).

Now, if CAS11 goes down, will it be the responsibility of APP1 to "detect"
this and pick up CAS12 for its needs?
Or some automatic Cassandra-magic will happen?


b)
In the same above scenario, let's say before CAS11 goes down, the amount of
data in both CAS11 and CAS12 was "x".

After CAS11 goes down, the data is being put in CAS12 only.
After some time, CAS11 comes back up.

Now, data in CAS11 is still "x", while data in CAS12 is "y" (obviously, "y" > "x").

Now, will the additional ("y" - "x") data be automatically
put/replicated/whatever back in CAS11 through Cassandra?
Or does it have to be done manually?


If there are easy recommended solutions to above, I am beginning to think
that a 2*2 (2 nodes each at 2 data-centres) will be the ideal setup
(allowing failures of entire site, or a few nodes on the same site).
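
Just to make my thinking concrete, I imagine the keyspace change for such a
2*2 setup would be roughly the sketch below (done via the Java driver; the
data-centre names 'DC1'/'DC2' and the hostname are placeholders, and the DC
names would have to match whatever the snitch reports):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class TwoDcReplicationSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("cas11.example.com")   // placeholder host
                .build();
        Session session = cluster.connect();
        // Two replicas in each of the two data centres.
        session.execute("ALTER KEYSPACE our_db WITH replication = "
                + "{'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2}");
        // Pre-existing data still has to be streamed to the new replicas
        // afterwards (e.g. by running 'nodetool repair our_db' on each node).
        cluster.close();
    }
}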

I am sorry for asking such newbie questions, and I will be grateful if
these silly questions could be answered by the experts :)


Thanks and Regards,
Ajay

Re: Is replication possible with already existing data?

Posted by Eric Stevens <mi...@gmail.com>.
If you're at 1 node (N=1) and RF=1 now, and you want to go N=3 RF=3, you
ought to be able to increase RF to 3 before bootstrapping your new nodes,
with no downtime and no loss of data (even temporary).  Effective RF is
min-bounded by N, so temporarily having RF > N ought to behave as RF = N.

If you're starting at N > RF and you want to increase RF, things get
hairier if you can't afford temporary consistency issues.
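
As a rough sketch of that order of operations (using the keyspace name from
earlier in the thread; the contact point is a placeholder for the existing
single node):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class RaiseReplicationFactorSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")   // the existing single node
                .build();
        Session session = cluster.connect();
        // Step 1: raise RF while the cluster is still a single node. Effective
        // RF is capped by the node count, so this behaves like RF=1 for now.
        session.execute("ALTER KEYSPACE our_db WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': '3'}");
        // Step 2: bootstrap the two new nodes into the cluster.
        // Step 3: run 'nodetool repair our_db' on each node so the
        //         pre-existing data is streamed to the new replicas.
        cluster.close();
    }
}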

On Wed, Oct 7, 2015 at 10:58 AM Ajay Garg <aj...@gmail.com> wrote:

> Hi Sean.
>
> Thanks for the reply.
>
> On Wed, Oct 7, 2015 at 10:13 PM,  <SE...@homedepot.com> wrote:
> > How many nodes are you planning to add?
>
> I guess 2 more.
>
> > How many replicas do you want?
>
> 1 (original) + 2 (replicas).
> That makes it a total of 3 copies of every row of data.
>
>
>
> > In general, there shouldn't be a problem adding nodes and then altering
> the keyspace to change replication.
>
> Great !!
> I guess
> http://docs.datastax.com/en/cql/3.0/cql/cql_reference/alter_keyspace_r.html
> will do the trick for changing schema-replication-details !!
>
>
> > You will want to run repairs to stream the data to the new replicas.
>
> Hmm.. we'll be really grateful if you could point us to a suitable
> link for the above step.
> If there is a nice-utility, we would be perfectly set up to start our
> fun-exercise, consisting of following steps ::
>
> a)
> (As advised by you) Changing the schema, to allow a replication_factor of
> 3.
>
> b)
> (As advised by you) Duplicating the already-existing-data on the other 2
> nodes.
>
> c)
> Thereafter, let Cassandra create a total of 3 copies for every row of
> new-incoming-data.
>
>
> Once again, thanks a ton for the help !!
>
>
> Thanks and Regards,
> Ajay
>
>
> > You shouldn't need downtime or data migration -- this is the beauty of
> > Cassandra.
>
>
>
>
> >
> >
> > Sean Durity – Lead Cassandra Admin
> >
>
>
>
>
> --
> Regards,
> Ajay
>

Re: Is replication possible with already existing data?

Posted by Ajay Garg <aj...@gmail.com>.
Hi Sean.

Thanks for the reply.

On Wed, Oct 7, 2015 at 10:13 PM,  <SE...@homedepot.com> wrote:
> How many nodes are you planning to add?

I guess 2 more.

> How many replicas do you want?

1 (original) + 2 (replicas).
That makes it a total of 3 copies of every row of data.



> In general, there shouldn't be a problem adding nodes and then altering the keyspace to change replication.

Great !!
I guess http://docs.datastax.com/en/cql/3.0/cql/cql_reference/alter_keyspace_r.html
will do the trick for changing schema-replication-details !!


> You will want to run repairs to stream the data to the new replicas.

Hmm.. we'll be really grateful if you could point us to a suitable
link for the above step.
If there is a nice-utility, we would be perfectly set up to start our
fun-exercise, consisting of the following steps ::

a)
(As advised by you) Changing the schema, to allow a replication_factor of 3.

b)
(As advised by you) Duplicating the already-existing-data on the other 2 nodes.

c)
Thereafter, let Cassandra create a total of 3 copies for every row of
new-incoming-data.


Once again, thanks a ton for the help !!


Thanks and Regards,
Ajay


> You shouldn't need downtime or data migration -- this is the beauty of
> Cassandra.




>
>
> Sean Durity – Lead Cassandra Admin
>




-- 
Regards,
Ajay

RE: Is replication possible with already existing data?

Posted by SE...@homedepot.com.
How many nodes are you planning to add? How many replicas do you want? In general, there shouldn't be a problem adding nodes and then altering the keyspace to change replication. You will want to run repairs to stream the data to the new replicas. You shouldn't need downtime or data migration -- this is the beauty of Cassandra.


Sean Durity – Lead Cassandra Admin

-----Original Message-----
From: Ajay Garg [mailto:ajaygargnsit@gmail.com]
Sent: Wednesday, October 07, 2015 11:56 AM
To: user@cassandra.apache.org
Subject: Is replication possible with already existing data?

Hi All.

We have a scenario, where till now we had been using a plain, simple single node, with the keyspace created using ::

CREATE KEYSPACE our_db WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '1'}  AND durable_writes = true;


We now plan to introduce replication (in the true sense) in our scheme of things, but cannot afford to lose any data.
We, however can take a bit of downtime, and do any data-migration if required (we have already done data-migration once in the past, when we moved our plain, simple single node from one physical machine to another).


So,

a)
Is it possible at all to introduce replication in our scenario?
If yes, what needs to be done to NOT LOSE our current existing data?

b)
Also, will "NetworkTopologyStrategy" work in our scenario (since NetworkTopologyStrategy seems to be more robust)?


Brief pointers to above will give huge confidence-boosts in our endeavours.


Thanks and Regards,
Ajay
