Posted to user@cassandra.apache.org by Luke Jolly <lu...@getadmiral.com> on 2016/05/23 19:31:18 UTC

Increasing replication factor and repair doesn't seem to work

I am running Cassandra 3.0.5 with 2 nodes in two DCs, gce-us-central1 and
gce-us-east1.  I increased the replication factor of gce-us-central1 from 1
to 2.  Then I ran 'nodetool repair -dc gce-us-central1'.  The "Owns" for
the node switched to 100% as it should, but the Load showed that it didn't
actually sync the data.  I then ran a full 'nodetool repair' and it still
didn't fix it.  This scares me, as I thought 'nodetool repair' was a way to
assure consistency and that all the nodes were synced, but it doesn't seem
to be.  Outside of that command, I have no idea how I would assure all the
data was synced, or how to get the data correctly synced without
decommissioning the node and re-adding it.
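
For reference, a minimal sketch of the steps described above (the keyspace
name my_keyspace is a placeholder and other DCs in the replication map are
omitted; note that on 2.2 and later a plain 'nodetool repair' is incremental,
so -full is needed for a full repair):

    $ cqlsh 10.128.0.3
    cqlsh> ALTER KEYSPACE my_keyspace WITH replication =
       ... {'class': 'NetworkTopologyStrategy', 'gce-us-central1': '2', 'gce-us-east1': '2'};

    # DC-scoped repair, run from a node in gce-us-central1
    $ nodetool repair -dc gce-us-central1 my_keyspace

    # full (non-incremental) repair of the keyspace
    $ nodetool repair -full my_keyspace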

Re: Increasing replication factor and repair doesn't seem to work

Posted by Luke Jolly <lu...@getadmiral.com>.
After thinking about it more, I have no idea how that worked at all.  I
must not have cleared out the working directory or something....
Regardless, I did something weird with my initial joining of the cluster
and then wasn't using repair -full.  Thank y'all very much for the info.

Re: Increasing replication factor and repair doesn't seem to work

Posted by Luke Jolly <lu...@getadmiral.com>.
So I figured out the main cause of the problem.  The seed node was itself.
That's what got it in a weird state.  The second part was that I didn't
know the default repair is incremental, as I was accidentally looking at the
wrong version's documentation.  After running a repair -full, the 3 other
nodes seem to be synced correctly, as they have identical loads.
Strangely, the problem 10.128.0.20 node now has 10 GB of load (the others
have 6 GB).  Since I now know I started it off in a very weird state, I'm
going to just decommission it and add it back in from scratch.  When I
added it, all working folders were cleared.

I feel Cassandra should throw an error if the seed node is set to itself
and fail to bootstrap / join?
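
A sketch of that decommission-and-rejoin cycle (the paths, service name and
example seed list are assumptions based on a typical package install; the key
point is that the seed list should name existing live nodes, never the node
being bootstrapped):

    # on 10.128.0.20: stream its ranges away and leave the ring
    $ nodetool decommission

    # stop Cassandra, then clear the old state
    $ sudo service cassandra stop
    $ sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches

    # in cassandra.yaml, point the seeds at other live nodes, e.g.:
    #   seed_provider:
    #       - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    #         parameters:
    #             - seeds: "10.128.0.3,10.142.0.14"

    # start it back up and watch it bootstrap
    $ sudo service cassandra start
    $ nodetool netstats | grep -i mode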

Re: Increasing replication factor and repair doesn't seem to work

Posted by Mike Yeap <wk...@gmail.com>.
Hi Luke, I've encountered a similar problem before. Could you please advise
on the following?

1) when you added 10.128.0.20, what were the seeds defined in cassandra.yaml?

2) when you added 10.128.0.20, were the data and cache directories in
10.128.0.20 empty?

   - /var/lib/cassandra/data
   - /var/lib/cassandra/saved_caches

3) if you run a compaction on 10.128.0.3, what is the size shown in the "Load"
column of "nodetool status <keyspace_name>"?

4) when you did the full repair, did you use "nodetool repair" or "nodetool
repair -full"? I'm asking this because incremental repair is the default
for Cassandra 2.2 and later.
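
For reference, one way to gather the answers to 1-4 above; a sketch only,
assuming default package paths and a placeholder keyspace name my_keyspace:

    # 1) seeds configured in cassandra.yaml
    $ grep -A4 seed_provider /etc/cassandra/cassandra.yaml

    # 2) size of the data and cache directories
    $ du -sh /var/lib/cassandra/data /var/lib/cassandra/saved_caches

    # 3) compact on 10.128.0.3, then re-check its Load
    $ nodetool compact my_keyspace
    $ nodetool status my_keyspace

    # 4) force a full, non-incremental repair
    $ nodetool repair -full my_keyspace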


Regards,
Mike Yeap

Re: Increasing replication factor and repair doesn't seem to work

Posted by Bryan Cheng <br...@blockcypher.com>.
Hi Luke,

I've never found nodetool status' load to be useful beyond a general
indicator.

You should expect some small skew, as this will depend on your current
compaction status, tombstones, etc. IIRC repair will not provide
consistency of intermediate states, nor will it remove tombstones; it only
guarantees consistency in the final state. This means that, in the case of
dropped hints or mutations, you will see differences in intermediate
states, and therefore in storage footprint, even in fully repaired nodes. This
includes intermediate UPDATE operations as well.

Your one node with sub 1GB sticks out like a sore thumb, though. Where did
you originate the nodetool repair from? Remember that repair will only
ensure consistency for ranges held by the node you're running it on. While
I am not sure if missing ranges are included in this, if you ran nodetool
repair only on a machine with partial ownership, you will need to complete
repairs across the ring before data will return to full consistency.

I would query some older data using consistency = ONE on the affected
machine to determine if you are actually missing data.  There are a few
outstanding bugs in the 2.1.x  and older release families that may result
in tombstone creation even without deletes, for example CASSANDRA-10547,
which impacts updates on collections in pre-2.1.13 Cassandra.
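
For example, something like the following against the affected node (a
sketch; my_keyspace.my_table and the key are placeholders for a partition you
know was written well before the node joined):

    $ cqlsh 10.128.0.20
    cqlsh> CONSISTENCY ONE;
    cqlsh> SELECT * FROM my_keyspace.my_table WHERE id = 'some-old-key';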

You can also try examining the output of nodetool ring, which will give you
a breakdown of tokens and their associations within your cluster.
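
For instance (the keyspace argument is optional, and with 256 vnodes per node
the output is long, so trim it):

    $ nodetool ring my_keyspace | head -20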

--Bryan

Re: Increasing replication factor and repair doesn't seem to work

Posted by kurt Greaves <ku...@instaclustr.com>.
Not necessarily; considering RF is 2, both nodes should have all
partitions. Luke, are you sure the repair is succeeding? You don't have
other keyspaces/duplicate data/extra data in your Cassandra data directory?
Also, you could try querying on the node with less data to confirm whether it
has the same dataset.
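
One way to run that comparison (a sketch; the table name is a placeholder,
and at ONE/LOCAL_ONE the coordinator will usually read from itself when it is
a replica):

    $ cqlsh 10.128.0.20
    cqlsh> CONSISTENCY LOCAL_ONE;
    cqlsh> SELECT count(*) FROM my_keyspace.my_table;

    $ cqlsh 10.128.0.3
    cqlsh> CONSISTENCY LOCAL_ONE;
    cqlsh> SELECT count(*) FROM my_keyspace.my_table;

If the counts differ significantly, the smaller node really is missing data.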

-- 
Kurt Greaves
kurt@instaclustr.com
www.instaclustr.com

Re: Increasing replication factor and repair doesn't seem to work

Posted by Bhuvan Rawal <bh...@gmail.com>.
For the other DC it can be acceptable, because partitions reside on one
node, so if you have a large partition it may skew things a bit.
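
If you want to check whether one very large partition is behind the skew, a
rough sketch (the table name is a placeholder; cfhistograms/cfstats are the
3.0 command names, newer releases also accept tablehistograms/tablestats):

    # partition size percentiles for the table
    $ nodetool cfhistograms my_keyspace my_table

    # or just the largest compacted partition
    $ nodetool cfstats my_keyspace.my_table | grep -i "partition maximum"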

Re: Increasing replication factor and repair doesn't seem to work

Posted by Luke Jolly <lu...@getadmiral.com>.
So I guess the problem may have been with the initial addition of the
10.128.0.20 node, because when I added it in, it never synced data I guess?
It was at around 50 MB when it first came up and transitioned to "UN".
After it was in, I did the 1->2 replication change and tried repair, but it
didn't fix it.  From what I can tell, all the data on it is stuff that has
been written since it came up.  We never delete data, ever, so we should have
zero tombstones.

If I am not mistaken, only two of my nodes actually have all the data,
10.128.0.3 and 10.142.0.14, since they agree on the data amount. 10.142.0.13
is almost a GB lower, and then of course there's 10.128.0.20, which is missing
over 5 GB of data.  I tried running nodetool repair -local on both DCs and it
didn't fix either one.

Am I running into a bug of some kind?
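
One way to see whether the data is really missing, rather than just reported
oddly in Load, is to compare per-table estimates on each node (a sketch; the
keyspace name is a placeholder, and the estimates are approximate):

    # run on each node and compare the numbers
    $ nodetool cfstats my_keyspace | grep -iE "space used|number of keys"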

Re: Increasing replication factor and repair doesn't seem to work

Posted by Bhuvan Rawal <bh...@gmail.com>.
Hi Luke,

You mentioned that the replication factor was increased from 1 to 2. In that
case, was the node bearing IP 10.128.0.20 carrying around 3GB of data earlier?

You can run nodetool repair with the -local option to initiate a repair of
the local datacenter, gce-us-central1.

You may also suspect that, if a lot of data was deleted while the node was
down, it may be holding a lot of tombstones which do not need to be
replicated to the other node. To verify this, you can issue a
select count(*) query on the column families (with the amount of data you
have it should not be an issue) with tracing on and with consistency
local_all, connecting to either 10.128.0.3 or 10.128.0.20, and store the
output in a file.  It will give you a fair idea of how many deleted cells the
nodes have. I tried searching for a reference on whether tombstones are moved
around during repair, but I didn't find evidence of it. However, I see no
reason why they would be, because if the node didn't have the data then
streaming tombstones does not make a lot of sense.
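
A sketch of that check (my_keyspace.my_table is a placeholder; LOCAL_ALL is
not a standard CQL consistency level, but with RF 2 in the local DC,
LOCAL_QUORUM already reads both local replicas, which is close to the intent
here):

    $ cqlsh 10.128.0.3
    cqlsh> CONSISTENCY LOCAL_QUORUM;
    cqlsh> TRACING ON;
    cqlsh> SELECT count(*) FROM my_keyspace.my_table;

The trace that follows the result should report, per replica, how many live
rows and tombstone cells were read, which is the count being described.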

Regards,
Bhuvan

Re: Increasing replication factor and repair doesn't seem to work

Posted by Luke Jolly <lu...@getadmiral.com>.
Here's my setup:

Datacenter: gce-us-central1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.128.0.3   6.4 GB     256     100.0%            3317a3de-9113-48e2-9a85-bbf756d7a4a6  default
UN  10.128.0.20  943.08 MB  256     100.0%            958348cb-8205-4630-8b96-0951bf33f3d3  default

Datacenter: gce-us-east1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.142.0.14  6.4 GB     256     100.0%            c3a5c39d-e1c9-4116-903d-b6d1b23fb652  default
UN  10.142.0.13  5.55 GB    256     100.0%            d0d9c30e-1506-4b95-be64-3dd4d78f0583  default

And my replication settings are:

{'class': 'NetworkTopologyStrategy', 'aws-us-west': '2', 'gce-us-central1':
'2', 'gce-us-east1': '2'}

As you can see, 10.128.0.20 in the gce-us-central1 DC only has a load of 943
MB even though it's supposed to own 100% and should have 6.4 GB.  Also,
10.142.0.13 seems not to have everything, as it only has a load of 5.55 GB.
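
Since the effective ownership shown above depends on a keyspace's replication
settings, it may also be worth scoping both checks to the keyspace in
question (my_keyspace is a placeholder):

    $ nodetool status my_keyspace

    $ cqlsh 10.128.0.3
    cqlsh> DESCRIBE KEYSPACE my_keyspace;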

Re: Increasing replication factor and repair doesn't seem to work

Posted by kurt Greaves <ku...@instaclustr.com>.
Do you have 1 node in each DC or 2? If you're saying you have 1 node in
each DC, then an RF of 2 doesn't make sense. Can you clarify what your setup
is?

-- 
Kurt Greaves
kurt@instaclustr.com
www.instaclustr.com