Posted to user@cassandra.apache.org by Muralikrishna Gutha <mu...@gmail.com> on 2019/10/24 15:26:39 UTC

Duplicates columns which are backed by LIST collection types

Hello Guys,

We started noticing strange behavior after we migrated one keyspace from
an existing cluster to a new cluster.

We expanded our source cluster from 18 nodes to 36 nodes and didn't run
"nodetool cleanup".
We then took sstable backups on the source cluster, which contain duplicate
data, and restored them (sstableloader) onto the new cluster. Applications
then started seeing duplicate data, mostly on list-backed columns.
Below is the sstable2json output for one of the list-backed columns.

Clustering Column1:Clustering Column2:mods (List collection type)
ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233

 ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b050eac811e9ab2729ea208ce219","eb25d0b13a6611e980b22102e728a233",1570648383445000],

 ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b051eac811e9ab2729ea208ce219","eb26bb113a6611e980b22102e728a233",1570648383445000],

 ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:d120b052eac811e9ab2729ea208ce219","a4fcf1f1eac811e99664732b9302ab46",1570648383445000],

 ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973560ead811e98bf68711844fec13","eb25d0b13a6611e980b22102e728a233",1570654999478000],

 ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973561ead811e98bf68711844fec13","eb26bb113a6611e980b22102e728a233",1570654999478000],

 ["ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:38973562ead811e98bf68711844fec13","a4fcf1f1eac811e99664732b9302ab46",1570654999478000],

Below is the select statement. I would expect Cassandra to return only the
data with the latest timestamp, but instead it returns duplicate values.

select mods from keyspace.table where partition_key ='1117302' and
type='ModifierList' and id=eb26e221-3a66-11e9-80b2-2102e728a233;
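
For readers following along, here is a small sketch that groups the six cells from the sstable2json dump above by write timestamp. It assumes each row is [cell name, value, write timestamp in microseconds] and that the last component of the cell name is a per-element key. The helper names (group_by_write_time, element_key_versions) are made up for this illustration and are not part of any Cassandra tool.

```python
import uuid
from collections import defaultdict

PREFIX = "ModifierList:eb26e221-3a66-11e9-80b2-2102e728a233:mods:"

# (cell name, value, write timestamp in microseconds), copied from the dump.
CELLS = [
    (PREFIX + "d120b050eac811e9ab2729ea208ce219", "eb25d0b13a6611e980b22102e728a233", 1570648383445000),
    (PREFIX + "d120b051eac811e9ab2729ea208ce219", "eb26bb113a6611e980b22102e728a233", 1570648383445000),
    (PREFIX + "d120b052eac811e9ab2729ea208ce219", "a4fcf1f1eac811e99664732b9302ab46", 1570648383445000),
    (PREFIX + "38973560ead811e98bf68711844fec13", "eb25d0b13a6611e980b22102e728a233", 1570654999478000),
    (PREFIX + "38973561ead811e98bf68711844fec13", "eb26bb113a6611e980b22102e728a233", 1570654999478000),
    (PREFIX + "38973562ead811e98bf68711844fec13", "a4fcf1f1eac811e99664732b9302ab46", 1570654999478000),
]


def group_by_write_time(cells):
    """Map each write timestamp to the list values written at that time."""
    groups = defaultdict(list)
    for _name, value, timestamp in cells:
        groups[timestamp].append(value)
    return dict(groups)


def element_key_versions(cells):
    """Parse the last cell-name component as a UUID and collect its version."""
    return {uuid.UUID(hex=name.rsplit(":", 1)[1]).version for name, _, _ in cells}


groups = group_by_write_time(CELLS)
versions = element_key_versions(CELLS)

for timestamp in sorted(groups):
    print(timestamp, groups[timestamp])
print("element key UUID versions:", versions)
```

The dump holds two writes of the same three values, roughly 110 minutes apart, each element under its own time-based key. That matches the six values the select returns.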

[image: image.png]

Any help or guidance is greatly appreciated.

-- 
Thanks & Regards
  Murali K Gutha

Re: Duplicates columns which are backed by LIST collection types

Posted by Muralikrishna Gutha <mu...@gmail.com>.
If the data was TTL'ed and tombstoned, shouldn't the select return just 3
elements from the list column?

Thanks,
Murali

On Thu, Oct 24, 2019 at 12:03 PM ZAIDI, ASAD <az...@att.com> wrote:

> Guess TTL’ed data is lurking around? If so, you can try to get rid of the
> tombstones (by reducing gc_grace_seconds to zero?) and let compaction take
> care of them before the sstable migration. Do keep an eye on hinted
> handoffs because of the zeroed gc_grace_seconds property.


-- 
Thanks & Regards
  Murali K Gutha

RE: Duplicates columns which are backed by LIST collection types

Posted by "ZAIDI, ASAD" <az...@att.com>.
Guess TTL’ed data is lurking around? If so, you can try to get rid of the tombstones (by reducing gc_grace_seconds to zero?) and let compaction take care of them before the sstable migration. Do keep an eye on hinted handoffs because of the zeroed gc_grace_seconds property.
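
As a rough illustration of why tombstones matter here, below is a simplified model of how a read reconciles list cells against a deletion marker. It is a sketch only, not Cassandra's code. It assumes the list was overwritten as a whole, which writes a tombstone covering the old elements plus a fresh set of cells; the thread does not confirm that this is what happened. Cell, ListTombstone and read_list are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    element_key: str  # per-element key (a time-based UUID in the real dump)
    value: str
    timestamp: int    # write time


@dataclass(frozen=True)
class ListTombstone:
    timestamp: int    # shadows every list cell written at or before this time


def read_list(cells, tombstones):
    """Return live values: drop cells shadowed by a tombstone, order by key."""
    cutoff = max((t.timestamp for t in tombstones), default=-1)
    live = [c for c in cells if c.timestamp > cutoff]
    return [c.value for c in sorted(live, key=lambda c: c.element_key)]


# First write of the list at t=100.
old_cells = [Cell("k1", "a", 100), Cell("k2", "b", 100), Cell("k3", "c", 100)]
# Overwrite at t=200: a tombstone just before the new cells, then the new cells.
overwrite_marker = ListTombstone(199)
new_cells = [Cell("k4", "a", 200), Cell("k5", "b", 200), Cell("k6", "c", 200)]

with_marker = read_list(old_cells + new_cells, [overwrite_marker])
without_marker = read_list(old_cells + new_cells, [])

print(with_marker)     # ['a', 'b', 'c']
print(without_marker)  # ['a', 'b', 'c', 'a', 'b', 'c']
```

In this model the old elements stay hidden only while the marker is read together with them. If the marker is purged or left behind while the older cells are carried over, every element comes back and the list looks duplicated.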



Re: Duplicates columns which are backed by LIST collection types

Posted by Muralikrishna Gutha <mu...@gmail.com>.
Thanks. The list datatype has been in use for this table for a few years
now and we never had issues. We ran into this issue only recently, when we
did the keyspace migration.

Thanks,
Murali

On Thu, Oct 24, 2019 at 11:36 AM ZAIDI, ASAD <az...@att.com> wrote:

> Have you chosen the correct datatype to begin with, if you don’t want
> duplicates?


-- 
Thanks & Regards
  Murali K Gutha

RE: Duplicates columns which are backed by LIST collection types

Posted by "ZAIDI, ASAD" <az...@att.com>.
Have you chosen the correct datatype to begin with, if you don’t want duplicates?

Generally speaking:

A set and a list both represent multiple values but do so differently.
A set does not preserve insertion order; its values are returned sorted in ascending order, and no duplicates are allowed.

A list preserves the order in which you append or prepend values, and it allows duplicates.
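
To make the difference concrete, here is a minimal Python model of the two behaviors described above. It only mimics the semantics and says nothing about how Cassandra stores collections; as_cql_set and as_cql_list are hypothetical helper names.

```python
def as_cql_set(values):
    """Model of a set column: unique values, returned in ascending order."""
    return sorted(set(values))


def as_cql_list(values):
    """Model of a list column: write order is kept, duplicates are allowed."""
    return list(values)


writes = ["b", "a", "b"]
set_view = as_cql_set(writes)
list_view = as_cql_list(writes)

print(set_view)   # ['a', 'b']
print(list_view)  # ['b', 'a', 'b']
```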


