Posted to commits@cassandra.apache.org by "KALYAN CHAKRAVARTHY KANCHARLA (JIRA)" <ji...@apache.org> on 2018/12/11 19:31:00 UTC

[jira] [Issue Comment Deleted] (CASSANDRA-14927) During data migration from 7 node to 21 node cluster using sstableloader, new data is being populated on the new tables & data is being duplicated on user type tables

     [ https://issues.apache.org/jira/browse/CASSANDRA-14927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KALYAN CHAKRAVARTHY KANCHARLA updated CASSANDRA-14927:
------------------------------------------------------
    Comment: was deleted

(was: I have migrated the data from 7 nodes to 21 nodes. 

Along with the old data I see there is some new data being populated on the 21 nodes and also duplicate data on ids.sso_saml tables.)

> During data migration from 7 node to 21 node cluster using sstableloader, new data is being populated on the new tables & data is being duplicated on user type tables 
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14927
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14927
>             Project: Cassandra
>          Issue Type: Test
>            Reporter: KALYAN CHAKRAVARTHY KANCHARLA
>            Priority: Blocker
>              Labels: test
>             Fix For: 2.1.13
>
>
> I'm trying to migrate data from 7 node (single DC) cluster to a 21 node (3 DC) cluster using sstableloader.
> We have same versions on both old and new clusters.
> *cqlsh 5.0.1* 
>  *Cassandra 2.1.13* 
>  *CQL spec 3.2.1* 
> Old and New clusters are in different networks. So we opened the following ports between them.
> 7000- storage port
> 7001- ssl storage port
> 7199- JMX port
> 9042- client port
> 9160- Thrift client port
> We use vnodes in the clusters.
> We made sure cassandra.yaml file on the new cluster is set correct by changing following options,
>  
> {{cluster_name: 'MyCassandraCluster'}}
> {{num_tokens: 256}}
> {{seed_provider:}}
> {{    - class_name: org.apache.cassandra.locator.SimpleSeedProvider}}
> {{      parameters:}}
> {{          - seeds: "10.168.66.41,10.176.170.59"}}
> {{listen_address: localhost}}
> {{endpoint_snitch: GossipingPropertyFileSnitch}}
> And also changed cassandra-rackdc.properties on each node, specifying its respective DC and rack.
> When creating the keyspaces, changed the replication class to NetworkTopologyStrategy.
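For reference, the replication change mentioned above amounts to a statement along these lines. This is a sketch: the keyspace name reuses one from the report, while the DC names (dc1, dc2, dc3) and replication factor of 3 are assumptions, and must match whatever is set in cassandra-rackdc.properties.

```shell
# Sketch: build the CREATE KEYSPACE statement for a 3-DC cluster.
# DC names and RF values here are illustrative assumptions, not taken
# from the report above.
KS="app_properties"
CQL="CREATE KEYSPACE IF NOT EXISTS ${KS} WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3, 'dc3': 3};"
echo "$CQL"
# On a live node this would be applied with, e.g.:
#   cqlsh <node_ip> -e "$CQL"
```

With NetworkTopologyStrategy, each DC gets its own replica count, which is why the per-DC entries replace the single replication_factor used by SimpleStrategy.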
>  
> The cluster looks healthy; all the nodes are Up and Normal (UN). 
>  
> {color:#FF0000}*I was able to get the data from old cluster to new cluster. But, along with the data from old cluster, I see some new rows being populated in the tables on new cluster and data is being duplicated in the tables with user type*. {color}
> {color:#333333}We have used the following steps to migrate data:{color}
>  # Took snapshots of all the keyspaces that we want to migrate (9 keyspaces). Used the _nodetool snapshot_ command on the source nodes to take a snapshot of the required keyspace/table, specifying _hostname_, _JMX port_, and _keyspace_:
> _/a/cassandra/bin/nodetool -u $(sudo su - company -c "cat /a/cassandra/jmxremote.password" | awk '{print $1}') -pw $(sudo su - company -c "cat /a/cassandra/jmxremote.password" | awk '{print $2}')_  *_-h localhost -p 7199 snapshot keyspace_name_*
>  # After taking snapshots, move these snapshot directories from the source nodes to the target nodes.
>        
> → Create a tar file on source node for the snapshot directory that we want to move on to target node.
>      tar -cvf file.tar snapshot_name
> → Move this file.tar from source node to local machine.
>      scp -S gwsh root@192.168.64.99:/a/cassandra/data/file.tar .
> → Now move this file.tar from the local machine to a new directory (example: test) on the target node.
>     scp -S gwsh file.tar root@192.168.58.41:/a/cassandra/data/test/.
>  # Now untar this file.tar in test directory in target node.
>  # The path of the sstables must be the same on both source and target.
>  # To bulk load these files, run _sstableloader_ on the source node, indicate one or more nodes in the destination cluster with the -d flag (which accepts a comma-separated list of IP addresses or hostnames), and specify the path to the sstables on the source node:
> _/a/cassandra/bin/_ *_./sstableloader -d host_IP path_to_sstables_*
>           *_Example:_*
> root@sqa-cassandra03.sqaextranet:/a/cassandra/bin# sstableloader -d 192.168.58.41 -u popps -pw ******* -tf org.apache.cassandra.thrift.SSLTransportFactory -ts /a/cassandra/ssl/truststore.jks -tspw test123 -ks /a/cassandra/ssl/keystore.jks -kspw test123 -f /a/cassandra/conf/cassandra.yaml /a/cassandra/data/app_properties/admins-58524140431511e8bbb6357f562e11ca/ 
> Summary statistics:
>  Connections per host: 1
>  Total files transferred: 9
>  Total bytes transferred: 1787893
>  Total duration (ms): 2936
>  Average transfer rate (MB/s): 0
>  Peak transfer rate (MB/s): 0
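Before running sstableloader, it may also be worth confirming the sstables were not corrupted across the tar/scp hops. A minimal sketch of such a check; the temp directories and file name stand in for the real snapshot paths under /a/cassandra/data/, which are not reproduced here:

```shell
# Sketch: verify files survive the tar/scp hops intact by comparing
# checksums on source and target. Temp dirs stand in for the real
# snapshot directories; the file is a stand-in for a real sstable.
SRC=$(mktemp -d); DST=$(mktemp -d); MANIFEST=$(mktemp)
printf 'sstable payload' > "$SRC/md-1-big-Data.db"

# On the source node: record a checksum manifest next to the snapshot.
( cd "$SRC" && sha256sum ./* > "$MANIFEST" )

# The tar -> scp -> untar hops are simulated here with a plain copy.
cp "$SRC"/* "$DST"/

# On the target node: re-check the manifest before running sstableloader.
( cd "$DST" && sha256sum -c "$MANIFEST" )
```

If any file prints FAILED, the transfer (not sstableloader) corrupted it, which rules the copy step in or out as a cause.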
>  
> Performed these steps on all the tables, then checked the row counts on the old and new tables using cqlsh:
> cqlsh> SELECT count(*) FROM keyspace.table;
> example for a single table:
> count on new table: 341
> count on old table: 303
>  
> We were also able to identify the differences between tables using the 'sdiff' command, as follows:
>  * created .txt/.csv files for the tables in the old and new clusters.
>  * compared them using the sdiff command.
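The sdiff comparison above can be narrowed to just the extra rows with comm, which reports lines unique to each file. A self-contained sketch; old.csv/new.csv stand in for the per-table exports described above, and the sample rows are made up for illustration:

```shell
# Sketch: isolate rows that exist only on the new cluster's export.
# The two files stand in for table exports (e.g. from cqlsh COPY ... TO);
# the row contents are invented for this example.
OLD=$(mktemp); NEW=$(mktemp)
printf 'id1,a\nid2,b\n'        > "$OLD"
printf 'id1,a\nid2,b\nid9,x\n' > "$NEW"

# comm requires sorted input; -13 suppresses lines unique to $OLD and
# lines common to both, leaving only lines unique to $NEW.
sort -o "$OLD" "$OLD"
sort -o "$NEW" "$NEW"
EXTRA=$(comm -13 "$OLD" "$NEW")
echo "rows only on the new cluster:"
echo "$EXTRA"
```

Feeding the real exports through this would list exactly the 38 unexpected rows (341 - 303) for the example table, which should make their origin easier to spot.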
>  
> *Could someone help me identify the cause of the new data appearing in the tables on the new cluster?*
> Please let me know if you need more info.
> PS: After migrating the data the first time and seeing these issues, we TRUNCATED all the tables, DROPPED the tables with user-defined types, recreated the dropped tables, and ran the same migration procedure again. We still see the same issues. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org