Posted to user@cassandra.apache.org by Faraz Mateen <fm...@an10.io> on 2018/04/05 14:04:45 UTC

Shifting data to DCOS

Hi all,

I have spent the last few days trying to move my C* cluster on Gcloud
(3 nodes, 700GB) into a DC/OS deployment. This, as you might know, was not
trivial.

I have finally found a time-efficient way to do this migration (we
evaluated bulk loading and sstableloader, but these would take much too
long, especially if we want to repeat this process between different
deployments).

I would really appreciate it if you could review my approach below and
comment on where I can do something better (or automate it using existing
tools that I might not have stumbled across).

All the data from my previous setup is on persistent disks. I created
copies of those persistent disks and attached them to DC/OS agents. When
deploying the service on DC/OS, I specified disk type as MOUNT and provided
the same cluster name as my previous setup.

After the service was successfully deployed, I logged into cqlsh. I was
able to see all the keyspaces, but all the column families were missing.
When I rechecked my data directory on the persistent disk, I was able to
see all my data in different directories. Each directory had a hash
appended to its name.

For example,  if the table is *data_main_bim_dn_10*, its data directory is
named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I created a new
table with the same name through cqlsh. This resulted in creation of
another directory with a different hash i.e.
data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all data
from the former to the latter.

Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that I was
able to access all data contents through cqlsh.

Now, the problem is that I have around 500 tables, and the method I
mentioned above is quite cumbersome. Bulk loading through sstableloader or
remote seeding are also options, but they would take a lot of time. Does
anyone know an easier way to shift all my data to the new setup on DC/OS?
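
(For reference, the per-table steps above could be scripted roughly as
follows. This is only a sketch: the mount point, data directory path and
keyspace name are placeholders, and it assumes the schema has already been
recreated on the new cluster so the hashed table directories exist. It would
be run on each node against that node's own copied disk.)

    #!/usr/bin/env python
    # Rough sketch: copy SSTables from a copied old data disk into the
    # matching hashed table directories of a new node, then run
    # "nodetool refresh" for each table. Paths/keyspace are placeholders.
    import os
    import shutil
    import subprocess

    OLD_KS_DIR = "/mnt/old-data/ks1"            # keyspace dir on the copied disk
    NEW_KS_DIR = "/var/lib/cassandra/data/ks1"  # keyspace dir of the new node
    KEYSPACE = "ks1"

    def table_name(dir_name):
        # Directories look like <table>-<32-char hash>; strip the hash.
        return dir_name.rsplit("-", 1)[0]

    # Map table name -> hashed directory created by the fresh schema.
    new_dirs = {table_name(d): os.path.join(NEW_KS_DIR, d)
                for d in os.listdir(NEW_KS_DIR)}

    for old_dir in os.listdir(OLD_KS_DIR):
        table = table_name(old_dir)
        if table not in new_dirs:
            print("skipping %s: no matching table in new schema" % old_dir)
            continue
        src = os.path.join(OLD_KS_DIR, old_dir)
        for name in os.listdir(src):
            path = os.path.join(src, name)
            if os.path.isfile(path):            # skip snapshots/, backups/, etc.
                shutil.copy2(path, new_dirs[table])
        # Make the copied SSTables visible to Cassandra.
        subprocess.check_call(["nodetool", "refresh", KEYSPACE, table])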

-- 
Faraz Mateen

Re: Shifting data to DCOS

Posted by Patrick Bannister <pt...@gmail.com>.
*nodetool ring* will give you the tokens for each node on the ring. Each
node has the token range between the previous node's token and its token -
so the token range for each node is the interval (previous_token,
this_token]. The first node in the ring has the range between the last
node's token and its token (the "wrapping range").
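
(A tiny sketch of that pairing, with made-up token values rather than
anything from this thread; note how the first range wraps around to the last
token:)

    # Sketch: given each node's tokens (as listed by "nodetool ring"),
    # derive the (previous_token, this_token] ranges each node owns.
    tokens_by_node = {                      # example values only
        "10.128.1.1": [-9000, 10, 5000],
        "10.128.1.2": [-42, 2000],
        "10.128.1.3": [-7000, 900],
    }

    ring = sorted((t, node) for node, ts in tokens_by_node.items() for t in ts)

    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]         # i == 0 wraps to the last token
        print("(%d, %d] -> %s" % (prev_token, token, node))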

Patrick Bannister


On Sun, Apr 15, 2018 at 11:23 AM, Faraz Mateen <fm...@an10.io> wrote:

> *UPDATE* - I created schema for all the tables in one of the keypsaces,
> copied data to new directories and ran nodetool refresh. However, a lot of
> data seems to be missing.
>
> I ran nodetool repair on all three nodes one by one. First two nodes took
> around 20 minutes (each) to complete. Third node took a lot of time to
> repair and did not complete even in 14 hours. Eventually I had to stop it
> manually.
>
> *nodetool compactionstats *give me the "pending tasks by table name"
> traceback which can be viewed here:
> https://gist.github.com/farazmateen/10adce4b2477457f0e20fc95176f66a3
>
> *nodetool netstats* shows a lot of dropped gossip messages on all the
> nodes. Here is the output from one of the nodes:
>
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 1
> Mismatch (Background): 2
> Pool Name        Active  Pending  Completed  Dropped
> Large messages   n/a     0        92         1
> Small messages   n/a     0        355491     0
> Gossip messages  n/a     5        3726945    286613
>
> Is the problem related to token ranges? How can I find out token range for
> each node?
> What can I do to further debug and root cause this?
>
> On Tue, Apr 10, 2018 at 4:28 PM, Faraz Mateen <fm...@an10.io> wrote:
>
>> Sorry for the late reply. I was trying to figure out some other approach
>> to it.
>>
>> @Kurt - My previous cluster has 3 nodes but replication factor is 2. I am
>> not exactly sure how I would handle the tokens. Can you explain that a bit?
>>
>> @Michael - Actually, my DC/OS cluster has an older version than my
>> previous cluster. However both of them have hash with their data
>> directories. Previous cluster is on version 3.9 while new DC/OS cluster is
>> on 3.0.16.
>>
>>
>> On Fri, Apr 6, 2018 at 2:35 PM, kurt greaves <ku...@instaclustr.com>
>> wrote:
>>
>>> Without looking at the code I'd say maybe the keyspaces are displayed
>>> purely because the directories exist (but it seems unlikely). The process
>>> you should follow instead is to exclude the system keyspaces for each node
>>> and manually apply your schema, then upload your CFs into the correct
>>> directory. Note this only works when RF=#nodes, if you have more nodes you
>>> need to take tokens into account when restoring.
>>>
>>>
>>> On Fri., 6 Apr. 2018, 17:16 Affan Syed, <as...@an10.io> wrote:
>>>
>>>> Michael,
>>>>
>>>> both of the folders are with hash, so I dont think that would be an
>>>> issue.
>>>>
>>>> What is strange is why the tables dont show up if the keyspaces are
>>>> visible. Shouldnt that be a meta data that can be edited once and then be
>>>> visible?
>>>>
>>>> Affan
>>>>
>>>> - Affan
>>>>
>>>> On Thu, Apr 5, 2018 at 7:55 PM, Michael Shuler <mi...@pbandjelly.org>
>>>> wrote:
>>>>
>>>>> On 04/05/2018 09:04 AM, Faraz Mateen wrote:
>>>>> >
>>>>> > For example,  if the table is *data_main_bim_dn_10*, its data
>>>>> directory
>>>>> > is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I
>>>>> created
>>>>> > a new table with the same name through cqlsh. This resulted in
>>>>> creation
>>>>> > of another directory with a different hash i.e.
>>>>> > data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all
>>>>> data
>>>>> > from the former to the latter.
>>>>> >
>>>>> > Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that
>>>>> I
>>>>> > was able to access all data contents through cqlsh.
>>>>> >
>>>>> > Now, the problem is, I have around 500 tables and the method I
>>>>> mentioned
>>>>> > above is quite cumbersome. Bulkloading through sstableloader or
>>>>> remote
>>>>> > seeding are also a couple of options but they will take a lot of
>>>>> time.
>>>>> > Does anyone know an easier way to shift all my data to new setup on
>>>>> DC/OS?
>>>>>
>>>>> For upgrade support from older versions of C* that did not have the
>>>>> hash
>>>>> on the data directory, the table data dir can be just
>>>>> `data_main_bim_dn_10` without the appended hash, as in your example.
>>>>>
>>>>> Give that a quick test to see if that simplifies things for you.
>>>>>
>>>>> --
>>>>> Kind regards,
>>>>> Michael
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail: user-help@cassandra.apache.org
>>>>>
>>>>>
>>>>
>>
>>
>> --
>> Faraz Mateen
>>
>
>
>
> --
> Faraz Mateen
>

Re: Shifting data to DCOS

Posted by Faraz Mateen <fm...@an10.io>.
*UPDATE* - I created the schema for all the tables in one of the keyspaces,
copied the data to the new directories and ran nodetool refresh. However, a
lot of data seems to be missing.

I ran nodetool repair on all three nodes one by one. First two nodes took
around 20 minutes (each) to complete. Third node took a lot of time to
repair and did not complete even in 14 hours. Eventually I had to stop it
manually.

*nodetool compactionstats* gives me the "pending tasks by table name"
traceback, which can be viewed here:
https://gist.github.com/farazmateen/10adce4b2477457f0e20fc95176f66a3

*nodetool netstats* shows a lot of dropped gossip messages on all the
nodes. Here is the output from one of the nodes:

Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 1
Mismatch (Background): 2
Pool Name        Active  Pending  Completed  Dropped
Large messages   n/a     0        92         1
Small messages   n/a     0        355491     0
Gossip messages  n/a     5        3726945    286613

Is the problem related to token ranges? How can I find out token range for
each node?
What can I do to further debug and root cause this?

On Tue, Apr 10, 2018 at 4:28 PM, Faraz Mateen <fm...@an10.io> wrote:

> Sorry for the late reply. I was trying to figure out some other approach
> to it.
>
> @Kurt - My previous cluster has 3 nodes but replication factor is 2. I am
> not exactly sure how I would handle the tokens. Can you explain that a bit?
>
> @Michael - Actually, my DC/OS cluster has an older version than my
> previous cluster. However both of them have hash with their data
> directories. Previous cluster is on version 3.9 while new DC/OS cluster is
> on 3.0.16.
>
>
> On Fri, Apr 6, 2018 at 2:35 PM, kurt greaves <ku...@instaclustr.com> wrote:
>
>> Without looking at the code I'd say maybe the keyspaces are displayed
>> purely because the directories exist (but it seems unlikely). The process
>> you should follow instead is to exclude the system keyspaces for each node
>> and manually apply your schema, then upload your CFs into the correct
>> directory. Note this only works when RF=#nodes, if you have more nodes you
>> need to take tokens into account when restoring.
>>
>>
>> On Fri., 6 Apr. 2018, 17:16 Affan Syed, <as...@an10.io> wrote:
>>
>>> Michael,
>>>
>>> both of the folders are with hash, so I dont think that would be an
>>> issue.
>>>
>>> What is strange is why the tables dont show up if the keyspaces are
>>> visible. Shouldnt that be a meta data that can be edited once and then be
>>> visible?
>>>
>>> Affan
>>>
>>> - Affan
>>>
>>> On Thu, Apr 5, 2018 at 7:55 PM, Michael Shuler <mi...@pbandjelly.org>
>>> wrote:
>>>
>>>> On 04/05/2018 09:04 AM, Faraz Mateen wrote:
>>>> >
>>>> > For example,  if the table is *data_main_bim_dn_10*, its data
>>>> directory
>>>> > is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I
>>>> created
>>>> > a new table with the same name through cqlsh. This resulted in
>>>> creation
>>>> > of another directory with a different hash i.e.
>>>> > data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all
>>>> data
>>>> > from the former to the latter.
>>>> >
>>>> > Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that I
>>>> > was able to access all data contents through cqlsh.
>>>> >
>>>> > Now, the problem is, I have around 500 tables and the method I
>>>> mentioned
>>>> > above is quite cumbersome. Bulkloading through sstableloader or remote
>>>> > seeding are also a couple of options but they will take a lot of time.
>>>> > Does anyone know an easier way to shift all my data to new setup on
>>>> DC/OS?
>>>>
>>>> For upgrade support from older versions of C* that did not have the hash
>>>> on the data directory, the table data dir can be just
>>>> `data_main_bim_dn_10` without the appended hash, as in your example.
>>>>
>>>> Give that a quick test to see if that simplifies things for you.
>>>>
>>>> --
>>>> Kind regards,
>>>> Michael
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail: user-help@cassandra.apache.org
>>>>
>>>>
>>>
>
>
> --
> Faraz Mateen
>



-- 
Faraz Mateen

Re: Shifting data to DCOS

Posted by kurt greaves <ku...@instaclustr.com>.
Something is not right if it thinks the RF is different. Do you have the
command you ran for repair, and the error?

If you are willing to do the operation again I'd be interested to see if
nodetool cleanup causes any data to be removed (you should snapshot the
disks before running this as it will remove data if tokens were incorrect).

On Wed., 2 May 2018, 21:48 Faraz Mateen, <fm...@an10.io> wrote:

> Hi all,
>
> Sorry I couldn't update earlier as I got caught up in some other stuff.
>
> Anyway, my previous 3 node cluster was on version 3.9.  I created a new
> cluster of cassandra 3.11.2 with same number of nodes on GCE VMs instead of
> DC/OS. My existing cluster has cassandra data on persistent disks. I made
> copies of those disks and attached them to new cluster.
>
> I was using the following link to move data to the new cluster:
>
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsSnapshotRestoreNewCluster.html
>
> As mentioned in the link, I manually assigned token ranges to each node
> according to their corresponding node in the previous cluster. When I
> restarted cassandra process on the VMs, I noticed that it had automatically
> picked up all my keyspaces and column families. I did not recreate schema
> or copy data manually or run sstablesloader. I am not sure if this should
> have happened.
>
> Anyway, the data in both clusters is still not in sync. I ran a simple
> count query on a table both clusters and got different results:
>
> Old cluster: 217699
> New Cluster: 138770
>
> On the new cluster, when I run nodetool repair for my keyspace, it runs
> fine on one node, but on other two nodes it says that keyspace replication
> factor is 1 so repair is not needed. Cqlsh also shows that the replication
> factor is 2.
>
> Nodetool status on new and old cluster shows different outputs for each
> cluster as well.
>
> *Cluster1:*
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens       Owns    Host ID
>             Rack
> UN  10.128.1.1  228.14 GiB  256          ?
> 63ff8054-934a-4a7a-a33f-405e064bc8e8  rack1
> UN  10.128.1.2  231.25 GiB  256          ?
> 702e8a31-6441-4444-b569-d2d137d54a5d  rack1
> UN  10.128.1.3  199.91 GiB  256          ?
> b5b22a90-f037-433a-8ad9-f370b26cca26  rack1
>
> *Cluster2:*
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens       Owns    Host ID
>             Rack
> UJ  10.142.0.4  211.27 GiB  256          ?
> c55fef77-9c78-449c-b0d9-64e755caee7d  rack1
> UN  10.142.0.2  228.14 GiB  256          ?
> 0065c8e1-47be-4cf8-a3fe-3f4d20ff1b47  rack1
> UJ  10.142.0.3  241.77 GiB  256          ?
> f3b3f409-d108-4751-93ba-682692e46318  rack1
>
> This is weird because both the clusters have essentially same disks
> attached to them.
>  Only one node (10.142.0.2) in cluster2 has the same load as its
> counterpart in the cluster1 (10.128.1.1).
> This is also the node where nodetool repair seems to be running fine and
> it is also acting as the seed node in second cluster.
>
> I am confused that what might be causing this inconsistency in load and
> replication factor?  Has anyone ever seen different replication factor for
> same keyspace on different nodes? Is there a problem in my workflow?
> Can anyone please suggest the best way to move data from one cluster to
> another?
>
> Any help will be greatly appreciated.
>
> On Tue, Apr 17, 2018 at 6:52 AM, Faraz Mateen <fm...@an10.io> wrote:
>
>> Thanks for the response guys.
>>
>> Let me try setting token ranges manually and move the data again to
>> correct nodes. Will update with the outcome soon.
>>
>>
>> On Tue, Apr 17, 2018 at 5:42 AM, kurt greaves <ku...@instaclustr.com>
>> wrote:
>>
>>> Sorry for the delay.
>>>
>>>> Is the problem related to token ranges? How can I find out token range
>>>> for each node?
>>>> What can I do to further debug and root cause this?
>>>
>>> Very likely. See below.
>>>
>>> My previous cluster has 3 nodes but replication factor is 2. I am not
>>>> exactly sure how I would handle the tokens. Can you explain that a bit?
>>>
>>> The new cluster will have to have the same token ring as the old if you
>>> are copying from node to node. Basically you should get the set of tokens
>>> for each node (from nodetool ring) and when you spin up your 3 new nodes,
>>> set initial_tokens in the yaml to be the comma-separated list of tokens for *exactly
>>> one* node from the previous cluster. When restoring the SSTables you
>>> need to make sure you take the SSTables from the original node and place it
>>> on the new node that has the *same* list of tokens. If you don't do
>>> this it won't be a replica for all the data in those SSTables and
>>> consequently you'll lose data (or it simply won't be available).
>>>
>>
>>
>>
>> --
>> Faraz Mateen
>>
>
>
>
> --
> Faraz Mateen
>

Re: Shifting data to DCOS

Posted by Faraz Mateen <fm...@an10.io>.
Hi all,

Sorry I couldn't update earlier as I got caught up in some other stuff.

Anyway, my previous 3-node cluster was on version 3.9. I created a new
cluster of Cassandra 3.11.2 with the same number of nodes on GCE VMs instead
of DC/OS. My existing cluster has Cassandra data on persistent disks. I made
copies of those disks and attached them to the new cluster.

I was using the following link to move data to the new cluster:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsSnapshotRestoreNewCluster.html

As mentioned in the link, I manually assigned token ranges to each node
according to its corresponding node in the previous cluster. When I
restarted the Cassandra process on the VMs, I noticed that it had
automatically picked up all my keyspaces and column families. I did not
recreate the schema, copy data manually, or run sstableloader. I am not sure
if this should have happened.

Anyway, the data in both clusters is still not in sync. I ran a simple
count query on a table in both clusters and got different results:

Old cluster: 217699
New Cluster: 138770

On the new cluster, when I run nodetool repair for my keyspace, it runs
fine on one node, but on the other two nodes it says that the keyspace
replication factor is 1, so repair is not needed. cqlsh, however, shows that
the replication factor is 2.

Nodetool status on new and old cluster shows different outputs for each
cluster as well.

*Cluster1:*
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns  Host ID                               Rack
UN  10.128.1.1  228.14 GiB  256     ?     63ff8054-934a-4a7a-a33f-405e064bc8e8  rack1
UN  10.128.1.2  231.25 GiB  256     ?     702e8a31-6441-4444-b569-d2d137d54a5d  rack1
UN  10.128.1.3  199.91 GiB  256     ?     b5b22a90-f037-433a-8ad9-f370b26cca26  rack1

*Cluster2:*
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns  Host ID                               Rack
UJ  10.142.0.4  211.27 GiB  256     ?     c55fef77-9c78-449c-b0d9-64e755caee7d  rack1
UN  10.142.0.2  228.14 GiB  256     ?     0065c8e1-47be-4cf8-a3fe-3f4d20ff1b47  rack1
UJ  10.142.0.3  241.77 GiB  256     ?     f3b3f409-d108-4751-93ba-682692e46318  rack1

This is weird because both clusters have essentially the same disks
attached to them. Only one node (10.142.0.2) in cluster2 has the same load
as its counterpart in cluster1 (10.128.1.1). This is also the node where
nodetool repair seems to be running fine, and it is also acting as the seed
node in the second cluster.
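
(As a side note, one way to check whether the nodes really disagree about
the keyspace definition is to ask each one individually. Below is a minimal
sketch with the Python driver, using the cluster2 addresses above and ks1 as
an example keyspace name; adjust both before use.)

    # Sketch: ask each node directly what replication it has recorded for
    # the keyspace, to spot schema disagreement between nodes.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import WhiteListRoundRobinPolicy

    NODES = ["10.142.0.2", "10.142.0.3", "10.142.0.4"]
    KEYSPACE = "ks1"                      # example keyspace name

    for node in NODES:
        profile = ExecutionProfile(
            load_balancing_policy=WhiteListRoundRobinPolicy([node]))
        cluster = Cluster([node],
                          execution_profiles={EXEC_PROFILE_DEFAULT: profile})
        session = cluster.connect()
        row = session.execute(
            "SELECT replication FROM system_schema.keyspaces "
            "WHERE keyspace_name = %s", (KEYSPACE,)).one()
        print(node, row.replication if row else "keyspace missing")
        cluster.shutdown()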

I am confused about what might be causing this inconsistency in load and
replication factor. Has anyone ever seen a different replication factor for
the same keyspace on different nodes? Is there a problem in my workflow?
Can anyone please suggest the best way to move data from one cluster to
another?

Any help will be greatly appreciated.

On Tue, Apr 17, 2018 at 6:52 AM, Faraz Mateen <fm...@an10.io> wrote:

> Thanks for the response guys.
>
> Let me try setting token ranges manually and move the data again to
> correct nodes. Will update with the outcome soon.
>
>
> On Tue, Apr 17, 2018 at 5:42 AM, kurt greaves <ku...@instaclustr.com>
> wrote:
>
>> Sorry for the delay.
>>
>>> Is the problem related to token ranges? How can I find out token range
>>> for each node?
>>> What can I do to further debug and root cause this?
>>
>> Very likely. See below.
>>
>> My previous cluster has 3 nodes but replication factor is 2. I am not
>>> exactly sure how I would handle the tokens. Can you explain that a bit?
>>
>> The new cluster will have to have the same token ring as the old if you
>> are copying from node to node. Basically you should get the set of tokens
>> for each node (from nodetool ring) and when you spin up your 3 new nodes,
>> set initial_tokens in the yaml to be the comma-separated list of tokens for *exactly
>> one* node from the previous cluster. When restoring the SSTables you
>> need to make sure you take the SSTables from the original node and place it
>> on the new node that has the *same* list of tokens. If you don't do this
>> it won't be a replica for all the data in those SSTables and consequently
>> you'll lose data (or it simply won't be available).
>>
>
>
>
> --
> Faraz Mateen
>



-- 
Faraz Mateen

Re: Shifting data to DCOS

Posted by Faraz Mateen <fm...@an10.io>.
Thanks for the response guys.

Let me try setting the token ranges manually and move the data again to the
correct nodes. Will update with the outcome soon.


On Tue, Apr 17, 2018 at 5:42 AM, kurt greaves <ku...@instaclustr.com> wrote:

> Sorry for the delay.
>
>> Is the problem related to token ranges? How can I find out token range
>> for each node?
>> What can I do to further debug and root cause this?
>
> Very likely. See below.
>
> My previous cluster has 3 nodes but replication factor is 2. I am not
>> exactly sure how I would handle the tokens. Can you explain that a bit?
>
> The new cluster will have to have the same token ring as the old if you
> are copying from node to node. Basically you should get the set of tokens
> for each node (from nodetool ring) and when you spin up your 3 new nodes,
> set initial_tokens in the yaml to be the comma-separated list of tokens for *exactly
> one* node from the previous cluster. When restoring the SSTables you need
> to make sure you take the SSTables from the original node and place it on
> the new node that has the *same* list of tokens. If you don't do this it
> won't be a replica for all the data in those SSTables and consequently
> you'll lose data (or it simply won't be available).
>



-- 
Faraz Mateen

Re: Shifting data to DCOS

Posted by kurt greaves <ku...@instaclustr.com>.
Sorry for the delay.

> Is the problem related to token ranges? How can I find out token range for
> each node?
> What can I do to further debug and root cause this?

Very likely. See below.

My previous cluster has 3 nodes but replication factor is 2. I am not
> exactly sure how I would handle the tokens. Can you explain that a bit?

The new cluster will have to have the same token ring as the old one if you
are copying from node to node. Basically, you should get the set of tokens
for each node (from nodetool ring) and, when you spin up your 3 new nodes,
set initial_token in the yaml to be the comma-separated list of tokens for
*exactly one* node from the previous cluster. When restoring the SSTables,
you need to make sure you take the SSTables from the original node and place
them on the new node that has the *same* list of tokens. If you don't do
this it won't be a replica for all the data in those SSTables and
consequently you'll lose data (or it simply won't be available).
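
(A rough sketch of turning that into configuration: pull each node's tokens
out of nodetool ring and print an initial_token line for the corresponding
new node's cassandra.yaml. The parsing below assumes the usual column layout
of the ring output, with the address first and the token last on each data
row.)

    # Sketch: build the comma-separated initial_token list for each old
    # node from "nodetool ring" output. Run against the old cluster.
    import subprocess
    from collections import defaultdict

    out = subprocess.check_output(["nodetool", "ring"]).decode()

    tokens = defaultdict(list)
    for line in out.splitlines():
        cols = line.split()
        # Data rows start with the node address and end with the token.
        if cols and cols[0].count(".") == 3:
            tokens[cols[0]].append(cols[-1])

    for node, toks in tokens.items():
        print("# tokens for old node %s" % node)
        print("initial_token: %s" % ",".join(toks))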

Re: Shifting data to DCOS

Posted by Faraz Mateen <fm...@an10.io>.
Sorry for the late reply. I was trying to figure out some other approach to
it.

@Kurt - My previous cluster has 3 nodes but replication factor is 2. I am
not exactly sure how I would handle the tokens. Can you explain that a bit?

@Michael - Actually, my DC/OS cluster has an older version than my previous
cluster. However, both of them have hashes appended to their data
directories. The previous cluster is on version 3.9, while the new DC/OS
cluster is on 3.0.16.


On Fri, Apr 6, 2018 at 2:35 PM, kurt greaves <ku...@instaclustr.com> wrote:

> Without looking at the code I'd say maybe the keyspaces are displayed
> purely because the directories exist (but it seems unlikely). The process
> you should follow instead is to exclude the system keyspaces for each node
> and manually apply your schema, then upload your CFs into the correct
> directory. Note this only works when RF=#nodes, if you have more nodes you
> need to take tokens into account when restoring.
>
>
> On Fri., 6 Apr. 2018, 17:16 Affan Syed, <as...@an10.io> wrote:
>
>> Michael,
>>
>> both of the folders are with hash, so I dont think that would be an
>> issue.
>>
>> What is strange is why the tables dont show up if the keyspaces are
>> visible. Shouldnt that be a meta data that can be edited once and then be
>> visible?
>>
>> Affan
>>
>> - Affan
>>
>> On Thu, Apr 5, 2018 at 7:55 PM, Michael Shuler <mi...@pbandjelly.org>
>> wrote:
>>
>>> On 04/05/2018 09:04 AM, Faraz Mateen wrote:
>>> >
>>> > For example,  if the table is *data_main_bim_dn_10*, its data directory
>>> > is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I
>>> created
>>> > a new table with the same name through cqlsh. This resulted in creation
>>> > of another directory with a different hash i.e.
>>> > data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all
>>> data
>>> > from the former to the latter.
>>> >
>>> > Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that I
>>> > was able to access all data contents through cqlsh.
>>> >
>>> > Now, the problem is, I have around 500 tables and the method I
>>> mentioned
>>> > above is quite cumbersome. Bulkloading through sstableloader or remote
>>> > seeding are also a couple of options but they will take a lot of time.
>>> > Does anyone know an easier way to shift all my data to new setup on
>>> DC/OS?
>>>
>>> For upgrade support from older versions of C* that did not have the hash
>>> on the data directory, the table data dir can be just
>>> `data_main_bim_dn_10` without the appended hash, as in your example.
>>>
>>> Give that a quick test to see if that simplifies things for you.
>>>
>>> --
>>> Kind regards,
>>> Michael
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: user-help@cassandra.apache.org
>>>
>>>
>>


-- 
Faraz Mateen

Re: Shifting data to DCOS

Posted by kurt greaves <ku...@instaclustr.com>.
Without looking at the code, I'd say maybe the keyspaces are displayed
purely because the directories exist (but it seems unlikely). The process
you should follow instead is to exclude the system keyspaces for each node
and manually apply your schema, then upload your CFs into the correct
directory. Note this only works when RF = #nodes; if you have more nodes,
you need to take tokens into account when restoring.

On Fri., 6 Apr. 2018, 17:16 Affan Syed, <as...@an10.io> wrote:

> Michael,
>
> both of the folders are with hash, so I dont think that would be an issue.
>
> What is strange is why the tables dont show up if the keyspaces are
> visible. Shouldnt that be a meta data that can be edited once and then be
> visible?
>
> Affan
>
> - Affan
>
> On Thu, Apr 5, 2018 at 7:55 PM, Michael Shuler <mi...@pbandjelly.org>
> wrote:
>
>> On 04/05/2018 09:04 AM, Faraz Mateen wrote:
>> >
>> > For example,  if the table is *data_main_bim_dn_10*, its data directory
>> > is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I created
>> > a new table with the same name through cqlsh. This resulted in creation
>> > of another directory with a different hash i.e.
>> > data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all data
>> > from the former to the latter.
>> >
>> > Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that I
>> > was able to access all data contents through cqlsh.
>> >
>> > Now, the problem is, I have around 500 tables and the method I mentioned
>> > above is quite cumbersome. Bulkloading through sstableloader or remote
>> > seeding are also a couple of options but they will take a lot of time.
>> > Does anyone know an easier way to shift all my data to new setup on
>> DC/OS?
>>
>> For upgrade support from older versions of C* that did not have the hash
>> on the data directory, the table data dir can be just
>> `data_main_bim_dn_10` without the appended hash, as in your example.
>>
>> Give that a quick test to see if that simplifies things for you.
>>
>> --
>> Kind regards,
>> Michael
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: user-help@cassandra.apache.org
>>
>>
>

Re: Shifting data to DCOS

Posted by Affan Syed <as...@an10.io>.
Michael,

Both of the folders are named with the hash, so I don't think that would be
the issue.

What is strange is why the tables don't show up if the keyspaces are
visible. Shouldn't that be metadata that can be edited once and then become
visible?

Affan

- Affan

On Thu, Apr 5, 2018 at 7:55 PM, Michael Shuler <mi...@pbandjelly.org>
wrote:

> On 04/05/2018 09:04 AM, Faraz Mateen wrote:
> >
> > For example,  if the table is *data_main_bim_dn_10*, its data directory
> > is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I created
> > a new table with the same name through cqlsh. This resulted in creation
> > of another directory with a different hash i.e.
> > data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all data
> > from the former to the latter.
> >
> > Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that I
> > was able to access all data contents through cqlsh.
> >
> > Now, the problem is, I have around 500 tables and the method I mentioned
> > above is quite cumbersome. Bulkloading through sstableloader or remote
> > seeding are also a couple of options but they will take a lot of time.
> > Does anyone know an easier way to shift all my data to new setup on
> DC/OS?
>
> For upgrade support from older versions of C* that did not have the hash
> on the data directory, the table data dir can be just
> `data_main_bim_dn_10` without the appended hash, as in your example.
>
> Give that a quick test to see if that simplifies things for you.
>
> --
> Kind regards,
> Michael
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
>
>

Re: Shifting data to DCOS

Posted by Michael Shuler <mi...@pbandjelly.org>.
On 04/05/2018 09:04 AM, Faraz Mateen wrote:
> 
> For example,  if the table is *data_main_bim_dn_10*, its data directory
> is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I created
> a new table with the same name through cqlsh. This resulted in creation
> of another directory with a different hash i.e.
> data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all data
> from the former to the latter. 
> 
> Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that I
> was able to access all data contents through cqlsh.
> 
> Now, the problem is, I have around 500 tables and the method I mentioned
> above is quite cumbersome. Bulkloading through sstableloader or remote
> seeding are also a couple of options but they will take a lot of time.
> Does anyone know an easier way to shift all my data to new setup on DC/OS?

For upgrade support from older versions of C* that did not have the hash
on the data directory, the table data dir can be just
`data_main_bim_dn_10` without the appended hash, as in your example.

Give that a quick test to see if that simplifies things for you.
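
(If you want to try that across many tables at once, a rough sketch of the
rename is below; the keyspace path is a placeholder, and Cassandra should be
stopped on the node before renaming.)

    # Sketch: strip the "-<32-char hash>" suffix from each table directory
    # so the hash-less layout described above can be tested in bulk.
    import os
    import re

    KS_DIR = "/var/lib/cassandra/data/ks1"   # placeholder keyspace path

    for d in os.listdir(KS_DIR):
        m = re.match(r"^(.+)-[0-9a-f]{32}$", d)
        if m:
            os.rename(os.path.join(KS_DIR, d), os.path.join(KS_DIR, m.group(1)))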

-- 
Kind regards,
Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org