
Posted to user@cassandra.apache.org by Gerard Maas <ge...@gmail.com> on 2015/08/06 18:50:50 UTC

Duplicating a cluster with different # of disks

Hi,

I'm currently trying to duplicate a given keyspace on a new cluster to run
some analytics on it.

My source cluster has 3 disks and corresponding data directories (mnt,
mnt2, mnt3) but the machines in my target cluster only have 2 disks (mnt,
mnt2).

What is the correct procedure to copy the SSTable data from source
to destination in this case?

-kr, Gerard.

Re: limit the size of data type LIST

Posted by Jeff Jirsa <Je...@crowdstrike.com>.

This is not currently possible, though it has been proposed in the past and may be implemented in the future: https://issues.apache.org/jira/browse/CASSANDRA-9110

- Jeff

From:  yuankui
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, August 13, 2015 at 6:24 PM
To:  "user@cassandra.apache.org"
Subject:  Re: limit the size of data type LIST

Sorry for not making myself clear and thank you for your reply. 

--------------

I want to know if there is a way to automatically remove old items from the list on the SERVER SIDE once the size() of the list reaches a certain limit (say 1000).

The client would not need to care about this; it would just insert and get, and it would always get the latest 1000 messages of a user.

Can I do that?

On August 14, 2015, at 12:55 AM, <SE...@homedepot.com> <SE...@homedepot.com> wrote:

This sounds like something you do on the client side BEFORE you insert. Or are you wanting to limit the size of the list coming out to the client?

Sean Durity

Lead Cassandra Admin, Big Data Team

From: yuankui [mailto:kui.yuan@fraudmetrix.cn] 
Sent: Thursday, August 13, 2015 9:06 AM
To: user@cassandra.apache.org
Subject: limit the size of data type LIST

Hi friends,

I am designing a message history table:

CREATE TABLE message_history (
    user_name text PRIMARY KEY,
    time timestamp,
    message_details list<text>
);

so that I can query all of a user's messages at once via the primary key `user_name`.

But the `message_details` list may grow very long, so I want to limit its size.

Is there a way to do this?

Something like the Redis `LTRIM` operation - http://redis.readthedocs.org/en/latest/list/ltrim.html

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.

Re: limit the size of data type LIST

Posted by yuankui <ku...@fraudmetrix.cn>.

Sorry for not making myself clear and thank you for your reply.

--------------

I want to know if there is a way to automatically remove old items from the list on the SERVER SIDE once the size() of the list reaches a certain limit (say 1000).

The client would not need to care about this; it would just insert and get, and it would always get the latest 1000 messages of a user.

Can I do that?





RE: limit the size of data type LIST

Posted by SE...@homedepot.com.

This sounds like something you do on the client side BEFORE you insert. Or are you wanting to limit the size of the list coming out to the client?

Sean Durity
Lead Cassandra Admin, Big Data Team
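
A minimal sketch of the client-side approach suggested above, in Python. The helper name `append_capped` and the default limit of 1000 are illustrative assumptions, not part of any Cassandra driver. The idea is that the client reads the current list, caps it, and writes the result back, for example with `UPDATE message_history SET message_details = ? WHERE user_name = ?`.

```python
from collections import deque


def append_capped(existing, new_item, limit=1000):
    """Return the list to store after appending new_item.

    Only the newest `limit` items are kept; older ones are dropped
    from the front. `existing` is not modified.
    """
    # A deque with maxlen keeps the last `limit` items of `existing`
    # and discards the oldest item on append once it is full.
    capped = deque(existing, maxlen=limit)
    capped.append(new_item)
    return list(capped)


if __name__ == "__main__":
    history = ["m1", "m2", "m3"]
    print(append_capped(history, "m4", limit=3))
```

Note that this is a read-modify-write on the client, so two clients appending to the same user's list at the same time could overwrite each other's update.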


limit the size of data type LIST

Posted by yuankui <ku...@fraudmetrix.cn>.

Hi friends,

I am designing a message history table:

CREATE TABLE message_history (
    user_name text PRIMARY KEY,
    time timestamp,
    message_details list<text>
);

so that I can query all of a user's messages at once via the primary key `user_name`.

But the `message_details` list may grow very long, so I want to limit its size.

Is there a way to do this?

Something like the Redis `LTRIM` operation - http://redis.readthedocs.org/en/latest/list/ltrim.html

Re: Duplicating a cluster with different # of disks

Posted by Gerard Maas <ge...@gmail.com>.

Many thanks for confirming the procedure. I was doing the copy from 3->2 as
explained before. My doubt came from noticing that the total count
differed strongly between source and destination: 3M vs 150k.
But small test tables with a few hundred records all went well.

I double-checked the copy and the procedure was correct. It was a table we
had issues with in the past (a few very long rows). Maybe it is related to that?

Kr, Gerard
On Aug 6, 2015 11:00 PM, "Alain RODRIGUEZ" <ar...@gmail.com> wrote:

> I agree with Jeff, those 2 solutions should work well indeed to have
> distinct clusters (data will be fixed in time, not synchronised).
>
> It really depends on you, but basically having differing data storage
> layouts is not an issue at all in either case, as it is something that you
> can set in cassandra.yaml at the node level.
>
> C*heers,
>
> Alain
>
> 2015-08-06 22:41 GMT+02:00 Jeff Jirsa <Je...@crowdstrike.com>:
>
>> You can copy all of the sstables into any given data directory without
>> issue (keep them within the keyspace/table directories, but the
>> mnt/mnt2/mnt3 location is irrelevant).
>>
>> You can also stream them in via sstableloader if your ring topology has
>> changed (especially if tokens have moved)
>
>

Re: Duplicating a cluster with different # of disks

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

I agree with Jeff, those 2 solutions should work well indeed to have
distinct clusters (data will be fixed in time, not synchronised).

It really depends on you, but basically having differing data storage
layouts is not an issue at all in either case, as it is something that you
can set in cassandra.yaml at the node level.

C*heers,

Alain
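
The node-level setting in question is `data_file_directories` in cassandra.yaml. As a rough illustration, the Python sketch below renders that stanza for a 3-disk source node and a 2-disk target node. The helper name and the `cassandra/data` sub-path are assumptions made for the example; only the mount names (mnt, mnt2, mnt3) come from the thread.

```python
def render_data_file_directories(mount_points, subdir="cassandra/data"):
    """Render the data_file_directories stanza of cassandra.yaml.

    `mount_points` are mount names such as "mnt"; `subdir` is a
    hypothetical path below each mount where Cassandra keeps its data.
    """
    lines = ["data_file_directories:"]
    for mount in mount_points:
        lines.append(f"    - /{mount}/{subdir}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Source nodes: three disks. Target nodes: two disks.
    print(render_data_file_directories(["mnt", "mnt2", "mnt3"]))
    print(render_data_file_directories(["mnt", "mnt2"]))
```

Each node reads its own cassandra.yaml, so source and target nodes can list a different number of directories.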


Re: Duplicating a cluster with different # of disks

Posted by Jeff Jirsa <Je...@crowdstrike.com>.

You can copy all of the sstables into any given data directory without issue (keep them within the keyspace/table directories, but the mnt/mnt2/mnt3 location is irrelevant).

You can also stream them in via sstableloader if your ring topology has changed (especially if tokens have moved)
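
The copy described above can be sketched in Python as a planning step that maps every source file to a target path, keeping the keyspace/table directories and spreading SSTables over whatever data directories the target has. This is only a sketch under stated assumptions: source paths look like `<data_dir>/<keyspace>/<table>/<file>`, all component files of one SSTable share the file name up to the last hyphen, and files from one source node go to one target node. The function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def plan_sstable_copy(source_files, target_data_dirs):
    """Map each source SSTable file to a path under a target data directory.

    Components of the same SSTable (Data, Index, ...) always land in the
    same target directory; SSTables are spread round-robin over the targets.
    """
    # Group component files by (keyspace, table, SSTable name prefix).
    groups = {}
    for src in source_files:
        path = PurePosixPath(src)
        keyspace, table = path.parts[-3], path.parts[-2]
        prefix = path.name.rsplit("-", 1)[0]
        groups.setdefault((keyspace, table, prefix), []).append(path)

    plan = {}
    for i, key in enumerate(sorted(groups)):
        keyspace, table, _ = key
        target_root = PurePosixPath(target_data_dirs[i % len(target_data_dirs)])
        for path in groups[key]:
            plan[str(path)] = str(target_root / keyspace / table / path.name)
    return plan


if __name__ == "__main__":
    files = [
        "/mnt/data/ks/tbl/ks-tbl-ka-1-Data.db",
        "/mnt/data/ks/tbl/ks-tbl-ka-1-Index.db",
        "/mnt3/data/ks/tbl/ks-tbl-ka-2-Data.db",
    ]
    for src, dst in plan_sstable_copy(files, ["/mnt/data", "/mnt2/data"]).items():
        print(src, "->", dst)
```

The plan only decides destinations; the actual copy (and a restart or refresh so that Cassandra picks up the files) is left out.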


Re: Duplicating a cluster with different # of disks

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

I forgot to specify that you will obtain 2 DCs instead of 2 clusters; the
main difference is that DCs are connected through gossip and kept in sync
(make sure your clients stick to your main DC). Depending on what you
want to achieve, you might want 2 clusters or 2 DCs.


Re: Duplicating a cluster with different # of disks

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

Hi Gerard,

You should probably add a new datacenter following this procedure:
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html

You will just have to make sure to configure all the nodes of the new
datacenter to use mnt + mnt2 instead of mnt + mnt2 + mnt3.

Make sure to have enough space (disks / nodes) to handle the whole data set
of the keyspace you want to load and it should be OK.

I think that this is the easiest approach: you don't need to care about
SSTables at all, Cassandra does it for you.

C*heers,

Alain
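
Among other steps, adding a datacenter roughly comes down to making the keyspace replicate to the new DC and then running `nodetool rebuild` on each new node, naming the existing DC as the source. The Python sketch below only builds those two commands as text. The keyspace name, DC names and replication factors are hypothetical, and the helper names are made up for the example.

```python
def alter_keyspace_cql(keyspace, replicas_per_dc):
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy.

    `replicas_per_dc` maps datacenter name to replication factor.
    """
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


def rebuild_command(source_dc):
    """Build the nodetool invocation to run on each node of the new DC."""
    return ["nodetool", "rebuild", "--", source_dc]


if __name__ == "__main__":
    print(alter_keyspace_cql("my_ks", {"main": 3, "analytics": 2}))
    print(" ".join(rebuild_command("main")))
```

As noted in the follow-up message, the result is a second DC that stays in sync with the first, not an independent cluster.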
