Posted to user@cassandra.apache.org by Terje Marthinussen <tm...@gmail.com> on 2011/02/24 03:39:02 UTC

Fill disks more than 50%

Hi,

Given that you have always-increasing key values (timestamps), never
delete, and hardly ever overwrite data.

If you want to minimize rebalancing work and statically assign (new)
token ranges to new nodes as you add them, so they always get the latest
data...
Let's say you add a new node each year to handle next year's data.

In a scenario like this, could you, with 0.7, safely fill disks
significantly more than 50% and still manage things like repair/recovery of
faulty nodes?
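
To make it concrete, here is a rough sketch of the kind of static
assignment I mean, assuming an order-preserving partitioner and row keys
that start with a timestamp (the key format and years are just for
illustration):

def year_boundary_token(year):
    # With an order-preserving partitioner a node owns the key range from
    # the previous node's token (exclusive) up to its own token (inclusive),
    # so the node meant to hold year Y would take the boundary at the end
    # of Y. The yyyymmddHHMMSS prefix here is a made-up key format.
    return "%04d0101000000" % (year + 1)

# One node added per year; these would be the initial_token values to put
# in each new node's cassandra.yaml (0.7-style configuration).
for year in range(2009, 2013):
    print("node for %d -> initial_token: %s" % (year, year_boundary_token(year)))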


Regards,
Terje

Re: Fill disks more than 50%

Posted by Terje Marthinussen <tm...@gmail.com>.
> I am suggesting that you probably want to rethink your schema design,
> since partitioning by year is going to perform badly: the old servers
> are going to be nothing more than expensive tape drives.
>

You fail to see the obvious: it is precisely the fact that most of the data
is stale that makes the question interesting in the first place, and I
obviously would not have asked if doing this created an I/O throughput
problem.

Now, that said, we tested a repair on a set of nodes that were 70-80% full,
and no luck. Ran out of disk :(

Terje

Re: Fill disks more than 50%

Posted by Edward Capriolo <ed...@gmail.com>.
On Fri, Feb 25, 2011 at 7:38 AM, Terje Marthinussen
<tm...@gmail.com> wrote:
>>
>> @Thibaut Britz
>> Caveat: this assumes SimpleStrategy.
>> It works because Cassandra scans the data directories at startup and
>> then serves whatever it finds there. For a join, for example, you can
>> rsync all the data from the node below/to the right of where the new
>> node is joining, then join without bootstrap, then run cleanup on both
>> nodes. (You also have to shut down the first node so you do not have a
>> lost-write scenario in the time between the rsync and the new node's
>> startup.)
>>
>
> Rsync all the data from the node to the left/right...
> Wouldn't that mean that you need 2x the disk space to recover...?
> Terje

Terje,

In your scenario, where you are never updating, running repair becomes
less important. I have an alternative for you: I have a program I call
the "RescueRanger" that we use to range-scan all our data, find old
entries, and delete them. However, if we set that program to "read only"
mode and tell it to read at CL.ALL, it becomes a program that read-repairs
data!

This is a tradeoff. Range-scanning through all your data is not fast, but
it does not require the extra disk space. Kinda like merge sort vs.
bubble sort.
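
Roughly, the "read only" pass is nothing more than a full range scan at
CL.ALL; something like the sketch below with the pycassa client. This is
not the actual RescueRanger code: the keyspace, column family and host
names are made up, and the exact home of ConsistencyLevel and the keyword
names may differ between pycassa versions.

import pycassa

# Hypothetical names; point this at your own cluster and column family.
pool = pycassa.ConnectionPool('MyKeyspace', server_list=['node1:9160'])

# Reading at ALL makes every replica answer, and any mismatch found along
# the way gets read-repaired as a side effect of the read. (In some pycassa
# versions ConsistencyLevel lives in pycassa.cassandra.ttypes instead.)
cf = pycassa.ColumnFamily(pool, 'Events',
                          read_consistency_level=pycassa.ConsistencyLevel.ALL)

scanned = 0
for key, columns in cf.get_range(column_count=1):
    # We never write anything; the read itself does the repair work.
    scanned += 1
    if scanned % 100000 == 0:
        print("scanned %d rows" % scanned)

print("done, scanned %d rows" % scanned)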

Re: Fill disks more than 50%

Posted by Terje Marthinussen <tm...@gmail.com>.
>
>
> @Thibaut Britz
> Caveat: this assumes SimpleStrategy.
> It works because Cassandra scans the data directories at startup and
> then serves whatever it finds there. For a join, for example, you can
> rsync all the data from the node below/to the right of where the new
> node is joining, then join without bootstrap, then run cleanup on both
> nodes. (You also have to shut down the first node so you do not have a
> lost-write scenario in the time between the rsync and the new node's
> startup.)
>
>
Rsync all the data from the node to the left/right...
Wouldn't that mean that you need 2x the disk space to recover...?

Terje

Re: Fill disks more than 50%

Posted by Edward Capriolo <ed...@gmail.com>.
On Thu, Feb 24, 2011 at 4:08 AM, Thibaut Britz
<th...@trendiction.com> wrote:
> Hi,
>
> How would you use rsync instead of repair in case of a node failure?
>
> Rsync all the files in the data directories from the adjacent nodes
> (which are part of the quorum group) and then run a compaction, which
> will remove all the unneeded keys?
>
> Thanks,
> Thibaut
>
>
> On Thu, Feb 24, 2011 at 4:22 AM, Edward Capriolo <ed...@gmail.com> wrote:
>> On Wed, Feb 23, 2011 at 9:39 PM, Terje Marthinussen
>> <tm...@gmail.com> wrote:
>>> Hi,
>>> Given that you have always-increasing key values (timestamps), never
>>> delete, and hardly ever overwrite data.
>>> If you want to minimize rebalancing work and statically assign (new)
>>> token ranges to new nodes as you add them, so they always get the latest
>>> data...
>>> Let's say you add a new node each year to handle next year's data.
>>> In a scenario like this, could you, with 0.7, safely fill disks
>>> significantly more than 50% and still manage things like repair/recovery
>>> of faulty nodes?
>>>
>>> Regards,
>>> Terje
>>
>> Since all your data for a given day/month/year would sit on the same
>> server, all your servers with old data would be idle and your servers
>> with current data would be very busy. This is probably not a good way
>> to go.
>>
>> There is a ticket open for 0.8 for efficient node moves/joins. It is
>> already a lot better in 0.7. If you are really afraid of joins (which
>> you really should not be), you can join nodes using rsync if you know
>> some tricks; pretend you did not see this.
>>
>> As for the 50% statement: in the worst-case scenario, a major compaction
>> will require double the disk size of your column family. So if you have
>> more than one column family, you do NOT need 50% overhead.
>>
>
@Thibaut Britz
Caveat: this assumes SimpleStrategy.
It works because Cassandra scans the data directories at startup and then
serves whatever it finds there. For a join, for example, you can rsync all
the data from the node below/to the right of where the new node is joining,
then join without bootstrap, then run cleanup on both nodes. (You also have
to shut down the first node so you do not have a lost-write scenario in the
time between the rsync and the new node's startup.) A sketch of the steps is
below.

It does not make as much sense for repair, because the data on a node will
triple before you compact/clean it up.
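
Something along these lines, run from the new node (host names, data paths
and service commands are all made up; treat it as a sketch of the sequence
of steps, not a recipe):

import subprocess

SOURCE = "oldnode"                                  # node below/to the right of the new token
DATA_DIR = "/var/lib/cassandra/data/MyKeyspace/"    # hypothetical path

def run(cmd, host=None):
    # Run a command locally, or on another node via ssh.
    if host:
        cmd = ["ssh", host, cmd]
    else:
        cmd = cmd.split()
    print("+ %s" % cmd)
    subprocess.check_call(cmd)

# 1. Stop the source node first, so nothing is written between the rsync
#    and the moment the new node starts serving (the lost-write window).
run("nodetool -h localhost drain && sudo service cassandra stop", host=SOURCE)

# 2. Pull its SSTables over.
run("rsync -av %s:%s %s" % (SOURCE, DATA_DIR, DATA_DIR))

# 3. Start both nodes. The new node has auto_bootstrap disabled and its
#    statically assigned initial_token already set in cassandra.yaml.
run("sudo service cassandra start")
run("sudo service cassandra start", host=SOURCE)

# 4. Cleanup drops the keys that no longer belong to each node's range.
run("nodetool -h localhost cleanup")
run("nodetool -h localhost cleanup", host=SOURCE)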

@Terje
I am suggesting that you probably want to rethink your schema design:
partitioning by year is going to perform badly, since the old servers are
going to be nothing more than expensive tape drives.

Re: Fill disks more than 50%

Posted by Thibaut Britz <th...@trendiction.com>.
Hi,

How would you use rsync instead of repair in case of a node failure?

Rsync all the files in the data directories from the adjacent nodes
(which are part of the quorum group) and then run a compaction, which
will remove all the unneeded keys?

Thanks,
Thibaut


On Thu, Feb 24, 2011 at 4:22 AM, Edward Capriolo <ed...@gmail.com> wrote:
> On Wed, Feb 23, 2011 at 9:39 PM, Terje Marthinussen
> <tm...@gmail.com> wrote:
>> Hi,
>> Given that you have always-increasing key values (timestamps), never
>> delete, and hardly ever overwrite data.
>> If you want to minimize rebalancing work and statically assign (new)
>> token ranges to new nodes as you add them, so they always get the latest
>> data...
>> Let's say you add a new node each year to handle next year's data.
>> In a scenario like this, could you, with 0.7, safely fill disks
>> significantly more than 50% and still manage things like repair/recovery
>> of faulty nodes?
>>
>> Regards,
>> Terje
>
> Since all your data for a given day/month/year would sit on the same
> server, all your servers with old data would be idle and your servers
> with current data would be very busy. This is probably not a good way
> to go.
>
> There is a ticket open for 0.8 for efficient node moves/joins. It is
> already a lot better in 0.7. If you are really afraid of joins (which
> you really should not be), you can join nodes using rsync if you know
> some tricks; pretend you did not see this.
>
> As for the 50% statement: in the worst-case scenario, a major compaction
> will require double the disk size of your column family. So if you have
> more than one column family, you do NOT need 50% overhead.
>

Re: Fill disks more than 50%

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Feb 23, 2011 at 9:39 PM, Terje Marthinussen
<tm...@gmail.com> wrote:
> Hi,
> Given that you have always-increasing key values (timestamps), never
> delete, and hardly ever overwrite data.
> If you want to minimize rebalancing work and statically assign (new)
> token ranges to new nodes as you add them, so they always get the latest
> data...
> Let's say you add a new node each year to handle next year's data.
> In a scenario like this, could you, with 0.7, safely fill disks
> significantly more than 50% and still manage things like repair/recovery
> of faulty nodes?
>
> Regards,
> Terje

Since all your data for a given day/month/year would sit on the same
server, all your servers with old data would be idle and your servers
with current data would be very busy. This is probably not a good way
to go.

There is a ticket open for 0.8 for efficient node moves/joins. It is
already a lot better in 0.7. If you are really afraid of joins (which you
really should not be), you can join nodes using rsync if you know some
tricks; pretend you did not see this.

As for the 50% statement: in the worst-case scenario, a major compaction
will require double the disk size of your column family. So if you have
more than one column family, you do NOT need 50% overhead. A quick
back-of-the-envelope sketch follows.
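
For example (all sizes are made-up numbers, just to illustrate the
arithmetic):

# A major compaction rewrites one column family at a time, so the free
# space you need is roughly the size of your largest CF, not half the disk.
# All sizes below are made-up example numbers, in GB.
cf_sizes_gb = {"events_2009": 300, "events_2010": 350, "events_2011": 120}

total = sum(cf_sizes_gb.values())
headroom = max(cf_sizes_gb.values())   # worst single major compaction
disk = 1000.0                          # hypothetical 1 TB of usable space

print("data: %d GB, worst-case compaction headroom: %d GB" % (total, headroom))
print("you could fill roughly %.0f%% of the disk" % (100.0 * (disk - headroom) / disk))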