Posted to user@cassandra.apache.org by Johan Elmerfjord <je...@adobe.com> on 2012/03/15 23:05:16 UTC

1.0.8 with Leveled compaction - Possible issues

Hi, I'm testing the community-version of Cassandra 1.0.8.
We are currently on 0.8.7 in our production-setup.

We have 3 Column Families that each take between 20 and 35 GB on disk
per node (8*2 nodes total).
We would like to change to Leveled Compaction - and also try compression
to reduce the space needed for compactions.
We are running on SSD drives as latency is a key issue.

As a test I have imported one Column Family from 3 production nodes to a
3-node test cluster.
The data on the 3 nodes ranges from 19-33GB, with at least one large,
recently compacted Size-Tiered SSTable per node.

After loading this data onto the 3 test nodes and running scrub and
repair, I took a backup of the data so I have a good test set to work
on.
Then I changed to leveled compaction using cassandra-cli:

UPDATE COLUMN FAMILY TestCF1 WITH
compaction_strategy=LeveledCompactionStrategy;
I could see the change being written to the logfile on all nodes.
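
Besides the logfile, the schema can also be double-checked from
cassandra-cli, which should print the new compaction_strategy for the
CF (just a sketch - I have mostly relied on the logs myself):

show schema;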

I don't know for sure whether I need to run anything else to make
the change happen - or if I just have to wait.
My test-cluster does not receive new data.
 
For this KS & CF, on each of the nodes I have tried some or several
of: upgradesstables, scrub, compact, cleanup and repair - each task
taking between 40 minutes and 4 hours.
The exception is compact, which returns almost immediately with no
visible compactions performed.
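
For reference, the commands I ran per node looked roughly like the
following (a sketch - the host and the keyspace name TestKS are
placeholders, not our real names):

nodetool -h localhost scrub TestKS TestCF1
nodetool -h localhost upgradesstables TestKS TestCF1
nodetool -h localhost compact TestKS TestCF1
nodetool -h localhost cleanup TestKS TestCF1
nodetool -h localhost repair TestKS TestCF1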

On one node I ended up with over 30000 files with the default 5MB size
for leveled compaction; on another node it didn't look like anything had
been done and I still had a 19GB SSTable.

I then made another change.
UPDATE COLUMN FAMILY TestCF1 WITH
compaction_strategy=LeveledCompactionStrategy AND
compaction_strategy_options=[{sstable_size_in_mb: 64}];
WARNING: [{}] strategy_options syntax is deprecated, please use {}
So the [{}] syntax in the documentation is probably wrong - the command should be:
UPDATE COLUMN FAMILY TestCF1 WITH
compaction_strategy=LeveledCompactionStrategy AND
compaction_strategy_options={sstable_size_in_mb: 64};

I think that with a 64MB size we will be able to find the data in 3
searches, still only use around 700MB of temporary space while doing
compactions, and keep the number of files to ~3000 per CF.
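
The back-of-the-envelope arithmetic behind those numbers (my own
estimate, assuming a leveled compaction touches roughly 10-11 sstables
and each sstable has about 6 component files on disk):

levels:      L1 ~ 10 * 64MB = 640MB, L2 ~ 6.4GB, L3 ~ 64GB,
             so 19-33GB of data per node fits within 3 levels
compaction:  ~10-11 sstables * 64MB ~ 700MB of temporary space
file count:  33GB / 64MB ~ 520 sstables * ~6 files each ~ 3000 files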

A few days later it looks like I still have a mix of the original huge
SSTables and 5MB ones - and some nodes have 64MB files as well.
Do I need to do something special to clean this up?
I have tried another scrub/upgradesstables/cleanup - but nothing seems
to change anything.

Finally I have also tried to enable compression:
UPDATE COLUMN FAMILY TestCF1 WITH
compression_options=[{sstable_compression:SnappyCompressor,
chunk_length_kb:64}];
- which results in the same [{}] deprecation warning.
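
Presumably the same fix as for compaction_strategy_options applies here
too, i.e. dropping the square brackets (I have not verified this yet):

UPDATE COLUMN FAMILY TestCF1 WITH
compression_options={sstable_compression:SnappyCompressor,
chunk_length_kb:64};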

As you can see below, this created CompressionInfo.db files on some
nodes - but not on all.

Is there a way I can force Size-Tiered sstables to be converted into
Leveled ones - and then into compressed ones as well?
Why are the original Size-Tiered SSTables still present on testnode1 -
and when is it supposed to delete them?

Can I change back and forth between compression settings (on/off, or
chunk sizes) - and between Leveled and Size-Tiered compaction?
Is there a way to see if the node is done - or waiting for something?
When is it safe to apply another setting - does it have to complete one
reorg before moving on to the next?
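
In the meantime, the closest thing to a progress indicator I know of is
nodetool's compaction view, which lists pending and active compactions
(though I am not sure it covers everything involved in the re-leveling):

nodetool -h localhost compactionstats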

Any input or experiences of your own are warmly welcome.

Best regards, Johan


Some example directory listings below:

Some files for testnode3 (it looks like it still has the original
Size-Tiered files around, plus a mixture of compressed 64MB files and
5MB files?):

total 19G
drwxr-xr-x 3 cass cass 4.0K Mar 13 17:11 snapshots
-rw-r--r-- 1 cass cass 6.0G Mar 13 18:42 TestCF1-hc-6346-Index.db
-rw-r--r-- 1 cass cass 1.3M Mar 13 18:42 TestCF1-hc-6346-Filter.db
-rw-r--r-- 1 cass cass  13G Mar 13 18:42 TestCF1-hc-6346-Data.db
-rw-r--r-- 1 cass cass 2.4M Mar 13 18:42 TestCF1-hc-6346-CompressionInfo.db
-rw-r--r-- 1 cass cass 4.3K Mar 13 18:42 TestCF1-hc-6346-Statistics.db
-rw-r--r-- 1 cass cass 195K Mar 13 18:42 TestCF1-hc-6347-Filter.db
-rw-r--r-- 1 cass cass 4.9M Mar 13 18:42 TestCF1-hc-6347-Index.db
-rw-r--r-- 1 cass cass 9.0M Mar 13 18:42 TestCF1-hc-6347-Data.db
-rw-r--r-- 1 cass cass 4.3K Mar 13 18:42 TestCF1-hc-6347-Statistics.db
-rw-r--r-- 1 cass cass 2.0K Mar 13 18:42 TestCF1-hc-6347-CompressionInfo.db
-rw-r--r-- 1 cass cass  11K Mar 13 18:43 TestCF1-hc-6351-CompressionInfo.db
-rw-r--r-- 1 cass cass  52M Mar 13 18:43 TestCF1-hc-6351-Data.db
-rw-r--r-- 1 cass cass 1.1M Mar 13 18:43 TestCF1-hc-6351-Filter.db
-rw-r--r-- 1 cass cass  28M Mar 13 18:43 TestCF1-hc-6351-Index.db
-rw-r--r-- 1 cass cass 4.3K Mar 13 18:43 TestCF1-hc-6351-Statistics.db
-rw-r--r-- 1 cass cass  401 Mar 13 18:43 TestCF1.json
-rw-r--r-- 1 cass cass  950 Mar 13 18:43 TestCF1-hc-6350-CompressionInfo.db
-rw-r--r-- 1 cass cass 4.3M Mar 13 18:43 TestCF1-hc-6350-Data.db
-rw-r--r-- 1 cass cass  93K Mar 13 18:43 TestCF1-hc-6350-Filter.db
-rw-r--r-- 1 cass cass 2.3M Mar 13 18:43 TestCF1-hc-6350-Index.db
-rw-r--r-- 1 cass cass 4.3K Mar 13 18:43 TestCF1-hc-6350-Statistics.db
-rw-r--r-- 1 cass cass  400 Mar 13 18:43 TestCF1-old.json



Some DB files ordered by size for testnode1:
No compressed files, but it has the original Size-Tiered files as well
as 5MB Leveled compaction files - and no 64MB ones.

total 83G
-rw-r--r-- 1 cass cass   33G Mar 13 07:14 TestCF1-hc-33504-Data.db
-rw-r--r-- 1 cass cass   11G Mar 13 07:14 TestCF1-hc-33504-Index.db
-rw-r--r-- 1 cass cass  407M Mar 13 07:14 TestCF1-hc-33504-Filter.db
-rw-r--r-- 1 cass cass  5.1M Mar 13 05:27 TestCF1-hc-33338-Data.db
-rw-r--r-- 1 cass cass  5.1M Mar 13 08:54 TestCF1-hc-38997-Data.db
-rw-r--r-- 1 cass cass  5.1M Mar 13 07:15 TestCF1-hc-33513-Data.db



-- 
  
Johan Elmerfjord | Sr. Systems Administration/Mgr, EMEA | Adobe Systems,
Product Technical Operations | p. +45 3231 6008 | x86008 | cell. +46 735
101 444 | Jelmerfj@adobe.com 


Re: 1.0.8 with Leveled compaction - Possible issues

Posted by Watanabe Maki <wa...@gmail.com>.
The Cassandra team has released a new version every month for the last
half year, so I anticipate they will release 1.0.9 before April. Just my
forecast :-)

maki



Re: 1.0.8 with Leveled compaction - Possible issues

Posted by Johan Elmerfjord <je...@adobe.com>.
Perfect - this helped a lot, and I can confirm that I have run into
the same issue as described in:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201203.mbox/%3CCALqbeQbQ=d-hORVhA-LHOo_a5j46fQrsZMm+OQgfkgR=4RRQJQ@mail.gmail.com%3E

It goes down when it tries to move files up to a higher level that is
out of bounds.

It is also nice that I could get an overview of the levels by looking
in the .json file.
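
For anyone else looking, pretty-printing the manifest makes the
per-level grouping easier to read - a quick sketch, where the data path
and keyspace name TestKS are placeholders for your own setup:

python -mjson.tool /var/lib/cassandra/data/TestKS/TestCF1.json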

Any timeframe on when we can expect 1.0.9 to be released?

/Johan


-- 
  
Johan Elmerfjord | Sr. Systems Administration/Mgr, EMEA | Adobe Systems,
Product Technical Operations | p. +45 3231 6008 | x86008 | cell. +46 735
101 444 | Jelmerfj@adobe.com 


Re: 1.0.8 with Leveled compaction - Possible issues

Posted by Watanabe Maki <wa...@gmail.com>.
Update column family with the LCS option + upgradesstables should convert all of your sstables.
Set the log4j config:
org.apache.cassandra.db.compaction=DEBUG
in conf/log4j-server.properties and retry your procedure to find out what is happening.
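
In log4j properties form that would be something like the following line added to conf/log4j-server.properties (assuming the stock file):

log4j.logger.org.apache.cassandra.db.compaction=DEBUG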

maki



Re: 1.0.8 with Leveled compaction - Possible issues

Posted by Thomas van Neerijnen <to...@bossastudios.com>.
Heya

I'd suggest staying away from Leveled Compaction until 1.0.9.
For the why, see this great explanation I got from Maki Watanabe on the
list:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201203.mbox/%3CCALqbeQbQ=d-hORVhA-LHOo_a5j46fQrsZMm+OQgfkgR=4RRQJQ@mail.gmail.com%3E
Keep an eye on that thread - I'm busy testing one of his suggestions and
I'll post back with the results soon.

My understanding is that after a change in compaction or compression, the
existing sstables keep the old settings until you run upgradesstables on
all the nodes; only newly written sstables get the new format. Obviously
this compounds the issue I mentioned above though.
Be warned, upgradesstables can take a long time, so maybe keep an eye on
the number of files at around vs. over 5MB to get an idea of progress (a
rough way to do that is sketched below). Maybe someone else knows a better
way?
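
Something like this gives a rough count - a sketch only, with the data
path, keyspace and CF names as placeholders to adjust for your own setup:

# leveled-sized data files (around 5MB or smaller)
find /var/lib/cassandra/data/TestKS -name 'TestCF1-*-Data.db' -size -6M | wc -l
# data files still bigger than that
find /var/lib/cassandra/data/TestKS -name 'TestCF1-*-Data.db' -size +6M | wc -l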

You can change back and forth between compression and compaction options
quite safely, but again you need an upgradesstables to rewrite the
current sstables with the new settings.

In my experience I've safely applied compression and leveled compaction to
the same CF at the same time without issue, so I guess it's OK :)
