Posted to user@cassandra.apache.org by Vicent Llongo <vi...@gmail.com> on 2014/04/06 13:48:00 UTC

Timeseries with TTL

Hi there,

I have this table where I'm inserting timeseries values with a TTL of
86400*7 seconds (1 week):

CREATE TABLE metrics_5min (
  object_id varchar,
  metric varchar,
  ts timestamp,
  val double,
  PRIMARY KEY ((object_id, metric), ts)
)
WITH gc_grace_seconds = 86400
AND compaction = {'class': 'LeveledCompactionStrategy',
'sstable_size_in_mb' : 100};


With this structure, it's clear that once data has been inserted for a
week, new columns will expire every 5 minutes from then on. Because of
that, I've noticed that this table is being compacted every 5 minutes.
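
To make the timing concrete, here is a minimal Python sketch of the
arithmetic implied above. It assumes one point per (object_id, metric)
partition every 5 minutes; the variable names and the sample write time are
illustrative only and are not part of the table or the application.

```python
from datetime import datetime, timedelta, timezone

TTL_SECONDS = 86400 * 7          # TTL from the post: one week
WRITE_INTERVAL_SECONDS = 5 * 60  # assumed: one point per partition every 5 minutes

# Number of unexpired points a partition holds once it has been written for a week.
live_points_per_partition = TTL_SECONDS // WRITE_INTERVAL_SECONDS


def expires_at(written_at: datetime) -> datetime:
    """Return the moment a value written at `written_at` reaches its TTL."""
    return written_at + timedelta(seconds=TTL_SECONDS)


# Illustrative write time (hypothetical, not taken from the thread).
sample_write = datetime(2014, 3, 30, 13, 45, tzinfo=timezone.utc)

print(TTL_SECONDS)                # 604800
print(live_points_per_partition)  # 2016
print(expires_at(sample_write))   # 2014-04-06 13:45:00+00:00
```

So after the first week, each 5-minute write is matched by one value per
partition reaching its TTL.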

Is that a good pattern for Cassandra? Are there any compaction settings I
should tune?

Thanks!

Re: Timeseries with TTL

Posted by "Laing, Michael" <mi...@nytimes.com>.
Since you are using LeveledCompactionStrategy there is no major/minor
compaction - just compaction.

Leveled compaction does more work - your logs don't look unreasonable to me
- the real question is whether your nodes can keep up with the I/O. SSDs
work best.

BTW, if you never delete and only expire your values with a constant TTL,
you can set gc_grace_seconds = 0 and forget about periodic repair of the
table, saving some space, I/O, CPU, and an operational step.

If your nodes cannot keep up with the I/O, switch to
SizeTieredCompactionStrategy and monitor read response times. Or add SSDs.

In my experience, for smallish nodes running C* 2 without SSDs,
LeveledCompactionStrategy can cause the disk cache to churn, reducing read
performance substantially. So watch out for that.
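
The two suggestions above could look like the following CQL, shown here as
Python strings for illustration. This is a sketch only: the table name
comes from the original post, the statements themselves are not taken from
this thread, and actually running them needs a driver session (for example
`session.execute(...)` with the DataStax Python driver), which is assumed
and not shown. Setting gc_grace_seconds to 0 only applies under the
condition stated above (no explicit deletes, constant TTL).

```python
TABLE = "metrics_5min"  # table name from the original post

# Suggestion 1: no explicit deletes and a constant TTL, so drop the grace period.
SET_GC_GRACE_ZERO = f"ALTER TABLE {TABLE} WITH gc_grace_seconds = 0;"

# Suggestion 2: fall back to size-tiered compaction if the nodes cannot keep up.
# The second literal is a plain string so the CQL map braces stay literal.
USE_SIZE_TIERED = (
    f"ALTER TABLE {TABLE} "
    "WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"
)

for statement in (SET_GC_GRACE_ZERO, USE_SIZE_TIERED):
    print(statement)
```

Either change is worth trying on its own, with read latency watched before
and after.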

Good luck,

Michael



Re: Timeseries with TTL

Posted by Vicent Llongo <vi...@gmail.com>.
Hi,

Most of the queries to that table are just getting a range of values for a
metric:
SELECT val FROM metrics_5min WHERE object_id = ? AND metric = ? AND ts >= ?
AND ts <= ?

I'm not sure from the logs what kind of compactions they are. This is what
I see in system.log (grepping for that specific table):

...
INFO [CompactionExecutor:742] 2014-04-06 13:30:11,223 CompactionTask.java
(line 105) Compacting
[SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14991-Data.db'),
SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14990-Data.db')]
INFO [CompactionExecutor:753] 2014-04-06 13:35:22,495 CompactionTask.java
(line 105) Compacting
[SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14992-Data.db'),
SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14993-Data.db')]
INFO [CompactionExecutor:770] 2014-04-06 13:41:09,146 CompactionTask.java
(line 105) Compacting
[SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14995-Data.db'),
SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14994-Data.db')]
INFO [CompactionExecutor:783] 2014-04-06 13:46:21,250 CompactionTask.java
(line 105) Compacting
[SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14996-Data.db'),
SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14997-Data.db')]
INFO [CompactionExecutor:798] 2014-04-06 13:51:28,369 CompactionTask.java
(line 105) Compacting
[SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14998-Data.db'),
SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14999-Data.db')]
INFO [CompactionExecutor:816] 2014-04-06 13:57:17,585 CompactionTask.java
(line 105) Compacting
[SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-15000-Data.db'),
SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-15001-Data.db')]
...

As you can see, there's a compaction roughly every 5 minutes.
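
As a rough check of that interval, the timestamps can be pulled out of the
log and differenced. The Python sketch below is only an illustration: the
log text is an abbreviated copy of the entries above (SSTable paths left
out), and the function and variable names are made up for this example.

```python
import re
from datetime import datetime
from typing import List

# Abbreviated copies of the log entries above: only the prefix that carries
# the timestamp is kept; the SSTableReader paths are left out.
LOG = """
INFO [CompactionExecutor:742] 2014-04-06 13:30:11,223 CompactionTask.java
INFO [CompactionExecutor:753] 2014-04-06 13:35:22,495 CompactionTask.java
INFO [CompactionExecutor:770] 2014-04-06 13:41:09,146 CompactionTask.java
INFO [CompactionExecutor:783] 2014-04-06 13:46:21,250 CompactionTask.java
INFO [CompactionExecutor:798] 2014-04-06 13:51:28,369 CompactionTask.java
INFO [CompactionExecutor:816] 2014-04-06 13:57:17,585 CompactionTask.java
"""

STAMP = re.compile(
    r"CompactionExecutor:\d+\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
)


def compaction_gaps(log_text: str) -> List[float]:
    """Return the seconds elapsed between consecutive compaction entries."""
    times = [
        datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        for stamp in STAMP.findall(log_text)
    ]
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(times, times[1:])
    ]


gaps = compaction_gaps(LOG)
mean_gap = sum(gaps) / len(gaps)
print([round(g) for g in gaps])  # [311, 347, 312, 307, 349]
print(round(mean_gap))           # 325
```

For these six entries the gaps range from about 5m07s to 5m49s, averaging
roughly 5m25s.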





Re: Timeseries with TTL

Posted by Sergey Murylev <se...@gmail.com>.
Hi Vicent,

> Is that a good pattern for Cassandra? Is there some compaction tunings
> I should take into account?
Actually it depends on how you use Cassandra :). If you use it as
key-value storage, TTL works fine. But if you run rather complex CQL
queries against this table, I'm not sure it would be good.

> With this structure is obvious that after one week inserting data,
> from that moment there's gonna be new expired columns every 5 minutes
> in that table. Because of that I've noticed that this table is being
> compacted every 5 minutes.
Compaction isn't triggered when a column expires. It is triggered by the
gc_grace_seconds timeout and according to the compaction strategy. You can
find a more detailed description of LeveledCompactionStrategy in the
following article: Leveled compaction in Cassandra
<http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra>.

There are two types of compaction, minor and major. Which kind of
compaction do you see, and how did you come to the conclusion that
compaction is triggered every 5 minutes? If you see major compactions,
that is a very bad situation; otherwise it is the normal case.

--
Thanks,
Sergey

