Posted to commits@cassandra.apache.org by "Jeff Jirsa (JIRA)" <ji...@apache.org> on 2018/02/14 08:25:00 UTC
[jira] [Commented] (CASSANDRA-14229) Separate data drive for Index.db files

    [ https://issues.apache.org/jira/browse/CASSANDRA-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363636#comment-16363636 ] 

Jeff Jirsa commented on CASSANDRA-14229:
----------------------------------------

This sort of dovetails with CASSANDRA-8460: similar purpose, different selection criteria, but it could potentially use some of the same concepts.


> Separate data drive for Index.db files
> --------------------------------------
>
>                 Key: CASSANDRA-14229
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14229
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Local Write-Read Paths
>            Reporter: Dan Kinder
>            Priority: Minor
>
> For datasets with an active set of keys that well exceeds RAM, it would be quite useful to be able to put certain sstable files (e.g. *-Index.db) on a separate, faster drive (or drives) than the data, e.g. indexes on SSD and data on HDD. This is particularly valuable when keys are much smaller than values. Also, as RAM continues to get more expensive, users who currently optimize by having large key caches may not need to buy as much of it.
> Our use case is a large dataset like this one. Storing all the data on SSD is cost-prohibitive, and the reads are extremely random (effectively every key is in the active set), so we don't have enough RAM to cache it. (I did try using a massive key cache, 64GB, and was seeing strange behavior anyway... the irqbalancer process pegged the CPU and the whole thing badly underperformed. An investigation for another day.)
> At the moment our only option is to buy enough HDDs to handle 2 seeks per read: 1 for the index and 1 for the data. Having the indexes on SSD would speed this up considerably, and in practice would require us to purchase only a small number of SSDs and about half as many HDDs.
> One user suggested lvmcache, which could work. I'd like to hear whether it would really work optimally: whether lvmcache would keep the right blocks on the faster volume, and how reliable it is at the task.
> Note: I asked about this on the mailing list and it was suggested I create a JIRA.
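
The drive-count reasoning in the description above (two HDD seeks per read today, one if the Index.db files sit on SSD) can be checked with back-of-the-envelope arithmetic. The following is a minimal sketch, not anything from the issue itself: the function name hdds_needed, the read rate, and the per-drive seek rate are all hypothetical figures chosen for illustration, and caching is ignored entirely.

```python
import math


def hdds_needed(reads_per_sec, seeks_per_read, seeks_per_sec_per_hdd):
    """Estimate how many HDDs are needed to sustain a purely random read load.

    Assumes no caching and that seeks spread evenly across drives.
    """
    total_seeks = reads_per_sec * seeks_per_read
    return math.ceil(total_seeks / seeks_per_sec_per_hdd)


# Hypothetical workload figures, chosen only for illustration.
READS_PER_SEC = 12_000
SEEKS_PER_SEC_PER_HDD = 100

# Today: one HDD seek for the Index.db lookup plus one for the data read.
hdds_all_on_hdd = hdds_needed(READS_PER_SEC, 2, SEEKS_PER_SEC_PER_HDD)

# With Index.db on SSD: only the data read hits the HDDs.
hdds_index_on_ssd = hdds_needed(READS_PER_SEC, 1, SEEKS_PER_SEC_PER_HDD)

print(hdds_all_on_hdd, hdds_index_on_ssd)
```

Under these assumed numbers the HDD count halves when index lookups move off the spinning disks, which matches the "about half" estimate in the description. Real sizing would also depend on cache hit rates and capacity needs.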



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org