Posted to user@cassandra.apache.org by Justin Sanciangco <js...@blizzard.com> on 2018/01/06 01:41:21 UTC

NVMe SSD benchmarking with Cassandra

Hello,

I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad performance when my workload exceeds the memory size. What mount settings should be used for NVMe drives? Right now the SSD is formatted as XFS using the noop scheduler. Are there any additional mount options that should be used? Any specific kernel parameters that should be set in order to make the best use of the PCIe NVMe SSD? Your insight would be greatly appreciated.
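
For reference, the settings currently in effect can be inspected with standard tools; a quick sketch, where nvme0n1 and /data are placeholder names for the device and its mount point:

# Active I/O scheduler for the device (the bracketed entry is the one in use)
cat /sys/block/nvme0n1/queue/scheduler

# Current readahead, in 512-byte sectors
blockdev --getra /dev/nvme0n1

# Mount options actually in effect for the data volume
findmnt -no OPTIONS /data

# XFS geometry of the data file system
xfs_info /data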

Thank you,
Justin Sanciangco

Re: NVMe SSD benchmarking with Cassandra

Posted by Jeff Jirsa <jj...@gmail.com>.
Can you quantify very bad performance? 

-- 
Jeff Jirsa


> On Jan 5, 2018, at 5:41 PM, Justin Sanciangco <js...@blizzard.com> wrote:
> 
> Hello,
>  
> I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad performance when my workload exceeds the memory size. What mount settings for NVMe should be used? Right now the SSD is formatted as XFS using noop scheduler. Are there any additional mount options that should be used? Any specific kernel parameters that should set in order to make best use of the PCIe NVMe SSD? Your insight would be well appreciated.
>  
> Thank you,
> Justin Sanciangco

Re: NVMe SSD benchmarking with Cassandra

Posted by Matija Gobec <ma...@gmail.com>.
Justin,

NVMe drives have their own I/O queueing mechanism, and there is a huge
performance difference versus the legacy Linux block queue.
Next to a properly configured file system and scheduler, try setting
"scsi_mod.use_blk_mq=1" on the kernel command line in GRUB.
If you are looking for the BFQ scheduler, it's probably built as a module, so
you will need to load it.
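
For anyone trying this, a minimal sketch of how that parameter is typically added on a GRUB-based distro (file path and update command assume Debian/Ubuntu; RHEL-family systems use grub2-mkconfig; the device name below is a placeholder):

# In /etc/default/grub, append the parameter to the existing kernel command line
GRUB_CMDLINE_LINUX="... scsi_mod.use_blk_mq=1"

# Regenerate the GRUB config and reboot for it to take effect
update-grub

# If BFQ is built as a module, load it and select it for the device
modprobe bfq
echo bfq > /sys/block/nvme0n1/queue/scheduler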

Best,
Matija

On Tue, Jan 9, 2018 at 1:17 AM, Nate McCall <na...@thelastpickle.com> wrote:

>
>>
>>
>> In regards to setting read ahead, how is this set for nvme drives? Also,
>> below is our compression settings for the table… It’s the same as our tests
>> that we are doing against SAS SSDs so I don’t think the compression
>> settings would be the issue…
>>
>>
>>
>
> Check blockdev --report between the old and the new servers to see if
> there is a difference. Are there other deltas in the disk layouts between
> the old and new servers (ie. LVM, mdadm, etc.)?
>
> You can control read ahead via 'blockdev --setra' or via poking the
> kernel: /sys/block/[YOUR DRIVE]/queue/read_ahead_kb
>
> In both cases, changes are instantaneous so you can do it on a canary and
> monitor for effect.
>
> Also, i'd be curious to know (since you have this benchmark setup) if you
> got the degradation you are currently seeing if you set concurrent_reads
> and concurrent_writes back to their defaults.
>
>
> --
> -----------------
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>

Re: NVMe SSD benchmarking with Cassandra

Posted by Nate McCall <na...@thelastpickle.com>.
>
>
>
>
> In regards to setting read ahead, how is this set for nvme drives? Also,
> below is our compression settings for the table… It’s the same as our tests
> that we are doing against SAS SSDs so I don’t think the compression
> settings would be the issue…
>
>
>

Check blockdev --report between the old and the new servers to see if there
is a difference. Are there other deltas in the disk layouts between the old
and new servers (ie. LVM, mdadm, etc.)?

You can control read ahead via 'blockdev --setra' or via poking the kernel:
/sys/block/[YOUR DRIVE]/queue/read_ahead_kb

In both cases, changes are instantaneous so you can do it on a canary and
monitor for effect.
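
Concretely, a sketch of both approaches (nvme0n1 is a placeholder device name; blockdev takes the value in 512-byte sectors, the sysfs knob takes kilobytes):

# Show the RA column (readahead, in 512-byte sectors) for all block devices
blockdev --report

# Set readahead to 4 KB (8 x 512-byte sectors) on the NVMe device
blockdev --setra 8 /dev/nvme0n1

# Equivalent via sysfs, in kilobytes
echo 4 > /sys/block/nvme0n1/queue/read_ahead_kb

# Neither change survives a reboot; persist it via a udev rule or an init script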

Also, i'd be curious to know (since you have this benchmark setup) if you
got the degradation you are currently seeing if you set concurrent_reads
and concurrent_writes back to their defaults.


-- 
-----------------
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com

RE: NVMe SSD benchmarking with Cassandra

Posted by Justin Sanciangco <js...@blizzard.com>.
Hi Jeff,

Regarding read ahead, how is this set for NVMe drives? Also, below are our compression settings for the table… They are the same as in our tests against SAS SSDs, so I don’t think the compression settings would be the issue…

CREATE KEYSPACE ycsb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true;

CREATE TABLE ycsb.usertable (
    y_id text PRIMARY KEY,
    field0 text,
    field1 text,
    field2 text,
    field3 text,
    field4 text,
    field5 text,
    field6 text,
    field7 text,
    field8 text,
    field9 text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
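
For what it's worth, applying the 4 KB-chunk advice given earlier in the thread (which matters mostly for read-heavy workloads) would look something like the following, using the table above and run through cqlsh; existing SSTables keep the old chunk size until they are rewritten:

cqlsh -e "ALTER TABLE ycsb.usertable WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'};"
nodetool upgradesstables -a ycsb usertable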

Below is the throughput (ops/sec) output from the YCSB benchmark; note the sharp drop in throughput starting around the 130-second mark...

DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
2018-01-08 21:50:49:100 10 sec: 1048634 operations; 104863.4 current ops/sec; est completion in 15 minutes [INSERT: Count=1048634, Max=291071, Min=194, Avg=417.22, 90=463, 99=947, 99.9=5531, 99.99=136831]
2018-01-08 21:50:59:100 20 sec: 2159133 operations; 111049.9 current ops/sec; est completion in 15 minutes [INSERT: Count=1110545, Max=409087, Min=194, Avg=434.17, 90=450, 99=612, 99.9=3409, 99.99=294911]
2018-01-08 21:51:09:101 30 sec: 3092963 operations; 93383 current ops/sec; est completion in 15 minutes [INSERT: Count=933938, Max=460287, Min=193, Avg=511.4, 90=470, 99=750, 99.9=6055, 99.99=429823]
2018-01-08 21:51:19:100 40 sec: 4153712 operations; 106074.9 current ops/sec; est completion in 15 minutes [INSERT: Count=1060595, Max=388095, Min=194, Avg=434.08, 90=457, 99=604, 99.9=3261, 99.99=335103]
2018-01-08 21:51:29:100 50 sec: 5165150 operations; 101143.8 current ops/sec; est completion in 15 minutes [INSERT: Count=1011537, Max=419839, Min=189, Avg=488.41, 90=462, 99=666, 99.9=4057, 99.99=397823]
2018-01-08 21:51:39:100 60 sec: 6151282 operations; 98613.2 current ops/sec; est completion in 15 minutes [INSERT: Count=986033, Max=408575, Min=196, Avg=474.68, 90=467, 99=671, 99.9=5463, 99.99=375807]
2018-01-08 21:51:49:100 70 sec: 7171184 operations; 101990.2 current ops/sec; est completion in 15 minutes [INSERT: Count=1019962, Max=406783, Min=189, Avg=477.11, 90=468, 99=725, 99.9=4855, 99.99=364031]
2018-01-08 21:51:59:100 80 sec: 8154478 operations; 98329.4 current ops/sec; est completion in 15 minutes [INSERT: Count=983234, Max=391423, Min=188, Avg=473.52, 90=465, 99=653, 99.9=4751, 99.99=346623]
2018-01-08 21:52:09:100 90 sec: 9204270 operations; 104979.2 current ops/sec; est completion in 14 minutes [INSERT: Count=1049855, Max=366335, Min=194, Avg=465.83, 90=466, 99=690, 99.9=4207, 99.99=347391]
2018-01-08 21:52:19:100 100 sec: 10191251 operations; 98698.1 current ops/sec; est completion in 14 minutes [INSERT: Count=986982, Max=337663, Min=191, Avg=483.67, 90=466, 99=707, 99.9=5495, 99.99=323583]
2018-01-08 21:52:29:100 110 sec: 11118897 operations; 92764.6 current ops/sec; est completion in 14 minutes [INSERT: Count=927649, Max=324607, Min=195, Avg=514.77, 90=490, 99=798, 99.9=7939, 99.99=314111]
2018-01-08 21:52:39:100 120 sec: 12106226 operations; 98732.9 current ops/sec; est completion in 14 minutes [INSERT: Count=987327, Max=327423, Min=191, Avg=483.53, 90=475, 99=749, 99.9=6303, 99.99=291583]
2018-01-08 21:52:49:100 130 sec: 12406781 operations; 30055.5 current ops/sec; est completion in 15 minutes [INSERT: Count=300545, Max=2267135, Min=195, Avg=1594.21, 90=551, 99=1412, 99.9=268031, 99.99=2059263]
2018-01-08 21:52:59:100 140 sec: 12455737 operations; 4895.6 current ops/sec; est completion in 16 minutes [INSERT: Count=48901, Max=2637823, Min=208, Avg=8719.47, 90=570, 99=1775, 99.9=2435071, 99.99=2637823]
2018-01-08 21:53:09:100 150 sec: 12545132 operations; 8939.5 current ops/sec; est completion in 17 minutes [INSERT: Count=89395, Max=2040831, Min=196, Avg=5236.29, 90=603, 99=3103, 99.9=1419263, 99.99=2039807]
2018-01-08 21:53:19:100 160 sec: 12713856 operations; 16872.4 current ops/sec; est completion in 18 minutes [INSERT: Count=168724, Max=3260415, Min=201, Avg=3212.66, 90=505, 99=825, 99.9=1442815, 99.99=3256319]
2018-01-08 21:53:29:100 170 sec: 13014136 operations; 30028 current ops/sec; est completion in 18 minutes [INSERT: Count=300280, Max=3291135, Min=195, Avg=1398.45, 90=486, 99=722, 99.9=200575, 99.99=2809855]
2018-01-08 21:53:39:100 180 sec: 13212312 operations; 19817.6 current ops/sec; est completion in 19 minutes [INSERT: Count=198176, Max=1838079, Min=196, Avg=2409.91, 90=524, 99=841, 99.9=612863, 99.99=1628159]
2018-01-08 21:53:49:100 190 sec: 13498836 operations; 28652.4 current ops/sec; est completion in 20 minutes [INSERT: Count=286524, Max=2865151, Min=195, Avg=1851.54, 90=513, 99=824, 99.9=402175, 99.99=2654207]
2018-01-08 21:53:59:100 200 sec: 13616476 operations; 11764 current ops/sec; est completion in 21 minutes [INSERT: Count=117640, Max=1835007, Min=198, Avg=4156.37, 90=555, 99=1461, 99.9=1234943, 99.99=1829887]
2018-01-08 21:54:09:100 210 sec: 13810240 operations; 19376.4 current ops/sec; est completion in 21 minutes [INSERT: Count=193764, Max=1638399, Min=196, Avg=2159.51, 90=528, 99=1352, 99.9=814591, 99.99=1637375]
2018-01-08 21:54:19:100 220 sec: 14052024 operations; 24178.4 current ops/sec; est completion in 22 minutes [INSERT: Count=241784, Max=3465215, Min=192, Avg=2111.23, 90=479, 99=847, 99.9=221183, 99.99=2643967]
2018-01-08 21:54:29:100 230 sec: 14349241 operations; 29721.7 current ops/sec; est completion in 22 minutes [INSERT: Count=297272, Max=1814527, Min=197, Avg=1722.5, 90=541, 99=1400, 99.9=418815, 99.99=1233919]
2018-01-08 21:54:39:100 240 sec: 14495872 operations; 14663.1 current ops/sec; est completion in 23 minutes [INSERT: Count=146576, Max=2435071, Min=195, Avg=2692.02, 90=509, 99=881, 99.9=1121279, 99.99=2032639]
2018-01-08 21:54:49:100 250 sec: 14581928 operations; 8605.6 current ops/sec; est completion in 24 minutes [INSERT: Count=86056, Max=3651583, Min=203, Avg=6109.73, 90=578, 99=979, 99.9=1354751, 99.99=3651583]
2018-01-08 21:54:59:100 260 sec: 14647360 operations; 6543.2 current ops/sec; est completion in 25 minutes [INSERT: Count=65432, Max=2424831, Min=204, Avg=6781.02, 90=572, 99=2071, 99.9=1839103, 99.99=2422783]
2018-01-08 21:55:09:100 270 sec: 14688500 operations; 4114 current ops/sec; est completion in 26 minutes [INSERT: Count=41140, Max=3887103, Min=217, Avg=12254.66, 90=599, 99=6423, 99.9=2451455, 99.99=3678207]
2018-01-08 21:55:19:100 280 sec: 15060816 operations; 37231.6 current ops/sec; est completion in 26 minutes [INSERT: Count=372316, Max=2234367, Min=190, Avg=1347.23, 90=493, 99=833, 99.9=410111, 99.99=1830911]
2018-01-08 21:55:29:100 290 sec: 15148256 operations; 8744 current ops/sec; est completion in 27 minutes [INSERT: Count=87440, Max=2453503, Min=203, Avg=4990.8, 90=532, 99=820, 99.9=1429503, 99.99=2453503]
2018-01-08 21:55:39:100 300 sec: 15452601 operations; 30434.5 current ops/sec; est completion in 27 minutes [INSERT: Count=304345, Max=2049023, Min=191, Avg=1774.37, 90=497, 99=762, 99.9=400383, 99.99=2013183]
2018-01-08 21:55:49:100 310 sec: 15522064 operations; 6946.3 current ops/sec; est completion in 28 minutes [INSERT: Count=69463, Max=2836479, Min=209, Avg=6808.93, 90=617, 99=187519, 99.9=1024511, 99.99=2433023]
2018-01-08 21:55:59:100 320 sec: 15589351 operations; 6728.7 current ops/sec; est completion in 28 minutes [INSERT: Count=67367, Max=2637823, Min=200, Avg=7380.59, 90=574, 99=1152, 99.9=2209791, 99.99=2637823]
2018-01-08 21:56:09:100 330 sec: 15691979 operations; 10262.8 current ops/sec; est completion in 29 minutes [INSERT: Count=102601, Max=3438591, Min=205, Avg=4675.01, 90=560, 99=1179, 99.9=995839, 99.99=3438591]
2018-01-08 21:56:19:100 340 sec: 15762632 operations; 7065.3 current ops/sec; est completion in 30 minutes [INSERT: Count=70669, Max=3080191, Min=205, Avg=6789.43, 90=570, 99=1582, 99.9=2453503, 99.99=3078143]
2018-01-08 21:56:29:100 350 sec: 15864184 operations; 10155.2 current ops/sec; est completion in 30 minutes [INSERT: Count=101483, Max=2232319, Min=195, Avg=3720.85, 90=550, 99=967, 99.9=1419263, 99.99=1840127]
2018-01-08 21:56:39:101 360 sec: 15884476 operations; 2029.2 current ops/sec; est completion in 31 minutes [INSERT: Count=20292, Max=3004415, Min=234, Avg=24097.18, 90=669, 99=1005055, 99.9=2998271, 99.99=3004415]
2018-01-08 21:56:49:100 370 sec: 15904064 operations; 1958.8 current ops/sec; est completion in 32 minutes [INSERT: Count=19588, Max=3647487, Min=250, Avg=28360.33, 90=676, 99=1220607, 99.9=3485695, 99.99=3643391]
2018-01-08 21:56:59:100 380 sec: 15922500 operations; 1843.6 current ops/sec; est completion in 33 minutes [INSERT: Count=18436, Max=3223551, Min=230, Avg=25792.42, 90=675, 99=814079, 99.9=3215359, 99.99=3223551]
2018-01-08 21:57:09:100 390 sec: 15958808 operations; 3630.8 current ops/sec; est completion in 34 minutes [INSERT: Count=36308, Max=3637247, Min=211, Avg=13116.93, 90=631, 99=163967, 99.9=3219455, 99.99=3635199]
2018-01-08 21:57:19:100 400 sec: 16037208 operations; 7840 current ops/sec; est completion in 34 minutes [INSERT: Count=78400, Max=3844095, Min=205, Avg=4716.95, 90=533, 99=875, 99.9=1020415, 99.99=3839999]
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)

Any insight would be very helpful.

Thank you,
Justin Sanciangco


From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Friday, January 5, 2018 5:50 PM
To: user@cassandra.apache.org
Subject: Re: NVMe SSD benchmarking with Cassandra

Second the note about compression chunk size in particular.
--
Jeff Jirsa


On Jan 5, 2018, at 5:48 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
Generally speaking, disable readahead.  After that, it's very likely the issue isn't in the disk settings you're using, but is actually in your Cassandra config or the data model.  How are you measuring things?  Are you saturating your disks?  What resource is your bottleneck?

*Every* single time I’ve handled a question like this, without exception, it ends up being a mix of incorrect compression settings (use 4K at most), some crazy readahead setting like 1MB, and terrible JVM settings that are the bulk of the problem.

Without knowing how you are testing things or *any* metrics whatsoever whether it be C* or OS it’s going to be hard to help you out.

Jon



On Jan 5, 2018, at 5:41 PM, Justin Sanciangco <js...@blizzard.com> wrote:

Hello,

I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad performance when my workload exceeds the memory size. What mount settings should be used for NVMe drives? Right now the SSD is formatted as XFS using the noop scheduler. Are there any additional mount options that should be used? Any specific kernel parameters that should be set in order to make the best use of the PCIe NVMe SSD? Your insight would be greatly appreciated.

Thank you,
Justin Sanciangco


Re: NVMe SSD benchmarking with Cassandra

Posted by Dikang Gu <di...@gmail.com>.
Do you have some detailed benchmark metrics? Like the QPS, Avg read/write
latency, P95/P99 read/write latency?

On Fri, Jan 5, 2018 at 5:57 PM, Justin Sanciangco <js...@blizzard.com>
wrote:

> I am benchmarking with the YCSB tool doing 1k writes.
>
>
>
> Below are my server specs
>
> 2 sockets
>
> 12 core hyperthreaded processor
>
> 64GB memory
>
>
>
> Cassandra settings
>
> 32GB heap
>
> Concurrent_reads: 128
>
> Concurrent_writes:256
>
>
>
> From what we are seeing it looks like the kernel writing to the disk
> causes degrading performance.
>
>
>
>
>
> Please let me know
>
>
>
>
>
> *From:* Jeff Jirsa [mailto:jjirsa@gmail.com]
> *Sent:* Friday, January 5, 2018 5:50 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: NVMe SSD benchmarking with Cassandra
>
>
>
> Second the note about compression chunk size in particular.
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Jan 5, 2018, at 5:48 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
>
> Generally speaking, disable readahead.  After that it's very likely the
> issue isn’t in the settings you’re using the disk settings, but is actually
> in your Cassandra config or the data model.  How are you measuring things?
> Are you saturating your disks?  What resource is your bottleneck?
>
>
>
> *Every* single time I’ve handled a question like this, without exception,
> it ends up being a mix of incorrect compression settings (use 4K at most),
> some crazy readahead setting like 1MB, and terrible JVM settings that are
> the bulk of the problem.
>
>
>
> Without knowing how you are testing things or *any* metrics whatsoever
> whether it be C* or OS it’s going to be hard to help you out.
>
>
>
> Jon
>
>
>
>
>
> On Jan 5, 2018, at 5:41 PM, Justin Sanciangco <js...@blizzard.com>
> wrote:
>
>
>
> Hello,
>
>
>
> I am currently benchmarking NVMe SSDs with Cassandra and am getting very
> bad performance when my workload exceeds the memory size. What mount
> settings for NVMe should be used? Right now the SSD is formatted as XFS
> using noop scheduler. Are there any additional mount options that should be
> used? Any specific kernel parameters that should set in order to make best
> use of the PCIe NVMe SSD? Your insight would be well appreciated.
>
>
>
> Thank you,
>
> Justin Sanciangco
>
>
>
>


-- 
Dikang

RE: NVMe SSD benchmarking with Cassandra

Posted by Justin Sanciangco <js...@blizzard.com>.
I am benchmarking with the YCSB tool doing 1k writes.
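
For context, a YCSB load-phase invocation for this kind of test might look roughly like the following (host, record count and thread count are placeholders; this assumes "1k writes" means ~1 KB records, i.e. 10 fields x 100 bytes):

bin/ycsb load cassandra-cql -P workloads/workloada \
    -p hosts="10.0.0.1" \
    -p recordcount=100000000 \
    -p fieldcount=10 -p fieldlength=100 \
    -threads 64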

Below are my server specs
2 sockets
12 core hyperthreaded processor
64GB memory

Cassandra settings
32GB heap
concurrent_reads: 128
concurrent_writes: 256

From what we are seeing it looks like the kernel writing to the disk causes degrading performance.

[inline image attachment (image001.png) omitted from the plain-text archive]
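
A sketch of how that could be observed from the OS side while the benchmark runs:

# Per-device utilization, queue size and await, refreshed every 2 seconds
iostat -xm 2

# Dirty page-cache data waiting to be written back
grep -E 'Dirty|Writeback' /proc/meminfo

# Kernel writeback thresholds currently in effect
sysctl vm.dirty_ratio vm.dirty_background_ratio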

Please let me know


From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Friday, January 5, 2018 5:50 PM
To: user@cassandra.apache.org
Subject: Re: NVMe SSD benchmarking with Cassandra

Second the note about compression chunk size in particular.
--
Jeff Jirsa


On Jan 5, 2018, at 5:48 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
Generally speaking, disable readahead.  After that, it's very likely the issue isn't in the disk settings you're using, but is actually in your Cassandra config or the data model.  How are you measuring things?  Are you saturating your disks?  What resource is your bottleneck?

*Every* single time I’ve handled a question like this, without exception, it ends up being a mix of incorrect compression settings (use 4K at most), some crazy readahead setting like 1MB, and terrible JVM settings that are the bulk of the problem.

Without knowing how you are testing things or *any* metrics whatsoever whether it be C* or OS it’s going to be hard to help you out.

Jon



On Jan 5, 2018, at 5:41 PM, Justin Sanciangco <js...@blizzard.com> wrote:

Hello,

I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad performance when my workload exceeds the memory size. What mount settings should be used for NVMe drives? Right now the SSD is formatted as XFS using the noop scheduler. Are there any additional mount options that should be used? Any specific kernel parameters that should be set in order to make the best use of the PCIe NVMe SSD? Your insight would be greatly appreciated.

Thank you,
Justin Sanciangco


Re: NVMe SSD benchmarking with Cassandra

Posted by Jeff Jirsa <jj...@gmail.com>.
Second the note about compression chunk size in particular. 

-- 
Jeff Jirsa


> On Jan 5, 2018, at 5:48 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
> 
> Generally speaking, disable readahead.  After that it's very likely the issue isn’t in the settings you’re using the disk settings, but is actually in your Cassandra config or the data model.  How are you measuring things?  Are you saturating your disks?  What resource is your bottleneck?
> 
> *Every* single time I’ve handled a question like this, without exception, it ends up being a mix of incorrect compression settings (use 4K at most), some crazy readahead setting like 1MB, and terrible JVM settings that are the bulk of the problem.  
> 
> Without knowing how you are testing things or *any* metrics whatsoever whether it be C* or OS it’s going to be hard to help you out.
> 
> Jon
> 
> 
>> On Jan 5, 2018, at 5:41 PM, Justin Sanciangco <js...@blizzard.com> wrote:
>> 
>> Hello,
>>  
>> I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad performance when my workload exceeds the memory size. What mount settings for NVMe should be used? Right now the SSD is formatted as XFS using noop scheduler. Are there any additional mount options that should be used? Any specific kernel parameters that should set in order to make best use of the PCIe NVMe SSD? Your insight would be well appreciated.
>>  
>> Thank you,
>> Justin Sanciangco
> 

Re: NVMe SSD benchmarking with Cassandra

Posted by Jon Haddad <jo...@jonhaddad.com>.
Oh, I should have added, my compression settings comment only applies to read-heavy workloads: reading 64KB off disk in order to return a handful of bytes is wasteful by orders of magnitude, but it doesn’t really cause any problems on write-heavy workloads.
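
As a rough illustration, assuming ~1 KB rows as in the YCSB table in this thread: a point read that touches a 64 KB chunk decompresses roughly 64x more data than it returns, while 4 KB chunks cut that overhead to roughly 4x.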

> On Jan 5, 2018, at 5:48 PM, Jon Haddad <jo...@jonhaddad.com> wrote:
> 
> Generally speaking, disable readahead.  After that it's very likely the issue isn’t in the settings you’re using the disk settings, but is actually in your Cassandra config or the data model.  How are you measuring things?  Are you saturating your disks?  What resource is your bottleneck?
> 
> *Every* single time I’ve handled a question like this, without exception, it ends up being a mix of incorrect compression settings (use 4K at most), some crazy readahead setting like 1MB, and terrible JVM settings that are the bulk of the problem.  
> 
> Without knowing how you are testing things or *any* metrics whatsoever whether it be C* or OS it’s going to be hard to help you out.
> 
> Jon
> 
> 
>> On Jan 5, 2018, at 5:41 PM, Justin Sanciangco <jsanciangco@blizzard.com> wrote:
>> 
>> Hello,
>>  
>> I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad performance when my workload exceeds the memory size. What mount settings for NVMe should be used? Right now the SSD is formatted as XFS using noop scheduler. Are there any additional mount options that should be used? Any specific kernel parameters that should set in order to make best use of the PCIe NVMe SSD? Your insight would be well appreciated.
>>  
>> Thank you,
>> Justin Sanciangco
> 


Re: NVMe SSD benchmarking with Cassandra

Posted by Jon Haddad <jo...@jonhaddad.com>.
Generally speaking, disable readahead.  After that, it's very likely the issue isn't in the disk settings you're using, but is actually in your Cassandra config or the data model.  How are you measuring things?  Are you saturating your disks?  What resource is your bottleneck?

*Every* single time I’ve handled a question like this, without exception, it ends up being a mix of incorrect compression settings (use 4K at most), some crazy readahead setting like 1MB, and terrible JVM settings that are the bulk of the problem.  
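
As a starting point for the JVM side, the flags actually in effect on a running node can be dumped with stock JDK tooling (a sketch; <pid> stands in for the Cassandra process id):

# Find the Cassandra JVM process id
pgrep -f CassandraDaemon

# Dump the flags and full command line of that JVM
jcmd <pid> VM.flags
jcmd <pid> VM.command_line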

Without knowing how you are testing things or *any* metrics whatsoever whether it be C* or OS it’s going to be hard to help you out.

Jon


> On Jan 5, 2018, at 5:41 PM, Justin Sanciangco <js...@blizzard.com> wrote:
> 
> Hello,
>  
> I am currently benchmarking NVMe SSDs with Cassandra and am getting very bad performance when my workload exceeds the memory size. What mount settings for NVMe should be used? Right now the SSD is formatted as XFS using noop scheduler. Are there any additional mount options that should be used? Any specific kernel parameters that should set in order to make best use of the PCIe NVMe SSD? Your insight would be well appreciated.
>  
> Thank you,
> Justin Sanciangco