Posted to commits@cassandra.apache.org by "Venkata Harikrishna Nukala (Jira)" <ji...@apache.org> on 2021/03/06 11:33:00 UTC

[jira] [Commented] (CASSANDRA-14898) Key cache loading is very slow when there are many SSTables

    [ https://issues.apache.org/jira/browse/CASSANDRA-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296524#comment-17296524 ] 

Venkata Harikrishna Nukala commented on CASSANDRA-14898:
--------------------------------------------------------

[~jolynch] sorry, it took me a long time to get back on this. I took your patch and tried it in my local env (with even more SSTables: around 11k+) and it performed better than my patch. It loaded 400k+ keys within a second, whereas my code changes took 8 to 9 seconds. That performance gain is worth maintaining the extra state for, so I am fine with dropping my patch. I did make one small change on top of your patch, similar to what my patch did: I replaced the find method with a generic collect method so that copying SSTables into a new map can be avoided ([https://github.com/nvharikrishna/cassandra/commit/b26b5f17877b5d89698840e42a3c77a6629594f5]). Splitting the CacheSerializer (which you mentioned earlier) is probably an alternative if we don't go with this approach.
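
For illustration, the idea of the collect change is roughly the following sketch (the class/method names and shape here are mine, not the actual commit): query the live sstables with a caller-supplied predicate instead of having the store copy them into a new map just so the serializer can look one up.
{code:java}
import java.util.function.Predicate;

import org.apache.cassandra.db.ColumnFamilyStore;
import org.apache.cassandra.io.sstable.format.SSTableReader;

public final class SSTableCollect
{
    // Walk the live view once and return the first sstable matching the
    // predicate; no intermediate Map<generation, SSTableReader> is built.
    public static SSTableReader collectFirst(ColumnFamilyStore cfs, Predicate<SSTableReader> predicate)
    {
        for (SSTableReader sstable : cfs.getLiveSSTables())
        {
            if (predicate.test(sstable))
                return sstable;
        }
        return null;
    }
}

// Hypothetical call site: resolving a key cache entry's sstable by generation.
// SSTableReader reader = SSTableCollect.collectFirst(cfs, s -> s.descriptor.generation == generation);
{code}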

I have also tried to time-bound the key and row cache loading by calling the get method with a specific timeout and cancelling the load if it couldn't finish in time (leaving it running when compaction starts may leave key cache entries in an invalid state). Cancelling led to a serious problem: cancelling the task interrupted the deserializer -> DataInputStream -> ChannelProxy, which throws _java.nio.channels.ClosedByInterruptException_ wrapped as _org.apache.cassandra.io.FSReadError_. FSReadError is treated as a disk failure, so the instance gets stopped per the disk failure policy. Pasting the stack trace here for reference; a rough sketch of the reverted approach follows the stack trace. Given the performance improvement already achieved, I am not sure it is worth going down the path of modifying ChannelProxy to handle this case (i.e. not treating java.nio.channels.ClosedByInterruptException as an error), so I reverted the changes.
{code:java}
WARN [main] 2021-03-06 12:58:29,926 CassandraDaemon.java:346 - Cache did not load in given time. Cancelled loading of key and row cache. isCancelled: true
ERROR [pool-3-thread-1] 2021-03-06 12:58:29,928 DefaultFSErrorHandler.java:104 - Exiting forcefully due to file system exception on startup, disk failure policy "stop"
org.apache.cassandra.io.FSReadError: java.nio.channels.ClosedByInterruptException
 at org.apache.cassandra.io.util.ChannelProxy.read(ChannelProxy.java:143)
 at org.apache.cassandra.io.util.SimpleChunkReader.readChunk(SimpleChunkReader.java:41)
 at org.apache.cassandra.io.util.ChecksummedRebufferer.rebuffer(ChecksummedRebufferer.java:45)
 at org.apache.cassandra.io.util.RandomAccessReader.reBufferAt(RandomAccessReader.java:65)
 at org.apache.cassandra.io.util.RandomAccessReader.reBuffer(RandomAccessReader.java:59)
 at org.apache.cassandra.io.util.RebufferingInputStream.read(RebufferingInputStream.java:90)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
 at org.apache.cassandra.io.util.LengthAvailableInputStream.read(LengthAvailableInputStream.java:57)
 at java.io.DataInputStream.readFully(DataInputStream.java:195)
 at java.io.DataInputStream.readFully(DataInputStream.java:169)
 at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:433)
 at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:453)
 at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:227)
 at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:168)
 at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:164)
 at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
 at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
 at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedByInterruptException: null
 at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
 at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:740)
 at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:721)
 at org.apache.cassandra.io.util.ChannelProxy.read(ChannelProxy.java:139)
 ... 22 common frames omitted{code}
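
For context, the reverted time-bounding change was roughly along these lines (class, method, and variable names here are mine, not the actual diff); the cancel(true) on timeout is what interrupts the loader thread and produces the FSReadError above.
{code:java}
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class TimeBoundedCacheLoad
{
    private static final Logger logger = LoggerFactory.getLogger(TimeBoundedCacheLoad.class);

    // cacheLoadFuture stands in for the future returned by the async cache load
    // (e.g. AutoSavingCache#loadSavedAsync()); timeoutSeconds is the startup budget.
    public static void awaitOrCancel(Future<Integer> cacheLoadFuture, long timeoutSeconds)
    {
        try
        {
            int loaded = cacheLoadFuture.get(timeoutSeconds, TimeUnit.SECONDS);
            logger.info("Loaded {} cache entries within {}s", loaded, timeoutSeconds);
        }
        catch (TimeoutException e)
        {
            // cancel(true) interrupts the loader thread. The in-flight read on the
            // interruptible FileChannel then fails with ClosedByInterruptException,
            // which ChannelProxy wraps as FSReadError, and the disk failure policy
            // stops the instance -- the failure shown in the stack trace above.
            cacheLoadFuture.cancel(true);
            logger.warn("Cache did not load within {}s; cancelled loading of key and row cache", timeoutSeconds);
        }
        catch (Exception e)
        {
            logger.warn("Cache load failed", e);
        }
    }
}
{code}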



> Key cache loading is very slow when there are many SSTables
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-14898
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14898
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>         Environment: AWS i3.2xlarge, 4 physical cores (8 threads), 60GB of RAM, loading about 8MB of KeyCache with 10k keys in it.
>            Reporter: Joey Lynch
>            Assignee: Venkata Harikrishna Nukala
>            Priority: Low
>              Labels: Performance, low-hanging-fruit
>         Attachments: key_cache_load_slow.svg
>
>
> While dealing with a production issue today where some 3.0.17 nodes had close to ~8k sstables on disk due to excessive write pressure, we had a few nodes crash due to OOM and then they took close to 17 minutes to load the key cache and recover. This excessive key cache load significantly increased the duration of the outage (to mitigate we just removed the saved key cache files). For example, here is one node taking 17 minutes to load 10k keys, or about 10 keys per second (which is ... very slow):
> {noformat}
> INFO  [pool-3-thread-1] 2018-11-15 21:50:21,885 AutoSavingCache.java:190 - reading saved cache /mnt/data/cassandra/saved_caches/KeyCache-d.db
> INFO  [pool-3-thread-1] 2018-11-15 22:07:16,490 AutoSavingCache.java:166 - Completed loading (1014606 ms; 10103 keys) KeyCache cache
> {noformat}
> I've witnessed similar behavior in the past with large LCS clusters, and indeed it appears that any time the number of sstables is large, KeyCache loading takes a _really_ long time. Today I got a flame graph and I believe that I found the issue and I think it's reasonably easy to fix. From what I can tell the {{KeyCacheSerializer::deserialize}} [method |https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L445] which is called for every key is linear in the number of sstables due to the [call|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L459] to {{ColumnFamilyStore::getSSTables}} which ends up calling {{View::select}} [here|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/lifecycle/View.java#L139]. The {{View::select}} call is linear in the number of sstables and causes a _lot_ of {{HashSet}} [resizing|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/lifecycle/View.java#L139] when the number of sstables is much greater than 16 (the default size of the backing {{HashMap}}).
> As we see in the attached flamegraph we spend 50% of our CPU time in these {{getSSTables}} calls, of which 36% is spent adding sstables to the HashSet in {{View::select}} and 17% is spent just iterating the sstables in the first place. A full 16% of CPU time is spent _just resizing the HashMap_. Then another 4% is spent calling {{CacheService::findDesc}} which does [a linear search|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/CacheService.java#L475] for the sstable generation.
> I believe that this affects at least Cassandra 3.0.17 and trunk, and could be pretty easily fixed by either caching the getSSTables call or at the very least pre-sizing the {{HashSet}} in {{View::select}} to be the size of the sstables map.
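
For illustration, the pre-sizing fix suggested in the description amounts to something like the following (a sketch, not the actual View::select code; Guava is assumed available since Cassandra already bundles it, as the stack trace above shows):
{code:java}
import java.util.Set;
import java.util.function.Predicate;

import com.google.common.collect.Sets;

public final class PreSizedSelect
{
    // Size the result set from the sstable count up front so it never rehashes,
    // even when every cached key triggers a select over ~10k sstables.
    public static <T> Set<T> select(Iterable<T> sstables, int sstableCount, Predicate<T> filter)
    {
        // Guava sizes the backing HashMap so sstableCount elements fit without a
        // resize, instead of starting at the default capacity of 16.
        Set<T> selected = Sets.newHashSetWithExpectedSize(sstableCount);
        for (T sstable : sstables)
        {
            if (filter.test(sstable))
                selected.add(sstable);
        }
        return selected;
    }
}
{code}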


