Posted to commits@cassandra.apache.org by "Aleksey Yeschenko (JIRA)" <ji...@apache.org> on 2015/09/16 23:06:47 UTC

[jira] [Updated] (CASSANDRA-8874) running out of FD, and causing clients hang when dropping a keyspace with many CF with many sstables

     [ https://issues.apache.org/jira/browse/CASSANDRA-8874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-8874:
-----------------------------------------
    Fix Version/s:     (was: 2.0.x)

> running out of FD, and causing clients hang when dropping a keyspace with many CF with many sstables
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8874
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8874
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jackson Chung
>
> We had already set the file descriptor limit to 100000 for C*, and confirmed it from /proc/$cass_pid/limits.
> We have 16 nodes across 2 DCs; each node stores about 600GB to 1TB of data. The nodes are EC2 i2.2xlarge instances with the 2 disks in RAID 0.
> We use both the Hector and DataStax drivers, and there are many clients connecting to the cluster.
> One day we dropped a keyspace that our app no longer uses. It had a good number of CFs, some of which use leveled compaction and have a large number of sstables... and our app went down. CPU and load average were high on the nodes and we couldn't even ssh to them. We had to force a reboot and restart 2 of the C* nodes, whose logs were filled with (hundreds of thousands of) "too many open files" errors.
> C* 2.0.11
> {noformat}$ grep -ic "caused by.*too many open file" system.log.*
> system.log.1:0
> system.log.10:18659
> system.log.11:17539
> system.log.12:18941
> system.log.13:18936
> system.log.14:18601
> system.log.15:18933
> system.log.16:18937
> system.log.17:18954
> system.log.18:18892
> system.log.19:18942
> system.log.2:0
> system.log.20:18977
> system.log.21:18977
> system.log.22:18852
> system.log.23:18978
> system.log.24:18978
> system.log.25:18978
> system.log.26:18978
> system.log.27:18978
> system.log.28:18978
> system.log.29:18978
> system.log.3:654
> system.log.30:18978
> system.log.31:18978
> system.log.32:18978
> system.log.33:18977
> system.log.34:18978
> system.log.35:18978
> system.log.36:17943
> system.log.37:18867
> system.log.38:15082
> system.log.39:17766
> system.log.4:17932
> system.log.40:18029
> system.log.41:18890
> system.log.42:18048
> system.log.43:18812
> system.log.44:18787
> system.log.45:18962
> system.log.46:18978
> system.log.47:18978
> system.log.48:18978
> system.log.49:18978
> system.log.5:15284
> system.log.50:18978
> system.log.6:17180
> system.log.7:17286
> system.log.8:18651
> system.log.9:17720
> {noformat}
> All of these logs are from that day.
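
The grep output above is listed in shell glob order (1, 10, 11, ..., 2, 20, ...), which makes it hard to see how the errors progressed across rotations. A minimal sketch of the same case-insensitive count, sorted by rotation number instead, might look like the following. The function names are hypothetical, and it assumes the rotated files are named system.log.N as in the report.

```python
import re
from pathlib import Path

# Same expression as the grep in the report, matched case-insensitively (-i).
PATTERN = re.compile(r"caused by.*too many open file", re.IGNORECASE)


def count_matching_lines(lines, pattern=PATTERN):
    """Count lines that match the pattern, like `grep -c`."""
    return sum(1 for line in lines if pattern.search(line))


def rotation_index(name):
    """Return the numeric suffix of a rotated log name (system.log.12 -> 12)."""
    suffix = name.rsplit(".", 1)[-1]
    return int(suffix) if suffix.isdigit() else 0


def counts_by_rotation(log_dir, prefix="system.log."):
    """Return (file name, match count) pairs ordered by rotation number."""
    paths = sorted(Path(log_dir).glob(prefix + "*"),
                   key=lambda p: rotation_index(p.name))
    result = []
    for path in paths:
        # errors="replace" keeps the scan going if a log has stray bytes.
        with path.open(errors="replace") as handle:
            result.append((path.name, count_matching_lines(handle)))
    return result
```

Calling counts_by_rotation() on the Cassandra log directory and printing each pair as name:count should give the same numbers as the grep, only in numeric order.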

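The report confirms the configured limit from /proc/$cass_pid/limits. To compare that limit against what the process actually holds open, something along these lines could be used on Linux. This is only a sketch: the helper names are made up for illustration, and reading /proc/<pid>/fd requires running as the same user as the process or as root.

```python
import os


def parse_max_open_files(limits_text):
    """Extract (soft, hard) "Max open files" values from /proc/<pid>/limits text.

    A value is returned as an int, or None if the kernel reports "unlimited".
    Returns None if the line is not present at all.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: Max open files <soft> <hard> files
            soft, hard = line.split()[3:5]
            return tuple(int(v) if v.isdigit() else None for v in (soft, hard))
    return None


def fd_usage(pid):
    """Return (open fd count, (soft limit, hard limit)) for a running process."""
    with open(f"/proc/{pid}/limits") as handle:
        limits = parse_max_open_files(handle.read())
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, limits
```

Sampling fd_usage() for the Cassandra pid at intervals while a keyspace drop is in progress would show whether the open descriptor count is approaching the soft limit.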


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)