Posted to commits@cassandra.apache.org by "Stefano Ortolani (JIRA)" <ji...@apache.org> on 2016/05/27 13:47:12 UTC

[jira] [Created] (CASSANDRA-11906) Unstable JVM due to too many files when anticompacting big LCS tables

Stefano Ortolani created CASSANDRA-11906:
--------------------------------------------

             Summary: Unstable JVM due to too many files when anticompacting big LCS tables
                 Key: CASSANDRA-11906
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11906
             Project: Cassandra
          Issue Type: Bug
            Reporter: Stefano Ortolani


I have recently moved from C* 2.1.x to C* 3.0.6. The setup is quite 
heavy:
	- 13 nodes with spinning disks
	- ~120 GB of data per node
	- 50% of the CFs use LCS and have quite wide rows.
	- 2-3 of the LCS CFs have more than 200 SSTables

Incremental repairs do not seem to play well with this setup.
I have been running some tests node by node using the -pr option:

{code:xml}
nodetool -h localhost repair -pr keyschema
{code}

and, to my surprise, the whole process takes quite some time (1 hour
minimum, 8 hours if I haven't repaired for 5-6 days).

Yesterday I tried to run the command with the -seq option, so as to
decrease the number of simultaneous compactions. After a while,
the node on which I was running the repair simply died during
the anticompaction phase; the invocation is sketched below, followed
by the exception from the logs.

{code:xml}
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-25 21:54:21,868 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
java.lang.RuntimeException: Failed to list files in /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) ~[metrics-graphite-3.1.0.jar:3.1.0]
	at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0]
	at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
	at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: java.lang.RuntimeException: java.nio.file.FileSystemException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma_txn_anticompactionafterrepair_f20b50d0-22bd-11e6-970f-6f22464f4624.log: Too many open files
	at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:622) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321) ~[na:1.8.0_91]
	at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[na:1.8.0_91]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[na:1.8.0_91]
	at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_91]
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_91]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_91]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) ~[na:1.8.0_91]
	at org.apache.cassandra.db.lifecycle.LogReplicaSet.readRecords(LogReplicaSet.java:105) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LogFile.verify(LogFile.java:132) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.readTxnLog(LogAwareFileLister.java:113) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:106) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[na:1.8.0_91]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[na:1.8.0_91]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_91]
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[na:1.8.0_91]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_91]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[na:1.8.0_91]
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:75) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53) ~[apache-cassandra-3.0.6.jar:3.0.6]
	... 20 common frames omitted
Caused by: java.nio.file.FileSystemException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma_txn_anticompactionafterrepair_f20b50d0-22bd-11e6-970f-6f22464f4624.log: Too many open files
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[na:1.8.0_91]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_91]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_91]
	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[na:1.8.0_91]
	at java.nio.file.Files.newByteChannel(Files.java:361) ~[na:1.8.0_91]
	at java.nio.file.Files.newByteChannel(Files.java:407) ~[na:1.8.0_91]
	at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) ~[na:1.8.0_91]
	at java.nio.file.Files.newInputStream(Files.java:152) ~[na:1.8.0_91]
	at java.nio.file.Files.newBufferedReader(Files.java:2784) ~[na:1.8.0_91]
	at java.nio.file.Files.readAllLines(Files.java:3202) ~[na:1.8.0_91]
	at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:615) ~[apache-cassandra-3.0.6.jar:3.0.6]
	... 45 common frames omitted
WARN  [HintsWriteExecutor:1] 2016-05-25 21:54:28,389 CLibrary.java:256 - open(/var/lib/cassandra/hints, O_RDONLY) failed, errno (24).
ERROR [CompactionExecutor:2660] 2016-05-25 21:54:29,915 JVMStabilityInspector.java:139 - JVM state determined to be unstable.  Exiting forcefully due to:
java.io.FileNotFoundException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-92661-big-Filter.db (Too many open files)
	at java.io.FileOutputStream.open0(Native Method) ~[na:1.8.0_91]
	at java.io.FileOutputStream.open(FileOutputStream.java:270) ~[na:1.8.0_91]
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213) ~[na:1.8.0_91]
	at java.io.FileOutputStream.<init>(FileOutputStream.java:101) ~[na:1.8.0_91]
	at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.flushBf(BigTableWriter.java:431) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.doPrepare(BigTableWriter.java:461) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.io.sstable.format.big.BigTableWriter$TransactionalProxy.doPrepare(BigTableWriter.java:303) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.io.sstable.format.SSTableWriter.prepareToCommit(SSTableWriter.java:229) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.io.sstable.SSTableRewriter.doPrepare(SSTableRewriter.java:357) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.compaction.CompactionManager.antiCompactGroup(CompactionManager.java:1303) [apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.compaction.CompactionManager.doAntiCompaction(CompactionManager.java:1232) [apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.compaction.CompactionManager.performAnticompaction(CompactionManager.java:569) [apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:479) [apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-3.0.6.jar:3.0.6]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
{code}

It seems that, by using the -seq option, C* just opened way too
many files. The limits currently look reasonable though, as shown below:

{code:xml}
root@node3:/# cat /etc/security/limits.d/cassandra.conf 
# Provided by the cassandra package
cassandra  -  memlock  unlimited
cassandra  -  nofile   100000
cassandra  -  as       unlimited
cassandra  -  nproc    8096
{code}
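
Note that limits.d settings only take effect when the process is (re)started;
to double-check the limit actually in effect for the live JVM, the process
can be queried directly (a sketch; the CassandraDaemon process-name match
is an assumption about this setup):

{code:xml}
# PID of the running Cassandra JVM (process name assumed)
CASSANDRA_PID=$(pgrep -f CassandraDaemon)
# Effective limit for the live process; should report 100000 if the
# limits.d file was picked up
grep 'Max open files' /proc/$CASSANDRA_PID/limits
{code}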

Today I checked the number of files opened by C* (admittedly while running
another repair, this time without the -seq option):

{code:xml}
pssh --inline-stdout -h cassandra-hosts.txt -l root -t 0 "lsof -n | grep java | grep cassandra | wc -l"
[1] 14:29:43 [SUCCESS] node3 (where the repair is taking place)
25452
[2] 14:29:43 [SUCCESS] node4
46779

pssh --inline-stdout -h cassandra-hosts.txt -l root -t 0 "lsof -n | grep java | grep cassandra | grep 'REG' | wc -l"
[1] 14:26:40 [SUCCESS] node3 (where the repair is taking place)
24020
[2] 14:26:41 [SUCCESS] node4
4050
{code}

It looks like, during a repair, most of the file descriptors on the repairing node
relate to regular files, while on all the other nodes the vast majority relate
to memory-mapped files.
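
One way to confirm that split is to count the descriptors straight from the
kernel (avoiding any lsof double counting) and to isolate lsof's "mem" rows,
which are memory-mapped files and do not consume a descriptor (a sketch; the
process-name match is again an assumption):

{code:xml}
CASSANDRA_PID=$(pgrep -f CassandraDaemon)
# Descriptors actually held by the process
ls /proc/$CASSANDRA_PID/fd | wc -l
# lsof rows whose FD column is "mem": memory-mapped files only
lsof -n -p $CASSANDRA_PID | awk '$4 == "mem"' | wc -l
{code}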

Nevertheless, 25K is still way less than 100K (though still high).

(To be honest, when I started the repair, nodetool logged it as "Starting repair
command #2". I didn't pay much attention to that, as the previous repair had
already returned quite some time before; but it might be that another repair
was still lingering, weird as that would be. A way to check for this is
sketched below.)
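
One way to check whether a previous repair (or its anticompaction) is still
lingering (a sketch; these are standard nodetool subcommands, but the grep
pattern is an assumption):

{code:xml}
# Active compactions/anticompactions, if any
nodetool compactionstats
# Repair-related streaming sessions, if any
nodetool netstats | grep -i repair
{code}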

In conclusion, is this behavior normal? Shall I increase the limits, or is
anticompaction really playing badly with incremental repairs on LCS tables
(which are, to be honest, quite overloaded)?


