Posted to commits@cassandra.apache.org by "Stefania (JIRA)" <ji...@apache.org> on 2015/11/10 07:04:10 UTC
[jira] [Commented] (CASSANDRA-10538) Assertion failed in LogFile when disk is full

    [ https://issues.apache.org/jira/browse/CASSANDRA-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998012#comment-14998012 ] 

Stefania commented on CASSANDRA-10538:
--------------------------------------

[~benedict] I've rebased and rerun CI. Several dtests are failing due to timeouts, but the same is true on the main 3.0 branch.

Unit tests are OK except for [this timeout|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10538-3.0-testall/lastCompletedBuild/testReport/org.apache.cassandra.db/CellTest/BeforeFirstTest/], which also happens on the [main 3.0 branch|http://cassci.datastax.com/job/cassandra-3.0_testall/242/testReport/org.apache.cassandra.db/SinglePartitionSliceCommandTest/BeforeFirstTest/]; I've opened CASSANDRA-10682 to address it.

Let me know if you want to retry the dtests or if you need any other changes in order to commit this patch.

> Assertion failed in LogFile when disk is full
> ---------------------------------------------
>
>                 Key: CASSANDRA-10538
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10538
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>         Attachments: ma_txn_compaction_67311da0-72b4-11e5-9eb9-b14fa4bbe709.log, ma_txn_compaction_696059b0-72b4-11e5-9eb9-b14fa4bbe709.log, ma_txn_compaction_8ac58b70-72b4-11e5-9eb9-b14fa4bbe709.log, ma_txn_compaction_8be24610-72b4-11e5-9eb9-b14fa4bbe709.log, ma_txn_compaction_95500fc0-72b4-11e5-9eb9-b14fa4bbe709.log, ma_txn_compaction_a41caa90-72b4-11e5-9eb9-b14fa4bbe709.log
>
>
> [~carlyeks] was running a stress job which filled up the disk. At the end of the system logs there are several assertion errors:
> {code}
> ERROR [CompactionExecutor:1] 2015-10-14 20:46:55,467 CassandraDaemon.java:195 - Exception in thread Thread[CompactionExecutor:1,1,main]
> java.lang.RuntimeException: Insufficient disk space to write 2097152 bytes
>         at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.getWriteDirectory(CompactionAwareWriter.java:156) ~[main/:na]
>         at org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:77) ~[main/:na]
>         at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:110) ~[main/:na]
>         at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:182) ~[main/:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) ~[main/:na]
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[main/:na]
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:220) ~[main/:na]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_40]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> INFO  [IndexSummaryManager:1] 2015-10-14 21:10:40,099 IndexSummaryManager.java:257 - Redistributing index summaries
> ERROR [IndexSummaryManager:1] 2015-10-14 21:10:42,275 CassandraDaemon.java:195 - Exception in thread Thread[IndexSummaryManager:1,1,main]
> java.lang.AssertionError: Already completed!
>         at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:221) ~[main/:na]
>         at org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:376) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na]
>         at org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:259) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:193) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.close(Transactional.java:158) ~[main/:na]
>         at org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:242) ~[main/:na]
>         at org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:134) ~[main/:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>         at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolE
> {code}
> We should not use an assertion for a condition that can occur when the disk is full; we should throw a runtime exception instead.
> I would also like to understand exactly what triggered the assertion. {{LifecycleTransaction}} can throw at the beginning of the commit method if it cannot write the record to disk, in which case all we have to do is ensure we update the records in memory after writing to disk (currently we update them before). However, I am not sure this is what happened here; it looks more like abort was called twice, which should never happen.
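
The two ideas in the description (write to disk before touching the in-memory records, and fail with a runtime exception rather than an assertion) can be illustrated with a minimal sketch. This is not the Cassandra code: {{SketchLog}}, its nested {{Disk}} interface and {{SketchLogDemo}} are hypothetical names made up for this example, and the real {{LogFile}} and {{LogTransaction}} classes are considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transaction log; NOT Cassandra's LogFile.
final class SketchLog {
    // Assumed abstraction of the disk write; it may throw when the disk is full.
    interface Disk { void append(String record); }

    private final Disk disk;
    private final List<String> records = new ArrayList<>(); // in-memory view of the log
    private boolean completed;

    SketchLog(Disk disk) { this.disk = disk; }

    boolean completed() { return completed; }
    List<String> records() { return records; }

    void commit() { complete("COMMIT"); }
    void abort()  { complete("ABORT"); }

    private void complete(String record) {
        // A runtime exception instead of an assert: the check stays active
        // even when the JVM runs without -ea.
        if (completed)
            throw new IllegalStateException("Already completed!");
        // Write to disk first. If this throws, the in-memory state below is
        // left untouched, so a later abort() still sees an open log.
        disk.append(record);
        records.add(record);
        completed = true;
    }
}

final class SketchLogDemo {
    public static void main(String[] args) {
        // Simulate a full disk: every write fails.
        SketchLog.Disk fullDisk = r -> { throw new RuntimeException("Insufficient disk space"); };
        SketchLog log = new SketchLog(fullDisk);
        try {
            log.commit();
        } catch (RuntimeException e) {
            System.out.println("commit failed: " + e.getMessage());
        }
        System.out.println("completed after failed commit: " + log.completed());
    }
}
```

With this ordering, a failed write leaves the log open in memory. A second call to {{abort()}} on a log that did complete is still reported, but as an {{IllegalStateException}} that callers can handle rather than an {{AssertionError}}.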


