Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org> on 2012/03/30 12:20:29 UTC

[jira] [Updated] (CASSANDRA-3758) parallel compaction hang (on large rows?)

     [ https://issues.apache.org/jira/browse/CASSANDRA-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3758:
----------------------------------------

    Attachment: 0001-Fix-totalBytes-count-for-ParallelCompactionIterable.txt

I'm not sure what's going on here. I went back over the parallel compaction code and didn't see any obvious problem. I'm not sure it'll be easy to fix without being able to reproduce it.

I'm also not completely sure what to make of the provided thread dump. Is it a single giant thread dump? If so, there seem to be tons of CompactionReducer threads, coming from lots of different ParallelCompactionIterable instances, which would suggest we don't shut down the CompactionReducer executor correctly. But I don't see why that would happen.
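
To illustrate the kind of leak suspected above: if each unit of work creates its own thread pool and an exit path skips the shutdown, the idle worker threads stay alive and pile up in thread dumps. The sketch below is a generic, hypothetical example and not the Cassandra code; the class name ReducerPoolSketch, the method runAndShutdown and the pool size are all made up for illustration. It only shows the usual try/finally pattern that guarantees the pool is shut down on every exit path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names come from Cassandra.
public class ReducerPoolSketch {

    // Runs `tasks` trivial jobs on a dedicated pool and returns the sum of
    // their results. The pool is shut down in `finally`, so its worker
    // threads are released even if a task throws.
    public static int runAndShutdown(int tasks) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int n = i;
                // Each job just doubles its index.
                results.add(executor.submit(() -> n * 2));
            }
            int sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get(); // blocks until that job is done
            }
            return sum;
        } finally {
            // Without this, the pool's threads would outlive the method
            // and accumulate across calls.
            executor.shutdown();
            executor.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAndShutdown(4));
    }
}
```

If the shutdown call lived only at the end of the try body, an exception from f.get() would bypass it, which is one way a dump could end up showing many idle pool threads.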

What is annoying, at least, is that the reporting of the total bytes to compact is buggy for parallel compactions (if it weren't, we could tell more precisely at which point during the compaction the hang occurred). So I'm attaching a patch to fix that problem.
                
> parallel compaction hang (on large rows?)
> -----------------------------------------
>
>                 Key: CASSANDRA-3758
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3758
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jackson Chung
>            Assignee: Sylvain Lebresne
>              Labels: compaction, datastax_qa
>             Fix For: 1.0.9
>
>         Attachments: 0001-Fix-totalBytes-count-for-ParallelCompactionIterable.txt, cassandra.log.zip
>
>
> it is observed that:
> nodetool -h 127.0.0.1 -p 8080 compactionstats
> pending tasks: 1
> compaction type keyspace column family bytes compacted bytes total progress
> Compaction SyncCoreComputedContactNetworks 119739938 0 n/a
> and that is not moving (i.e. the bytes compacted never increases and the bytes total stays at 0).
> this is probably going to be difficult to reproduce, as the problem is observed when compacting 15 large sstables (total ~300G).
> attaching the thread dumps (along with logs) taken when this happens.
