Posted to notifications@accumulo.apache.org by "Eric Newton (JIRA)" <ji...@apache.org> on 2015/04/22 20:52:00 UTC

[jira] [Commented] (ACCUMULO-3745) deadlock in SourceSwitchingIterator

    [ https://issues.apache.org/jira/browse/ACCUMULO-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507658#comment-14507658 ] 

Eric Newton commented on ACCUMULO-3745:
---------------------------------------

The two locks that are mutually held are:

* the synchronization around {{copies}}, a synchronized list.
* the lock on the SourceSwitchingIterator itself ({{this}})

The SourceSwitchingIterator adds itself to {{copies}} in its constructor (which isn't the best form, but ignore that for the moment). The iterator is therefore implicitly locked while it is being initialized, so the lock order on that path is {{this}}, then {{copies}}.

But {{switchNow()}} locks {{copies}} first, and then {{_switchNow()}} locks {{this}}, which is the opposite order.
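
The inverted lock order can be sketched as follows. This is a minimal illustration and not the Accumulo source: {{LockOrderSketch}}, {{doSwitch()}}, {{trace}} and {{demo()}} are hypothetical names, and the locking shape is an assumption modelled on the stack traces quoted in the issue. It runs single-threaded, so it only records the order in which each path takes the two locks; it does not deadlock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for SourceSwitchingIterator: only the locking is modelled.
class LockOrderSketch {
  private final List<LockOrderSketch> copies; // expected to be a Collections.synchronizedList
  private final List<String> trace;           // records the order in which locks are taken

  LockOrderSketch(List<LockOrderSketch> copies, List<String> trace) {
    this.copies = copies;
    this.trace = trace;
    copies.add(this); // a synchronized list locks itself inside add()
  }

  // Scan path: the iterator's monitor is held while the new copy registers itself.
  synchronized LockOrderSketch deepCopy() {
    trace.add("this -> copies");
    return new LockOrderSketch(copies, trace);
  }

  // Minor compaction path: copies is locked first, then each iterator in turn.
  void switchNow() {
    synchronized (copies) {
      for (LockOrderSketch it : copies) {
        it.doSwitch();
      }
    }
  }

  // Stands in for _switchNow(): takes the iterator's monitor.
  private synchronized void doSwitch() {
    if (Thread.holdsLock(copies)) {
      trace.add("copies -> this");
    }
  }

  // Single-threaded walk through both paths; returns the recorded lock orders.
  static List<String> demo() {
    List<LockOrderSketch> copies = Collections.synchronizedList(new ArrayList<>());
    List<String> trace = new ArrayList<>();
    LockOrderSketch root = new LockOrderSketch(copies, trace);
    root.deepCopy();
    root.switchNow();
    return trace;
  }
}
```

With two threads, one in {{deepCopy()}} and one in {{switchNow()}}, each can end up holding the lock the other needs, which matches the cycle the Java runtime reported.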

There are two possible fixes:

# don't hold the lock on {{copies}} while calling {{_switchNow()}}: make a copy of the list (under the lock), and then call {{_switchNow()}}
# move the synchronized block to {{switchNow()}}, and remove it from {{_switchNow()}}
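
A minimal sketch of the first option, under the same assumptions as the description above. {{SnapshotSwitchSketch}}, {{doSwitch()}}, {{switchCount()}} and {{sawCopiesLock()}} are hypothetical names used only for illustration; this is not the actual patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for SourceSwitchingIterator: only the locking is modelled.
class SnapshotSwitchSketch {
  private final List<SnapshotSwitchSketch> copies; // expected to be a Collections.synchronizedList
  private int switches = 0;
  private boolean sawCopiesLock = false;

  SnapshotSwitchSketch(List<SnapshotSwitchSketch> copies) {
    this.copies = copies;
    copies.add(this);
  }

  // Still takes this, then copies (inside the copy's constructor).
  synchronized SnapshotSwitchSketch deepCopy() {
    return new SnapshotSwitchSketch(copies);
  }

  void switchNow() {
    List<SnapshotSwitchSketch> snapshot;
    synchronized (copies) {
      // Hold the list lock only long enough to copy the list.
      snapshot = new ArrayList<>(copies);
    }
    // The list lock is released before any iterator monitor is taken.
    for (SnapshotSwitchSketch it : snapshot) {
      it.doSwitch();
    }
  }

  // Stands in for _switchNow(): takes the iterator's monitor.
  private synchronized void doSwitch() {
    switches++;
    if (Thread.holdsLock(copies)) {
      sawCopiesLock = true; // would indicate the old copies -> this ordering
    }
  }

  synchronized int switchCount() {
    return switches;
  }

  synchronized boolean sawCopiesLock() {
    return sawCopiesLock;
  }
}
```

In this sketch no thread holds {{copies}} while waiting for an iterator's monitor, so the cycle cannot form. The trade-off is that an iterator added to {{copies}} after the snapshot is taken is not switched in that pass.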


> deadlock in SourceSwitchingIterator
> -----------------------------------
>
>                 Key: ACCUMULO-3745
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3745
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>         Environment: Large production cluster, with complex iterator trees.
>            Reporter: Eric Newton
>            Priority: Blocker
>             Fix For: 1.7.0, 1.6.3
>
>
> The details come from an offline cluster, so it's difficult to reproduce them exactly.  A very complex iterator was running over a tablet. "deepCopy" may have been called a couple dozen times, which may have contributed to the problem.
> Relevant facts:
> A scan and a minor compaction created a deadlock, which was detected by the Java runtime.
> {noformat}
> "Query... ":
>   waiting to lock monitor 0x1234 (object 0x1234, a java.util.Collections$SynchronizedRandomAccessList), 
>   which is held by "minor compactor 1"
> "minor compactor 1":
>  waiting to lock monitor 0x9876 (object 0x9876, a org.apache.accumulo.core.iterators.system.SourceSwitchingIterator), 
>  which is held by "Query..."
> {noformat}
> Java stacks:
> {noformat}
> "Query..."
>   at java.util.Collections$SynchronizedCollection.add(Collections.java:1636)
>   - waiting to lock <0x1234> (a java.util.Collections$SynchronizedRandomAccessList)
>   at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.<init>(SourceSwitchingIterator.java:72)
>   at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.deepCopy(SourceSwitchingIterator.java:85)
>   - locked <0x9876> (a org.apache.accumulo.core.iterators.system.SourceSwitchingIterator)
>   ... PartialMutationSkippingIterator.deepCopy(InMemoryMap.java:113)
>   ... InMemoryMap$MemoryIterator.deepCopy(InMemoryMap.java:623)
>   ...
> {noformat}
> and:
> {noformat}
> "minor compactor 1":
>   at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator._switchNow(SourceSwitchingIterator.java:171)
>   - waiting to lock <0x9876> (a org.apache.accumulo.core.iterators.system.SourceSwitchingIterator)
>   at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.switchNow(SourceSwitchingIterator.java:184)
>   - locked <0x1234> (a java.util.Collections$SynchronizedRandomAccessList)
>   at org.apache.accumulo.tserver.InMemoryMap$MemoryIterator.switchNow(InMemoryMap.java:647)
>   ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)