Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2017/06/06 09:07:18 UTC
[jira] [Commented] (HBASE-18168) NoSuchElementException when rolling the log
[ https://issues.apache.org/jira/browse/HBASE-18168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038422#comment-16038422 ]
Ted Yu commented on HBASE-18168:
--------------------------------
Lgtm
> NoSuchElementException when rolling the log
> -------------------------------------------
>
> Key: HBASE-18168
> URL: https://issues.apache.org/jira/browse/HBASE-18168
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.1.11
> Reporter: Allan Yang
> Assignee: Allan Yang
> Attachments: HBASE-18168-branch-1.1.patch
>
>
> Today, one of our servers aborted with the following error in its log.
> {code}
> 2017-06-06 05:38:47,142 ERROR [regionserver/xxxx.logRoller] regionserver.LogRoller: Log rolling failed
> java.util.NoSuchElementException
> at java.util.concurrent.ConcurrentSkipListMap$Iter.advance(ConcurrentSkipListMap.java:2224)
> at java.util.concurrent.ConcurrentSkipListMap$ValueIterator.next(ConcurrentSkipListMap.java:2253)
> at java.util.Collections.min(Collections.java:628)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog.findEligibleMemstoresToFlush(FSHLog.java:861)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog.findRegionsToForceFlush(FSHLog.java:886)
> at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:728)
> at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:137)
> at java.lang.Thread.run(Thread.java:756)
> 2017-06-06 05:38:47,142 FATAL [regionserver/xxxx.logRoller] regionserver.HRegionServer: ABORTING region server xxxx: Log rolling failed
> java.util.NoSuchElementException
> ......
> {code}
> The code is here:
> {code}
> private byte[][] findEligibleMemstoresToFlush(Map<byte[], Long> regionsSequenceNums) {
>   List<byte[]> regionsToFlush = null;
>   // Keeping the old behavior of iterating unflushedSeqNums under oldestSeqNumsLock.
>   synchronized (regionSequenceIdLock) {
>     for (Map.Entry<byte[], Long> e: regionsSequenceNums.entrySet()) {
>       ConcurrentMap<byte[], Long> m =
>           this.oldestUnflushedStoreSequenceIds.get(e.getKey());
>       if (m == null) {
>         continue;
>       }
>       long unFlushedVal = Collections.min(m.values()); // The exception is thrown here
>       ......
> {code}
> The only reason I can think of for the NoSuchElementException is that the map 'm' is empty. I then looked at all the code that updates 'oldestUnflushedStoreSequenceIds'. Every update to 'oldestUnflushedStoreSequenceIds' is made while synchronized on 'regionSequenceIdLock', except here:
> {code}
> private ConcurrentMap<byte[], Long> getOrCreateOldestUnflushedStoreSequenceIdsOfRegion(
>     byte[] encodedRegionName) {
>   ......
>   oldestUnflushedStoreSequenceIdsOfRegion =
>       new ConcurrentSkipListMap<byte[], Long>(Bytes.BYTES_COMPARATOR);
>   ConcurrentMap<byte[], Long> alreadyPut =
>       oldestUnflushedStoreSequenceIds.putIfAbsent(encodedRegionName,
>           oldestUnflushedStoreSequenceIdsOfRegion); // Here, an empty map may be put into 'oldestUnflushedStoreSequenceIds' without synchronization
>   return alreadyPut == null ? oldestUnflushedStoreSequenceIdsOfRegion : alreadyPut;
> }
> {code}
> This bug should be very rare, but it can cause a region server to abort. It exists only in branch-1.1.
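
For illustration only, here is a minimal standalone sketch of the failure mode described above. It does not use HBase classes and is not necessarily what the attached patch does. The class name EmptyRegionMapSketch, the helper oldestUnflushedOrNull, and the String keys (standing in for the byte[] keys and Bytes.BYTES_COMPARATOR) are all assumptions made for this example. It shows that Collections.min throws on an empty values() view, which is the state a reader can observe right after putIfAbsent and before the first sequence id is recorded, and one possible way a reader could tolerate that state.

```java
import java.util.Collections;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class EmptyRegionMapSketch {

    // Mirrors the failing call: Collections.min throws NoSuchElementException
    // when the collection it is given has no elements.
    public static boolean minThrowsOnEmpty(ConcurrentMap<String, Long> m) {
        try {
            Collections.min(m.values());
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    // Hypothetical guard (not from the patch): scan the values in one pass and
    // return null when the per-region map has no entries yet. A single pass
    // avoids an isEmpty()-then-min() check-then-act gap on a concurrent map.
    public static Long oldestUnflushedOrNull(ConcurrentMap<String, Long> m) {
        long min = Long.MAX_VALUE;
        boolean seen = false;
        for (long v : m.values()) {
            seen = true;
            if (v < min) {
                min = v;
            }
        }
        return seen ? Long.valueOf(min) : null;
    }

    public static void main(String[] args) {
        // Stand-in for 'oldestUnflushedStoreSequenceIds': region -> (store -> seq id).
        ConcurrentMap<String, ConcurrentMap<String, Long>> byRegion =
            new ConcurrentSkipListMap<>();

        // State right after putIfAbsent and before any sequence id is recorded.
        byRegion.putIfAbsent("region-a", new ConcurrentSkipListMap<String, Long>());
        ConcurrentMap<String, Long> m = byRegion.get("region-a");

        System.out.println("min throws on empty map: " + minThrowsOnEmpty(m));
        System.out.println("guarded result on empty map: " + oldestUnflushedOrNull(m));

        // Once the writer has recorded sequence ids, both approaches work.
        m.put("cf1", 42L);
        m.put("cf2", 17L);
        System.out.println("guarded result after puts: " + oldestUnflushedOrNull(m));
    }
}
```

A caller using such a guard would treat a null result like the existing 'm == null' case and skip the region. The alternative suggested by the analysis above is to perform the putIfAbsent while holding the same lock the reader holds, so the empty map is never visible to the log roller.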
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)