Posted to hdfs-dev@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2010/01/09 15:00:54 UTC

[jira] Created: (HDFS-889) Possible race condition in BlocksMap.NodeIterator.

Possible race condition in BlocksMap.NodeIterator.
--------------------------------------------------

                 Key: HDFS-889
                 URL: https://issues.apache.org/jira/browse/HDFS-889
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 0.22.0
            Reporter: Steve Loughran


Hudson's test run for HDFS-165 is showing an NPE in {{org.apache.hadoop.hdfs.server.namenode.TestNodeCount.testNodeCount()}}.
One problem could be in {{BlocksMap.NodeIterator}}. Its {{hasNext()}} method checks that the next entry isn't null. But what if, between the {{hasNext()}} call and the {{next()}} operation, the map changes and an entry goes away? In that situation, the node returned from {{next()}} will be null.
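
The interleaving can be replayed in a single thread to make it concrete. The following is a minimal sketch, not the actual {{BlocksMap}} code: the class {{NodeIteratorRaceSketch}}, the {{SlotIterator}} type and the string node names are all made up for illustration, and the assumption is only that the iterator walks a fixed set of slots where a null slot means "no more entries".

```java
import java.util.Iterator;

/** Simplified stand-in for the pattern described above; not the actual BlocksMap code. */
final class NodeIteratorRaceSketch {

    /** Iterates over a fixed-capacity slot array; a null slot marks the end (assumed layout). */
    static final class SlotIterator implements Iterator<String> {
        private final String[] slots;
        private int nextIdx = 0;

        SlotIterator(String[] slots) {
            this.slots = slots;
        }

        @Override
        public boolean hasNext() {
            return nextIdx < slots.length && slots[nextIdx] != null;
        }

        @Override
        public String next() {
            // No re-check here: the iterator trusts the earlier hasNext() result.
            return slots[nextIdx++];
        }
    }

    /** Replays the hasNext()/remove/next() interleaving in one thread so the outcome is deterministic. */
    static String nextAfterConcurrentRemoval() {
        String[] slots = {"datanode-1", "datanode-2"};
        SlotIterator it = new SlotIterator(slots);
        boolean sawNext = it.hasNext();   // true: slot 0 is populated
        slots[0] = null;                  // stands in for another thread removing the entry
        return sawNext ? it.next() : "unreachable";
    }

    public static void main(String[] args) {
        // The caller was told there is a next element, yet receives null.
        System.out.println(nextAfterConcurrentRemoval()); // prints "null"
    }
}
```

Any caller that dereferences the returned value straight away would hit an NPE at this point.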

This is potentially serious, as a quick look through the code shows that the iterator gets retrieved a lot, and everywhere Hadoop does so it assumes the value is not null. It's also one of those problems that doesn't have a simple "make it go away" fix.

Options
# Ignore it, hope it doesn't happen very often, and assume the test failure was a one-off that will never happen in a production datacentre. This is the default. The iterator is only used in the namenode, so while it does depend on the number of datanodes, it isn't running on 4000 machines in a big cluster.
# Leave the iterator as is, and have all the in-Hadoop code check for a null value and break out of the loop.
# Patch the {{NodeIterator}} to be consistent with the {{Iterator}} specification and throw a {{NoSuchElementException}} if the next value is null. This does not make the problem go away, but it can now be handled by having every in-Hadoop use catch the exception at the right point and exit the loop.
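
The third option could look roughly like the sketch below. It is only an illustration under assumptions: {{StrictNodeIteratorSketch}}, {{StrictSlotIterator}} and {{countNodes}} are hypothetical names, and the slot array stands in for whatever the real iterator walks. Note that without proper synchronization this still does not close the race; it only turns a silent null into an exception that callers can handle.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch of option 3; names and data layout are assumptions, not Hadoop code. */
final class StrictNodeIteratorSketch {

    /** Like the current pattern, but next() re-checks the slot instead of returning null. */
    static final class StrictSlotIterator implements Iterator<String> {
        private final String[] slots;
        private int nextIdx = 0;

        StrictSlotIterator(String[] slots) {
            this.slots = slots;
        }

        @Override
        public boolean hasNext() {
            return nextIdx < slots.length && slots[nextIdx] != null;
        }

        @Override
        public String next() {
            if (nextIdx >= slots.length) {
                throw new NoSuchElementException("past the last slot");
            }
            String node = slots[nextIdx];
            if (node == null) {
                // The entry went away after hasNext() returned true.
                throw new NoSuchElementException("entry removed during iteration");
            }
            nextIdx++;
            return node;
        }
    }

    /** Caller-side handling: count nodes, and treat a vanished entry as the end of the loop. */
    static int countNodes(Iterator<String> it) {
        int count = 0;
        try {
            while (it.hasNext()) {
                it.next();
                count++;
            }
        } catch (NoSuchElementException e) {
            // An entry disappeared between hasNext() and next(); exit the loop cleanly.
        }
        return count;
    }
}
```

The cost of this approach is visible in {{countNodes}}: every loop over the iterator needs its own try/catch, which is why it is a handling strategy and not a fix.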

Testing. This should be possible.
# Create a block map.
# Iterate over a block.
# While the iteration is in progress, remove the next block in the list. Expect the next call to {{next()}} to fail in whatever way you choose.
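
The steps above can be sketched without any Hadoop classes. Everything here is hypothetical: {{TinyBlockMap}}, {{addBlock}}, {{removeNode}} and {{nodeIterator}} are invented for the example and are not the {{BlocksMap}} API. The sketch also makes two assumptions: the entry removed mid-iteration is the next node in the iterated list, and the chosen failure mode is a {{NoSuchElementException}}.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical test sketch; TinyBlockMap is a made-up stand-in, not the BlocksMap API. */
final class BlockMapIterationTestSketch {

    /** Miniature block map: block id -> fixed array of node names, null meaning "empty slot". */
    static final class TinyBlockMap {
        private final Map<String, String[]> blocks = new HashMap<>();

        void addBlock(String blockId, String... nodes) {
            blocks.put(blockId, nodes.clone());
        }

        /** Clears the slot holding the given node, if present. */
        void removeNode(String blockId, String node) {
            String[] slots = blocks.get(blockId);
            for (int i = 0; slots != null && i < slots.length; i++) {
                if (node.equals(slots[i])) {
                    slots[i] = null;
                }
            }
        }

        Iterator<String> nodeIterator(String blockId) {
            final String[] slots = blocks.getOrDefault(blockId, new String[0]);
            return new Iterator<String>() {
                private int nextIdx = 0;

                @Override
                public boolean hasNext() {
                    return nextIdx < slots.length && slots[nextIdx] != null;
                }

                @Override
                public String next() {
                    // Chosen failure mode for this sketch: throw rather than return null.
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return slots[nextIdx++];
                }
            };
        }
    }

    /** Returns true if next() fails with NoSuchElementException after the mid-iteration removal. */
    static boolean removalDuringIterationIsDetected() {
        TinyBlockMap map = new TinyBlockMap();             // step 1: create a block map
        map.addBlock("blk_1", "dn-1", "dn-2", "dn-3");
        Iterator<String> it = map.nodeIterator("blk_1");   // step 2: start iterating
        it.next();                                         // consume dn-1
        boolean hadNext = it.hasNext();                    // true: dn-2 is still present
        map.removeNode("blk_1", "dn-2");                   // step 3: remove the next entry
        try {
            it.next();
            return false;                                  // no failure: removal went unnoticed
        } catch (NoSuchElementException expected) {
            return hadNext;
        }
    }

    public static void main(String[] args) {
        System.out.println(removalDuringIterationIsDetected()); // prints "true"
    }
}
```

Against the unpatched iterator the equivalent test would instead assert that {{next()}} returns null, which documents the current behaviour until one of the options is chosen.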
