You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2008/09/09 13:23:44 UTC

[jira] Created: (HADOOP-4128) NPE in FSDir.getBlockInfo

NPE in FSDir.getBlockInfo
-------------------------

                 Key: HADOOP-4128
                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
             Project: Hadoop Core
          Issue Type: Bug
          Components: fs
    Affects Versions: 0.18.1
            Reporter: Steve Loughran
            Priority: Minor


This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629450#action_12629450 ] 

Steve Loughran commented on HADOOP-4128:
----------------------------------------

the line in question assumes that dir.listFiles() never returns a null list

      File blockFiles[] = dir.listFiles();
      for (int i = 0; i < blockFiles.length; i++) {    ///here

This assumption is at odds with the contract of java.io.File.listFiles, which states

" Returns null if this abstract pathname does not denote a directory, or if an I/O error occurs."

FSDir.getBlockInfo() needs to check for a null return value and fail with a helpful message in such a circumstance

> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629564#action_12629564 ] 

Raghu Angadi commented on HADOOP-4128:
--------------------------------------

> Question is: is that an appropriate patch? [...]
I think so. This kind of synchronization with volatiles always has these problems...

Also +1 for the attached patch.

listFiles() is called in other places too.. though many of them occur during initialization.

> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: hadoop-4128.patch
>
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-4128:
-----------------------------------

    Affects Version/s:     (was: 0.18.1)
                       0.19.0

> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629517#action_12629517 ] 

Steve Loughran commented on HADOOP-4128:
----------------------------------------

Moving the shouldRun = false assignment up to earlier on in the server shutdown process appears to make this message go away altogether. It is clearly a race condition in my test -the temporary directories are cleaned before the datanode has shut down, but telling the data receiver threads sooner rather than later to shut down makes the window much smaller. 

Question is: is that an appropriate patch? Or should the window be left fairly wide, so that its existence is more obvious?

> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: hadoop-4128.patch
>
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629498#action_12629498 ] 

Steve Loughran commented on HADOOP-4128:
----------------------------------------

one possible root cause of this could be my tests deleting the datanode directory before the datanode is terminated. Or the new service code isn't terminating a datanode properly. More research needed on my side. Whatever the cause, the unexpected disappearance of any part of the filesystem is something that datanodes may encounter, and should be reported more meaningfully than with an NPE

> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: hadoop-4128.patch
>
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-4128:
----------------------------------

    Assignee: Steve Loughran
      Status: Open  (was: Patch Available)

If this is going to fix an errant assumption about the File::listFiles contract as quoted above, it's within scope of the patch to correct all the cases in FSDataset.

> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: hadoop-4128.patch
>
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-4128:
-----------------------------------

    Attachment: hadoop-4128.patch

This patch changes the stack trace to one that lists the directory at fault. It does not make the root cause go away, which could be a race condition or some other problem:

[sf-startdaemon-debug] 08/09/09 12:35:46 [Thread-49] ERROR datanode.DataNode : Exception: java.lang.IllegalStateException: Not a valid directory: /tmp/tempdir16656/current
[sf-startdaemon-debug] java.lang.IllegalStateException: Not a valid directory: /tmp/tempdir16656/current
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSDir.getBlockInfo(FSDataset.java:190)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.getBlockInfo(FSDataset.java:423)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolumeSet.getBlockInfo(FSDataset.java:521)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockReport(FSDataset.java:1137)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:773)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1151)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.ExtDataNode.run(ExtDataNode.java:113)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.ExtDataNode$DataNodeThread.execute(ExtDataNode.java:177)
[sf-startdaemon-debug]  at org.smartfrog.sfcore.utils.SmartFrogThread.run(SmartFrogThread.java:279)
[sf-startdaemon-debug]  at org.smartfrog.sfcore.utils.WorkflowThread.run(WorkflowThread.java:117)


> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>         Attachments: hadoop-4128.patch
>
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645639#action_12645639 ] 

Chris Douglas commented on HADOOP-4128:
---------------------------------------

bq. listFiles() is called in other places too.. though many of them occur during initialization.
Shouldn't the other calls be corrected in this patch?

> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>         Attachments: hadoop-4128.patch
>
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-4128:
-----------------------------------

    Fix Version/s: 0.19.0
           Status: Patch Available  (was: Open)

I see this error message in my tests, even though the hadoop test suite is passing. 

> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: hadoop-4128.patch
>
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4128:
--------------------------------

    Fix Version/s:     (was: 0.19.0)

Please mark it as a blocker for 0.19 if required

> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>         Attachments: hadoop-4128.patch
>
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629449#action_12629449 ] 

Steve Loughran commented on HADOOP-4128:
----------------------------------------

Stack trace
[sf-startdaemon-debug] 08/09/09 12:05:27 [Thread-91] ERROR datanode.DataNode : Exception: java.lang.NullPointerException
[sf-startdaemon-debug] java.lang.NullPointerException
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSDir.getBlockInfo(FSDataset.java:189)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.getBlockInfo(FSDataset.java:420)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolumeSet.getBlockInfo(FSDataset.java:518)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockReport(FSDataset.java:1134)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:773)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1151)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.ExtDataNode.run(ExtDataNode.java:113)
[sf-startdaemon-debug]  at org.apache.hadoop.hdfs.server.datanode.ExtDataNode$DataNodeThread.execute(ExtDataNode.java:177)
[sf-startdaemon-debug]  at org.smartfrog.sfcore.utils.SmartFrogThread.run(SmartFrogThread.java:279)
[sf-startdaemon-debug]  at org.smartfrog.sfcore.utils.WorkflowThread.run(WorkflowThread.java:117)



> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4128) NPE in FSDir.getBlockInfo

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629557#action_12629557 ] 

Hadoop QA commented on HADOOP-4128:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12389740/hadoop-4128.patch
  against trunk revision 693460.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3219/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3219/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3219/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3219/console

This message is automatically generated.

> NPE in FSDir.getBlockInfo
> -------------------------
>
>                 Key: HADOOP-4128
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4128
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.19.0
>            Reporter: Steve Loughran
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: hadoop-4128.patch
>
>
> This could well be something I've introduced on my variant of the code, although its a recent arrival in my own tests: an NPE while bringing up a datanode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.