Posted to common-dev@hadoop.apache.org by "lohit vijayarenu (JIRA)" <ji...@apache.org> on 2008/05/06 02:41:57 UTC

[jira] Created: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

TestUrlStreamHandler hangs on LINUX
-----------------------------------

                 Key: HADOOP-3348
                 URL: https://issues.apache.org/jira/browse/HADOOP-3348
             Project: Hadoop Core
          Issue Type: Bug
          Components: fs
    Affects Versions: 0.18.0
         Environment: LINUX 2.6.9
            Reporter: lohit vijayarenu


TestUrlStreamHandler calls setURLStreamHandlerFactory as follows:
{noformat}
FsUrlStreamHandlerFactory factory =
    new org.apache.hadoop.fs.FsUrlStreamHandlerFactory();
java.net.URL.setURLStreamHandlerFactory(factory);
{noformat}

After this, MiniDFSCluster seems to hang while the DataNodes try to register, in setNewStorageID(), specifically at
{noformat}
rand = SecureRandom.getInstance("SHA1PRNG").nextInt(Integer.MAX_VALUE);
{noformat}

jstack output shows that the main thread is stuck in RawLocalFileSystem$LocalFSFileInputStream.read

(Attaching the jstack)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595463#action_12595463 ] 

Hadoop QA commented on HADOOP-3348:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12381709/HADOOP-3348.patch
  against trunk revision 654315.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2433/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2433/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2433/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2433/console

This message is automatically generated.


[jira] Updated: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3348:
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.18.0
         Assignee: lohit vijayarenu
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks Lohit!


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594667#action_12594667 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3348:
------------------------------------------------

For generating storage IDs, the goal is only to produce unique IDs. These IDs are not used for cryptographic purposes, so java.util.Random may be good enough.
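A minimal sketch of this suggestion, assuming only the standard library: StorageIdRandom and nextStorageIdComponent are hypothetical names, not Hadoop code. It draws the random part of a storage ID from java.util.Random, which never reads /dev/random, instead of SecureRandom.getInstance("SHA1PRNG"), while keeping the same value range as the call in the report.

```java
import java.util.Random;

public class StorageIdRandom {

    // Non-cryptographic generator: java.util.Random seeds itself from
    // System.nanoTime() mixed with an internal counter and never reads /dev/random.
    private static final Random RAND = new Random();

    /** Hypothetical helper: same range as nextInt(Integer.MAX_VALUE) in the report. */
    static int nextStorageIdComponent() {
        return RAND.nextInt(Integer.MAX_VALUE); // uniform in [0, Integer.MAX_VALUE)
    }

    public static void main(String[] args) {
        System.out.println("random storage ID component: " + nextStorageIdComponent());
    }
}
```

The trade-off is predictability: java.util.Random output can be reconstructed from observed values, which is acceptable only because uniqueness, not secrecy, is the requirement here.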


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595818#action_12595818 ] 

Hudson commented on HADOOP-3348:
--------------------------------

Integrated in Hadoop-trunk #486 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/486/])


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Christophe Taton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594668#action_12594668 ] 

Christophe Taton commented on HADOOP-3348:
------------------------------------------

Actually, this test probably hangs because of our own file:// URL handler, but I don't yet understand which differences between the "file://" URL handling provided by Hadoop and the default (Sun) one would make SecureRandom work or not.
Besides, I noticed that the less activity (keyboard input, mouse moves, process activities, etc) there is on the machine that runs the test, the longer the test hangs.


[jira] Updated: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3348:
---------------------------------

    Attachment: 3348-2nd-option.patch

Patch for the 2nd option. I am not very sure it is the right fix for this JIRA, though it is a good change to have. The extra 'exists()' call could be removed with a slightly bigger patch.

The larger questions are: for a JVM-global replacement of the "file://" handler, is LocalFileSystem appropriate? Does RawLocalFileSystem suit that better? etc.

If we could punt on this issue by just changing the test a little, that's fine too.


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594758#action_12594758 ] 

lohit vijayarenu commented on HADOOP-3348:
------------------------------------------

After looking into this a bit, with hints from Raghu, this appears to be what causes the problem.
Once the URLStreamHandlerFactory is set in the JVM, the device file /dev/random is opened through ChecksumFileSystem.
SecureRandom wraps the stream returned for /dev/random in a BufferedInputStream and calls read() to get a 20-byte digest to generate the seed. When this read passes through FSInputChecker.read with an 8K buffer, it loops until 8K bytes have been read: we invoke readFully(), which calls read() repeatedly until the 8K buffer is full. /dev/random could not produce random bytes that fast, so DataNode registration took forever.
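The difference between a single read and a readFully-style loop can be reproduced without Hadoop. This is a sketch under stated assumptions: TrickleStream is a hypothetical stand-in for /dev/random that hands out at most 16 bytes per call (an arbitrary figure), and readOnce/readFully are illustrative helpers, not FSInputChecker's actual code. Asking for the same 8K buffer, the single read returns after one call, while the loop needs 512 calls; on an entropy-starved device each of those calls may block.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadFullyDemo {

    /** Hypothetical stand-in for /dev/random: at most perCall bytes per read. */
    static class TrickleStream extends InputStream {
        private final int perCall;
        int calls = 0; // how many read calls the caller had to make

        TrickleStream(int perCall) {
            this.perCall = perCall;
        }

        @Override
        public int read() {
            calls++;
            return 0x5A; // single-byte read always yields one byte
        }

        @Override
        public int read(byte[] b, int off, int len) {
            calls++;
            int n = Math.min(len, perCall); // only a few bytes are "available"
            for (int i = 0; i < n; i++) {
                b[off + i] = 0x5A;
            }
            return n;
        }
    }

    /** One read: returns as soon as the source yields anything. */
    static int readOnce(InputStream in, byte[] buf) {
        try {
            return in.read(buf, 0, buf.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** readFully-style loop: keeps calling read until buf is full or EOF. */
    static int readFully(InputStream in, byte[] buf) {
        int total = 0;
        try {
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n < 0) {
                    break; // EOF before the buffer filled
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8192]; // 8K, the buffer size mentioned in the comment
        TrickleStream once = new TrickleStream(16);
        System.out.println("readOnce: " + readOnce(once, buf) + " bytes, " + once.calls + " call(s)");
        TrickleStream full = new TrickleStream(16);
        System.out.println("readFully: " + readFully(full, buf) + " bytes, " + full.calls + " call(s)");
    }
}
```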


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594662#action_12594662 ] 

Raghu Angadi commented on HADOOP-3348:
--------------------------------------

It would be useful to have an explanation of why only this test triggers the problem, since the DataNode calls SecureRandom.getInstance() every time. It is surprising to see /dev/random being read through hadoop.ChecksumFileSystem. Is that expected? There is probably an unintentional mapping from "file:///" to ChecksumFileSystem (and there is the question of whether ChecksumFileSystem should have succeeded anyway in this case, etc.).


[jira] Updated: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-3348:
-------------------------------------

    Attachment: HADOOP-3348.patch

The attached patch fixes the test case by setting the URLStreamHandlerFactory after bringing up the cluster. If we decide to track the fix for handling local files with LocalFileSystem as a separate issue, then this patch fixes the failing test case.
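A minimal sketch of the ordering this patch relies on, assuming only java.net: PassThroughFactory stands in for FsUrlStreamHandlerFactory and startClusterPlaceholder for bringing up MiniDFSCluster (both hypothetical). It also shows why the order has to be chosen carefully in a test: java.net.URL.setURLStreamHandlerFactory may be called only once per JVM and throws an Error on a second call, so an installed factory cannot be swapped out later.

```java
import java.net.URL;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

public class FactoryOrderDemo {

    /** Hypothetical stand-in for FsUrlStreamHandlerFactory; null defers to the JDK handler. */
    static class PassThroughFactory implements URLStreamHandlerFactory {
        @Override
        public URLStreamHandler createURLStreamHandler(String protocol) {
            return null;
        }
    }

    /** Hypothetical placeholder for starting MiniDFSCluster and registering DataNodes. */
    static void startClusterPlaceholder() {
        // In the real test, DataNode registration (and its SecureRandom seeding)
        // would happen here, before any JVM-wide handler factory is installed.
    }

    /** Tries to install a JVM-wide factory; java.net.URL permits this only once. */
    static boolean install(URLStreamHandlerFactory f) {
        try {
            URL.setURLStreamHandlerFactory(f);
            return true;
        } catch (Error alreadyDefined) { // documented behaviour when a factory is already set
            return false;
        }
    }

    public static void main(String[] args) {
        startClusterPlaceholder();                           // 1. cluster first
        boolean first = install(new PassThroughFactory());   // 2. then the factory
        boolean second = install(new PassThroughFactory());  // a second call is rejected
        System.out.println("first=" + first + " second=" + second);
    }
}
```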


[jira] Updated: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-3348:
-------------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594671#action_12594671 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3348:
------------------------------------------------

> Besides, I noticed that the less activity (keyboard input, mouse moves, process activities, etc) there is on the machine that runs the test, the longer the test hangs.

Some random number generators use these activities as input. I am surprised that you could notice the difference (if that is indeed the cause). :)


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594669#action_12594669 ] 

Raghu Angadi commented on HADOOP-3348:
--------------------------------------

> For generating storageIDs, the goal is to generate unique IDs. These IDs are not for cryptographic uses. So Random may be good enough.
That is a different issue, though it could be a workaround if we don't fix the real issue.


[jira] Updated: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Christophe Taton (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christophe Taton updated HADOOP-3348:
-------------------------------------

    Remaining Estimate: 0h
     Original Estimate: 0h

+1, the workaround looks good to me.


[jira] Updated: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-3348:
-------------------------------------

    Description: 
TestUrlStreamHandler sets setURLStreamHandlerFactory as
{noformat}
FsUrlStreamHandlerFactory factory =
        new org.apache.hadoop.fs.FsUrlStreamHandlerFactory();
    java.net.URL.setURLStreamHandlerFactory(factory);
{noformat}

After this, MiniDFSCluster seems to hang while Datanodes tries to register in setNewStorageID, specifically at
{noformat}
rand = SecureRandom.getInstance("SHA1PRNG").nextInt(Integer.MAX_VALUE);
{noformat}

jstack output shows that the main thread is stuck in RawLocalFileSystem$LocalFSFileInputStream.read

(Attaching the jstack)

  was:
TestUrlStreamHandler sets setURLStreamHandlerFactory as
{noformat)
FsUrlStreamHandlerFactory factory =
        new org.apache.hadoop.fs.FsUrlStreamHandlerFactory();
    java.net.URL.setURLStreamHandlerFactory(factory);
(noformat}

After this, MiniDFSCluster seems to hang while Datanodes tries to register in setNewStorageID, specifically at
{noformat}
rand = SecureRandom.getInstance("SHA1PRNG").nextInt(Integer.MAX_VALUE);
{noformat}

jstack output shows that the main thread is stuck in RawLocalFileSystem$LocalFSFileInputStream.read

(Attaching the jstack)


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594973#action_12594973 ] 

Raghu Angadi commented on HADOOP-3348:
--------------------------------------

Two options:
1. FSInputChecker.read() should not do readFully(). This is harder to change since multiple users implicitly depend on current behaviour. But in long run it should change.
2. ChecksumFileSystem should use the base filesystem (RawLocalFileSystem in this case) directly when there is no .crc file. 

I think #2 is simpler to do and will reduce strange surprises. It is required anyway since even after #1, it will still try to read 512 bytes. 
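Option 2 could look roughly like the following sketch, written against java.nio rather than Hadoop's FileSystem API. It assumes the ChecksumFileSystem naming convention of a hidden ".<name>.crc" file next to the data file; CrcBypassSketch, open and wrapWithChecksumVerification are hypothetical names, and the verifying branch is left as a stub. For a device such as /dev/random there is no /dev/.random.crc, so the raw stream would be returned and reads would no longer be forced to fill a whole buffer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CrcBypassSketch {

    /** Assumed convention: checksums for dir/name live in dir/.name.crc. */
    static Path checksumPathFor(Path file) {
        String crcName = "." + file.getFileName() + ".crc";
        Path parent = file.getParent();
        return parent == null ? Paths.get(crcName) : parent.resolve(crcName);
    }

    /** Checksums can only be verified when the sidecar .crc file exists. */
    static boolean hasChecksum(Path file) {
        return Files.exists(checksumPathFor(file));
    }

    /** Option 2: hand back the raw stream when there is nothing to verify. */
    static InputStream open(Path file) throws IOException {
        InputStream raw = Files.newInputStream(file);
        if (!hasChecksum(file)) {
            return raw; // e.g. /dev/random: reads return whatever bytes are ready
        }
        return wrapWithChecksumVerification(raw, checksumPathFor(file));
    }

    /** Hypothetical stub for a checksum-verifying stream (verification not implemented). */
    static InputStream wrapWithChecksumVerification(InputStream raw, Path crcFile) {
        return raw; // a real version would verify each chunk against crcFile
    }

    public static void main(String[] args) {
        Path dev = Paths.get("/dev/random");
        System.out.println(dev + " -> checksum file " + checksumPathFor(dev)
                + ", present: " + hasChecksum(dev));
    }
}
```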


[jira] Updated: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lohit vijayarenu updated HADOOP-3348:
-------------------------------------

    Attachment: Datanode_jstack.txt


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594521#action_12594521 ] 

lohit vijayarenu commented on HADOOP-3348:
------------------------------------------

After running it a few times, I see that SecureRandom.getInstance("SHA1PRNG").nextInt(Integer.MAX_VALUE) seems to hang when setURLStreamHandlerFactory(factory) has been called. As seen from the stack trace, the main thread seems to hang in readBytes on the file descriptor for /dev/random (according to strace). The test does run to completion about 1 out of 5-6 times, though. Also, calling setURLStreamHandlerFactory after MiniDFSCluster is up works fine.


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595370#action_12595370 ] 

Raghu Angadi commented on HADOOP-3348:
--------------------------------------

+1. Patch looks fine to me. The larger issue could be discussed separately.

> TestUrlStreamHandler hangs on LINUX
> -----------------------------------
>
>                 Key: HADOOP-3348
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3348
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.18.0
>         Environment: LINUX 2.6.9
>            Reporter: lohit vijayarenu
>         Attachments: 3348-2nd-option.patch, Datanode_jstack.txt, HADOOP-3348.patch
>
>
> TestUrlStreamHandler sets setURLStreamHandlerFactory as
> {noformat}
> FsUrlStreamHandlerFactory factory =
>         new org.apache.hadoop.fs.FsUrlStreamHandlerFactory();
>     java.net.URL.setURLStreamHandlerFactory(factory);
> {noformat}
> After this, MiniDFSCluster seems to hang while Datanodes tries to register in setNewStorageID, specifically at
> {noformat}
> rand = SecureRandom.getInstance("SHA1PRNG").nextInt(Integer.MAX_VALUE);
> {noformat}
> jstack output shows that the main thread is stuck in RawLocalFileSystem$LocalFSFileInputStream.read
> (Attaching the jstack)


[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "Christophe Taton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594526#action_12594526 ] 

Christophe Taton commented on HADOOP-3348:
------------------------------------------

I have been running into similar issues. After looking into this for a while, I came to the conclusion that the problem is related to the Linux random number generator, which I solved on my computer by running the rngd daemon (http://linux.die.net/man/8/rngd).



[jira] Commented: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594648#action_12594648 ] 

lohit vijayarenu commented on HADOOP-3348:
------------------------------------------

Christophe, thanks for the pointer. Yes, on my Linux machine, while running this program I do see that /proc/sys/kernel/random/entropy_avail returns 0. After digging a bit, I learned that when this happens, any readers of /dev/random block, which seems to be what is happening here. Is it a good idea to assume that this works on every system where we bring up our datanodes? Are there disadvantages to using Random.nextInt(Integer.MAX_VALUE) instead?
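For anyone reproducing this, the following is a minimal Java sketch of checking the same counter before starting something that reads /dev/random. It assumes a Linux /proc layout. The class name EntropyCheck and the method entropyAvailable are hypothetical helpers and are not part of Hadoop. A reading of 0 matches the situation described above, where readers of /dev/random block.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

class EntropyCheck {
    // Linux procfs file mentioned in the comment above.
    static final Path ENTROPY_AVAIL = Paths.get("/proc/sys/kernel/random/entropy_avail");

    /** Returns the kernel's entropy estimate, or empty if the file is missing or unparsable. */
    static OptionalInt entropyAvailable(Path path) {
        try {
            String text = new String(Files.readAllBytes(path), StandardCharsets.US_ASCII).trim();
            return OptionalInt.of(Integer.parseInt(text));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt bits = entropyAvailable(ENTROPY_AVAIL);
        if (!bits.isPresent()) {
            System.out.println("entropy_avail not readable (not Linux?)");
        } else if (bits.getAsInt() == 0) {
            System.out.println("entropy pool empty: reads from /dev/random may block");
        } else {
            System.out.println("entropy_avail = " + bits.getAsInt());
        }
    }
}
```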


[jira] Issue Comment Edited: (HADOOP-3348) TestUrlStreamHandler hangs on LINUX

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594758#action_12594758 ] 

lohit edited comment on HADOOP-3348 at 5/6/08 8:26 PM:
------------------------------------------------------------------

After looking into this a bit with hints from Raghu, it looks like this is what is causing the problem.
Once setURLStreamHandlerFactory has been called in the JVM, the device file /dev/random is opened through ChecksumFileSystem.
SecureRandom wraps the stream returned by opening /dev/random in a BufferedInputStream and calls read to get a 20-byte digest to generate the seed. That read goes through FSInputChecker.read with a buffer size of 8K, which used to loop until 8K bytes had been read: we invoke readFully(), which keeps calling read until the 8K buffer is filled. /dev/random could not produce random bytes that fast, so Datanode registration effectively took forever.
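The mechanism in this comment can be made concrete with a self-contained Java simulation that does not use Hadoop. TrickleStream is a hypothetical stand-in for /dev/random that hands out at most 4 bytes per call. FillingStream is a hypothetical wrapper that, like the readFully() path described for FSInputChecker, keeps reading until the caller's whole buffer is full. In both cases a 20-byte seed is read through a BufferedInputStream with an 8192-byte buffer, which is assumed here to match the 8K buffer mentioned above.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

class PartialReadDemo {
    /** Hypothetical stand-in for /dev/random: returns at most 4 bytes per read call. */
    static class TrickleStream extends InputStream {
        long calls = 0;  // number of read calls served
        long drawn = 0;  // total bytes handed out ("entropy" consumed)

        @Override
        public int read() {
            calls++;
            drawn++;
            return 0x2a;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            calls++;
            int n = Math.min(len, 4);
            for (int i = 0; i < n; i++) {
                b[off + i] = 0x2a;
            }
            drawn += n;
            return n;
        }
    }

    /** Loops until len bytes have been read or the stream ends, like a readFully() helper. */
    static int readFully(InputStream in, byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(b, off + total, len - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    /** Hypothetical wrapper whose read() always tries to fill the caller's whole buffer. */
    static class FillingStream extends InputStream {
        private final InputStream in;

        FillingStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            return read(one, 0, 1) <= 0 ? -1 : (one[0] & 0xff);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int n = readFully(in, b, off, len);
            return (n == 0 && len > 0) ? -1 : n;  // report EOF per the InputStream contract
        }
    }

    /** Reads a 20-byte seed through an 8K BufferedInputStream; returns {calls, bytesDrawn}. */
    static long[] run(boolean fillWholeBuffer) throws IOException {
        TrickleStream source = new TrickleStream();
        InputStream raw = fillWholeBuffer ? new FillingStream(source) : source;
        BufferedInputStream buffered = new BufferedInputStream(raw, 8192);
        byte[] seed = new byte[20];
        readFully(buffered, seed, 0, seed.length);
        return new long[] {source.calls, source.drawn};
    }

    public static void main(String[] args) throws IOException {
        long[] direct = run(false);
        long[] filling = run(true);
        System.out.println("partial reads allowed: " + direct[0] + " calls, " + direct[1] + " bytes");
        System.out.println("buffer filled fully:   " + filling[0] + " calls, " + filling[1] + " bytes");
    }
}
```

When partial reads are allowed, the 20-byte seed costs only 20 bytes from the source. When every read must fill the 8K buffer, the same seed drains 8192 bytes in 2048 calls. On a starved /dev/random, that difference is what turns a short wait into an apparent hang.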
