Posted to issues@hbase.apache.org by "nkeywal (JIRA)" <ji...@apache.org> on 2012/06/06 20:17:23 UTC

[jira] [Created] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

nkeywal created HBASE-6175:
------------------------------

             Summary: TestFSUtils flaky on hdfs getFileStatus method
                 Key: HBASE-6175
                 URL: https://issues.apache.org/jira/browse/HBASE-6175
             Project: HBase
          Issue Type: Bug
          Components: test
    Affects Versions: 0.96.0
            Reporter: nkeywal
            Assignee: nkeywal
            Priority: Trivial
             Fix For: 0.96.0


This is a simplified version of a TestFSUtils issue: with a sleep after closing the file, the test passes 100% of the time; without the sleep, it is flaky. The root cause is unknown. Although the issue shows up in the tests, the same root cause could affect a real production system as well.

{noformat}
@Test
public void testFSUTils() throws Exception {
  final String[] hosts = {"host1", "host2", "host3", "host4"};
  Path testFile = new Path("/test1.txt");

  HBaseTestingUtility htu = new HBaseTestingUtility();

  try {
    htu.startMiniDFSCluster(hosts).waitActive();
    FileSystem fs = htu.getDFSCluster().getFileSystem();

    for (int i = 0; i < 100; ++i) {
      // Write a one-byte file, which fits in a single block.
      FSDataOutputStream out = fs.create(testFile);
      byte[] data = new byte[1];
      out.write(data, 0, 1);
      out.close();

      // Put a sleep here to make me work
      //Thread.sleep(2000);

      FileStatus status = fs.getFileStatus(testFile);
      HDFSBlocksDistribution blocksDistribution =
        FSUtils.computeHDFSBlocksDistribution(fs, status, 0, status.getLen());
      assertEquals("Wrong number of hosts distributing blocks at iteration " + i, 3,
        blocksDistribution.getTopHosts().size());

      fs.delete(testFile, true);
    }
  } finally {
    htu.shutdownMiniDFSCluster();
  }
}
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407149#comment-13407149 ] 

stack commented on HBASE-6175:
------------------------------

Is the loop right, given that it sets ok = true at the top of the loop each time? The indentation seems off too. I would suggest you also document why the loop is there (your findings above). Otherwise the patch LGTM.

                
[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407161#comment-13407161 ] 

stack commented on HBASE-6175:
------------------------------

Don't worry about it; none of my comments are substantial enough to require a redo (I saw that the comment about HBASE-6175 is on only one of the loops).
                

[jira] [Updated] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6175:
---------------------------

    Status: Patch Available  (was: Open)
    

[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290903#comment-13290903 ] 

nkeywal commented on HBASE-6175:
--------------------------------

Yes, without much success :-). You're right, I will try the HDFS mailing list. If that doesn't work out, I will push a first patch to make the test non-flaky, but keep this JIRA open since the root cause remains.
                

[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402990#comment-13402990 ] 

nkeywal commented on HBASE-6175:
--------------------------------

Here is the fix. Unless someone says 'no go', I'll commit it this weekend.
                

[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291376#comment-13291376 ] 

nkeywal commented on HBASE-6175:
--------------------------------

The HBase-free version of the test:
{noformat}
package org.apache.hadoop.test;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class TestHDFS {

  @Test
  public void testFSUTils() throws Exception {
    final Configuration conf = new Configuration();
    final String[] hosts = {"host1", "host2", "host3", "host4"};
    final byte[] data = new byte[1]; // Will fit in one block
    final Path testFile = new Path("/test1.txt");

    MiniDFSCluster dfsCluster =
        new MiniDFSCluster(0, conf, hosts.length, true, true, true, null, null, hosts, null);
    try {
      FileSystem fs = dfsCluster.getFileSystem();
      dfsCluster.waitClusterUp();

      for (int i = 0; i < 200; ++i) {
        FSDataOutputStream out = fs.create(testFile);
        out.write(data, 0, 1);
        out.close();

        // Put a sleep here to make me work
        //Thread.sleep(1000);

        FileStatus status = fs.getFileStatus(testFile);
        // Fetch the locations once, and check there is exactly one block before indexing it.
        BlockLocation[] locations = fs.getFileBlockLocations(status, 0, status.getLen());
        assertEquals(1, locations.length);
        int nbHosts = locations[0].getHosts().length;
        assertEquals("Wrong number of hosts distributing blocks at iteration " + i, 3, nbHosts);

        fs.delete(testFile, true);
      }

    } finally {
      dfsCluster.shutdown();
    }
  }
}
{noformat}

                
[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407156#comment-13407156 ] 

nkeywal commented on HBASE-6175:
--------------------------------

bq. Is the loop right the way it sets ok = true at the top of the loop each time?
Yes: we set it to false when the condition is not met, in "ok = (ok && uniqueBlocksTotalWeight == weight);".

bq. Would suggest you document too why the loop (your findings above).
There is this comment at the end of the loop. Do you want me to add something?
// NameNode is informed asynchronously, so we may have a delay. See HBASE-6175

bq. Indents seem off too. 
I'm going to check. I already committed the patch yesterday; I will update it if necessary.
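The retry loop discussed here can be sketched in plain Java so that it runs without a cluster. This is not the committed patch: the helper name `waitUntilAllBlocks`, its parameters, and the `Supplier` returning one host count per block are hypothetical stand-ins for asking the NameNode for block locations. It shows why resetting `ok` to `true` at the top of each attempt is correct: every attempt re-checks all blocks from scratch, and a single under-reported block sets `ok` to `false` for that attempt only.

```java
import java.util.List;
import java.util.function.Supplier;

public class BlockReportPoll {

  /**
   * Returns true once every block reports {@code expected} hosts, or false after
   * {@code maxAttempts} unsuccessful attempts. Hypothetical helper, not the HBase patch.
   */
  public static boolean waitUntilAllBlocks(Supplier<List<Integer>> hostCountsPerBlock,
      int expected, int maxAttempts, long sleepMs) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      boolean ok = true; // reset at the top of every attempt
      for (int hosts : hostCountsPerBlock.get()) {
        ok = ok && hosts == expected; // one under-reported block fails this attempt
      }
      if (ok) {
        return true;
      }
      try {
        // The NameNode is informed asynchronously, so give it time before re-checking.
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        return false;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Simulated NameNode views: the single block's replicas are reported one at a time.
    List<List<Integer>> views = List.of(List.of(1), List.of(2), List.of(3));
    int[] call = {0};
    Supplier<List<Integer>> snapshot =
        () -> views.get(Math.min(call[0]++, views.size() - 1));
    System.out.println(waitUntilAllBlocks(snapshot, 3, 5, 10)); // prints true
  }
}
```

Bounding the number of attempts keeps a genuinely broken cluster from hanging the test forever, while still tolerating the reporting delay.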

                
[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403014#comment-13403014 ] 

Hadoop QA commented on HBASE-6175:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12533802/6175.v1.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestStore

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2280//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2280//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2280//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2280//console

This message is automatically generated.
                

[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291625#comment-13291625 ] 

nkeywal commented on HBASE-6175:
--------------------------------

Todd said on the HDFS mailing list:
{noformat}
This is the expected behavior based on the default configuration of
dfs.replication.min. When you close the file, the client waits until
all of the DNs have the block fully written, but the DNs report the
replica to the NN asynchronously. So with the default configuration,
the client then only waits for 1 replica to be available before
allowing the file to be closed.

If you need to wait for more replicas, I would recommend polling after
closing the file.
{noformat}

So I need to check whether it's just the test, or whether HBase really needs to know the exact number of replicas.
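To make Todd's explanation concrete, here is a toy, in-memory model in plain Java; it is not HDFS code. Three simulated DataNodes each report their replica to a simulated NameNode after a different delay, and the simulated close returns as soon as the minimum replication is reached (1 here, matching the default behavior he describes). All names, the delays, and the 10 ms poll interval are illustrative assumptions. A lookup made right after close can therefore see fewer than three replicas, while polling with a deadline, as he recommends, eventually sees all of them.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncReplicaModel {

  /** Toy "NameNode": counts the replica reports it has received for a single block. */
  public static final class ToyNameNode {
    public final AtomicInteger reportedReplicas = new AtomicInteger();
  }

  /**
   * Simulates closing a file whose single block was written to {@code replication}
   * DataNodes. Each DataNode reports to the NameNode after its own delay, but "close"
   * only waits until {@code minReplication} reports have arrived.
   */
  public static ToyNameNode writeAndClose(ScheduledExecutorService pool, int replication,
      int minReplication, long[] reportDelaysMs) throws InterruptedException {
    if (reportDelaysMs.length < replication) {
      throw new IllegalArgumentException("need one report delay per DataNode");
    }
    ToyNameNode nn = new ToyNameNode();
    CountDownLatch minReached = new CountDownLatch(minReplication);
    for (int dn = 0; dn < replication; dn++) {
      pool.schedule(() -> {
        nn.reportedReplicas.incrementAndGet(); // asynchronous replica report
        minReached.countDown();
      }, reportDelaysMs[dn], TimeUnit.MILLISECONDS);
    }
    minReached.await(); // returns once minReplication reports are in
    return nn;
  }

  /** Polls the toy NameNode until it knows {@code expected} replicas or the timeout expires. */
  public static boolean pollForReplicas(ToyNameNode nn, int expected, long timeoutMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (System.nanoTime() < deadline) {
      if (nn.reportedReplicas.get() >= expected) {
        return true;
      }
      Thread.sleep(10);
    }
    return nn.reportedReplicas.get() >= expected;
  }

  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService pool = Executors.newScheduledThreadPool(3);
    try {
      ToyNameNode nn = writeAndClose(pool, 3, 1, new long[] {0, 200, 400});
      // At least one replica is known here; usually not all three yet.
      System.out.println("replicas known right after close: " + nn.reportedReplicas.get());
      System.out.println("all 3 replicas seen after polling: " + pollForReplicas(nn, 3, 5_000));
    } finally {
      pool.shutdownNow();
    }
  }
}
```

In a real test, the polling step would ask the NameNode instead, for example through `FileSystem.getFileBlockLocations` as the HBase-free test above does, and retry until each block reports the expected number of hosts.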
                

[jira] [Updated] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6175:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)
    

[jira] [Updated] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-6175:
---------------------------

    Attachment: 6175.v1.patch
    

        

[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "nkeywal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291667#comment-13291667 ] 

nkeywal commented on HBASE-6175:
--------------------------------

The blocks distribution is used mainly for estimates and in a cache to prioritize, so it's not an issue if we occasionally miss one replica. It's just a question of fixing the test itself.
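A test-only fix along these lines could replace the fixed `Thread.sleep(2000)` with a bounded poll: re-check the condition until it holds or a timeout expires. The sketch below is hypothetical: the `WaitUtil` class, its `Condition` interface and its `waitFor` method are not HBase APIs and are not taken from 6175.v1.patch (whose contents are not shown here). It uses Java 8 lambdas for brevity, and it only tolerates the getFileStatus lag; it does not explain it.

```java
// Hypothetical helper (not an HBase API): poll a condition until it holds
// or a timeout expires, instead of sleeping for a fixed amount of time.
final class WaitUtil {
  private WaitUtil() {}

  /** A condition that may throw, e.g. an IOException from fs.getFileStatus. */
  interface Condition {
    boolean evaluate() throws Exception;
  }

  /**
   * Re-evaluates {@code condition} every {@code intervalMs} milliseconds until it
   * returns true or {@code timeoutMs} milliseconds have elapsed.
   *
   * @return true if the condition became true before the deadline, false otherwise
   */
  static boolean waitFor(Condition condition, long timeoutMs, long intervalMs)
      throws Exception {
    // nanoTime is monotonic; compare via subtraction to stay overflow-safe
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.evaluate()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(intervalMs);
    }
  }
}
```

In the test, the commented-out sleep could then give way to something like `assertTrue(WaitUtil.waitFor(() -> FSUtils.computeHDFSBlocksDistribution(fs, fs.getFileStatus(testFile), 0, 1).getTopHosts().size() == 3, 10000, 100))`. Polling with a deadline keeps the common case fast and still fails deterministically if the third replica never shows up.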
                

        

[jira] [Commented] (HBASE-6175) TestFSUtils flaky on hdfs getFileStatus method

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290510#comment-13290510 ] 

stack commented on HBASE-6175:
------------------------------

Is this the question you asked on the list, Nicolas (why getFileStatus lags)? You never got an answer? Maybe ask on the hdfs mailing list?
                