You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by "Robert P. Thille (JIRA)" <ji...@apache.org> on 2013/04/10 23:25:16 UTC

[jira] [Created] (THRIFT-1928) Race may still exist in TFileTransport::flush()

Robert P. Thille created THRIFT-1928:
----------------------------------------

             Summary: Race may still exist in TFileTransport::flush()
                 Key: THRIFT-1928
                 URL: https://issues.apache.org/jira/browse/THRIFT-1928
             Project: Thrift
          Issue Type: Bug
    Affects Versions: 0.9
         Environment: Ubuntu 10.04 server running on Virtualbox hosted on Ubuntu 12.04. Core i7 dual-core. (Dell Latitude E6530)
            Reporter: Robert P. Thille


I'm building Thrift 0.9.0 on Ubuntu 10.04 to upgrade our production system. Since my dev box is 12.04, I'm using Virtualbox. When running 'make check' I ran into a failure on TFileTransportTest taking too long. Investigation found this patch:
https://issues.apache.org/jira/secure/attachment/12455704/ASF.LICENSE.NOT.GRANTED--v1-0034-THRIFT-926.-cpp-Fix-bugs-in-TFileTransport-flush.patch

I think I might still be seeing this bug pop up.
I thought at first it might be just that my machine was slow, it's a core i7 running Virtualbox so I can build for an older release we use in production.
So I augmented TFileTransportTest.cpp to give it plenty of time and show how long each flush took:

  for (unsigned int n = 0; n < 10; ++n) {
    struct timeval preflush;
    gettimeofday(&preflush, NULL);
    transport.flush();

    struct timeval now;
    gettimeofday(&now, NULL);
    int delta = time_diff(&preflush, &now);
    fprintf(stderr, "Flush took:%d microsecs\n",delta);

    // Fail if at any point we've been running for longer than half a second.
    // (With the buggy code, TFileTransport used to take 3 seconds per flush())
    //
    // Use a fatal fail so we break out early, rather than continuing to make
    // many more slow flush() calls.
    delta = time_diff(&start, &now);
    BOOST_REQUIRE_LT(delta, 100000000);
  }

Here's the output from the test run with '"make check":
...
PASS: ZlibTest
Flush took:75 microsecs
Flush took:3000195 microsecs
Flush took:3000405 microsecs
Flush took:27 microsecs
Flush took:16 microsecs
Flush took:16 microsecs
Flush took:16 microsecs
Flush took:16 microsecs
Flush took:13 microsecs
Flush took:14 microsecs
<TestLog><TestSuite name="TFileTransportTest"><TestCase name="test_destructor"><TestingTime>210000</TestingTime></TestCase><TestCase name="test_flush_max_us1"><TestingTime>20000</TestingTime></TestCase><TestCase name="test_flush_max_us2"><TestingTime>0</TestingTime></TestCase><TestCase name="test_flush_max_us3"><TestingTime>0</TestingTime></TestCase><TestCase name="test_noop_flush"><TestingTime>10000</TestingTime></TestCase></TestSuite></TestLog>
*** No errors detected
PASS: TFileTransportTest

Running the test more directly:
ubuntu@U1004AMD64:~/thrift-0.9.0/lib/cpp/test$ ./TFileTransportTest
Running 5 test cases...
Flush took:70 microsecs
Flush took:3000333 microsecs
Flush took:3000376 microsecs
Flush took:3000256 microsecs
Flush took:3000348 microsecs
Flush took:3000350 microsecs
Flush took:3000278 microsecs
Flush took:23 microsecs
Flush took:16 microsecs
Flush took:16 microsecs


As you can see, multiple runs show similar output, where some flushes are very fast, and some flushes take just over 3 seconds. That makes me think there's still a race lurking somewhere.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira