You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@brooklyn.apache.org by geomacy <gi...@git.apache.org> on 2017/08/04 11:15:25 UTC

[GitHub] brooklyn-server pull request #791: Sync file system on temp file before movi...

GitHub user geomacy opened a pull request:

    https://github.com/apache/brooklyn-server/pull/791

    Sync file system on temp file before moving it.

    See https://issues.apache.org/jira/browse/BROOKLYN-526.
    
    This is a suggestion for a fix.  Not sure how to go about testing this!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/geomacy/brooklyn-server brooklyn-526

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/brooklyn-server/pull/791.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #791
    
----
commit 97dfb9a16abbb4ac7deb0a032b5a46350ca25d9e
Author: Geoff Macartney <ge...@cloudsoftcorp.com>
Date:   2017-08-04T11:09:34Z

    Sync file system on temp file before moving it.
    
    See https://issues.apache.org/jira/browse/BROOKLYN-526

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by geomacy <gi...@git.apache.org>.
Github user geomacy commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    @aledsage finally got round to checking this against the integration tests.
    
    It doesn't seem to have caused any significant problems - with a run of the tests against current `master` I got 7 failures:
    ```
    grep "^<testng-results" {core,camp/camp-brooklyn,launcher}/target/surefire-reports/testng-results.xml
    core/target/surefire-reports/testng-results.xml:<testng-results skipped="0" failed="3" ignored="2230" total="4492" passed="2259">
    camp/camp-brooklyn/target/surefire-reports/testng-results.xml:<testng-results skipped="0" failed="3" ignored="755" total="806" passed="48">
    launcher/target/surefire-reports/testng-results.xml:<testng-results skipped="0" failed="1" ignored="81" total="101" passed="19">
    ```
    
    With this PR I actually got fewer:
    ```
    grep "^<testng-results" {*,*/*}/target/surefire-reports/testng-results.xml | fgrep -v 'failed="0"'
    test-framework/target/surefire-reports/testng-results.xml:<testng-results skipped="0" failed="1" ignored="96" total="119" passed="22">
    camp/camp-brooklyn/target/surefire-reports/testng-results.xml:<testng-results skipped="0" failed="3" ignored="753" total="804" passed="48">
    ```
    
    Think this should be benign enough to merge.
    
    



---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by geomacy <gi...@git.apache.org>.
Github user geomacy commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    Interesting; 
    
         
    ```
    Test ResultĀ (2 failures / -15)
    org.apache.brooklyn.policy.loadbalancing.LoadBalancingPolicySoakTest.testLoadBalancingManyItemsQuickTest
    org.apache.brooklyn.policy.loadbalancing.LoadBalancingPolicySoakTest.testLoadBalancingQuickTest
    ```
    
    These run ok on my laptop. This may be turning up because of the comparatively restricted capability of the build machine.   Retest this please.
    
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by geomacy <gi...@git.apache.org>.
Github user geomacy commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    @aledsage very good questions!
    
    I agree it's best to propagate the exception if it happens - if it's due to a scenario like (1) where the FS can't guarantee the sync, then it's the wrong filesystem for doing persistence.  A retry is probably a good idea - have added one in latest commit, deliberately keeping it very simple, to avoid possibly introducing other issues. Let me know what you think.
    
    I had a look at testing this but can't think of a way to do it easily. You'd either have to have a file system that you could force to fail `sync` on demand (something using FUSE?), or mock out the `sync` itself - I looked into using a dynamic proxy, but `FileDescriptor` is `static final` so not suitable for proxying. Would be glad of any suggestions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by geomacy <gi...@git.apache.org>.
Github user geomacy commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    rebased against master, still haven't got round to integration tests


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by ahgittin <gi...@git.apache.org>.
Github user ahgittin commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    @geomacy @aledsage I hadn't realized this either; javadocs and online searches suggest the `fd.sync()` may be needed.
    
    i'm curious whether this really is the issue -- at least on Mac -- as i've done some testing and the `sync` seems to be very fast whereas `write` is what takes the time.  this is true even under contention:  `write`s contend with each other (8 files written concurrently take 8+ times as long as writing 1 file; if the OS cache is in play i'd expect CPU contention which with 4 cores we'd expect 8 files to take twice as long, which is what happens if we do a `System.arraycopy` instead of a `write`) and if there's background disk activity it also slows down substantially.  numbers and code below.
    
    worth having this i agree, based on what i've read, though i have one worry -- if `sync` is done _after_ the `close`, i always get the `SyncFailedException`; and it looks like the code in this PR is synching after close.  we might want to understand that before merging?
    
    i also came across the code at https://github.com/apache/activemq-apollo/blob/trunk/apollo-util/src/main/scala/org/apache/activemq/apollo/util/IOHelper.java#L272 :  here to get sync they included some custom JNI calls.  this makes me wonder whether -- although things acted nicely in my tests -- there aren't actually any strong guarantees with java+mac, and you need that type of low-level library?
    
    although another part of me wonders if https://issues.apache.org/jira/browse/BROOKLYN-526 was in fact caused by something completely different (disk corruption?).
    
    ---
    
    one thread:
    
    ```
    t0 took 1.390 seconds to write 1,048,576,000 bytes to file
    t0 took 0.010 seconds to sync
    t0 took 0.000 seconds to close
    ```
    
    8 threads:
    
    ```
    t7 took 13.180 seconds to write 1,048,576,000 bytes to file
    t7 took 0.099 seconds to sync
    t7 took 0.000 seconds to close
    ```
    
    code:
    
    ```
    public class FileWrite {
    
        public static void main(String[] args) throws Exception {
            write("warmup");
            int NUM_THREADS = 8;
            for (int x=0; x<NUM_THREADS; x++) {
                final String id = "t"+x;
                new Thread() {
                    public void run() {
                        try {
                            write(id);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }.start();
            }
        }
        
        public static void write(String id) throws Exception {
            Stopwatch t = Stopwatch.createStarted();
            
            File file = File.createTempFile("deleteme", ".dat"); // put your file here.
            System.out.printf("%s took %.3f seconds to create %s%n", id, t.elapsed(TimeUnit.NANOSECONDS) / 1e9, file.getAbsolutePath()); t.reset(); t.start();
            
            FileOutputStream fos = new FileOutputStream(file);
            byte[] bytes = new byte[32 * 1024];
            // byte[] bytes2 = new byte[32 * 1024];
            for (long l = 0; l < 1000 * 1024 * 1024; l += bytes.length) { 
                fos.write(bytes);
                // System.arraycopy(bytes, 0, bytes2, 0, 32 * 1024);
            }
            System.out.printf("%s took %.3f seconds to write %,d bytes to file%n", id, t.elapsed(TimeUnit.NANOSECONDS) / 1e9, file.length()); t.reset(); t.start();
            
            fos.getFD().sync();
            System.out.printf("%s took %.3f seconds to sync%n", id, t.elapsed(TimeUnit.NANOSECONDS) / 1e9); t.reset(); t.start();
            
            fos.close();
            System.out.printf("%s took %.3f seconds to close%n", id, t.elapsed(TimeUnit.NANOSECONDS) / 1e9); t.reset(); t.start();
            
            File file2 = File.createTempFile("deleteme", ".dat"); // put your file here.
            java.nio.file.Files.move(file.toPath(), file2.toPath(), StandardCopyOption.ATOMIC_MOVE);
            System.out.printf("%s took %.3f seconds to move to %s%n", id, t.elapsed(TimeUnit.NANOSECONDS) / 1e9, file2.getAbsolutePath()); t.reset(); t.start();
        }
        
    }
    ```



---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    LGTM - merging now.


---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by geomacy <gi...@git.apache.org>.
Github user geomacy commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    You're right @ahgittin, the `sync` should be done before the `close`; I have added another commit to do this.
    
    I agree I'm not sure that the problem was caused by a failure to commit data to disk when @aledsage's laptop crashed, but it's certainly plausible.  That's an interesting link you shared above.  I've done a bit more digging and come across a very interesting email here:
    
    https://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html
    
    It's definitely worth reading; in short, it looks like just adding the `sync()` in Brooklyn, while it may help, isn't in itself enough to _ensure_ the data is committed to disk on pretty much any system, not just OSX.  :-(  
    
    In fact the JNI call that you referred to on OSX seems to be the only way to guarantee the data is committed, and of course it's only available on the Mac.
    
    I think this PR may still be something worth doing (it will eliminate one failure point, by ensuring the data is flushed from host memory to the disk device).  I don't know what else we might want to do to take this further, would be interested to hear what you think.
    
     


---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    @geomacy not convinced it's worth the effort of trying to mock the `FileDescriptor` to simulate a `SyncFailedException`. If we were doing more than retry + propagate, then maybe it would. But the current logic is pretty simple.
    
    One can mock final classes with Mockito v2 (we're on v2.7.12 currently): https://stackoverflow.com/a/40018295/1393883, but you need to add a config file to change the `MockMaker`.
    
    I think this is good as-is.
    
    Have you tried running the integration tests, as well as the unit tests?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by ahgittin <gi...@git.apache.org>.
Github user ahgittin commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    very interesting link @geomacy - the stronger guarantee of `fcntl()` probably what the activemq jni code does.
    
    i think this PR is good and your fix here for the problem caused by the also-needed #809 looks good.  i think merge though let's give @aledsage a few days to respond.
    
    as for any remaining risk ... the documentation for `sync` is unambiguous (and the post is old) so i tend towards thinking this does what the API says, synching data with the disk.  but let's remember this issue is here in case we come across systems where the OS/disk needs additional configuration to ensure `sync` does what it says it does.


---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    @geomacy Very cool - I didn't know about that!
    
    Looking at the javadoc (https://docs.oracle.com/javase/8/docs/api/java/io/FileDescriptor.html#sync--):
    ```
    Throws SyncFailedException - Thrown when the buffers cannot be flushed, or because the system cannot guarantee that all the buffers have been synchronized with physical media.
    ```
    That suggests we should catch + continue if that exception happens.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] brooklyn-server issue #791: Sync file system on temp file before moving it.

Posted by geomacy <gi...@git.apache.org>.
Github user geomacy commented on the issue:

    https://github.com/apache/brooklyn-server/pull/791
  
    I haven't run the integration tests, I shall try to make some time to do so.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] brooklyn-server pull request #791: Sync file system on temp file before movi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/brooklyn-server/pull/791


---