You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by iv...@apache.org on 2015/05/21 13:00:43 UTC

svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Author: ivan
Date: Thu May 21 11:00:43 2015
New Revision: 1680819

URL: http://svn.apache.org/r1680819
Log:
Prevent a possible FSFS repository corruption with power or network disk
failures when changing revision properties. Perform a hardware flush
after we finished writing to a temporary revprop file and before moving
it into place. The change doesn't affect commit operation behavior.

The corruption can be easily reproduced by triggering a power loss while
performing svnsync.

This change is somewhat similar to what we did in r1483781, but covers how
we write revision property files. See related discussion in dev@s.a.o
(Subject: "FSFS Repository corruption on high load on Windows [...] ") [1].

Patch by: me
          kotkov

[1] http://svn.haxx.se/dev/archive-2013-05/0245.shtml

* subversion/libsvn_fs_fs/revprops.c
  (repack_stream_open): Rename the function ...
  (repack_file_open): ...to this. Rework it to return files (apr_file_t)
   instead of streams.
  (repack_revprops): Work with a file instead of a stream. Flush any
   unwritten data to disk before returning.
  (write_non_packed_revprop): Flush any unwritten data to disk after
   serializing the revision property list.
  (write_packed_revprop): Cope with the changes in repack_file_open() and
   repack_revprops() that now work with files. Flush the data to disk when
   done writing to a temporary manifest file.

Modified:
    subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Modified: subversion/trunk/subversion/libsvn_fs_fs/revprops.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/revprops.c?rev=1680819&r1=1680818&r2=1680819&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_fs_fs/revprops.c (original)
+++ subversion/trunk/subversion/libsvn_fs_fs/revprops.c Thu May 21 11:00:43 2015
@@ -652,17 +652,23 @@ write_non_packed_revprop(const char **fi
                          apr_hash_t *proplist,
                          apr_pool_t *pool)
 {
+  apr_file_t *file;
   svn_stream_t *stream;
   *final_path = svn_fs_fs__path_revprops(fs, rev, pool);
 
   /* ### do we have a directory sitting around already? we really shouldn't
      ### have to get the dirname here. */
-  SVN_ERR(svn_stream_open_unique(&stream, tmp_path,
-                                 svn_dirent_dirname(*final_path, pool),
-                                 svn_io_file_del_none, pool, pool));
+  SVN_ERR(svn_io_open_unique_file3(&file, tmp_path,
+                                   svn_dirent_dirname(*final_path, pool),
+                                   svn_io_file_del_none, pool, pool));
+  stream = svn_stream_from_aprfile2(file, TRUE, pool);
   SVN_ERR(svn_hash_write2(proplist, stream, SVN_HASH_TERMINATOR, pool));
   SVN_ERR(svn_stream_close(stream));
 
+  /* Flush temporary file to disk and close it. */
+  SVN_ERR(svn_io_file_flush_to_disk(file, pool));
+  SVN_ERR(svn_io_file_close(file, pool));
+
   return SVN_NO_ERROR;
 }
 
@@ -745,7 +751,7 @@ serialize_revprops_header(svn_stream_t *
   return SVN_NO_ERROR;
 }
 
-/* Writes the a pack file to FILE_STREAM.  It copies the serialized data
+/* Writes the a pack file to FILE.  It copies the serialized data
  * from REVPROPS for the indexes [START,END) except for index CHANGED_INDEX.
  *
  * The data for the latter is taken from NEW_SERIALIZED.  Note, that
@@ -763,7 +769,7 @@ repack_revprops(svn_fs_t *fs,
                 int changed_index,
                 svn_stringbuf_t *new_serialized,
                 apr_off_t new_total_size,
-                svn_stream_t *file_stream,
+                apr_file_t *file,
                 apr_pool_t *pool)
 {
   fs_fs_data_t *ffd = fs->fsap_data;
@@ -811,9 +817,11 @@ repack_revprops(svn_fs_t *fs,
                           ? SVN_DELTA_COMPRESSION_LEVEL_DEFAULT
                           : SVN_DELTA_COMPRESSION_LEVEL_NONE));
 
-  /* finally, write the content to the target stream and close it */
-  SVN_ERR(svn_stream_write(file_stream, compressed->data, &compressed->len));
-  SVN_ERR(svn_stream_close(file_stream));
+  /* finally, write the content to the target file, flush and close it */
+  SVN_ERR(svn_io_file_write_full(file, compressed->data, compressed->len,
+                                 NULL, pool));
+  SVN_ERR(svn_io_file_flush_to_disk(file, pool));
+  SVN_ERR(svn_io_file_close(file, pool));
 
   return SVN_NO_ERROR;
 }
@@ -821,23 +829,22 @@ repack_revprops(svn_fs_t *fs,
 /* Allocate a new pack file name for revisions
  *     [REVPROPS->START_REVISION + START, REVPROPS->START_REVISION + END - 1]
  * of REVPROPS->MANIFEST.  Add the name of old file to FILES_TO_DELETE,
- * auto-create that array if necessary.  Return an open file stream to
- * the new file in *STREAM allocated in POOL.
+ * auto-create that array if necessary.  Return an open file *FILE that is
+ * allocated in POOL.
  */
 static svn_error_t *
-repack_stream_open(svn_stream_t **stream,
-                   svn_fs_t *fs,
-                   packed_revprops_t *revprops,
-                   int start,
-                   int end,
-                   apr_array_header_t **files_to_delete,
-                   apr_pool_t *pool)
+repack_file_open(apr_file_t **file,
+                 svn_fs_t *fs,
+                 packed_revprops_t *revprops,
+                 int start,
+                 int end,
+                 apr_array_header_t **files_to_delete,
+                 apr_pool_t *pool)
 {
   apr_int64_t tag;
   const char *tag_string;
   svn_string_t *new_filename;
   int i;
-  apr_file_t *file;
   int manifest_offset
     = (int)(revprops->start_revision - revprops->manifest_start);
 
@@ -869,12 +876,11 @@ repack_stream_open(svn_stream_t **stream
     APR_ARRAY_IDX(revprops->manifest, i + manifest_offset, const char*)
       = new_filename->data;
 
-  /* create a file stream for the new file */
-  SVN_ERR(svn_io_file_open(&file, svn_dirent_join(revprops->folder,
-                                                  new_filename->data,
-                                                  pool),
+  /* open the file */
+  SVN_ERR(svn_io_file_open(file, svn_dirent_join(revprops->folder,
+                                                 new_filename->data,
+                                                 pool),
                            APR_WRITE | APR_CREATE, APR_OS_DEFAULT, pool));
-  *stream = svn_stream_from_aprfile2(file, FALSE, pool);
 
   return SVN_NO_ERROR;
 }
@@ -898,6 +904,7 @@ write_packed_revprop(const char **final_
   packed_revprops_t *revprops;
   apr_int64_t generation = 0;
   svn_stream_t *stream;
+  apr_file_t *file;
   svn_stringbuf_t *serialized;
   apr_off_t new_total_size;
   int changed_index;
@@ -928,11 +935,11 @@ write_packed_revprop(const char **final_
 
       *final_path = svn_dirent_join(revprops->folder, revprops->filename,
                                     pool);
-      SVN_ERR(svn_stream_open_unique(&stream, tmp_path, revprops->folder,
-                                     svn_io_file_del_none, pool, pool));
+      SVN_ERR(svn_io_open_unique_file3(&file, tmp_path, revprops->folder,
+                                       svn_io_file_del_none, pool, pool));
       SVN_ERR(repack_revprops(fs, revprops, 0, revprops->sizes->nelts,
                               changed_index, serialized, new_total_size,
-                              stream, pool));
+                              file, pool));
     }
   else
     {
@@ -978,50 +985,53 @@ write_packed_revprop(const char **final_
       /* write the new, split files */
       if (left_count)
         {
-          SVN_ERR(repack_stream_open(&stream, fs, revprops, 0,
-                                     left_count, files_to_delete, pool));
+          SVN_ERR(repack_file_open(&file, fs, revprops, 0,
+                                   left_count, files_to_delete, pool));
           SVN_ERR(repack_revprops(fs, revprops, 0, left_count,
                                   changed_index, serialized, new_total_size,
-                                  stream, pool));
+                                  file, pool));
         }
 
       if (left_count + right_count < revprops->sizes->nelts)
         {
-          SVN_ERR(repack_stream_open(&stream, fs, revprops, changed_index,
-                                     changed_index + 1, files_to_delete,
-                                     pool));
+          SVN_ERR(repack_file_open(&file, fs, revprops, changed_index,
+                                   changed_index + 1, files_to_delete,
+                                   pool));
           SVN_ERR(repack_revprops(fs, revprops, changed_index,
                                   changed_index + 1,
                                   changed_index, serialized, new_total_size,
-                                  stream, pool));
+                                  file, pool));
         }
 
       if (right_count)
         {
-          SVN_ERR(repack_stream_open(&stream, fs, revprops,
-                                     revprops->sizes->nelts - right_count,
-                                     revprops->sizes->nelts,
-                                     files_to_delete, pool));
+          SVN_ERR(repack_file_open(&file, fs, revprops,
+                                   revprops->sizes->nelts - right_count,
+                                   revprops->sizes->nelts,
+                                   files_to_delete, pool));
           SVN_ERR(repack_revprops(fs, revprops,
                                   revprops->sizes->nelts - right_count,
                                   revprops->sizes->nelts, changed_index,
-                                  serialized, new_total_size, stream,
+                                  serialized, new_total_size, file,
                                   pool));
         }
 
       /* write the new manifest */
       *final_path = svn_dirent_join(revprops->folder, PATH_MANIFEST, pool);
-      SVN_ERR(svn_stream_open_unique(&stream, tmp_path, revprops->folder,
-                                     svn_io_file_del_none, pool, pool));
+      SVN_ERR(svn_io_open_unique_file3(&file, tmp_path, revprops->folder,
+                                       svn_io_file_del_none, pool, pool));
 
       for (i = 0; i < revprops->manifest->nelts; ++i)
         {
           const char *filename = APR_ARRAY_IDX(revprops->manifest, i,
                                                const char*);
-          SVN_ERR(svn_stream_printf(stream, pool, "%s\n", filename));
+          SVN_ERR(svn_io_file_write_full(file, filename, strlen(filename),
+                                         NULL, pool));
+          SVN_ERR(svn_io_file_putc('\n', file, pool));
         }
 
-      SVN_ERR(svn_stream_close(stream));
+      SVN_ERR(svn_io_file_flush_to_disk(file, pool));
+      SVN_ERR(svn_io_file_close(file, pool));
     }
 
   return SVN_NO_ERROR;



Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Branko Čibej <br...@wandisco.com>.
I mean we could leave the current 1.8.x backport proposal unchanged.
Alternatively, tweaking your patch to make the new API private for 1.8.x is
fine; probably even better as it's likely to make future backports easier.
On 22 May 2015 12:46 pm, "Philip Martin" <ph...@wandisco.com> wrote:

> Branko Čibej <br...@wandisco.com> writes:
>
> > On 22.05.2015 10:50, Philip Martin wrote:
> >>
> >> Another approach would be to have a function to enable flushing directly
> >> on the stream created by svn_stream_from_aprfile2() without creating a
> >> new stream.  This is probably the smallest/simplest patch.  After
> >> reverting r1680819:
> >
> > [...]
> >
> > Big +1! Let's do it this way. We can keep the 1.8.x backport unchanged,
> too.
>
> Not sure what you mean by 1.8.x being unchanged.  My patch has a new
> public API and a 1.8.x backport would make it private, unless we make
> the new API private everywhere.
>
> --
> Philip Martin | Subversion Committer
> WANdisco // *Non-Stop Data*
>

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Branko Čibej <br...@wandisco.com> writes:

> On 22.05.2015 10:50, Philip Martin wrote:
>>
>> Another approach would be to have a function to enable flushing directly
>> on the stream created by svn_stream_from_aprfile2() without creating a
>> new stream.  This is probably the smallest/simplest patch.  After
>> reverting r1680819:
>
> [...]
>
> Big +1! Let's do it this way. We can keep the 1.8.x backport unchanged, too.

Not sure what you mean by 1.8.x being unchanged.  My patch has a new
public API and a 1.8.x backport would make it private, unless we make
the new API private everywhere.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Branko Čibej <br...@wandisco.com>.
On 22.05.2015 10:50, Philip Martin wrote:
> Branko Čibej <br...@wandisco.com> writes:
>
>> FWIW, I think in this case revving the API without deprecating the old
>> one is justified. Another option would be to invent a different API name
>> for the flushing stream, e.g., svn_stream_from_aprfile_flushed or some
>> such. This way we'd also avoid the dilemma about disown by simply not
>> having that flag in the new function.
> Are there bits missing from r1680819?  svn_fs_fs__pack_revprops_shard()
> still uses a stream to write the manifest file and it has not been
> modified to flush.  write_non_packed_revprop() also uses a stream that
> is not flushed.
>
> We could create a "flushing stream" that gets chained:
>
> struct baton_flush_t {
>   svn_stream_t *stream;
>   apr_pool_t *pool;
> };
>
> static svn_error_t *
> close_handler_flush(void *baton)
> {
>   struct baton_flush_t *b = baton;
>
>   SVN_ERR(svn_io_file_flush_to_disk(svn_stream__aprfile(b->stream), b->pool));
>   SVN_ERR(svn_stream_close(b->stream));
>   return SVN_NO_ERROR;
> }
>
> svn_error_t *
> svn_stream_flush(svn_stream_t **flush,
>                  svn_stream_t *stream,
>                  apr_pool_t *pool)
> {
>   struct baton_flush_t *baton;
>
>   if (!svn_stream__aprfile(stream))
>     return svn_error_create("No file to flush");
>
>   baton = apr_palloc(...);
>   *flush = svn_stream_create(baton, pool);
>   svn_stream_set_close(*flush, close_handler_flush);
>   svn_stream_set_read(...);
>   ...
>
>   return SVN_NO_ERROR;
> }
>
> That would allow the flushing of files we are not going to close which
> probably works but may not be something we want.  If we want to avoid
> that then perhaps we use a more specialised svn_stream_flush() that only
> supports streams created by svn_stream_from_aprfile2() by detecting
> close_fn==close_hadler_apr.
>
> svn_error_t *
> svn_stream_flush(svn_stream_t **flush,
>                  svn_stream_t *stream,
>                  apr_pool_t *pool)
> {
>   struct baton_flush_t *baton;
>
>   if (stream->close_fn != close_handler_apr)
>     return svn_error_create("No closing file to flush") 
>
>   ...
> }
>
>
> Another approach would be to have a function to enable flushing directly
> on the stream created by svn_stream_from_aprfile2() without creating a
> new stream.  This is probably the smallest/simplest patch.  After
> reverting r1680819:

[...]

Big +1! Let's do it this way. We can keep the 1.8.x backport unchanged, too.

-- Brane

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Ivan Zhakov <iv...@visualsvn.com> writes:

> I intentionally kept packing code unchanged: it seems that now we
> don't use hardware flush when packing repository: so we need apply fix
> to all pack code, which is separate issue IMO.

Yes!  I can confirm that with strace: we write pack/manifest files for
revisions without fsync().  For a repository with shard size 3 I get:

open("repo/db/revs/1.pack/pack", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 4
open("repo/db/revs/1.pack/manifest", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 5
open("repo/db/revs/1/3", O_RDONLY|O_CLOEXEC) = 6
close(6)                                = 0
open("repo/db/revs/1/4", O_RDONLY|O_CLOEXEC) = 6
close(6)                                = 0
open("repo/db/revs/1/5", O_RDONLY|O_CLOEXEC) = 6
close(6)                                = 0
write(5, "0\n240\n480\n", 10)           = 10
close(5)                                = 0
write(4, "DELTA\nSVN\1\0\0\4\2\5\1\204\4END\nENDREP\nid:"..., 720) = 720
close(4)                                = 0

followed by:

unlink("repo/db/revs/1/4")              = 0
unlink("repo/db/revs/1/5")              = 0
unlink("repo/db/revs/1/3")              = 0

For revprops I see the fsync() fix:

open("repo/db/revprops/1.pack/manifest", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 4
open("repo/db/revprops/1.pack/3.0", O_WRONLY|O_CREAT|O_CLOEXEC, 0666) = 5
open("repo/db/revprops/1/3", O_RDONLY|O_CLOEXEC) = 6
close(6)                                = 0
open("repo/db/revprops/1/4", O_RDONLY|O_CLOEXEC) = 6
close(6)                                = 0
open("repo/db/revprops/1/5", O_RDONLY|O_CLOEXEC) = 6
close(6)                                = 0
write(5, "\202\0373\n3\n91\n91\n91\n\nK 10\nsvn:author\n"..., 289) = 289
fsync(5)                                = 0
close(5)                                = 0
write(4, "3.0\n3.0\n3.0\n", 12)         = 12
fsync(4)                                = 0
close(4)                                = 0


-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Ivan Zhakov <iv...@visualsvn.com> writes:

> On 22 May 2015 at 11:50, Philip Martin <ph...@wandisco.com> wrote:
>>
>> Another approach would be to have a function to enable flushing directly
>> on the stream created by svn_stream_from_aprfile2() without creating a
>> new stream.  This is probably the smallest/simplest patch.  After
>> reverting r1680819:
>>
> [...]
>
> This approach introduce dependency what kind of stream returned from
> svn_stream_open_unique() which reduce advantage of using abstract
> stream_t object.

It's a disadvantage but it also has the big benefit that it allows the
flush to be applied to those wrappers.

> Given all feedback I suggest do the following:
> 1. Commit patch introducing svn_stream_from_aprfile3() with
> FLUSH_ON_CLOSE argument, without deprecating
> svn_stream_from_aprfile2().
> 2. I revert r1680819
> 3. Switch revprop change code to use svn_stream_from_aprfile3() with
> FLUSH_ON_CLOSE=TRUE.
>
> Then I'll nominate alternative revisions from (1) and (3) to 1.9.x.

That still leaves the problem of how to introduce a flush to something
using open_writable or open_unique.  Do those functions also get a
FLUSH_ON_CLOSE flag to pass through the from_aprfile3?  We have code
that is using those functions and not flushing even though it probably
should flush.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 22 May 2015 at 11:50, Philip Martin <ph...@wandisco.com> wrote:
> Branko Čibej <br...@wandisco.com> writes:
>
>> FWIW, I think in this case revving the API without deprecating the old
>> one is justified. Another option would be to invent a different API name
>> for the flushing stream, e.g., svn_stream_from_aprfile_flushed or some
>> such. This way we'd also avoid the dilemma about disown by simply not
>> having that flag in the new function.
>
> Are there bits missing from r1680819?  svn_fs_fs__pack_revprops_shard()
> still uses a stream to write the manifest file and it has not been
> modified to flush.
I intentionally kept packing code unchanged: it seems that now we
don't use hardware flush when packing repository: so we need apply fix
to all pack code, which is separate issue IMO.

> write_non_packed_revprop() also uses a stream that is not flushed.
What do you mean? Here is hunk changing write_non_packed_revprop() to
flush file before close.
[[
@@ -652,17 +652,23 @@ write_non_packed_revprop(const char **fi
                          apr_hash_t *proplist,
                          apr_pool_t *pool)
 {
+  apr_file_t *file;
   svn_stream_t *stream;
   *final_path = svn_fs_fs__path_revprops(fs, rev, pool);

   /* ### do we have a directory sitting around already? we really shouldn't
      ### have to get the dirname here. */
-  SVN_ERR(svn_stream_open_unique(&stream, tmp_path,
-                                 svn_dirent_dirname(*final_path, pool),
-                                 svn_io_file_del_none, pool, pool));
+  SVN_ERR(svn_io_open_unique_file3(&file, tmp_path,
+                                   svn_dirent_dirname(*final_path, pool),
+                                   svn_io_file_del_none, pool, pool));
+  stream = svn_stream_from_aprfile2(file, TRUE, pool);
   SVN_ERR(svn_hash_write2(proplist, stream, SVN_HASH_TERMINATOR, pool));
   SVN_ERR(svn_stream_close(stream));

+  /* Flush temporary file to disk and close it. */
+  SVN_ERR(svn_io_file_flush_to_disk(file, pool));
+  SVN_ERR(svn_io_file_close(file, pool));
+
   return SVN_NO_ERROR;
 }

@@ -745,7 +751,7 @@ serialize_revprops_header(svn_stream_t *
   return SVN_NO_ERROR;
 }
]]]


> We could create a "flushing stream" that gets chained:
>
> struct baton_flush_t {
>   svn_stream_t *stream;
>   apr_pool_t *pool;
> };
>
> static svn_error_t *
> close_handler_flush(void *baton)
> {
>   struct baton_flush_t *b = baton;
>
>   SVN_ERR(svn_io_file_flush_to_disk(svn_stream__aprfile(b->stream), b->pool));
>   SVN_ERR(svn_stream_close(b->stream));
>   return SVN_NO_ERROR;
> }
>
> svn_error_t *
> svn_stream_flush(svn_stream_t **flush,
>                  svn_stream_t *stream,
>                  apr_pool_t *pool)
I think that using svn_stream_flush() as wrapper
svn_io_file_flush_to_disk() is very confusing since it will be mixed
up with svn_io_file_flush().

[...]

>
> Another approach would be to have a function to enable flushing directly
> on the stream created by svn_stream_from_aprfile2() without creating a
> new stream.  This is probably the smallest/simplest patch.  After
> reverting r1680819:
>
[...]

This approach introduce dependency what kind of stream returned from
svn_stream_open_unique() which reduce advantage of using abstract
stream_t object.

Given all feedback I suggest do the following:
1. Commit patch introducing svn_stream_from_aprfile3() with
FLUSH_ON_CLOSE argument, without deprecating
svn_stream_from_aprfile2().
2. I revert r1680819
3. Switch revprop change code to use svn_stream_from_aprfile3() with
FLUSH_ON_CLOSE=TRUE.

Then I'll nominate alternative revisions from (1) and (3) to 1.9.x.

Does it work for you?

-- 
Ivan Zhakov

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Branko Čibej <br...@wandisco.com> writes:

> FWIW, I think in this case revving the API without deprecating the old
> one is justified. Another option would be to invent a different API name
> for the flushing stream, e.g., svn_stream_from_aprfile_flushed or some
> such. This way we'd also avoid the dilemma about disown by simply not
> having that flag in the new function.

Are there bits missing from r1680819?  svn_fs_fs__pack_revprops_shard()
still uses a stream to write the manifest file and it has not been
modified to flush.  write_non_packed_revprop() also uses a stream that
is not flushed.

We could create a "flushing stream" that gets chained:

struct baton_flush_t {
  svn_stream_t *stream;
  apr_pool_t *pool;
};

static svn_error_t *
close_handler_flush(void *baton)
{
  struct baton_flush_t *b = baton;

  SVN_ERR(svn_io_file_flush_to_disk(svn_stream__aprfile(b->stream), b->pool));
  SVN_ERR(svn_stream_close(b->stream));
  return SVN_NO_ERROR;
}

svn_error_t *
svn_stream_flush(svn_stream_t **flush,
                 svn_stream_t *stream,
                 apr_pool_t *pool)
{
  struct baton_flush_t *baton;

  if (!svn_stream__aprfile(stream))
    return svn_error_create("No file to flush");

  baton = apr_palloc(...);
  *flush = svn_stream_create(baton, pool);
  svn_stream_set_close(*flush, close_handler_flush);
  svn_stream_set_read(...);
  ...

  return SVN_NO_ERROR;
}

That would allow the flushing of files we are not going to close which
probably works but may not be something we want.  If we want to avoid
that then perhaps we use a more specialised svn_stream_flush() that only
supports streams created by svn_stream_from_aprfile2() by detecting
close_fn==close_hadler_apr.

svn_error_t *
svn_stream_flush(svn_stream_t **flush,
                 svn_stream_t *stream,
                 apr_pool_t *pool)
{
  struct baton_flush_t *baton;

  if (stream->close_fn != close_handler_apr)
    return svn_error_create("No closing file to flush") 

  ...
}


Another approach would be to have a function to enable flushing directly
on the stream created by svn_stream_from_aprfile2() without creating a
new stream.  This is probably the smallest/simplest patch.  After
reverting r1680819:

Index: subversion/include/svn_io.h
===================================================================
--- subversion/include/svn_io.h	(revision 1680818)
+++ subversion/include/svn_io.h	(working copy)
@@ -1087,6 +1087,9 @@ svn_stream_from_aprfile2(apr_file_t *file,
                          svn_boolean_t disown,
                          apr_pool_t *pool);
 
+svn_error_t *
+svn_stream_flush_on_close(svn_stream_t *stream);
+  
 /** Similar to svn_stream_from_aprfile2(), except that the file will
  * always be disowned.
  *
Index: subversion/libsvn_fs_fs/revprops.c
===================================================================
--- subversion/libsvn_fs_fs/revprops.c	(revision 1680818)
+++ subversion/libsvn_fs_fs/revprops.c	(working copy)
@@ -660,6 +660,7 @@ write_non_packed_revprop(const char **final_path,
   SVN_ERR(svn_stream_open_unique(&stream, tmp_path,
                                  svn_dirent_dirname(*final_path, pool),
                                  svn_io_file_del_none, pool, pool));
+  SVN_ERR(svn_stream_flush_on_close(stream));
   SVN_ERR(svn_hash_write2(proplist, stream, SVN_HASH_TERMINATOR, pool));
   SVN_ERR(svn_stream_close(stream));
 
@@ -875,6 +876,7 @@ repack_stream_open(svn_stream_t **stream,
                                                   pool),
                            APR_WRITE | APR_CREATE, APR_OS_DEFAULT, pool));
   *stream = svn_stream_from_aprfile2(file, FALSE, pool);
+  SVN_ERR(svn_stream_flush_on_close(*stream));
 
   return SVN_NO_ERROR;
 }
@@ -1206,6 +1208,7 @@ svn_fs_fs__copy_revprops(const char *pack_file_dir
 
   /* write the pack file content to disk */
   stream = svn_stream_from_aprfile2(pack_file, FALSE, scratch_pool);
+  SVN_ERR(svn_stream_flush_on_close(stream));
   SVN_ERR(svn_stream_write(stream, compressed->data, &compressed->len));
   SVN_ERR(svn_stream_close(stream));
 
@@ -1244,6 +1247,7 @@ svn_fs_fs__pack_revprops_shard(const char *pack_fi
   SVN_ERR(svn_io_dir_make(pack_file_dir, APR_OS_DEFAULT, scratch_pool));
   SVN_ERR(svn_stream_open_writable(&manifest_stream, manifest_file_path,
                                    scratch_pool, scratch_pool));
+  SVN_ERR(svn_stream_flush_on_close(manifest_stream));
 
   /* revisions to handle. Special case: revision 0 */
   start_rev = (svn_revnum_t) (shard * max_files_per_dir);
Index: subversion/libsvn_subr/stream.c
===================================================================
--- subversion/libsvn_subr/stream.c	(revision 1680818)
+++ subversion/libsvn_subr/stream.c	(working copy)
@@ -911,6 +911,29 @@ close_handler_apr(void *baton)
 }
 
 static svn_error_t *
+close_handler_flush(void *baton)
+{
+  struct baton_apr *btn = baton;
+
+  SVN_ERR(svn_io_file_flush_to_disk(btn->file, btn->pool));
+  SVN_ERR(close_handler_apr(baton));
+  return SVN_NO_ERROR;
+}
+
+svn_error_t *
+svn_stream_flush_on_close(svn_stream_t *stream)
+{
+  if (stream->close_fn != close_handler_apr)
+    return svn_error_create(SVN_ERR_STREAM_NOT_SUPPORTED, NULL,
+                            _("No closing file to flush"));
+
+  svn_stream_set_close(stream, close_handler_flush);
+
+  return SVN_NO_ERROR;
+}
+
+
+static svn_error_t *
 mark_handler_apr(void *baton, svn_stream_mark_t **mark, apr_pool_t *pool)
 {
   struct baton_apr *btn = baton;

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:
> Julian Foad writes:
>> Evgeny Kotkov wrote:
>>> Philip, Julian, thank you for giving this patch a look.
>>>
>>> As I said in my previous e-mails, I think that the committed fix is a better
>>> choice here, as opposed to attempts to achieve the same behavior using the
>>> stream API. [...]
>>
>> That sounds good to me -- leaving the code using file APIs.
>
> It only fixes writing revprops during commit, it still leaves the
> problem that the pack operation doesn't flush.  We need to add flush
> when writing the revision pack file, the packed indices, the revprop
> pack file and the revprop manifest file.  That means replacing more, if
> not all, of the stream code in revprops.c.

Yes, like this, for a start, I suppose:

[[[
Index: subversion/libsvn_fs_fs/revprops.c
===================================================================
--- subversion/libsvn_fs_fs/revprops.c    (revision 1682078)
+++ subversion/libsvn_fs_fs/revprops.c    (working copy)
@@ -1170,7 +1170,6 @@ svn_fs_fs__copy_revprops(const char *pac
   apr_file_t *pack_file;
   svn_revnum_t rev;
   apr_pool_t *iterpool = svn_pool_create(scratch_pool);
-  svn_stream_t *stream;

   /* create empty data buffer and a write stream on top of it */
   svn_stringbuf_t *uncompressed
@@ -1194,6 +1193,7 @@ svn_fs_fs__copy_revprops(const char *pac
   for (rev = start_rev; rev <= end_rev; rev++)
     {
       const char *path;
+      svn_stream_t *stream;

       svn_pool_clear(iterpool);

@@ -1214,10 +1214,11 @@ svn_fs_fs__copy_revprops(const char *pac
   /* compress the content (or just store it for COMPRESSION_LEVEL 0) */
   SVN_ERR(svn__compress(uncompressed, compressed, compression_level));

-  /* write the pack file content to disk */
-  stream = svn_stream_from_aprfile2(pack_file, FALSE, scratch_pool);
-  SVN_ERR(svn_stream_write(stream, compressed->data, &compressed->len));
-  SVN_ERR(svn_stream_close(stream));
+  /* write and flush the pack file content to disk */
+  SVN_ERR(svn_io_file_write_full(pack_file, compressed->data, compressed->len,
+                                 NULL, scratch_pool));
+  SVN_ERR(svn_io_file_flush_to_disk(pack_file, scratch_pool));
+  SVN_ERR(svn_io_file_close(pack_file, scratch_pool));

   svn_pool_destroy(iterpool);

@@ -1236,6 +1237,7 @@ svn_fs_fs__pack_revprops_shard(const cha
                                apr_pool_t *scratch_pool)
 {
   const char *manifest_file_path, *pack_filename = NULL;
+  apr_file_t *manifest_file;
   svn_stream_t *manifest_stream;
   svn_revnum_t start_rev, end_rev, rev;
   apr_off_t total_size;
@@ -1252,8 +1254,13 @@ svn_fs_fs__pack_revprops_shard(const cha

   /* Create the new directory and manifest file stream. */
   SVN_ERR(svn_io_dir_make(pack_file_dir, APR_OS_DEFAULT, scratch_pool));
-  SVN_ERR(svn_stream_open_writable(&manifest_stream, manifest_file_path,
-                                   scratch_pool, scratch_pool));
+  SVN_ERR(svn_io_file_open(&manifest_file, manifest_file_path,
+                           APR_WRITE
+                             | APR_BUFFERED
+                             | APR_CREATE
+                             | APR_EXCL,
+                           APR_OS_DEFAULT, scratch_pool));
+  manifest_stream = svn_stream_from_aprfile2(manifest_file, TRUE,
scratch_pool);

   /* revisions to handle. Special case: revision 0 */
   start_rev = (svn_revnum_t) (shard * max_files_per_dir);
@@ -1322,6 +1329,8 @@ svn_fs_fs__pack_revprops_shard(const cha

   /* flush the manifest file and update permissions */
   SVN_ERR(svn_stream_close(manifest_stream));
+  SVN_ERR(svn_io_file_flush_to_disk(manifest_file, scratch_pool));
+  SVN_ERR(svn_io_file_close(manifest_file, scratch_pool));
   SVN_ERR(svn_io_copy_perms(shard_path, pack_file_dir, iterpool));

   svn_pool_destroy(iterpool);
]]]

Here I just picked two functions that look like they needed flushes, I
didn't look for all places where it's required, as I just wanted to
check the patch would not be too difficult or ugly.

That looks good to me. Does it to you?

- Julian

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Julian Foad <ju...@btopenworld.com>.
Ivan Zhakov wrote:
> Please find patch that adds disk flushes to FSFS packing code in attachment.
>
> I'm currently testing it and going to commit it later. This patch
> should also slightly improve packing performance as a bonus, since we
> can avoid APR buffered IO for some cases.

Note that you can simplify svn_fs_fs__copy_revprops() a little further
by avoiding the stream completely, as in my previous email in this
thread.

- Julian

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 27 May 2015 at 19:56, Ivan Zhakov <iv...@visualsvn.com> wrote:
> On 27 May 2015 at 19:48, Philip Martin <ph...@wandisco.com> wrote:
>> Julian Foad <ju...@btopenworld.com> writes:
>>
>>> Evgeny Kotkov wrote:
>>>> Philip, Julian, thank you for giving this patch a look.
>>>>
>>>> As I said in my previous e-mails, I think that the committed fix is a better
>>>> choice here, as opposed to attempts to achieve the same behavior using the
>>>> stream API. [...]
>>>
>>> That sounds good to me -- leaving the code using file APIs.
>>
> +1.
>
>> It only fixes writing revprops during commit, it still leaves the
>> problem that the pack operation doesn't flush.  We need to add flush
>> when writing the revision pack file, the packed indices, the revprop
>> pack file and the revprop manifest file.  That means replacing more, if
>> not all, of the stream code in revprops.c.
>>
> I'm currently working on patch to fix packing and hotcopy code.
>
Please find patch that adds disk flushes to FSFS packing code in attachment.

I'm currently testing it and going to commit it later. This patch
should also slightly improve packing performance as a bonus, since we
can avoid APR buffered IO for some cases.


-- 
Ivan Zhakov

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 27 May 2015 at 19:48, Philip Martin <ph...@wandisco.com> wrote:
> Julian Foad <ju...@btopenworld.com> writes:
>
>> Evgeny Kotkov wrote:
>>> Philip, Julian, thank you for giving this patch a look.
>>>
>>> As I said in my previous e-mails, I think that the committed fix is a better
>>> choice here, as opposed to attempts to achieve the same behavior using the
>>> stream API. [...]
>>
>> That sounds good to me -- leaving the code using file APIs.
>
+1.

> It only fixes writing revprops during commit, it still leaves the
> problem that the pack operation doesn't flush.  We need to add flush
> when writing the revision pack file, the packed indices, the revprop
> pack file and the revprop manifest file.  That means replacing more, if
> not all, of the stream code in revprops.c.
>
I'm currently working on patch to fix packing and hotcopy code.

-- 
Ivan Zhakov

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Julian Foad <ju...@btopenworld.com> writes:

> Evgeny Kotkov wrote:
>> Philip, Julian, thank you for giving this patch a look.
>>
>> As I said in my previous e-mails, I think that the committed fix is a better
>> choice here, as opposed to attempts to achieve the same behavior using the
>> stream API. [...]
>
> That sounds good to me -- leaving the code using file APIs.

It only fixes writing revprops during commit, it still leaves the
problem that the pack operation doesn't flush.  We need to add flush
when writing the revision pack file, the packed indices, the revprop
pack file and the revprop manifest file.  That means replacing more, if
not all, of the stream code in revprops.c.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Julian Foad <ju...@btopenworld.com>.
Evgeny Kotkov wrote:
> Philip, Julian, thank you for giving this patch a look.
>
> As I said in my previous e-mails, I think that the committed fix is a better
> choice here, as opposed to attempts to achieve the same behavior using the
> stream API. [...]

That sounds good to me -- leaving the code using file APIs.

- Julian

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 28 May 2015 at 14:56, Julian Foad <ju...@btopenworld.com> wrote:
[...]

> Bert is pointing out that we shouldn't be trying to make a full flush
> to hardware after writing each file, as that's unreasonably slow and
> unnecessary.
>
> So we need to re-think the approach here.
>
The need for full flush in FSFS already discussed before:
http://svn.haxx.se/dev/archive-2013-05/0245.shtml
http://svn.haxx.se/dev/archive-2013-05/0314.shtml

But I agree that we may choose different behavior for working copy library.

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:
> The argument appears to be that we cannot design a perfect stream
> library and so we should write code using files instead.

No, that's not it at all.

>  I'd prefer to
> accept limitations in our stream design and write a stream library that
> helps write the rest of our code.  I don't believe we have to design a
> 'perfect' stream library.

Agreed.

See the IRC conversation going on right now:
  http://colabti.org/irclogger/irclogger_log/svn-dev?date=2015-05-28#l63

Bert is pointing out that we shouldn't be trying to make a full flush
to hardware after writing each file, as that's unreasonably slow and
unnecessary.

So we need to re-think the approach here.

- Julian

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Julian Foad <ju...@btopenworld.com> writes:

> The point about the generic stream API is you should always be able to
> define a new stream class that wraps a stream (examples: a 'tee'
> stream wraps one stream while copying to another; a checksumming
> stream, etc.).
>
> And you should always be able to use the wrapping stream in place of
> the original stream.
>
> The 'svn_stream_flush_to_disk_on_close()' that you suggest breaks that.
>
> The implementation you suggest in your email an hour ago needs direct
> access to the implementation methods of all the stream classes that it
> may possibly encounter (close_handler_gz, for example).
>
> And functionality supported by streams should be provided as a virtual
> method, overridden in each stream class.
>
> Like Evgeny argued in his first email in the thread,
> http://svn.haxx.se/dev/archive-2015-05/0154.shtml
>
> He then proposed a virtualized method 'svn_stream_flush()' which
> solves the abstraction/virtualization issue.
>
> But then you have to define abstract semantics for 'flush', which that
> attempt didn't do well.
>
> It just doesn't all seem to fit together, the idea of telling a
> generic stream "you must ensure the result of this generic stream
> processing is written to *a*/*the*/*which?* phyical disk".
>
> For example, should a 'tee' stream ensure that *both* output streams
> are flushed to disk? That's a rhetorical question: the point is there
> is an semantic mismatch.

The argument appears to be that we cannot design a perfect stream
library and so we should write code using files instead.  I'd prefer to
accept limitations in our stream design and write a stream library that
helps write the rest of our code.  I don't believe we have to design a
'perfect' stream library.

We can use internal details to extend the stream library.  It does not
appear to be hard to extend the flush_on_close to tee.  Sure, a third
party would not be able to do things that we can do.  I think the fact
that third parties are limited is OK.

I've just applied the FSFS file flush changes to FSX.  I think I got it
right but it would have been easier with the stream approach.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Ivan Zhakov <iv...@visualsvn.com> writes:

> I still think that flush should happen explicit when neded: it should
> not happen as flag when file/stream is opened. Someone who opens
> file/stream may doesn't know whether flush will be needed or not.

Can you give an example?  The revprop code is exactly the opposite, the
code that opens the file/stream knows that the file/stream should be
flushed.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Branko Čibej <br...@wandisco.com>.
On 28.05.2015 15:01, Philip Martin wrote:
> Ivan Zhakov <iv...@visualsvn.com> writes:
>
>> Also I don't understand what do you mean "messy file-based code"? Imho
>> code using svn_stream_write() that require pointer to length is more
>> messy.
> "messy" is not my word but the reason I prefer the stream code is that
> we have been moving towards it.  Take this change:
>
> -          SVN_ERR(svn_stream_printf(stream, pool, "%s\n", filename));
> +          SVN_ERR(svn_io_file_write_full(file, filename, strlen(filename),
> +                                         NULL, pool));
> +          SVN_ERR(svn_io_file_putc('\n', file, pool));
>
> We have svn_stream_printf() and it makes the stream code neater.  We
> could fix that by introducing svn_io_file_printf().  Either we add all
> the neat stream features to the file code or we attempt to move to the
> stream code.

Yes, and the former fits my definition of "messy". Even if we add a
number of svn_io functions to make file-based code as neat and readable
as stream-based code, we'd still be taking a step back in terms of
flexibility: you can stack and combine streams, but you can't do that
with file handles (other than by inventing a new file abstraction, which
is exactly what a stream is).

-- Brane

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 28 May 2015 at 16:43, Philip Martin <ph...@wandisco.com> wrote:
> Ivan Zhakov <iv...@visualsvn.com> writes:
>
>> I hope you're aware the original code has unbounded memory usage?
>
> No, could you explain?  Is it a problem with the underlying stream code,
> or with the way it was used?
>
svn_stream_printf() allocates buffer for resulting string in pool
provided. The original code didn't have iterpool, so all lines written
to manifest file was allocated in scratch pool. The manifest file
usually is not big to cause problems, but it's still unbounded memory
usage.


-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Ivan Zhakov <iv...@visualsvn.com> writes:

> I hope you're aware the original code has unbounded memory usage?

No, could you explain?  Is it a problem with the underlying stream code,
or with the way it was used?

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 28 May 2015 at 16:01, Philip Martin <ph...@wandisco.com> wrote:
> Ivan Zhakov <iv...@visualsvn.com> writes:
>
>> Also I don't understand what do you mean "messy file-based code"? Imho
>> code using svn_stream_write() that require pointer to length is more
>> messy.
>
> "messy" is not my word but the reason I prefer the stream code is that
> we have been moving towards it.  Take this change:
>
> -          SVN_ERR(svn_stream_printf(stream, pool, "%s\n", filename));
> +          SVN_ERR(svn_io_file_write_full(file, filename, strlen(filename),
> +                                         NULL, pool));
> +          SVN_ERR(svn_io_file_putc('\n', file, pool));
>
> We have svn_stream_printf() and it makes the stream code neater.  We
> could fix that by introducing svn_io_file_printf().  Either we add all
> the neat stream features to the file code or we attempt to move to the
> stream code.
>
I hope you're aware the original code has unbounded memory usage?

But this part of code could be easily rewritten in old fashion with
patch like this:
Index: subversion/libsvn_fs_fs/revprops.c
===================================================================
--- subversion/libsvn_fs_fs/revprops.c    (revision 1682076)
+++ subversion/libsvn_fs_fs/revprops.c    (working copy)
@@ -1020,16 +1020,14 @@
       *final_path = svn_dirent_join(revprops->folder, PATH_MANIFEST, pool);
       SVN_ERR(svn_io_open_unique_file3(&file, tmp_path, revprops->folder,
                                        svn_io_file_del_none, pool, pool));
-
+      stream = svn_stream_from_aprfile2(file, TRUE, pool);
       for (i = 0; i < revprops->manifest->nelts; ++i)
         {
           const char *filename = APR_ARRAY_IDX(revprops->manifest, i,
                                                const char*);
-          SVN_ERR(svn_io_file_write_full(file, filename, strlen(filename),
-                                         NULL, pool));
-          SVN_ERR(svn_io_file_putc('\n', file, pool));
+          SVN_ERR(svn_stream_printf(stream, pool, "%s\n", filename));
         }
-
+      SVN_ERR(svn_stream_close(stream));
       SVN_ERR(svn_io_file_flush_to_disk(file, pool));
       SVN_ERR(svn_io_file_close(file, pool));
     }

-- 
Ivan Zhakov

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Ivan Zhakov <iv...@visualsvn.com> writes:

> Also I don't understand what do you mean "messy file-based code"? Imho
> code using svn_stream_write() that require pointer to length is more
> messy.

"messy" is not my word but the reason I prefer the stream code is that
we have been moving towards it.  Take this change:

-          SVN_ERR(svn_stream_printf(stream, pool, "%s\n", filename));
+          SVN_ERR(svn_io_file_write_full(file, filename, strlen(filename),
+                                         NULL, pool));
+          SVN_ERR(svn_io_file_putc('\n', file, pool));

We have svn_stream_printf() and it makes the stream code neater.  We
could fix that by introducing svn_io_file_printf().  Either we add all
the neat stream features to the file code or we attempt to move to the
stream code.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 28 May 2015 at 15:03, Branko Čibej <br...@wandisco.com> wrote:
> On 28.05.2015 13:07, Julian Foad wrote:
>> Philip Martin wrote:
>>> I still prefer the stream patch I posted earlier, and it can be extended
>>> to support compressed streams.  Something like:
>>>
>>> svn_error_t *
>>> svn_stream_flush_to_disk_on_close(svn_stream_t *stream)
>>> {
>>>   if (stream->close_fn == close_handler_apr)
>>>     {
>>>       svn_stream_set_close(stream, close_handler_flush);
>>>     }
>>>   else if (stream->close_fn == close_handler_gz)
>>>     {
>>>       struct zbaton *zbaton = stream->baton;
>>>       SVN_ERR(svn_stream_flush_to_disk_on_close(zbaton->substream));
>>>     }
>>>   else [...]
>>> }
>>>
>>> That only allows flushing the stream on close but I do not see any need
>>> at present to support flushing at arbitrary positions.
>> The point about the generic stream API is you should always be able to
>> define a new stream class that wraps a stream (examples: a 'tee'
>> stream wraps one stream while copying to another; a checksumming
>> stream, etc.).
>>
>> And you should always be able to use the wrapping stream in place of
>> the original stream.
>>
>> The 'svn_stream_flush_to_disk_on_close()' that you suggest breaks that.
>>
>> The implementation you suggest in your email an hour ago needs direct
>> access to the implementation methods of all the stream classes that it
>> may possibly encounter (close_handler_gz, for example).
>>
>> And functionality supported by streams should be provided as a virtual
>> method, overridden in each stream class.
>
> I still think that we should do this on the level of stream
> implementations, not on the level of the abstract stream API.
>
> In the case we're talking about here, that would mean revising the
> svn_stream_from_aprfile function and adding a flush-on-close flag, or
> adding a similar function that has the flush-on-close semantics. That
> would leave the generic stream API completely unchanged and only affect
> the properties of the stream itself, at the point where it's created.
I still think that flush should happen explicit when neded: it should
not happen as flag when file/stream is opened. Someone who opens
file/stream may doesn't know whether flush will be needed or not.

> Nicely encapsulated, and far better than replacing existing stream-based
> code with messy file-based code.
First of all current FSFS code uses apr_file_t in many places for
different reasons: to control offset in file, to control buffering
etc. See pack_log_addressed() in libsvn_fsfs\pack.c for example.

Also I don't understand what do you mean "messy file-based code"? Imho
code using svn_stream_write() that require pointer to length is more
messy.

-- 
Ivan Zhakov

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Julian Foad <ju...@btopenworld.com>.
Julian Foad wrote:
> Philip Martin wrote:
>> I still prefer the stream patch I posted earlier, [...]
>>
>> That only allows flushing the stream on close but I do not see any need
>> at present to support flushing at arbitrary positions.
>
> The point about the generic stream API is you should always be able to
> define a new stream class that wraps a stream (examples: a 'tee'
> stream wraps one stream while copying to another; a checksumming
> stream, etc.).
>
> And you should always be able to use the wrapping stream in place of
> the original stream.
>
> The 'svn_stream_flush_to_disk_on_close()' that you suggest breaks that.
>
> The implementation you suggest in your email an hour ago needs direct
> access to the implementation methods of all the stream classes that it
> may possibly encounter (close_handler_gz, for example).
>
> And functionality supported by streams should be provided as a virtual
> method, overridden in each stream class.
>
> Like Evgeny argued in his first email in the thread,
> http://svn.haxx.se/dev/archive-2015-05/0154.shtml
>
> He then proposed a virtualized method 'svn_stream_flush()' which
> solves the abstraction/virtualization issue.
>
> But then you have to define abstract semantics for 'flush', which that
> attempt didn't do well.
>
> It just doesn't all seem to fit together, the idea of telling a
> generic stream "you must ensure the result of this generic stream
> processing is written to *a*/*the*/*which?* phyical disk".
>
> For example, should a 'tee' stream ensure that *both* output streams
> are flushed to disk? That's a rhetorical question: the point is there
> is an semantic mismatch.

Hmm, Perhaps there's not quite as much difference as I was thinking,
between the meaning of "flush this generic stream" and "flush to
disk". It's a matter of how far we push the "flushing" down the chain
from generic stream through to hardware.

If we had a stream method defined as "flush all the way down the
chain, right to the hardware", that would be a meaning that can be
applied to generic streams.

Perhaps the only real problem with the virtual stream_flush method is
that what we actually wanted was "flush on close" not "flush now". The
difference is important in some cases, such as when closing the stream
generates a bit more "trailer" data that gets written after all
incoming data has been processed.

We could of course provide a virtual "stream_close_and_flush" method.
(Either as a separate new method, or with a parameter to the 'close'
method.)

This would be defined as pushing all stream data on through the
system, to the hardware, and to ensure this really happens it would
need to be defined to throw an error if it could not do so.

It might be diffficult to know, in some cases, whether a given stream
implementation is actually able to ensure a flush right through to the
hardware. The underlying library or operating system 'flush' calls
might not be clear about whether that is possible. But we could
probably make a reasonable attempt.

- Julian

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Branko Čibej <br...@wandisco.com>.
On 28.05.2015 13:07, Julian Foad wrote:
> Philip Martin wrote:
>> I still prefer the stream patch I posted earlier, and it can be extended
>> to support compressed streams.  Something like:
>>
>> svn_error_t *
>> svn_stream_flush_to_disk_on_close(svn_stream_t *stream)
>> {
>>   if (stream->close_fn == close_handler_apr)
>>     {
>>       svn_stream_set_close(stream, close_handler_flush);
>>     }
>>   else if (stream->close_fn == close_handler_gz)
>>     {
>>       struct zbaton *zbaton = stream->baton;
>>       SVN_ERR(svn_stream_flush_to_disk_on_close(zbaton->substream));
>>     }
>>   else [...]
>> }
>>
>> That only allows flushing the stream on close but I do not see any need
>> at present to support flushing at arbitrary positions.
> The point about the generic stream API is you should always be able to
> define a new stream class that wraps a stream (examples: a 'tee'
> stream wraps one stream while copying to another; a checksumming
> stream, etc.).
>
> And you should always be able to use the wrapping stream in place of
> the original stream.
>
> The 'svn_stream_flush_to_disk_on_close()' that you suggest breaks that.
>
> The implementation you suggest in your email an hour ago needs direct
> access to the implementation methods of all the stream classes that it
> may possibly encounter (close_handler_gz, for example).
>
> And functionality supported by streams should be provided as a virtual
> method, overridden in each stream class.

I still think that we should do this on the level of stream
implementations, not on the level of the abstract stream API.

In the case we're talking about here, that would mean revising the
svn_stream_from_aprfile function and adding a flush-on-close flag, or
adding a similar function that has the flush-on-close semantics. That
would leave the generic stream API completely unchanged and only affect
the properties of the stream itself, at the point where it's created.
Nicely encapsulated, and far better than replacing existing stream-based
code with messy file-based code.

Whether or not we need a generic svn_stream_flush method is a separate
question on a completely different level, and quite irrelevant to the
problem of fixing this flushing issue for 1.9.x.

-- Brane


Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:
> I still prefer the stream patch I posted earlier, and it can be extended
> to support compressed streams.  Something like:
>
> svn_error_t *
> svn_stream_flush_to_disk_on_close(svn_stream_t *stream)
> {
>   if (stream->close_fn == close_handler_apr)
>     {
>       svn_stream_set_close(stream, close_handler_flush);
>     }
>   else if (stream->close_fn == close_handler_gz)
>     {
>       struct zbaton *zbaton = stream->baton;
>       SVN_ERR(svn_stream_flush_to_disk_on_close(zbaton->substream));
>     }
>   else [...]
> }
>
> That only allows flushing the stream on close but I do not see any need
> at present to support flushing at arbitrary positions.

The point about the generic stream API is you should always be able to
define a new stream class that wraps a stream (examples: a 'tee'
stream wraps one stream while copying to another; a checksumming
stream, etc.).

And you should always be able to use the wrapping stream in place of
the original stream.

The 'svn_stream_flush_to_disk_on_close()' that you suggest breaks that.

The implementation you suggest in your email an hour ago needs direct
access to the implementation methods of all the stream classes that it
may possibly encounter (close_handler_gz, for example).

And functionality supported by streams should be provided as a virtual
method, overridden in each stream class.

Like Evgeny argued in his first email in the thread,
http://svn.haxx.se/dev/archive-2015-05/0154.shtml

He then proposed a virtualized method 'svn_stream_flush()' which
solves the abstraction/virtualization issue.

But then you have to define abstract semantics for 'flush', which that
attempt didn't do well.

It just doesn't all seem to fit together, the idea of telling a
generic stream "you must ensure the result of this generic stream
processing is written to *a*/*the*/*which?* phyical disk".

For example, should a 'tee' stream ensure that *both* output streams
are flushed to disk? That's a rhetorical question: the point is there
is an semantic mismatch.

- Julian

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Evgeny Kotkov <ev...@visualsvn.com> writes:

> 1) We cannot really use svn_stream_flush(TRUE) + svn_stream_close() with
>    a compressed stream that is chained over a file stream or a stream that we
>    expect to be consistent with the underlying device.  A compressed stream
>    writes the final data chunk in close_handler_gz() and calls deflate() with
>    Z_FINISH.  In this case, closing the compressed stream would require a
>    full flush of the underlying stream in order to guarantee consistency.
>    Leaving the compressed streams without a flush handler implementation is
>    also a poor choice because the caller of svn_stream_flush() would have to
>    know the details about the given stream in order to avoid receiving the
>    SVN_ERR_STREAM_NOT_SUPPORTED error.
>
>    I believe that this is an example where achieving the proper behavior with
>    files is simple, but abstracting it with streams is a challenge.

I still prefer the stream patch I posted earlier, and it can be extended
to support compressed streams.  Something like:

svn_error_t *
svn_stream_flush_to_disk_on_close(svn_stream_t *stream)
{
  if (stream->close_fn == close_handler_apr)
    {
      svn_stream_set_close(stream, close_handler_flush);
    }
  else if (stream->close_fn == close_handler_gz)
    {
      struct zbaton *zbaton = stream->baton;
      SVN_ERR(svn_stream_flush_to_disk_on_close(zbaton->substream));
    }
  else if (stream->close_fn != close_handler_flush)
    {
      return svn_error_create(SVN_ERR_STREAM_NOT_SUPPORTED, NULL,
                              _("No closing file to flush"));
    }

  return SVN_NO_ERROR;
}

That only allows flushing the stream on close but I do not see any need
at present to support flushing at arbitrary positions.

I don't insist on this solution, if people think the file code is better
that is OK.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Philip Martin <ph...@wandisco.com> writes:

> Yes, that looks good.

[...]

Julian Foad <ju...@btopenworld.com> writes:

> Just a couple of comments, from a quick read of the patch.

[...]

Philip, Julian, thank you for giving this patch a look.

As I said in my previous e-mails, I think that the committed fix is a better
choice here, as opposed to attempts to achieve the same behavior using the
stream API.  With a couple of fresh thoughts, I can conclude that the approach
with svn_stream_flush() is also a poor solution for this problem:

1) We cannot really use svn_stream_flush(TRUE) + svn_stream_close() with
   a compressed stream that is chained over a file stream or a stream that we
   expect to be consistent with the underlying device.  A compressed stream
   writes the final data chunk in close_handler_gz() and calls deflate() with
   Z_FINISH.  In this case, closing the compressed stream would require a
   full flush of the underlying stream in order to guarantee consistency.
   Leaving the compressed streams without a flush handler implementation is
   also a poor choice because the caller of svn_stream_flush() would have to
   know the details about the given stream in order to avoid receiving the
   SVN_ERR_STREAM_NOT_SUPPORTED error.

   I believe that this is an example where achieving the proper behavior with
   files is simple, but abstracting it with streams is a challenge.

2) The same argument applies to any generic stream that writes data in its
   close handler.  If you have such a stream anywhere in the chain, using
   svn_stream_flush(TRUE) before svn_stream_close() wouldn't be enough to
   guarantee the consistency of the underlying device.

3) Doing so would make this fix a 1.9.0 release blocker.  Furthermore, we
   would probably have problems backporting it to 1.8.x, as I think that
   backporting this API as svn_stream__flush() would be a destabilizing
   change.

FSFS currently uses apr_file_t and svn_stream_t, depending on the context of
the operation, and I find this reasonable.  While streams are generally good
for serializing data, there are certain file operations that cannot be properly
abstracted on the stream layer — for example, an offset-based apr_file_seek()
or an ability to retrieve a file name or size.

We can create a stream from a file when we need to serialize or deserialize
data.  However, trying to achieve the opposite, i.e., pushing file-specific
operations through several abstraction layers *is* going to introduce different
sorts of problems — API functions that can only work with certain stream types,
stream open functions that keep the knowledge about the necessity of an fsync()
before closing the stream, runtime errors depending on whether the operation is
supported by the particular stream or not, and, possibly, other.


Regards,
Evgeny Kotkov

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote:
> Just a couple of comments, from a quick read of the patch.
>
> +/** Flushes all available internal buffers in a generic @a stream.  If
> + * @sync is non-zero, it causes any buffered data to be written to the
> + * underlying device (for example, to the disk for a file stream).
>
> I'm not sure about having the 'sync' flag. What does that mean for a
> non-disk stream? Does the API really need to support two different
> ways of flushing a generic stream? I feel probably not. If we need to
> support two different ways of flushing a disk stream, could we achieve
> that in a better way? Perhaps the code that opens the disk stream
> should specify this option when opening the disk stream, rather than a
> generic writer specifying this option when it calls 'flush'?

Closing a stream always implies first flushing its own buffers. This
proposed svn_stream_flush() API supports flushing a generic stream's
buffers at any (and multiple) points in time. That's potentially
useful, but the present use case only requires us to sync a disk
file's data to disk when closing it. That's a slightly different
thing.

I note that in the MSDN documentation that Evgeny linked to, the plain
'Stream' class has a plain 'Flush' method, while the 'FileStream'
class has (in addition) a Flush(bool) method that supports a 'sync to
disk' option.


Another small point. Ivan mentioned at the beginning of this thread:

> 3. It's unclear how FLUSH_TO_DISK flag should interact with DISOWN=TRUE.

The current patch chooses one of the two possible behaviours for
svn_stream_disown() -- it chooses to forward all flush() method calls
to the 'disowned' stream, and only blocks the close() method. Nothing
wrong with that; I'm just noting that one behaviour was chosen.

- Julian

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Julian Foad <ju...@btopenworld.com>.
Philip Martin wrote:
> Evgeny Kotkov writes:
>> I sketched this option in the attached patch.  With this patch applied, we
>> could rework the r1680819 fix like below.  What do you think?
>
> Yes, that looks good.

Just a couple of comments, from a quick read of the patch.

+/** Flushes all available internal buffers in a generic @a stream.  If
+ * @sync is non-zero, it causes any buffered data to be written to the
+ * underlying device (for example, to the disk for a file stream).

I'm not sure about having the 'sync' flag. What does that mean for a
non-disk stream? Does the API really need to support two different
ways of flushing a generic stream? I feel probably not. If we need to
support two different ways of flushing a disk stream, could we achieve
that in a better way? Perhaps the code that opens the disk stream
should specify this option when opening the disk stream, rather than a
generic writer specifying this option when it calls 'flush'?


In flush_handler_lazyopen, I think the code should avoid opening the
stream first unless the 'always open' mode is active: see
close_handler_lazyopen().

- Julian

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Evgeny Kotkov <ev...@visualsvn.com> writes:

> I sketched this option in the attached patch.  With this patch applied, we
> could rework the r1680819 fix like below.  What do you think?

Yes, that looks good.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Philip Martin <ph...@wandisco.com> writes:

> Evgeny Kotkov <ev...@visualsvn.com> writes:
>
>> Or is it just a gut feeling that we should be using streams here?
>
> We have been gradually moving our file-based code to stream-based code
> for years.

I believe that flushing files (instead of achieving the same behavior with
streams) is a better choice in this particular context.  The relevant bits of
code in libsvn_fs_fs/revprops.c that are responsible for persisting revision
property changes work with files, and we cannot do something with that.  We
can abstract certain parts of this logic with streams — e.g., when it comes
to functions that serialize changes, because they do not really care if the
destination is a file or a generic stream.

However, an attempt to provide a proper abstraction for fsync() with streams
is a step towards dynamic typing with a possibility of a runtime error like
SVN_ERR_STREAM_NOT_SUPPORTED.  I think that in this particular case using
apr_file_t is better than forcing the usage of svn_stream_t just because we
have been moving from files toward streams.  We have a temporary file on the
disk, and we need to fsync() it when we finished working with it.  What could
be more straightforward than calling svn_io_file_flush_to_disk() on the file
right before svn_io_file_close()?

I'd like to point out that we have an option of achieving the same behavior
with streams, if we're aiming towards this.  While this is probably a slight
overkill, we could add svn_stream_flush() — inspired by design of the .NET
system library [1,2] — to our API and provide the necessary flush handlers.
This could be useful not just for files, but also for svn_stream_compressed()
and maybe for flushing network buffers if we ever need to do so.

I sketched this option in the attached patch.  With this patch applied, we
could rework the r1680819 fix like below.  What do you think?
[[[
Index: subversion/libsvn_fs_fs/revprops.c
===================================================================
--- subversion/libsvn_fs_fs/revprops.c  (revision 1680705)
+++ subversion/libsvn_fs_fs/revprops.c  (working copy)
@@ -661,6 +661,7 @@ write_non_packed_revprop(const char **final_path,
                                  svn_dirent_dirname(*final_path, pool),
                                  svn_io_file_del_none, pool, pool));
   SVN_ERR(svn_hash_write2(proplist, stream, SVN_HASH_TERMINATOR, pool));
+  SVN_ERR(svn_stream_flush(stream, TRUE));
   SVN_ERR(svn_stream_close(stream));

   return SVN_NO_ERROR;
@@ -811,9 +812,8 @@ repack_revprops(svn_fs_t *fs,
                           ? SVN_DELTA_COMPRESSION_LEVEL_DEFAULT
                           : SVN_DELTA_COMPRESSION_LEVEL_NONE));

-  /* finally, write the content to the target stream and close it */
+  /* finally, write the content to the target stream */
   SVN_ERR(svn_stream_write(file_stream, compressed->data, &compressed->len));
-  SVN_ERR(svn_stream_close(file_stream));

   return SVN_NO_ERROR;
 }
@@ -933,6 +933,8 @@ write_packed_revprop(const char **final_path,
       SVN_ERR(repack_revprops(fs, revprops, 0, revprops->sizes->nelts,
                               changed_index, serialized, new_total_size,
                               stream, pool));
+      SVN_ERR(svn_stream_flush(stream, TRUE));
+      SVN_ERR(svn_stream_close(stream));
     }
   else
     {
@@ -983,6 +985,8 @@ write_packed_revprop(const char **final_path,
           SVN_ERR(repack_revprops(fs, revprops, 0, left_count,
                                   changed_index, serialized, new_total_size,
                                   stream, pool));
+          SVN_ERR(svn_stream_flush(stream, TRUE));
+          SVN_ERR(svn_stream_close(stream));
         }

       if (left_count + right_count < revprops->sizes->nelts)
@@ -994,6 +998,8 @@ write_packed_revprop(const char **final_path,
                                   changed_index + 1,
                                   changed_index, serialized, new_total_size,
                                   stream, pool));
+          SVN_ERR(svn_stream_flush(stream, TRUE));
+          SVN_ERR(svn_stream_close(stream));
         }

       if (right_count)
@@ -1007,6 +1013,8 @@ write_packed_revprop(const char **final_path,
                                   revprops->sizes->nelts, changed_index,
                                   serialized, new_total_size, stream,
                                   pool));
+          SVN_ERR(svn_stream_flush(stream, TRUE));
+          SVN_ERR(svn_stream_close(stream));
         }

       /* write the new manifest */
@@ -1021,6 +1029,7 @@ write_packed_revprop(const char **final_path,
           SVN_ERR(svn_stream_printf(stream, pool, "%s\n", filename));
         }

+      SVN_ERR(svn_stream_flush(stream, TRUE));
       SVN_ERR(svn_stream_close(stream));
     }
]]]

[1] https://msdn.microsoft.com/en-us/library/system.io.stream.flush
[2] https://msdn.microsoft.com/en-us/library/ee474552


Regards,
Evgeny Kotkov

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Evgeny Kotkov <ev...@visualsvn.com> writes:

> Or is it just a gut feeling that we should be using streams here?

We have been gradually moving our file-based code to stream-based code
for years.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Philip Martin <ph...@wandisco.com> writes:

> I like this patch better.  It puts the flush into 2 extra places, 4 in
> total, in FSFS, and the corresponding 4 places in FSX.  For 1.8 we would
> make the new API private: svn_stream__flush_to_disk_on_close.
>
> Index: subversion/include/svn_io.h
> ===================================================================
> --- subversion/include/svn_io.h (revision 1680818)
> +++ subversion/include/svn_io.h (working copy)
> @@ -1087,6 +1087,17 @@ svn_stream_from_aprfile2(apr_file_t *file,
>                           svn_boolean_t disown,
>                           apr_pool_t *pool);
>
> +/** Arrange for the APR file underlying @a stream to be flushed to
> + * disk before being closed.  Returns SVN_ERR_STREAM_NOT_SUPPORTED for
> + * streams that cannot be flushed.  This can be applied to streams
> + * created by svn_stream_from_aprfile2(), svn_stream_open_unique() and
> + * svn_stream_open_writable().
> + *
> + * @since New in 1.9.
> + */
> +svn_error_t *
> +svn_stream_flush_to_disk_on_close(svn_stream_t *stream);

Isn't this an abstraction leak?

This function accepts a generic stream as an argument, but makes assumptions
about what the stream should look like.  You can't use it with an arbitrary
stream, because you have to be sure that it was opened using a function like
svn_stream_from_aprfile2() with some constraints, i.e., disown set to FALSE
and non-NULL file.  As another example, you cannot use it with a file stream
wrapped with svn_stream_compressed() due to a close handler mismatch.

>From my point of view, this means that svn_stream_flush_to_disk_on_close()
is only useful when you have full control over the stream that you're about to
wrap, probably, only in sequences like "open a stream and immediately install
a flush-on-close handler".  Hence, an API like svn_stream_from_aprfile3() with
a flush_on_close argument would probably be a better choice.

[...]

>
> @@ -875,6 +876,7 @@ repack_stream_open(svn_stream_t **stream,
>                                                    pool),
>                             APR_WRITE | APR_CREATE, APR_OS_DEFAULT, pool));
>    *stream = svn_stream_from_aprfile2(file, FALSE, pool);
> +  SVN_ERR(svn_stream_flush_to_disk_on_close(*stream));

This is something that we were trying to avoid in the original patch, as this
makes the open() function decide if the file should receive a hardware flush
when closed or not.  It's the caller who knows more about the details, not the
open() function.  Personally I believe that this is an example of when explicit
is better than implicit — i.e., explicitly stating that we're going to do a
hardware flush and then close the file is better than juggling with streams
that keep the knowledge that they must flush the underlying file before
closing.

With that in mind, I see two possible options.  Both of these options assume
that we handle the absence of flushing during packing and other operations
separately.  From what I witness, the same issue can happen with, for example,
hotcopy due to how svn_io_copy_file() works.  So, we could:

1) Leave this fix unchanged.  I see the benefit of being explicit, but if we're
   striving to get away from files in favor of streams, then the fix does not
   really follow the pattern.

2) Introduce svn_stream_from_aprfile3(), svn_stream_open_unique2(), revert the
   r1680819 fix and go again with these new functions.  There could be a couple
   of caveats in how we treat different combinations of svn_io_file_del_t and
   disown and flush_on_close, but I suppose we could work them out.

Personally, I don't see anything fundamentally wrong with 1), because it makes
it easy to tell whether the flush is going to happen or not — without the need
to find the chunk of code that opened the stream.  Is the patch hard to review?
Or is it just a gut feeling that we should be using streams here?


Regards,
Evgeny Kotkov

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Ivan Zhakov <iv...@visualsvn.com> writes:

> Yes, this patch looks easier to review, the only problem that it's
> incomplete. I'm attaching minimal working patch with
> svn_stream_from_file3() against trunk@r1680818.

I like this patch better.  It puts the flush into 2 extra places, 4 in
total, in FSFS, and the corresponding 4 places in FSX.  For 1.8 we would
make the new API private: svn_stream__flush_to_disk_on_close.

Index: subversion/include/svn_io.h
===================================================================
--- subversion/include/svn_io.h	(revision 1680818)
+++ subversion/include/svn_io.h	(working copy)
@@ -1087,6 +1087,17 @@ svn_stream_from_aprfile2(apr_file_t *file,
                          svn_boolean_t disown,
                          apr_pool_t *pool);
 
+/** Arrange for the APR file underlying @a stream to be flushed to
+ * disk before being closed.  Returns SVN_ERR_STREAM_NOT_SUPPORTED for
+ * streams that cannot be flushed.  This can be applied to streams
+ * created by svn_stream_from_aprfile2(), svn_stream_open_unique() and
+ * svn_stream_open_writable().
+ *
+ * @since New in 1.9.
+ */
+svn_error_t *
+svn_stream_flush_to_disk_on_close(svn_stream_t *stream);
+  
 /** Similar to svn_stream_from_aprfile2(), except that the file will
  * always be disowned.
  *
Index: subversion/libsvn_fs_fs/revprops.c
===================================================================
--- subversion/libsvn_fs_fs/revprops.c	(revision 1680818)
+++ subversion/libsvn_fs_fs/revprops.c	(working copy)
@@ -660,6 +660,7 @@ write_non_packed_revprop(const char **final_path,
   SVN_ERR(svn_stream_open_unique(&stream, tmp_path,
                                  svn_dirent_dirname(*final_path, pool),
                                  svn_io_file_del_none, pool, pool));
+  SVN_ERR(svn_stream_flush_to_disk_on_close(stream));
   SVN_ERR(svn_hash_write2(proplist, stream, SVN_HASH_TERMINATOR, pool));
   SVN_ERR(svn_stream_close(stream));
 
@@ -875,6 +876,7 @@ repack_stream_open(svn_stream_t **stream,
                                                   pool),
                            APR_WRITE | APR_CREATE, APR_OS_DEFAULT, pool));
   *stream = svn_stream_from_aprfile2(file, FALSE, pool);
+  SVN_ERR(svn_stream_flush_to_disk_on_close(*stream));
 
   return SVN_NO_ERROR;
 }
@@ -1206,6 +1208,7 @@ svn_fs_fs__copy_revprops(const char *pack_file_dir
 
   /* write the pack file content to disk */
   stream = svn_stream_from_aprfile2(pack_file, FALSE, scratch_pool);
+  SVN_ERR(svn_stream_flush_to_disk_on_close(stream));
   SVN_ERR(svn_stream_write(stream, compressed->data, &compressed->len));
   SVN_ERR(svn_stream_close(stream));
 
@@ -1244,6 +1247,7 @@ svn_fs_fs__pack_revprops_shard(const char *pack_fi
   SVN_ERR(svn_io_dir_make(pack_file_dir, APR_OS_DEFAULT, scratch_pool));
   SVN_ERR(svn_stream_open_writable(&manifest_stream, manifest_file_path,
                                    scratch_pool, scratch_pool));
+  SVN_ERR(svn_stream_flush_to_disk_on_close(manifest_stream));
 
   /* revisions to handle. Special case: revision 0 */
   start_rev = (svn_revnum_t) (shard * max_files_per_dir);
Index: subversion/libsvn_fs_x/revprops.c
===================================================================
--- subversion/libsvn_fs_x/revprops.c	(revision 1680818)
+++ subversion/libsvn_fs_x/revprops.c	(working copy)
@@ -1188,6 +1188,7 @@ write_non_packed_revprop(const char **final_path,
                                                     scratch_pool),
                                  svn_io_file_del_none,
                                  result_pool, scratch_pool));
+  SVN_ERR(svn_stream_flush_to_disk_on_close(stream));
   SVN_ERR(svn_hash_write2(proplist, stream, SVN_HASH_TERMINATOR,
                           scratch_pool));
   SVN_ERR(svn_stream_close(stream));
@@ -1425,6 +1426,7 @@ repack_stream_open(svn_stream_t **stream,
                            APR_WRITE | APR_CREATE, APR_OS_DEFAULT,
                            result_pool));
   *stream = svn_stream_from_aprfile2(file, FALSE, result_pool);
+  SVN_ERR(svn_stream_flush_to_disk_on_close(*stream));
 
   return SVN_NO_ERROR;
 }
@@ -1797,6 +1799,7 @@ svn_fs_x__copy_revprops(const char *pack_file_dir,
 
   /* write the pack file content to disk */
   stream = svn_stream_from_aprfile2(pack_file, FALSE, scratch_pool);
+  SVN_ERR(svn_stream_flush_to_disk_on_close(stream));
   SVN_ERR(svn_stream_write(stream, compressed->data, &compressed->len));
   SVN_ERR(svn_stream_close(stream));
 
@@ -1835,6 +1838,7 @@ svn_fs_x__pack_revprops_shard(const char *pack_fil
   SVN_ERR(svn_io_dir_make(pack_file_dir, APR_OS_DEFAULT, scratch_pool));
   SVN_ERR(svn_stream_open_writable(&manifest_stream, manifest_file_path,
                                    scratch_pool, scratch_pool));
+  SVN_ERR(svn_stream_flush_to_disk_on_close(manifest_stream));
 
   /* revisions to handle. Special case: revision 0 */
   start_rev = (svn_revnum_t) (shard * max_files_per_dir);
Index: subversion/libsvn_subr/stream.c
===================================================================
--- subversion/libsvn_subr/stream.c	(revision 1680818)
+++ subversion/libsvn_subr/stream.c	(working copy)
@@ -911,6 +911,29 @@ close_handler_apr(void *baton)
 }
 
 static svn_error_t *
+close_handler_flush(void *baton)
+{
+  struct baton_apr *btn = baton;
+
+  SVN_ERR(svn_io_file_flush_to_disk(btn->file, btn->pool));
+  SVN_ERR(close_handler_apr(baton));
+  return SVN_NO_ERROR;
+}
+
+svn_error_t *
+svn_stream_flush_to_disk_on_close(svn_stream_t *stream)
+{
+  if (stream->close_fn != close_handler_apr)
+    return svn_error_create(SVN_ERR_STREAM_NOT_SUPPORTED, NULL,
+                            _("No closing file to flush"));
+
+  svn_stream_set_close(stream, close_handler_flush);
+
+  return SVN_NO_ERROR;
+}
+
+
+static svn_error_t *
 mark_handler_apr(void *baton, svn_stream_mark_t **mark, apr_pool_t *pool)
 {
   struct baton_apr *btn = baton;

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 22 May 2015 at 09:40, Branko Čibej <br...@wandisco.com> wrote:
> On 22.05.2015 00:09, Philip Martin wrote:
>> Ivan Zhakov <iv...@visualsvn.com> writes:
>>
>>> We considered adding FLUSH_TO_DISK flag to svn_stream_from_apr_file(),
>>> but decided to commit simple fix for now:
>>> 1. The current FSFS code uses explicit flush and this commit makes
>>> code consistent with other parts
>>> 2. Adding flag to svn_stream_from_apr_file() will require revving API
>>> etc, while this fix should be backported to 1.8.x and 1.9.x
>
> The API argument only applies to 1.8.x, which would need a completely
> different fix; not to 1.9.x, because it's not released yet.
Yes, we can introduce new API in 1.9.x but this automatically this fix
1.9.0 release blocker, while I wanted to avoid blocking 1.9.0 due this
fix since it's not regressing.

> The benefit
> of revising the API (which I tentatively support, see [1] below)
> outweighs the pain of having to write a different fix for 1.8.x.
>

>>> 3. It's unclear how FLUSH_TO_DISK flag should interact with DISOWN=TRUE.
>>>
>>> Taking in account all above we decided to commit simple fix for now
>>> and possibly implement FLUSH_TO_DISK flag later applying it to all
>>> FSFS code where needed.
>> Is it simple?  It's possible that r1680819 is more complicated than
>> adding flush to the the stream.
>>
>> Exactly what the API should look like is bit unclear.  Is there any need
>> to support disown=TRUE and flush=TRUE?  When would Subversion use it?
>> If we create svn_stream_from_aprfile3 that takes two booleans then
>> perhaps we change the return type to allow an error for the impossible
>> disown=TRUE and flush=TRUE.
>
> Yup. There's no reason to try to support mutually conflicting arguments.
But how svn_stream_from_aprfile3() should handle disown=TRUE and
flush=TRUE: assertion, disown=TRUE as priority and ignore flush=TRUE?
I don't say we cannot find good solution for this, I just wanted to
make simple fix targeted for 1.8.x and 1.9.x and then improve all FSFS
regarding flushes.

>
>> All the flushing code goes in libsvn_subr
>
> Just one nit here: the flushing code would indeed be localized in
> libsvn_subr, however, the API change would propagate to some 100 places
> in the code [1], including lots of places in libsvn_fs_fs. That's unless
> we decide /not/ to deprecate svn_stream_from_aprfile2 in the 1.9.x
> series and use it in parallel with the prospective svn_stream_from_aprfile3.
>
Yep, revving svn_stream_from_aprfile2() would introduce changing all
callers. But I agree that we can keep svn_stream_from_aprfile2()
without deprecating.


> FWIW, I think in this case revving the API without deprecating the old
> one is justified. Another option would be to invent a different API name
> for the flushing stream, e.g., svn_stream_from_aprfile_flushed or some
> such. This way we'd also avoid the dilemma about disown by simply not
> having that flag in the new function.
>
There are two kinds of flushes:
1) flush all data buffered in stream implementation
2) ensure that data written to disk hardware

Extending stream API with svn_stream_flush() with (1) semantic seems
to be natural extension, but introducing (2) would be leak of
abstraction.

>> and the code change in FSFS is really small, something like:
>>
>> Index: subversion/libsvn_fs_fs/revprops.c
>> ===================================================================
>> --- subversion/libsvn_fs_fs/revprops.c        (revision 1680818)
>> +++ subversion/libsvn_fs_fs/revprops.c        (working copy)
>> @@ -874,7 +874,7 @@
>>                                                    new_filename->data,
>>                                                    pool),
>>                             APR_WRITE | APR_CREATE, APR_OS_DEFAULT, pool));
>> -  *stream = svn_stream_from_aprfile2(file, FALSE, pool);
>> +  SVN_ERR(svn_stream_from_aprfile3(stream, file, FALSE, TRUE, pool));
>>
>>    return SVN_NO_ERROR;
>>  }
>> @@ -1205,7 +1205,8 @@
>>    SVN_ERR(svn__compress(uncompressed, compressed, compression_level));
>>
>>    /* write the pack file content to disk */
>> -  stream = svn_stream_from_aprfile2(pack_file, FALSE, scratch_pool);
>> +  SVN_ERR(svn_stream_from_aprfile3(&stream, pack_file, FALSE, TRUE,
>> +                                   scratch_pool));
>>    SVN_ERR(svn_stream_write(stream, compressed->data, &compressed->len));
>>    SVN_ERR(svn_stream_close(stream));
>>
>> A patch like that is possibly easier to review than r1680819.
>
> Quite a bit easier, yes.
>
Yes, this patch looks easier to review, the only problem that it's
incomplete. I'm attaching minimal working patch with
svn_stream_from_file3() against trunk@r1680818.


-- 
Ivan Zhakov

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Branko Čibej <br...@wandisco.com>.
On 22.05.2015 00:09, Philip Martin wrote:
> Ivan Zhakov <iv...@visualsvn.com> writes:
>
>> We considered adding FLUSH_TO_DISK flag to svn_stream_from_apr_file(),
>> but decided to commit simple fix for now:
>> 1. The current FSFS code uses explicit flush and this commit makes
>> code consistent with other parts
>> 2. Adding flag to svn_stream_from_apr_file() will require revving API
>> etc, while this fix should be backported to 1.8.x and 1.9.x

The API argument only applies to 1.8.x, which would need a completely
different fix; not to 1.9.x, because it's not released yet. The benefit
of revising the API (which I tentatively support, see [1] below)
outweighs the pain of having to write a different fix for 1.8.x.

>> 3. It's unclear how FLUSH_TO_DISK flag should interact with DISOWN=TRUE.
>>
>> Taking in account all above we decided to commit simple fix for now
>> and possibly implement FLUSH_TO_DISK flag later applying it to all
>> FSFS code where needed.
> Is it simple?  It's possible that r1680819 is more complicated than
> adding flush to the the stream.
>
> Exactly what the API should look like is bit unclear.  Is there any need
> to support disown=TRUE and flush=TRUE?  When would Subversion use it?
> If we create svn_stream_from_aprfile3 that takes two booleans then
> perhaps we change the return type to allow an error for the impossible
> disown=TRUE and flush=TRUE.

Yup. There's no reason to try to support mutually conflicting arguments.

> All the flushing code goes in libsvn_subr

Just one nit here: the flushing code would indeed be localized in
libsvn_subr, however, the API change would propagate to some 100 places
in the code [1], including lots of places in libsvn_fs_fs. That's unless
we decide /not/ to deprecate svn_stream_from_aprfile2 in the 1.9.x
series and use it in parallel with the prospective svn_stream_from_aprfile3.

FWIW, I think in this case revving the API without deprecating the old
one is justified. Another option would be to invent a different API name
for the flushing stream, e.g., svn_stream_from_aprfile_flushed or some
such. This way we'd also avoid the dilemma about disown by simply not
having that flag in the new function.

> and the code change in FSFS is really small, something like:
>
> Index: subversion/libsvn_fs_fs/revprops.c
> ===================================================================
> --- subversion/libsvn_fs_fs/revprops.c	(revision 1680818)
> +++ subversion/libsvn_fs_fs/revprops.c	(working copy)
> @@ -874,7 +874,7 @@
>                                                    new_filename->data,
>                                                    pool),
>                             APR_WRITE | APR_CREATE, APR_OS_DEFAULT, pool));
> -  *stream = svn_stream_from_aprfile2(file, FALSE, pool);
> +  SVN_ERR(svn_stream_from_aprfile3(stream, file, FALSE, TRUE, pool));
>  
>    return SVN_NO_ERROR;
>  }
> @@ -1205,7 +1205,8 @@
>    SVN_ERR(svn__compress(uncompressed, compressed, compression_level));
>  
>    /* write the pack file content to disk */
> -  stream = svn_stream_from_aprfile2(pack_file, FALSE, scratch_pool);
> +  SVN_ERR(svn_stream_from_aprfile3(&stream, pack_file, FALSE, TRUE,
> +                                   scratch_pool));
>    SVN_ERR(svn_stream_write(stream, compressed->data, &compressed->len));
>    SVN_ERR(svn_stream_close(stream));
>  
> A patch like that is possibly easier to review than r1680819.

Quite a bit easier, yes.

-- Brane

[1]

    $ grep -r svn_stream_from_aprfile2 subversion | wc -l
         101
    $ grep -r svn_stream_from_aprfile2 subversion/libsvn_fs_fs | wc -l
         21



Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
Ivan Zhakov <iv...@visualsvn.com> writes:

> We considered adding FLUSH_TO_DISK flag to svn_stream_from_apr_file(),
> but decided to commit simple fix for now:
> 1. The current FSFS code uses explicit flush and this commit makes
> code consistent with other parts
> 2. Adding flag to svn_stream_from_apr_file() will require revving API
> etc, while this fix should be backported to 1.8.x and 1.9.x
> 3. It's unclear how FLUSH_TO_DISK flag should interact with DISOWN=TRUE.
>
> Taking in account all above we decided to commit simple fix for now
> and possibly implement FLUSH_TO_DISK flag later applying it to all
> FSFS code where needed.

Is it simple?  It's possible that r1680819 is more complicated than
adding flush to the the stream.

Exactly what the API should look like is bit unclear.  Is there any need
to support disown=TRUE and flush=TRUE?  When would Subversion use it?
If we create svn_stream_from_aprfile3 that takes two booleans then
perhaps we change the return type to allow an error for the impossible
disown=TRUE and flush=TRUE.

All the flushing code goes in libsvn_subr and the code change in FSFS is
really small, something like:

Index: subversion/libsvn_fs_fs/revprops.c
===================================================================
--- subversion/libsvn_fs_fs/revprops.c	(revision 1680818)
+++ subversion/libsvn_fs_fs/revprops.c	(working copy)
@@ -874,7 +874,7 @@
                                                   new_filename->data,
                                                   pool),
                            APR_WRITE | APR_CREATE, APR_OS_DEFAULT, pool));
-  *stream = svn_stream_from_aprfile2(file, FALSE, pool);
+  SVN_ERR(svn_stream_from_aprfile3(stream, file, FALSE, TRUE, pool));
 
   return SVN_NO_ERROR;
 }
@@ -1205,7 +1205,8 @@
   SVN_ERR(svn__compress(uncompressed, compressed, compression_level));
 
   /* write the pack file content to disk */
-  stream = svn_stream_from_aprfile2(pack_file, FALSE, scratch_pool);
+  SVN_ERR(svn_stream_from_aprfile3(&stream, pack_file, FALSE, TRUE,
+                                   scratch_pool));
   SVN_ERR(svn_stream_write(stream, compressed->data, &compressed->len));
   SVN_ERR(svn_stream_close(stream));
 
A patch like that is possibly easier to review than r1680819.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 21 May 2015 at 16:25, Philip Martin <ph...@wandisco.com> wrote:
> ivan@apache.org writes:
>
>> Author: ivan
>> Date: Thu May 21 11:00:43 2015
>> New Revision: 1680819
>>
>> URL: http://svn.apache.org/r1680819
>> Log:
>> Prevent a possible FSFS repository corruption with power or network disk
>> failures when changing revision properties. Perform a hardware flush
>> after we finished writing to a temporary revprop file and before moving
>> it into place. The change doesn't affect commit operation behavior.
>>
>> The corruption can be easily reproduced by triggering a power loss while
>> performing svnsync.
>>
>> This change is somewhat similar to what we did in r1483781, but covers how
>> we write revision property files. See related discussion in dev@s.a.o
>> (Subject: "FSFS Repository corruption on high load on Windows [...] ") [1].
>>
>> Patch by: me
>>           kotkov
>>
>> [1] http://svn.haxx.se/dev/archive-2013-05/0245.shtml
>>
>> * subversion/libsvn_fs_fs/revprops.c
>>   (repack_stream_open): Rename the function ...
>>   (repack_file_open): ...to this. Rework it to return files (apr_file_t)
>>    instead of streams.
>>   (repack_revprops): Work with a file instead of a stream. Flush any
>>    unwritten data to disk before returning.
>>   (write_non_packed_revprop): Flush any unwritten data to disk after
>>    serializing the revision property list.
>>   (write_packed_revprop): Cope with the changes in repack_file_open() and
>>    repack_revprops() that now work with files. Flush the data to disk when
>>    done writing to a temporary manifest file.
>
> Is that the best approach?  In the past we have been moving away from
> file code to stream code.  Can we make the flush part of the stream
> code?  Perhaps we could create a "flushing stream" that just does flush
> and then simply insert the new stream into the old code.
>
We considered adding FLUSH_TO_DISK flag to svn_stream_from_apr_file(),
but decided to commit simple fix for now:
1. The current FSFS code uses explicit flush and this commit makes
code consistent with other parts
2. Adding flag to svn_stream_from_apr_file() will require revving API
etc, while this fix should be backported to 1.8.x and 1.9.x
3. It's unclear how FLUSH_TO_DISK flag should interact with DISOWN=TRUE.

Taking in account all above we decided to commit simple fix for now
and possibly implement FLUSH_TO_DISK flag later applying it to all
FSFS code where needed.

-- 
Ivan Zhakov

Re: svn commit: r1680819 - /subversion/trunk/subversion/libsvn_fs_fs/revprops.c

Posted by Philip Martin <ph...@wandisco.com>.
ivan@apache.org writes:

> Author: ivan
> Date: Thu May 21 11:00:43 2015
> New Revision: 1680819
>
> URL: http://svn.apache.org/r1680819
> Log:
> Prevent a possible FSFS repository corruption with power or network disk
> failures when changing revision properties. Perform a hardware flush
> after we finished writing to a temporary revprop file and before moving
> it into place. The change doesn't affect commit operation behavior.
>
> The corruption can be easily reproduced by triggering a power loss while
> performing svnsync.
>
> This change is somewhat similar to what we did in r1483781, but covers how
> we write revision property files. See related discussion in dev@s.a.o
> (Subject: "FSFS Repository corruption on high load on Windows [...] ") [1].
>
> Patch by: me
>           kotkov
>
> [1] http://svn.haxx.se/dev/archive-2013-05/0245.shtml
>
> * subversion/libsvn_fs_fs/revprops.c
>   (repack_stream_open): Rename the function ...
>   (repack_file_open): ...to this. Rework it to return files (apr_file_t)
>    instead of streams.
>   (repack_revprops): Work with a file instead of a stream. Flush any
>    unwritten data to disk before returning.
>   (write_non_packed_revprop): Flush any unwritten data to disk after
>    serializing the revision property list.
>   (write_packed_revprop): Cope with the changes in repack_file_open() and
>    repack_revprops() that now work with files. Flush the data to disk when
>    done writing to a temporary manifest file.

Is that the best approach?  In the past we have been moving away from
file code to stream code.  Can we make the flush part of the stream
code?  Perhaps we could create a "flushing stream" that just does flush
and then simply insert the new stream into the old code.

-- 
Philip Martin | Subversion Committer
WANdisco // *Non-Stop Data*