You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by st...@apache.org on 2015/08/28 17:53:10 UTC

svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Author: stefan2
Date: Fri Aug 28 15:53:10 2015
New Revision: 1698359

URL: http://svn.apache.org/r1698359
Log:
Introduce a stream wrapper object that adds mark/seek support to any
readable stream.  Use it on the stdin streams in our CL tools. 

As it turns out, parsing data from a stdin byte-by-byte incurs a
massive overhead of 100% internal and 300% system load over a buffered
stream.  'svnadmin load-revprops' sees a 5 times speedup if all data
is in OS disc caches.  This is a realistic assumption in a "final sync
and switch over to new repository" scenario.

The other 2 uses of stdin either have less data to process (svnfsfs
load-index) or parse only a small fraction of the stream (svnadmin load).
To avoid any memory usage issue due to the added buffering, svnadmin
load will not use the stream wrapper - the loader might clean up some
of the pools only once per revision.

* subversion/include/svn_io.h
  (svn_stream_wrap_buffered_read): Declare the new stream constructor API.

* subversion/libsvn_subr/stream.c
  (buffering_stream_wrapper_baton,
   buffering_stream_wrapper_mark): New data structures describing the
                                   wrapper stream and marker states.
  (read_handler_buffering_wrapper,
   decrement_mark_count,
   mark_handler_buffering_wrapper,
   seek_handler_buffering_wrapper,
   data_available_handler_buffering_wrapper,
   is_buffered_handler_buffering_wrapper,
   assert_zero_mark_count): Internal logic of the new stream object.
  (svn_stream_wrap_buffered_read): New constructor implementation.

* subversion/svnadmin/svnadmin.c
  (subcommand_load_revprops): Wrap the stdin stream.

* subversion/svnfsfs/load-index-cmd.c
  (subcommand__load_index): Same.

* subversion/tests/libsvn_subr/stream-test.c
  (struct stream_baton_t,
   read_handler,
   data_available_handler,
   create_test_read_stream): New configurable test read stream.
  (expect_line_content,
   test_stream_buffered_wrapper): New test for the new wrapper stream. 
  (test_funcs): Register the new test.

Modified:
    subversion/trunk/subversion/include/svn_io.h
    subversion/trunk/subversion/libsvn_subr/stream.c
    subversion/trunk/subversion/svnadmin/svnadmin.c
    subversion/trunk/subversion/svnfsfs/load-index-cmd.c
    subversion/trunk/subversion/tests/libsvn_subr/stream-test.c

Modified: subversion/trunk/subversion/include/svn_io.h
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/include/svn_io.h?rev=1698359&r1=1698358&r2=1698359&view=diff
==============================================================================
--- subversion/trunk/subversion/include/svn_io.h (original)
+++ subversion/trunk/subversion/include/svn_io.h Fri Aug 28 15:53:10 2015
@@ -1157,6 +1157,31 @@ svn_stream_t *
 svn_stream_from_string(const svn_string_t *str,
                        apr_pool_t *pool);
 
+/** Return a generic read-only stream that forward a data from INNER
+ *  but uses buffering and add support for seeking for being more
+ *  efficient with parsers.
+ *
+ *  Set @a support_seek_to_start to TRUE only if you to be able to seek
+ *  to the start of stream without prior setting a mark.  Since the
+ *  stream needs to buffer all data since the oldest existing mark,
+ *  support for @a svn_stream_seek(stream,@c  NULL) results in unbounded
+ *  memory usage.  If @a support_seek_to_start is FALSE,
+ *  @a svn_stream_seek(stream, @c NULL) returns
+ *  @c SVN_ERR_STREAM_SEEK_NOT_SUPPORTED.
+ *
+ *  Allocate the stream in @a result_pool.
+ *
+ * @note To keep memory consumption low, make sure to frequently clear
+ * the pools that contain marks on the buffered stream.  Also, make sure
+ * that the total number of marks returns to zero at regular intervals.
+ *
+ * @since New in 1.10.
+ */
+svn_stream_t *
+svn_stream_wrap_buffered_read(svn_stream_t *inner,
+                              svn_boolean_t support_seek_to_start,
+                              apr_pool_t *result_pool);
+
 /** Return a generic stream which implements buffered reads and writes.
  *  The stream will preferentially store data in-memory, but may use
  *  disk storage as backup if the amount of data is large.

Modified: subversion/trunk/subversion/libsvn_subr/stream.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_subr/stream.c?rev=1698359&r1=1698358&r2=1698359&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_subr/stream.c (original)
+++ subversion/trunk/subversion/libsvn_subr/stream.c Fri Aug 28 15:53:10 2015
@@ -1710,6 +1710,276 @@ svn_stream_from_string(const svn_string_
   return stream;
 }
 
+/* Baton structure for buffering read stream wrappers.
+ *
+ * We read from INNER and append the data to BUFFER.  From BUFFER, we serve
+ * read requests. Old buffer contents gets discarded once it is no longer
+ * needed.
+ *
+ * Marks can only refer to a position that the buffer covers and seeking to
+ * a position is trivially moving the BUFFER_POS.  This implies that the
+ * buffer start cannot be moved while there are any marks.  So, we track the
+ * MARK_COUNT and use pool cleanup functions to decrement it automatically.
+ */
+struct buffering_stream_wrapper_baton
+{
+  /* Our data source. */
+  svn_stream_t *inner;
+
+  /* Contains the data pre-read from INNER. Some of this may already have
+   * been delivered. */
+  svn_stringbuf_t *buffer;
+
+  /* Current read position relative to BUFFER_START. */
+  apr_size_t buffer_pos;
+
+  /* Absolute position within the stream that corresponds to the first
+   * byte in BUFFER. */
+  svn_filesize_t buffer_start;
+
+  /* If TRUE, BUFFER_START must always remain 0.
+   * To simplify the code, we initialize MARK_COUNT to 1 in that case, so
+   * it can never reach 0. */
+  svn_boolean_t support_seek_to_start;
+
+  /* Number of marks currently set on this stream.  We may only more the
+   * BUFFER_START when this is 0. */
+  apr_size_t mark_count;
+};
+
+/* svn_stream_mark_t for buffering read stream wrappers. */
+struct buffering_stream_wrapper_mark
+{
+    /* Absolute position within the stream. */
+    svn_filesize_t pos;
+};
+
+/* Implements svn_stream_t.read_fn for buffering read stream wrappers. */
+static svn_error_t *
+read_handler_buffering_wrapper(void *baton,
+                               char *buffer,
+                               apr_size_t *len)
+{
+  struct buffering_stream_wrapper_baton *btn = baton;
+  apr_size_t left_to_read = btn->buffer->len - btn->buffer_pos;
+
+  /* This is the "normal" and potential incomplete read function.
+   * So, we only need to replenish our buffers if we ran completely dry. */
+  if (left_to_read == 0)
+    {
+      /* For the same reason, reading no more than one chunk is fine. */
+      apr_size_t count = SVN__STREAM_CHUNK_SIZE;
+
+      /* If there is no mark left, we can drop the current buffer contents
+       * making room for new data without reallocating the buffer. */
+      if (btn->mark_count == 0)
+        {
+          btn->buffer_start += btn->buffer->len;
+          btn->buffer_pos = 0;
+          btn->buffer->len = 0;
+        }
+
+      /* Read from the INNER stream, appending to whatever buffer contents
+       * there is currently. */
+      svn_stringbuf_ensure(btn->buffer, btn->buffer->len + count);
+      SVN_ERR(svn_stream_read2(btn->inner,
+                               btn->buffer->data + btn->buffer->len,
+                               &count));
+      btn->buffer->len += count;
+
+      /* We may now have more data that we could return. */
+      left_to_read = btn->buffer->len - btn->buffer_pos;
+    }
+
+  /* Cap the read request to what we can deliver from the buffer. */
+  if (left_to_read < *len)
+    *len = left_to_read;
+
+  /* Copy the data from the buffer and move the read pointer accordingly. */
+  memcpy(buffer, btn->buffer->data + btn->buffer_pos, *len);
+  btn->buffer_pos += *len;
+
+  return SVN_NO_ERROR;
+}
+
+/* Pool cleanup callback for stream marks being cleaned up.
+ * This will not only do bookkeeping but release buffer space if possible. */
+static apr_status_t
+decrement_mark_count(void *baton)
+{
+  struct buffering_stream_wrapper_baton *btn = baton;
+
+  /* Bookkeeping. */
+  --btn->mark_count;
+
+  /* If there are no marks left, we may release / move the buffer.
+   *
+   * We will do so if we can move the buffer start by at least one chunk
+   * (prevent moving data around frequently) and the read pointer is past
+   * the middle of the buffer (again, reducing the amount of data copied).
+   *
+   * After some build-up phase, we will be fluctuating between the minimum
+   * needed buffer size and twice that (rounded up to chunk size).  The
+   * "needed buffer size" is the range covered from the first mark set to
+   * the max. read position when the last mark gets cleaned up.
+   */
+  if (   btn->mark_count == 0
+      && btn->buffer_pos > SVN__STREAM_CHUNK_SIZE
+      && btn->buffer_pos * 2 > btn->buffer->len)
+    {
+      /* There is no mark left, i.e. no way to access anything before the
+       * current read position.  So, make the buffer start at the current
+       * read position. */
+      svn_stringbuf_remove(btn->buffer, 0, btn->buffer_pos);
+
+      btn->buffer_start += btn->buffer_pos;
+      btn->buffer_pos = 0;
+    }
+
+  return APR_SUCCESS;
+}
+
+/* Implements svn_stream_t.mark_fn for buffering read stream wrappers. */
+static svn_error_t *
+mark_handler_buffering_wrapper(void *baton,
+                               svn_stream_mark_t **mark,
+                               apr_pool_t *pool)
+{
+  struct buffering_stream_wrapper_baton *btn = baton;
+  struct buffering_stream_wrapper_mark *stream_mark;
+
+  stream_mark = apr_palloc(pool, sizeof(*stream_mark));
+  stream_mark->pos = (svn_filesize_t)(btn->buffer_start + btn->buffer_pos);
+
+  /* Reference counting: Increment now and schedule automatic the decrement
+   * for when the mark is being cleaned up. */
+  ++btn->mark_count;
+  apr_pool_cleanup_register(pool, baton,
+                            decrement_mark_count, apr_pool_cleanup_null);
+
+  *mark = (svn_stream_mark_t *)stream_mark;
+  return SVN_NO_ERROR;
+}
+
+/* Implements svn_stream_t.seek_fn for buffering read stream wrappers. */
+static svn_error_t *
+seek_handler_buffering_wrapper(void *baton,
+                               const svn_stream_mark_t *mark)
+{
+  struct buffering_stream_wrapper_baton *btn = baton;
+
+  if (mark != NULL)
+    {
+      const struct buffering_stream_wrapper_mark *stream_mark
+        = (const struct buffering_stream_wrapper_mark *)mark;
+
+      /* Position to move to, relative to the start of the buffer.
+       * The casts here are safe due to svn_filesize_t being 64 bits. */
+      svn_filesize_t pos = stream_mark->pos - btn->buffer_start;
+
+      /* The mark must point to somewhere inside our buffered location or
+       * immediately at the upper end of the buffer. */
+      SVN_ERR_ASSERT(pos >= 0 && pos <= (svn_filesize_t)btn->buffer->len);
+      btn->buffer_pos = (apr_size_t)pos;
+    }
+  else if (btn->support_seek_to_start)
+    {
+      /* To support this feature, we locked the buffer at position 0.
+       * Make sure that this is still the case. */
+      SVN_ERR_ASSERT(btn->buffer_start == 0);
+
+      /* Next read starts at the begin of the stream again. */
+      btn->buffer_pos = 0;
+    }
+  else
+    {
+      return svn_error_create(SVN_ERR_STREAM_SEEK_NOT_SUPPORTED, NULL,
+                              _("Can't seek to begin of stream."));
+    }
+
+  return SVN_NO_ERROR;
+}
+
+/* Implements svn_stream_t.data_available_fn for buffering read stream
+ * wrappers. */
+static svn_error_t *
+data_available_handler_buffering_wrapper(void *baton,
+                                         svn_boolean_t *data_available)
+{
+  /* If we still have some unread data, this becomes easy to answer. */
+  struct buffering_stream_wrapper_baton *btn = baton;
+  if (btn->buffer->len > btn->buffer_pos)
+    {
+      *data_available = TRUE;
+      return SVN_NO_ERROR;
+    }
+
+  /* Otherwise, because we would always read from the inner streams' current
+   * position to fill the buffer, asking the inner stream when the buffer is
+   * exhausted gives the correct answer. */
+  return svn_error_trace(svn_stream_data_available(btn->inner,
+                                                   data_available));
+}
+
+/* Implements svn_stream_t.is_buffered_fn for buffering read stream wrappers.
+ */
+static svn_boolean_t
+is_buffered_handler_buffering_wrapper(void *baton)
+{
+  return TRUE;
+}
+
+/* Pool cleanup callback checking that there are no more user-created
+ * marks on that buffering_stream_wrapper_baton BATON.  If there were,
+ * they might cause memory corruption in their cleanup code. */
+static apr_status_t
+assert_zero_mark_count(void *baton)
+{
+  struct buffering_stream_wrapper_baton *btn = baton;
+
+  apr_size_t expected_mark_count = btn->support_seek_to_start ? 1 : 0;
+  SVN_ERR_ASSERT_NO_RETURN(btn->mark_count == expected_mark_count);
+
+  return APR_SUCCESS;
+}
+
+svn_stream_t *
+svn_stream_wrap_buffered_read(svn_stream_t *inner,
+                              svn_boolean_t support_seek_to_start,
+                              apr_pool_t *result_pool)
+{
+  svn_stream_t *stream;
+  struct buffering_stream_wrapper_baton *baton;
+
+  /* Create the wrapper stream state.
+   * The buffer is empty and we are at position 0. */
+  baton = apr_pcalloc(result_pool, sizeof(*baton));
+  baton->inner = inner;
+  baton->buffer = svn_stringbuf_create_empty(result_pool);
+  baton->buffer_pos = 0;
+  baton->buffer_start = 0;
+  baton->support_seek_to_start = support_seek_to_start;
+
+  /* If seek(NULL) is supported, lock the buffer like a mark at position 0
+   * would. */
+  baton->mark_count = support_seek_to_start ? 1 : 0;
+
+  /* Create the wrapper stream object and set up the vtable. */
+  stream = svn_stream_create(baton, result_pool);
+  svn_stream_set_read2(stream, read_handler_buffering_wrapper, NULL);
+  svn_stream_set_mark(stream, mark_handler_buffering_wrapper);
+  svn_stream_set_seek(stream, seek_handler_buffering_wrapper);
+  svn_stream_set_data_available(stream,
+                                data_available_handler_buffering_wrapper);
+  svn_stream__set_is_buffered(stream, is_buffered_handler_buffering_wrapper);
+
+  /* Assert that no marks get cleaned up after the stream. */
+  apr_pool_cleanup_register(result_pool, baton,
+                            assert_zero_mark_count, apr_pool_cleanup_null);
+
+  return stream;
+}
+
 
 svn_error_t *
 svn_stream_for_stdin(svn_stream_t **in, apr_pool_t *pool)

Modified: subversion/trunk/subversion/svnadmin/svnadmin.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/svnadmin/svnadmin.c?rev=1698359&r1=1698358&r2=1698359&view=diff
==============================================================================
--- subversion/trunk/subversion/svnadmin/svnadmin.c (original)
+++ subversion/trunk/subversion/svnadmin/svnadmin.c Fri Aug 28 15:53:10 2015
@@ -1548,6 +1548,7 @@ subcommand_load_revprops(apr_getopt_t *o
 
   /* Read the stream from STDIN.  Users can redirect a file. */
   SVN_ERR(svn_stream_for_stdin(&stdin_stream, pool));
+  stdin_stream = svn_stream_wrap_buffered_read(stdin_stream, FALSE, pool);
 
   /* Progress feedback goes to STDOUT, unless they asked to suppress it. */
   if (! opt_state->quiet)

Modified: subversion/trunk/subversion/svnfsfs/load-index-cmd.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/svnfsfs/load-index-cmd.c?rev=1698359&r1=1698358&r2=1698359&view=diff
==============================================================================
--- subversion/trunk/subversion/svnfsfs/load-index-cmd.c (original)
+++ subversion/trunk/subversion/svnfsfs/load-index-cmd.c Fri Aug 28 15:53:10 2015
@@ -187,6 +187,7 @@ subcommand__load_index(apr_getopt_t *os,
   svn_stream_t *input;
 
   SVN_ERR(svn_stream_for_stdin(&input, pool));
+  input = svn_stream_wrap_buffered_read(input, FALSE, pool);
   SVN_ERR(load_index(opt_state->repository_path, input, pool));
 
   return SVN_NO_ERROR;

Modified: subversion/trunk/subversion/tests/libsvn_subr/stream-test.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/libsvn_subr/stream-test.c?rev=1698359&r1=1698358&r2=1698359&view=diff
==============================================================================
--- subversion/trunk/subversion/tests/libsvn_subr/stream-test.c (original)
+++ subversion/trunk/subversion/tests/libsvn_subr/stream-test.c Fri Aug 28 15:53:10 2015
@@ -32,6 +32,75 @@
 
 #include "../svn_test.h"
 
+struct stream_baton_t
+{
+  svn_filesize_t capacity_left;
+  char current;
+  apr_size_t max_read;
+};
+
+/* Implements svn_stream_t.read_fn. */
+static svn_error_t *
+read_handler(void *baton,
+             char *buffer,
+             apr_size_t *len)
+{
+  struct stream_baton_t *btn = baton;
+  int i;
+
+  /* Cap the read request to what we actually support. */
+  if (btn->max_read < *len)
+    *len = btn->max_read;
+  if (btn->capacity_left < *len)
+    *len = (apr_size_t)btn->capacity_left;
+
+  /* Produce output */
+  for (i = 0; i < *len; ++i)
+    {
+      buffer[i] = btn->current + 1;
+      btn->current = (btn->current + 1) & 0x3f;
+
+      btn->capacity_left--;
+    }
+
+  return SVN_NO_ERROR;
+}
+
+static svn_error_t *
+data_available_handler(void *baton,
+                       svn_boolean_t *data_available)
+{
+  struct stream_baton_t *btn = baton;
+  *data_available = btn->capacity_left > 0;
+
+  return SVN_NO_ERROR;
+}
+
+/* Return a stream that produces CAPACITY characters in chunks of at most
+ * MAX_READ chars.  The first char will be '\1' followed by '\2' etc. up
+ * to '\x40' and then repeating the cycle until the end of the stream.
+ * Allocate the result in RESULT_POOL. */
+static svn_stream_t *
+create_test_read_stream(svn_filesize_t capacity,
+                        apr_size_t max_read,
+                        apr_pool_t *result_pool)
+{
+  svn_stream_t *stream;
+  struct stream_baton_t *baton;
+
+  baton = apr_pcalloc(result_pool, sizeof(*baton));
+  baton->capacity_left = capacity;
+  baton->current = 0;
+  baton->max_read = max_read;
+
+  stream = svn_stream_create(baton, result_pool);
+  svn_stream_set_read2(stream, read_handler, NULL);
+  svn_stream_set_data_available(stream, data_available_handler);
+
+  return stream;
+}
+
+/*------------------------ Tests --------------------------- */
 
 static svn_error_t *
 test_stream_from_string(apr_pool_t *pool)
@@ -803,6 +872,75 @@ test_stream_compressed_read_full(apr_poo
   return SVN_NO_ERROR;
 }
 
+/* Utility function verifying that LINE contains LENGTH characters read
+ * from a stream returned by create_test_read_stream().  C is the first
+ * character expected in LINE. */
+static svn_error_t *
+expect_line_content(svn_stringbuf_t *line,
+                    char start,
+                    apr_size_t length)
+{
+  apr_size_t i;
+  char c = start - 1;
+
+  SVN_TEST_ASSERT(line->len == length);
+  for (i = 0; i < length; ++i)
+    {
+      SVN_TEST_ASSERT(line->data[i] == c + 1);
+      c = (c + 1) & 0x3f;
+    }
+
+  return SVN_NO_ERROR;
+}
+
+static svn_error_t *
+test_stream_buffered_wrapper(apr_pool_t *pool)
+{
+  apr_pool_t *iterpool = svn_pool_create(pool);
+  svn_stringbuf_t *line;
+  svn_boolean_t eof = FALSE;
+  apr_size_t read = 0;
+
+  /* At least a few stream chunks (16k) worth of data. */
+  const apr_size_t stream_length = 100000;
+
+  /* Our source stream delivers data in very small chunks only.
+   * This requires multiple reads per line while readline will hold marks
+   * etc. */
+  svn_stream_t *stream = create_test_read_stream(stream_length, 19, pool);
+  stream = svn_stream_wrap_buffered_read(stream, FALSE, pool);
+
+  /* We told the stream not to supports seeking to the start. */
+  SVN_TEST_ASSERT_ERROR(svn_stream_seek(stream, NULL),
+                        SVN_ERR_STREAM_SEEK_NOT_SUPPORTED);
+
+  /* Read all lines. Check EOF detection. */
+  while (!eof)
+    {
+      /* The local pool ensures that marks get cleaned up. */
+      svn_pool_clear(iterpool);
+      SVN_ERR(svn_stream_readline(stream, &line, "\n", &eof, iterpool));
+
+      /* Verify that we read the correct data and the full stream. */
+      if (read == 0)
+        SVN_ERR(expect_line_content(line, 1, '\n' - 1));
+      else if (eof)
+        SVN_ERR(expect_line_content(line, '\n' + 1, stream_length - read));
+      else
+        SVN_ERR(expect_line_content(line, '\n' + 1, 63));
+
+      /* Update bytes read. */
+      read += line->len + 1;
+    }
+
+  /* Test a stream wrapper that supports seeking to the start. */
+  stream = create_test_read_stream(stream_length, 19, pool);
+  stream = svn_stream_wrap_buffered_read(stream, TRUE, pool);
+  SVN_ERR(svn_stream_seek(stream, NULL));
+
+  return SVN_NO_ERROR;
+}
+
 /* The test table.  */
 
 static int max_threads = 1;
@@ -834,6 +972,8 @@ static struct svn_test_descriptor_t test
                    "test svn_stringbuf_from_stream"),
     SVN_TEST_PASS2(test_stream_compressed_read_full,
                    "test compression for streams without partial read"),
+    SVN_TEST_PASS2(test_stream_buffered_wrapper,
+                   "test buffering read stream wrapper"),
     SVN_TEST_NULL
   };
 



Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Wed, Sep 2, 2015 at 5:11 PM, Evgeny Kotkov <ev...@visualsvn.com>
wrote:

> Branko Čibej <br...@wandisco.com> writes:
>
> > Uh, sorry, you don't get to dictate how the veto gets resolved. Reworking
> > what's on trunk is as valid as reverting. If you don't intend to work on
> the
> > solution, you may as well just let Stefan do that whichever way works
> best
> > for him.
>
> I certainly intend to work on the solution for the problems that I stated
> in my e-mails, but it's a quite difficult task given the amount of changes
> that target these problems, but happen without any prior discussion.


Fair enough. My position on that has two different aspects. First, I think
that the initial commit was well-contained (embracing the stream design
pattern, no server code touched, i.e. no risk of CVEs, explicitly selective
in its application). There was no reason to believe any released
functionality
would be impaired - and it wasn't. So, nothing prompted me to reach out
for feedback beforehand.

The second thing is that I rather prefer exploring a solution in terms of
writing code over speculating about it. Once a patch is available, I find it
much more convenient to review it in the code base than to fiddle through
patch contributions in e-mail. If the necessary change seems complex or
risky, open a branch. This is particularly true in later stages of a release
cycle.

The exploring part becomes particularly attractive when "urgency" is
indicated by raising an issue late Friday night before a long weekend
(Monday was a holiday in the UK) or holidays in general. I then like
to get these things of my plate while at the same time quickly finding
a partner for discussion is difficult. So, that makes just trying to come
up with a fix very attractive.

I am aware that others may prefer e-mail discussions with patches
going back and forth and that's perfectly fine. It's just not my preference.
I'm also pretty sure that you do not purposefully post issues to ruin other
people's time off but it important to know that it does have an effect
on my mode of work. And this is the only reason why I even bring it up.


>   This
> has now happened twice — once with r1700305, and the second time with
> r1700799, and I believe that both of these changes contain (other)
> problems.
>

r1700305 is a prime example of what I explained above.

r1700799 is very different. You called my code a mess and asked
for changes to be combined for easy review. That's what you've got.

What other problems do you suspect in r1700799?

>
> The reason why I asked if Stefan could revert the old changes beforehand is
> because otherwise it becomes easy to get lost for anyone who tries to get
> the big picture of what's happening, what we are trying to fix, and how do
> we do that.
>

Honestly, I found the initial "Please revert." rude. An implicit but
nonetheless absolute demand. It can easily be read as "I wont
listen to anything you say until you did as I just told you!"  A "I
think we should revert this and look for an alternative solution"
would probably have been much closer to your actual intentions.
I say closer because those are my words not yours.

The second time, you gave at least a rationale (ease of review).
After a few moments of figuring out that all what was needed was
simply a double revert, it has been a matter of minutes to actually
do it - without losing any functionality from my POV. So, I took
that as an annoying but fair request.

After being all explainy in this mail, I'd like to close with something
constructive. Maybe, it would help to prefix "I think stuff is broken"
mails with a simple 1-liner like "[important but not urgent]" or even
"[quick revert needed]". Not as a tag in the subject but just to set
the tone. Then it becomes obvious a solution is needed right away
or if it can easily wait for a week or two.

-- Stefan^2.

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Branko Čibej <br...@wandisco.com> writes:

> Uh, sorry, you don't get to dictate how the veto gets resolved. Reworking
> what's on trunk is as valid as reverting. If you don't intend to work on the
> solution, you may as well just let Stefan do that whichever way works best
> for him.

I certainly intend to work on the solution for the problems that I stated
in my e-mails, but it's a quite difficult task given the amount of changes
that target these problems, but happen without any prior discussion.  This
has now happened twice — once with r1700305, and the second time with
r1700799, and I believe that both of these changes contain (other) problems.

The reason why I asked if Stefan could revert the old changes beforehand is
because otherwise it becomes easy to get lost for anyone who tries to get
the big picture of what's happening, what we are trying to fix, and how do
we do that.


Regards,
Evgeny Kotkov

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Branko Čibej <br...@wandisco.com>.
On 2 Sep 2015 2:44 pm, "Evgeny Kotkov" <ev...@visualsvn.com> wrote:

> I stated my veto and provided the justification that covers both r1700305
> and r1698359.  So, could you please revert both of them?  Reworking and
> adding changes on top of it is going to increase the mess and will be
harder
> to review.

Uh, sorry, you don't get to dictate how the veto gets resolved. Reworking
what's on trunk is as valid as reverting. If you don't intend to work on
the solution, you may as well just let Stefan do that whichever way works
best for him.

-- Brane

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Ivan Zhakov wrote on Wed, Sep 02, 2015 at 18:39:38 +0300:
> On 2 September 2015 at 17:01, Johan Corveleyn <jc...@gmail.com> wrote:
> > On Wed, Sep 2, 2015 at 2:43 PM, Evgeny Kotkov
> > <ev...@visualsvn.com> wrote:
> >> Stefan Fuhrmann <st...@wandisco.com> writes:
> >>
> >>>> Given the above, I am -1 on this optimization for svnadmin load-revprops
> >>>> that was implemented in r1698359 and r1700305.  Please revert these
> >>>> changes.
> >>>
> >>> Thinking about alternative solutions I found that simply having a buffering
> >>> wrapper without mark support would still eliminate the OS overhead and parts
> >>> of the internal overhead. That would address all the points you have made
> >>> above while still providing a decent improvement.
> >>>
> >>> IOW, revert r1700305 and rework / reduce / simplify the code changed
> >>> by r1698359.
> >>
> >> I stated my veto and provided the justification that covers both r1700305
> >> and r1698359.  So, could you please revert both of them?  Reworking and
> >> adding changes on top of it is going to increase the mess and will be harder
> >> to review.
> >
> > ISTR that in this community we try to treat a veto as a signal that
> > further discussion is needed, or that more work is needed to resolve
> > the question / issue. You may ask / suggest a revert if you think
> > that's best, but it's mainly up for discussion how to best resolve
> > things (and other avenues, such as further commits, may also resolve
> > the issue).
> >
> You're right, but the problem is that the discussion doesn't happen.
> Instead of this new code is committed (usually broken again) [1].
> 

Could you please refrain from describing anyone's code as "usually
broken"?  That's neither concrete nor specific to the issue at hand
(r1698359 and r1700305).

Thanks,

Daniel

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Ivan Zhakov wrote on Wed, Sep 02, 2015 at 18:39:38 +0300:
> On 2 September 2015 at 17:01, Johan Corveleyn <jc...@gmail.com> wrote:
> > On Wed, Sep 2, 2015 at 2:43 PM, Evgeny Kotkov
> > <ev...@visualsvn.com> wrote:
> >> Stefan Fuhrmann <st...@wandisco.com> writes:
> >>
> >>>> Given the above, I am -1 on this optimization for svnadmin load-revprops
> >>>> that was implemented in r1698359 and r1700305.  Please revert these
> >>>> changes.
> >>>
> >>> Thinking about alternative solutions I found that simply having a buffering
> >>> wrapper without mark support would still eliminate the OS overhead and parts
> >>> of the internal overhead. That would address all the points you have made
> >>> above while still providing a decent improvement.
> >>>
> >>> IOW, revert r1700305 and rework / reduce / simplify the code changed
> >>> by r1698359.
> >>
> >> I stated my veto and provided the justification that covers both r1700305
> >> and r1698359.  So, could you please revert both of them?  Reworking and
> >> adding changes on top of it is going to increase the mess and will be harder
> >> to review.
> >
> > ISTR that in this community we try to treat a veto as a signal that
> > further discussion is needed, or that more work is needed to resolve
> > the question / issue. You may ask / suggest a revert if you think
> > that's best, but it's mainly up for discussion how to best resolve
> > things (and other avenues, such as further commits, may also resolve
> > the issue).
> >
> You're right, but the problem is that the discussion doesn't happen.
> Instead of this new code is committed (usually broken again) [1].
> 
> Let summarize what happened from my point of view:
> 1. An optimization was implemented in r1698359
> 2. Evgeny raised his concerns about possible unbounded memory usage
> 3. Some fix was applied in r1700305 (no discussion happened)
> 4. Evgeny vetoed both of them.
> 5. The original optimization was reworked (no discussion happened)

Vetoes do not require the vetoed code to be reverted.

Vetoes do not require the vetoed codepath to be frozen.

Vetoes require the community to resolve the technical concernes
underlying them before the next release.

Rewriting the code is a form of discussion.  That discussion is
continuing on the r1700799 thread which you linked.  Let's wait for
Stefan2 to reply on that thread to Evgeny's technical concerns with the
rewrite.

> 6. Another problem was found in the new code [1]
> 7. ???

What's the question?  Do what you always do whenever a problem is found
in the code: review the code, reach consensus on a fix, implement it;
iteratively as needed.  That's exactly what the r1700799 thread is.

Cheers,

Daniel

P.S. I haven't reviewed the technical content of the commits in
question.

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On 2 September 2015 at 17:01, Johan Corveleyn <jc...@gmail.com> wrote:
> On Wed, Sep 2, 2015 at 2:43 PM, Evgeny Kotkov
> <ev...@visualsvn.com> wrote:
>> Stefan Fuhrmann <st...@wandisco.com> writes:
>>
>>>> Given the above, I am -1 on this optimization for svnadmin load-revprops
>>>> that was implemented in r1698359 and r1700305.  Please revert these
>>>> changes.
>>>
>>> Thinking about alternative solutions I found that simply having a buffering
>>> wrapper without mark support would still eliminate the OS overhead and parts
>>> of the internal overhead. That would address all the points you have made
>>> above while still providing a decent improvement.
>>>
>>> IOW, revert r1700305 and rework / reduce / simplify the code changed
>>> by r1698359.
>>
>> I stated my veto and provided the justification that covers both r1700305
>> and r1698359.  So, could you please revert both of them?  Reworking and
>> adding changes on top of it is going to increase the mess and will be harder
>> to review.
>
> ISTR that in this community we try to treat a veto as a signal that
> further discussion is needed, or that more work is needed to resolve
> the question / issue. You may ask / suggest a revert if you think
> that's best, but it's mainly up for discussion how to best resolve
> things (and other avenues, such as further commits, may also resolve
> the issue).
>
You're right, but the problem is that the discussion doesn't happen.
Instead of this new code is committed (usually broken again) [1].

Let summarize what happened from my point of view:
1. An optimization was implemented in r1698359
2. Evgeny raised his concerns about possible unbounded memory usage
3. Some fix was applied in r1700305 (no discussion happened)
4. Evgeny vetoed both of them.
5. The original optimization was reworked (no discussion happened)
6. Another problem was found in the new code [1]
7. ???

This reminds me of the story we had with L1 DAG node cache -- see
"Unbounded memory usage and DoS possibility in mod_dav_svn / FSFS DAG
node cache" in private@s.a.o.

[1] http://svn.haxx.se/dev/archive-2015-09/0010.shtml

-- 
Ivan Zhakov

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Sep 02, 2015 at 04:01:08PM +0200, Johan Corveleyn wrote:
> ISTR that in this community we try to treat a veto as a signal that
> further discussion is needed, or that more work is needed to resolve
> the question / issue. You may ask / suggest a revert if you think
> that's best, but it's mainly up for discussion how to best resolve
> things (and other avenues, such as further commits, may also resolve
> the issue).
> 
> Of course, in some circumstances there is more pressure to revert
> quickly (e.g. when the build is broken, or other developer's work is
> blocked for other reasons, or ...). But usually we try to work our way
> towards consensus without "forcing" a revert.
> 
> On the other hand, it might have been better for Stefan to wait for
> your reaction first, and discuss the issue and a possible solution,
> before making further changes ...
> It's not like this is such an urgent matter that it cannot wait for a
> couple of days ...
> 
> Just my 2 cents,
> -- 
> Johan

+1

Thank you for writing this!

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Sep 2, 2015 at 2:43 PM, Evgeny Kotkov
<ev...@visualsvn.com> wrote:
> Stefan Fuhrmann <st...@wandisco.com> writes:
>
>>> Given the above, I am -1 on this optimization for svnadmin load-revprops
>>> that was implemented in r1698359 and r1700305.  Please revert these
>>> changes.
>>
>> Thinking about alternative solutions I found that simply having a buffering
>> wrapper without mark support would still eliminate the OS overhead and parts
>> of the internal overhead. That would address all the points you have made
>> above while still providing a decent improvement.
>>
>> IOW, revert r1700305 and rework / reduce / simplify the code changed
>> by r1698359.
>
> I stated my veto and provided the justification that covers both r1700305
> and r1698359.  So, could you please revert both of them?  Reworking and
> adding changes on top of it is going to increase the mess and will be harder
> to review.

ISTR that in this community we try to treat a veto as a signal that
further discussion is needed, or that more work is needed to resolve
the question / issue. You may ask / suggest a revert if you think
that's best, but it's mainly up for discussion how to best resolve
things (and other avenues, such as further commits, may also resolve
the issue).

Of course, in some circumstances there is more pressure to revert
quickly (e.g. when the build is broken, or other developer's work is
blocked for other reasons, or ...). But usually we try to work our way
towards consensus without "forcing" a revert.

On the other hand, it might have been better for Stefan to wait for
your reaction first, and discuss the issue and a possible solution,
before making further changes ...
It's not like this is such an urgent matter that it cannot wait for a
couple of days ...

Just my 2 cents,
-- 
Johan

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

>> Given the above, I am -1 on this optimization for svnadmin load-revprops
>> that was implemented in r1698359 and r1700305.  Please revert these
>> changes.
>
> Thinking about alternative solutions I found that simply having a buffering
> wrapper without mark support would still eliminate the OS overhead and parts
> of the internal overhead. That would address all the points you have made
> above while still providing a decent improvement.
>
> IOW, revert r1700305 and rework / reduce / simplify the code changed
> by r1698359.

I stated my veto and provided the justification that covers both r1700305
and r1698359.  So, could you please revert both of them?  Reworking and
adding changes on top of it is going to increase the mess and will be harder
to review.

> I wanted to add -F for a while now because it makes debugging much easier
> (when using IDEs). Never got around it, though.
>
> So, feel free to add -F support to load and load-revprops.

Frankly, I don't have the numbers that would prove the noticeable performance
benefit from replacing byte-by-byte parsing with chunk-based parsing.  So, this
is not on my current priority list.


Regards,
Evgeny Kotkov

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, Sep 1, 2015 at 7:26 PM, Evgeny Kotkov <ev...@visualsvn.com>
wrote:

> Stefan Fuhrmann <st...@wandisco.com> writes:
>
> > Yes. This is exactly why we can only use it when we have reasonable
> control
> > over the stream's usage, i.e. we can use it in our CL tools because all
> the
> > code that will be run is under our control. But we cannot make e.g.
> > svn_stream_for_stdin() use it by default.
>
> [...]
>
> > The best solution seems to be to allow for explicit resource management
> as
> > we do with other potentially "expensive" objects. r1700305 implements
> that.
>
> I have several concerns about these changes (r1698359 and r1700305):
>
> 1) The new public API — svn_stream_wrap_buffered_read() — is dangerous to
>    use, and wrong usage causes unbounded memory consumption in seemingly
>    harmless situations. [...]
>

This is also true for svn_stringbuf_t, property lists and changed paths
lists.

2) The new API is backward incompatible.
>
>    The problem is not only about how we handle this in our codebase.
> Existing
>    API users have no idea about the fact that having svn_stream_mark_t
> objects
>    could be "expensive", and about the existence of
> svn_stream_remove_mark().

[...]
>

Some of them already *are* expensive. See translated_stream_mark()
and to some degree stream_mark() in jni_io_stream.cpp.


> 3) The behavior of pool-based reference counting is unpredictable.
>
[...]


It is no more unpredictable than open file handles, right?
You have to close or clean up all of them before you can
e.g. delete the file in Windows.


> Given the above, I am -1 on this optimization for svnadmin load-revprops
> that
> was implemented in r1698359 and r1700305.  Please revert these changes.
>

Thinking about alternative solutions I found that simply
having a buffering wrapper without mark support would
still eliminate the OS overhead and parts of the internal
overhead. That would address all the points you have
made above while still providing a decent improvement.

IOW, revert r1700305 and rework / reduce / simplify the
code changed by r1698359.

As for the problem itself, if the way we currently process the input during
> svnadmin load and load-revprops is causing a noticeable overhead, I think
> that
> we should introduce -F (--file) option to both of these commands:
>
>   svnadmin load /path/to/repos -F (--file) /path/to/dump
>
>   svnadmin load-revprops /path/to/repos -F (--file) /path/to/dump
>

I wanted to add -F for a while now because it makes
debugging much easier (when using IDEs). Never got
around it, though.

So, feel free to add -F support to load and load-revprops.

-- Stefan^2.

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Branko Čibej <br...@wandisco.com>.
On 01.09.2015 20:26, Evgeny Kotkov wrote:
> Stefan Fuhrmann <st...@wandisco.com> writes:
>
>> Yes. This is exactly why we can only use it when we have reasonable control
>> over the stream's usage, i.e. we can use it in our CL tools because all the
>> code that will be run is under our control. But we cannot make e.g.
>> svn_stream_for_stdin() use it by default.
> [...]
>
>> The best solution seems to be to allow for explicit resource management as
>> we do with other potentially "expensive" objects. r1700305 implements that.
> I have several concerns about these changes (r1698359 and r1700305):

FWIW: I agree with Evgeny's analysis and conclusions. There surely must
be a way to get reasonable performance from a generic stream without the
really flaky memory management that these changes bring.

One approach might be a similar buffered-stream wrapper that supports
mark/seek, but where the caller provides a (fixed-size) buffer and/or
buffer management callbacks. Something like that would make the
buffering explicit to the API consumer, although things might still
become tricky if such a stream is used in a generic stream context.
Perhaps such a buffered stream should be a completely different type of
object.

> As for the problem itself, if the way we currently process the input during
> svnadmin load and load-revprops is causing a noticeable overhead, I think that
> we should introduce -F (--file) option to both of these commands:
>
>   svnadmin load /path/to/repos -F (--file) /path/to/dump
>
>   svnadmin load-revprops /path/to/repos -F (--file) /path/to/dump
>
> As long as file streams support both svn_stream_seek() and svn_stream_mark(),
> this should avoid byte-by-byte processing of the input and get rid of the
> associated overhead.

This would not solve the common case where users dump/load without
incurring the possibly huge, or even unmanageable overhead of creating
an intermediate dumpfile.

-- Brane


Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Stefan Fuhrmann <st...@wandisco.com> writes:

> Yes. This is exactly why we can only use it when we have reasonable control
> over the stream's usage, i.e. we can use it in our CL tools because all the
> code that will be run is under our control. But we cannot make e.g.
> svn_stream_for_stdin() use it by default.

[...]

> The best solution seems to be to allow for explicit resource management as
> we do with other potentially "expensive" objects. r1700305 implements that.

I have several concerns about these changes (r1698359 and r1700305):

1) The new public API — svn_stream_wrap_buffered_read() — is dangerous to
   use, and wrong usage causes unbounded memory consumption in seemingly
   harmless situations.

   A svn_stream_mark() mark placed on this stream causes all data that's read
   afterwards to be stored in memory, even if the data was processed and
   immediately discarded.  Lifetime of the buffered data is bound to the
   lifetime of the pool where the mark object lives.  Hence, avoiding memory
   usage issues now requires either precise pool management or explicit control
   of where and when the svn_stream_mark_t objects are added and removed.

   For example, prior to r1700305, svnadmin load-revprops was actually causing
   unbounded memory usage.  Executing this command with a dump containing large
   revisions (e.g., .iso files) was quickly consuming all the available RAM on
   the machine.  This behavior was caused by a svn_stream_mark_t allocated in
   a long-living pool that was preventing deallocation of the buffered data.

   As of r1700305, the problem is mitigated by calling svn_stream_remove_mark()
   within the svn_stream_readline() implementation.  However, the new stream
   is generic, and every potential user of the stream API should be aware of
   these caveats, because she could in fact be working with the new stream.

   Place a mark somewhere in the middle of what svn_repos_parse_dumpstream3()
   does — and you could be already consuming unbounded amounts of memory.
   As another example, the pools are not being cleaned up / destroyed during
   error flow.  Say, we successfully place a stream mark, but receive an error
   afterwards, and, coincidentally, the caller thinks that it's okay to keep
   reading from the stream after this specific error.  Again, the memory usage
   is potentially unbounded.

2) The new API is backward incompatible.

   The problem is not only about how we handle this in our codebase.  Existing
   API users have no idea about the fact that having svn_stream_mark_t objects
   could be "expensive", and about the existence of svn_stream_remove_mark().

   Calling svn_stream_mark() can have a severe impact on the behavior of the
   stream, and can trigger memory usage issues if used improperly.  The new
   stream is a generic svn_stream_t, but instances of this stream cannot cross
   the API boundaries, because if they do, the users would be required to
   adapt their code in order to avoid the possibility of unbounded memory
   consumption.

3) The behavior of pool-based reference counting is unpredictable.

   The new svn_stream_wrap_buffered_read() stream performs reference counting
   on a per-pool basis, and its behavior depends of whether the corresponding
   pools were cleared / destroyed, or not.  This behavior is unpredictable and
   can lead to hard-to-diagnose problems that only happen under non-default
   circumstances or with certain data patterns.

   Furthermore, we have a history of dealing with unbounded memory usage that
   happened due to pool-based refcounting in the first-level FSFS DAG cache
   (CVE-2015-0202 [1]).  As far as I recall, at some point a long-living pool
   was causing positive reference count and preventing deallocation of the
   cached objects.  As I see it, the new stream does the same sort of reference
   counting for pools holding svn_stream_mark_t objects, and, similarly, if
   one of those objects is allocated in a long-living pool, it prevents the
   deallocation of the data buffered by the stream.

Given the above, I am -1 on this optimization for svnadmin load-revprops that
was implemented in r1698359 and r1700305.  Please revert these changes.

As for the problem itself, if the way we currently process the input during
svnadmin load and load-revprops is causing a noticeable overhead, I think that
we should introduce -F (--file) option to both of these commands:

  svnadmin load /path/to/repos -F (--file) /path/to/dump

  svnadmin load-revprops /path/to/repos -F (--file) /path/to/dump

As long as file streams support both svn_stream_seek() and svn_stream_mark(),
this should avoid byte-by-byte processing of the input and get rid of the
associated overhead.

[1] https://subversion.apache.org/security/CVE-2015-0202-advisory.txt


Regards,
Evgeny Kotkov

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sat, Aug 29, 2015 at 3:57 PM, Evgeny Kotkov <ev...@visualsvn.com>
wrote:

> Hi Stefan,
>
> Stefan Fuhrmann <st...@apache.org> writes:
>
> > Introduce a stream wrapper object that adds mark/seek support to any
> > readable stream.  Use it on the stdin streams in our CL tools.
>
> [...]
>
> What happens if someone calls svn_stream_mark(stream, &mark, pool) on the
> stream and then — perhaps, much later — reads a huge amount of data from
> the
> stream?  Say, the total amount of data to read is 8 GiB, and it's
> processed in
> small chunks.  Is it true that in this case all 8 GiB of data are going to
> be
> buffered by the stream and kept in memory?
>

Yes. This is exactly why we can only use it when we have
reasonable control over the stream's usage, i.e. we can
use it in our CL tools because all the code that will be run
is under our control. But we cannot make e.g.
svn_stream_for_stdin() use it by default.


> Here is an quick example:
>
>  SVN_ERR(svn_stream_mark(buffered_stream, &mark, pool));
>  ...
>
>  while (!eof)
>    {
>      svn_stringbuf_t *str;
>
>      svn_pool_clear(iterpool);
>
>      SVN_ERR(svn_stream_readline(buffered_stream, &str, "\n", &eof,
> iterpool));
>      ...
>    }
>

And this example is even nicer than say our dump stream
parser code where some pools are relatively long-lived.
Not a problem for revprop-only dumps (one cleanup per
revision is enough there) but for generic dumps.

The best solution seems to be to allow for explicit resource
management as we do with other potentially "expensive"
objects. r1700305 implements that.

-- Stefan^2.

Re: svn commit: r1698359 - in /subversion/trunk/subversion: include/svn_io.h libsvn_subr/stream.c svnadmin/svnadmin.c svnfsfs/load-index-cmd.c tests/libsvn_subr/stream-test.c

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Hi Stefan,

Stefan Fuhrmann <st...@apache.org> writes:

> Introduce a stream wrapper object that adds mark/seek support to any
> readable stream.  Use it on the stdin streams in our CL tools.

[...]

What happens if someone calls svn_stream_mark(stream, &mark, pool) on the
stream and then — perhaps, much later — reads a huge amount of data from the
stream?  Say, the total amount of data to read is 8 GiB, and it's processed in
small chunks.  Is it true that in this case all 8 GiB of data are going to be
buffered by the stream and kept in memory?

Here is an quick example:

 SVN_ERR(svn_stream_mark(buffered_stream, &mark, pool));
 ...

 while (!eof)
   {
     svn_stringbuf_t *str;

     svn_pool_clear(iterpool);

     SVN_ERR(svn_stream_readline(buffered_stream, &str, "\n", &eof, iterpool));
     ...
   }


Regards,
Evgeny Kotkov