You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by st...@apache.org on 2014/09/28 19:56:01 UTC

svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Author: stefan2
Date: Sun Sep 28 17:56:01 2014
New Revision: 1628093

URL: http://svn.apache.org/r1628093
Log:
Support FSFS format 7 commits in load balanced mixed-architecture clusters.

At least theoretically, machines with different endianess or off_t sizes
might access the same repository on e.g. an iSCSI SAN.  Then the machine
performing a commit may have a different architecture from the one building
up the transaction.  To allow that, even the intermediate (proto-) index
format within those transactions must be platform-independent.

Instead of writing the records as blobs, we store each member individually
as an unsigned 64 bit integer in little endian order.  Limiting all values
to non-negative (with a special handling of SVN_INVALID_REVNUM) makes
conversions from and to the in-memory types well-defined.  We only need
to check for overflows - e.g. in case one machine's apr_off_t is too short.

All functions reading from or writing to the proto index files must use
the new converting reader and writer functions instead of reading plain
blobs themselves.

Finally, update the FSFS structure documentation.

* subversion/libsvn_fs_fs/index.c
  (off_t_max): Define this new APR_OFF_T_MAX surrogate.
  (P2L_PROTO_INDEX_ENTRY_SIZE): Define this record size constant to allow
                                for random file access. Only needed for the
                                P2L proto-index.

  (write_uint64_to_proto_index,
   read_uint64_from_proto_index): New code to read and write individual
                                  numbers to/from the proto-index file in
                                  a platform neutral fixed length format.
  (read_uint32_from_proto_index,
   read_off_t_from_proto_index): Convenience wrappers around the previous.

  (write_entry_to_proto_index): Rename to ...
  (write_l2p_entry_to_proto_index): ... this for clarity.  Use the new
                                    function to write proto index file.
  (read_l2p_entry_from_proto_index): New function to read the proto index.
  (svn_fs_fs__l2p_proto_index_add_revision,
   svn_fs_fs__l2p_proto_index_add_entry): Update callers after rename.
  (svn_fs_fs__l2p_index_append,
   l2p_proto_index_lookup): Call the new reader function instead of reading
                            a plain blob directly from the proto index.

  (svn_fs_fs__p2l_proto_index_add_entry): Write all struct elements
                                          individually, check before
                                          converting them to uint64.
  (read_p2l_entry_from_proto_index): New inverse function to the above.
  (svn_fs_fs__p2l_proto_index_next_offset): Call the new reader function
                                        and update the offset calculation.
  (svn_fs_fs__p2l_index_append): Call the new reader function.

* subversion/libsvn_fs_fs/structure-indexes
  (Design): Add a section about the general proto-index formatting.
  (Log-to-phys index,
   Phys-to-log index): Update the proto-index file format descriptions.

Modified:
    subversion/trunk/subversion/libsvn_fs_fs/index.c
    subversion/trunk/subversion/libsvn_fs_fs/structure-indexes

Modified: subversion/trunk/subversion/libsvn_fs_fs/index.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/index.c?rev=1628093&r1=1628092&r2=1628093&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_fs_fs/index.c (original)
+++ subversion/trunk/subversion/libsvn_fs_fs/index.c Sun Sep 28 17:56:01 2014
@@ -43,6 +43,22 @@
 /* maximum length of a uint64 in an 7/8b encoding */
 #define ENCODED_INT_LENGTH 10
 
+/* APR is missing an APR_OFF_T_MAX.  So, define one.  We will use it to
+ * limit file offsets stored in the indexes.
+ *
+ * We assume that everything shorter than 64 bits, it is at least 32 bits.
+ * We also assume that the type is always signed meaning we only have an
+ * effective positive range of 63 or 31 bits, respectively.
+ */
+const apr_uint64_t off_t_max = (sizeof(apr_off_t) == sizeof(apr_int64_t))
+                             ? APR_INT64_MAX
+                             : APR_INT32_MAX;
+
+/* We store P2L proto-index entries as 6 values, 64 bits each on disk.
+ * See also svn_fs_fs__p2l_proto_index_add_entry().
+ */
+#define P2L_PROTO_INDEX_ENTRY_SIZE (6 * sizeof(apr_uint64_t))
+
 /* Page tables in the log-to-phys index file exclusively contain entries
  * of this type to describe position and size of a given page.
  */
@@ -447,22 +463,165 @@ decode_int(apr_uint64_t value)
   return (apr_int64_t)(value % 2 ? -1 - value / 2 : value / 2);
 }
 
+/* Write VALUE to the PROTO_INDEX file, using SCRATCH_POOL for temporary
+ * allocations.
+ *
+ * The point of this function is to ensure an architecture-independent
+ * proto-index file format.  All data is written as unsigned 64 bits ints
+ * in little endian byte order.  64 bits is the largest portable integer
+ * we have and unsigned values have well-defined conversions in C.
+ */
+static svn_error_t *
+write_uint64_to_proto_index(apr_file_t *proto_index,
+                            apr_uint64_t value,
+                            apr_pool_t *scratch_pool)
+{
+  apr_byte_t buffer[sizeof(value)];
+  int i;
+  apr_size_t written;
+
+  /* Split VALUE into 8 bytes using LE ordering. */
+  for (i = 0; i < sizeof(buffer); ++i)
+    {
+      /* Unsigned conversions are well-defined ... */
+      buffer[i] = (apr_byte_t)value;
+      value >>= CHAR_BIT;
+    }
+
+  /* Write it all to disk. */
+  SVN_ERR(svn_io_file_write_full(proto_index, buffer, sizeof(buffer),
+                                 &written, scratch_pool));
+  SVN_ERR_ASSERT(written == sizeof(buffer));
+
+  return SVN_NO_ERROR;
+}
+
+/* Read one unsigned 64 bit value from PROTO_INDEX file and return it in
+ * *VALUE_P.  If EOF is NULL, error out when trying to read beyond EOF.
+ * Use SCRATCH_POOL for temporary allocations.
+ *
+ * This function is the inverse to write_uint64_to_proto_index (see there),
+ * reading the external LE byte order and convert it into host byte order.
+ */
+static svn_error_t *
+read_uint64_from_proto_index(apr_file_t *proto_index,
+                             apr_uint64_t *value_p,
+                             svn_boolean_t *eof,
+                             apr_pool_t *scratch_pool)
+{
+  apr_byte_t buffer[sizeof(*value_p)];
+  apr_size_t read;
+
+  /* Read the full 8 bytes or our 64 bit value, unless we hit EOF.
+   * Assert that we never read partial values. */
+  SVN_ERR(svn_io_file_read_full2(proto_index, buffer, sizeof(buffer),
+                                 &read, eof, scratch_pool));
+  SVN_ERR_ASSERT((eof && *eof) || read == sizeof(buffer));
+
+  /* If we did not hit EOF, reconstruct the uint64 value and return it. */
+  if (!eof || !*eof)
+    {
+      int i;
+      apr_uint64_t value;
+
+      /* This could only overflow if CHAR_BIT had a value that is not
+       * a divisor of 64. */
+      value = 0;
+      for (i = sizeof(buffer) - 1; i >= 0; --i)
+        value = (value << CHAR_BIT) + buffer[i];
+
+      *value_p = value;
+    }
+
+  return SVN_NO_ERROR;
+}
+
+/* Convenience function similar to read_uint64_from_proto_index, but returns
+ * an uint32 value in VALUE_P.  Return an error if the value does not fit.
+ */
+static svn_error_t *
+read_uint32_from_proto_index(apr_file_t *proto_index,
+                             apr_uint32_t *value_p,
+                             svn_boolean_t *eof,
+                             apr_pool_t *scratch_pool)
+{
+  apr_uint64_t value;
+  SVN_ERR(read_uint64_from_proto_index(proto_index, &value, eof,
+                                       scratch_pool));
+  if (value > APR_UINT32_MAX)
+    return svn_error_createf(SVN_ERR_FS_ITEM_INDEX_OVERFLOW, NULL,
+                             _("UINT32 0x%" APR_UINT64_T_HEX_FMT
+                               " too large, max = 0x%" APR_UINT64_T_HEX_FMT),
+                             value, (apr_uint64_t)APR_UINT32_MAX);
+
+  /* This conversion is not lossy because the value can be represented in
+   * the target type. */
+  *value_p = (apr_uint32_t)value;
+
+  return SVN_NO_ERROR;
+}
+
+/* Convenience function similar to read_uint64_from_proto_index, but returns
+ * an off_t value in VALUE_P.  Return an error if the value does not fit.
+ */
+static svn_error_t *
+read_off_t_from_proto_index(apr_file_t *proto_index,
+                            apr_off_t *value_p,
+                            svn_boolean_t *eof,
+                            apr_pool_t *scratch_pool)
+{
+  apr_uint64_t value;
+  SVN_ERR(read_uint64_from_proto_index(proto_index, &value, eof,
+                                       scratch_pool));
+  if (value > off_t_max)
+    return svn_error_createf(SVN_ERR_FS_ITEM_INDEX_OVERFLOW, NULL,
+                             _("File offset 0x%" APR_UINT64_T_HEX_FMT
+                               " too large, max = 0x%" APR_UINT64_T_HEX_FMT),
+                             value, off_t_max);
+
+  /* Shortening conversion from unsigned to signed int is well-defined and
+   * not lossy in C because the value can be represented in the target type.
+   */
+  *value_p = (apr_off_t)value;
+
+  return SVN_NO_ERROR;
+}
+
 /*
  * log-to-phys index
  */
 
-/* Write ENTRY to log-to-phys PROTO_INDEX file and verify the results.
+/* Append ENTRY to log-to-phys PROTO_INDEX file.
  * Use SCRATCH_POOL for temporary allocations.
  */
 static svn_error_t *
-write_entry_to_proto_index(apr_file_t *proto_index,
-                           l2p_proto_entry_t entry,
-                           apr_pool_t *scratch_pool)
+write_l2p_entry_to_proto_index(apr_file_t *proto_index,
+                               l2p_proto_entry_t entry,
+                               apr_pool_t *scratch_pool)
 {
-  apr_size_t written = sizeof(entry);
+  SVN_ERR(write_uint64_to_proto_index(proto_index, entry.offset,
+                                      scratch_pool));
+  SVN_ERR(write_uint64_to_proto_index(proto_index, entry.item_index,
+                                      scratch_pool));
 
-  SVN_ERR(svn_io_file_write(proto_index, &entry, &written, scratch_pool));
-  SVN_ERR_ASSERT(written == sizeof(entry));
+  return SVN_NO_ERROR;
+}
+
+/* Read *ENTRY from log-to-phys PROTO_INDEX file and indicate end-of-file
+ * in *EOF, or error out in that case if EOF is NULL.  *ENTRY is in an
+ * undefined state if an end-of-file occurred.
+ * Use SCRATCH_POOL for temporary allocations.
+ */
+static svn_error_t *
+read_l2p_entry_from_proto_index(apr_file_t *proto_index,
+                                l2p_proto_entry_t *entry,
+                                svn_boolean_t *eof,
+                                apr_pool_t *scratch_pool)
+{
+  SVN_ERR(read_uint64_from_proto_index(proto_index, &entry->offset, eof,
+                                       scratch_pool));
+  SVN_ERR(read_uint64_from_proto_index(proto_index, &entry->item_index, eof,
+                                       scratch_pool));
 
   return SVN_NO_ERROR;
 }
@@ -516,8 +675,8 @@ svn_fs_fs__l2p_proto_index_add_revision(
   entry.offset = 0;
   entry.item_index = 0;
 
-  return svn_error_trace(write_entry_to_proto_index(proto_index, entry,
-                                                    scratch_pool));
+  return svn_error_trace(write_l2p_entry_to_proto_index(proto_index, entry,
+                                                        scratch_pool));
 }
 
 svn_error_t *
@@ -539,8 +698,8 @@ svn_fs_fs__l2p_proto_index_add_entry(apr
   SVN_ERR_ASSERT(item_index < UINT_MAX / 2);
   entry.item_index = item_index;
 
-  return svn_error_trace(write_entry_to_proto_index(proto_index, entry,
-                                                    scratch_pool));
+  return svn_error_trace(write_l2p_entry_to_proto_index(proto_index, entry,
+                                                        scratch_pool));
 }
 
 svn_error_t *
@@ -597,13 +756,10 @@ svn_fs_fs__l2p_index_append(svn_fs_t *fs
   for (entry = 0; !eof; ++entry)
     {
       l2p_proto_entry_t proto_entry;
-      apr_size_t read = 0;
 
       /* (attempt to) read the next entry from the source */
-      SVN_ERR(svn_io_file_read_full2(proto_index,
-                                     &proto_entry, sizeof(proto_entry),
-                                     &read, &eof, local_pool));
-      SVN_ERR_ASSERT(eof || read == sizeof(proto_entry));
+      SVN_ERR(read_l2p_entry_from_proto_index(proto_index, &proto_entry,
+                                              &eof, local_pool));
 
       /* handle new revision */
       if ((entry > 0 && proto_entry.offset == 0) || eof)
@@ -1420,12 +1576,10 @@ l2p_proto_index_lookup(apr_off_t *offset
   while (!eof)
     {
       l2p_proto_entry_t entry;
-      apr_size_t read = 0;
 
       /* (attempt to) read the next entry from the source */
-      SVN_ERR(svn_io_file_read_full2(file, &entry, sizeof(entry),
-                                     &read, &eof, scratch_pool));
-      SVN_ERR_ASSERT(eof || read == sizeof(entry));
+      SVN_ERR(read_l2p_entry_from_proto_index(file, &entry, &eof,
+                                              scratch_pool));
 
       /* handle new revision */
       if (!eof && entry.item_index == item_index)
@@ -1603,11 +1757,91 @@ svn_fs_fs__p2l_proto_index_add_entry(apr
                                      svn_fs_fs__p2l_entry_t *entry,
                                      apr_pool_t *scratch_pool)
 {
-  apr_size_t written = sizeof(*entry);
+  apr_uint64_t revision;
 
-  SVN_ERR(svn_io_file_write_full(proto_index, entry, sizeof(*entry),
-                                 &written, scratch_pool));
-  SVN_ERR_ASSERT(written == sizeof(*entry));
+  /* Make sure all signed elements of ENTRY have non-negative values.
+   *
+   * For file offsets and sizes, this is a given as we use them to describe
+   * absolute positions and sizes.  For revisions, SVN_INVALID_REVNUM is
+   * valid, hence we have to shift it by 1.
+   */
+  SVN_ERR_ASSERT(entry->offset >= 0);
+  SVN_ERR_ASSERT(entry->size >= 0);
+  SVN_ERR_ASSERT(   entry->item.revision >= 0
+                 || entry->item.revision == SVN_INVALID_REVNUM);
+
+  revision = entry->item.revision == SVN_INVALID_REVNUM
+           ? 0
+           : ((apr_uint64_t)entry->item.revision + 1);
+
+  /* Now, all values will nicely convert to uint64. */
+  /* Make sure to keep P2L_PROTO_INDEX_ENTRY_SIZE consistent with this: */
+
+  SVN_ERR(write_uint64_to_proto_index(proto_index, entry->offset,
+                                      scratch_pool));
+  SVN_ERR(write_uint64_to_proto_index(proto_index, entry->size,
+                                      scratch_pool));
+  SVN_ERR(write_uint64_to_proto_index(proto_index, entry->type,
+                                      scratch_pool));
+  SVN_ERR(write_uint64_to_proto_index(proto_index, entry->fnv1_checksum,
+                                      scratch_pool));
+  SVN_ERR(write_uint64_to_proto_index(proto_index, revision,
+                                      scratch_pool));
+  SVN_ERR(write_uint64_to_proto_index(proto_index, entry->item.number,
+                                      scratch_pool));
+
+  return SVN_NO_ERROR;
+}
+
+/* Read *ENTRY from log-to-phys PROTO_INDEX file and indicate end-of-file
+ * in *EOF, or error out in that case if EOF is NULL.  *ENTRY is in an
+ * undefined state if an end-of-file occurred.
+ * Use SCRATCH_POOL for temporary allocations.
+ */
+static svn_error_t *
+read_p2l_entry_from_proto_index(apr_file_t *proto_index,
+                                svn_fs_fs__p2l_entry_t *entry,
+                                svn_boolean_t *eof,
+                                apr_pool_t *scratch_pool)
+{
+  apr_uint64_t revision;
+
+  SVN_ERR(read_off_t_from_proto_index(proto_index, &entry->offset,
+                                      eof, scratch_pool));
+  SVN_ERR(read_off_t_from_proto_index(proto_index, &entry->size,
+                                      eof, scratch_pool));
+  SVN_ERR(read_uint32_from_proto_index(proto_index, &entry->type,
+                                       eof, scratch_pool));
+  SVN_ERR(read_uint32_from_proto_index(proto_index, &entry->fnv1_checksum,
+                                       eof, scratch_pool));
+  SVN_ERR(read_uint64_from_proto_index(proto_index, &revision,
+                                       eof, scratch_pool));
+  SVN_ERR(read_uint64_from_proto_index(proto_index, &entry->item.number,
+                                       eof, scratch_pool));
+
+  /* Do the inverse REVSION number conversion (see
+   * svn_fs_fs__p2l_proto_index_add_entry), if we actually read a complete
+   * record.
+   */
+  if (!eof || !*eof)
+    {
+      /* Be careful with the arithmetics here (overflows and wrap-around): */
+      if (revision > 0 && revision - 1 > LONG_MAX)
+        return svn_error_createf(SVN_ERR_FS_ITEM_INDEX_OVERFLOW, NULL,
+                                _("Revision 0x%" APR_UINT64_T_HEX_FMT
+                                " too large, max = 0x%" APR_UINT64_T_HEX_FMT),
+                                revision, (apr_uint64_t)LONG_MAX);
+
+      /* Shortening conversion from unsigned to signed int is well-defined
+       * and not lossy in C because the value can be represented in the
+       * target type.  Also, cast to 'long' instead of 'svn_revnum_t' here
+       * to provoke a compiler warning if those types should differ and we
+       * would need to change the overflow checking logic.
+       */
+      entry->item.revision = revision == 0
+                           ? SVN_INVALID_REVNUM
+                           : (long)(revision - 1);
+    }
 
   return SVN_NO_ERROR;
 }
@@ -1629,11 +1863,11 @@ svn_fs_fs__p2l_proto_index_next_offset(a
     {
       /* At least one entry.  Read last entry. */
       svn_fs_fs__p2l_entry_t entry;
-      offset -= sizeof(entry);
+      offset -= P2L_PROTO_INDEX_ENTRY_SIZE;
 
       SVN_ERR(svn_io_file_seek(proto_index, APR_SET, &offset, scratch_pool));
-      SVN_ERR(svn_io_file_read_full2(proto_index, &entry, sizeof(entry),
-                                    NULL, NULL, scratch_pool));
+      SVN_ERR(read_p2l_entry_from_proto_index(proto_index, &entry,
+                                              NULL, scratch_pool));
 
       /* Return next offset. */
       *next_offset = entry.offset + entry.size;
@@ -1686,7 +1920,6 @@ svn_fs_fs__p2l_index_append(svn_fs_t *fs
   while (!eof)
     {
       svn_fs_fs__p2l_entry_t entry;
-      apr_size_t read = 0;
       apr_uint64_t entry_end;
       svn_boolean_t new_page = svn_spillbuf__get_size(buffer) == 0;
       apr_uint64_t compound;
@@ -1695,9 +1928,8 @@ svn_fs_fs__p2l_index_append(svn_fs_t *fs
       svn_pool_clear(iterpool);
 
       /* (attempt to) read the next entry from the source */
-      SVN_ERR(svn_io_file_read_full2(proto_index, &entry, sizeof(entry),
-                                     &read, &eof, iterpool));
-      SVN_ERR_ASSERT(eof || read == sizeof(entry));
+      SVN_ERR(read_p2l_entry_from_proto_index(proto_index, &entry,
+                                              &eof, iterpool));
 
       /* "unused" (and usually non-existent) section to cover the offsets
          at the end the of the last page. */

Modified: subversion/trunk/subversion/libsvn_fs_fs/structure-indexes
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/structure-indexes?rev=1628093&r1=1628092&r2=1628093&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_fs_fs/structure-indexes (original)
+++ subversion/trunk/subversion/libsvn_fs_fs/structure-indexes Sun Sep 28 17:56:01 2014
@@ -80,6 +80,14 @@ Most data is unsigned by nature but will
 signed integers.
 
 
+Encoding in proto-index files
+-----------------------------
+
+These have a much simpler encoding.  Throughout the files, all records have
+the same length (but different between L2P and P2L).  All records contain
+unsigned 64 bit integers only, stored in little endian byte order.
+
+
 Log-to-phys index
 =================
 
@@ -190,7 +198,7 @@ at the beginning of the file is optional
   ...
   <eof>         /* end of file. */
 
-All entries are pairs of 64 bit unsigned integers in machine endianess.
+All entries are pairs of 64 bit unsigned integers in little endian order.
 
 
 Phys-to-log index
@@ -293,7 +301,18 @@ Proto index file format
 -----------------------
 
 The index will be created from a proto index file containing simple
-instances of the in-memory representation of svn_fs_fs__p2l_entry_t.
+instances of svn_fs_fs__p2l_entry_t with the following element order:
+
+  item offset               as uint64
+  item size                 as uint64
+  item type                 as uint64
+  modified FNV1a checksum   as uint64
+  revision                  as uint64, with SVN_INVALID_REVNUM mapped to 0
+                                       and revisions >= 0 stored as rev+1
+  item index                as uint64
+
+All values are stored in little endian order.
+
 Page table and header information, except start revision and page size,
 can easily be derived from that information.
 



Re: svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Posted by Branko Čibej <br...@wandisco.com>.
On 03.11.2014 16:16, Stefan Fuhrmann wrote:
> On Mon, Nov 3, 2014 at 3:41 PM, Branko Čibej <brane@wandisco.com
> <ma...@wandisco.com>> wrote:
>
>     On 28.09.2014 19:56, stefan2@apache.org
>     <ma...@apache.org> wrote:
>     > Author: stefan2
>     > Date: Sun Sep 28 17:56:01 2014
>     > New Revision: 1628093
>     >
>     > URL: http://svn.apache.org/r1628093
>     > Log:
>     > Support FSFS format 7 commits in load balanced
>     mixed-architecture clusters.
>
>     > +/* Write VALUE to the PROTO_INDEX file, using SCRATCH_POOL for
>     temporary
>     > + * allocations.
>     > + *
>     > + * The point of this function is to ensure an
>     architecture-independent
>     > + * proto-index file format.  All data is written as unsigned 64
>     bits ints
>     > + * in little endian byte order.  64 bits is the largest
>     portable integer
>     > + * we have and unsigned values have well-defined conversions in C.
>     > + */
>     > +static svn_error_t *
>     > +write_uint64_to_proto_index(apr_file_t *proto_index,
>     > +                            apr_uint64_t value,
>     > +                            apr_pool_t *scratch_pool)
>     > +{
>     > +  apr_byte_t buffer[sizeof(value)];
>     > +  int i;
>     > +  apr_size_t written;
>     > +
>     > +  /* Split VALUE into 8 bytes using LE ordering. */
>     > +  for (i = 0; i < sizeof(buffer); ++i)
>     > +    {
>     > +      /* Unsigned conversions are well-defined ... */
>     > +      buffer[i] = (apr_byte_t)value;
>     > +      value >>= CHAR_BIT;
>
>     Does APR guarantee that sizeof(apr_byte_t) == CHAR_BIT?
>
>
> No. It guarantees that sizeof(apr_byte_t) == 1
> (you probably meant to say that).

Meh, yes I did ...

> APR.H on all
> platform defines it as 'unsigned char' - as opposed
> to apr_uint16_t and friends, which could be anything.

Ack, looks OK then.

-- Brane


Re: svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Mon, Nov 3, 2014 at 3:41 PM, Branko Čibej <br...@wandisco.com> wrote:

> On 28.09.2014 19:56, stefan2@apache.org wrote:
> > Author: stefan2
> > Date: Sun Sep 28 17:56:01 2014
> > New Revision: 1628093
> >
> > URL: http://svn.apache.org/r1628093
> > Log:
> > Support FSFS format 7 commits in load balanced mixed-architecture
> clusters.
>
> > +/* Write VALUE to the PROTO_INDEX file, using SCRATCH_POOL for temporary
> > + * allocations.
> > + *
> > + * The point of this function is to ensure an architecture-independent
> > + * proto-index file format.  All data is written as unsigned 64 bits
> ints
> > + * in little endian byte order.  64 bits is the largest portable integer
> > + * we have and unsigned values have well-defined conversions in C.
> > + */
> > +static svn_error_t *
> > +write_uint64_to_proto_index(apr_file_t *proto_index,
> > +                            apr_uint64_t value,
> > +                            apr_pool_t *scratch_pool)
> > +{
> > +  apr_byte_t buffer[sizeof(value)];
> > +  int i;
> > +  apr_size_t written;
> > +
> > +  /* Split VALUE into 8 bytes using LE ordering. */
> > +  for (i = 0; i < sizeof(buffer); ++i)
> > +    {
> > +      /* Unsigned conversions are well-defined ... */
> > +      buffer[i] = (apr_byte_t)value;
> > +      value >>= CHAR_BIT;
>
> Does APR guarantee that sizeof(apr_byte_t) == CHAR_BIT?
>

No. It guarantees that sizeof(apr_byte_t) == 1
(you probably meant to say that). APR.H on all
platform defines it as 'unsigned char' - as opposed
to apr_uint16_t and friends, which could be anything.

-- Stefan^2.

Re: svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Posted by Branko Čibej <br...@wandisco.com>.
On 28.09.2014 19:56, stefan2@apache.org wrote:
> Author: stefan2
> Date: Sun Sep 28 17:56:01 2014
> New Revision: 1628093
>
> URL: http://svn.apache.org/r1628093
> Log:
> Support FSFS format 7 commits in load balanced mixed-architecture clusters.

> +/* Write VALUE to the PROTO_INDEX file, using SCRATCH_POOL for temporary
> + * allocations.
> + *
> + * The point of this function is to ensure an architecture-independent
> + * proto-index file format.  All data is written as unsigned 64 bits ints
> + * in little endian byte order.  64 bits is the largest portable integer
> + * we have and unsigned values have well-defined conversions in C.
> + */
> +static svn_error_t *
> +write_uint64_to_proto_index(apr_file_t *proto_index,
> +                            apr_uint64_t value,
> +                            apr_pool_t *scratch_pool)
> +{
> +  apr_byte_t buffer[sizeof(value)];
> +  int i;
> +  apr_size_t written;
> +
> +  /* Split VALUE into 8 bytes using LE ordering. */
> +  for (i = 0; i < sizeof(buffer); ++i)
> +    {
> +      /* Unsigned conversions are well-defined ... */
> +      buffer[i] = (apr_byte_t)value;
> +      value >>= CHAR_BIT;

Does APR guarantee that sizeof(apr_byte_t) == CHAR_BIT?

-- Brane

Re: svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Thu, Oct 16, 2014 at 4:25 PM, Stefan Sperling <st...@elego.de> wrote:

> On Tue, Sep 30, 2014 at 11:44:04AM +0200, Stefan Fuhrmann wrote:
> > On Tue, Sep 30, 2014 at 11:15 AM, Stefan Sperling <st...@apache.org>
> wrote:
> > > How do you open a transaction and postpone the commit?
> > > Using some custom code written against the FS API?
> > >
> >
> > It would require some custom code like "create greek tree,
> > create txn, modify a few nodes" on one side and "open the
> > only available txn, commit txn" on the other side.
>
> Do you have some example or starting point for that somewhere?
>

I attached a test that should do the right thing.
Execute the first two steps on separate architectures
and the 3rd one on an arch of your choice.


> > > Or can some tool such as svnmucc already do this?
> > >
> >
> > svnadmin can only list and remove txns. svnmucc
> >
>
> svnmucc what? :)
>

Hm ... don't quite remember. Probably something
along the line of "starts, builds up and commits
a txn in a single call". Basically, hard to use for
the purpose at hand.


> > > I presume you rely on apr_off_t, not off_t, right?
> >
> >
> > Yes, I always use apr_off_t. On my system, APR typedefs
> > it as off_t.
>
> off_t is always 64bit on OpenBSD, so I could only test little/big endian
> variance. Unless perhaps if I patched APR to use a 32bit type for off_t.
>

It would certainly be interesting to a) find systems
in the wild that still uses 32 bit off_t and b) for us to
check whether we still work on them.


> Would it be possible to test this in our regression test suite somehow?
>

That's probably hard unless we find a setup with
genuine 32 bit off_t because APR uses it directly
with lseek() and friends. So, we can't just redefine it.
Even the ILP32 bb-openbsd buildbot has 64 off_t.

-- Stefan^2.

Re: svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Sep 30, 2014 at 11:44:04AM +0200, Stefan Fuhrmann wrote:
> On Tue, Sep 30, 2014 at 11:15 AM, Stefan Sperling <st...@apache.org> wrote:
> > How do you open a transaction and postpone the commit?
> > Using some custom code written against the FS API?
> >
> 
> It would require some custom code like "create greek tree,
> create txn, modify a few nodes" on one side and "open the
> only available txn, commit txn" on the other side.

Do you have some example or starting point for that somewhere?

> > Or can some tool such as svnmucc already do this?
> >
> 
> svnadmin can only list and remove txns. svnmucc
> 

svnmucc what? :)

> > I presume you rely on apr_off_t, not off_t, right?
> 
> 
> Yes, I always use apr_off_t. On my system, APR typedefs
> it as off_t.

off_t is always 64bit on OpenBSD, so I could only test little/big endian
variance. Unless perhaps if I patched APR to use a 32bit type for off_t.

Would it be possible to test this in our regression test suite somehow?

Re: svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Tue, Sep 30, 2014 at 11:15 AM, Stefan Sperling <st...@apache.org> wrote:

> On Mon, Sep 29, 2014 at 07:18:05PM +0200, Stefan Fuhrmann wrote:
> > On Sun, Sep 28, 2014 at 11:17 PM, Stefan Sperling <st...@apache.org>
> wrote:
> >
> > > On Sun, Sep 28, 2014 at 05:56:01PM -0000, stefan2@apache.org wrote:
> > > > Author: stefan2
> > > > Date: Sun Sep 28 17:56:01 2014
> > > > New Revision: 1628093
> > > >
> > > > URL: http://svn.apache.org/r1628093
> > > > Log:
> > > > Support FSFS format 7 commits in load balanced mixed-architecture
> > > clusters.
> > > >
> > > > At least theoretically, machines with different endianess or off_t
> sizes
> > > > might access the same repository on e.g. an iSCSI SAN.  Then the
> machine
> > > > performing a commit may have a different architecture from the one
> > > building
> > > > up the transaction.  To allow that, even the intermediate (proto-)
> index
> > > > format within those transactions must be platform-independent.
> > >
> > > Hi Stefan,
> > >
> > > If you describe in some more detail how to set it up I can test this
> > > scenario for you on OpenBSD with sparc64 (64bit big-endian), amd64
> > > (64bit little endian), macppc (32bit big-endian) and i386 (32bit little
> > > endian) machines.
> > >
> >
> > Thanks for the offer. So, this is the issue that I addressed
> > with the above patch:
> >
> > At the FS API level, transactions can be opened (and implicitly
> > closed again) many times before being committed. If these
> > operations are performed from different machines accessing
> > the same repository, they might not agree upon C struct size,
> > layout or interpretation. So, for pure API consistency alone,
> > all data within a txn needs to be portable / architecture independent.
> >
> > I may be wrong but I vaguely remember that the HTTP client
> > sends the "commit" operation over a separate connection that
> > the one it used to build up the txn. If there is a load balancer
> > in front of the actual servers, that commit may get served by
> > a different machine.
> >
> > Testing the general scenario is simpler, though. Have one machine
> > build up a transaction which touches a few of nodes (such that
> > different record sizes would result in missing / misaligned data).
> > Then e.g. copy the repo to a different machine and let that one
> > do the commit. The result must then pass 'svnadmin verify'.
> >
> > The critical combinations are little / big endian and 32 bit vs.
> > 64 bit file offsets (not the same thing as 32 / 64 bit in general).
> > So, 64 bit big endian vs. 32 bit little endian with 32 bit off_t
> > would probably cover it. Tests with both machines in both
> > roles (txn creation and txn commit).
> >
> > -- Stefan^2.
>
> How do you open a transaction and postpone the commit?
> Using some custom code written against the FS API?
>

It would require some custom code like "create greek tree,
create txn, modify a few nodes" on one side and "open the
only available txn, commit txn" on the other side.


> Or can some tool such as svnmucc already do this?
>

svnadmin can only list and remove txns. svnmucc


> I presume you rely on apr_off_t, not off_t, right?


Yes, I always use apr_off_t. On my system, APR typedefs
it as off_t.


> I.e. it's
> enough to recompile APR and SVN with or without large file
> support to switch between 32bit and 64bit apr_off_t?
>

It will be enough if it changes sizeof(apr_off_t).

-- Stefan^2.

Re: svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Posted by Branko Čibej <br...@wandisco.com>.
On 30.09.2014 11:15, Stefan Sperling wrote:
> On Mon, Sep 29, 2014 at 07:18:05PM +0200, Stefan Fuhrmann wrote:
>> On Sun, Sep 28, 2014 at 11:17 PM, Stefan Sperling <st...@apache.org> wrote:
>>
>>> On Sun, Sep 28, 2014 at 05:56:01PM -0000, stefan2@apache.org wrote:
>>>> Author: stefan2
>>>> Date: Sun Sep 28 17:56:01 2014
>>>> New Revision: 1628093
>>>>
>>>> URL: http://svn.apache.org/r1628093
>>>> Log:
>>>> Support FSFS format 7 commits in load balanced mixed-architecture
>>> clusters.
>>>> At least theoretically, machines with different endianess or off_t sizes
>>>> might access the same repository on e.g. an iSCSI SAN.  Then the machine
>>>> performing a commit may have a different architecture from the one
>>> building
>>>> up the transaction.  To allow that, even the intermediate (proto-) index
>>>> format within those transactions must be platform-independent.
>>> Hi Stefan,
>>>
>>> If you describe in some more detail how to set it up I can test this
>>> scenario for you on OpenBSD with sparc64 (64bit big-endian), amd64
>>> (64bit little endian), macppc (32bit big-endian) and i386 (32bit little
>>> endian) machines.
>>>
>> Thanks for the offer. So, this is the issue that I addressed
>> with the above patch:
>>
>> At the FS API level, transactions can be opened (and implicitly
>> closed again) many times before being committed. If these
>> operations are performed from different machines accessing
>> the same repository, they might not agree upon C struct size,
>> layout or interpretation. So, for pure API consistency alone,
>> all data within a txn needs to be portable / architecture independent.
>>
>> I may be wrong but I vaguely remember that the HTTP client
>> sends the "commit" operation over a separate connection that
>> the one it used to build up the txn. If there is a load balancer
>> in front of the actual servers, that commit may get served by
>> a different machine.
>>
>> Testing the general scenario is simpler, though. Have one machine
>> build up a transaction which touches a few of nodes (such that
>> different record sizes would result in missing / misaligned data).
>> Then e.g. copy the repo to a different machine and let that one
>> do the commit. The result must then pass 'svnadmin verify'.
>>
>> The critical combinations are little / big endian and 32 bit vs.
>> 64 bit file offsets (not the same thing as 32 / 64 bit in general).
>> So, 64 bit big endian vs. 32 bit little endian with 32 bit off_t
>> would probably cover it. Tests with both machines in both
>> roles (txn creation and txn commit).
>>
>> -- Stefan^2.
> How do you open a transaction and postpone the commit?
> Using some custom code written against the FS API?
> Or can some tool such as svnmucc already do this?

I think you have to write a tool that works agains the repos API. The
Python bindings should be good enough for that.

> I presume you rely on apr_off_t, not off_t, right? I.e. it's
> enough to recompile APR and SVN with or without large file
> support to switch between 32bit and 64bit apr_off_t?

AFAIR changing largefile support does not change the APR ABI since 1.0.

-- Brane

Re: svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Posted by Stefan Sperling <st...@apache.org>.
On Mon, Sep 29, 2014 at 07:18:05PM +0200, Stefan Fuhrmann wrote:
> On Sun, Sep 28, 2014 at 11:17 PM, Stefan Sperling <st...@apache.org> wrote:
> 
> > On Sun, Sep 28, 2014 at 05:56:01PM -0000, stefan2@apache.org wrote:
> > > Author: stefan2
> > > Date: Sun Sep 28 17:56:01 2014
> > > New Revision: 1628093
> > >
> > > URL: http://svn.apache.org/r1628093
> > > Log:
> > > Support FSFS format 7 commits in load balanced mixed-architecture
> > clusters.
> > >
> > > At least theoretically, machines with different endianess or off_t sizes
> > > might access the same repository on e.g. an iSCSI SAN.  Then the machine
> > > performing a commit may have a different architecture from the one
> > building
> > > up the transaction.  To allow that, even the intermediate (proto-) index
> > > format within those transactions must be platform-independent.
> >
> > Hi Stefan,
> >
> > If you describe in some more detail how to set it up I can test this
> > scenario for you on OpenBSD with sparc64 (64bit big-endian), amd64
> > (64bit little endian), macppc (32bit big-endian) and i386 (32bit little
> > endian) machines.
> >
> 
> Thanks for the offer. So, this is the issue that I addressed
> with the above patch:
> 
> At the FS API level, transactions can be opened (and implicitly
> closed again) many times before being committed. If these
> operations are performed from different machines accessing
> the same repository, they might not agree upon C struct size,
> layout or interpretation. So, for pure API consistency alone,
> all data within a txn needs to be portable / architecture independent.
> 
> I may be wrong but I vaguely remember that the HTTP client
> sends the "commit" operation over a separate connection that
> the one it used to build up the txn. If there is a load balancer
> in front of the actual servers, that commit may get served by
> a different machine.
> 
> Testing the general scenario is simpler, though. Have one machine
> build up a transaction which touches a few of nodes (such that
> different record sizes would result in missing / misaligned data).
> Then e.g. copy the repo to a different machine and let that one
> do the commit. The result must then pass 'svnadmin verify'.
> 
> The critical combinations are little / big endian and 32 bit vs.
> 64 bit file offsets (not the same thing as 32 / 64 bit in general).
> So, 64 bit big endian vs. 32 bit little endian with 32 bit off_t
> would probably cover it. Tests with both machines in both
> roles (txn creation and txn commit).
> 
> -- Stefan^2.

How do you open a transaction and postpone the commit?
Using some custom code written against the FS API?
Or can some tool such as svnmucc already do this?

I presume you rely on apr_off_t, not off_t, right? I.e. it's
enough to recompile APR and SVN with or without large file
support to switch between 32bit and 64bit apr_off_t?

Re: svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Posted by Stefan Fuhrmann <st...@wandisco.com>.
On Sun, Sep 28, 2014 at 11:17 PM, Stefan Sperling <st...@apache.org> wrote:

> On Sun, Sep 28, 2014 at 05:56:01PM -0000, stefan2@apache.org wrote:
> > Author: stefan2
> > Date: Sun Sep 28 17:56:01 2014
> > New Revision: 1628093
> >
> > URL: http://svn.apache.org/r1628093
> > Log:
> > Support FSFS format 7 commits in load balanced mixed-architecture
> clusters.
> >
> > At least theoretically, machines with different endianess or off_t sizes
> > might access the same repository on e.g. an iSCSI SAN.  Then the machine
> > performing a commit may have a different architecture from the one
> building
> > up the transaction.  To allow that, even the intermediate (proto-) index
> > format within those transactions must be platform-independent.
>
> Hi Stefan,
>
> If you describe in some more detail how to set it up I can test this
> scenario for you on OpenBSD with sparc64 (64bit big-endian), amd64
> (64bit little endian), macppc (32bit big-endian) and i386 (32bit little
> endian) machines.
>

Thanks for the offer. So, this is the issue that I addressed
with the above patch:

At the FS API level, transactions can be opened (and implicitly
closed again) many times before being committed. If these
operations are performed from different machines accessing
the same repository, they might not agree upon C struct size,
layout or interpretation. So, for pure API consistency alone,
all data within a txn needs to be portable / architecture independent.

I may be wrong but I vaguely remember that the HTTP client
sends the "commit" operation over a separate connection that
the one it used to build up the txn. If there is a load balancer
in front of the actual servers, that commit may get served by
a different machine.

Testing the general scenario is simpler, though. Have one machine
build up a transaction which touches a few of nodes (such that
different record sizes would result in missing / misaligned data).
Then e.g. copy the repo to a different machine and let that one
do the commit. The result must then pass 'svnadmin verify'.

The critical combinations are little / big endian and 32 bit vs.
64 bit file offsets (not the same thing as 32 / 64 bit in general).
So, 64 bit big endian vs. 32 bit little endian with 32 bit off_t
would probably cover it. Tests with both machines in both
roles (txn creation and txn commit).

-- Stefan^2.

Re: svn commit: r1628093 - in /subversion/trunk/subversion/libsvn_fs_fs: index.c structure-indexes

Posted by Stefan Sperling <st...@apache.org>.
On Sun, Sep 28, 2014 at 05:56:01PM -0000, stefan2@apache.org wrote:
> Author: stefan2
> Date: Sun Sep 28 17:56:01 2014
> New Revision: 1628093
> 
> URL: http://svn.apache.org/r1628093
> Log:
> Support FSFS format 7 commits in load balanced mixed-architecture clusters.
> 
> At least theoretically, machines with different endianess or off_t sizes
> might access the same repository on e.g. an iSCSI SAN.  Then the machine
> performing a commit may have a different architecture from the one building
> up the transaction.  To allow that, even the intermediate (proto-) index
> format within those transactions must be platform-independent.

Hi Stefan,

If you describe in some more detail how to set it up I can test this
scenario for you on OpenBSD with sparc64 (64bit big-endian), amd64
(64bit little endian), macppc (32bit big-endian) and i386 (32bit little
endian) machines.