Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2019/04/09 09:06:27 UTC

[GitHub] [couchdb-documentation] garrensmith commented on a change in pull request #403: RFC for document storage

garrensmith commented on a change in pull request #403: RFC for document storage
URL: https://github.com/apache/couchdb-documentation/pull/403#discussion_r273393877
 
 

 ##########
 File path: rfcs/004-document-storage.md
 ##########
 @@ -0,0 +1,251 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'JSON document storage in FoundationDB'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+This document describes a data model for storing JSON documents as key-value
+pairs in FoundationDB. It covers storing multiple versions of a document, each
+identified by a unique revision identifier, and some of the operations needed
+to query and modify these documents.
+
+## Abstract
+
+The data model maps each "leaf" JSON value (number, string, true, false, and
+null) to a single KV in FoundationDB. Nested relationships are modeled using a
+tuple structure in the keys. Different versions of a document are stored
+completely independently from one another. Values are encoded using
+FoundationDB's tuple encoding.
+
+The use of a single KV pair for each leaf value implies a new 100KB limit on
+those values stored in CouchDB documents. An alternative design could split
+these large (string) values across multiple KV pairs.
+
+Extremely deeply-nested data structures and the use of long names in the nesting
+objects could cause a path to a leaf value to exceed FoundationDB's 10KB limit
+on key sizes. String interning could reduce the likelihood of this occurring but
+not eliminate it entirely. Interning could also provide some significant space
+savings in the current FoundationDB storage engine, although the introduction of
+key prefix elision in the Redwood engine should also help on that front.
+
+FoundationDB imposes a hard 10MB limit on transactions. To reserve space for
+additional metadata and user-defined indexes, and to steer users towards best
+practices in data modeling, this RFC proposes a **1MB (1,000,000 byte)** limit
+on document sizes going forward.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+---
+
+# Detailed Description
+
+## Value Encoding
+
+The `true` (`\x27`), `false` (`\x26`) and `null` (`\x00`) values each have a
+single-byte encoding in FoundationDB's tuple layer. Integers are represented
+with arbitrary precision (technically, up to 255 bytes can be used).
+Floating-point numbers use an IEEE binary representation up to double precision.
+More details on these specific byte codes are available in the [FoundationDB
+documentation](https://github.com/apple/foundationdb/blob/6.0.18/design/tuple.md).
+
+Unicode strings must be encoded into UTF-8. They are prefixed with a `\x02`
+bytecode and are null-terminated. Any nulls within the string must be replaced
+by `\x00\xff`. Raw byte strings have their own `\x01` prefix and must follow the
+same rules regarding null bytes in the string. Both are limited to 100KB.
+
+An object is decomposed into multiple key-value pairs, where each key is a tuple
+identifying the path to a final leaf value. For example, the object
+
+```
+{
+    "foo": {
+        "bar": {
+            "baz": 123
+        }
+    }
+}
+```
+
+would be represented by a key-value pair of
+
+```
+pack({"foo", "bar", "baz"}) = pack({123})
+```
+
+Clients SHOULD NOT submit objects containing duplicate keys, as CouchDB will
+only preserve the last occurrence of the key and will silently drop the other
+occurrences. Similarly, clients MUST NOT rely on the ordering of keys within an
+object, as this ordering will generally not be preserved by the database.
+
+An array of N elements is represented by N distinct key-value pairs, where the
+last element of the tuple key is an integer representing the zero-indexed
+position of the value within the array. As an example:
+
+```
+{
+    "states": ["MA", "OH", "TX", "NM", "PA"]
+}
+```
+
+becomes
+
+```
+pack({"states", 0}) = "MA"
+pack({"states", 1}) = "OH"
 
 Review comment:
   Is there a specific reason why we don't pack the values for the array?
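
To make the question concrete, here is a minimal Python sketch of the flattening described in the diff. It is an illustration only, not code from the RFC or from CouchDB: the `flatten` helper and the `Path` alias are hypothetical names, and the sketch stops at producing (path, leaf value) pairs. In a real implementation each path would presumably go through the tuple layer's pack function, and whether the leaf value is packed as well is exactly what is being asked here.

```python
from typing import Any, Iterator, Tuple

# Hypothetical alias: a path is a tuple of object member names and array indexes.
Path = Tuple[Any, ...]


def flatten(value: Any, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, leaf) pairs for an already-decoded JSON value.

    Object members extend the path with the member name. Array elements
    extend it with their zero-based index. Empty objects and arrays yield
    nothing in this sketch; how they are stored is not covered here.
    """
    if isinstance(value, dict):
        for name, child in value.items():
            yield from flatten(child, path + (name,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, path + (index,))
    else:
        # Leaf value: number, string, true, false, or null.
        yield path, value


doc = {"foo": {"bar": {"baz": 123}}, "states": ["MA", "OH"]}
pairs = list(flatten(doc))
# pairs == [(("foo", "bar", "baz"), 123),
#           (("states", 0), "MA"),
#           (("states", 1), "OH")]
```

Both examples in the diff come out of the same traversal, so the only difference between them is whether the right-hand side is written as `pack({123})` or as a bare `"MA"`.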
