Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2019/04/03 22:13:39 UTC

[GitHub] [couchdb-documentation] rnewson commented on a change in pull request #403: RFC for document storage

rnewson commented on a change in pull request #403: RFC for document storage
URL: https://github.com/apache/couchdb-documentation/pull/403#discussion_r271953490

##########
File path: rfcs/004-document-storage.md
##########
@@ -0,0 +1,246 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'JSON document storage in FoundationDB'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+This document describes a data model for storing JSON documents as key-value
+pairs in FoundationDB. It includes a discussion of storing multiple versions of
+the document, each identified by unique revision identifiers, and discusses some
+of the operations needed to query and modify these documents.
+
+## Abstract
+
+The data model maps each "leaf" JSON value (number, string, true, false, and
+null) to a single KV in FoundationDB. Nested relationships are modeled using a
+tuple structure in the keys. Different versions of a document are stored
+completely independently from one another. Values are encoded using
+FoundationDB's tuple encoding.
+
+The use of a single KV pair for each leaf value implies a new 100KB limit on
+those values stored in CouchDB documents. An alternative design could split
+these large (string) values across multiple KV pairs.
+
+Extremely deeply-nested data structures and the use of long names in the nesting
+objects could cause a path to a leaf value to exceed FoundationDB's 10KB limit
+on key sizes. String interning could reduce the likelihood of this occurring but
+not eliminate it entirely. Interning could also provide some significant space
+savings in the current FoundationDB storage engine, although the introduction of
+key prefix elision in the Redwood engine should also help on that front.
+
+FoundationDB imposes a hard 10MB limit on transactions. To reserve space for
+additional metadata and user-defined indexes, and to steer users towards best
+practices in data modeling, this RFC proposes a **1 MiB** limit on document
+sizes going forward.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+---
+
+# Detailed Description
+
+## Value Encoding
+
+The `true` (`\x27`), `false` (`\x26`) and `null` (`\x00`) values each have a
+single-byte encoding in FoundationDB's tuple layer. Integers are represented
+with arbitrary precision (technically, up to 255 bytes can be used).
+Floating-point numbers use an IEEE binary representation up to double precision.
+More details on these specific byte codes are available in the [FoundationDB
+documentation](https://github.com/apple/foundationdb/blob/6.0.18/design/tuple.md).
+
+Unicode strings must be encoded into UTF-8. They are prefixed with a `\x02`
+bytecode and are null-terminated. Any nulls within the string must be replaced
+by `\x00\xff`. Raw byte strings have their own `\x01` prefix and must follow the
+same rules regarding null bytes in the string. Both are limited to 100KB.
+
+An object is decomposed into multiple key-value pairs, where each key is a tuple
+identifying the path to a final leaf value. For example, the object
+
+```
+{
+ "foo": {
+ "bar": {
+ "baz": 123
+ }
+ }
+}
+```
+
+would be represented by a key-value pair of
+
+```
+pack({"foo", "bar", "baz"}) = pack(123)

Review comment:
At least in @davisp's work on erlfdb so far, packing only happens for tuples. So should this be `pack({123})`, or should erlfdb allow packing of primitives? I think Paul is mirroring the APIs elsewhere, so our choice may be forced here, since we'd like our data to be readable with those APIs.
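
To make the tuple-versus-primitive question concrete, here is a minimal pure-Python sketch of the subset of the tuple encoding quoted in the RFC text above (null, booleans, byte strings, Unicode strings, and integers of up to 8 bytes). The names `pack`, `_encode_element` and `_escape_nulls` are hypothetical helpers written for illustration; this is not erlfdb or any official binding, and the choice to reject non-tuple input is an assumption that models the "packing only happens for tuples" behaviour described above.

```python
def _escape_nulls(raw: bytes) -> bytes:
    # Embedded null bytes are replaced by \x00\xff so the \x00 terminator stays unambiguous.
    return raw.replace(b"\x00", b"\x00\xff")


def _encode_element(value) -> bytes:
    """Encode one tuple element (illustrative subset of the tuple layer)."""
    if value is None:
        return b"\x00"
    if isinstance(value, bool):  # must be checked before int: bool is a subclass of int
        return b"\x27" if value else b"\x26"
    if isinstance(value, bytes):
        return b"\x01" + _escape_nulls(value) + b"\x00"
    if isinstance(value, str):
        return b"\x02" + _escape_nulls(value.encode("utf-8")) + b"\x00"
    if isinstance(value, int):
        if value == 0:
            return b"\x14"
        size = (abs(value).bit_length() + 7) // 8
        if size > 8:
            raise ValueError("this sketch only handles integers of up to 8 bytes")
        if value > 0:
            # Type code 0x14 + byte length, then the big-endian magnitude.
            return bytes([0x14 + size]) + value.to_bytes(size, "big")
        # Negative: type code 0x14 - byte length, then the one's complement.
        return bytes([0x14 - size]) + (value + (1 << (8 * size)) - 1).to_bytes(size, "big")
    raise TypeError(f"unsupported element type: {type(value).__name__}")


def pack(items: tuple) -> bytes:
    """Pack a tuple of elements; bare primitives are rejected (assumption of this sketch)."""
    if not isinstance(items, tuple):
        raise TypeError("pack() expects a tuple; wrap a single value as (value,)")
    return b"".join(_encode_element(item) for item in items)


# The RFC's example: the key is the path to the leaf, the value is a one-element tuple.
key = pack(("foo", "bar", "baz"))
value = pack((123,))
```

Under this sketch the value side of the example is `pack((123,))`, the Python counterpart of `pack({123})`, while `pack(123)` raises `TypeError`. For the element types covered here, a one-element tuple encodes to exactly the bytes of the element itself, so allowing primitives would not change what is stored, only which call shapes the API accepts.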
