Posted to commits@tinkerpop.apache.org by GitBox <gi...@apache.org> on 2018/12/24 09:19:54 UTC

[GitHub] jorgebay commented on a change in pull request #1000: TINKERPOP-1942 New Binary Serialization Format

URL: https://github.com/apache/tinkerpop/pull/1000#discussion_r243803794
 
 

 ##########
 File path: docs/src/dev/io/graphbinary.asciidoc
 ##########
 @@ -0,0 +1,771 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+////
+
+[[graphbinary]]
+= GraphBinary
+
+GraphBinary is a binary serialization format suitable for object trees, designed to reduce serialization
+overhead on both the client and the server, as well as limiting the size of the payload that is transmitted over the
+wire.
+
+It describes arbitrary object graphs with a fully-qualified format:
+
+[source]
+----
+{type_code}{type_info}{value_flag}{value}
+----
+
+Where:
+
+* `{type_code}` is a single byte representing the type number.
+* `{type_info}` is an optional sequence of bytes providing additional information about the type represented. This is
+especially useful for representing complex and custom types.
+* `{value_flag}` is a single byte providing information about the value. Flags have the following meaning:
+** `0x01` The value is `null`. When this flag is set, no bytes for `{value}` will be provided.
+* `{value}` is a sequence of bytes whose content is determined by the type.
+
+All encodings are big-endian.
+
+Quick examples, using hexadecimal notation to represent each byte:
+
+- `01 00 00 00 00 01`: a 32-bit integer representing the decimal number 1. It’s composed of the
+type_code `0x01`, an empty flag value `0x00` and four bytes to describe the value.
+- `01 00 00 00 01 00`: a 32-bit integer representing the number 256.
+- `01 01`: a null value for a 32-bit integer. It’s composed of the type_code `0x01` and a null flag value `0x01`.
+- `02 00 00 00 00 00 00 00 00 01`: a 64-bit integer number 1. It’s composed of the type_code `0x02`, empty flags and
+eight bytes to describe the value.
+
+== Version 1.0
+
+=== Forward Compatibility
+
+The serialization format supports new types being added without the need to introduce a new version.
+
+Changes to existing types require a new revision.
+
+=== Data Type Codes
+
+==== Core Data Types
+
+- `0x01`: Int
+- `0x02`: Long
+- `0x03`: String
+- `0x04`: Date
+- `0x05`: Timestamp
+- `0x06`: Class
+- `0x07`: Double
+- `0x08`: Float
+- `0x09`: List
+- `0x0a`: Map
+- `0x0b`: Set
+- `0x0c`: UUID
+- `0x0d`: Edge
+- `0x0e`: Path
+- `0x0f`: Property
+- `0x10`: TinkerGraph
+- `0x11`: Vertex
+- `0x12`: VertexProperty
+- `0x13`: Barrier
+- `0x14`: Binding
+- `0x15`: Bytecode
+- `0x16`: Cardinality
+- `0x17`: Column
+- `0x18`: Direction
+- `0x19`: Operator
+- `0x1a`: Order
+- `0x1b`: Pick
+- `0x1c`: Pop
+- `0x1d`: Lambda
+- `0x1e`: P
+- `0x1f`: Scope
+- `0x20`: T
+- `0x21`: Traverser
+- `0x22`: BigDecimal
+- `0x23`: BigInteger
+- `0x24`: Byte
+- `0x25`: ByteBuffer
+- `0x26`: Short
+- `0x27`: Boolean
+- `0x28`: TextP
+- `0x29`: TraversalStrategy
+- `0x2a`: BulkSet
+- `0x2b`: Tree
+- `0x2c`: Metrics
+- `0xfe`: Unspecified null object
+- `0x00`: Custom
+
+==== Extended Types
+
+- `0x80`: Char
+- `0x81`: Duration
+- `0x82`: InetAddress
+- `0x83`: Instant
+- `0x84`: LocalDate
+- `0x85`: LocalDateTime
+- `0x86`: LocalTime
+- `0x87`: MonthDay
+- `0x88`: OffsetDateTime
+- `0x89`: OffsetTime
+- `0x8a`: Period
+- `0x8b`: Year
+- `0x8c`: YearMonth
+- `0x8d`: ZonedDateTime
+- `0x8e`: ZoneOffset
+
+=== Null handling
+
+The serialization format defines two ways to represent null values:
+
+- Unspecified null object
+- Fully-qualified null
+
+When a parent type can contain any subtype, e.g. an object collection, a `null` value must be represented using the
+"Unspecified Null Object" type code and the null value flag.
+
+In contrast, when the parent type contains a type parameter that must be specified, a `null` value is represented using
+a fully-qualified object using the appropriate type code and type information.
+
+=== Data Type Formats
+
+==== Int
+
+Format: 4-byte two's complement integer.
+
+Example values:
+
+- `00 00 00 01`: 32-bit integer number 1.
+- `00 00 01 00`: 32-bit integer number 256.
+- `ff ff ff ff`: 32-bit integer number -1.
+- `ff ff ff fe`: 32-bit integer number -2.
+
+==== Long
+
+Format: 8-byte two's complement integer.
+
+Example values
+
+- `00 00 00 00 00 00 00 01`: 64-bit integer number 1.
+- `ff ff ff ff ff ff ff fe`: 64-bit integer number -2.
+
+==== String
+
+Format: `{length}{text_value}`
+
+Where:
+
+- `{length}` is an `Int` describing the byte length of the text. Length is a positive number or zero to represent
+the empty string.
+- `{text_value}` is a sequence of bytes representing the string value in UTF8 encoding.
+
+Example values
+
+- `00 00 00 03 61 62 63`: the string 'abc'.
+- `00 00 00 04 61 62 63 64`: the string 'abcd'.
+- `00 00 00 00`: the empty string ''.
+
+==== Date
+
+Format: An 8-byte two's complement signed integer representing a millisecond-precision offset from the unix epoch.
+
+Example values
+
+- `00 00 00 00 00 00 00 00`: The moment in time 1970-01-01T00:00:00.000Z.
+- `ff ff ff ff ff ff ff ff`: The moment in time 1969-12-31T23:59:59.999Z.
+
+==== Timestamp
+
+Format: The same as `Date`.
+
+==== Class
+
+Format: A `String` containing the fully qualified class name (fqcn).
+
+==== Double
+
+Format: 8 bytes representing IEEE 754 double-precision binary floating-point format.
+
+Example values
+
+- `3f f0 00 00 00 00 00 00`: Double 1
+- `3f 70 00 00 00 00 00 00`: Double 0.00390625
+- `3f b9 99 99 99 99 99 9a`: Double 0.1
+
+==== Float
+
+Format: 4 bytes representing IEEE 754 single-precision binary floating-point format.
+
+Example values
+
+- `3f 80 00 00`: Float 1
+- `3e c0 00 00`: Float 0.375
+
+==== List
+
+An ordered collection of items.
+
+Format: `{length}{item_0}...{item_n}`
+
+Where:
+
+- `{length}` is an `Int` describing the length of the collection.
+- `{item_0}...{item_n}` are the items of the list. `{item_i}` is a fully qualified typed value composed of
+`{type_code}{type_info}{value_flag}{value}`.
+
+==== Set
+
+A collection that contains no duplicate elements.
+
+Format: Same as `List`.
+
+==== Map
+
+A dictionary of keys to values.
+
+Format: `{length}{item_0}...{item_n}`
+
+Where:
+
+- `{length}` is an `Int` describing the length of the map.
+- `{item_0}...{item_n}` are the items of the map. `{item_i}` is a sequence of 2 fully qualified typed values, one
+representing the key and the following representing the value, each composed of
+`{type_code}{type_info}{value_flag}{value}`.
+
+==== UUID
+
+A 128-bit universally unique identifier.
+
+Format: 16 bytes representing the uuid.
+
+Example
+
+- `00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff`: Uuid 00112233-4455-6677-8899-aabbccddeeff.
+
+==== Edge
+
+Format: `{id}{label}{inVId}{inVLabel}{outVId}{outVLabel}{parent}{properties}`
+
+Where:
+
+- `{id}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{label}` is a `String` value.
+- `{inVId}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{inVLabel}` is a `String` value.
+- `{outVId}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{outVLabel}` is a `String` value.
+- `{parent}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}` which contains
+the parent `Vertex`. Note that as TinkerPop currently sends "references" only, this value will always be `null`.
+- `{properties}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}` which contains
+the properties for the edge. Note that as TinkerPop currently sends "references" only, this value will always be `null`.
+
+==== Path
+
+Format: `{labels}{objects}`
+
+Where:
+
+- `{labels}` is a `List` in which each item is a `Set` of `String`.
+- `{objects}` is a `List` of fully qualified typed values.
+
+==== Property
+
+Format: `{key}{value}{parent}`
+
+Where:
+
+- `{key}` is a `String` value.
+- `{value}`  is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{parent}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}` which is either
+an `Edge` or `VertexProperty`. Note that as TinkerPop currently sends "references" only, this value will always be
+`null`.
+
+==== Graph
+
+A collection of vertices and edges. Note that, while similar, the vertex/edge formats here differ in some respects
+from the `Vertex` and `Edge` formats used for standard serialization/deserialization of a single graph element.
+
+Format: `{vlength}{vertex_0}...{vertex_n}{elength}{edge_0}...{edge_n}`
+
+Where:
+
+- `{vlength}` is an `Int` describing the number of vertices.
+- `{vertex_0}...{vertex_n}` are vertices as described below.
+- `{elength}` is an `Int` describing the number of edges.
+- `{edge_0}...{edge_n}` are edges as described below.
+
+Vertex Format: `{id}{label}{plength}{property_0}...{property_n}`
+
+- `{id}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{label}` is a `String` value.
+- `{plength}` is an `Int` describing the number of properties on the vertex.
+- `{property_0}...{property_n}` are the vertex properties consisting of `{id}{label}{value}{parent}{properties}` as
+defined in `VertexProperty` where the `{parent}` is always `null` and `{properties}` is a `List` of `Property` objects.
+
+Edge Format: `{id}{label}{inVId}{inVLabel}{outVId}{outVLabel}{parent}{properties}`
+
+Where:
+
+- `{id}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{label}` is a `String` value.
+- `{inVId}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{inVLabel}` is always `null`.
+- `{outVId}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{outVLabel}` is always `null`.
+- `{parent}` is always `null`.
+- `{properties}` is a `List` of `Property` objects.
+
+==== Vertex
+
+Format: `{id}{label}{properties}`
+
+Where:
+
+- `{id}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{label}` is a `String` value.
+- `{properties}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}` which contains
+properties. Note that as TinkerPop currently sends "references" only, this value will always be `null`.
+
+==== VertexProperty
+
+Format: `{id}{label}{value}{parent}{properties}`
+
+Where:
+
+- `{id}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{label}` is a `String` value.
+- `{value}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+- `{parent}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}` which contains
+the parent `Vertex`. Note that as TinkerPop currently sends "references" only, this value will always be `null`.
+- `{properties}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}` which contains
+properties. Note that as TinkerPop currently sends "references" only, this value will always be `null`.
+
+==== Barrier
+
+Format: a single `String` representing the enum value.
+
+==== Binding
+
+Format: `{key}{value}`
+
+Where:
+
+- `{key}` is a `String` value.
+- `{value}` is a fully qualified typed value composed of `{type_code}{type_info}{value_flag}{value}`.
+
+==== Bytecode
+
+Format: `{steps_length}{step_0}...{step_n}{sources_length}{source_0}...{source_n}`
 
 Review comment:
   In this case, both (lengths) will be provided so it's basically the same :)
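
For readers following the quoted spec, here is a minimal sketch (not part of the pull request) of how the fully-qualified `{type_code}{type_info}{value_flag}{value}` layout could be written for `Int`, `Long` and `String`. The function and constant names are hypothetical, and it assumes `{type_info}` is empty for these three types, as the quick examples in the diff suggest.

```python
import struct

# Type codes as listed under "Core Data Types" in the quoted diff.
TYPE_INT = 0x01
TYPE_LONG = 0x02
TYPE_STRING = 0x03

# Value flags: 0x00 means a value follows, 0x01 means the value is null.
FLAG_NONE = 0x00
FLAG_NULL = 0x01


def encode_int(value):
    """Fully-qualified Int: type code, flag, 4-byte big-endian two's complement."""
    if value is None:
        return bytes([TYPE_INT, FLAG_NULL])  # no {value} bytes for null
    return bytes([TYPE_INT, FLAG_NONE]) + struct.pack(">i", value)


def encode_long(value):
    """Fully-qualified Long: type code, flag, 8-byte big-endian two's complement."""
    if value is None:
        return bytes([TYPE_LONG, FLAG_NULL])
    return bytes([TYPE_LONG, FLAG_NONE]) + struct.pack(">q", value)


def encode_string(value):
    """Fully-qualified String: type code, flag, Int byte length, UTF-8 bytes."""
    if value is None:
        return bytes([TYPE_STRING, FLAG_NULL])
    data = value.encode("utf-8")
    return bytes([TYPE_STRING, FLAG_NONE]) + struct.pack(">i", len(data)) + data


if __name__ == "__main__":
    for payload in (encode_int(1), encode_int(None), encode_long(1), encode_string("abc")):
        print(payload.hex())
```

Note that `{length}` for a `String` counts encoded bytes, not characters, so the length is computed after UTF-8 encoding.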

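The length-prefixed collection layout in the quoted spec (`{length}{item_0}...{item_n}`, each item fully qualified) can be read back with a small recursive reader. This is only a sketch under assumptions: `read_value` is a hypothetical name, only `Int`, `Long`, `String`, `List` and the "Unspecified null object" code are handled, and `{type_info}` is assumed to be empty for all of them.

```python
import struct


def read_value(buf, pos=0):
    """Read one fully-qualified value from buf; return (value, next_position)."""
    type_code, flag = buf[pos], buf[pos + 1]
    pos += 2
    if flag & 0x01:
        # Null flag set: no {value} bytes follow. This covers both a typed
        # null (e.g. 01 01) and the unspecified null object (fe 01).
        return None, pos
    if type_code == 0x01:  # Int: 4 bytes, big-endian
        return struct.unpack_from(">i", buf, pos)[0], pos + 4
    if type_code == 0x02:  # Long: 8 bytes, big-endian
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if type_code == 0x03:  # String: Int byte length, then UTF-8 bytes
        (length,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if type_code == 0x09:  # List: Int item count, then fully-qualified items
        (length,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        items = []
        for _ in range(length):
            item, pos = read_value(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError("unsupported type code 0x%02x" % type_code)


if __name__ == "__main__":
    # A List holding Int 1, an unspecified null, and the String 'abc'.
    payload = bytes.fromhex(
        "09 00 00 00 00 03"
        " 01 00 00 00 00 01"
        " fe 01"
        " 03 00 00 00 00 03 61 62 63"
    )
    print(read_value(payload))
```

Because every item carries its own type code and flag, the reader needs no schema: the item count tells it when to stop, and each item says how many bytes it occupies.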