Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2019/07/24 02:23:24 UTC

[GitHub] [pulsar] sijie commented on a change in pull request #4786: Add *Understand Schema* Section

sijie commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306603660
 
 

 ##########
 File path: site2/docs/schema-understand.md
 ##########
 @@ -0,0 +1,321 @@
+---
+id: schema-understand
+title: Understand schema
+sidebar_label: Understand schema
+---
+
+## `SchemaInfo`
+
+A Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be stored at the namespace or tenant level.
+
+A `SchemaInfo` consists of the following fields:
+
+| Field | Description |
+|---|---|
+| `name` | Schema name (a string). |
+| `type` | Schema type, which determines how to interpret the schema data. |
+| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes whose format is specific to the schema type. |
+| `properties` | A map of string key/value pairs, which is application-specific. |
+
+**Example**
+
+This is the `SchemaInfo` of a string.
+
+```text
+{
+    "name": "test-string-schema",
+    "type": "STRING",
+    "schema": "",
+    "properties": {}
+}
+```
+
+## Schema type
+
+Pulsar supports various schema types, which are mainly divided into two categories: 
+
+* Primitive type 
+
+* Complex type
+
+> #### Note
+> 
+> If you create a schema without specifying a type, producers and consumers can only handle raw bytes.
+
+### Primitive type
+
+Currently, Pulsar supports the following primitive types:
+
+| Primitive Type | Description |
+|---|---|
+| `BOOLEAN` | A binary value |
+| `INT8` | An 8-bit signed integer |
+| `INT16` | A 16-bit signed integer |
+| `INT32` | A 32-bit signed integer |
+| `INT64` | A 64-bit signed integer |
+| `FLOAT` | A single-precision (32-bit) IEEE 754 floating-point number |
+| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
+| `BYTES` | A sequence of 8-bit unsigned bytes |
+| `STRING` | A Unicode character sequence |
+| `TIMESTAMP` (`DATE`, `TIME`) | A logical type that represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value. |
+
+For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data. 
+
+Some primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `STRING` schema can use `properties` to store the charset used to serialize and deserialize strings.
+
+The conversions between **Pulsar schema types** and **language-specific primitive types** are as follows.
+
+| Schema Type | Java Type| Python Type |
+|---|---|---|
+| BOOLEAN | boolean | bool |
+| INT8 | byte | |
+| INT16 | short | | 
+| INT32 | int | |
+| INT64 | long | |
+| FLOAT | float | float |
+| DOUBLE | double | float |
+| BYTES | byte[], ByteBuffer, ByteBuf | bytes |
+| STRING | String | str |
+| TIMESTAMP | java.sql.Timestamp | |
+| TIME | java.sql.Time | |
+| DATE | java.util.Date | |
+
+**Example**
+
+This example demonstrates how to use a string schema.
+
+1. Create a producer with a string schema and send messages.
+
+    ```text
+    Producer<String> producer = client.newProducer(Schema.STRING).create();
+    producer.newMessage().value("Hello Pulsar!").send();
+    ```
+
+2. Create a consumer with a string schema and receive messages.  
+
+    ```text
+    Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+    consumer.receive();
+    ```
+
+### Complex type
+
+Currently, Pulsar supports the following complex types:
+
+| Complex Type | Description |
+|---|---|
+| `keyvalue` | Represents a complex type of a key/value pair. |
+| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. |
+
+* **Complex type 1: `keyvalue`**
+
+    A `keyvalue` schema lets applications define schemas for both the key and the value.
+
+    In the `SchemaInfo` of a `keyvalue` schema, Pulsar stores the `SchemaInfo` of the key schema and the `SchemaInfo` of the value schema together.
+
+    Pulsar provides two methods to encode a key/value pair in messages: 
+
+    * **`INLINE`** mode: the key and the value are encoded together in the message payload.
+  
+    * **`SEPARATED`** mode: the key is encoded in the message key and the value is encoded in the message payload.
+  
+    Users can choose the encoding type when constructing the key/value schema.
+
+    **Example**
 
 Review comment:
   Have you verified the final rendered result?
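
For readers of the diff above, the difference between the `INLINE` and `SEPARATED` encoding modes can be illustrated with a minimal sketch in plain Java. It does not use the Pulsar client: `KeyValueEncodingSketch` and `Envelope` are hypothetical names, and the length-prefixed layout used for the inline payload is an assumption made for illustration, not a statement of Pulsar's wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: these are hypothetical names, not Pulsar classes, and the
// length-prefixed inline layout is an assumed format chosen for the sketch.
public class KeyValueEncodingSketch {

    // Hypothetical stand-in for a message: an optional key plus a payload.
    static final class Envelope {
        final byte[] key;      // null when the key travels inside the payload
        final byte[] payload;

        Envelope(byte[] key, byte[] payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // INLINE: key and value are packed together into the payload.
    static Envelope encodeInline(byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(4 + key.length + 4 + value.length);
        buf.putInt(key.length).put(key).putInt(value.length).put(value);
        return new Envelope(null, buf.array());
    }

    // SEPARATED: the key goes in the message key, the value in the payload.
    static Envelope encodeSeparated(byte[] key, byte[] value) {
        return new Envelope(key, value);
    }

    // Recovers {key, value} from a payload produced by encodeInline.
    static byte[][] decodeInline(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        byte[] key = new byte[buf.getInt()];
        buf.get(key);
        byte[] value = new byte[buf.getInt()];
        buf.get(value);
        return new byte[][] {key, value};
    }

    public static void main(String[] args) {
        byte[] key = "user-1".getBytes(StandardCharsets.UTF_8);
        byte[] value = "Hello Pulsar!".getBytes(StandardCharsets.UTF_8);

        Envelope inline = encodeInline(key, value);
        Envelope separated = encodeSeparated(key, value);

        System.out.println(inline.payload.length);     // 27 = 4 + 6 + 4 + 13
        System.out.println(separated.payload.length);  // 13

        byte[][] decoded = decodeInline(inline.payload);
        System.out.println(Arrays.equals(decoded[0], key));    // true
        System.out.println(Arrays.equals(decoded[1], value));  // true
    }
}
```

In this sketch, the separated form leaves the key available without decoding the payload, while the inline form keeps the pair in a single byte array.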
