Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/10/28 09:19:17 UTC

[GitHub] [pulsar] momo-jun opened a new pull request, #18242: [refactor][doc] Refactor the information architecture of Schema topics

momo-jun opened a new pull request, #18242:
URL: https://github.com/apache/pulsar/pull/18242

   
   
   Master Issue: #xyz
   
   ### Motivation
   
   <!-- Explain here the context, and why you're making that change. What is the problem you're trying to solve. -->
   
   
   
   ### Modifications
   
   <!-- Describe the modifications you've done. -->
   1. Curate a new `Overview` topic to introduce WHAT/WHY/HOW around Schema.
   2. Curate a new `Get Started` topic to introduce how to construct a schema.
   3. Improve the IA inside the `Understand Schema` and `Schema evolution and compatibility` topics.
   4. Separate content in `Manage Schema` into `Admin Interfaces - Schemas` and `Understand Schema` topics.
   5. Move examples and references from `Client Libraries` topics to a single source inside the Schema chapter.
   6. Remove duplicate or redundant information.
   
   Note that detailed content improvements will continue and be reviewed in a follow-up PR.
   
   Screenshots will be attached soon.
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [x] `doc` <!-- Your PR contains doc changes. Please attach the local preview screenshots (run `sh start.sh` at `pulsar/site2/website`) to your PR description, or else your PR might not get merged. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
   - [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] momo-jun commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1012613875


##########
site2/docs/schema-understand.md:
##########
@@ -43,74 +41,36 @@ This is the `SchemaInfo` of a string.
 ## Schema type
 
 Pulsar supports various schema types, which are mainly divided into two categories: 
-
-* Primitive type 
-
-* Complex type
+* [Primitive type](#primitive-type) 
+* [Complex type](#complex-type)
 
 ### Primitive type
 
-Currently, Pulsar supports the following primitive types:
-
-| Primitive Type | Description |
-|---|---|
-| `BOOLEAN` | A binary value |
-| `INT8` | A 8-bit signed integer |
-| `INT16` | A 16-bit signed integer |
-| `INT32` | A 32-bit signed integer |
-| `INT64` | A 64-bit signed integer |
-| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number |
-| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
-| `BYTES` | A sequence of 8-bit unsigned bytes |
-| `STRING` | A Unicode character sequence |
-| `TIMESTAMP` (`DATE`, `TIME`) |  A logic type represents a specific instant in time with millisecond precision. <br />It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value | 
-| INSTANT | A single instantaneous point on the time-line with nanoseconds precision|
-| LOCAL_DATE | An immutable date-time object that represents a date, often viewed as year-month-day|
-| LOCAL_TIME | An immutable date-time object that represents a time, often viewed as hour-minute-second. Time is represented to nanosecond precision.|
-| LOCAL_DATE_TIME | An immutable date-time object that represents a date-time, often viewed as year-month-day-hour-minute-second |
-
-For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data. 
+The following table outlines the primitive types that Pulsar schema supports, and the conversions between **schema types** and **language-specific primitive types**.
+
+| Primitive Type | Description | Java Type| Python Type | Go Type |
+|---|---|---|---|---|
+| `BOOLEAN` | A binary value | boolean | bool | bool |
+| `INT8` | A 8-bit signed integer | byte | | int8 |
+| `INT16` | A 16-bit signed integer | short | | int16 |
+| `INT32` | A 32-bit signed integer | int | | int32 |
+| `INT64` | A 64-bit signed integer | long | | int64 |

Review Comment:
   Nice catch!
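
For readers following the integer rows of the table in this hunk, here is a minimal sketch that uses only the JDK, with no Pulsar classes. It assumes the Java column shown above (`byte`, `short`, `int`, `long`) and checks that each of those types has the bit width that the schema type name promises. The class name `PrimitiveWidths` is hypothetical and is not part of the Pulsar API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Pulsar API.
public class PrimitiveWidths {
    // Maps each schema type name to the bit width of the Java type
    // listed for it in the table (byte, short, int, long).
    static Map<String, Integer> widths() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("INT8", Byte.SIZE);      // byte
        m.put("INT16", Short.SIZE);    // short
        m.put("INT32", Integer.SIZE);  // int
        m.put("INT64", Long.SIZE);     // long
        return m;
    }

    public static void main(String[] args) {
        // Print one line per schema type, in insertion order.
        widths().forEach((type, bits) ->
            System.out.println(type + " -> " + bits + " bits"));
    }
}
```

All four Java types are signed, which matches the "signed integer" wording in the Description column.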



##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
 sidebar_label: "Get started"
 ---
 
-This chapter introduces Pulsar schemas and explains why they are important. 
 
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
 
-Type safety is extremely important in any application built around a message bus like Pulsar. 
 
-Producers and consumers need some kind of mechanism for coordinating types at the topic level to avoid various potential problems arising. For example, serialization and deserialization issues. 
+This hands-on tutorial provides instructions and examples on how to construct and customize schemas.
 
-Applications typically adopt one of the following approaches to guarantee type safety in messaging. Both approaches are available in Pulsar, and you're free to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
 
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string schema](schema-understand.md#primitive-type) and use it to produce and consume messages in Java.
 
-### Client-side approach
+1. Create a producer with a string schema and send messages.
 
-Producers and consumers are responsible for not only serializing and deserializing messages (which consist of raw bytes) but also "knowing" which types are being transmitted via which topics. 
+   ```java
+   Producer<String> producer = client.newProducer(Schema.STRING).create();
+   producer.newMessage().value("Hello Pulsar!").send();
+   ```
 
-If a producer is sending temperature sensor data on the topic `topic-1`, consumers of that topic will run into trouble if they attempt to parse that data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.  
 
-Producers and consumers can send and receive messages consisting of raw byte arrays and leave all type safety enforcement to the application on an "out-of-band" basis.
+   ```java
+   Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+   consumer.receive();
+   ```
 
-### Server-side approach 
+## Construct a key/value schema
 
-Producers and consumers inform the system which data types can be transmitted via the topic. 
+This example shows how to construct a [key/value schema](schema-understand.md#keyvalue-schema) and use it to produce and consume messages in Java.
 
-With this approach, the messaging system enforces type safety and ensures that producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
 
-Pulsar has a built-in **schema registry** that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic.
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.INLINE
+   );
+   ```
 
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
 
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and sends bytes as outputs. While data has meaning beyond bytes, you need to parse data and might encounter parse exceptions which mainly occur in the following situations:
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+   ```
 
-* The field does not exist
+3. Produce messages using a key/value schema.
 
-* The field type has changed (for example, `string` is changed to `int`)
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
 
-There are a few methods to prevent and overcome these exceptions, for example, you can catch exceptions when parsing errors, which makes code hard to maintain; or you can adopt a schema management system to perform schema evolution, not to break downstream applications, and enforces type safety to max extend in the language you are using, the solution is Pulsar Schema.
+   Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
+       .topic(TOPIC)
+       .create();
 
-Pulsar schema enables you to use language-specific types of data when constructing and handling messages from simple types like `string` to more complex application-specific types. 
+   final int key = 100;
+   final String value = "value-100";
+
+   // send the key/value message
+   producer.newMessage()
+       .value(new KeyValue<>(key, value))
+       .send();
+   ```
+
+4. Consume messages using a key/value schema.
+
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+
+   Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
+       ...
+       .topic(TOPIC)
+       .subscriptionName(SubscriptionName).subscribe();
+
+   // receive key/value pair
+   Message<KeyValue<Integer, String>> msg = consumer.receive();
+   KeyValue<Integer, String> kv = msg.getValue();
+   ```
+
+## Construct a struct schema
+
+This example shows how to construct a [struct schema](schema-understand.md#struct-schema) and use it to produce and consume messages using different methods.
+
+````mdx-code-block
+<Tabs 
+  defaultValue="static"
+  values={[{"label":"static","value":"static"},{"label":"generic","value":"generic"},{"label":"SchemaDefinition","value":"SchemaDefinition"}]}>
+
+<TabItem value="static">
+
+You can predefine the `struct` schema, which can be a POJO in Java, a `struct` in Go, or classes generated by Avro or Protobuf tools. 
+
+**Example** 
+
+Pulsar gets the schema definition from the predefined `struct` using an Avro library. The schema definition is the schema data stored as a part of the `SchemaInfo`.
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+   ```java
+   @Builder
+   @AllArgsConstructor
+   @NoArgsConstructor
+   public static class User {
+       String name;
+       int age;
+   }
+   ```
+
+2. Create a producer with a `struct` schema and send messages.
+
+   ```java
+   Producer<User> producer = client.newProducer(Schema.AVRO(User.class)).create();
+   producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+   ```
+
+3. Create a consumer with a `struct` schema and receive messages
+
+   ```java
+   Consumer<User> consumer = client.newConsumer(Schema.AVRO(User.class)).subscribe();
+   User user = consumer.receive().getValue();
+   ```
+
+</TabItem>
+<TabItem value="generic">
+
+Sometimes applications do not have pre-defined structs, and you can use this method to define schema and access data.
+
+You can define the `struct` schema using the `GenericSchemaBuilder`, generate a generic struct using `GenericRecordBuilder` and consume messages into `GenericRecord`.
 
 **Example** 
 
-You can use the _User_ class to define the messages sent to Pulsar topics.
+1. Use `RecordSchemaBuilder` to build a schema.
+
+   ```java
+   RecordSchemaBuilder recordSchemaBuilder = SchemaBuilder.record("schemaName");
+   recordSchemaBuilder.field("intField").type(SchemaType.INT32);
+   SchemaInfo schemaInfo = recordSchemaBuilder.build(SchemaType.AVRO);
+
+   Producer<GenericRecord> producer = client.newProducer(Schema.generic(schemaInfo)).create();
+   ```
+
+2. Use `RecordBuilder` to build the struct records.
+
+   ```java
+   producer.newMessage().value(schema.newRecordBuilder()
+               .set("intField", 32)
+               .build()).send();
+   ```
+
+</TabItem>
+<TabItem value="SchemaDefinition">
+
+You can define the `schemaDefinition` to generate a `struct` schema.
+
+**Example** 
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+   ```java
+   @Builder
+   @AllArgsConstructor
+   @NoArgsConstructor
+   public static class User {
+       String name;
+       int age;
+   }
+   ```
+
+2. Create a producer with a `SchemaDefinition` and send messages.
+
+   ```java
+   SchemaDefinition<User> schemaDefinition = SchemaDefinition.<User>builder().withPojo(User.class).build();
+   Producer<User> producer = client.newProducer(Schema.AVRO(schemaDefinition)).create();
+   producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+   ```
+
+3. Create a consumer with a `SchemaDefinition` schema and receive messages
+
+   ```java
+   SchemaDefinition<User> schemaDefinition = SchemaDefinition.<User>builder().withPojo(User.class).build();
+   Consumer<User> consumer = client.newConsumer(Schema.AVRO(schemaDefinition)).subscribe();
+   User user = consumer.receive().getValue();
+   ```
+
+</TabItem>
+
+</Tabs>
+````
+
+### Avro schema using Java
+
+Suppose you have a `SensorReading` class as follows, and you'd like to transmit it over a Pulsar topic.
 
 ```java
-public class User {
-    String name;
-    int age;
+public class SensorReading {
+    public float temperature;
+
+    public SensorReading(float temperature) {
+        this.temperature = temperature;
+    }
+
+    // A no-arg constructor is required
+    public SensorReading() {
+    }
+
+    public float getTemperature() {
+        return temperature;
+    }
+
+    public void setTemperature(float temperature) {
+        this.temperature = temperature;
+    }
 }
 ```
 
-When constructing a producer with the _User_ class, you can specify a schema or not as below.
+Create a `Producer<SensorReading>` (or `Consumer<SensorReading>`) like this:
+
+```java
+Producer<SensorReading> producer = client.newProducer(JSONSchema.of(SensorReading.class))
+        .topic("sensor-readings")
+        .create();
+```
+
+The following schema formats are currently available for Java:
+
+* No schema or the byte array schema (which can be applied using `Schema.BYTES`):
+
+  ```java
+  Producer<byte[]> bytesProducer = client.newProducer(Schema.BYTES)
+      .topic("some-raw-bytes-topic")
+      .create();
+  ```
+
+  Or, equivalently:
+
+  ```java
+  Producer<byte[]> bytesProducer = client.newProducer()
+      .topic("some-raw-bytes-topic")
+      .create();
+  ```
+
+* `String` for normal UTF-8-encoded string data. Apply the schema using `Schema.STRING`:
+
+  ```java
+  Producer<String> stringProducer = client.newProducer(Schema.STRING)
+      .topic("some-string-topic")
+      .create();
+  ```
+
+* Create JSON schemas for POJOs using `Schema.JSON`. The following is an example.
+
+  ```java
+  Producer<MyPojo> pojoProducer = client.newProducer(Schema.JSON(MyPojo.class))
+      .topic("some-pojo-topic")
+      .create();
+  ```
+
+* Generate Protobuf schemas using `Schema.PROTOBUF`. The following example shows how to create the Protobuf schema and use it to instantiate a new producer:
+
+  ```java
+  Producer<MyProtobuf> protobufProducer = client.newProducer(Schema.PROTOBUF(MyProtobuf.class))
+      .topic("some-protobuf-topic")
+      .create();
+  ```
+
+* Define Avro schemas with `Schema.AVRO`. The following code snippet demonstrates how to create and use Avro schema.
 
-### Without schema
+  ```java
+  Producer<MyAvro> avroProducer = client.newProducer(Schema.AVRO(MyAvro.class))
+      .topic("some-avro-topic")
+      .create();
+  ```
 
-If you construct a producer without specifying a schema, then the producer can only produce messages of type `byte[]`. If you have a POJO class, you need to serialize the POJO into bytes before sending messages.
 
-**Example**
+### Avro schema using C++
+
+- The following example shows how to create a producer with an Avro schema.
+
+  ```cpp
+  static const std::string exampleSchema =
+      "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
+      "\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
+  Producer producer;
+  ProducerConfiguration producerConf;
+  producerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
+  client.createProducer("topic-avro", producerConf, producer);
+  ```
+
+- The following example shows how to create a consumer with an Avro schema.
+
+  ```cpp
+  static const std::string exampleSchema =
+      "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
+      "\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
+  ConsumerConfiguration consumerConf;
+  Consumer consumer;
+  consumerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
+  client.subscribe("topic-avro", "sub-2", consumerConf, consumer);
+  ```
+
+### ProtobufNative schema using C++

Review Comment:
   Agree. I will leave it to the next round of content review and revision, because this PR focuses on the information architecture changes.



##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
 sidebar_label: "Get started"
 ---
 
+### Avro schema using Java
+
+Suppose you have a `SensorReading` class as follows, and you'd like to transmit it over a Pulsar topic.
 
 ```java
-public class User {
-    String name;
-    int age;
+public class SensorReading {
+    public float temperature;
+
+    public SensorReading(float temperature) {
+        this.temperature = temperature;
+    }
+
+    // A no-arg constructor is required
+    public SensorReading() {
+    }
+
+    public float getTemperature() {
+        return temperature;
+    }
+
+    public void setTemperature(float temperature) {
+        this.temperature = temperature;
+    }
 }
 ```
 
-When constructing a producer with the _User_ class, you can specify a schema or not as below.
+Create a `Producer<SensorReading>` (or `Consumer<SensorReading>`) like this:
+
+```java
+Producer<SensorReading> producer = client.newProducer(JSONSchema.of(SensorReading.class))

Review Comment:
   Nice catch!
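
As background for the `// A no-arg constructor is required` comment in the hunk above: reflection-based (de)serializers typically create an empty instance through the no-arg constructor and then fill in its fields. The sketch below illustrates that general pattern with plain JDK reflection. It is an assumption about the usual mechanism, not a description of Pulsar's actual decoding path. `ReflectiveDecodeSketch` and `decode` are hypothetical names, and the nested `SensorReading` is a trimmed copy of the class in the diff.

```java
import java.lang.reflect.Field;

// Hypothetical illustration; not Pulsar code.
public class ReflectiveDecodeSketch {
    // Trimmed copy of the SensorReading class from the diff.
    public static class SensorReading {
        public float temperature;

        public SensorReading() {
        }

        public SensorReading(float temperature) {
            this.temperature = temperature;
        }
    }

    // Creates an empty instance, then sets one public field by name.
    static <T> T decode(Class<T> type, String fieldName, Object value)
            throws ReflectiveOperationException {
        // Throws NoSuchMethodException if the class has no no-arg constructor.
        T instance = type.getDeclaredConstructor().newInstance();
        Field field = type.getField(fieldName);
        // For a primitive field, Field.set unwraps the boxed value.
        field.set(instance, value);
        return instance;
    }

    public static void main(String[] args) throws Exception {
        SensorReading reading = decode(SensorReading.class, "temperature", 21.5f);
        System.out.println(reading.temperature);
    }
}
```

If the no-arg constructor were removed from `SensorReading`, the `getDeclaredConstructor()` call would fail before any field could be set, which is the failure mode the comment in the diff warns about.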





[GitHub] [pulsar] momo-jun commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1012608488


##########
site2/docs/client-libraries-cpp.md:
##########
@@ -412,80 +412,4 @@ For complete examples, refer to [C++ client examples](https://github.com/apache/
 
 ## Schema
 
-This section describes some examples about schema. For more information about schema, see [Pulsar schema](schema-get-started.md).
-
-### Avro schema
-
-- The following example shows how to create a producer with an Avro schema.
-
-  ```cpp
-  static const std::string exampleSchema =
-      "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
-      "\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
-  Producer producer;
-  ProducerConfiguration producerConf;
-  producerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
-  client.createProducer("topic-avro", producerConf, producer);
-  ```
-
-- The following example shows how to create a consumer with an Avro schema.
-
-  ```cpp
-  static const std::string exampleSchema =
-      "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
-      "\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
-  ConsumerConfiguration consumerConf;
-  Consumer consumer;
-  consumerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
-  client.subscribe("topic-avro", "sub-2", consumerConf, consumer)
-  ```
-
-### ProtobufNative schema
-
-The following example shows how to create a producer and a consumer with a ProtobufNative schema.
-
-1. Generate the `User` class using Protobuf3 or later versions.
-
-   ```protobuf
-   syntax = "proto3";
-
-   message User {
-       string name = 1;
-       int32 age = 2;
-   }
-   ```
-
-2. Include the `ProtobufNativeSchema.h` in your source code. Ensure the Protobuf dependency has been added to your project.
-
-   ```cpp
-   #include <pulsar/ProtobufNativeSchema.h>
-   ```
-
-3. Create a producer to send a `User` instance.
-
-   ```cpp
-   ProducerConfiguration producerConf;
-   producerConf.setSchema(createProtobufNativeSchema(User::GetDescriptor()));
-   Producer producer;
-   client.createProducer("topic-protobuf", producerConf, producer);
-   User user;
-   user.set_name("my-name");
-   user.set_age(10);
-   std::string content;
-   user.SerializeToString(&content);
-   producer.send(MessageBuilder().setContent(content).build());
-   ```
-
-4. Create a consumer to receive a `User` instance.
-
-   ```cpp
-   ConsumerConfiguration consumerConf;
-   consumerConf.setSchema(createProtobufNativeSchema(User::GetDescriptor()));
-   consumerConf.setSubscriptionInitialPosition(InitialPositionEarliest);
-   Consumer consumer;
-   client.subscribe("topic-protobuf", "my-sub", consumerConf, consumer);
-   Message msg;
-   consumer.receive(msg);
-   User user2;
-   user2.ParseFromArray(msg.getData(), msg.getLength());
-   ```
+To work with [Pulsar schema](schema-overview.md) using C++ clients, see [Schema - Get started](schema-get-started.md). For specific schema types that C++ clients support, see [code](https://github.com/apache/pulsar-client-cpp/blob/main/include/pulsar/Schema.h#L51-L132).

Review Comment:
   This is the C++ doc itself :) I will remove the specific line numbers from the link as a temporary measure during the transition.





[GitHub] [pulsar] Technoboy- merged pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
Technoboy- merged PR #18242:
URL: https://github.com/apache/pulsar/pull/18242




[GitHub] [pulsar] congbobo184 commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
congbobo184 commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1013578128


##########
site2/docs/schema-understand.md:
##########
@@ -121,109 +81,15 @@ Currently, Pulsar supports the following complex types:
 | `keyvalue` | Represents a complex type of a key/value pair. |
 | `struct` | Handles structured data. It supports `AvroBaseStructSchema` and `ProtobufNativeSchema`. |
 
-#### keyvalue
-
-`Keyvalue` schema helps applications define schemas for both key and value. 
+#### `keyvalue` schema

Review Comment:
   ```suggestion
   #### `KeyValue` schema
   ```



##########
site2/docs/schema-understand.md:
##########
@@ -121,109 +81,15 @@ Currently, Pulsar supports the following complex types:
 | `keyvalue` | Represents a complex type of a key/value pair. |
 | `struct` | Handles structured data. It supports `AvroBaseStructSchema` and `ProtobufNativeSchema`. |
 
-#### keyvalue
-
-`Keyvalue` schema helps applications define schemas for both key and value. 
+#### `keyvalue` schema
 
-For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
+`Keyvalue` schema helps applications define schemas for both key and value. Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.

Review Comment:
   ```suggestion
   `KeyValue` schema helps applications define schemas for both key and value. Pulsar stores the `SchemaInfo` of key schema and the value schema together.
   ```
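The two encoding types this thread mentions for the `KeyValue` schema (`INLINE`: key and value together in the payload; `SEPARATED`: key in the message key, value in the payload) can be pictured with a small, self-contained Java sketch. It does not use the Pulsar client. `Envelope`, `EncodingType`, and `encode` are hypothetical names, and the length-prefixed `INLINE` layout is an assumption made for illustration, not a statement about Pulsar's wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyValueEncodingSketch {
    enum EncodingType { INLINE, SEPARATED }

    /** Hypothetical stand-in for a message: an optional message key plus a payload. */
    static final class Envelope {
        final byte[] messageKey; // null when the key travels inside the payload
        final byte[] payload;

        Envelope(byte[] messageKey, byte[] payload) {
            this.messageKey = messageKey;
            this.payload = payload;
        }
    }

    /** Encodes an int key and a string value according to the chosen encoding type. */
    static Envelope encode(int key, String value, EncodingType type) {
        byte[] keyBytes = ByteBuffer.allocate(4).putInt(key).array();
        byte[] valueBytes = value.getBytes(StandardCharsets.UTF_8);
        if (type == EncodingType.SEPARATED) {
            // Key goes into the message key; the payload holds only the value.
            return new Envelope(keyBytes, valueBytes);
        }
        // INLINE (assumed layout): [key length][key][value length][value] in one payload.
        ByteBuffer buf = ByteBuffer.allocate(4 + keyBytes.length + 4 + valueBytes.length);
        buf.putInt(keyBytes.length).put(keyBytes).putInt(valueBytes.length).put(valueBytes);
        return new Envelope(null, buf.array());
    }

    public static void main(String[] args) {
        Envelope inline = encode(100, "value-100", EncodingType.INLINE);
        Envelope separated = encode(100, "value-100", EncodingType.SEPARATED);
        System.out.println("INLINE payload bytes: " + inline.payload.length);
        System.out.println("SEPARATED payload bytes: " + separated.payload.length);
    }
}
```

In this model, only the `SEPARATED` envelope exposes the key outside the payload, which is the practical difference a reader of the doc would care about.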



##########
site2/docs/schema-overview.md:
##########
@@ -0,0 +1,154 @@
+---
+id: schema-overview
+title: Overview
+sidebar_label: "Overview"
+---
+
+This section introduces the following content:
+* [What is Pulsar Schema](#what-is-pulsar-schema)
+* [Why use it](#why-use-it)
+* [How it works](#how-it-works)
+* [Use case](#use-case)
+* [What's next?](#whats-next)
+
+## What is Pulsar Schema
+
+Pulsar messages are stored as unstructured byte arrays and the data structure (also known as schema) is applied to this data only when it's read. The schema serializes the bytes before they are published to a topic and deserializes them before they are delivered to the consumers, dictating which data types are recognized as valid for a given topic.
+
+Pulsar schema registry is a central repository to store the schema information, which enables producers/consumers to coordinate on the schema of a topic’s data through brokers.
+
+:::note
+
+Currently, Pulsar schema is only available for the [Java client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+
+:::
+
+## Why use it
+
+Type safety is extremely important in any application built around a messaging and streaming system. Raw bytes are flexible for data transfer, but the flexibility and neutrality come with a cost: you have to overlay data type checking and serialization/deserialization to ensure that the bytes fed into the system can be read and successfully consumed. In other words, you need to make sure the data is intelligible and usable to applications.
+
+Pulsar schema resolves the pain points with the following capabilities:
+* enforces the data type safety when a topic has a schema defined. As a result, producers/consumers are only allowed to connect if they are using a “compatible” schema.
+* provides a central location for storing information about the schemas used within your organization, in turn greatly simplifies the sharing of this information across application teams.
+* serves as a single source of truth for all the message schemas used across all your services and development teams, which makes it easier for them to collaborate.
+* keeps data compatibility on-track between schema versions. When new schemas are uploaded, the new versions can be read by old consumers. 
+* stored in the existing storage layer BookKeeper, no additional system required.
+
+## How it works

Review Comment:
   Should this move to `Understand schema`? Users may not understand how it works at this point, and may not care about it in the Overview.
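The "What is" and "Why use it" text in the hunk above describes a schema as the layer that serializes values before publishing and deserializes them before delivery. A minimal Java sketch of that idea follows. It is not the Pulsar client API: `MiniSchema`, `INT32`, and `STRING` are hypothetical stand-ins, and the encodings (big-endian int, UTF-8 string) are assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TypedBytesSketch {
    /** Hypothetical mini schema: turns a typed value into bytes and back. */
    interface MiniSchema<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    static final MiniSchema<Integer> INT32 = new MiniSchema<Integer>() {
        public byte[] encode(Integer value) {
            return ByteBuffer.allocate(4).putInt(value).array();
        }

        public Integer decode(byte[] bytes) {
            // Reject payloads that cannot be a 32-bit integer.
            if (bytes.length != 4) {
                throw new IllegalArgumentException("INT32 expects 4 bytes, got " + bytes.length);
            }
            return ByteBuffer.wrap(bytes).getInt();
        }
    };

    static final MiniSchema<String> STRING = new MiniSchema<String>() {
        public byte[] encode(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }

        public String decode(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    };

    public static void main(String[] args) {
        byte[] onTheWire = STRING.encode("Hello Pulsar!");
        System.out.println(STRING.decode(onTheWire));
        try {
            INT32.decode(onTheWire); // wrong schema for these bytes
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Without such a layer, the same bytes would reach the application unchecked, and the mismatch would surface later as a parsing error in consumer code.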



##########
site2/docs/schema-understand.md:
##########
@@ -121,109 +81,15 @@ Currently, Pulsar supports the following complex types:
 | `keyvalue` | Represents a complex type of a key/value pair. |
 | `struct` | Handles structured data. It supports `AvroBaseStructSchema` and `ProtobufNativeSchema`. |
 
-#### keyvalue
-
-`Keyvalue` schema helps applications define schemas for both key and value. 
+#### `keyvalue` schema
 
-For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
+`Keyvalue` schema helps applications define schemas for both key and value. Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
 
-Pulsar provides the following methods to encode a key/value pair in messages:
+You can choose the encoding type when constructing the key/value schema:
+* `INLINE` - Key/value pairs are encoded together in the message payload.
+* `SEPARATED` - see [Construct a key/value schema](schema-get-started.md#construct-a-keyvalue-schema).
 
-* `INLINE`
-
-* `SEPARATED`
-
-You can choose the encoding type when constructing the key/value schema.
-
-````mdx-code-block
-<Tabs 
-  defaultValue="INLINE"
-  values={[{"label":"INLINE","value":"INLINE"},{"label":"SEPARATED","value":"SEPARATED"}]}>
-
-<TabItem value="INLINE">
-
-Key/value pairs are encoded together in the message payload.
-
-</TabItem>
-<TabItem value="SEPARATED">
-
-Key is encoded in the message key and the value is encoded in the message payload. 
-  
-**Example**
-    
-This example shows how to construct a key/value schema and then use it to produce and consume messages.
-
-1. Construct a key/value schema with `INLINE` encoding type.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.INLINE
-   );
-   ```
-
-2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.SEPARATED
-   );
-   ```
-
-3. Produce messages using a key/value schema.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.SEPARATED
-   );
-
-   Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
-       .topic(TOPIC)
-       .create();
-
-   final int key = 100;
-   final String value = "value-100";
-
-   // send the key/value message
-   producer.newMessage()
-   .value(new KeyValue(key, value))
-   .send();
-   ```
-
-4. Consume messages using a key/value schema.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.SEPARATED
-   );
-
-   Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
-       ...
-       .topic(TOPIC)
-       .subscriptionName(SubscriptionName).subscribe();
-
-   // receive key/value pair
-   Message<KeyValue<Integer, String>> msg = consumer.receive();
-   KeyValue<Integer, String> kv = msg.getValue();
-   ```
-
-</TabItem>
-
-</Tabs>
-````
-
-#### struct
-
-This section describes the details of type and usage of the `struct` schema.
-
-##### Type
+#### `struct` schema
 
 `struct` schema supports `AvroBaseStructSchema` and `ProtobufNativeSchema`.

Review Comment:
   `AvroSchema`, `JsonSchema`, etc. are also struct schemas; we should add these later.



##########
site2/docs/schema-evolution-compatibility.md:
##########
@@ -6,29 +6,21 @@ sidebar_label: "Schema evolution and compatibility"
 
 Normally, schemas do not stay the same over a long period of time. Instead, they undergo evolutions to satisfy new needs. 
 
-This chapter examines how Pulsar schema evolves and what Pulsar schema compatibility check strategies are.
+This chapter introduces how Pulsar schema evolves and what compatibility check strategies it adopts.
 
 ## Schema evolution

Review Comment:
   Can we move this to `Understand Schema`? It feels awkward to put it here; the user needs enough context to understand it.



##########
site2/docs/schema-understand.md:
##########
@@ -121,109 +81,15 @@ Currently, Pulsar supports the following complex types:
 | `keyvalue` | Represents a complex type of a key/value pair. |
 | `struct` | Handles structured data. It supports `AvroBaseStructSchema` and `ProtobufNativeSchema`. |
 
-#### keyvalue
-
-`Keyvalue` schema helps applications define schemas for both key and value. 
+#### `keyvalue` schema
 
-For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
+`Keyvalue` schema helps applications define schemas for both key and value. Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
 
-Pulsar provides the following methods to encode a key/value pair in messages:
+You can choose the encoding type when constructing the key/value schema:
+* `INLINE` - Key/value pairs are encoded together in the message payload.
+* `SEPARATED` - see [Construct a key/value schema](schema-get-started.md#construct-a-keyvalue-schema).
 
-* `INLINE`
-
-* `SEPARATED`
-
-You can choose the encoding type when constructing the key/value schema.
-
-````mdx-code-block
-<Tabs 
-  defaultValue="INLINE"
-  values={[{"label":"INLINE","value":"INLINE"},{"label":"SEPARATED","value":"SEPARATED"}]}>
-
-<TabItem value="INLINE">
-
-Key/value pairs are encoded together in the message payload.
-
-</TabItem>
-<TabItem value="SEPARATED">
-
-Key is encoded in the message key and the value is encoded in the message payload. 
-  
-**Example**
-    
-This example shows how to construct a key/value schema and then use it to produce and consume messages.
-
-1. Construct a key/value schema with `INLINE` encoding type.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.INLINE
-   );
-   ```
-
-2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.SEPARATED
-   );
-   ```
-
-3. Produce messages using a key/value schema.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.SEPARATED
-   );
-
-   Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
-       .topic(TOPIC)
-       .create();
-
-   final int key = 100;
-   final String value = "value-100";
-
-   // send the key/value message
-   producer.newMessage()
-   .value(new KeyValue(key, value))
-   .send();
-   ```
-
-4. Consume messages using a key/value schema.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.SEPARATED
-   );
-
-   Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
-       ...
-       .topic(TOPIC)
-       .subscriptionName(SubscriptionName).subscribe();
-
-   // receive key/value pair
-   Message<KeyValue<Integer, String>> msg = consumer.receive();
-   KeyValue<Integer, String> kv = msg.getValue();
-   ```
-
-</TabItem>
-
-</Tabs>
-````
-
-#### struct
-
-This section describes the details of type and usage of the `struct` schema.
-
-##### Type
+#### `struct` schema

Review Comment:
   ```suggestion
   #### `Struct` schema
   ```



##########
site2/docs/schema-evolution-compatibility.md:
##########
@@ -6,29 +6,21 @@ sidebar_label: "Schema evolution and compatibility"
 
 Normally, schemas do not stay the same over a long period of time. Instead, they undergo evolutions to satisfy new needs. 
 
-This chapter examines how Pulsar schema evolves and what Pulsar schema compatibility check strategies are.
+This chapter introduces how Pulsar schema evolves and what compatibility check strategies it adopts.
 
 ## Schema evolution
 
-Pulsar schema is defined in a data structure called `SchemaInfo`. 
-
-Each `SchemaInfo` stored with a topic has a version. The version is used to manage the schema changes happening within a topic. 
-
 The message produced with `SchemaInfo` is tagged with a schema version. When a message is consumed by a Pulsar client, the Pulsar client can use the schema version to retrieve the corresponding `SchemaInfo` and use the correct schema information to deserialize data.
 
-### What is schema evolution?
-
 Schemas store the details of attributes and types. To satisfy new business requirements,  you need to update schemas inevitably over time, which is called **schema evolution**. 
 
 Any schema changes affect downstream consumers. Schema evolution ensures that the downstream consumers can seamlessly handle data encoded with both old schemas and new schemas. 
 
-### How Pulsar schema should evolve? 
-
-The answer is Pulsar schema compatibility check strategy. It determines how schema compares old schemas with new schemas in topics.
+### How schema evolves? 
 
-For more information, see [Schema compatibility check strategy](#schema-compatibility-check-strategy).
+The answer is [schema compatibility check strategy](#schema-compatibility-check-strategy). It determines how schema compares old schemas with new schemas in topics.

Review Comment:
   Can we delete it? It doesn't seem to say anything.
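The hunk above says that each message is tagged with a schema version and that the client uses that version to retrieve the matching `SchemaInfo` before deserializing. The following self-contained Java sketch models only that lookup. `Registry` and `TaggedMessage` are hypothetical names, the schema definitions are plain strings, and the "version equals list index" rule is an assumption for illustration rather than how Pulsar represents versions.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaVersionSketch {
    /** Hypothetical in-memory registry: the list index doubles as the schema version. */
    static final class Registry {
        private final List<String> definitions = new ArrayList<>();

        /** Stores a definition and returns its version; re-registering the latest one is a no-op. */
        long register(String definition) {
            int last = definitions.size() - 1;
            if (last >= 0 && definitions.get(last).equals(definition)) {
                return last;
            }
            definitions.add(definition);
            return definitions.size() - 1;
        }

        /** Returns the definition that a message with the given version was written with. */
        String lookup(long version) {
            if (version < 0 || version >= definitions.size()) {
                throw new IllegalArgumentException("unknown schema version " + version);
            }
            return definitions.get((int) version);
        }
    }

    /** Hypothetical message: a payload plus the schema version it was produced with. */
    static final class TaggedMessage {
        final long schemaVersion;
        final String payload;

        TaggedMessage(long schemaVersion, String payload) {
            this.schemaVersion = schemaVersion;
            this.payload = payload;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        long v0 = registry.register("User{name}");
        long v1 = registry.register("User{name, age}");

        // An old message is still read with the schema it was written with.
        TaggedMessage oldMessage = new TaggedMessage(v0, "alice");
        TaggedMessage newMessage = new TaggedMessage(v1, "bob,30");
        System.out.println(registry.lookup(oldMessage.schemaVersion));
        System.out.println(registry.lookup(newMessage.schemaVersion));
    }
}
```

The point of the model is that consumers never guess the writer's schema: the version carried by the message selects it, which is what lets old and new messages coexist in one topic.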





[GitHub] [pulsar] momo-jun commented on pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#issuecomment-1301791370

   @RobertIndie Thanks for your review and comments. I addressed some of them and will leave the others to the next follow-up PR, because this PR focuses on the information architecture changes, and some changes (except the WHAT/WHY in the Overview) are copy-and-paste work among topics. A thorough content review will be done in the following week through Google Docs to add or change more details, and I will invite you as a reviewer :)




[GitHub] [pulsar] momo-jun commented on pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#issuecomment-1302892909

   @congbobo184 @RobertIndie I've created a [Google doc](https://docs.google.com/document/d/1YYansXopyVV66NaQ6TR9tfpq9MEhcbvCqv8ByFWuadA/edit?usp=sharing) to start the review/revise process on the content and added Zike's comments there. We can collaborate to review/revise in detail there.




[GitHub] [pulsar] RobertIndie commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
RobertIndie commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1012464500


##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
 sidebar_label: "Get started"
 ---
 
-This chapter introduces Pulsar schemas and explains why they are important. 
 
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
 
-Type safety is extremely important in any application built around a message bus like Pulsar. 
 
-Producers and consumers need some kind of mechanism for coordinating types at the topic level to avoid various potential problems arising. For example, serialization and deserialization issues. 
+This hands-on tutorial provides instructions and examples on how to construct and customize schemas.
 
-Applications typically adopt one of the following approaches to guarantee type safety in messaging. Both approaches are available in Pulsar, and you're free to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
 
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string schema](schema-understand.md#primitive-type) and use it to produce and consume messages in Java.
 
-### Client-side approach
+1. Create a producer with a string schema and send messages.
 
-Producers and consumers are responsible for not only serializing and deserializing messages (which consist of raw bytes) but also "knowing" which types are being transmitted via which topics. 
+   ```java
+   Producer<String> producer = client.newProducer(Schema.STRING).create();
+   producer.newMessage().value("Hello Pulsar!").send();
+   ```
 
-If a producer is sending temperature sensor data on the topic `topic-1`, consumers of that topic will run into trouble if they attempt to parse that data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.  
 
-Producers and consumers can send and receive messages consisting of raw byte arrays and leave all type safety enforcement to the application on an "out-of-band" basis.
+   ```java
+   Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+   consumer.receive();
+   ```
 
-### Server-side approach 
+## Construct a key/value schema
 
-Producers and consumers inform the system which data types can be transmitted via the topic. 
+This example shows how to construct a [key/value schema](schema-understand.md#keyvalue-schema) and use it to produce and consume messages in Java.
 
-With this approach, the messaging system enforces type safety and ensures that producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
 
-Pulsar has a built-in **schema registry** that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic.
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.INLINE
+   );
+   ```
 
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
 
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and sends bytes as outputs. While data has meaning beyond bytes, you need to parse data and might encounter parse exceptions which mainly occur in the following situations:
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+   ```
 
-* The field does not exist
+3. Produce messages using a key/value schema.
 
-* The field type has changed (for example, `string` is changed to `int`)
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );

Review Comment:
   ```suggestion
   ```
   We have already created the schema in the previous section.



##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
 sidebar_label: "Get started"
 ---
 
-This chapter introduces Pulsar schemas and explains why they are important. 
 
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
 
-Type safety is extremely important in any application built around a message bus like Pulsar. 
 
-Producers and consumers need some kind of mechanism for coordinating types at the topic level to avoid various potential problems arising. For example, serialization and deserialization issues. 
+This hands-on tutorial provides instructions and examples on how to construct and customize schemas.
 
-Applications typically adopt one of the following approaches to guarantee type safety in messaging. Both approaches are available in Pulsar, and you're free to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
 
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string schema](schema-understand.md#primitive-type) and use it to produce and consume messages in Java.
 
-### Client-side approach
+1. Create a producer with a string schema and send messages.
 
-Producers and consumers are responsible for not only serializing and deserializing messages (which consist of raw bytes) but also "knowing" which types are being transmitted via which topics. 
+   ```java
+   Producer<String> producer = client.newProducer(Schema.STRING).create();
+   producer.newMessage().value("Hello Pulsar!").send();
+   ```
 
-If a producer is sending temperature sensor data on the topic `topic-1`, consumers of that topic will run into trouble if they attempt to parse that data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.  
 
-Producers and consumers can send and receive messages consisting of raw byte arrays and leave all type safety enforcement to the application on an "out-of-band" basis.
+   ```java
+   Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+   consumer.receive();
+   ```
 
-### Server-side approach 
+## Construct a key/value schema
 
-Producers and consumers inform the system which data types can be transmitted via the topic. 
+This example shows how to construct a [key/value schema](schema-understand.md#keyvalue-schema) and use it to produce and consume messages in Java.
 
-With this approach, the messaging system enforces type safety and ensures that producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
 
-Pulsar has a built-in **schema registry** that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic.
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.INLINE
+   );
+   ```
 
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
 
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and sends bytes as outputs. While data has meaning beyond bytes, you need to parse data and might encounter parse exceptions which mainly occur in the following situations:
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+   ```
 
-* The field does not exist
+3. Produce messages using a key/value schema.
 
-* The field type has changed (for example, `string` is changed to `int`)
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
 
-There are a few methods to prevent and overcome these exceptions, for example, you can catch exceptions when parsing errors, which makes code hard to maintain; or you can adopt a schema management system to perform schema evolution, not to break downstream applications, and enforces type safety to max extend in the language you are using, the solution is Pulsar Schema.
+   Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
+       .topic(TOPIC)
+       .create();
 
-Pulsar schema enables you to use language-specific types of data when constructing and handling messages from simple types like `string` to more complex application-specific types. 
+   final int key = 100;
+   final String value = "value-100";
+
+   // send the key/value message
+   producer.newMessage()
+   .value(new KeyValue(key, value))
+   .send();
+   ```
+
+4. Consume messages using a key/value schema.
+
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+
+   Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
+       ...
+       .topic(TOPIC)
+       .subscriptionName(SubscriptionName).subscribe();
+
+   // receive key/value pair
+   Message<KeyValue<Integer, String>> msg = consumer.receive();
+   KeyValue<Integer, String> kv = msg.getValue();
+   ```
+
+## Construct a struct schema
+
+This example shows how to construct a [struct schema](schema-understand.md#struct-schema) and use it to produce and consume messages using different methods.
+
+````mdx-code-block
+<Tabs 
+  defaultValue="static"
+  values={[{"label":"static","value":"static"},{"label":"generic","value":"generic"},{"label":"SchemaDefinition","value":"SchemaDefinition"}]}>
+
+<TabItem value="static">
+
+You can predefine the `struct` schema, which can be a POJO in Java, a `struct` in Go, or classes generated by Avro or Protobuf tools. 
+
+**Example** 
+
+Pulsar gets the schema definition from the predefined `struct` using an Avro library. The schema definition is the schema data stored as a part of the `SchemaInfo`.
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+   ```java
+   @Builder
+   @AllArgsConstructor
+   @NoArgsConstructor
+   public static class User {
+       String name;
+       int age;
+   }
+   ```
+
+2. Create a producer with a `struct` schema and send messages.
+
+   ```java
+   Producer<User> producer = client.newProducer(Schema.AVRO(User.class)).create();
+   producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+   ```
+
+3. Create a consumer with a `struct` schema and receive messages
+
+   ```java
+   Consumer<User> consumer = client.newConsumer(Schema.AVRO(User.class)).subscribe();
+   User user = consumer.receive().getValue();
+   ```
+
+</TabItem>
+<TabItem value="generic">
+
+Sometimes applications do not have pre-defined structs, and you can use this method to define schema and access data.
+
+You can define the `struct` schema using the `GenericSchemaBuilder`, generate a generic struct using `GenericRecordBuilder` and consume messages into `GenericRecord`.
 
 **Example** 
 
-You can use the _User_ class to define the messages sent to Pulsar topics.
+1. Use `RecordSchemaBuilder` to build a schema.
+
+   ```java
+   RecordSchemaBuilder recordSchemaBuilder = SchemaBuilder.record("schemaName");
+   recordSchemaBuilder.field("intField").type(SchemaType.INT32);
+   SchemaInfo schemaInfo = recordSchemaBuilder.build(SchemaType.AVRO);
+
+   Producer<GenericRecord> producer = client.newProducer(Schema.generic(schemaInfo)).create();
+   ```
+
+2. Use `RecordBuilder` to build the struct records.
+
+   ```java
+   producer.newMessage().value(schema.newRecordBuilder()
+               .set("intField", 32)
+               .build()).send();
+   ```
+
+</TabItem>
+<TabItem value="SchemaDefinition">
+
+You can define the `schemaDefinition` to generate a `struct` schema.
+
+**Example** 
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+   ```java
+   @Builder
+   @AllArgsConstructor
+   @NoArgsConstructor
+   public static class User {
+       String name;
+       int age;
+   }
+   ```
+
+2. Create a producer with a `SchemaDefinition` and send messages.
+
+   ```java
+   SchemaDefinition<User> schemaDefinition = SchemaDefinition.<User>builder().withPojo(User.class).build();
+   Producer<User> producer = client.newProducer(Schema.AVRO(schemaDefinition)).create();
+   producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+   ```
+
+3. Create a consumer with a `SchemaDefinition` schema and receive messages
+
+   ```java
+   SchemaDefinition<User> schemaDefinition = SchemaDefinition.<User>builder().withPojo(User.class).build();
+   Consumer<User> consumer = client.newConsumer(Schema.AVRO(schemaDefinition)).subscribe();
+   User user = consumer.receive().getValue();
+   ```
+
+</TabItem>
+
+</Tabs>
+````
+
+### Avro schema using Java
+
+Suppose you have a `SensorReading` class as follows, and you'd like to transmit it over a Pulsar topic.
 
 ```java
-public class User {
-    String name;
-    int age;
+public class SensorReading {
+    public float temperature;
+
+    public SensorReading(float temperature) {
+        this.temperature = temperature;
+    }
+
+    // A no-arg constructor is required
+    public SensorReading() {
+    }
+
+    public float getTemperature() {
+        return temperature;
+    }
+
+    public void setTemperature(float temperature) {
+        this.temperature = temperature;
+    }
 }
 ```
 
-When constructing a producer with the _User_ class, you can specify a schema or not as below.
+Create a `Producer<SensorReading>` (or `Consumer<SensorReading>`) like this:
+
+```java
+Producer<SensorReading> producer = client.newProducer(JSONSchema.of(SensorReading.class))

Review Comment:
   ```suggestion
   Producer<SensorReading> producer = client.newProducer(AvroSchema.of(SensorReading.class))
   ```



##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
 sidebar_label: "Get started"
 ---
 
-This chapter introduces Pulsar schemas and explains why they are important. 
 
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
 
-Type safety is extremely important in any application built around a message bus like Pulsar. 
 
-Producers and consumers need some kind of mechanism for coordinating types at the topic level to avoid various potential problems arising. For example, serialization and deserialization issues. 
+This hands-on tutorial provides instructions and examples on how to construct and customize schemas.
 
-Applications typically adopt one of the following approaches to guarantee type safety in messaging. Both approaches are available in Pulsar, and you're free to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
 
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string schema](schema-understand.md#primitive-type) and use it to produce and consume messages in Java.
 
-### Client-side approach
+1. Create a producer with a string schema and send messages.
 
-Producers and consumers are responsible for not only serializing and deserializing messages (which consist of raw bytes) but also "knowing" which types are being transmitted via which topics. 
+   ```java
+   Producer<String> producer = client.newProducer(Schema.STRING).create();
+   producer.newMessage().value("Hello Pulsar!").send();
+   ```
 
-If a producer is sending temperature sensor data on the topic `topic-1`, consumers of that topic will run into trouble if they attempt to parse that data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.  
 
-Producers and consumers can send and receive messages consisting of raw byte arrays and leave all type safety enforcement to the application on an "out-of-band" basis.
+   ```java
+   Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+   consumer.receive();
+   ```
 
-### Server-side approach 
+## Construct a key/value schema
 
-Producers and consumers inform the system which data types can be transmitted via the topic. 
+This example shows how to construct a [key/value schema](schema-understand.md#keyvalue-schema) and use it to produce and consume messages in Java.
 
-With this approach, the messaging system enforces type safety and ensures that producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
 
-Pulsar has a built-in **schema registry** that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic.
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.INLINE
+   );
+   ```
 
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
 
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and sends bytes as outputs. While data has meaning beyond bytes, you need to parse data and might encounter parse exceptions which mainly occur in the following situations:
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+   ```
 
-* The field does not exist
+3. Produce messages using a key/value schema.
 
-* The field type has changed (for example, `string` is changed to `int`)
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
 
-There are a few methods to prevent and overcome these exceptions, for example, you can catch exceptions when parsing errors, which makes code hard to maintain; or you can adopt a schema management system to perform schema evolution, not to break downstream applications, and enforces type safety to max extend in the language you are using, the solution is Pulsar Schema.
+   Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
+       .topic(TOPIC)
+       .create();
 
-Pulsar schema enables you to use language-specific types of data when constructing and handling messages from simple types like `string` to more complex application-specific types. 
+   final int key = 100;
+   final String value = "value-100";
+
+   // send the key/value message
+   producer.newMessage()
+   .value(new KeyValue(key, value))
+   .send();
+   ```
+
+4. Consume messages using a key/value schema.
+
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+
+   Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
+       ...
+       .topic(TOPIC)
+       .subscriptionName(SubscriptionName).subscribe();
+
+   // receive key/value pair
+   Message<KeyValue<Integer, String>> msg = consumer.receive();
+   KeyValue<Integer, String> kv = msg.getValue();
+   ```
+
+## Construct a struct schema
+
+This example shows how to construct a [struct schema](schema-understand.md#struct-schema) and use it to produce and consume messages using different methods.
+
+````mdx-code-block
+<Tabs 
+  defaultValue="static"
+  values={[{"label":"static","value":"static"},{"label":"generic","value":"generic"},{"label":"SchemaDefinition","value":"SchemaDefinition"}]}>
+
+<TabItem value="static">
+
+You can predefine the `struct` schema, which can be a POJO in Java, a `struct` in Go, or classes generated by Avro or Protobuf tools. 
+
+**Example** 
+
+Pulsar gets the schema definition from the predefined `struct` using an Avro library. The schema definition is the schema data stored as a part of the `SchemaInfo`.
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+   ```java
+   @Builder
+   @AllArgsConstructor
+   @NoArgsConstructor
+   public static class User {
+       String name;
+       int age;
+   }
+   ```
+
+2. Create a producer with a `struct` schema and send messages.
+
+   ```java
+   Producer<User> producer = client.newProducer(Schema.AVRO(User.class)).create();
+   producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+   ```
+
+3. Create a consumer with a `struct` schema and receive messages
+
+   ```java
+   Consumer<User> consumer = client.newConsumer(Schema.AVRO(User.class)).subscribe();
+   User user = consumer.receive().getValue();
+   ```
+
+</TabItem>
+<TabItem value="generic">
+
+Sometimes applications do not have pre-defined structs, and you can use this method to define schema and access data.
+
+You can define the `struct` schema using the `GenericSchemaBuilder`, generate a generic struct using `GenericRecordBuilder` and consume messages into `GenericRecord`.
 
 **Example** 
 
-You can use the _User_ class to define the messages sent to Pulsar topics.
+1. Use `RecordSchemaBuilder` to build a schema.
+
+   ```java
+   RecordSchemaBuilder recordSchemaBuilder = SchemaBuilder.record("schemaName");
+   recordSchemaBuilder.field("intField").type(SchemaType.INT32);
+   SchemaInfo schemaInfo = recordSchemaBuilder.build(SchemaType.AVRO);
+
+   Producer<GenericRecord> producer = client.newProducer(Schema.generic(schemaInfo)).create();
+   ```
+
+2. Use `RecordBuilder` to build the struct records.
+
+   ```java
+   producer.newMessage().value(schema.newRecordBuilder()
+               .set("intField", 32)
+               .build()).send();
+   ```
+
+</TabItem>
+<TabItem value="SchemaDefinition">
+
+You can define the `schemaDefinition` to generate a `struct` schema.
+
+**Example** 
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+   ```java
+   @Builder
+   @AllArgsConstructor
+   @NoArgsConstructor
+   public static class User {
+       String name;
+       int age;
+   }
+   ```
+
+2. Create a producer with a `SchemaDefinition` and send messages.
+
+   ```java
+   SchemaDefinition<User> schemaDefinition = SchemaDefinition.<User>builder().withPojo(User.class).build();
+   Producer<User> producer = client.newProducer(Schema.AVRO(schemaDefinition)).create();
+   producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+   ```
+
+3. Create a consumer with a `SchemaDefinition` schema and receive messages
+
+   ```java
+   SchemaDefinition<User> schemaDefinition = SchemaDefinition.<User>builder().withPojo(User.class).build();
+   Consumer<User> consumer = client.newConsumer(Schema.AVRO(schemaDefinition)).subscribe();
+   User user = consumer.receive().getValue();
+   ```
+
+</TabItem>
+
+</Tabs>
+````
+
+### Avro schema using Java
+
+Suppose you have a `SensorReading` class as follows, and you'd like to transmit it over a Pulsar topic.
 
 ```java
-public class User {
-    String name;
-    int age;
+public class SensorReading {
+    public float temperature;
+
+    public SensorReading(float temperature) {
+        this.temperature = temperature;
+    }
+
+    // A no-arg constructor is required
+    public SensorReading() {
+    }
+
+    public float getTemperature() {
+        return temperature;
+    }
+
+    public void setTemperature(float temperature) {
+        this.temperature = temperature;
+    }
 }
 ```
 
-When constructing a producer with the _User_ class, you can specify a schema or not as below.
+Create a `Producer<SensorReading>` (or `Consumer<SensorReading>`) like this:
+
+```java
+Producer<SensorReading> producer = client.newProducer(JSONSchema.of(SensorReading.class))
+        .topic("sensor-readings")
+        .create();
+```
+
+The following schema formats are currently available for Java:
+
+* No schema or the byte array schema (which can be applied using `Schema.BYTES`):
+
+  ```java
+  Producer<byte[]> bytesProducer = client.newProducer(Schema.BYTES)
+      .topic("some-raw-bytes-topic")
+      .create();
+  ```
+
+  Or, equivalently:
+
+  ```java
+  Producer<byte[]> bytesProducer = client.newProducer()
+      .topic("some-raw-bytes-topic")
+      .create();
+  ```
+
+* `String` for normal UTF-8-encoded string data. Apply the schema using `Schema.STRING`:
+
+  ```java
+  Producer<String> stringProducer = client.newProducer(Schema.STRING)
+      .topic("some-string-topic")
+      .create();
+  ```
+
+* Create JSON schemas for POJOs using `Schema.JSON`. The following is an example.
+
+  ```java
+  Producer<MyPojo> pojoProducer = client.newProducer(Schema.JSON(MyPojo.class))
+      .topic("some-pojo-topic")
+      .create();
+  ```
+
+* Generate Protobuf schemas using `Schema.PROTOBUF`. The following example shows how to create the Protobuf schema and use it to instantiate a new producer:
+
+  ```java
+  Producer<MyProtobuf> protobufProducer = client.newProducer(Schema.PROTOBUF(MyProtobuf.class))
+      .topic("some-protobuf-topic")
+      .create();
+  ```
+
+* Define Avro schemas with `Schema.AVRO`. The following code snippet demonstrates how to create and use Avro schema.
 
-### Without schema
+  ```java
+  Producer<MyAvro> avroProducer = client.newProducer(Schema.AVRO(MyAvro.class))
+      .topic("some-avro-topic")
+      .create();
+  ```
 
-If you construct a producer without specifying a schema, then the producer can only produce messages of type `byte[]`. If you have a POJO class, you need to serialize the POJO into bytes before sending messages.
 
-**Example**
+### Avro schema using C++
+
+- The following example shows how to create a producer with an Avro schema.
+
+  ```cpp
+  static const std::string exampleSchema =
+      "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
+      "\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
+  Producer producer;
+  ProducerConfiguration producerConf;
+  producerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
+  client.createProducer("topic-avro", producerConf, producer);
+  ```
+
+- The following example shows how to create a consumer with an Avro schema.
+
+  ```cpp
+  static const std::string exampleSchema =
+      "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
+      "\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
+  ConsumerConfiguration consumerConf;
+  Consumer consumer;
+  consumerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
+  client.subscribe("topic-avro", "sub-2", consumerConf, consumer)
+  ```
+
+### ProtobufNative schema using C++

Review Comment:
   Do we also need to talk about `ProtobufNative schema using Java`?



##########
site2/docs/client-libraries-cpp.md:
##########
@@ -412,80 +412,4 @@ For complete examples, refer to [C++ client examples](https://github.com/apache/
 
 ## Schema
 
-This section describes some examples about schema. For more information about schema, see [Pulsar schema](schema-get-started.md).
-
-### Avro schema
-
-- The following example shows how to create a producer with an Avro schema.
-
-  ```cpp
-  static const std::string exampleSchema =
-      "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
-      "\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
-  Producer producer;
-  ProducerConfiguration producerConf;
-  producerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
-  client.createProducer("topic-avro", producerConf, producer);
-  ```
-
-- The following example shows how to create a consumer with an Avro schema.
-
-  ```cpp
-  static const std::string exampleSchema =
-      "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\","
-      "\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"int\"}]}";
-  ConsumerConfiguration consumerConf;
-  Consumer consumer;
-  consumerConf.setSchema(SchemaInfo(AVRO, "Avro", exampleSchema));
-  client.subscribe("topic-avro", "sub-2", consumerConf, consumer)
-  ```
-
-### ProtobufNative schema
-
-The following example shows how to create a producer and a consumer with a ProtobufNative schema.
-
-1. Generate the `User` class using Protobuf3 or later versions.
-
-   ```protobuf
-   syntax = "proto3";
-
-   message User {
-       string name = 1;
-       int32 age = 2;
-   }
-   ```
-
-2. Include the `ProtobufNativeSchema.h` in your source code. Ensure the Protobuf dependency has been added to your project.
-
-   ```cpp
-   #include <pulsar/ProtobufNativeSchema.h>
-   ```
-
-3. Create a producer to send a `User` instance.
-
-   ```cpp
-   ProducerConfiguration producerConf;
-   producerConf.setSchema(createProtobufNativeSchema(User::GetDescriptor()));
-   Producer producer;
-   client.createProducer("topic-protobuf", producerConf, producer);
-   User user;
-   user.set_name("my-name");
-   user.set_age(10);
-   std::string content;
-   user.SerializeToString(&content);
-   producer.send(MessageBuilder().setContent(content).build());
-   ```
-
-4. Create a consumer to receive a `User` instance.
-
-   ```cpp
-   ConsumerConfiguration consumerConf;
-   consumerConf.setSchema(createProtobufNativeSchema(User::GetDescriptor()));
-   consumerConf.setSubscriptionInitialPosition(InitialPositionEarliest);
-   Consumer consumer;
-   client.subscribe("topic-protobuf", "my-sub", consumerConf, consumer);
-   Message msg;
-   consumer.receive(msg);
-   User user2;
-   user2.ParseFromArray(msg.getData(), msg.getLength());
-   ```
+To work with [Pulsar schema](schema-overview.md) using C++ clients, see [Schema - Get started](schema-get-started.md). For specific schema types that C++ clients support, see [code](https://github.com/apache/pulsar-client-cpp/blob/main/include/pulsar/Schema.h#L51-L132).

Review Comment:
   Can we link to the C++ docs instead? This link is not stable: the line anchors will point to different code as the file changes.



##########
site2/docs/schema-understand.md:
##########
@@ -121,109 +81,15 @@ Currently, Pulsar supports the following complex types:
 | `keyvalue` | Represents a complex type of a key/value pair. |
 | `struct` | Handles structured data. It supports `AvroBaseStructSchema` and `ProtobufNativeSchema`. |
 
-#### keyvalue
-
-`Keyvalue` schema helps applications define schemas for both key and value. 
+#### `keyvalue` schema
 
-For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
+`Keyvalue` schema helps applications define schemas for both key and value. Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
 
-Pulsar provides the following methods to encode a key/value pair in messages:
+You can choose the encoding type when constructing the key/value schema.:
+* `INLINE` - Key/value pairs are encoded together in the message payload.
+* `SEPARATED` - see [Construct a key/value schema](schema-get-started.md#construct-a-keyvalue-schema).

Review Comment:
   Better to briefly explain `SEPARATED` here.
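
For illustration, here is a rough sketch of the difference, following the wording in the previous version of this doc (with `SEPARATED`, the key is encoded in the message key and the value in the message payload). It is a toy model in plain Java, not the Pulsar client: `ToyMessage`, `encodeInline`, and `encodeSeparated` are hypothetical names, and the length-prefixed `INLINE` layout is an assumption made only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy model only: these types are hypothetical and are not the Pulsar client API.
final class KeyValueEncodingSketch {

    // A simplified message: an optional key slot plus a payload.
    static final class ToyMessage {
        final byte[] key;      // null when the key travels inside the payload
        final byte[] payload;

        ToyMessage(byte[] key, byte[] payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    // INLINE: key and value are packed together into the payload.
    // The length-prefixed layout is an assumption made for this sketch.
    static ToyMessage encodeInline(int key, String value) {
        byte[] k = ByteBuffer.allocate(Integer.BYTES).putInt(key).array();
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + k.length + Integer.BYTES + v.length);
        buf.putInt(k.length).put(k).putInt(v.length).put(v);
        return new ToyMessage(null, buf.array());
    }

    // SEPARATED: the key goes in the message key, the value alone in the payload.
    static ToyMessage encodeSeparated(int key, String value) {
        byte[] k = ByteBuffer.allocate(Integer.BYTES).putInt(key).array();
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return new ToyMessage(k, v);
    }
}
```

In this model, a reader can inspect the key of a `SEPARATED` message without decoding the payload, whereas an `INLINE` message has to be unpacked first. A sentence along those lines in the doc might be enough.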



##########
site2/docs/schema-understand.md:
##########
@@ -43,74 +41,36 @@ This is the `SchemaInfo` of a string.
 ## Schema type
 
 Pulsar supports various schema types, which are mainly divided into two categories: 
-
-* Primitive type 
-
-* Complex type
+* [Primitive type](#primitive-type) 
+* [Complex type](#complex-type)
 
 ### Primitive type
 
-Currently, Pulsar supports the following primitive types:
-
-| Primitive Type | Description |
-|---|---|
-| `BOOLEAN` | A binary value |
-| `INT8` | A 8-bit signed integer |
-| `INT16` | A 16-bit signed integer |
-| `INT32` | A 32-bit signed integer |
-| `INT64` | A 64-bit signed integer |
-| `FLOAT` | A single precision (32-bit) IEEE 754 floating-point number |
-| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
-| `BYTES` | A sequence of 8-bit unsigned bytes |
-| `STRING` | A Unicode character sequence |
-| `TIMESTAMP` (`DATE`, `TIME`) |  A logic type represents a specific instant in time with millisecond precision. <br />It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value | 
-| INSTANT | A single instantaneous point on the time-line with nanoseconds precision|
-| LOCAL_DATE | An immutable date-time object that represents a date, often viewed as year-month-day|
-| LOCAL_TIME | An immutable date-time object that represents a time, often viewed as hour-minute-second. Time is represented to nanosecond precision.|
-| LOCAL_DATE_TIME | An immutable date-time object that represents a date-time, often viewed as year-month-day-hour-minute-second |
-
-For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data. 
+The following table outlines the primitive types that Pulsar schema supports, and the conversions between **schema types** and **language-specific primitive types**.
+
+| Primitive Type | Description | Java Type| Python Type | Go Type |
+|---|---|---|---|---|
+| `BOOLEAN` | A binary value | boolean | bool | bool |
+| `INT8` | A 8-bit signed integer | byte | | int8 |
+| `INT16` | A 16-bit signed integer | short | | int16 |
+| `INT32` | A 32-bit signed integer | int | | int32 |
+| `INT64` | A 64-bit signed integer | long | | int64 |

Review Comment:
   ```suggestion
   | `INT8` | A 8-bit signed integer | byte | int | int8 |
   | `INT16` | A 16-bit signed integer | short | int | int16 |
   | `INT32` | A 32-bit signed integer | int | int | int32 |
   | `INT64` | A 64-bit signed integer | long | int | int64 |
   ```
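
One note on why `int` works for all four rows: Python's `int` is arbitrary-precision, whereas the Java column lists one fixed-width type per schema type. The sketch below only illustrates those widths and the range check they imply; the class and method names are hypothetical and nothing here is Pulsar API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative lookup only; the class and method names are hypothetical.
final class IntegerWidthSketch {

    // Bit width of the Java type listed in the table for each integer schema type.
    static Map<String, Integer> javaBitWidths() {
        Map<String, Integer> widths = new LinkedHashMap<>();
        widths.put("INT8", Byte.SIZE);     // byte
        widths.put("INT16", Short.SIZE);   // short
        widths.put("INT32", Integer.SIZE); // int
        widths.put("INT64", Long.SIZE);    // long
        return widths;
    }

    // True when the value fits in a signed integer of the given bit width.
    static boolean fits(long value, int bits) {
        if (bits >= Long.SIZE) {
            return true;
        }
        long min = -(1L << (bits - 1));
        long max = (1L << (bits - 1)) - 1;
        return value >= min && value <= max;
    }
}
```

For example, `fits(128, 8)` is false while `fits(127, 8)` is true, so an unbounded integer destined for an `INT8` field still needs a range check somewhere.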



##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
 sidebar_label: "Get started"
 ---
 
-This chapter introduces Pulsar schemas and explains why they are important. 
 
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
 
-Type safety is extremely important in any application built around a message bus like Pulsar. 
 
-Producers and consumers need some kind of mechanism for coordinating types at the topic level to avoid various potential problems arising. For example, serialization and deserialization issues. 
+This hands-on tutorial provides instructions and examples on how to construct and customize schemas.
 
-Applications typically adopt one of the following approaches to guarantee type safety in messaging. Both approaches are available in Pulsar, and you're free to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
 
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string schema](schema-understand.md#primitive-type) and use it to produce and consume messages in Java.
 
-### Client-side approach
+1. Create a producer with a string schema and send messages.
 
-Producers and consumers are responsible for not only serializing and deserializing messages (which consist of raw bytes) but also "knowing" which types are being transmitted via which topics. 
+   ```java
+   Producer<String> producer = client.newProducer(Schema.STRING).create();
+   producer.newMessage().value("Hello Pulsar!").send();
+   ```
 
-If a producer is sending temperature sensor data on the topic `topic-1`, consumers of that topic will run into trouble if they attempt to parse that data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.  
 
-Producers and consumers can send and receive messages consisting of raw byte arrays and leave all type safety enforcement to the application on an "out-of-band" basis.
+   ```java
+   Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+   consumer.receive();
+   ```
 
-### Server-side approach 
+## Construct a key/value schema
 
-Producers and consumers inform the system which data types can be transmitted via the topic. 
+This example shows how to construct a [key/value schema](schema-understand.md#keyvalue-schema) and use it to produce and consume messages in Java.
 
-With this approach, the messaging system enforces type safety and ensures that producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
 
-Pulsar has a built-in **schema registry** that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic.
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.INLINE
+   );
+   ```
 
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
 
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and sends bytes as outputs. While data has meaning beyond bytes, you need to parse data and might encounter parse exceptions which mainly occur in the following situations:
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+   ```
 
-* The field does not exist
+3. Produce messages using a key/value schema.
 
-* The field type has changed (for example, `string` is changed to `int`)
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
 
-There are a few methods to prevent and overcome these exceptions, for example, you can catch exceptions when parsing errors, which makes code hard to maintain; or you can adopt a schema management system to perform schema evolution, not to break downstream applications, and enforces type safety to max extend in the language you are using, the solution is Pulsar Schema.
+   Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
+       .topic(TOPIC)
+       .create();
 
-Pulsar schema enables you to use language-specific types of data when constructing and handling messages from simple types like `string` to more complex application-specific types. 
+   final int key = 100;
+   final String value = "value-100";
+
+   // send the key/value message
+   producer.newMessage()
+   .value(new KeyValue(key, value))
+   .send();
+   ```
+
+4. Consume messages using a key/value schema.
+
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+
+   Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
+       ...
+       .topic(TOPIC)
+       .subscriptionName(SubscriptionName).subscribe();
+
+   // receive key/value pair
+   Message<KeyValue<Integer, String>> msg = consumer.receive();
+   KeyValue<Integer, String> kv = msg.getValue();
+   ```
+
+## Construct a struct schema
+
+This example shows how to construct a [struct schema](schema-understand.md#struct-schema) and use it to produce and consume messages using different methods.
+
+````mdx-code-block
+<Tabs 
+  defaultValue="static"
+  values={[{"label":"static","value":"static"},{"label":"generic","value":"generic"},{"label":"SchemaDefinition","value":"SchemaDefinition"}]}>
+
+<TabItem value="static">
+
+You can predefine the `struct` schema, which can be a POJO in Java, a `struct` in Go, or classes generated by Avro or Protobuf tools. 
+
+**Example** 
+
+Pulsar gets the schema definition from the predefined `struct` using an Avro library. The schema definition is the schema data stored as a part of the `SchemaInfo`.
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+   ```java
+   @Builder
+   @AllArgsConstructor
+   @NoArgsConstructor
+   public static class User {
+       String name;
+       int age;
+   }
+   ```
+
+2. Create a producer with a `struct` schema and send messages.
+
+   ```java
+   Producer<User> producer = client.newProducer(Schema.AVRO(User.class)).create();
+   producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+   ```
+
+3. Create a consumer with a `struct` schema and receive messages
+
+   ```java
+   Consumer<User> consumer = client.newConsumer(Schema.AVRO(User.class)).subscribe();
+   User user = consumer.receive().getValue();
+   ```
+
+</TabItem>
+<TabItem value="generic">
+
+Sometimes applications do not have pre-defined structs, and you can use this method to define schema and access data.
+
+You can define the `struct` schema using the `GenericSchemaBuilder`, generate a generic struct using `GenericRecordBuilder` and consume messages into `GenericRecord`.
 
 **Example** 
 
-You can use the _User_ class to define the messages sent to Pulsar topics.
+1. Use `RecordSchemaBuilder` to build a schema.
+
+   ```java
+   RecordSchemaBuilder recordSchemaBuilder = SchemaBuilder.record("schemaName");
+   recordSchemaBuilder.field("intField").type(SchemaType.INT32);
+   SchemaInfo schemaInfo = recordSchemaBuilder.build(SchemaType.AVRO);
+
+   Producer<GenericRecord> producer = client.newProducer(Schema.generic(schemaInfo)).create();
+   ```
+
+2. Use `RecordBuilder` to build the struct records.
+
+   ```java
+   producer.newMessage().value(schema.newRecordBuilder()
+               .set("intField", 32)
+               .build()).send();
+   ```
+
+</TabItem>
+<TabItem value="SchemaDefinition">
+
+You can define the `schemaDefinition` to generate a `struct` schema.
+
+**Example** 
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+   ```java
+   @Builder
+   @AllArgsConstructor
+   @NoArgsConstructor
+   public static class User {
+       String name;
+       int age;
+   }
+   ```
+
+2. Create a producer with a `SchemaDefinition` and send messages.
+
+   ```java
+   SchemaDefinition<User> schemaDefinition = SchemaDefinition.<User>builder().withPojo(User.class).build();
+   Producer<User> producer = client.newProducer(Schema.AVRO(schemaDefinition)).create();
+   producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+   ```
+
+3. Create a consumer with a `SchemaDefinition` schema and receive messages
+
+   ```java
+   SchemaDefinition<User> schemaDefinition = SchemaDefinition.<User>builder().withPojo(User.class).build();
+   Consumer<User> consumer = client.newConsumer(Schema.AVRO(schemaDefinition)).subscribe();
+   User user = consumer.receive().getValue();
+   ```
+
+</TabItem>
+
+</Tabs>
+````
+
+### Avro schema using Java
+
+Suppose you have a `SensorReading` class as follows, and you'd like to transmit it over a Pulsar topic.
 
 ```java
-public class User {
-    String name;
-    int age;
+public class SensorReading {
+    public float temperature;
+
+    public SensorReading(float temperature) {
+        this.temperature = temperature;
+    }
+
+    // A no-arg constructor is required
+    public SensorReading() {
+    }
+
+    public float getTemperature() {
+        return temperature;
+    }
+
+    public void setTemperature(float temperature) {
+        this.temperature = temperature;
+    }
 }
 ```
 
-When constructing a producer with the _User_ class, you can specify a schema or not as below.
+Create a `Producer<SensorReading>` (or `Consumer<SensorReading>`) like this:
+
+```java
+Producer<SensorReading> producer = client.newProducer(JSONSchema.of(SensorReading.class))
+        .topic("sensor-readings")
+        .create();
+```
+
+The following schema formats are currently available for Java:

Review Comment:
   This section seems to introduce the Avro schema, so why are other schemas also introduced here?
   
   If we want to indicate that all these schemas are based on the Avro protocol, then I think it's better to use `Avro based schema ...` as the title. Otherwise, it will confuse users, because there is also an `AvroSchema` based on the Avro protocol.





[GitHub] [pulsar] momo-jun commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1013655561


##########
site2/docs/schema-understand.md:
##########
@@ -121,109 +81,15 @@ Currently, Pulsar supports the following complex types:
 | `keyvalue` | Represents a complex type of a key/value pair. |
 | `struct` | Handles structured data. It supports `AvroBaseStructSchema` and `ProtobufNativeSchema`. |
 
-#### keyvalue
-
-`Keyvalue` schema helps applications define schemas for both key and value. 
+#### `keyvalue` schema
 
-For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
+`Keyvalue` schema helps applications define schemas for both key and value. Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
 
-Pulsar provides the following methods to encode a key/value pair in messages:
+You can choose the encoding type when constructing the key/value schema.:
+* `INLINE` - Key/value pairs are encoded together in the message payload.
+* `SEPARATED` - see [Construct a key/value schema](schema-get-started.md#construct-a-keyvalue-schema).
 
-* `INLINE`
-
-* `SEPARATED`
-
-You can choose the encoding type when constructing the key/value schema.
-
-````mdx-code-block
-<Tabs 
-  defaultValue="INLINE"
-  values={[{"label":"INLINE","value":"INLINE"},{"label":"SEPARATED","value":"SEPARATED"}]}>
-
-<TabItem value="INLINE">
-
-Key/value pairs are encoded together in the message payload.
-
-</TabItem>
-<TabItem value="SEPARATED">
-
-Key is encoded in the message key and the value is encoded in the message payload. 
-  
-**Example**
-    
-This example shows how to construct a key/value schema and then use it to produce and consume messages.
-
-1. Construct a key/value schema with `INLINE` encoding type.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.INLINE
-   );
-   ```
-
-2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.SEPARATED
-   );
-   ```
-
-3. Produce messages using a key/value schema.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.SEPARATED
-   );
-
-   Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
-       .topic(TOPIC)
-       .create();
-
-   final int key = 100;
-   final String value = "value-100";
-
-   // send the key/value message
-   producer.newMessage()
-   .value(new KeyValue(key, value))
-   .send();
-   ```
-
-4. Consume messages using a key/value schema.
-
-   ```java
-   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
-   Schema.INT32,
-   Schema.STRING,
-   KeyValueEncodingType.SEPARATED
-   );
-
-   Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
-       ...
-       .topic(TOPIC)
-       .subscriptionName(SubscriptionName).subscribe();
-
-   // receive key/value pair
-   Message<KeyValue<Integer, String>> msg = consumer.receive();
-   KeyValue<Integer, String> kv = msg.getValue();
-   ```
-
-</TabItem>
-
-</Tabs>
-````
-
-#### struct
-
-This section describes the details of type and usage of the `struct` schema.
-
-##### Type
+#### `struct` schema
 
 `struct` schema supports `AvroBaseStructSchema` and `ProtobufNativeSchema`.

Review Comment:
   I've added the comment in the [Google doc](https://docs.google.com/document/d/1YYansXopyVV66NaQ6TR9tfpq9MEhcbvCqv8ByFWuadA/edit?usp=sharing) as part of the content review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] momo-jun commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1012619499


##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
 sidebar_label: "Get started"
 ---
 
-This chapter introduces Pulsar schemas and explains why they are important. 
 
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
 
-Type safety is extremely important in any application built around a message bus like Pulsar. 
 
-Producers and consumers need some kind of mechanism for coordinating types at the topic level to avoid various potential problems arising. For example, serialization and deserialization issues. 
+This hands-on tutorial provides instructions and examples on how to construct and customize schemas.
 
-Applications typically adopt one of the following approaches to guarantee type safety in messaging. Both approaches are available in Pulsar, and you're free to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
 
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string schema](schema-understand.md#primitive-type) and use it to produce and consume messages in Java.
 
-### Client-side approach
+1. Create a producer with a string schema and send messages.
 
-Producers and consumers are responsible for not only serializing and deserializing messages (which consist of raw bytes) but also "knowing" which types are being transmitted via which topics. 
+   ```java
+   Producer<String> producer = client.newProducer(Schema.STRING).create();
+   producer.newMessage().value("Hello Pulsar!").send();
+   ```
 
-If a producer is sending temperature sensor data on the topic `topic-1`, consumers of that topic will run into trouble if they attempt to parse that data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.  
 
-Producers and consumers can send and receive messages consisting of raw byte arrays and leave all type safety enforcement to the application on an "out-of-band" basis.
+   ```java
+   Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+   consumer.receive();
+   ```
 
-### Server-side approach 
+## Construct a key/value schema
 
-Producers and consumers inform the system which data types can be transmitted via the topic. 
+This example shows how to construct a [key/value schema](schema-understand.md#keyvalue-schema) and use it to produce and consume messages in Java.
 
-With this approach, the messaging system enforces type safety and ensures that producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
 
-Pulsar has a built-in **schema registry** that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic.
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.INLINE
+   );
+   ```
 
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
 
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and sends bytes as outputs. While data has meaning beyond bytes, you need to parse data and might encounter parse exceptions which mainly occur in the following situations:
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+   ```
 
-* The field does not exist
+3. Produce messages using a key/value schema.
 
-* The field type has changed (for example, `string` is changed to `int`)
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
 
-There are a few methods to prevent and overcome these exceptions, for example, you can catch exceptions when parsing errors, which makes code hard to maintain; or you can adopt a schema management system to perform schema evolution, not to break downstream applications, and enforces type safety to max extend in the language you are using, the solution is Pulsar Schema.
+   Producer<KeyValue<Integer, String>> producer = client.newProducer(kvSchema)
+       .topic(TOPIC)
+       .create();
 
-Pulsar schema enables you to use language-specific types of data when constructing and handling messages from simple types like `string` to more complex application-specific types. 
+   final int key = 100;
+   final String value = "value-100";
+
+   // send the key/value message
+   producer.newMessage()
+   .value(new KeyValue<>(key, value))
+   .send();
+   ```
+
+4. Consume messages using a key/value schema.
+
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+
+   Consumer<KeyValue<Integer, String>> consumer = client.newConsumer(kvSchema)
+       ...
+       .topic(TOPIC)
+       .subscriptionName(SubscriptionName).subscribe();
+
+   // receive key/value pair
+   Message<KeyValue<Integer, String>> msg = consumer.receive();
+   KeyValue<Integer, String> kv = msg.getValue();
+   ```
+
+## Construct a struct schema
+
+This example shows how to construct a [struct schema](schema-understand.md#struct-schema) and use it to produce and consume messages using different methods.
+
+````mdx-code-block
+<Tabs 
+  defaultValue="static"
+  values={[{"label":"static","value":"static"},{"label":"generic","value":"generic"},{"label":"SchemaDefinition","value":"SchemaDefinition"}]}>
+
+<TabItem value="static">
+
+You can predefine the `struct` schema, which can be a POJO in Java, a `struct` in Go, or classes generated by Avro or Protobuf tools. 
+
+**Example** 
+
+Pulsar gets the schema definition from the predefined `struct` using an Avro library. The schema definition is the schema data stored as a part of the `SchemaInfo`.
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+   ```java
+   @Builder
+   @AllArgsConstructor
+   @NoArgsConstructor
+   public static class User {
+       String name;
+       int age;
+   }
+   ```
+
+2. Create a producer with a `struct` schema and send messages.
+
+   ```java
+   Producer<User> producer = client.newProducer(Schema.AVRO(User.class)).create();
+   producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+   ```
+
+3. Create a consumer with a `struct` schema and receive messages
+
+   ```java
+   Consumer<User> consumer = client.newConsumer(Schema.AVRO(User.class)).subscribe();
+   User user = consumer.receive().getValue();
+   ```
+
+</TabItem>
+<TabItem value="generic">
+
+Sometimes applications do not have pre-defined structs, and you can use this method to define schema and access data.
+
+You can define the `struct` schema using the `GenericSchemaBuilder`, generate a generic struct using `GenericRecordBuilder` and consume messages into `GenericRecord`.
 
 **Example** 
 
-You can use the _User_ class to define the messages sent to Pulsar topics.
+1. Use `RecordSchemaBuilder` to build a schema.
+
+   ```java
+   RecordSchemaBuilder recordSchemaBuilder = SchemaBuilder.record("schemaName");
+   recordSchemaBuilder.field("intField").type(SchemaType.INT32);
+   SchemaInfo schemaInfo = recordSchemaBuilder.build(SchemaType.AVRO);
+
+   Producer<GenericRecord> producer = client.newProducer(Schema.generic(schemaInfo)).create();
+   ```
+
+2. Use `RecordBuilder` to build the struct records.
+
+   ```java
+   producer.newMessage().value(Schema.generic(schemaInfo).newRecordBuilder()
+               .set("intField", 32)
+               .build()).send();
+   ```
+
+</TabItem>
+<TabItem value="SchemaDefinition">
+
+You can define the `schemaDefinition` to generate a `struct` schema.
+
+**Example** 
+
+1. Create the _User_ class to define the messages sent to Pulsar topics.
+
+   ```java
+   @Builder
+   @AllArgsConstructor
+   @NoArgsConstructor
+   public static class User {
+       String name;
+       int age;
+   }
+   ```
+
+2. Create a producer with a `SchemaDefinition` and send messages.
+
+   ```java
+   SchemaDefinition<User> schemaDefinition = SchemaDefinition.<User>builder().withPojo(User.class).build();
+   Producer<User> producer = client.newProducer(Schema.AVRO(schemaDefinition)).create();
+   producer.newMessage().value(User.builder().name("pulsar-user").age(1).build()).send();
+   ```
+
+3. Create a consumer with a `SchemaDefinition` schema and receive messages
+
+   ```java
+   SchemaDefinition<User> schemaDefinition = SchemaDefinition.<User>builder().withPojo(User.class).build();
+   Consumer<User> consumer = client.newConsumer(Schema.AVRO(schemaDefinition)).subscribe();
+   User user = consumer.receive().getValue();
+   ```
+
+</TabItem>
+
+</Tabs>
+````
+
+### Avro schema using Java
+
+Suppose you have a `SensorReading` class as follows, and you'd like to transmit it over a Pulsar topic.
 
 ```java
-public class User {
-    String name;
-    int age;
+public class SensorReading {
+    public float temperature;
+
+    public SensorReading(float temperature) {
+        this.temperature = temperature;
+    }
+
+    // A no-arg constructor is required
+    public SensorReading() {
+    }
+
+    public float getTemperature() {
+        return temperature;
+    }
+
+    public void setTemperature(float temperature) {
+        this.temperature = temperature;
+    }
 }
 ```
 
-When constructing a producer with the _User_ class, you can specify a schema or not as below.
+Create a `Producer<SensorReading>` (or `Consumer<SensorReading>`) like this:
+
+```java
+Producer<SensorReading> producer = client.newProducer(JSONSchema.of(SensorReading.class))
+        .topic("sensor-readings")
+        .create();
+```
+
+The following schema formats are currently available for Java:

Review Comment:
   Agree. I added a new heading to introduce these examples as a quick tweak. Most content changes in this PR are copied and pasted to implement a quick information architecture change. A thorough content review will be done in the following week through Google Docs :)





[GitHub] [pulsar] momo-jun commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1012613663


##########
site2/docs/schema-understand.md:
##########
@@ -121,109 +81,15 @@ Currently, Pulsar supports the following complex types:
 | `keyvalue` | Represents a complex type of a key/value pair. |
 | `struct` | Handles structured data. It supports `AvroBaseStructSchema` and `ProtobufNativeSchema`. |
 
-#### keyvalue
-
-`Keyvalue` schema helps applications define schemas for both key and value. 
+#### `keyvalue` schema
 
-For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
+`Keyvalue` schema helps applications define schemas for both key and value. Pulsar stores the `SchemaInfo` of key schema and the `SchemaInfo` of value schema together.
 
-Pulsar provides the following methods to encode a key/value pair in messages:
+You can choose the encoding type when constructing the key/value schema:
+* `INLINE` - Key/value pairs are encoded together in the message payload.
+* `SEPARATED` - see [Construct a key/value schema](schema-get-started.md#construct-a-keyvalue-schema).

Review Comment:
   Agree. I will leave it to the next round of content review and revision because this PR focuses more on the information architecture changes.
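
For reference while the wording is being revised, the difference between the two encoding types can be sketched in plain Java. This is only an illustration and does not use the Pulsar client: `KeyValueEncodingSketch`, `EncodedMessage`, and the length-prefixed payload layout are assumptions made for this sketch, not a statement of Pulsar's wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: none of these types belong to the Pulsar client API.
final class KeyValueEncodingSketch {

    // A simplified message: an optional message key plus a payload.
    static final class EncodedMessage {
        final byte[] messageKey; // null when the key travels inside the payload
        final byte[] payload;

        EncodedMessage(byte[] messageKey, byte[] payload) {
            this.messageKey = messageKey;
            this.payload = payload;
        }
    }

    static byte[] encodeInt(int key) {
        return ByteBuffer.allocate(4).putInt(key).array();
    }

    static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // INLINE: key and value are encoded together in the message payload.
    // The length prefixes are an assumed layout so that a reader could split them again.
    static EncodedMessage inline(int key, String value) {
        byte[] k = encodeInt(key);
        byte[] v = encodeString(value);
        ByteBuffer buf = ByteBuffer.allocate(4 + k.length + 4 + v.length);
        buf.putInt(k.length).put(k).putInt(v.length).put(v);
        return new EncodedMessage(null, buf.array());
    }

    // SEPARATED: the key is encoded in the message key, the value alone in the payload.
    static EncodedMessage separated(int key, String value) {
        return new EncodedMessage(encodeInt(key), encodeString(value));
    }
}
```

With `SEPARATED`, the key stays available without decoding the payload, which is the practical reason to prefer it when the key is used outside the value itself.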





[GitHub] [pulsar] momo-jun commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1013654236


##########
site2/docs/schema-evolution-compatibility.md:
##########
@@ -6,29 +6,21 @@ sidebar_label: "Schema evolution and compatibility"
 
 Normally, schemas do not stay the same over a long period of time. Instead, they undergo evolutions to satisfy new needs. 
 
-This chapter examines how Pulsar schema evolves and what Pulsar schema compatibility check strategies are.
+This chapter introduces how Pulsar schema evolves and what compatibility check strategies it adopts.
 
 ## Schema evolution
 
-Pulsar schema is defined in a data structure called `SchemaInfo`. 
-
-Each `SchemaInfo` stored with a topic has a version. The version is used to manage the schema changes happening within a topic. 
-
 The message produced with `SchemaInfo` is tagged with a schema version. When a message is consumed by a Pulsar client, the Pulsar client can use the schema version to retrieve the corresponding `SchemaInfo` and use the correct schema information to deserialize data.
 
-### What is schema evolution?
-
 Schemas store the details of attributes and types. To satisfy new business requirements,  you need to update schemas inevitably over time, which is called **schema evolution**. 
 
 Any schema changes affect downstream consumers. Schema evolution ensures that the downstream consumers can seamlessly handle data encoded with both old schemas and new schemas. 
 
-### How Pulsar schema should evolve? 
-
-The answer is Pulsar schema compatibility check strategy. It determines how schema compares old schemas with new schemas in topics.
+### How schema evolves? 
 
-For more information, see [Schema compatibility check strategy](#schema-compatibility-check-strategy).
+The answer is [schema compatibility check strategy](#schema-compatibility-check-strategy). It determines how schema compares old schemas with new schemas in topics.

Review Comment:
   Indeed. Can be considered and improved in the content review.
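
As input for that content review, the version tagging described in the hunk above can be sketched as a small in-memory model. This is not Pulsar's implementation: `SchemaVersionSketch`, `register`, and `lookup` are hypothetical names, a schema definition is reduced to a plain string, and reusing the version for an identical definition is an assumption of the model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the schema versions kept for one topic.
final class SchemaVersionSketch {

    // The list index doubles as the schema version.
    private final List<String> versions = new ArrayList<>();

    // Registers a schema definition and returns the version a producer
    // would tag its messages with. An identical definition reuses its version.
    long register(String definition) {
        int existing = versions.indexOf(definition);
        if (existing >= 0) {
            return existing;
        }
        versions.add(definition);
        return versions.size() - 1;
    }

    // Resolves the version carried by a message back to the schema
    // definition a consumer needs to deserialize the payload.
    String lookup(long version) {
        if (version < 0 || version >= versions.size()) {
            throw new IllegalArgumentException("Unknown schema version: " + version);
        }
        return versions.get((int) version);
    }
}
```

A consumer in this model never guesses the schema: it reads the version from the message and calls `lookup`, so messages written before and after a schema change can sit in the same topic.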





[GitHub] [pulsar] momo-jun commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1013655142


##########
site2/docs/schema-overview.md:
##########
@@ -0,0 +1,154 @@
+---
+id: schema-overview
+title: Overview
+sidebar_label: "Overview"
+---
+
+This section introduces the following content:
+* [What is Pulsar Schema](#what-is-pulsar-schema)
+* [Why use it](#why-use-it)
+* [How it works](#how-it-works)
+* [Use case](#use-case)
+* [What's next?](#whats-next)
+
+## What is Pulsar Schema
+
+Pulsar messages are stored as unstructured byte arrays and the data structure (also known as schema) is applied to this data only when it's read. The schema serializes the bytes before they are published to a topic and deserializes them before they are delivered to the consumers, dictating which data types are recognized as valid for a given topic.
+
+Pulsar schema registry is a central repository to store the schema information, which enables producers/consumers to coordinate on the schema of a topic’s data through brokers.
+
+:::note
+
+Currently, Pulsar schema is only available for the [Java client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+
+:::
+
+## Why use it
+
+Type safety is extremely important in any application built around a messaging and streaming system. Raw bytes are flexible for data transfer, but the flexibility and neutrality come with a cost: you have to overlay data type checking and serialization/deserialization to ensure that the bytes fed into the system can be read and successfully consumed. In other words, you need to make sure the data is intelligible and usable to applications.
+
+Pulsar schema resolves the pain points with the following capabilities:
+* enforces the data type safety when a topic has a schema defined. As a result, producers/consumers are only allowed to connect if they are using a “compatible” schema.
+* provides a central location for storing information about the schemas used within your organization, which in turn greatly simplifies the sharing of this information across application teams.
+* serves as a single source of truth for all the message schemas used across all your services and development teams, which makes it easier for them to collaborate.
+* keeps data compatibility on-track between schema versions. When new schemas are uploaded, the new versions can be read by old consumers. 
+* is stored in the existing storage layer, BookKeeper, so no additional system is required.
+
+## How it works

Review Comment:
   Makes sense. I think it also helps users understand what schema is. This move can be considered and improved in the content review.
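
To help with that explanation, the idea of a "compatible" schema from the list above can be sketched as follows. This is a deliberately simplified model, not Pulsar's compatibility checker: `CompatibilitySketch`, `Field`, and `canRead` are hypothetical names, and the rule (a reader field must either exist in the written data with the same type or have a default) is an assumption loosely modeled on Avro-style schema resolution.

```java
import java.util.Map;

// Hypothetical, simplified compatibility rule for record schemas.
final class CompatibilitySketch {

    static final class Field {
        final String type;
        final boolean hasDefault;

        Field(String type, boolean hasDefault) {
            this.type = type;
            this.hasDefault = hasDefault;
        }
    }

    // Returns true if a consumer using the reader schema can decode data
    // that was written with the writer schema.
    static boolean canRead(Map<String, Field> reader, Map<String, Field> writer) {
        for (Map.Entry<String, Field> entry : reader.entrySet()) {
            Field written = writer.get(entry.getKey());
            if (written == null) {
                // The field is missing from the data: only acceptable with a default.
                if (!entry.getValue().hasDefault) {
                    return false;
                }
            } else if (!written.type.equals(entry.getValue().type)) {
                // The field exists but its type changed.
                return false;
            }
        }
        return true;
    }
}
```

In this model, "new versions can be read by old consumers" corresponds to `canRead(oldSchema, newSchema)` returning `true`; the opposite direction additionally requires defaults on every newly added field.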





[GitHub] [pulsar] momo-jun commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1012614113


##########
site2/docs/schema-get-started.md:
##########
@@ -4,92 +4,480 @@ title: Get started
 sidebar_label: "Get started"
 ---
 
-This chapter introduces Pulsar schemas and explains why they are important. 
 
-## Schema Registry
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
 
-Type safety is extremely important in any application built around a message bus like Pulsar. 
 
-Producers and consumers need some kind of mechanism for coordinating types at the topic level to avoid various potential problems arising. For example, serialization and deserialization issues. 
+This hands-on tutorial provides instructions and examples on how to construct and customize schemas.
 
-Applications typically adopt one of the following approaches to guarantee type safety in messaging. Both approaches are available in Pulsar, and you're free to adopt one or the other or to mix and match on a per-topic basis.
+## Construct a string schema
 
-#### Note
->
-> Currently, the Pulsar schema registry is only available for the [Java client](client-libraries-java.md), [Go client](client-libraries-go.md), [Python client](client-libraries-python.md), and [C++ client](client-libraries-cpp.md).
+This example demonstrates how to construct a [string schema](schema-understand.md#primitive-type) and use it to produce and consume messages in Java.
 
-### Client-side approach
+1. Create a producer with a string schema and send messages.
 
-Producers and consumers are responsible for not only serializing and deserializing messages (which consist of raw bytes) but also "knowing" which types are being transmitted via which topics. 
+   ```java
+   Producer<String> producer = client.newProducer(Schema.STRING).create();
+   producer.newMessage().value("Hello Pulsar!").send();
+   ```
 
-If a producer is sending temperature sensor data on the topic `topic-1`, consumers of that topic will run into trouble if they attempt to parse that data as moisture sensor readings.
+2. Create a consumer with a string schema and receive messages.  
 
-Producers and consumers can send and receive messages consisting of raw byte arrays and leave all type safety enforcement to the application on an "out-of-band" basis.
+   ```java
+   Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+   consumer.receive();
+   ```
 
-### Server-side approach 
+## Construct a key/value schema
 
-Producers and consumers inform the system which data types can be transmitted via the topic. 
+This example shows how to construct a [key/value schema](schema-understand.md#keyvalue-schema) and use it to produce and consume messages in Java.
 
-With this approach, the messaging system enforces type safety and ensures that producers and consumers remain synced.
+1. Construct a key/value schema with `INLINE` encoding type.
 
-Pulsar has a built-in **schema registry** that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic.
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.INLINE
+   );
+   ```
 
-## Why use schema
+2. Optionally, construct a key/value schema with `SEPARATED` encoding type.
 
-When a schema is enabled, Pulsar does parse data, it takes bytes as inputs and sends bytes as outputs. While data has meaning beyond bytes, you need to parse data and might encounter parse exceptions which mainly occur in the following situations:
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );
+   ```
 
-* The field does not exist
+3. Produce messages using a key/value schema.
 
-* The field type has changed (for example, `string` is changed to `int`)
+   ```java
+   Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
+   Schema.INT32,
+   Schema.STRING,
+   KeyValueEncodingType.SEPARATED
+   );

Review Comment:
   Nice catch!





[GitHub] [pulsar] momo-jun commented on a diff in pull request #18242: [refactor][doc] Refactor the information architecture of Schema topics

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #18242:
URL: https://github.com/apache/pulsar/pull/18242#discussion_r1013653422


##########
site2/docs/schema-evolution-compatibility.md:
##########
@@ -6,29 +6,21 @@ sidebar_label: "Schema evolution and compatibility"
 
 Normally, schemas do not stay the same over a long period of time. Instead, they undergo evolutions to satisfy new needs. 
 
-This chapter examines how Pulsar schema evolves and what Pulsar schema compatibility check strategies are.
+This chapter introduces how Pulsar schema evolves and what compatibility check strategies it adopts.
 
 ## Schema evolution

Review Comment:
   Yes, will be done in the next move. Leaving it in this PR just to track the changes made to it.


