Posted to commits@pulsar.apache.org by li...@apache.org on 2022/06/10 06:22:12 UTC

[pulsar] branch master updated: [fix][doc] Add context for schemaRegistryCompatibilityCheckers (#15887)

This is an automated email from the ASF dual-hosted git repository.

liuyu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new 3d7fca3ee70 [fix][doc] Add context for schemaRegistryCompatibilityCheckers (#15887)
3d7fca3ee70 is described below

commit 3d7fca3ee7030ce2d53d5fe85d1a79b8784b1c2e
Author: momo-jun <60...@users.noreply.github.com>
AuthorDate: Fri Jun 10 14:22:00 2022 +0800

    [fix][doc] Add context for schemaRegistryCompatibilityCheckers (#15887)
---
 conf/broker.conf                             |  3 ++
 conf/standalone.conf                         |  3 ++
 site2/docs/reference-configuration.md        |  2 +
 site2/docs/schema-evolution-compatibility.md | 58 ++++++++++++++++------------
 4 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/conf/broker.conf b/conf/broker.conf
index a22bec3e7a4..24e32e0b7f4 100644
--- a/conf/broker.conf
+++ b/conf/broker.conf
@@ -557,6 +557,9 @@ zookeeperSessionExpiredPolicy=reconnect
 # Enable or disable system topic
 systemTopicEnabled=true
 
+# Deploy the schema compatibility checker for a specific schema type to enforce schema compatibility check
+schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
+
 # The schema compatibility strategy is used for system topics.
 # Available values: ALWAYS_INCOMPATIBLE, ALWAYS_COMPATIBLE, BACKWARD, FORWARD, FULL, BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, FULL_TRANSITIVE
 systemTopicSchemaCompatibilityStrategy=ALWAYS_COMPATIBLE
diff --git a/conf/standalone.conf b/conf/standalone.conf
index 121f10ad00f..cae48d57dc2 100644
--- a/conf/standalone.conf
+++ b/conf/standalone.conf
@@ -442,6 +442,9 @@ brokerClientTlsProtocols=
 # Enable or disable system topic
 systemTopicEnabled=true
 
+# Deploy the schema compatibility checker for a specific schema type to enforce schema compatibility check
+schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
+
 # The schema compatibility strategy is used for system topics.
 # Available values: ALWAYS_INCOMPATIBLE, ALWAYS_COMPATIBLE, BACKWARD, FORWARD, FULL, BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, FULL_TRANSITIVE
 systemTopicSchemaCompatibilityStrategy=ALWAYS_COMPATIBLE
diff --git a/site2/docs/reference-configuration.md b/site2/docs/reference-configuration.md
index 4cf0a840def..72ee5884b9a 100644
--- a/site2/docs/reference-configuration.md
+++ b/site2/docs/reference-configuration.md
@@ -258,6 +258,7 @@ brokerServiceCompactionThresholdInBytes|If the estimated backlog size is greater
 |schemaRegistryStorageClassName|The schema storage implementation used by this broker.|org.apache.pulsar.broker.service.schema.BookkeeperSchemaStorageFactory|
 |isSchemaValidationEnforced| Whether to enable schema validation, when schema validation is enabled, if a producer without a schema attempts to produce the message to a topic with schema, the producer is rejected and disconnected.|false|
 |isAllowAutoUpdateSchemaEnabled|Allow schema to be auto updated at broker level.|true|
+|schemaRegistryCompatibilityCheckers | Deploy the schema compatibility checker for a specific schema type to enforce schema compatibility check. |org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck |
 |schemaCompatibilityStrategy| The schema compatibility strategy at broker level, see [here](schema-evolution-compatibility.md#schema-compatibility-check-strategy) for available values.|FULL|
 |systemTopicSchemaCompatibilityStrategy| The schema compatibility strategy is used for system topics, see [here](schema-evolution-compatibility.md#schema-compatibility-check-strategy) for available values.|ALWAYS_COMPATIBLE|
 | topicFencingTimeoutSeconds | If a topic remains fenced for a certain time period (in seconds), it is closed forcefully. If set to 0 or a negative number, the fenced topic is not closed. | 0 |
@@ -731,6 +732,7 @@ You can set the log level and configuration in the  [log4j2.yaml](https://github
 |schemaRegistryStorageClassName|The schema storage implementation used by this broker.|org.apache.pulsar.broker.service.schema.BookkeeperSchemaStorageFactory|
 |isSchemaValidationEnforced| Whether to enable schema validation, when schema validation is enabled, if a producer without a schema attempts to produce the message to a topic with schema, the producer is rejected and disconnected.|false|
 |isAllowAutoUpdateSchemaEnabled|Allow schema to be auto updated at broker level.|true|
+|schemaRegistryCompatibilityCheckers | Deploy the schema compatibility checker for a specific schema type to enforce schema compatibility check. |org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck |
 |schemaCompatibilityStrategy| The schema compatibility strategy at broker level, see [here](schema-evolution-compatibility.md#schema-compatibility-check-strategy) for available values.|FULL|
 |systemTopicSchemaCompatibilityStrategy| The schema compatibility strategy is used for system topics, see [here](schema-evolution-compatibility.md#schema-compatibility-check-strategy) for available values.|ALWAYS_COMPATIBLE|
 |managedCursorInfoCompressionType | The compression type of managed cursor information. <br />Available options are `NONE`, `LZ4`, `ZLIB`, `ZSTD`, and `SNAPPY`). <br />If this value is `NONE`, managed cursor information is not compressed. | NONE
diff --git a/site2/docs/schema-evolution-compatibility.md b/site2/docs/schema-evolution-compatibility.md
index 881180711e5..dc9b11eb2a3 100644
--- a/site2/docs/schema-evolution-compatibility.md
+++ b/site2/docs/schema-evolution-compatibility.md
@@ -30,19 +30,26 @@ For more information, see [Schema compatibility check strategy](#schema-compatib
 
 ### How does Pulsar support schema evolution?
 
-1. When a producer/consumer/reader connects to a broker, the broker deploys the schema compatibility checker configured by `schemaRegistryCompatibilityCheckers` to enforce schema compatibility check. 
+The process of how Pulsar supports schema evolution is described as follows.
 
-   The schema compatibility checker is one instance per schema type. 
+1. The producer/consumer/reader sends the `SchemaInfo` of its client to brokers. 
    
-   Currently, Avro and JSON have their own compatibility checkers, while all the other schema types share the default compatibility checker which disables schema evolution.
-
-2. The producer/consumer/reader sends its client `SchemaInfo` to the broker. 
+2. Brokers recognize the schema type and deploy the schema compatibility checker `schemaRegistryCompatibilityCheckers` for that schema type to enforce the schema compatibility check. By default, the value of `schemaRegistryCompatibilityCheckers` in the `conf/broker.conf` or `conf/standalone.conf` file is as follows.
    
-3. The broker knows the schema type and locates the schema compatibility checker for that type. 
+   ```properties
+   schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
+   ```
+
+   :::note
+
+   Each schema type corresponds to one instance of schema compatibility checker. Currently, Avro, JSON, and Protobuf have their own compatibility checkers, while all the other schema types share the default compatibility checker which disables the schema evolution. In a word, schema evolution is only available in Avro, JSON, and Protobuf schema.
+
+   :::
+
+3. Brokers use the schema compatibility checker to check if the `SchemaInfo` is compatible with the latest schema of the topic by applying its [compatibility check strategy](#compatibility-check-strategy). Currently, the compatibility check strategy is configured at the namespace level and applied to all the topics within that namespace.
+
+For more details, see [`schemaRegistryCompatibilityCheckers`](https://github.com/apache/pulsar/blob/bf194b557c48e2d3246e44f1fc28876932d8ecb8/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java).
 
-4. The broker uses the checker to check if the `SchemaInfo` is compatible with the latest schema of the topic by applying its compatibility check strategy. 
-   
-   Currently, the compatibility check strategy is configured at the namespace level and applied to all the topics within that namespace.
 
 ## Schema compatibility check strategy
 
@@ -78,7 +85,7 @@ Suppose that you have a topic containing three schemas (V1, V2, and V3), V1 is t
 
   For example, for a user entity, there are `userCreated`, `userAddressChanged` and `userEnquiryReceived` events. The application requires that those events are always read in the same order. 
 
-  Consequently, those events need to go in the same Pulsar partition to maintain order. This application can use `ALWAYS_COMPATIBLE` to allow different kinds of events co-exist in the same topic.
+  Consequently, those events need to go in the same Pulsar partition to maintain order. This application can use `ALWAYS_COMPATIBLE` to allow different kinds of events to co-exist in the same topic.
 
 * Example 2
 
@@ -113,7 +120,7 @@ Suppose that you have a topic containing three schemas (V1, V2, and V3), V1 is t
   
   You want to load all Pulsar data into a Hive data warehouse and run SQL queries against the data. 
 
-  Same SQL queries must continue to work even the data is changed. To support it, you can evolve the schemas using the `BACKWARD` strategy.
+  Same SQL queries must continue to work even if the data is changed. To support it, you can evolve the schemas using the `BACKWARD` strategy.
 
 ### FORWARD and FORWARD_TRANSITIVE 
 
@@ -165,40 +172,41 @@ When a producer or a consumer tries to connect to a topic, a broker performs som
 
 ### Producer
 
-When a producer tries to connect to a topic (suppose ignore the schema auto creation), a broker does the following checks:
+When a producer tries to connect to a topic (suppose ignore the schema auto-creation), a broker does the following checks:
 
 * Check if the schema carried by the producer exists in the schema registry or not.
 
-  * If the schema is already registered, then the producer is connected to a broker and produce messages with that schema.
+  * If the schema is already registered, then the producer is connected to a broker and produces messages with that schema.
   
   * If the schema is not registered, then Pulsar verifies if the schema is allowed to be registered based on the configured compatibility check strategy.
   
 ### Consumer
+
 When a consumer tries to connect to a topic, a broker checks if a carried schema is compatible with a registered schema based on the configured schema compatibility check strategy.
 
-|  Compatibility check strategy  |   Check logic  | 
-| --- | --- |
-|  `ALWAYS_COMPATIBLE`  |   All pass  | 
-|  `ALWAYS_INCOMPATIBLE`  |   No pass  | 
-|  `BACKWARD`  |   Can read the last schema  | 
-|  `BACKWARD_TRANSITIVE`  |   Can read all schemas  | 
-|  `FORWARD`  |   Can read the last schema  | 
-|  `FORWARD_TRANSITIVE`  |   Can read the last schema  | 
-|  `FULL`  |   Can read the last schema  | 
-|  `FULL_TRANSITIVE`  |   Can read all schemas  | 
+| Compatibility check strategy | Check logic              |
+|------------------------------|--------------------------|
+| `ALWAYS_COMPATIBLE`          | All pass                 |
+| `ALWAYS_INCOMPATIBLE`        | No pass                  |
+| `BACKWARD`                   | Can read the last schema |
+| `BACKWARD_TRANSITIVE`        | Can read all schemas     |
+| `FORWARD`                    | Can read the last schema |
+| `FORWARD_TRANSITIVE`         | Can read the last schema |
+| `FULL`                       | Can read the last schema |
+| `FULL_TRANSITIVE`            | Can read all schemas     |
 
 ## Order of upgrading clients
 
 The order of upgrading client applications is determined by the compatibility check strategy.
 
-For example, the producers using schemas to write data to Pulsar and the consumers using schemas to read data from Pulsar. 
+For example, the producers use schemas to write data to Pulsar and the consumers use schemas to read data from Pulsar. 
 
 |  Compatibility check strategy  |   Upgrade first  | Description                                                                                                                                                                                                                                                                                                         | 
 | --- | --- |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 |  `ALWAYS_COMPATIBLE`  |   Any order  | The compatibility check is disabled. Consequently, you can upgrade the producers and consumers in **any order**.                                                                                                                                                                                                    | 
 |  `ALWAYS_INCOMPATIBLE`  |   None  | The schema evolution is disabled.                                                                                                                                                                                                                                                                                   | 
 |  <li>`BACKWARD` </li><li>`BACKWARD_TRANSITIVE` </li> |   Consumers  | There is no guarantee that consumers using the old schema can read data produced using the new schema. Consequently, **upgrade all consumers first**, and then start producing new data.                                                                                                                            | 
-|  <li>`FORWARD` </li><li>`FORWARD_TRANSITIVE` </li> |   Producers  | There is no guarantee that consumers using the new schema can read data produced using the old schema. Consequently, **upgrade all producers first**<li>to use the new schema and ensure that the data already produced using the old schemas are not available to consumers, and then upgrade the consumers. </li> | 
+|  <li>`FORWARD` </li><li>`FORWARD_TRANSITIVE` </li> |   Producers  | There is no guarantee that consumers using the new schema can read data produced using the old schema. Consequently, **upgrade all producers first**<li>to use the new schema and ensure that the data already produced using the old schemas are not available to consumers, and then upgrades the consumers. </li> | 
 |  <li>`FULL` </li><li>`FULL_TRANSITIVE` </li> |   Any order  | It is guaranteed that consumers using the old schema can read data produced using the new schema and consumers using the new schema can read data produced using the old schema. Consequently, you can upgrade the producers and consumers in **any order**.                                                        |
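
Note on the schema-evolution-compatibility.md hunk above: the three-step broker flow it documents (receive `SchemaInfo`, pick the checker for the schema type, apply the compatibility check strategy) can be sketched roughly as follows. This is only an illustration, not Pulsar's implementation. Every function and variable name is hypothetical, schemas are reduced to plain strings, and the behavior of the default checker (accept only an unchanged schema) is an assumption made to model "schema evolution is disabled".

```python
# Illustrative sketch only: none of these names are Pulsar APIs.
from typing import Callable, Dict

# A checker takes (existing_schema, new_schema, strategy) and returns True if compatible.
Checker = Callable[[str, str, str], bool]


def default_checker(existing: str, new: str, strategy: str) -> bool:
    # Assumption: with evolution disabled, only an unchanged schema is accepted.
    return existing == new


def typed_checker(existing: str, new: str, strategy: str) -> bool:
    # Hypothetical stand-in for a type-specific (Avro/JSON/Protobuf) checker.
    if strategy == "ALWAYS_COMPATIBLE":
        return True
    if strategy == "ALWAYS_INCOMPATIBLE":
        return False
    # A real checker would compare schema definitions here; the sketch
    # falls back to equality because schemas are opaque strings.
    return existing == new


# One checker instance per schema type; the keys are illustrative labels.
CHECKERS: Dict[str, Checker] = {
    "AVRO": typed_checker,
    "JSON": typed_checker,
    "PROTOBUF_NATIVE": typed_checker,
}


def check(schema_type: str, existing: str, new: str, strategy: str) -> bool:
    # Step 2: locate the checker for the schema type, else use the default.
    checker = CHECKERS.get(schema_type, default_checker)
    # Step 3: apply the namespace-level strategy against the latest schema.
    return checker(existing, new, strategy)
```

In this sketch, a schema type without a registered checker (for example a hypothetical `"STRING"` key) always goes through `default_checker`, so a changed schema is rejected regardless of the configured strategy.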