You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/10/26 07:42:39 UTC

[GitHub] [pulsar] hnail opened a new pull request #8372: [WIP]ProtobufNative Schema Support

hnail opened a new pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372


    ###  **warnning**
   
   > this PR depend on [PR-8246](https://github.com/apache/pulsar/pull/8246) and haven't finish unit test case , please don't review or merge. 
   
   ### Motivation
   
   support Protobuf  Native Schema Support for  [PIP-Add ProtobufNative Schema Support ](https://docs.google.com/document/d/1XR_MNOuSXyig-CKsdVhr6IXvFwziBRdSoS3oEUiLFe8/edit?usp=sharing)
   
   #### set ProtobufNative schema Example
   ```
   SchemaDefinition def =  SchemaDefinition.<JsonGen.User>builder()
                   .withAlwaysAllowNull(true)
                   .withPojo(AdnMonitor.AdnJson.class)
                   .build();
   Schema schema = Schema.PROTOBUFNATIVE(def);
   admin.schemas().createSchema(topic, schema.getSchemaInfo());
   ```
   #### POJO schema consumer Example
   ```
   Consumer<AdnMonitor.AdnJson> consumer = client.newConsumer(Schema.PROTOBUFNATIVE(AdnMonitor.AdnJson.class)).topic(topic)
                   .subscriptionName("my-subscription-name").subscriptionInitialPosition(SubscriptionInitialPosition.Earliest).subscribe();
           while (!consumer.hasReachedEndOfTopic()){
               Message<AdnMonitor.AdnJson> msg = consumer.receive();
               AdnMonitor.AdnJson adnJson =  msg.getValue();
               System.out.println("getPlatform : "+ adnJson.getPlatform());
           }
   ```
   #### ProtobufNative AUTO_CONSUME  Example
   ```
    Consumer<GenericRecord> consumer = client.newConsumer(Schema.AUTO_CONSUME()).topic(topic)
                   .subscriptionName("my-subscription-nameaaa").subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
           .subscribe();
           while (!consumer.hasReachedEndOfTopic()){
               Message<GenericRecord> msg = consumer.receive() ;
              // System.out.println("GenericRecord :"+msg.getValue().toString());
               GenericRecord genericRecord =  msg.getValue();
               System.out.println(genericRecord.getField("platform"));
               GenericProtobufNativeRecord genericProtobufNativeRecord = (GenericProtobufNativeRecord)genericRecord;
               DynamicMessage dynamicMessage = genericProtobufNativeRecord.getProtobufRecord();         
               String platform = dynamicMessage.getField(dynamicMessage.getDescriptorForType().findFieldByName("platform")).toString();
               System.out.println("platform : "+ platform);
           }
   ```
   
    ### Modifications
   
   //TODO
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-726666439


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-805632215


   > I have a question about FileDescriptorSet field. In https://github.com/apache/pulsar/blob/c01b1eeda3221bdbf863bf0f3f8373e93d90adef/pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaTest.java file there is an example test class. It is generated from Test.proto and ExternalMessage.proto files. The problem is that, no matter what I try with protoc , I can not get same FileDescriptorSet content. It's always slightly different. I tried csharp, java, cpp output, with or without --descriptor_set_out - I always get different byte array (and base64 string). And I can not create producer/consumer on apache pulsar client with FileDescriptorSet generated by me, I get com.google.protobuf.InvalidProtocolBufferException.invalidEndTag exception which as far as I googled tells that data is corrupted. When I try to create producer with FileDescriptorSet data from test, it works. What am I doing wrong while generating descriptors using protoc ?
   
   hello , happy for your take notice of this pull request .
   I test as your description with the two following test case :
   
   - FileDescriptorSet build by JAVA Proto Class  , same as [ProtobufNativeSchemaUtils#serialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L53)
   
   ```
   // buid FileDescriptorSet byte[] by Class  
   byte[] fileDescriptorSetBytesByClass = DescriptorProtos.FileDescriptorSet.newBuilder()
                   .addFile(Service.ServiceRequest.getDescriptor().getFile().toProto())
                   .build().toByteArray();  
   DescriptorProtos.FileDescriptorSet fileDescriptorSetByClass = DescriptorProtos.FileDescriptorSet
                   .parseFrom(fileDescriptorSetBytesByClass);
   // print FileDescriptorSet string which buid by Class
   System.out.println(new String(fileDescriptorSetByClass.toBuilder().build().toByteArray(),"utf-8"));
   ```
   - FileDescriptorSet build by protoc command , same as your description:
   ```
   // buid FileDescriptorSet byte[] by 'protoc --include_imports --descriptor_set_out Request.desc Request.proto
   byte[] fileDescriptorSetBytesByProtoC = FileUtils.readFileToByteArray(new File("/Users/wangguowei/source" +
                   "/pulsar/dev/pulsar_test/src/main/resources/pulsar/Request.desc"));
   
   DescriptorProtos.FileDescriptorSet fileDescriptorSetByProtoC = DescriptorProtos.FileDescriptorSet
                   .parseFrom(fileDescriptorSetBytesByProtoC);
   
   // print FileDescriptorSet string which buid by ProtoC
   System.out.println(new String(fileDescriptorSetByClass.toBuilder().build().toByteArray(),"utf-8"));
   ```
   The two realize above works  for me : 
   - The serialized FileDescriptorSet byte[] is slightly different as your description, I think maybe java `proto class` lack of information compare with `proto file ` when compile java class .
   - Deserialize  FileDescriptorSet is by `DescriptorProtos.FileDescriptorSet.parseFrom(bytes[])` in  two way above [ProtobufNativeSchemaUtils#deserialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L90) . so , even though byte[] is slightly different , but at the same of working .
   - The current realize is reference google-doc :  [protobuf v3 Self-describing Messages](https://developers.google.com/protocol-buffers/docs/techniques#self-description)
   
   --- 
   so , I think the reason is `new ObjectMapper().writeValueAsBytes(schemaData);` which serialize byte[] to String , may be is 'UTF-8' or 'Base64' ? 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui merged pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
codelipenghui merged pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] congbobo184 commented on a change in pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
congbobo184 commented on a change in pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#discussion_r521782662



##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/Schema.java
##########
@@ -251,6 +251,27 @@ default void configureSchemaInfo(String topic, String componentName,
         return DefaultImplementation.newProtobufSchema(schemaDefinition);
     }
 
+    /**
+     * Create a Protobuf-Native schema type by extracting the fields of the specified class.
+     *
+     * @param clazz the Protobuf generated class to be used to extract the schema
+     * @return a Schema instance
+     */
+    static <T extends com.google.protobuf.GeneratedMessageV3> Schema<T> PROTOBUFNATIVE(Class<T> clazz) {
+        return DefaultImplementation.newProtobufNativeSchema(SchemaDefinition.builder().withPojo(clazz).build());
+    }
+
+    /**
+     * Create a Protobuf-Native schema type with schema definition.
+     *
+     * @param schemaDefinition schemaDefinition the definition of the schema
+     * @return a Schema instance
+     */
+    static <T extends com.google.protobuf.GeneratedMessageV3> Schema<T> PROTOBUFNATIVE(

Review comment:
       ```suggestion
       static <T extends com.google.protobuf.GeneratedMessageV3> Schema<T> PROTOBUF_NATIVE(
   ```

##########
File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/Schema.java
##########
@@ -251,6 +251,27 @@ default void configureSchemaInfo(String topic, String componentName,
         return DefaultImplementation.newProtobufSchema(schemaDefinition);
     }
 
+    /**
+     * Create a Protobuf-Native schema type by extracting the fields of the specified class.
+     *
+     * @param clazz the Protobuf generated class to be used to extract the schema
+     * @return a Schema instance
+     */
+    static <T extends com.google.protobuf.GeneratedMessageV3> Schema<T> PROTOBUFNATIVE(Class<T> clazz) {

Review comment:
       ```suggestion
       static <T extends com.google.protobuf.GeneratedMessageV3> Schema<T> PROTOBUF_NATIVE(Class<T> clazz) {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on a change in pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on a change in pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#discussion_r521364446



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java
##########
@@ -0,0 +1,139 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pulsar.client.impl.schema;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pulsar.client.api.SchemaSerializationException;
+import org.apache.pulsar.common.protocol.schema.ProtobufNativeSchemaData;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static com.google.protobuf.DescriptorProtos.FileDescriptorProto;
+import static com.google.protobuf.DescriptorProtos.FileDescriptorSet;
+import static com.google.protobuf.Descriptors.*;

Review comment:
       Avoid use import.* here

##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/generic/GenericProtobufNativeSchema.java
##########
@@ -0,0 +1,104 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pulsar.client.impl.schema.generic;
+
+import com.google.protobuf.Descriptors;
+import lombok.extern.slf4j.Slf4j;
+import org.apache.pulsar.client.api.schema.*;

Review comment:
       Avoid use import.* here




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-717799350


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] sijie commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
sijie commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-717370034


   @hnail Can you send the PIP of adding native protobuf support to dev@ mailing list? So the community can review the PIP and we can add the PIP to Pulsar's wiki page.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-806391298


   @fjod  Hi , Your `rootMessageTypeName` is incorrect , set to `nativeCheck.XXXMessage` , rootMessageTypeName must be **full path** , `nativeCheck` is your package path .
   
   Simpler Test code :
   ```
   String schemaDataBytes = "{\"fileDescriptorSet" +
                   "\":\"ClMKClRlc3QucHJvdG8SC25hdGl2ZUNoZWNrIjAKClhYWE1lc3NhZ2USEAoDZm9vGAEgASgJUgNmb28SEAoDYmFyGAIgASgBUgNiYXJiBnByb3RvMw==\",\"rootMessageTypeName\":\"nativeCheck.XXXMessage\",\"rootFileDescriptorName\":\"Test.proto\"}";
           
   Descriptors.Descriptor descriptor = ProtobufNativeSchemaUtils.deserialize(schemaDataBytes.getBytes());
   
   System.out.println(descriptor);
   ```
   It is works fine .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail edited a comment on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail edited a comment on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-805769267


   > I found the culprit here, I deleted important information from source .proto file like **java_package** and **java_outer_classname**, and protoc was generating different info which apache pulsar was not able to use.
   
   I recommend  you can use  JAVA API to create Schema like this : 
   ```
   SchemaDefinition def =  SchemaDefinition.<ProtoMsg>builder()
                   .withPojo(ProtoMsg.class)
                   .build();
   Schema schema = Schema.PROTOBUF_NATIVE(def);
   admin.schemas().createSchema(topic, schema.getSchemaInfo());
   ```
   The **java_package** and **java_outer_classname** restrictions which you mentioned may need check in other language client when create schema by POJO API , be appreciate if your have time to help pull a issue to describe  the problem for future optimization .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail edited a comment on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail edited a comment on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-805632215


   > I have a question about FileDescriptorSet field. In https://github.com/apache/pulsar/blob/c01b1eeda3221bdbf863bf0f3f8373e93d90adef/pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaTest.java file there is an example test class. It is generated from Test.proto and ExternalMessage.proto files. The problem is that, no matter what I try with protoc , I can not get same FileDescriptorSet content. It's always slightly different. I tried csharp, java, cpp output, with or without --descriptor_set_out - I always get different byte array (and base64 string). And I can not create producer/consumer on apache pulsar client with FileDescriptorSet generated by me, I get com.google.protobuf.InvalidProtocolBufferException.invalidEndTag exception which as far as I googled tells that data is corrupted. When I try to create producer with FileDescriptorSet data from test, it works. What am I doing wrong while generating descriptors using protoc ?
   
   hello , happy for your take notice of this pull request .
   I test as your description with the two following test case :
   
   - FileDescriptorSet build by JAVA Proto Class  , same as [ProtobufNativeSchemaUtils#serialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L53)
   
   ```
   // buid FileDescriptorSet byte[] by Class  
   byte[] fileDescriptorSetBytesByClass = DescriptorProtos.FileDescriptorSet.newBuilder()
                   .addFile(Service.ServiceRequest.getDescriptor().getFile().toProto())
                   .build().toByteArray();  
   
   DescriptorProtos.FileDescriptorSet fileDescriptorSetByClass = DescriptorProtos.FileDescriptorSet
                   .parseFrom(fileDescriptorSetBytesByClass);
   
   // print FileDescriptorSet string which buid by Class
   System.out.println(new String(fileDescriptorSetByClass.toBuilder().build().toByteArray(),"utf-8"));
   ```
   - FileDescriptorSet build by protoc command , same as your description:
   ```
   // buid FileDescriptorSet byte[] by 'protoc --include_imports --descriptor_set_out Request.desc Request.proto
   byte[] fileDescriptorSetBytesByProtoC = FileUtils.readFileToByteArray(new File("Request.desc"));
   
   DescriptorProtos.FileDescriptorSet fileDescriptorSetByProtoC = DescriptorProtos.FileDescriptorSet
                   .parseFrom(fileDescriptorSetBytesByProtoC);
   
   // print FileDescriptorSet string which buid by ProtoC
   System.out.println(new String(fileDescriptorSetByClass.toBuilder().build().toByteArray(),"utf-8"));
   ```
   The two realize above works  for me : 
   - The serialized FileDescriptorSet byte[] is slightly different as your description, I think maybe java `proto class` lack of information compare with `proto file ` when compile java class .
   - Deserialize  FileDescriptorSet is by `DescriptorProtos.FileDescriptorSet.parseFrom(bytes[])` in  two way above [ProtobufNativeSchemaUtils#deserialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L90) . so , even though byte[] is slightly different , but at the same of working .
   - The current realize is reference google-doc :  [protobuf v3 Self-describing Messages](https://developers.google.com/protocol-buffers/docs/techniques#self-description)
   
   --- 
   so , I think the reason is `new ObjectMapper().writeValueAsBytes(schemaData);` which serialize byte[] to String , may be is 'UTF-8' or 'Base64' ? 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-723872167


   @sijie @codelipenghui Has there been progresses on this review ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] sijie commented on a change in pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
sijie commented on a change in pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#discussion_r521185099



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufNativeSchemaCompatibilityCheck.java
##########
@@ -0,0 +1,72 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pulsar.broker.service.schema;
+
+import org.apache.pulsar.broker.service.schema.exceptions.IncompatibleSchemaException;
+import org.apache.pulsar.client.impl.schema.ProtobufNativeSchemaUtils;
+import org.apache.pulsar.common.policies.data.SchemaCompatibilityStrategy;
+import org.apache.pulsar.common.protocol.schema.SchemaData;
+import org.apache.pulsar.common.schema.SchemaType;
+
+import static com.google.protobuf.Descriptors.Descriptor;
+
+/**
+ * The {@link SchemaCompatibilityCheck} implementation for {@link SchemaType#PROTOBUF_NATIVE}.
+ */
+public class ProtobufNativeSchemaCompatibilityCheck implements SchemaCompatibilityCheck {
+
+    @Override
+    public SchemaType getSchemaType() {
+        return SchemaType.PROTOBUF_NATIVE;
+    }
+
+    @Override
+    public void checkCompatible(SchemaData from, SchemaData to, SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
+        Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
+        Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
+        switch (strategy) {
+            case BACKWARD_TRANSITIVE:
+            case BACKWARD:
+            case FORWARD_TRANSITIVE:
+            case FORWARD:
+            case FULL_TRANSITIVE:
+            case FULL:
+                CheckRootRootMessageChange(fromDescriptor, toDescriptor, strategy);
+                return;
+            case ALWAYS_COMPATIBLE:
+                return;
+            default:
+                throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy");
+        }
+    }
+
+    @Override
+    public void checkCompatible(Iterable<SchemaData> from, SchemaData to, SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
+        for (SchemaData schemaData : from) {
+            checkCompatible(schemaData, to, strategy);
+        }
+    }
+
+    private void CheckRootRootMessageChange(Descriptor fromDescriptor, Descriptor toDescriptor, SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {

Review comment:
       why there is "RootRoot" in the method name `CheckRootRootMessageChange`?
   
   Also, we need to make the first character low case.

##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/ProtobufNativeSchemaDataValidator.java
##########
@@ -0,0 +1,49 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pulsar.broker.service.schema.validator;
+
+import com.google.protobuf.Descriptors;
+import org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException;
+import org.apache.pulsar.client.impl.schema.ProtobufNativeSchemaUtils;
+import org.apache.pulsar.common.protocol.schema.SchemaData;
+
+public class ProtobufNativeSchemaDataValidator implements SchemaDataValidator {
+
+    @Override
+    public void validate(SchemaData schemaData) throws InvalidSchemaDataException {
+        Descriptors.Descriptor descriptor;
+        try {
+            descriptor = ProtobufNativeSchemaUtils.deserialize(schemaData.getData());
+        } catch (Exception e) {
+            throw new InvalidSchemaDataException("deserialize  ProtobufNative Schema failed", e);

Review comment:
       ```suggestion
               throw new InvalidSchemaDataException("deserialize ProtobufNative Schema failed", e);
   ```

##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/ProtobufNativeSchemaDataValidator.java
##########
@@ -0,0 +1,49 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pulsar.broker.service.schema.validator;
+
+import com.google.protobuf.Descriptors;
+import org.apache.pulsar.broker.service.schema.exceptions.InvalidSchemaDataException;
+import org.apache.pulsar.client.impl.schema.ProtobufNativeSchemaUtils;
+import org.apache.pulsar.common.protocol.schema.SchemaData;
+
+public class ProtobufNativeSchemaDataValidator implements SchemaDataValidator {
+
+    @Override
+    public void validate(SchemaData schemaData) throws InvalidSchemaDataException {
+        Descriptors.Descriptor descriptor;
+        try {
+            descriptor = ProtobufNativeSchemaUtils.deserialize(schemaData.getData());
+        } catch (Exception e) {
+            throw new InvalidSchemaDataException("deserialize  ProtobufNative Schema failed", e);
+        }
+        if (descriptor == null) {
+            throw new InvalidSchemaDataException("protobuf root message Descriptor is null , please recheck rootMessageTypeName or rootFileDescriptorName conf. ");

Review comment:
       ```suggestion
               throw new InvalidSchemaDataException("protobuf root message descriptor is null , please recheck rootMessageTypeName or rootFileDescriptorName conf. ");
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail commented on a change in pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail commented on a change in pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#discussion_r522701791



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufNativeSchemaCompatibilityCheck.java
##########
@@ -0,0 +1,72 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pulsar.broker.service.schema;
+
+import org.apache.pulsar.broker.service.schema.exceptions.IncompatibleSchemaException;
+import org.apache.pulsar.client.impl.schema.ProtobufNativeSchemaUtils;
+import org.apache.pulsar.common.policies.data.SchemaCompatibilityStrategy;
+import org.apache.pulsar.common.protocol.schema.SchemaData;
+import org.apache.pulsar.common.schema.SchemaType;
+
+import static com.google.protobuf.Descriptors.Descriptor;
+
+/**
+ * The {@link SchemaCompatibilityCheck} implementation for {@link SchemaType#PROTOBUF_NATIVE}.
+ */
+public class ProtobufNativeSchemaCompatibilityCheck implements SchemaCompatibilityCheck {
+
+    @Override
+    public SchemaType getSchemaType() {
+        return SchemaType.PROTOBUF_NATIVE;
+    }
+
+    @Override
+    public void checkCompatible(SchemaData from, SchemaData to, SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
+        Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
+        Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
+        switch (strategy) {
+            case BACKWARD_TRANSITIVE:
+            case BACKWARD:
+            case FORWARD_TRANSITIVE:
+            case FORWARD:
+            case FULL_TRANSITIVE:
+            case FULL:
+                CheckRootRootMessageChange(fromDescriptor, toDescriptor, strategy);
+                return;
+            case ALWAYS_COMPATIBLE:
+                return;
+            default:
+                throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy");
+        }
+    }
+
+    @Override
+    public void checkCompatible(Iterable<SchemaData> from, SchemaData to, SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
+        for (SchemaData schemaData : from) {
+            checkCompatible(schemaData, to, strategy);
+        }
+    }
+
+    private void CheckRootRootMessageChange(Descriptor fromDescriptor, Descriptor toDescriptor, SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {

Review comment:
       @sijie thanks for the reminders , It was my carelessness, this shoud be `CheckRootMessageChange ` , I will recheck my code details carefully.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] fjod commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
fjod commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-806366160


   @hnail I think some bug in deserialize method of ProtobufNativeSchemaUtils. I generated descriptor using probuf-net library, here it is **CgpUZXN0LnByb3RvEgVwcm90byImCgpYWFhNZXNzYWdlEgsKA2ZvbxgBIAEoCRILCgNiYXIYAiABKAFCD6oCDG5hdGl2ZUNoZWNrMmIGcHJvdG8z** .  If you check it on https://protogen.marcgravell.com/decode , it decodes fine. Here is json file with some additional fields:
   
   {"fileDescriptorSet":"ClMKClRlc3QucHJvdG8SC25hdGl2ZUNoZWNrIjAKClhYWE1lc3NhZ2USEAoDZm9vGAEgASgJUgNmb28SEAoDYmFyGAIgASgBUgNiYXJiBnByb3RvMw==","rootMessageTypeName":"proto.XXXMessage","rootFileDescriptorName":"Test.proto"}
   
   When I try to send it to apache-pulsar server, I get null-pointer exception on line 104 at  https://github.com/apache/pulsar/blob/2a1828ca08f096a94b696737e93e84a2bc60a8cc/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L104
   
   I created sample java app copy-pasting code from ProtobufNativeSchemaUtils. Here is simpler version of deserializer:
   
   ```
            File file;
            Descriptors.Descriptor descriptor;
            file = new File("D:\\serialized");
            byte[] fileContent = Files.readAllBytes(file.toPath());
            ProtobufNativeSchemaData schemaData = new ObjectMapper().readValue(fileContent, ProtobufNativeSchemaData.class);
            FileDescriptorSet fileDescriptorSet = FileDescriptorSet.parseFrom(schemaData.fileDescriptorSet);   
           Map<String, FileDescriptorProto> fileDescriptorProtoCache = new HashMap<>();
           Map<String, Descriptors.FileDescriptor> fileDescriptorCache = new HashMap<>();
           fileDescriptorSet.getFileList().forEach(fileDescriptorProto -> fileDescriptorProtoCache.put(fileDescriptorProto.getName(), fileDescriptorProto));        
           FileDescriptorProto rootFileDescriptorProto = fileDescriptorProtoCache.get("Test.proto");
           deserializeFileDescriptor(rootFileDescriptorProto, fileDescriptorCache, fileDescriptorProtoCache);
           Descriptors.FileDescriptor fileDescriptor = fileDescriptorCache.get("Test.proto");      
           System.out.print(fileDescriptor.toProto());
   ```
   
   output is:
   
   ```
   name: "Test.proto"
   package: "nativeCheck"
   message_type {
     name: "XXXMessage"
     field {
       name: "foo"
       number: 1
       label: LABEL_OPTIONAL
       type: TYPE_STRING
       json_name: "foo"
     }
     field {
       name: "bar"
       number: 2
       label: LABEL_OPTIONAL
       type: TYPE_DOUBLE
       json_name: "bar"
     }
   }
   syntax: "proto3"
   ```
   
   **So Descriptor is fine**! But if I try to use code from the repo:
   
   ```
           Descriptors.Descriptor descriptor2;
           Descriptors.FileDescriptor fileDescriptor2 = fileDescriptorCache.get(schemaData.rootMessageTypeName);
           String package1 = fileDescriptor2.getPackage();
                   //trim package
           String[] paths = StringUtils.removeFirst(schemaData.rootMessageTypeName, package1).replaceFirst("\\.", "").split("\\.");
           //extract root message
           descriptor2 = fileDescriptor2.findMessageTypeByName(paths[0]);
           //extract nested message
           for (int i = 1; i < paths.length; i++) {
               descriptor2 = descriptor2.findNestedTypeByName(paths[i]);
           }
           System.out.print(descriptor2.toProto());
   ```
   
   I get nullpointer exception on line with getPackage(); - similar to apache-pulsar exception. 
   ![image](https://user-images.githubusercontent.com/7898721/112421589-5fd2b580-8d40-11eb-9e5a-07c7b1854dd7.png)
   I guess our generated descriptor does not contain such "package" and fails therefore? It would be great if we can use descriptor from our client too ;)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail commented on a change in pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail commented on a change in pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#discussion_r522814157



##########
File path: pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java
##########
@@ -0,0 +1,139 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pulsar.client.impl.schema;
+
+import com.fasterxml.jackson.databind.ObjectMapper;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.pulsar.client.api.SchemaSerializationException;
+import org.apache.pulsar.common.protocol.schema.ProtobufNativeSchemaData;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static com.google.protobuf.DescriptorProtos.FileDescriptorProto;
+import static com.google.protobuf.DescriptorProtos.FileDescriptorSet;
+import static com.google.protobuf.Descriptors.*;

Review comment:
       @codelipenghui thanks for review , fixed as Suggestion。




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail commented on a change in pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail commented on a change in pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#discussion_r522705833



##########
File path: pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufNativeSchemaCompatibilityCheck.java
##########
@@ -0,0 +1,72 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pulsar.broker.service.schema;
+
+import org.apache.pulsar.broker.service.schema.exceptions.IncompatibleSchemaException;
+import org.apache.pulsar.client.impl.schema.ProtobufNativeSchemaUtils;
+import org.apache.pulsar.common.policies.data.SchemaCompatibilityStrategy;
+import org.apache.pulsar.common.protocol.schema.SchemaData;
+import org.apache.pulsar.common.schema.SchemaType;
+
+import static com.google.protobuf.Descriptors.Descriptor;
+
+/**
+ * The {@link SchemaCompatibilityCheck} implementation for {@link SchemaType#PROTOBUF_NATIVE}.
+ */
+public class ProtobufNativeSchemaCompatibilityCheck implements SchemaCompatibilityCheck {
+
+    @Override
+    public SchemaType getSchemaType() {
+        return SchemaType.PROTOBUF_NATIVE;
+    }
+
+    @Override
+    public void checkCompatible(SchemaData from, SchemaData to, SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
+        Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
+        Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
+        switch (strategy) {
+            case BACKWARD_TRANSITIVE:
+            case BACKWARD:
+            case FORWARD_TRANSITIVE:
+            case FORWARD:
+            case FULL_TRANSITIVE:
+            case FULL:
+                CheckRootRootMessageChange(fromDescriptor, toDescriptor, strategy);
+                return;
+            case ALWAYS_COMPATIBLE:
+                return;
+            default:
+                throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy");
+        }
+    }
+
+    @Override
+    public void checkCompatible(Iterable<SchemaData> from, SchemaData to, SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
+        for (SchemaData schemaData : from) {
+            checkCompatible(schemaData, to, strategy);
+        }
+    }
+
+    private void CheckRootRootMessageChange(Descriptor fromDescriptor, Descriptor toDescriptor, SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {

Review comment:
       @sijie thanks for the reminders , it due to my carelessness . this shoud be `CheckRootMessageChange `, I may need recheck my code details carefully




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-717757978


   > @hnail Can you send the PIP of adding native protobuf support to dev@ mailing list? So the community can review the PIP and we can add the PIP to Pulsar's wiki page.
   
   send <[DISCUSS] PIP-Add ProtobufNative Schema Support> to dev@pulsar.apache.org   done 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] sijie commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
sijie commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-725276685


   @hnail overall the change looks really good. I made a few comments. PTAL.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail edited a comment on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail edited a comment on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-805632215


   > I have a question about FileDescriptorSet field. In https://github.com/apache/pulsar/blob/c01b1eeda3221bdbf863bf0f3f8373e93d90adef/pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaTest.java file there is an example test class. It is generated from Test.proto and ExternalMessage.proto files. The problem is that, no matter what I try with protoc , I can not get same FileDescriptorSet content. It's always slightly different. I tried csharp, java, cpp output, with or without --descriptor_set_out - I always get different byte array (and base64 string). And I can not create producer/consumer on apache pulsar client with FileDescriptorSet generated by me, I get com.google.protobuf.InvalidProtocolBufferException.invalidEndTag exception which as far as I googled tells that data is corrupted. When I try to create producer with FileDescriptorSet data from test, it works. What am I doing wrong while generating descriptors using protoc ?
   
   hello , happy for your take notice of this pull request .
   I test as your description with the two following test case :
   
   - FileDescriptorSet build by JAVA Proto Class  , same as [ProtobufNativeSchemaUtils#serialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L53)
   
   ```
   // buid FileDescriptorSet byte[] by Class  
   byte[] fileDescriptorSetBytesByClass = DescriptorProtos.FileDescriptorSet.newBuilder()
                   .addFile(Service.ServiceRequest.getDescriptor().getFile().toProto())
                   .build().toByteArray();  
   DescriptorProtos.FileDescriptorSet fileDescriptorSetByClass = DescriptorProtos.FileDescriptorSet
                   .parseFrom(fileDescriptorSetBytesByClass);
   // print FileDescriptorSet string which buid by Class
   System.out.println(new String(fileDescriptorSetByClass.toBuilder().build().toByteArray(),"utf-8"));
   ```
   - FileDescriptorSet build by protoc command , same as your description:
   ```
   // buid FileDescriptorSet byte[] by 'protoc --include_imports --descriptor_set_out Request.desc Request.proto
   byte[] fileDescriptorSetBytesByProtoC = FileUtils.readFileToByteArray(new File("Request.desc"));
   
   DescriptorProtos.FileDescriptorSet fileDescriptorSetByProtoC = DescriptorProtos.FileDescriptorSet
                   .parseFrom(fileDescriptorSetBytesByProtoC);
   
   // print FileDescriptorSet string which buid by ProtoC
   System.out.println(new String(fileDescriptorSetByClass.toBuilder().build().toByteArray(),"utf-8"));
   ```
   The two realize above works  for me : 
   - The serialized FileDescriptorSet byte[] is slightly different as your description, I think maybe java `proto class` lack of information compare with `proto file ` when compile java class .
   - Deserialize  FileDescriptorSet is by `DescriptorProtos.FileDescriptorSet.parseFrom(bytes[])` in  two way above [ProtobufNativeSchemaUtils#deserialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L90) . so , even though byte[] is slightly different , but at the same of working .
   - The current realize is reference google-doc :  [protobuf v3 Self-describing Messages](https://developers.google.com/protocol-buffers/docs/techniques#self-description)
   
   --- 
   so , I think the reason is `new ObjectMapper().writeValueAsBytes(schemaData);` which serialize byte[] to String , may be is 'UTF-8' or 'Base64' ? 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-725434610


   @congbobo184 Could you please help review this PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] fjod commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
fjod commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-804900649


   I have a question about FileDescriptorSet field. In https://github.com/apache/pulsar/blob/c01b1eeda3221bdbf863bf0f3f8373e93d90adef/pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaTest.java  file there is an example test class. It is generated from Test.proto and ExternalMessage.proto files. The problem is that, no matter what I try with protoc , I can not get same FileDescriptorSet  content. It's always slightly different. I tried csharp, java, cpp output, with or without --descriptor_set_out - I always get different byte array (and base64 string). And I can not create producer/consumer on apache pulsar client with FileDescriptorSet  generated by me, I get com.google.protobuf.InvalidProtocolBufferException.invalidEndTag exception which as far as I googled tells that data is corrupted. When I try to create producer with FileDescriptorSet   data from test, it works. What am I doing wrong while generating descriptors using protoc ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail edited a comment on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail edited a comment on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-806391298


   @fjod  Hi , Your `rootMessageTypeName` is incorrect , change to `nativeCheck.XXXMessage` , rootMessageTypeName must be **full path** , `nativeCheck` is your package path .
   
   Simpler Test code :
   ```
   String schemaDataBytes = "{\"fileDescriptorSet" +
                   "\":\"ClMKClRlc3QucHJvdG8SC25hdGl2ZUNoZWNrIjAKClhYWE1lc3NhZ2USEAoDZm9vGAEgASgJUgNmb28SEAoDYmFyGAIgASgBUgNiYXJiBnByb3RvMw==\",\"rootMessageTypeName\":\"nativeCheck.XXXMessage\",\"rootFileDescriptorName\":\"Test.proto\"}";
           
   Descriptors.Descriptor descriptor = ProtobufNativeSchemaUtils.deserialize(schemaDataBytes.getBytes());
   
   System.out.println(descriptor);
   ```
   It is works fine .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] fjod commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
fjod commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-805791109


   
   > The **java_package** and **java_outer_classname** restrictions which you mentioned may need check in other language client when create schema by POJO API , be appreciate if your have time to help pull a issue to describe the problem for future optimization .
   
   I created issue here about csharp/java mismatch in generating Descriptors
   
   https://github.com/protocolbuffers/protobuf/issues/8425


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail edited a comment on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail edited a comment on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-805632215


   > I have a question about FileDescriptorSet field. In https://github.com/apache/pulsar/blob/c01b1eeda3221bdbf863bf0f3f8373e93d90adef/pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaTest.java file there is an example test class. It is generated from Test.proto and ExternalMessage.proto files. The problem is that, no matter what I try with protoc , I can not get same FileDescriptorSet content. It's always slightly different. I tried csharp, java, cpp output, with or without --descriptor_set_out - I always get different byte array (and base64 string). And I can not create producer/consumer on apache pulsar client with FileDescriptorSet generated by me, I get com.google.protobuf.InvalidProtocolBufferException.invalidEndTag exception which as far as I googled tells that data is corrupted. When I try to create producer with FileDescriptorSet data from test, it works. What am I doing wrong while generating descriptors using protoc ?
   
   hello , happy for your take notice of this pull request .
   I test as your description with the two following test case :
   
   - FileDescriptorSet build by JAVA Proto Class  , same as [ProtobufNativeSchemaUtils#serialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L53)
   
   ```
   // buid FileDescriptorSet byte[] by Class  
   byte[] fileDescriptorSetBytesByClass = DescriptorProtos.FileDescriptorSet.newBuilder()
                   .addFile(Service.ServiceRequest.getDescriptor().getFile().toProto())
                   .build().toByteArray();  
   
   DescriptorProtos.FileDescriptorSet fileDescriptorSetByClass = DescriptorProtos.FileDescriptorSet
                   .parseFrom(fileDescriptorSetBytesByClass);
   
   // print FileDescriptorSet string which buid by Class
   System.out.println(new String(fileDescriptorSetByClass.toBuilder().build().toByteArray(),"utf-8"));
   ```
   - FileDescriptorSet build by protoc command , same as your description:
   ```
   // buid FileDescriptorSet byte[] by 'protoc --include_imports --descriptor_set_out Request.desc Request.proto
   byte[] fileDescriptorSetBytesByProtoC = FileUtils.readFileToByteArray(new File("Request.desc"));
   
   DescriptorProtos.FileDescriptorSet fileDescriptorSetByProtoC = DescriptorProtos.FileDescriptorSet
                   .parseFrom(fileDescriptorSetBytesByProtoC);
   
   // print FileDescriptorSet string which buid by ProtoC
   System.out.println(new String(fileDescriptorSetByClass.toBuilder().build().toByteArray(),"utf-8"));
   ```
   The two realize above works  for me : 
   - The serialized FileDescriptorSet byte[] is slightly different as your description, I think maybe java `proto class` lack of information compare with `proto file ` when compile java class .
   - Deserialize  FileDescriptorSet is by `DescriptorProtos.FileDescriptorSet.parseFrom(bytes[])` in  two way above [ProtobufNativeSchemaUtils#deserialize](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L90) . so , even though byte[] is slightly different , but at the same of working .
   - The current realize is reference google-doc :  [protobuf v3 Self-describing Messages](https://developers.google.com/protocol-buffers/docs/techniques#self-description)
   
   --- 
   so , I think the reason is [new ObjectMapper().writeValueAsBytes(schemaData);](https://github.com/hnail/pulsar/blob/2597f2c6287783ba285737a28cb39cf3e058aa37/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufNativeSchemaUtils.java#L58) which serialize byte[] to String , may be is 'UTF-8' or 'Base64' ? 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hnail commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
hnail commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-805769267


   > I found the culprit here, I deleted important information from source .proto file like **java_package** and **java_outer_classname**, and protoc was generating different info which apache pulsar was not able to use.
   
   I recommend  you can use  JAVA API to create Schema like this : 
   ```
   SchemaDefinition def =  SchemaDefinition.<ProtoMsg>builder()
                   .withPojo(ProtoMsg.class)
                   .build();
   Schema schema = Schema.PROTOBUF_NATIVE(def);
   admin.schemas().createSchema(topic, schema.getSchemaInfo());
   ```
   The **java_package** and **java_outer_classname** restrictions which you mentioned may need check in other language client when create schema by API , be appreciate if your have time to help pull a issue to describe  the problem for future optimization .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] fjod commented on pull request #8372: [schemaregistry]ProtobufNative Schema Support

Posted by GitBox <gi...@apache.org>.
fjod commented on pull request #8372:
URL: https://github.com/apache/pulsar/pull/8372#issuecomment-805754608


   I found the culprit here, I deleted important information from source .proto file like **java_package**  and **java_outer_classname**, and protoc was generating different info which apache pulsar was not able to use.  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org