You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/24 11:26:38 UTC

[GitHub] [druid] spinatelli opened a new pull request #10314: Add config and header support for confluent schema registry.

spinatelli opened a new pull request #10314:
URL: https://github.com/apache/druid/pull/10314


   (porting code from https://github.com/apache/druid/pull/9096)
   
   Fixes #8806.
   
   ### Description
   This is an update to the code in PR 9096 (https://github.com/apache/druid/pull/9096), since it's stale. I report the original description and the changes I made
   
   Enhancing schema registry client i.e. SchemaRegistryBasedAvroBytesDecoder to accept additional configs, headers and able to query schemas from multi schema registry instances.
   
   - Changed SchemaRegistryBasedAvroBytesDecoder to accept config and headers as JSON properties.
   - Added support to pass multiple URLs for multiple schema registry instances by passing them as an array
   - Upgraded Confluent version to 5.5.1, which is compatible with existing kafka version: 2.6.0
   - Changed deprecated method getByID to getById.
   
   <hr>
   
   **Key changed/added classes in this PR**
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added documentation for new or modified features or behaviors.
   - [x] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [x] been tested in a test Druid cluster.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `SchemaRegistryBasedAvroBytesDecoder`
    * `SchemaRegistryBasedAvroBytesDecoderTest`
    * Updated libraries
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] spinatelli commented on a change in pull request #10314: Add config and header support for confluent schema registry.

Posted by GitBox <gi...@apache.org>.
spinatelli commented on a change in pull request #10314:
URL: https://github.com/apache/druid/pull/10314#discussion_r583650673



##########
File path: extensions-core/avro-extensions/src/main/java/org/apache/druid/data/input/avro/SchemaRegistryBasedAvroBytesDecoder.java
##########
@@ -63,7 +73,7 @@ public GenericRecord parse(ByteBuffer bytes)
       int id = bytes.getInt(); // extract schema registry id
       int length = bytes.limit() - 1 - 4;
       int offset = bytes.position() + bytes.arrayOffset();
-      Schema schema = registry.getByID(id);
+      Schema schema = registry.getById(id);

Review comment:
       Updated to use getSchemaById, which is anyway overridden by the CachedSchemaRegistryClient implementation that is being used here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #10314: Add config and header support for confluent schema registry.

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #10314:
URL: https://github.com/apache/druid/pull/10314


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10314: Add config and header support for confluent schema registry.

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10314:
URL: https://github.com/apache/druid/pull/10314#discussion_r533817780



##########
File path: extensions-core/avro-extensions/src/main/java/org/apache/druid/data/input/avro/SchemaRegistryBasedAvroBytesDecoder.java
##########
@@ -63,7 +73,7 @@ public GenericRecord parse(ByteBuffer bytes)
       int id = bytes.getInt(); // extract schema registry id
       int length = bytes.limit() - 1 - 4;
       int offset = bytes.position() + bytes.arrayOffset();
-      Schema schema = registry.getByID(id);
+      Schema schema = registry.getById(id);

Review comment:
       it seems like this method is deprecated as well, but it doesn't really seem like there is a better replacement since the underlying method is doing this
   ```
       ParsedSchema schema = this.getSchemaById(id);
       return schema instanceof AvroSchema ? ((AvroSchema)schema).rawSchema() : null;
   ```

##########
File path: docs/ingestion/data-formats.md
##########
@@ -1020,6 +1026,27 @@ For details, see the Schema Registry [documentation](http://docs.confluent.io/cu
 ...
 ```
 
+Multiple Instances:
+```json
+...
+"avroBytesDecoder" : {
+   "type"   : "schema_registry",
+   "urls"   : [<schema-registry-url-1>, <schema-registry-url-2>, ...],
+   "config" : {
+               "schema.registry.basic.auth.credentials.source" : "USER_INFO",
+               "schema.registry.basic.auth.user.info"          : "fred:letmein",
+               ... 
+              },
+   "headers": {
+               "traceID"   : "b29c5de2-0db4-490b-b421",
+               "timeStamp" : "1577191871865",
+               ...
+               
+             }

Review comment:
       The basic auth example I don't think works anymore, looking at the code underneath it looks like the `schema.registry` prefix is only for the ssl config
   ```
       if (configs != null && !configs.isEmpty()) {
         restService.configure(configs);
         Map<String, Object> sslConfigs = (Map)configs.entrySet().stream().filter((e) -> {
           return ((String)e.getKey()).startsWith("schema.registry.");
         }).collect(Collectors.toMap((e) -> {
           return ((String)e.getKey()).substring("schema.registry.".length());
         }, Entry::getValue));
         SslFactory sslFactory = new SslFactory(sslConfigs);
         if (sslFactory != null && sslFactory.sslContext() != null) {
           restService.setSslSocketFactory(sslFactory.sslContext().getSocketFactory());
         }
       }
   ```
   where `restService` is not considering the prefix:
   ```
       String basicCredentialsSource = (String)configs.get("basic.auth.credentials.source");
       String bearerCredentialsSource = (String)configs.get("bearer.auth.credentials.source");
   ...
   ```
   After reading through https://docs.confluent.io/platform/current/schema-registry/security/index.html#additional-configurations-for-https, should we also include the SSL config since it seems relevant if connecting to https?
   
   Also, formatting seems a bit off here compared to the other JSON example:
   ```suggestion
      "type" : "schema_registry",
      "urls" : [<schema-registry-url-1>, <schema-registry-url-2>, ...],
      "config" : {
           "basic.auth.credentials.source": "USER_INFO",
           "basic.auth.user.info": "fred:letmein",
           "schema.registry.ssl.truststore.location": "/some/secrets/kafka.client.truststore.jks",
           "schema.registry.ssl.truststore.password": "<password>",
           "schema.registry.ssl.keystore.location": "/some/secrets/kafka.client.keystore.jks",
           "schema.registry.ssl.keystore.password": "<password>",
           "schema.registry.ssl.key.password": "<password>"
          ... 
      },
      "headers": {
          "traceID" : "b29c5de2-0db4-490b-b421",
          "timeStamp" : "1577191871865",
          ...
       }
   ```
   

##########
File path: licenses.yaml
##########
@@ -3081,13 +3081,105 @@ notices:
 ---
 
 name: Kafka Schema Registry Client
-version: 3.0.1
+version: 5.5.1
 license_category: binary
 module: extensions/druid-avro-extensions
 license_name: Apache License version 2.0
 libraries:
   - io.confluent: kafka-schema-registry-client
+  - io.confluent: common-config
+  - io.confluent: common-utils
+
+---
+
+name: com.101tec zkclient
+license_category: binary
+version: '0.10'
+module: druid-ranger-security

Review comment:
       nit: did this get tagged with an incorrect module or was it just missing from the file? (it doesn't seem like there were any changes to that extension)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] spinatelli commented on a change in pull request #10314: Add config and header support for confluent schema registry.

Posted by GitBox <gi...@apache.org>.
spinatelli commented on a change in pull request #10314:
URL: https://github.com/apache/druid/pull/10314#discussion_r583650673



##########
File path: extensions-core/avro-extensions/src/main/java/org/apache/druid/data/input/avro/SchemaRegistryBasedAvroBytesDecoder.java
##########
@@ -63,7 +73,7 @@ public GenericRecord parse(ByteBuffer bytes)
       int id = bytes.getInt(); // extract schema registry id
       int length = bytes.limit() - 1 - 4;
       int offset = bytes.position() + bytes.arrayOffset();
-      Schema schema = registry.getByID(id);
+      Schema schema = registry.getById(id);

Review comment:
       Updated to use getSchemaById, which is anyway overridden by the CachedSchemaRegistryClient implementation that is being used here.
   
   EDIT: actually just used that piece of code, it makes more sense I think, in case getById() gets changed in the future




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] coldmav commented on pull request #10314: Add config and header support for confluent schema registry.

Posted by GitBox <gi...@apache.org>.
coldmav commented on pull request #10314:
URL: https://github.com/apache/druid/pull/10314#issuecomment-788983700


   Hello all,
   Any chance on adding the usage of a Password Provider to encapsulate the passwords, just like it is done on the Kafka Consumer properties (https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html#kafkasupervisorioconfig) ?
   Thank you all!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #10314: Add config and header support for confluent schema registry.

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #10314:
URL: https://github.com/apache/druid/pull/10314#discussion_r533993972



##########
File path: docs/ingestion/data-formats.md
##########
@@ -1020,6 +1026,27 @@ For details, see the Schema Registry [documentation](http://docs.confluent.io/cu
 ...
 ```
 
+Multiple Instances:
+```json
+...
+"avroBytesDecoder" : {
+   "type"   : "schema_registry",
+   "urls"   : [<schema-registry-url-1>, <schema-registry-url-2>, ...],
+   "config" : {
+               "schema.registry.basic.auth.credentials.source" : "USER_INFO",
+               "schema.registry.basic.auth.user.info"          : "fred:letmein",
+               ... 
+              },
+   "headers": {
+               "traceID"   : "b29c5de2-0db4-490b-b421",
+               "timeStamp" : "1577191871865",
+               ...
+               
+             }

Review comment:
       Also since this config section contains potentially sensitive information, longer term it might make sense to use something like what is suggested in #10309 (though obviously not in this PR)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] spinatelli commented on a change in pull request #10314: Add config and header support for confluent schema registry.

Posted by GitBox <gi...@apache.org>.
spinatelli commented on a change in pull request #10314:
URL: https://github.com/apache/druid/pull/10314#discussion_r583650370



##########
File path: licenses.yaml
##########
@@ -3081,13 +3081,105 @@ notices:
 ---
 
 name: Kafka Schema Registry Client
-version: 3.0.1
+version: 5.5.1
 license_category: binary
 module: extensions/druid-avro-extensions
 license_name: Apache License version 2.0
 libraries:
   - io.confluent: kafka-schema-registry-client
+  - io.confluent: common-config
+  - io.confluent: common-utils
+
+---
+
+name: com.101tec zkclient
+license_category: binary
+version: '0.10'
+module: druid-ranger-security

Review comment:
       Removed it, it was a duplicate




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] spinatelli commented on pull request #10314: Add config and header support for confluent schema registry.

Posted by GitBox <gi...@apache.org>.
spinatelli commented on pull request #10314:
URL: https://github.com/apache/druid/pull/10314#issuecomment-789045456


   Yea something like that would be the next step, I might look into it in the next week


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org