Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/13 08:52:19 UTC

[GitHub] [hudi] chrischnweiss opened a new issue #4585: Target Schema cannot be set in MultiTableDeltaStreamer

chrischnweiss opened a new issue #4585:
URL: https://github.com/apache/hudi/issues/4585


   Hi guys,
   
   we ran into a problem setting the target schema of our Hudi table using the MultiTableDeltaStreamer.
   
   Using a normal DeltaStreamer, we are able to set our source and target schemas using the properties:
   
   - hoodie.deltastreamer.schemaprovider.registry.url
   - hoodie.deltastreamer.schemaprovider.registry.targetUrl
   
   We found that we are not able to set these properties on a table basis using the MultiTableDeltaStreamer, since the MTDS builds SchemaRegistry URLs for target and source schema using the properties:
   
   - hoodie.deltastreamer.schemaprovider.registry.baseUrl
   - hoodie.deltastreamer.schemaprovider.registry.sourceUrlSuffix
   - hoodie.deltastreamer.schemaprovider.registry.targetUrlSuffix
   
   The MultiTableDeltaStreamer then also uses the name of the source Kafka topic to set the name of the target schema:
   
   https://github.com/apache/hudi/blob/9fe28e56b49c7bf68ae2d83bfe89755314aa793b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java#L167
   
   We think that schema names should be more configurable, as they are in the original DeltaStreamer. Currently, the names of the schemas used for reading or writing the data are very tightly coupled to the name of the Kafka topic the data is loaded from.
   
   What do you think?
   
   Cheers,
   Christian 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #4585: Target Schema cannot be set in MultiTableDeltaStreamer

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4585:
URL: https://github.com/apache/hudi/issues/4585#issuecomment-1013346554


   Hey @chrischnweiss : I inspected the code and I feel we could configure per table schema providers. 
   ```
     // Excerpt from HoodieMultiTableDeltaStreamer; the line that opens the enclosing block is not shown.
     String schemaRegistryBaseUrl = typedProperties.getString(Constants.SCHEMA_REGISTRY_BASE_URL_PROP);
     String schemaRegistrySuffix = typedProperties.getString(Constants.SCHEMA_REGISTRY_URL_SUFFIX_PROP, null);
     String sourceSchemaRegistrySuffix;
     String targetSchemaRegistrySuffix;
     if (StringUtils.isNullOrEmpty(schemaRegistrySuffix)) {
       // No shared suffix configured: read separate suffixes for source and target.
       sourceSchemaRegistrySuffix = typedProperties.getString(Constants.SCHEMA_REGISTRY_SOURCE_URL_SUFFIX);
       targetSchemaRegistrySuffix = typedProperties.getString(Constants.SCHEMA_REGISTRY_TARGET_URL_SUFFIX);
     } else {
       // A shared suffix is configured: use it for both source and target.
       targetSchemaRegistrySuffix = schemaRegistrySuffix;
       sourceSchemaRegistrySuffix = schemaRegistrySuffix;
     }
     // Both URLs are built from the same base URL and the same Kafka topic name.
     typedProperties.setProperty(Constants.SOURCE_SCHEMA_REGISTRY_URL_PROP, schemaRegistryBaseUrl + typedProperties.getString(Constants.KAFKA_TOPIC_PROP) + sourceSchemaRegistrySuffix);
     typedProperties.setProperty(Constants.TARGET_SCHEMA_REGISTRY_URL_PROP, schemaRegistryBaseUrl + typedProperties.getString(Constants.KAFKA_TOPIC_PROP) + targetSchemaRegistrySuffix);
   } // closes the enclosing block (opening line not shown)
   ```
   
   If you leave `hoodie.deltastreamer.schemaprovider.registry.urlSuffix` unset, you can set appropriate values for `hoodie.deltastreamer.schemaprovider.registry.sourceUrlSuffix` and `hoodie.deltastreamer.schemaprovider.registry.targetUrlSuffix`. 
   But the base URL and Kafka topic will have to be the same for both the source and target schema. 
   
   Does the above proposal work for you? I understand it may not be as easy as setting `registry.url` and `registry.targetUrl` directly at the per-table level, but I am checking whether we can achieve this with the existing code. 
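
To make the suffix logic above concrete, here is a minimal, self-contained sketch of how the two registry URLs end up being composed. It is not the Hudi implementation: it uses plain `java.util.Properties` in place of `TypedProperties`, the class and method names are hypothetical, the Kafka topic key is an assumption (the excerpt only shows `Constants.KAFKA_TOPIC_PROP`), and the base URL, topic, and suffix values are made up for illustration.

```java
import java.util.Properties;

// Hypothetical stand-in for the URL composition in the excerpt above; not Hudi code.
final class RegistryUrlSketch {
    static final String PREFIX = "hoodie.deltastreamer.schemaprovider.registry.";
    // Assumed key for the Kafka topic; the excerpt only references Constants.KAFKA_TOPIC_PROP.
    static final String TOPIC_KEY = "hoodie.deltastreamer.source.kafka.topic";

    // Derives registry.url and registry.targetUrl from baseUrl + topic + suffix.
    static void populateUrls(Properties props) {
        String baseUrl = props.getProperty(PREFIX + "baseUrl");
        String topic = props.getProperty(TOPIC_KEY);
        String shared = props.getProperty(PREFIX + "urlSuffix");
        boolean hasShared = shared != null && !shared.isEmpty();
        // A shared suffix wins; otherwise the separate source/target suffixes are used.
        String sourceSuffix = hasShared ? shared : props.getProperty(PREFIX + "sourceUrlSuffix");
        String targetSuffix = hasShared ? shared : props.getProperty(PREFIX + "targetUrlSuffix");
        props.setProperty(PREFIX + "url", baseUrl + topic + sourceSuffix);
        props.setProperty(PREFIX + "targetUrl", baseUrl + topic + targetSuffix);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Illustrative values only.
        props.setProperty(PREFIX + "baseUrl", "http://localhost:8081/subjects/");
        props.setProperty(TOPIC_KEY, "orders");
        props.setProperty(PREFIX + "sourceUrlSuffix", "-value/versions/latest");
        props.setProperty(PREFIX + "targetUrlSuffix", "-target/versions/latest");
        populateUrls(props);
        System.out.println(props.getProperty(PREFIX + "url"));
        System.out.println(props.getProperty(PREFIX + "targetUrl"));
    }
}
```

In this sketch only the suffix can differ between source and target; the topic name always sits in the middle of both URLs, which is the coupling the issue describes.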
   
   
   





[GitHub] [hudi] nsivabalan commented on issue #4585: [Feature] Make schema registry configs more flexible with MultiTableDeltaStreamer

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4585:
URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015323676


   I have filed a [jira](https://issues.apache.org/jira/browse/HUDI-3264) on this end. @chrischnweiss : Feel free to update the jira w/ your suggestions. Even if you can't find cycles to contribute, one of us from the community can try to find time to work towards it. 
   
   Closing the GitHub issue. We can continue the conversation in jira. 
   Thanks for reporting! 





[GitHub] [hudi] xushiyan commented on issue #4585: Target Schema cannot be set in MultiTableDeltaStreamer

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #4585:
URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015195895


   @chrischnweiss So it makes sense to make the registry URL more configurable. I would recommend that you propose an idea to improve this based on your use case. You can elaborate on your idea here or file a JIRA directly to describe what exactly the configs could look like. Anyone from the community may pick it up for implementation.





[GitHub] [hudi] chrischnweiss commented on issue #4585: Target Schema cannot be set in MultiTableDeltaStreamer

Posted by GitBox <gi...@apache.org>.
chrischnweiss commented on issue #4585:
URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015125841


   Hey @nsivabalan,
   
   unfortunately our Kafka topic naming scheme makes it impossible for us to use it this way.
   
   Cheers,
   Christian





[GitHub] [hudi] pratyakshsharma commented on issue #4585: [Feature] Make schema registry configs more flexible with MultiTableDeltaStreamer

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on issue #4585:
URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015203741


   > unfortunately our Kafka topic naming scheme makes it impossible for us to use it this way.
   
   @chrischnweiss Are you trying to say you guys are using a subject naming strategy other than `TopicNameStrategy` for your schema registry? MTDS was originally designed to cater to use cases with `TopicNameStrategy` as the subject naming strategy which is the default provided by Confluent. 
   
   As mentioned by Raymond, please feel free to elaborate your use case and contribute the fix back. :) 
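
For readers unfamiliar with the subject naming strategies mentioned above, the following sketch shows how the subject name differs between them, based on Confluent's documented behavior. The class and method names are hypothetical helpers for illustration, not Confluent or Hudi APIs, and the topic and record names are made up.

```java
// Hypothetical helpers illustrating Confluent's subject naming strategies; not library code.
final class SubjectNameSketch {
    // TopicNameStrategy (the default): the subject is derived from the topic name alone.
    static String topicNameStrategy(String topic, boolean isKey) {
        return topic + (isKey ? "-key" : "-value");
    }

    // RecordNameStrategy: the subject is the fully qualified record name, independent of the topic.
    static String recordNameStrategy(String fullyQualifiedRecordName) {
        return fullyQualifiedRecordName;
    }

    // TopicRecordNameStrategy: the subject combines the topic and the fully qualified record name.
    static String topicRecordNameStrategy(String topic, String fullyQualifiedRecordName) {
        return topic + "-" + fullyQualifiedRecordName;
    }

    public static void main(String[] args) {
        // Illustrative names only.
        String topic = "orders";
        String record = "com.example.Order";
        System.out.println(topicNameStrategy(topic, false));
        System.out.println(recordNameStrategy(record));
        System.out.println(topicRecordNameStrategy(topic, record));
    }
}
```

A URL of the form base URL + topic + suffix can only reach a subject whose name starts with the topic name, so a subject produced by a record-name-based strategy would not be addressable that way.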





[GitHub] [hudi] xushiyan commented on issue #4585: Target Schema cannot be set in MultiTableDeltaStreamer

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #4585:
URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015083274


   @pratyakshsharma can you chime in here please? looks like some improvements for `MultiTableDeltaStreamer` ?





[GitHub] [hudi] nsivabalan closed issue #4585: [Feature] Make schema registry configs more flexible with MultiTableDeltaStreamer

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #4585:
URL: https://github.com/apache/hudi/issues/4585


   

