Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/02/14 13:25:51 UTC

[GitHub] [hudi] nsivabalan commented on issue #2406: [SUPPORT] HoodieMultiTableDeltastreamer - Bypassing SchemaProvider-Class requirement for ParquetDFS

nsivabalan commented on issue #2406:
URL: https://github.com/apache/hudi/issues/2406#issuecomment-778777681


   It looks like there could be a bug. Here is the reason:
   HoodieDeltaStreamer works fine for Dataset<Row> sources without a schema provider, but it looks like HoodieMultiTableDeltaStreamer does not hold on to that assumption. The tests written for the multi-table delta streamer use Dataset<GenericRecord> sources, for which schema providers are mandatory, so the case where `schemaProviderClassName` is null is not covered. The following change adds the missing null check:
   
   ```
   git diff
   diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java
   index 9d5ca3ca..91742ec0 100644
   --- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java
   +++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java
   @@ -147,7 +147,7 @@ public class HoodieMultiTableDeltaStreamer {
      }
    
      private void populateSchemaProviderProps(HoodieDeltaStreamer.Config cfg, TypedProperties typedProperties) {
   -    if (cfg.schemaProviderClassName.equals(SchemaRegistryProvider.class.getName())) {
   +    if (cfg.schemaProviderClassName != null && cfg.schemaProviderClassName.equals(SchemaRegistryProvider.class.getName())) {
          String schemaRegistryBaseUrl = typedProperties.getString(Constants.SCHEMA_REGISTRY_BASE_URL_PROP);
          String schemaRegistrySuffix = typedProperties.getString(Constants.SCHEMA_REGISTRY_URL_SUFFIX_PROP);
          typedProperties.setProperty(Constants.SOURCE_SCHEMA_REGISTRY_URL_PROP, schemaRegistryBaseUrl + typedProperties.getString(Constants.KAFKA_TOPIC_PROP) + schemaRegistrySuffix);
   ```
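   
   To illustrate why the null check matters, here is a minimal standalone sketch rather than actual Hudi code. `NullSafeProviderCheck`, its nested `Config` class, and the `REGISTRY_PROVIDER` constant are hypothetical stand-ins for `HoodieDeltaStreamer.Config` and `SchemaRegistryProvider.class.getName()`. It assumes `schemaProviderClassName` is simply left null when no schema provider is passed.
   
   ```java
   public class NullSafeProviderCheck {
     // Hypothetical stand-in for HoodieDeltaStreamer.Config; only this one field matters here.
     static class Config {
       String schemaProviderClassName; // assumed to stay null when no schema provider is configured
     }
   
     // Hypothetical stand-in for SchemaRegistryProvider.class.getName().
     static final String REGISTRY_PROVIDER = "org.apache.hudi.utilities.schema.SchemaRegistryProvider";
   
     // Mirrors the check before the patch: calling equals on a null field throws NullPointerException.
     static boolean unsafeCheck(Config cfg) {
       return cfg.schemaProviderClassName.equals(REGISTRY_PROVIDER);
     }
   
     // Mirrors the patched check: && short-circuits, so equals is never called on null.
     static boolean safeCheck(Config cfg) {
       return cfg.schemaProviderClassName != null && cfg.schemaProviderClassName.equals(REGISTRY_PROVIDER);
     }
   
     // Alternative with the same result: equals on the non-null constant accepts a null argument.
     static boolean constantFirstCheck(Config cfg) {
       return REGISTRY_PROVIDER.equals(cfg.schemaProviderClassName);
     }
   
     public static void main(String[] args) {
       Config cfg = new Config(); // no schema provider set, as with a Dataset<Row> source
       System.out.println("safeCheck: " + safeCheck(cfg));
       System.out.println("constantFirstCheck: " + constantFirstCheck(cfg));
       try {
         unsafeCheck(cfg);
       } catch (NullPointerException e) {
         System.out.println("unsafeCheck threw NullPointerException");
       }
     }
   }
   ```
   
   Either null-safe form should behave the same; the diff above uses the explicit null check, which keeps the change small and easy to review.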
   
   As you might have figured out, I have no prior experience with this part of the code base, so I will have to write tests to ensure the fix works. In the meantime, you can fetch the StructType of your source data and then try using [RowBasedSchemaProvider](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/RowBasedSchemaProvider.java) to unblock yourself for now.
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org