Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/20 14:20:37 UTC

[GitHub] [hudi] shivabodepudi opened a new issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

shivabodepudi opened a new issue #3835:
URL: https://github.com/apache/hudi/issues/3835


   I am trying to read JSON data from Kafka, with the schema fetched from the schema registry, using the configuration below.
   
    "--schemaprovider-class",
       "org.apache.hudi.utilities.schema.SchemaRegistryProvider",
       "--source-class",
       "org.apache.hudi.utilities.sources.JsonKafkaSource",
       "--continuous",
       "--table-type",
       "COPY_ON_WRITE",
       "--hoodie-conf",
       "hoodie.deltastreamer.schemaprovider.registry.url=https://localhost/subjects/testtopic-value/versions/latest",
   
   
   **Environment Description**
   
   * Hudi version :0.8.0
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :s3
   
   * Running on Docker? (yes/no) : No
   
   
   Below is the error I am getting.
   ![image](https://user-images.githubusercontent.com/83971639/138111314-d2f290c0-5893-4255-aac2-a19733f74376.png)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] shivabodepudi closed issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

Posted by GitBox <gi...@apache.org>.
shivabodepudi closed issue #3835:
URL: https://github.com/apache/hudi/issues/3835


   


[GitHub] [hudi] xushiyan commented on issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #3835:
URL: https://github.com/apache/hudi/issues/3835#issuecomment-947876124


   @shivabodepudi Can you provide a sample schema from your schema registry, and also a sample Kafka message? We should examine those first.


[GitHub] [hudi] xushiyan edited a comment on issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

Posted by GitBox <gi...@apache.org>.
xushiyan edited a comment on issue #3835:
URL: https://github.com/apache/hudi/issues/3835#issuecomment-950410484


   @shivabodepudi I see. The problem is that only Avro schemas are supported and you're using a JSON schema. The schema provider `org.apache.hudi.utilities.schema.SchemaProvider` is defined to provide only Avro schemas. You could extend `org.apache.hudi.utilities.schema.SchemaRegistryProvider` to convert the JSON schema into Avro by overriding `org.apache.hudi.utilities.schema.SchemaRegistryProvider#fetchSchemaFromRegistry`.
   
   Meanwhile, I do think supporting JSON schema makes sense, as we support JsonSource anyway. Filing a JIRA for this. https://issues.apache.org/jira/browse/HUDI-2608 @shivabodepudi if you are interested, feel free to pick up this feature.
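
A minimal sketch of the JSON-schema-to-Avro conversion suggested above, written in Python for brevity rather than as the Java override itself. The function names and both type-mapping tables are assumptions made for illustration, not Hudi APIs, and the sketch only handles flat objects with primitive types and `oneOf` unions, like the schema posted in this thread.

```python
import json

# Assumed mapping from Kafka Connect type hints to Avro primitives.
CONNECT_TO_AVRO = {
    "int8": "int", "int16": "int", "int32": "int", "int64": "long",
    "float32": "float", "float64": "double",
}

# Assumed mapping from JSON Schema primitive types to Avro primitives.
JSON_TO_AVRO = {
    "string": "string", "boolean": "boolean", "null": "null",
    "integer": "long", "number": "double",
}


def to_avro_type(prop):
    """Map one JSON Schema property to an Avro type (primitives and oneOf only)."""
    if "oneOf" in prop:
        # A JSON Schema oneOf becomes an Avro union of the converted branches.
        return [to_avro_type(branch) for branch in prop["oneOf"]]
    connect_type = prop.get("connect.type")
    if connect_type in CONNECT_TO_AVRO:
        return CONNECT_TO_AVRO[connect_type]
    return JSON_TO_AVRO[prop["type"]]


def json_schema_to_avro(schema):
    """Convert a flat JSON Schema object into an Avro record schema dict."""
    if schema.get("type") != "object":
        raise ValueError("expected a JSON Schema of type 'object'")
    fields = []
    for name, prop in schema["properties"].items():
        avro_type = to_avro_type(prop)
        field = {"name": name, "type": avro_type}
        if isinstance(avro_type, list) and avro_type[0] == "null":
            # Nullable fields get a null default so they may be omitted.
            field["default"] = None
        fields.append(field)
    # Split a dotted title into an Avro namespace and a record name.
    namespace, _, record_name = schema.get("title", "Record").rpartition(".")
    record = {"type": "record", "name": record_name, "fields": fields}
    if namespace:
        record["namespace"] = namespace
    return record


sample = {
    "type": "object",
    "title": "testtopic.Value",
    "properties": {
        "Zipcode": {"type": "string", "connect.index": 8},
        "Clinic_Num": {"type": "integer", "connect.index": 2, "connect.type": "int32"},
        "__op": {"connect.index": 95, "oneOf": [{"type": "null"}, {"type": "string"}]},
    },
}
avro_schema = json_schema_to_avro(sample)
print(json.dumps(avro_schema, indent=2))
```

The sketch ignores `connect.index` and keeps the property order of the input; a real converter would likely also need to handle nested objects, arrays and logical types such as `io.debezium.time.Timestamp`.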
   


[GitHub] [hudi] shivabodepudi edited a comment on issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

Posted by GitBox <gi...@apache.org>.
shivabodepudi edited a comment on issue #3835:
URL: https://github.com/apache/hudi/issues/3835#issuecomment-948290654


   Below is the schema we are using right now. @xushiyan @codope 
   
   ```
   {
   	"type": "object",
   	"title": "testtopic.Value",
   	"properties": {
   		"Zipcode": {
   			"type": "string",
   			"connect.index": 8
   		},
   		"__snapshot": {
   			"connect.index": 98,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "string",
   					"title": "io.debezium.data.Enum",
   					"connect.version": 1,
   					"connect.parameters": {
   						"allowed": "true,last,false"
   					}
   				}
   			]
   		},
   		"__change_lsn": {
   			"connect.index": 103,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "string"
   				}
   			]
   		},
   		"Last_Upd_DateTime": {
   			"type": "integer",
   			"title": "io.debezium.time.Timestamp",
   			"connect.index": 62,
   			"connect.version": 1,
   			"connect.type": "int64"
   		},
   		"Crt_DateTime": {
   			"type": "integer",
   			"title": "io.debezium.time.Timestamp",
   			"connect.index": 59,
   			"connect.version": 1,
   			"connect.type": "int64"
   		},
   		"Description": {
   			"connect.index": 3,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "string"
   				}
   			]
   		},
   		"BillTo_Address2": {
   			"type": "string",
   			"connect.index": 16
   		},
   		"BillTo_Address1": {
   			"type": "string",
   			"connect.index": 15
   		},
   		"Time_Zone": {
   			"type": "integer",
   			"connect.index": 12,
   			"connect.type": "int16"
   		},
   		"_hoodie_is_deleted": {
   			"connect.index": 104,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "boolean"
   				}
   			]
   		},
   		"__schema": {
   			"connect.index": 102,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "string"
   				}
   			]
   		},
   		"Last_Upd_UserNum": {
   			"type": "integer",
   			"connect.index": 60,
   			"connect.type": "int32"
   		},
   		"__db": {
   			"connect.index": 101,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "string"
   				}
   			]
   		},
   		"ClinicPK": {
   			"type": "string",
   			"connect.index": 51
   		},
   		"Clinic_Num": {
   			"type": "integer",
   			"connect.index": 2,
   			"connect.type": "int32"
   		},
   		"__ts_ms": {
   			"connect.index": 97,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "integer",
   					"connect.type": "int64"
   				}
   			]
   		},
   		"__name": {
   			"connect.index": 100,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "string"
   				}
   			]
   		},
   		"IsCCEnabled": {
   			"connect.index": 53,
   			"type": "boolean"
   		},
   		"Clinic_Name": {
   			"type": "string",
   			"connect.index": 1
   		},
   		"Address2": {
   			"type": "string",
   			"connect.index": 5
   		},
   		"Address1": {
   			"type": "string",
   			"connect.index": 4
   		},
   		"__op": {
   			"connect.index": 95,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "string"
   				}
   			]
   		},
   		
   		"__table": {
   			"connect.index": 96,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "string"
   				}
   			]
   		},
   		"__connector": {
   			"connect.index": 99,
   			"oneOf": [
   				{
   					"type": "null"
   				},
   				{
   					"type": "string"
   				}
   			]
   		}
   	}
   }
   ```


[GitHub] [hudi] shivabodepudi commented on issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

Posted by GitBox <gi...@apache.org>.
shivabodepudi commented on issue #3835:
URL: https://github.com/apache/hudi/issues/3835#issuecomment-947770659


   Since I am using JsonKafkaSource, I am not sure why Avro comes into the picture.


[GitHub] [hudi] xushiyan closed issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

Posted by GitBox <gi...@apache.org>.
xushiyan closed issue #3835:
URL: https://github.com/apache/hudi/issues/3835


   


[GitHub] [hudi] codope commented on issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

Posted by GitBox <gi...@apache.org>.
codope commented on issue #3835:
URL: https://github.com/apache/hudi/issues/3835#issuecomment-947738050


   `object` is not a valid type in Avro. Apart from the primitive types, Avro supports the following complex types: `record`, `enum`, `array`, `map`, `union` and `fixed`. Not sure where `object` is coming from. Can you share the schema from the schema registry?
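
A rough way to check this before handing a registry schema to an Avro parser is to look at its top-level type name. The helper below is a hypothetical sketch, not part of Hudi or Avro; it inspects only the type name and does not validate the rest of the schema.

```python
import json

# Type names defined by the Avro specification.
AVRO_PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
AVRO_COMPLEX = {"record", "enum", "array", "map", "fixed"}


def top_level_type_is_avro(schema_text):
    """Return True if the top-level type of a schema document is an Avro type name."""
    schema = json.loads(schema_text)
    if isinstance(schema, list):
        # A top-level JSON array is how Avro writes a union.
        return True
    if isinstance(schema, str):
        return schema in AVRO_PRIMITIVES
    if not isinstance(schema, dict):
        return False
    type_name = schema.get("type")
    return isinstance(type_name, str) and type_name in AVRO_PRIMITIVES | AVRO_COMPLEX


json_schema_doc = '{"type": "object", "title": "testtopic.Value", "properties": {}}'
avro_schema_doc = '{"type": "record", "name": "Value", "fields": []}'
print(top_level_type_is_avro(json_schema_doc))
print(top_level_type_is_avro(avro_schema_doc))
```

A document whose top-level type is `object` is a JSON Schema, which is why an Avro parser rejects it.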

