Posted to user@gobblin.apache.org by "roman.tarasov@mgid.com" <ro...@mgid.com> on 2019/02/12 10:48:33 UTC
Gobblin Avro to Json convert
Hello,
I'm a Hadoop cluster admin at MGID.
We are trying to use Gobblin to ingest data from Kafka into HDFS.
We have 4 Kafka clusters (not Confluent, but we use the Confluent Schema Registry), and our application writes to Kafka in Avro.
Our problem is converting Avro to Parquet before writing to HDFS.
For this we use these converters:
converter.classes=org.apache.gobblin.converter.avro.AvroToJsonStringConverter,org.apache.gobblin.converter.json.JsonStringToJsonIntermediateConverter,org.apache.gobblin.converter.parquet.JsonIntermediateToParquetGroupConverter
But our job fails with an error at startup.
For testing, we run in standalone mode:
java.lang.IllegalStateException: This is not a JSON Array.
    at com.google.gson.JsonElement.getAsJsonArray(JsonElement.java:106)
    at org.apache.gobblin.converter.json.JsonStringToJsonIntermediateConverter.convertSchema(JsonStringToJsonIntermediateConverter.java:71)
    at org.apache.gobblin.converter.json.JsonStringToJsonIntermediateConverter.convertSchema(JsonStringToJsonIntermediateConverter.java:48)
    at ...
Do you know how to solve this problem?
Our test config:
kafka.brokers=kafka-node:9092
#kafka.schema.registry.class=org.apache.gobblin.source.extractor.extract.kafka.ConfluentKafkaSchemaRegistry
kafka.deserializer.type=CONFLUENT_AVRO
kafka.schema.registry.url=http://kafka-node:8081
source.class=org.apache.gobblin.source.extractor.extract.kafka.KafkaDeserializerSource
extract.namespace=org.apache.gobblin.extract.kafka
converter.classes="org.apache.gobblin.converter.json.JsonStringToJsonIntermediateConverter,org.apache.gobblin.converter.parquet.JsonIntermediateToParquetGroupConverter"
extract.namespace=org.apache.gobblin.extract.converter
writer.builder.class=org.apache.gobblin.writer.ParquetDataWriterBuilder
writer.destination.type=HDFS
writer.output.format=PARQUET
writer.file.path.type=tablename
topic.name=test_hdfs
topic.whitelist=test_hdfs
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher

Regards,
Roman Tarasov!
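Two details of the test config are easy to miss: its `converter.classes` value is wrapped in double quotes and omits `AvroToJsonStringConverter` (unlike the chain quoted earlier in the message), and `extract.namespace` is set twice. The Java sketch below loads the relevant lines with `java.util.Properties` to show how a plain properties parser reads them. That Gobblin parses job files exactly this way is an assumption, not something stated in this thread, and `TestConfigCheck` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: reads job-config lines with plain java.util.Properties.
// Assumption: Gobblin's own loader may handle quotes or duplicate keys differently.
class TestConfigCheck {
    // The relevant lines from the test config, copied verbatim.
    static final String TEST_CONFIG = String.join("\n",
        "#kafka.schema.registry.class=org.apache.gobblin.source.extractor.extract.kafka.ConfluentKafkaSchemaRegistry",
        "kafka.deserializer.type=CONFLUENT_AVRO",
        "extract.namespace=org.apache.gobblin.extract.kafka",
        "converter.classes=\"org.apache.gobblin.converter.json.JsonStringToJsonIntermediateConverter,"
            + "org.apache.gobblin.converter.parquet.JsonIntermediateToParquetGroupConverter\"",
        "extract.namespace=org.apache.gobblin.extract.converter");

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            // Lines starting with '#' are comments; a repeated key overwrites the earlier value.
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(TEST_CONFIG);
        String converters = props.getProperty("converter.classes");

        // The double quotes stay part of the value.
        System.out.println("converter.classes = " + converters);
        // Only the last extract.namespace line survives.
        System.out.println("extract.namespace = " + props.getProperty("extract.namespace"));
        // Unlike the chain quoted earlier, this one has no Avro-to-JSON-string step.
        System.out.println("includes AvroToJsonStringConverter: "
            + converters.contains("AvroToJsonStringConverter"));
        // The commented-out registry class is not loaded at all.
        System.out.println("kafka.schema.registry.class set: "
            + props.containsKey("kafka.schema.registry.class"));
    }
}
```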
Re: Gobblin Avro to Json convert
Posted by Hung Tran <hu...@linkedin.com>.
Hi Roman,
There is a log line `log.info("Schema: " + inputSchema)` in JsonStringToJsonIntermediateConverter. What does it print?
Also, the converter config you mentioned is not the same as the one you listed in the `Our test config` section. Which config are you using?
Hung.
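For context on Hung's question: the stack trace shows `convertSchema` calling Gson's `JsonElement.getAsJsonArray()`, presumably on the parsed input schema, and that call throws `IllegalStateException` when the top-level JSON value is not an array. One plausible cause, not confirmed in this thread, is that the schema string reaching the converter is an Avro record schema, which serializes as a JSON object (`{"type":"record",...}`). The hypothetical helper below classifies a logged schema string by its first non-whitespace character; both sample schemas are illustrative and not taken from Roman's job.

```java
// Hypothetical diagnostic helper: classifies a logged schema string by the first
// non-whitespace character. A heuristic, not a full JSON parser or validator.
class SchemaShape {
    static String topLevelType(String schema) {
        String s = schema.trim();
        if (s.isEmpty()) {
            return "empty";
        }
        char first = s.charAt(0);
        if (first == '[') {
            return "array";   // the only shape getAsJsonArray() accepts
        } else if (first == '{') {
            return "object";  // e.g. an Avro record schema; getAsJsonArray() would throw
        } else {
            return "other";
        }
    }

    public static void main(String[] args) {
        // Illustrative Avro record schema (hypothetical record and field names).
        String avroRecord =
            "{\"type\":\"record\",\"name\":\"TestHdfs\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";
        // Illustrative JSON-array schema (hypothetical shape, for contrast only).
        String jsonArray = "[{\"columnName\":\"id\",\"dataType\":{\"type\":\"long\"}}]";

        System.out.println("Avro record schema -> " + topLevelType(avroRecord)); // → object
        System.out.println("JSON array schema  -> " + topLevelType(jsonArray));  // → array
    }
}
```

If the `Schema:` log line starts with `{`, the string handed to the converter is not the JSON array that `getAsJsonArray()` requires.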