Posted to issues@spark.apache.org by "Chaitanya (Jira)" <ji...@apache.org> on 2020/09/10 04:43:00 UTC
[jira] [Updated] (SPARK-32834) from_avro is giving empty result
[ https://issues.apache.org/jira/browse/SPARK-32834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chaitanya updated SPARK-32834:
------------------------------
Description:
I am trying to read a Kafka topic with Spark readStream, but I run into a problem when applying an Avro schema.
Code:
df = spark\
    .readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", "host:6667")\
    .option("subscribe", "utopic1")\
    .option("failOnDataLoss", "false")\
    .option("startingOffsets", "earliest")\
    .option("checkpointLocation", "/home/abc/wspace/spark_test/data/")\
    .load()
outputDF = df\
    .select(from_avro("value", jsonFormatSchema, options={"mode":"FASTFAIL"}).alias("user"))
outputDF.printSchema()
query = outputDF.writeStream.format("console").start()
time.sleep(10)
Input:
avro schema file: [user.avsc|https://github.com/apache/spark/raw/4ad9bfd53b84a6d2497668c73af6899bae14c187/examples/src/main/resources/user.avsc]
Kafka topic: {'favorite_color': 'Red', 'name': 'Alyssa'}
Expected Output:
It should print values.
Actual Output:
+----+
|user|
+----+
| [,]|
+----+
Additional information:
# I searched the internet and found that another person faced the same issue: [https://stackoverflow.com/questions/59222774/spark-from-avro-function-returning-null-values]
# I am able to print the values to the console if I cast the value to a string using the line below: df.selectExpr("CAST(value AS STRING)")
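The second observation may be a useful hint. from_avro decodes a binary column of Avro-encoded data, so a value that is already readable after CAST(value AS STRING) may be text (for example JSON) rather than Avro binary; the report does not confirm this. The sketch below is an illustration under assumptions: it hand-encodes the sample record with Avro's binary encoding for a hypothetical two-field schema (name: string, favorite_color: union of string and null), which is a simplification and not necessarily the linked user.avsc, and compares the result with the JSON text for the same record. The helper names encode_user and looks_like_json_text are made up for this example.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode an integer as an Avro long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def avro_string(s: str) -> bytes:
    """Avro string: length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_user(name, favorite_color):
    """Encode a record for the ASSUMED schema:
    name: string, favorite_color: ["string", "null"]."""
    out = avro_string(name)
    if favorite_color is None:
        out += zigzag_varint(1)  # union branch 1 = null, no payload
    else:
        out += zigzag_varint(0) + avro_string(favorite_color)  # branch 0 = string
    return out


def looks_like_json_text(payload: bytes) -> bool:
    """Hypothetical diagnostic: True if the payload parses as UTF-8 JSON."""
    try:
        json.loads(payload.decode("utf-8"))
        return True
    except (UnicodeDecodeError, ValueError):
        return False


record = {"name": "Alyssa", "favorite_color": "Red"}
avro_bytes = encode_user(record["name"], record["favorite_color"])
json_bytes = json.dumps(record).encode("utf-8")

print(avro_bytes)                        # compact binary, no field names
print(json_bytes)                        # human-readable text
print(looks_like_json_text(avro_bytes))
print(looks_like_json_text(json_bytes))
```

If payloads on the topic pass a check like looks_like_json_text, the directions to try would be decoding them with from_json instead of from_avro, or producing Avro binary upstream. Separately, the parse modes Spark documents for from_avro are FAILFAST and PERMISSIVE, so the "FASTFAIL" spelling in the snippet may be worth double-checking.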
was:
I am trying to read a Kafka topic with avro avro
Code:
df = spark\
    .readStream\
    .format("kafka")\
    .option("kafka.bootstrap.servers", "host:6667")\
    .option("subscribe", "utopic1")\
    .option("failOnDataLoss", "false")\
    .option("startingOffsets", "earliest")\
    .option("checkpointLocation", "/home/abc/wspace/spark_test/data/")\
    .load()
outputDF = df\
    .select(from_avro("value", jsonFormatSchema, options={"mode":"FASTFAIL"}).alias("user"))
outputDF.printSchema()
query = outputDF.writeStream.format("console").start()
time.sleep(10)
Input:
avro schema file: [user.avsc|https://github.com/apache/spark/raw/4ad9bfd53b84a6d2497668c73af6899bae14c187/examples/src/main/resources/user.avsc]
Kafka topic: {'favorite_color': 'Red', 'name': 'Alyssa'}
Expected Output:
It should print values.
Actual Output:
+----+
|user|
+----+
| [,]|
+----+
Additional information:
# I searched the internet and found that another person faced the same issue: [https://stackoverflow.com/questions/59222774/spark-from-avro-function-returning-null-values]
# I am able to print the values to the console if I cast the value to a string using the line below: df.selectExpr("CAST(value AS STRING)")
> from_avro is giving empty result
> --------------------------------
>
> Key: SPARK-32834
> URL: https://issues.apache.org/jira/browse/SPARK-32834
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.0.0
> Environment: Ubuntu 18
> Spark 3.0
> Kafka 2.0.0
> Reporter: Chaitanya
> Priority: Major