Posted to issues@spark.apache.org by "Chaitanya (Jira)" <ji...@apache.org> on 2020/09/10 04:43:00 UTC

[jira] [Updated] (SPARK-32834) from_avro is giving empty result

     [ https://issues.apache.org/jira/browse/SPARK-32834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaitanya updated SPARK-32834:
------------------------------
    Description: 
I am trying to read a Kafka topic with Spark readStream, but I run into a problem when applying an Avro schema: from_avro returns an empty result.

 

Code:

df = spark\
  .readStream\
  .format("kafka")\
  .option("kafka.bootstrap.servers", "host:6667")\
  .option("subscribe", "utopic1")\
  .option("failOnDataLoss", "false")\
  .option("startingOffsets", "earliest")\
  .option("checkpointLocation", "/home/abc/wspace/spark_test/data/")\
  .load()

outputDF = df\
        .select(from_avro("value", jsonFormatSchema, options={"mode":"FASTFAIL"}).alias("user"))
outputDF.printSchema()

query = outputDF.writeStream.format("console").start()
time.sleep(10)
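
A note on the snippet above: the mode string is spelled "FASTFAIL", whereas the parse modes documented for from_avro are FAILFAST and PERMISSIVE. As far as can be told from the Spark 3.0 sources, an unrecognized mode name is not rejected; Spark logs a warning and falls back to permissive parsing, which produces null fields instead of an error. Whether this contributes to the empty result reported here is not confirmed. The sketch below is a Spark-independent guard; the names VALID_FROM_AVRO_MODES and checked_mode are hypothetical and not part of PySpark.

```python
# Parse modes documented for from_avro; anything else is treated as a typo here.
VALID_FROM_AVRO_MODES = ("FAILFAST", "PERMISSIVE")


def checked_mode(mode: str) -> str:
    """Return the normalized mode name, or raise if it is not a documented one."""
    normalized = mode.strip().upper()
    if normalized not in VALID_FROM_AVRO_MODES:
        raise ValueError(
            f"Unknown from_avro mode {mode!r}; expected one of {VALID_FROM_AVRO_MODES}"
        )
    return normalized


# Hypothetical usage: build the options dict for from_avro from a checked mode.
options = {"mode": checked_mode("failfast")}
print(options)
```

With such a guard, a misspelled mode fails immediately in the driver script instead of silently changing how malformed records are handled.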

Input:

avro schema file: [user.avsc|https://github.com/apache/spark/raw/4ad9bfd53b84a6d2497668c73af6899bae14c187/examples/src/main/resources/user.avsc]

Kafka topic: {'favorite_color': 'Red', 'name': 'Alyssa'}

Expected Output:

It should print values. 

Actual Output:

+----+
|user|
+----+
| [,]|
+----+

Additional information:
 # I searched the internet and found that another person faced the same issue: [https://stackoverflow.com/questions/59222774/spark-from-avro-function-returning-null-values]
 # I am able to print the values to the console if I cast them to String using the line below: df.selectExpr("CAST(value AS STRING)")
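
The second point hints at a possible cause: if the value is readable after CAST(value AS STRING), the messages may be text (for example JSON) rather than Avro binary. from_avro expects Avro-encoded bytes, so text payloads would normally be parsed with from_json instead. Another known source of null results is the Confluent wire format, which prefixes the Avro body with a zero magic byte and a 4-byte big-endian schema ID that from_avro does not strip. Neither cause is confirmed by this report. The helper below is a heuristic sketch for inspecting a few raw message values; classify_kafka_value and its return labels are hypothetical names.

```python
import json
import struct


def classify_kafka_value(value: bytes) -> str:
    """Guess how a raw Kafka message value is encoded (heuristic only)."""
    try:
        text = value.decode("utf-8")
    except UnicodeDecodeError:
        text = None

    if text is not None:
        try:
            json.loads(text)
            return "json-text"  # from_avro cannot decode JSON text
        except ValueError:
            # Readable text that is not valid JSON (e.g. a printed Python dict).
            if text.strip() and text.isprintable():
                return "plain-text"

    # Confluent wire format: magic byte 0x00, then a 4-byte big-endian schema ID.
    if len(value) > 5 and value[0] == 0:
        schema_id = struct.unpack(">I", value[1:5])[0]
        return f"confluent-avro(schema_id={schema_id})"

    return "unknown-binary"  # possibly plain Avro binary; not verified here


# The payload exactly as shown in this report (single quotes, so not valid JSON).
print(classify_kafka_value(b"{'favorite_color': 'Red', 'name': 'Alyssa'}"))
```

A result of "json-text" or "plain-text" would point toward parsing the value as text rather than with from_avro; "unknown-binary" leaves plain Avro binary as a possibility but does not prove it.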

 

  was:
I am trying to read a Kafka topic with avro avro

 

Code:

df = spark\
  .readStream\
  .format("kafka")\
  .option("kafka.bootstrap.servers", "host:6667")\
  .option("subscribe", "utopic1")\
  .option("failOnDataLoss", "false")\
  .option("startingOffsets", "earliest")\
  .option("checkpointLocation", "/home/abc/wspace/spark_test/data/")\
  .load()

outputDF = df\
        .select(from_avro("value", jsonFormatSchema, options={"mode":"FASTFAIL"}).alias("user"))
outputDF.printSchema()

query = outputDF.writeStream.format("console").start()
time.sleep(10)

Input:

avro schema file: [user.avsc|https://github.com/apache/spark/raw/4ad9bfd53b84a6d2497668c73af6899bae14c187/examples/src/main/resources/user.avsc]

Kafka topic: {'favorite_color': 'Red', 'name': 'Alyssa'}

Expected Output:

It should print values. 

Actual Output:

+----+
|user|
+----+
| [,]|
+----+

Additional information:
 # I searched the internet and found that another person faced the same issue: [https://stackoverflow.com/questions/59222774/spark-from-avro-function-returning-null-values]
 # I am able to print the values to the console if I cast them to String using the line below: df.selectExpr("CAST(value AS STRING)")

 


> from_avro is giving empty result
> --------------------------------
>
>                 Key: SPARK-32834
>                 URL: https://issues.apache.org/jira/browse/SPARK-32834
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 18
> Spark 3.0
> Kafka 2.0.0
>            Reporter: Chaitanya
>            Priority: Major
>


