Posted to issues@spark.apache.org by "Sridhar Baddela (Jira)" <ji...@apache.org> on 2020/07/17 04:36:00 UTC

[jira] [Created] (SPARK-32342) Kafka events are missing magic byte

Sridhar Baddela created SPARK-32342:
---------------------------------------

             Summary: Kafka events are missing magic byte
                 Key: SPARK-32342
                 URL: https://issues.apache.org/jira/browse/SPARK-32342
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.0.0
         Environment: Pyspark 3.0.0, Python 3.7, Confluent Cloud Kafka with Schema Registry 5.5
            Reporter: Sridhar Baddela


Please refer to the documentation for to_avro and from_avro: http://spark.apache.org/docs/latest/sql-data-sources-avro.html

I tested the to_avro function by confirming that data is written to a Kafka topic. But when a Confluent Avro consumer reads from that same topic, it fails with an error message saying the event is missing the magic byte.
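For context, Confluent serializers frame every message in the Confluent wire format: a 0x00 "magic byte", a big-endian 4-byte schema registry ID, then the Avro payload. Spark's to_avro writes only the raw Avro payload, which is why a Confluent consumer reports a missing magic byte. A minimal sketch of the write path that reproduces this (df, avro_schema_json, the broker address, and the topic name are placeholders; running it needs the org.apache.spark:spark-avro_2.12:3.0.0 package):

    from pyspark.sql.functions import col, struct
    from pyspark.sql.avro.functions import to_avro

    # to_avro emits plain Avro binary: no 0x00 magic byte and no
    # 4-byte schema ID, so Confluent deserializers reject the record.
    out = df.select(
        to_avro(struct(col("id"), col("name")), avro_schema_json).alias("value")
    )
    out.write \
        .format("kafka") \
        .option("kafka.bootstrap.servers", "broker:9092") \
        .option("topic", "events") \
        .save()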

To test the other direction, I used a second topic to exercise reads from Kafka followed by deserialization with from_avro. The use case: produce a few events with a Confluent Avro producer, then read the topic with Structured Streaming and apply from_avro. This fails with a message indicating that malformed records are present.
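A read-side workaround (a sketch only; it assumes the producer used the Confluent serializer, and kafka_df / avro_schema_json are placeholder names) is to strip the 5-byte wire-format header before handing the bytes to from_avro:

    from pyspark.sql.functions import expr
    from pyspark.sql.avro.functions import from_avro

    # Skip byte 1 (0x00 magic byte) and bytes 2-5 (schema ID);
    # the remainder is plain Avro that from_avro can decode.
    payload = expr("substring(value, 6, length(value) - 5)")
    decoded = kafka_df.select(from_avro(payload, avro_schema_json).alias("event"))

Note this discards the schema ID bytes, so it only works when every record on the topic was written with the same schema that avro_schema_json describes.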

In short, data serialized with to_avro and deserialized with from_avro round-trips only within Spark; consumers outside Spark that expect the Confluent wire format fail on it.
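On the write side, a possible workaround (again a sketch, not a supported API: it assumes the schema was registered with Schema Registry ahead of time and that its numeric ID, schema_id below, is known out of band) is to prepend the wire-format header manually:

    import struct
    from pyspark.sql.functions import col, concat, lit, struct as sql_struct
    from pyspark.sql.avro.functions import to_avro

    schema_id = 42  # hypothetical: the ID Schema Registry assigned to this schema

    # Confluent wire-format header: 0x00 magic byte + big-endian 4-byte schema ID.
    header = bytearray(bytes([0]) + struct.pack(">I", schema_id))

    out = df.select(
        concat(
            lit(header),
            to_avro(sql_struct(col("id"), col("name")), avro_schema_json),
        ).alias("value")
    )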



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org