
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/13 22:46:39 UTC

[GitHub] HeartSaVioR edited a comment on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming

URL: https://github.com/apache/spark/pull/22282#issuecomment-463041068
 
 
   >> If a user uses a Kafka cluster which runs using an old version that doesn't support Kafka headers, will their query fail?
   
   > @zsxwing In that case, the users will get an empty `headers` map. (see `KafkaRecordToUnsafeRowConverter.scala`)
   
   Personally, that answer makes me wonder how the Kafka 2.x client can return an empty headers array when talking to a Kafka 0.10.x broker without any error. (My guess is protocol compatibility plus a default value, but I would like to be sure.) To be safe, it would be ideal to test manually against a Kafka 0.10.x broker, or to have an explanation of how Kafka handles this case. This is not an optional feature, so we need to be careful not to break anything, or at least to know exactly what we are breaking.
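
   A minimal Python model of the protocol-compatibility guess above. It assumes (to be confirmed against the Kafka protocol documentation) that record headers were introduced with record format magic v2 in Kafka 0.11, so records served in the older v0/v1 formats simply have no headers field, and the client falls back to an empty collection instead of failing. `WireRecord` and `decode_headers` are hypothetical names for illustration, not Kafka client or Spark APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical model (assumption): headers exist only in record format magic v2
# (Kafka 0.11+); magic v0/v1 records have no headers field on the wire.
Header = Tuple[str, Optional[bytes]]


@dataclass
class WireRecord:
    magic: int                     # record format version byte
    key: Optional[bytes]
    value: Optional[bytes]
    headers: List[Header] = field(default_factory=list)  # only present for magic >= 2


def decode_headers(record: WireRecord) -> List[Header]:
    """Return the record's headers, defaulting to an empty list for legacy formats."""
    if record.magic < 2:
        # Legacy formats cannot carry headers, so there is nothing to decode:
        # return an empty list rather than raising an error.
        return []
    return list(record.headers)


legacy = WireRecord(magic=1, key=b"k", value=b"v")
modern = WireRecord(magic=2, key=None, value=b"v", headers=[("trace-id", b"abc")])
print(decode_headers(legacy))  # []
print(decode_headers(modern))  # [('trace-id', b'abc')]
```

   If this model holds, a query reading from a 0.10.x broker would see an empty headers value rather than an error, which is what a manual test against such a broker should confirm.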
   
   > * The minimum required version of Kafka client changes from 0.10.x to 0.11.x.
   
   Unless I'm missing something, we ship the Kafka client together with the `spark-sql-kafka` package, and it is already 2.0.0 in the Spark 2.4 branch, on the assumption that it is compatible with 0.10.0. So the minimum client version would not change here. Did you mean the minimum `Kafka broker` version?
   
   Other than that, a huge +1 on the explanation of `Add a new column to the Kafka source schema.` Nice analysis! Since state is involved, it doesn't look like a simple one to address. I wonder whether there are existing cases where a data source option is allowed to affect the schema: if so, this could be mitigated by making the feature optional and disabled by default. (Changing the value of that option could make the query fail, though, so that should be clearly documented.)
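
   A rough Python sketch of the opt-in mitigation suggested above: the source schema becomes a function of the options, and the headers column is added only when the option is enabled. The option name `include_headers` and the headers column type are illustrative assumptions, not what `spark-sql-kafka` actually uses; the base columns are modeled on the existing Kafka source schema.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]

# Base columns, modeled on the existing Kafka source schema.
BASE_COLUMNS: List[Column] = [
    ("key", "binary"),
    ("value", "binary"),
    ("topic", "string"),
    ("partition", "int"),
    ("offset", "long"),
    ("timestamp", "timestamp"),
    ("timestampType", "int"),
]

# Hypothetical headers column type, for illustration only.
HEADERS_COLUMN: Column = ("headers", "array<struct<key:string,value:binary>>")


def source_schema(options: Dict[str, str]) -> List[Column]:
    """Build the source schema; the headers column is opt-in (disabled by default)."""
    include = options.get("include_headers", "false").strip().lower() == "true"
    return BASE_COLUMNS + [HEADERS_COLUMN] if include else list(BASE_COLUMNS)


default_schema = source_schema({})
with_headers = source_schema({"include_headers": "true"})
print(len(default_schema), len(with_headers))  # 7 8
```

   With the default options the schema is unchanged, so existing queries are unaffected. Flipping the option on a query that already has state yields a different schema than the one the query started with, which is the failure mode noted in the parenthesis above.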
