Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2017/03/10 21:21:05 UTC

[jira] [Comment Edited] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

    [ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905716#comment-15905716 ] 

Shixiong Zhu edited comment on SPARK-18057 at 3/10/17 9:21 PM:
---------------------------------------------------------------

I did some investigation yesterday, and found one issue in 0.10.2.0:
https://issues.apache.org/jira/browse/KAFKA-4879 : KafkaConsumer.position may hang forever when deleting a topic

Our current tests just hang forever due to KAFKA-4879, and this prevents us from upgrading to 0.10.2.0.
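
In the meantime, the tests could at least turn the hang into a failure by calling position() on a worker thread with a timeout. A minimal sketch; the positionWithTimeout helper is hypothetical, not something we have in Spark:

{code:scala}
import java.util.concurrent.{Callable, Executors, TimeUnit}

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// Hypothetical test guard: run position() on a worker thread so that a
// KAFKA-4879 hang surfaces as a TimeoutException instead of blocking the
// suite forever. KafkaConsumer is not thread-safe, so the consumer must not
// be touched by any other thread while this call is in flight.
def positionWithTimeout(
    consumer: KafkaConsumer[Array[Byte], Array[Byte]],
    tp: TopicPartition,
    timeoutMs: Long): Long = {
  val executor = Executors.newSingleThreadExecutor()
  try {
    val future = executor.submit(new Callable[java.lang.Long] {
      override def call(): java.lang.Long = consumer.position(tp)
    })
    // Throws java.util.concurrent.TimeoutException if position() hangs.
    future.get(timeoutMs, TimeUnit.MILLISECONDS)
  } finally {
    executor.shutdownNow()
  }
}
{code}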

I also went through the Kafka tickets between 0.10.0.1 and 0.10.2.0. Let me try to summarize the current situation:

The benefits of upgrading Kafka client to 0.10.2.0:
- Forward compatibility
- Reading topics from a timestamp
- The following bug fixes:

Issues for which we already have workarounds (the workaround idea is sketched after this list):
https://issues.apache.org/jira/browse/KAFKA-4375 : Kafka consumer may swallow some interrupts meant for the calling thread
https://issues.apache.org/jira/browse/KAFKA-4387 : KafkaConsumer will enter an infinite loop if the polling thread is interrupted, and either commitSync or committed is called
https://issues.apache.org/jira/browse/KAFKA-4536 : Kafka clients throw NullPointerException on poll when delete the relative topic
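
For the two interrupt issues, the workaround is to keep interrupts away from the thread that calls into the Kafka client. A rough sketch of the idea only, with a hypothetical pollWithoutInterrupt helper; it covers an interrupt that is already pending, while Spark's actual approach is to run the consumer inside org.apache.spark.util.UninterruptibleThread, which also defers interrupts that arrive mid-call:

{code:scala}
import org.apache.kafka.clients.consumer.{ConsumerRecords, KafkaConsumer}

// Rough sketch of the workaround idea for KAFKA-4375 / KAFKA-4387: make sure
// the interrupt flag never reaches poll(), where it could be swallowed or
// trigger an infinite loop.
def pollWithoutInterrupt(
    consumer: KafkaConsumer[Array[Byte], Array[Byte]],
    timeoutMs: Long): ConsumerRecords[Array[Byte], Array[Byte]] = {
  // Thread.interrupted() clears the flag so poll() never sees it.
  val wasInterrupted = Thread.interrupted()
  try {
    consumer.poll(timeoutMs)
  } finally {
    // Restore the flag so the caller can still observe the interrupt.
    if (wasInterrupted) Thread.currentThread().interrupt()
  }
}
{code}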

Issues related to Kafka record compression:
https://issues.apache.org/jira/browse/KAFKA-3937 : Kafka Clients Leak Native Memory For Longer Than Needed With Compressed Messages
https://issues.apache.org/jira/browse/KAFKA-4549 : KafkaLZ4OutputStream does not write EndMark if flush() is not called before close()

Others:
https://issues.apache.org/jira/browse/KAFKA-2948 : Kafka producer does not cope well with topic deletions

As for the 0.10.1.x line, KAFKA-4547 prevents us from upgrading to it.

Finally, IMO, "Reading topics from a timestamp" is very useful and is the most important reason to upgrade Kafka. However, since the Spark 2.2 code freeze is coming, we won't have enough time to deliver this feature to users anyway, so it's fine to wait for KAFKA-4879 to be fixed in the next Kafka release. I don't expect the next Kafka release to come later than Spark 2.3.
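
For reference, this is roughly what "reading topics from a timestamp" looks like with a 0.10.1.0+ client, via KafkaConsumer.offsetsForTimes (KIP-79). The seekToTimestamp helper is hypothetical, just to show the API:

{code:scala}
import scala.collection.JavaConverters._

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// Hypothetical helper: for each partition, find the earliest offset whose
// timestamp is >= timestampMs and seek there. offsetsForTimes() only exists
// in clients >= 0.10.1.0 (KIP-79), which is why this feature needs the
// client upgrade.
def seekToTimestamp(
    consumer: KafkaConsumer[Array[Byte], Array[Byte]],
    partitions: Seq[TopicPartition],
    timestampMs: Long): Unit = {
  val query =
    partitions.map(tp => tp -> java.lang.Long.valueOf(timestampMs)).toMap.asJava
  consumer.offsetsForTimes(query).asScala.foreach {
    case (tp, offsetAndTimestamp) if offsetAndTimestamp != null =>
      consumer.seek(tp, offsetAndTimestamp.offset())
    case (tp, _) =>
      // No record at or after the timestamp; start from the end.
      consumer.seekToEnd(java.util.Collections.singletonList(tp))
  }
}
{code}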



> Update structured streaming kafka from 10.0.1 to 10.2.0
> -------------------------------------------------------
>
>                 Key: SPARK-18057
>                 URL: https://issues.apache.org/jira/browse/SPARK-18057
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>            Reporter: Cody Koeninger
>
> There are a couple of relevant KIPs here, https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html


