Posted to issues@spark.apache.org by "Quentin Ambard (JIRA)" <ji...@apache.org> on 2018/10/03 21:38:20 UTC

[jira] [Commented] (SPARK-25005) Structured streaming doesn't support kafka transaction (creating empty offset with abort & markers)

    [ https://issues.apache.org/jira/browse/SPARK-25005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637532#comment-16637532 ] 

Quentin Ambard commented on SPARK-25005:
----------------------------------------

How do you tell the difference between data loss and data simply not being there when .poll() doesn't return any value [~zsxwing]? Correct me if I'm wrong, but you could lose data in this situation, no?

I think there is a third case here [https://github.com/zsxwing/spark/blob/ea804cfe840196519cc9444be9bedf03d10aa11a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala#L474] which is: something went wrong, the data is available in Kafka, but the consumer failed to fetch it.
I've seen it happen when max.poll.records is large, the messages are big, and the heap is getting full. The messages exist, but the JVM lags and the consumer times out before fetching them.
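To make the three cases concrete, here is a rough sketch of how an empty poll() might be classified from the partition's offset ranges. This is purely illustrative: the class, enum, and method names are my own assumptions, not Spark or Kafka APIs, and in practice the boundary between "transaction-marker gap" and "transient fetch failure" is exactly what this thread says is hard to detect.

```java
// Hypothetical sketch (not Spark code): classifying an empty poll() result
// given the offset the consumer asked for and the partition's current range.
public class EmptyPollClassifier {

    enum Reason {
        DATA_LOSS,         // requested records were deleted (e.g. by retention)
        TRANSACTION_GAP,   // remaining offsets were commit/abort markers, not records
        TRANSIENT_FAILURE  // records should exist, but the fetch timed out
    }

    // fetchOffset: offset the consumer requested
    // earliestOffset, latestOffset: current beginning/end offsets of the partition
    static Reason classify(long fetchOffset, long earliestOffset, long latestOffset) {
        if (fetchOffset < earliestOffset) {
            // The requested range is older than anything the broker still has.
            return Reason.DATA_LOSS;
        } else if (fetchOffset >= latestOffset) {
            // Nothing left to read: the gap was transaction markers, not lost data.
            return Reason.TRANSACTION_GAP;
        } else {
            // Records should exist in [fetchOffset, latestOffset); the third case
            // described above, where the JVM lagged and poll() returned empty.
            return Reason.TRANSIENT_FAILURE;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(5, 10, 20));
        System.out.println(classify(20, 10, 20));
        System.out.println(classify(15, 10, 20));
    }
}
```

The point of the sketch is that the TRANSIENT_FAILURE branch is indistinguishable from a marker gap by poll()'s return value alone; you need the offset metadata (or a retry) to separate them.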

> Structured streaming doesn't support kafka transaction (creating empty offset with abort & markers)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25005
>                 URL: https://issues.apache.org/jira/browse/SPARK-25005
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Quentin Ambard
>            Assignee: Shixiong Zhu
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Structured streaming can't consume Kafka transactions.
> We could try to apply the SPARK-24720 (DStream) logic to the Structured Streaming source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org