Posted to issues@spark.apache.org by "Gabor Somogyi (JIRA)" <ji...@apache.org> on 2019/07/17 10:20:00 UTC
[jira] [Commented] (SPARK-28415) Add messageHandler to Kafka 10 direct stream API
[ https://issues.apache.org/jira/browse/SPARK-28415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886898#comment-16886898 ]
Gabor Somogyi commented on SPARK-28415:
---------------------------------------
This reads more like a new feature request than a bug, so I have modified the issue accordingly.
> Add messageHandler to Kafka 10 direct stream API
> ------------------------------------------------
>
> Key: SPARK-28415
> URL: https://issues.apache.org/jira/browse/SPARK-28415
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.4.3
> Reporter: Michael Spector
> Priority: Major
>
> The lack of a messageHandler parameter on KafkaUtils.createDirectStream(...) in the new Kafka API is what prevents us from upgrading our processes to use it. Here is why:
> # messageHandler() allowed parsing, filtering, and projecting huge JSON files at an early stage (a process needs only a small subset of the JSON fields). Without it, our current cluster configuration cannot keep up with the traffic.
> # Transforming Kafka events right after the stream is created prevents us from using the HasOffsetRanges interface later. This means the whole message must be propagated to the end of the pipeline, which is very inefficient.
>
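The two points above can be illustrated with a small sketch. It is plain Python with no Spark or Kafka dependency, and it is not the Spark API: Record, OffsetRange, read_batch and project_user_event are hypothetical names invented for this illustration. The sketch assumes a batch holds consecutive records from a single topic partition. It shows a handler applied to each record as it is read, so only the projected fields travel downstream, while the offset range of the batch is still reported separately.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

R = TypeVar("R")


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Kafka record (not a Spark or Kafka class)."""
    topic: str
    partition: int
    offset: int
    value: str  # raw JSON payload


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical stand-in for the offsets one batch covers in a partition."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive


def read_batch(
    records: Sequence[Record],
    message_handler: Callable[[Record], Optional[R]],
) -> Tuple[List[R], Optional[OffsetRange]]:
    """Apply message_handler to every record at read time.

    Records for which the handler returns None are dropped. The returned
    offset range covers the whole batch, including dropped records, so the
    caller can still commit offsets after processing the slim results.
    """
    if not records:
        return [], None
    first, last = records[0], records[-1]
    offsets = OffsetRange(first.topic, first.partition,
                          first.offset, last.offset + 1)
    handled = (message_handler(r) for r in records)
    return [h for h in handled if h is not None], offsets


def project_user_event(record: Record) -> Optional[dict]:
    """Example handler: keep two fields and drop everything that is not a click."""
    event = json.loads(record.value)
    if event.get("type") != "click":
        return None
    return {"user": event["user"], "ts": event["ts"]}


# Usage: three large events go in, two small projections come out.
batch = [
    Record("events", 0, 40, json.dumps(
        {"type": "click", "user": "a", "ts": 1, "payload": "x" * 1000})),
    Record("events", 0, 41, json.dumps(
        {"type": "view", "user": "b", "ts": 2, "payload": "y" * 1000})),
    Record("events", 0, 42, json.dumps(
        {"type": "click", "user": "c", "ts": 3, "payload": "z" * 1000})),
]
slim, offsets = read_batch(batch, project_user_event)
```

In this sketch the large payload field never leaves read_batch, yet the offset range still spans all three records, which is the combination the description asks for.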