Posted to dev@kafka.apache.org by "sam (JIRA)" <ji...@apache.org> on 2018/09/22 10:36:00 UTC

[jira] [Created] (KAFKA-7432) API Method on Kafka Streams for processing chunks/batches of data

sam created KAFKA-7432:
--------------------------

             Summary: API Method on Kafka Streams for processing chunks/batches of data
                 Key: KAFKA-7432
                 URL: https://issues.apache.org/jira/browse/KAFKA-7432
             Project: Kafka
          Issue Type: New Feature
            Reporter: sam


In many Big Data situations it is preferable to work with a small buffer of records at a time, rather than processing records one by one.

The natural example is calling some external API that supports batching for efficiency.

How can we do this in Kafka Streams? I cannot find anything in the API that looks like what I want.

So far I have:

{{builder.stream[String, String]("my-input-topic").mapValues(externalApiCall).to("my-output-topic")}}

What I want is:

{{builder.stream[String, String]("my-input-topic").batched(chunkSize = 2000).map(externalBatchedApiCall).to("my-output-topic")}}

In plain Scala collections the function is called {{grouped}}; in Akka Streams it is called {{grouped}} or {{batch}}. In Spark Structured Streaming we can do {{mapPartitions(_.grouped(2000).map(externalBatchedApiCall))}}.
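
Until such an operator exists, one possible workaround is to buffer records in the low-level Processor API and flush them in chunks. Below is a rough, untested sketch against the 2.0 Java API from Scala; {{BatchingTransformer}}, {{flushIntervalMs}} and the {{Seq[String] => Seq[String]}} shape assumed for {{externalBatchedApiCall}} are made-up names for illustration, not an existing Kafka Streams API.

{code:scala}
import java.util

import org.apache.kafka.streams.{KeyValue, StreamsBuilder}
import org.apache.kafka.streams.kstream.{Transformer, TransformerSupplier}
import org.apache.kafka.streams.processor.{ProcessorContext, PunctuationType, Punctuator}

import scala.collection.JavaConverters._

// Buffers up to chunkSize values per stream task and hands the whole batch to
// batchCall (the hypothetical external API, Seq[String] => Seq[String]).
// The batch is flushed when it is full or when a wall-clock punctuation fires.
// The buffer lives only in memory, so records buffered at the moment of a
// crash are re-read from the input topic on restart (at-least-once), not lost.
class BatchingTransformer(chunkSize: Int,
                          flushIntervalMs: Long,
                          batchCall: Seq[String] => Seq[String])
    extends Transformer[String, String, KeyValue[String, String]] {

  private var context: ProcessorContext = _
  private val buffer = new util.ArrayList[String](chunkSize)

  override def init(ctx: ProcessorContext): Unit = {
    context = ctx
    // Flush whatever has accumulated every flushIntervalMs of wall-clock time,
    // so a slow input topic does not leave records stuck in the buffer.
    context.schedule(flushIntervalMs, PunctuationType.WALL_CLOCK_TIME, new Punctuator {
      override def punctuate(timestamp: Long): Unit = flush()
    })
  }

  override def transform(key: String, value: String): KeyValue[String, String] = {
    buffer.add(value)
    if (buffer.size() >= chunkSize) flush()
    null // results are emitted via context.forward() inside flush()
  }

  private def flush(): Unit = {
    if (!buffer.isEmpty) {
      val results = batchCall(buffer.asScala.toList)
      // The key association is dropped here; a real implementation might
      // buffer (key, value) pairs and zip the keys back onto the results.
      results.foreach(r => context.forward[String, String](null, r))
      buffer.clear()
    }
  }

  // Do not forward from close(); downstream processors may already be closed.
  override def close(): Unit = ()
}

// Wiring it into the topology, assuming externalBatchedApiCall: Seq[String] => Seq[String]:
val builder = new StreamsBuilder()

val supplier = new TransformerSupplier[String, String, KeyValue[String, String]] {
  override def get(): Transformer[String, String, KeyValue[String, String]] =
    new BatchingTransformer(chunkSize = 2000, flushIntervalMs = 1000L, externalBatchedApiCall)
}

builder
  .stream[String, String]("my-input-topic")
  .transform(supplier)
  .to("my-output-topic")
{code}

Note that the buffer is per stream task (i.e. per input partition), so chunks never mix records from different partitions; a fault-tolerant variant would keep the pending batch in a state store instead of a plain in-memory list.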

https://stackoverflow.com/questions/52366623/how-to-process-data-in-chunks-batches-with-kafka-streams



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)