Posted to dev@kafka.apache.org by "Guozhang Wang (JIRA)" <ji...@apache.org> on 2016/04/12 02:37:25 UTC

[jira] [Commented] (KAFKA-3545) Generalized Serdes for List/Map

    [ https://issues.apache.org/jira/browse/KAFKA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236334#comment-15236334 ] 

Guozhang Wang commented on KAFKA-3545:
--------------------------------------

Thanks for reporting. Having generalized serdes for collection types is definitely on our roadmap.

As for "group-by" followed by "aggregate": as I mentioned in KAFKA-3544, there are already built-in operators that take a "selector" to pick the aggregation key and an "aggregator" to combine the records that share the selected key. In KAFKA-3337 we plan to extract the "selector" into a separate "groupBy" operator in the Kafka Streams DSL. Would that work for your case?
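
To make the selector/aggregator idea concrete, here is a minimal sketch in plain Java. It does not use the Kafka Streams API; the class name GroupByAggregateSketch, the method aggregateByKey, and its parameters are hypothetical and only illustrate the shape of the operation over an in-memory list of records.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

// Hypothetical illustration only: not part of Kafka Streams.
public class GroupByAggregateSketch {

    // Re-keys each record with the selector, then folds every value that
    // shares the selected key into one aggregate.
    static <K, V, NK, A> Map<NK, A> aggregateByKey(
            List<Map.Entry<K, V>> records,
            BiFunction<K, V, NK> selector,     // picks the aggregation key
            Supplier<A> initializer,           // creates an empty aggregate
            BiFunction<A, V, A> aggregator) {  // adds one value to an aggregate
        Map<NK, A> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> record : records) {
            NK newKey = selector.apply(record.getKey(), record.getValue());
            A current = result.containsKey(newKey)
                    ? result.get(newKey)
                    : initializer.get();
            result.put(newKey, aggregator.apply(current, record.getValue()));
        }
        return result;
    }
}
```

With a selector that extracts a foreign key and an aggregator that appends to a list, the result maps each selected key to the list of values that belong to it. In a real stream the aggregate would also need a serde so it can be stored and repartitioned, which is what this ticket asks for.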

> Generalized Serdes for List/Map
> -------------------------------
>
>                 Key: KAFKA-3545
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3545
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Greg Fodor
>            Assignee: Guozhang Wang
>            Priority: Minor
>              Labels: api
>             Fix For: 0.10.1.0
>
>
> In working with Kafka Streams I've found that I often want to perform a "group by" operation, where I repartition a stream based on a foreign key and then aggregate all the values into a single collection, so the stream becomes one where each entry has a value that is a serialized list of the values that belonged to the key. (This seems unrelated to the 'group by' operation talked about in KAFKA-3544.) This is basically the same group-by operation found in systems like Cascading.
> In order to create these intermediate list values I needed to define custom Avro schemas that simply wrap the elements of interest into a list. It seems desirable that there be some basic facility for constructing simple Serdes of Lists/Maps/Sets of other types, potentially using Avro's serialization under the hood. If this existed in the core library it would also enable the addition of higher-level operations on streams that can use these Serdes to perform simple operations like the "group by" example I mention.
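
As a rough illustration of the kind of facility requested above, the following sketch builds a list codec from an element codec by length-prefixing each element. Everything in it is an assumption made for illustration: the Codec interface, ListCodecSketch, listOf and STRING_CODEC are hypothetical names, the wire format is invented here, and Avro is not used. Kafka's own Serializer/Deserializer interfaces additionally take a topic argument, which this sketch leaves out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Kafka's Serde interface.
public class ListCodecSketch {

    // Minimal stand-in for a serializer/deserializer pair.
    interface Codec<T> {
        byte[] serialize(T value);
        T deserialize(byte[] bytes);
    }

    // Example element codec: UTF-8 strings.
    static final Codec<String> STRING_CODEC = new Codec<String>() {
        public byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }
        public String deserialize(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    };

    // Wraps an element codec into a list codec. Assumed wire format:
    // [int count] then, per element, [int length][element bytes].
    // Null elements are not supported in this sketch.
    static <T> Codec<List<T>> listOf(Codec<T> inner) {
        return new Codec<List<T>>() {
            public byte[] serialize(List<T> values) {
                try {
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                    DataOutputStream out = new DataOutputStream(buffer);
                    out.writeInt(values.size());
                    for (T value : values) {
                        byte[] encoded = inner.serialize(value);
                        out.writeInt(encoded.length);
                        out.write(encoded);
                    }
                    out.flush();
                    return buffer.toByteArray();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            public List<T> deserialize(byte[] bytes) {
                try {
                    DataInputStream in =
                            new DataInputStream(new ByteArrayInputStream(bytes));
                    int count = in.readInt();
                    List<T> values = new ArrayList<>();
                    for (int i = 0; i < count; i++) {
                        byte[] encoded = new byte[in.readInt()];
                        in.readFully(encoded);
                        values.add(inner.deserialize(encoded));
                    }
                    return values;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }
}
```

Because listOf takes any element codec, the same wrapper could be nested (a list of lists) or paired with an Avro-backed element codec; Map and Set variants would follow the same length-prefixed pattern.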


