Posted to dev@kafka.apache.org by "Daniyar Yeralin (JIRA)" <ji...@apache.org> on 2019/05/06 15:59:00 UTC

[jira] [Created] (KAFKA-8326) Add List Serde

Daniyar Yeralin created KAFKA-8326:
--------------------------------------

             Summary: Add List<T> Serde
                 Key: KAFKA-8326
                 URL: https://issues.apache.org/jira/browse/KAFKA-8326
             Project: Kafka
          Issue Type: Improvement
          Components: clients, streams
            Reporter: Daniyar Yeralin


I propose adding serializers and deserializers for the java.util.List class.

I have many use cases where I want to set the key of a Kafka message to be a UUID. Currently, I need to turn UUIDs into strings or byte arrays and use their associated Serdes, but it would be more convenient to serialize and deserialize UUIDs directly.

I believe there are many use cases where one would want to have a List serde. Ex. [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows], [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api]

 

KIP Link: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
(1) The current PR suggests to always instantiate an `ArrayList` --
however, if a user wants to use any other list implementation, they have
no way to specify this. It might be good to either allow users to
specify the list-type on the deserializer, or encode the list type
directly in the bytes, and hence, whatever type the serialized list was,
the same type will be used on deserialization (might only work for Java
built-in list types).

Personally, I think it's better/more flexible to specify the list-type
on the deserializer, as it also allows plugging in any custom list types.

This could of course be opt-in and for the case users don't care, we
just default to `ArrayList`.
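
The opt-in idea above can be pictured with a minimal Java sketch. This is not the code from the PR: `SketchListDeserializer`, `serializeStrings`, and the wire layout (an element count followed by length-prefixed elements) are assumptions made purely for illustration, and the inner deserializer is modeled as a plain `Function<byte[], T>` instead of a Kafka `Deserializer`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

public class ListTypeSketch {

    /** Hypothetical deserializer: the caller may choose the List implementation. */
    public static final class SketchListDeserializer<T> {
        private final Supplier<List<T>> listFactory;
        private final Function<byte[], T> inner;

        // Opt-in: the caller names the list type explicitly (built-in or custom).
        public SketchListDeserializer(Supplier<List<T>> listFactory, Function<byte[], T> inner) {
            this.listFactory = listFactory;
            this.inner = inner;
        }

        // Users who do not care get ArrayList by default.
        public SketchListDeserializer(Function<byte[], T> inner) {
            this(ArrayList::new, inner);
        }

        public List<T> deserialize(byte[] data) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            int count = buf.getInt();              // assumed layout: element count first
            List<T> out = listFactory.get();
            for (int i = 0; i < count; i++) {
                byte[] element = new byte[buf.getInt()]; // per-element length prefix
                buf.get(element);
                out.add(inner.apply(element));
            }
            return out;
        }
    }

    /** Writes the assumed layout: count, then (length, UTF-8 bytes) per element. */
    public static byte[] serializeStrings(List<String> values) {
        List<byte[]> encoded = new ArrayList<>();
        int size = Integer.BYTES;
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            size += Integer.BYTES + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(values.size());
        for (byte[] b : encoded) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] bytes = serializeStrings(Arrays.asList("a", "bc"));
        Function<byte[], String> utf8 = b -> new String(b, StandardCharsets.UTF_8);

        List<String> defaulted = new SketchListDeserializer<String>(utf8).deserialize(bytes);
        List<String> linked =
            new SketchListDeserializer<String>(LinkedList::new, utf8).deserialize(bytes);

        System.out.println(defaulted.getClass().getSimpleName() + " " + defaulted);
        System.out.println(linked.getClass().getSimpleName() + " " + linked);
    }
}
```

Passing a factory keeps the serialized bytes free of any type tag, which is why it also works for custom list classes; encoding the type in the bytes would instead need a registry of known implementations.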


(2) For Java built-in types, we could check the type via `instanceof` --
if the type is unknown, we fall back to per-element length encoding. As
an alternative, we could also add a constructor taking an `enum` with
two values `fixed-size` and `variable-size`, or a config instead of a
constructor element.
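
To make the trade-off in (2) concrete, here is a small Java sketch of the two layouts. It is only an illustration under stated assumptions: `SizeStrategy`, `strategyFor`, the one-byte strategy flag, and the overall byte layout are hypothetical and are not taken from the KIP or the PR, and `Long` stands in for all fixed-width built-in types.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListEncodingSketch {

    /** Hypothetical flag mirroring the two enum values suggested above. */
    public enum SizeStrategy { FIXED_SIZE, VARIABLE_SIZE }

    /** instanceof check for a known fixed-width type; anything else falls back. */
    public static SizeStrategy strategyFor(Object sample) {
        return sample instanceof Long ? SizeStrategy.FIXED_SIZE : SizeStrategy.VARIABLE_SIZE;
    }

    /** Assumed layout: 1 flag byte, 4-byte count, then elements (length-prefixed only if variable). */
    public static byte[] serializeLongs(List<Long> values, SizeStrategy strategy) {
        boolean prefixed = strategy == SizeStrategy.VARIABLE_SIZE;
        int perElement = Long.BYTES + (prefixed ? Integer.BYTES : 0);
        ByteBuffer buf = ByteBuffer.allocate(1 + Integer.BYTES + values.size() * perElement);
        buf.put((byte) strategy.ordinal());
        buf.putInt(values.size());
        for (long v : values) {
            if (prefixed) {
                buf.putInt(Long.BYTES); // redundant for Long: this is the overhead in question
            }
            buf.putLong(v);
        }
        return buf.array();
    }

    public static List<Long> deserializeLongs(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        SizeStrategy strategy = SizeStrategy.values()[buf.get()];
        int count = buf.getInt();
        List<Long> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            if (strategy == SizeStrategy.VARIABLE_SIZE) {
                buf.getInt(); // skip the per-element length prefix
            }
            out.add(buf.getLong());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L);
        byte[] fixed = serializeLongs(values, strategyFor(values.get(0)));
        byte[] variable = serializeLongs(values, SizeStrategy.VARIABLE_SIZE);
        System.out.println("fixed=" + fixed.length + " bytes, variable=" + variable.length + " bytes");
    }
}
```

In this layout, three `Long` values take 1 + 4 + 3 * 8 = 29 bytes with the fixed-size strategy and 1 + 4 + 3 * 12 = 41 bytes with per-element prefixes, so the saving is 4 bytes per element.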


Just bouncing off ideas -- maybe there are good reasons (too
complicated?) to not support either of them.


-Matthias


On 5/24/19 11:09 AM, Development wrote:
> Hey,
> 
> - did we consider to make the return type (ie, ArrayList, vs
> LinkedList) configurable or encode it in the serialized bytes?
> 
> Not sure about this one. Could you elaborate?
> 
> - atm the size of each element is encoded individually; did we consider
> an optimization for fixed size elements (like Long) to avoid this overhead?
> 
> I cannot think of any clean way to do so. How would you see it?
> 
> Btw I resolved all your comments under PR
> 
> Best,
> Daniyar Yeralin
> 
>> On May 24, 2019, at 12:01 AM, Matthias J. Sax <ma...@confluent.io> wrote:
>>
>> Thanks for the KIP. I also had a look into the PR and have two follow up
>> question:
>>
>>
>> - did we consider to make the return type (ie, ArrayList, vs
>> LinkedList) configurable or encode it in the serialized bytes?
>>
>> - atm the size of each element is encoded individually; did we consider
>> an optimization for fixed size elements (like Long) to avoid this overhead?
>>
>>
>>
>> -Matthias
>>
>> On 5/15/19 6:05 PM, John Roesler wrote:
>>> Sounds good!
>>>
>>> On Tue, May 14, 2019 at 9:21 AM Development <de...@yeralin.net> wrote:
>>>>
>>>> Hey,
>>>>
>>>> I think the proposal is finalized, no one raised any concerns. Shall we call it for a [VOTE]?
>>>>
>>>> Best,
>>>> Daniyar Yeralin
>>>>
>>>>> On May 10, 2019, at 10:17 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>
>>>>> Good observation, Daniyar.
>>>>>
>>>>> Maybe we should just not implement support for serdeFrom.
>>>>>
>>>>> We can always add it later, but I think you're right, we need some
>>>>> kind of more sophisticated support, or at least a second argument for
>>>>> the inner class.
>>>>>
>>>>> For now, it seems like most use cases would be satisfied without
>>>>> serdeFrom(...List...)
>>>>>
>>>>> -John
>>>>>
>>>>> On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was trying to add some test cases for the list serde, and it led me to this class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`
>>>>>>
>>>>>> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
>>>>>>
>>>>>> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>>
>>>>>>> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
>>>>>>>
>>>>>>> Hey Sophie,
>>>>>>>
>>>>>>> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
>>>>>>>
>>>>>>> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
>>>>>>>
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>>
>>>>>>>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
>>>>>>>>
>>>>>>>> Good point about serdes for other Collections. On the one hand I'd guess
>>>>>>>> that non-List Collections are probably relatively rare in practice (if
>>>>>>>> anyone disagrees please correct me!) but on the other hand, a) even if just
>>>>>>>> a small number of people benefit I think it's worth the extra effort and b)
>>>>>>>> if we do end up needing/wanting them in the future it would save us a KIP
>>>>>>>> to just add them now. Personally I feel it would make sense to expand the
>>>>>>>> scope of this KIP a bit to include all Collections as a logical unit, but
>>>>>>>> the ROI could be low..
>>>>>>>>
>>>>>>>> (I know of at least one instance in the unit tests where a Set serde could
>>>>>>>> be useful, and there may be more)
>>>>>>>>
>>>>>>>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
>>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> I don’t see any replies. Seems like this proposal can be finalized and
>>>>>>>>> called for a vote?
>>>>>>>>>
>>>>>>>>> Also I’ve been thinking. Do we need more serdes for other Collections?
>>>>>>>>> Like queue or set for example
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>>
>>>>>>>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>
>>>>>>>>>> No worries about the procedural stuff. Prior experience with KIPs is
>>>>>>>>>> not required :)
>>>>>>>>>>
>>>>>>>>>> I was just trying to help you propose this stuff in a way that the
>>>>>>>>>> others will find easy to review.
>>>>>>>>>>
>>>>>>>>>> Thanks for updating the KIP. Thanks to the others for helping out with
>>>>>>>>>> the syntax.
>>>>>>>>>>
>>>>>>>>>> Given these updates, I'm curious if anyone else has feedback about
>>>>>>>>>> this proposal. Personally, I think it sounds fine!
>>>>>>>>>>
>>>>>>>>>> -John
>>>>>>>>>>
>>>>>>>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hey,
>>>>>>>>>>>
>>>>>>>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
>>>>>>>>> snippet. I’ll update KIP again.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>
>>>>>>>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>
>>>>>>>>>>>> I think you may want to tweak your syntax a little:
>>>>>>>>>>>>
>>>>>>>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>>>>>>>>>> return new ListSerde<T>(innerSerde);
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> Does that work?
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> Chris
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>
>>>>>>>>>>>> I updated JIRA and KIP.
>>>>>>>>>>>>
>>>>>>>>>>>> I didn’t know about the process, and created PR before I knew about
>>>>>>>>> KIPs :)
>>>>>>>>>>>>
>>>>>>>>>>>> As per static declaration, I don’t think Java allows that:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for that update. Do you mind making changes primarily on the
>>>>>>>>>>>>> KIP document? (https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization)
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is the design document that we have to agree on and vote for, the
>>>>>>>>>>>>> PR comes later. It can be nice to have an implementation to look at,
>>>>>>>>>>>>> but the KIP is the main artifact for this discussion.
>>>>>>>>>>>>>
>>>>>>>>>>>>> With this in mind, it will help get more reviewers to look at it if
>>>>>>>>>>>>> you can tidy up the KIP document so that it stands on its own. People
>>>>>>>>>>>>> shouldn't have to look at any other document to understand the
>>>>>>>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>>>>>>>>>>> see what the public API will look like. If it helps, you can take a
>>>>>>>>>>>>> look at some other recent KIPs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>>>>>>>>>>> a zero-argument static factory method for it, but it seems you could
>>>>>>>>>>>>> still have a static method:
>>>>>>>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
>>>>>>>>>>>>>> argument: https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you for your input John! I really appreciate it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What about this point I made:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
>>>>>>>>>>>>>> could not create a static method for List Serde under
>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like static public
>>>>>>>>>>>>>> Serde<List<T>> List() {...} in org.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>>>> class cannot be added because type needs to be defined beforehand. That's
>>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the reply Daniyar,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That makes much more sense! I thought I must be missing something, but I
>>>>>>>>>>>>>>> couldn't for the life of me figure it out.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What do you think about just taking an argument, instead of for a
>>>>>>>>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can control
>>>>>>>>>>>>>>> how exactly the inner data gets serialized, while also bounding the generic
>>>>>>>>>>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>>>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>>>>>>>>>>> strictly necessary to offer an option to sort the data during serialization.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from an old
>>>>>>>>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while editing
>>>>>>>>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for any
>>>>>>>>>>>>>>>> inconvenience.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As per comparator, I agree. Let’s make user be responsible for
>>>>>>>>>>>>>>>> implementing comparable interface. I was just thinking to make the serde a
>>>>>>>>>>>>>>>> little more flexible (i.e. let user decide in which order records are going
>>>>>>>>>>>>>>>> to be inserted into a change log topic).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the proposal!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If I understand the point about the comparator, is it just to capture the
>>>>>>>>>>>>>>>>> generic type parameter? If so, then anything that implements a known
>>>>>>>>>>>>>>>>> interface would work just as well, right? I've been considering adding
>>>>>>>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in many other
>>>>>>>>>>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Note that it's not necessary to actually require that the captured type is
>>>>>>>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to make sure
>>>>>>>>>>>>>>>>> there is some method that makes use of the generic type parameter, to force
>>>>>>>>>>>>>>>>> the compiler to capture the type.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a desire to
>>>>>>>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is a
>>>>>>>>>>>>>>>>> built-in UUID serde: org.apache.kafka.common.serialization.Serdes#UUID, and
>>>>>>>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use *lists of*
>>>>>>>>>>>>>>>>> UUIDs?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde. PR is
>>>>>>>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>>>>>>>>>>> 1. Since type for List Serde needs to be declared beforehand, I could not
>>>>>>>>>>>>>>>>>> create a static method for List Serde under
>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like static public
>>>>>>>>>>>>>>>>>> Serde<List<T>> List() {...} in org.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>>>>>>>> class cannot be added because type needs to be defined beforehand. That's
>>>>>>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno> is questioning
>>>>>>>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This certainly is
>>>>>>>>>>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <
>>>>>>>>> jira@apache.org <ma...@apache.org>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>>>>>>>>>>>>> --------------------------------------
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    Summary: Add List<T> Serde
>>>>>>>>>>>>>>>>>>>        Key: KAFKA-8326
>>>>>>>>>>>>>>>>>>>        URL:
>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326 <
>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326>
>>>>>>>>>>>>>>>>>>>    Project: Kafka
>>>>>>>>>>>>>>>>>>> Issue Type: Improvement
>>>>>>>>>>>>>>>>>>> Components: clients, streams
>>>>>>>>>>>>>>>>>>>   Reporter: Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I propose adding serializers and deserializers for the
>>>>>>>>> java.util.List
>>>>>>>>>>>>>>>>>> class.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka
>>>>>>>>> message to
>>>>>>>>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte
>>>>>>>>> arrays
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> use their associated Serdes, but it would be more convenient to
>>>>>>>>>>>>>>>> serialize
>>>>>>>>>>>>>>>>>> and deserialize UUIDs directly.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I believe there are many use cases where one would want to have
>>>>>>>>> a List
>>>>>>>>>>>>>>>>>> serde. Ex. [
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>> <
>>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>>>
>>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>> <
>>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> KIP Link: [
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>> <
>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>>>>>>>>>>>> (v7.6.3#76005)


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hey Matthias,

Yes, you are totally right, we don't need "list.key/value.serializer.type" in ProducerConfig. Removed!

And yes, now StreamsConfig just points to configs stored in CommonClientConfigs.

I’ll update the KIP.

I think we can continue with voting now.

Best,
Daniyar Yeralin

> On Jul 23, 2019, at 9:08 PM, Matthias J. Sax <ma...@confluent.io> wrote:
> 
>>> Just to make sure I understand the problem you're highlighting:
>>> I guess the difference is that the serializer and deserializer that are
>>> nested inside the serde also need to be configured? So, by default I'd have
>>> to specify all six configs when I'm using Streams?
> 
> That is not the problem. And you actually describe the solution for it
> yourself:
> 
>>> I guess in the Serde, it could make use of a package-protected constructor
>>> for the serializer and deserializer that fixes the list type and inner type
>>> to the serde-configured ones. Then, when you're configuring Streams, you
>>> only need to specify the StreamsConfigs.
> 
> 
> 
> 
> The problem is that `ListSerde` is in package `clients` and thus
> `ListSerde` cannot access `StreamsConfig`, and hence cannot use
> `StreamsConfig#DEFAULT_LIST_KEY_SERDE_TYPE` (and others). Therefore, we
> either need to hard-code string literals for the config names (which does
> not sound right) or add `CommonClientConfigs#DEFAULT_LIST_KEY_SERDE_TYPE`
> (and others).
> 
> In StreamsConfig we would just redefine them for convenience:
> 
>> public static final String DEFAULT_LIST_KEY_SERDE_TYPE = CommonClientConfigs.DEFAULT_LIST_KEY_SERDE_TYPE;
> 
> 
> Note that `TimeWindowSerde` is contained in `streams` package and thus
> it can access `StreamsConfig` and
> `StreamsConfig#DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS`.
> 
> 
> 
> 
> Btw: I just realized that we actually don't need `ProducerConfig`
> 
>> list.key/value.serializer.type
> 
> because the list-type is irrelevant on write. We only need `inner` config.
> 
> 
> 
> -Matthias
> 
> 
> On 7/23/19 1:30 PM, John Roesler wrote:
>> Hmm, that's a tricky situation.
>> 
>> I think Daniyar was on the right track... Producer only cares about
>> serializer configs, and Consumer only cares about deserializer configs.
>> 
>> I didn't see the problem with your proposal:
>> 
>> ProducerConfig:
>>> list.key/value.serializer.type
>>> list.key/value.serializer.inner
>>> ConsumerConfig:
>>> list.key/value.deserializer.type
>>> list.key/value.deserializer.inner
>>> StreamsConfig:
>>> default.list.key/value.serde.type
>>> default.list.key/value.serde.inner
>> 
>> 
>> It seems like the key/value serde configs are a better analogy than the
>> windowed serde.
>> ProducerConfig: key.serializer
>> ConsumerConfig: key.deserializer
>> StreamsConfig: default.key.serde
>> 
>> Just to make sure I understand the problem you're highlighting:
>> I guess the difference is that the serializer and deserializer that are
>> nested inside the serde also need to be configured? So, by default I'd have
>> to specify all six configs when I'm using Streams?
>> 
>> I guess in the Serde, it could make use of a package-protected constructor
>> for the serializer and deserializer that fixes the list type and inner type
>> to the serde-configured ones. Then, when you're configuring Streams, you
>> only need to specify the StreamsConfigs.
>> 
>> Does that work?
>> -John
>> 
>> 
>> On Tue, Jul 23, 2019 at 11:39 AM Development <de...@yeralin.net> wrote:
>> 
>>> Bump
>>> 
>>>> On Jul 22, 2019, at 11:26 AM, Development <de...@yeralin.net> wrote:
>>>> 
>>>> Hey Matthias,
>>>> 
>>>> It looks a little confusing, but I don’t have enough expertise to judge
>>> on the configuration placement.
>>>> 
>>>> If you think it is fine, I’ll go ahead with this approach.
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>>> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <ma...@confluent.io>
>>> wrote:
>>>>> 
>>>>> Good point.
>>>>> 
>>>>> I guess the simplest solution is to actually add
>>>>> 
>>>>>>> default.list.key/value.serde.type
>>>>>>> default.list.key/value.serde.inner
>>>>> 
>>>>> to both `CommonClientConfigs` and `StreamsConfig`.
>>>>> 
>>>>> It's not super clean, but I think it's the best we can do. Thoughts?
>>>>> 
>>>>> 
>>>>> -Matthias
>>>>> 
>>>>> On 7/19/19 1:23 PM, Development wrote:
>>>>>> Hi Matthias,
>>>>>> 
>>>>>> I agree, ConsumerConfig did not seem like a right place for these
>>> configurations.
>>>>>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.
>>>>>> 
>>>>>> However, I have a question. What should I do in "configure(Map<String,
>>> ?> configs, boolean isKey)” methods? Which configurations should I try to
>>> locate? I was comparing my (de)serializer implementations with
>>> SessionWindows(De)serializer classes, and they use StreamsConfig class to
>>> get  either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS :
>>> StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS
>>>>>> 
>>>>>> In my case, as I mentioned earlier, StreamsConfig class is not
>>> accessible from org.apache.kafka.common.serialization package. So, I can’t
>>> utilize it. Any suggestions here?
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>> 
>>>>>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <ma...@confluent.io>
>>> wrote:
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> 
>>>>>>> One minor question about the configs. The KIP adds three classes, a
>>>>>>> Serializer, a Deserializer, and a Serde.
>>>>>>> 
>>>>>>> Hence, would it make sense to add the corresponding configs to
>>>>>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly
>>>>>>> different names each time?
>>>>>>> 
>>>>>>> 
>>>>>>> Something like this:
>>>>>>> 
>>>>>>> ProducerConfig:
>>>>>>> 
>>>>>>> list.key/value.serializer.type
>>>>>>> list.key/value.serializer.inner
>>>>>>> 
>>>>>>> ConsumerConfig:
>>>>>>> 
>>>>>>> list.key/value.deserializer.type
>>>>>>> list.key/value.deserializer.inner
>>>>>>> 
>>>>>>> StreamsConfig:
>>>>>>> 
>>>>>>> default.list.key/value.serde.type
>>>>>>> default.list.key/value.serde.inner
>>>>>>> 
>>>>>>> 
>>>>>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound right
>>>>>>> to me. Also note, that it seems better to avoid the `default.` prefix
>>>>>>> for consumers and producers because there is only one Serializer or
>>>>>>> Deserializer anyway. Only for Streams, there are multiple and
>>>>>>> StreamsConfig specifies the default one if an operator does not
>>>>>>> overwrite it.
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> 
>>>>>>> 
>>>>>>> Also, the KIP should explicitly mention to which classes certain configs
>>>>>>> are added. Atm, the KIP only lists parameter names, but does not state
>>>>>>> where those are added.
>>>>>>> 
>>>>>>> 
>>>>>>> -Matthias
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 7/16/19 1:11 PM, Development wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Yes, totally forgot about the statement. KIP-466 is updated.
>>>>>>>> 
>>>>>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman
>>> for your valuable input!
>>>>>>>> 
>>>>>>>> I hope I did not cause too much trouble :)
>>>>>>>> 
>>>>>>>> I’ll start the vote now.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Daniyar Yeralin
>>>>>>>> 
>>>>>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io>
>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Daniyar,
>>>>>>>>> 
>>>>>>>>> Thanks for that update. I took a look, and I think this is in good
>>> shape.
>>>>>>>>> 
>>>>>>>>> One note, the statement "New method public static <T> Serde<List<T>>
>>>>>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
>>>>>>>>> (infers list implementation and inner serde from config file)" is
>>>>>>>>> still present in the KIP, although I think it was removed from the PR.
>>>>>>>>> 
>>>>>>>>> Once you remove that statement from the KIP, then I think this KIP
>>> is
>>>>>>>>> ready to go up for a vote! Then, we can really review the PR in
>>>>>>>>> earnest and get this thing merged.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> -john
>>>>>>>>> 
>>>>>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net>
>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
>>>>>>>>>> 
>>>>>>>>>> Feel free to put any comments in there.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>> 
>>>>>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi John,
>>>>>>>>>>> 
>>>>>>>>>>> I knew I was missing something. Yes, that makes sense now, I
>>> removed all `listSerde()` methods, and left empty constructors instead.
>>>>>>>>>>> 
>>>>>>>>>>> As per `CommonClientConfigs` I looked at the class, it doesn’t
>>> have any properties related to serdes, and that bothers me a little.
>>>>>>>>>>> 
>>>>>>>>>>> All properties like `default.key.serde`
>>> `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want
>>> to create a confusion.
>>>>>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and
>>> its (de)serializers are not located in
>>> org.apache.kafka.common.serialization. I guess it kind of makes sense since
>>> windowed serdes are only available for kafka streams, not vice versa.
>>>>>>>>>>> 
>>>>>>>>>>> If everyone is okay to put list properties in
>>> `CommonClientConfigs` class, I’ll go ahead and do that then.
>>>>>>>>>>> 
>>>>>>>>>>> Thank you for your input!
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io>
>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> Regarding the placement, you might as well move the constants to
>>> `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and
>>> the configs and the code are in the same module.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regarding the constructor... What Matthias said is correct: The
>>> serde, serializer, and deserializer all need to have zero-arg constructors
>>> so they can be instantiated reflectively by Kafka. However, the factory
>>> method you proposed "New method public static <T> Serde<List<T>>
>>> ListSerde()" is not a constructor, and is not required. It would be used
>>> purely from the Java interface, but has the drawbacks I listed above. This
>>> method, not the constructor, is what I proposed to remove from the KIP.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -John
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>> One problem though.
>>>>>>>>>>>> 
>>>>>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m
>>> trying to mimic the implementation of my ListSerde accordingly.
>>>>>>>>>>>> 
>>>>>>>>>>>> I created a couple of constants under StreamsConfig:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> And I am trying to use a similar construct:
>>>>>>>>>>>> final String propertyName = isKey ?
>>> StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS :
>>> StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>>>>>>>>>>>> But then found out that StreamsConfig is not accessible from
>>> org.apache.kafka.common.serialization package while window serde
>>> (de)serializers are located under org.apache.kafka.streams.kstream package.
>>>>>>>>>>>> 
>>>>>>>>>>>> What should I do? Should I move my classes under
>>> org.apache.kafka.streams.kstream package instead?
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you for your input.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I updated the KIP, made it a little more readable.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think the configuration parameters strategy is finalized then.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Do you have any other questions/concerns regarding this KIP?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit
>>> them under my PR.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Daniyar,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> thanks for the update to the KIP. It's in really good shape
>>> and well
>>>>>>>>>>>>>> written.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> About the default constructor question:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default
>>> constructor to
>>>>>>>>>>>>>> create them easily via reflection when specified in a config.
>>> I
>>>>>>>>>>>>>> understand that it is not super user friendly, but all
>>> existing code
>>>>>>>>>>>>>> works this way. Hence, it seems best to stick with the
>>> established pattern.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
>>>>>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to
>>> improve user
>>>>>>>>>>>>>> experience that addresses the exact issue John raised. (cf.
>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/7067)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Note that if a user instantiates the Serde manually, the
>>> user
>>>>>>>>>>>>>> would also need to call `configure()` to set up the inner
>>> serdes. Kafka
>>>>>>>>>>>>>> Streams would not set those up automatically, and one would most
>>> likely
>>>>>>>>>>>>>> end up with an NPE.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Coming back to the KIP, and the parameter names. `WindowedSerdes`
>>> are
>>>>>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For
>>> `WindowedSerdes`,
>>>>>>>>>>>>>> we use the following parameter names:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> - default.windowed.key.serde.inner
>>>>>>>>>>>>>> - default.windowed.value.serde.inner
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It might be good to align the naming pattern. I would also
>>> suggest to
>>>>>>>>>>>>>> use `type` instead of `impl`?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>>>>>>>>>>>>> default.value.list.serde.impl  ->
>>> default.list.value.serde.type
>>>>>>>>>>>>>> default.key.list.serde.element  ->
>>> default.list.key.serde.inner
>>>>>>>>>>>>>> default.value.list.serde.element  ->
>>> default.list.value.serde.inner
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Matthias
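For readers following the constructor discussion, here is a minimal sketch of the pattern described above: a class with a zero-arg constructor is created reflectively from its name and only becomes usable once `configure()` has run. It is not Kafka code. `ConfigurableSketch`, `ListDeserializerSketch`, and `ReflectiveFactorySketch` are hypothetical stand-ins, and the config names are the ones suggested in this message, not a released API.

```java
import java.util.Map;

// Hypothetical, trimmed-down stand-in for the configure() contract of Kafka's (de)serializers.
interface ConfigurableSketch {
    void configure(Map<String, ?> configs, boolean isKey);
}

final class ListDeserializerSketch implements ConfigurableSketch {
    // Assumed config names, taken from the naming proposal in the thread.
    static final String KEY_TYPE_CONFIG = "default.list.key.serde.type";
    static final String VALUE_TYPE_CONFIG = "default.list.value.serde.type";

    private Class<?> listClass; // stays null until configure() is called

    // Zero-arg constructor so the class can be created by name.
    public ListDeserializerSketch() { }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Pick the key or value property, mirroring the isKey switch quoted earlier in the thread.
        String propertyName = isKey ? KEY_TYPE_CONFIG : VALUE_TYPE_CONFIG;
        Object value = configs.get(propertyName);
        if (value == null) {
            throw new IllegalArgumentException("Missing config: " + propertyName);
        }
        try {
            listClass = Class.forName(value.toString());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown list type: " + value, e);
        }
    }

    Class<?> listClass() {
        return listClass;
    }
}

final class ReflectiveFactorySketch {
    // Instantiate by class name, then configure: the two steps a framework would perform.
    static ConfigurableSketch create(String className, Map<String, ?> configs, boolean isKey) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            ConfigurableSketch configurable = (ConfigurableSketch) instance;
            configurable.configure(configs, isKey);
            return configurable;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

A user who calls `new ListDeserializerSketch()` directly and skips `configure()` would see `listClass()` return null, which is the situation the NPE warning above refers to.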
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is
>>> that it goes against what Matthias suggested earlier:
>>>>>>>>>>>>>>> "I think that ... `ListSerde` should have an default
>>> constructor and it should be possible to pass in the `Class listClass`
>>> information via a configuration. Otherwise, KafkaStreams cannot use it as
>>> default serde.”
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> What do you think about that? I hope I’m not confusing
>>> anything.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for
>>> the update, too.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Just one more super-small question, do we need this variant:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in
>>> org.apache.kafka.common.serialization.Serdes class (infers list
>>> implementation and inner serde from config file)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> It seems like this situation implies my config file is
>>> already set up for the list serde, so passing this serde (e.g., in
>>> Produced) would have the same effect as not specifying it.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I guess that it could be the case that you have the
>>> `default.key/value.serde` set to something else, like StringSerde, but you
>>> still have the `default.key/value.list.serde.impl/element` set. This seems
>>> like it would result in more confusion than convenience, so my gut instinct
>>> is maybe we shouldn't introduce the `ListSerde()` variant until people
>>> actually request it later on.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully
>>> source-code-driven, not half/half.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I hope everyone had a great long weekend.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you
>>> correctly, but I think I already listed them:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> So for Produced, you would use it in the following fashion,
>>> for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class,
>>> Serdes.Integer()))
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization
>>> Strategy” where I describe our logic of conditional serialization based on
>>> the type of an inner serde.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for the update, Daniyar!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> In addition to specifying the config interface, can you
>>> also specify
>>>>>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance
>>> of this
>>>>>>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized,
>>> etc., what
>>>>>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the
>>> Serializer
>>>>>>>>>>>>>>>>> and Deserializer. I don't think you need to specify the
>>> implementation
>>>>>>>>>>>>>>>>> logic, since we've already discussed it here.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If you also want to specify the serialized format of the
>>> data records
>>>>>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as
>>> letting us
>>>>>>>>>>>>>>>>> verify the schema for forward/backward compatibility
>>> concerns, etc.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Finally made updates to the KIP:
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>>>>>> Sorry for the delay :)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thank You!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Yes, something like this. I did not think about good
>>> configuration
>>>>>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all
>>> proposed
>>>>>>>>>>>>>>>>> configs atm. But all configs should be listed and explained
>>> in the KIP
>>>>>>>>>>>>>>>>> anyway, and we can discuss further after you have updated
>>> the KIP (I can
>>>>>>>>>>>>>>>>> ask more detailed question if I have any).
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to
>>> have in a list
>>>>>>>>>>>>>>>>> of primitives.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> As for the default constructor and configurability, just
>>> want to make
>>>>>>>>>>>>>>>>> sure. Is this what you have on your mind?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for the update!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and
>>> `ListSerde`
>>>>>>>>>>>>>>>>> should have a default constructor and it should be
>>> possible to pass in
>>>>>>>>>>>>>>>>> the `Class listClass` information via a configuration.
>>> Otherwise,
>>>>>>>>>>>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not
>>> primitive IMHO,
>>>>>>>>>>>>>>>>> as it is for `byte[]` with variable length -- it's for
>>> arrays, not for
>>>>>>>>>>>>>>>>> single `byte` (note, that `Bytes` is a Kafka class wrapping
>>> `byte[]`).
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in
>>> the KIP
>>>>>>>>>>>>>>>>> discussion.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Can you also update the KIP?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I made and pushed necessary commits, so we could review the
>>> final
>>>>>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I also need some advice on writing tests for this new
>>> serde. So far I
>>>>>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m
>>> not sure
>>>>>>>>>>>>>>>>> if it is enough.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>> Hey John and Matthias,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant
>>> information.
>>>>>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list
>>> type. I
>>>>>>>>>>>>>>>>> realized that the type is not really needed in
>>> ListSerializer, but
>>>>>>>>>>>>>>>>> only in ListDeserializer:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if
>>> serializer is
>>>>>>>>>>>>>>>>> not a primitive serializer:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Then, in deserializer, we persist passed list type, so that
>>> during
>>>>>>>>>>>>>>>>> deserialization we could create an instance of it with
>>> predefined
>>>>>>>>>>>>>>>>> listSize for better performance.
>>>>>>>>>>>>>>>>> We also try to locate a primitiveSize based on passed
>>> deserializer.
>>>>>>>>>>>>>>>>> If it is not there, then primitiveSize will be null. Which
>>> means
>>>>>>>>>>>>>>>>> that each entry’s size was encoded individually.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> This looks much cleaner and more concise.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Daniyar Yeralin
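A minimal sketch of the layout described in this message, under stated assumptions: a 4-byte entry count comes first, and each entry carries a 4-byte size prefix only when the inner type is not fixed-size. The names `ListCodecSketch`, `fixedSize`, and `listFactory` are illustrative and do not come from the PR. The real serde would take inner `Serializer`/`Deserializer` instances rather than plain functions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.function.Function;
import java.util.function.IntFunction;

// Hypothetical illustration of the wire layout discussed in the thread; not the KIP-466 implementation.
final class ListCodecSketch {

    private static void writeInt(ByteArrayOutputStream out, int value) {
        byte[] encoded = ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
        out.write(encoded, 0, encoded.length);
    }

    // Layout: [entry count], then per entry [entry size, only if fixedSize == null][entry bytes].
    static <T> byte[] serialize(List<T> list, Function<T, byte[]> inner, Integer fixedSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt(out, list.size());
        for (T entry : list) {
            byte[] payload = inner.apply(entry);
            if (fixedSize == null) {
                writeInt(out, payload.length); // variable-size inner type: store each size
            }
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }

    // listFactory receives the decoded entry count so the target list can be pre-sized.
    static <T> List<T> deserialize(byte[] data, Function<byte[], T> inner, Integer fixedSize,
                                   IntFunction<List<T>> listFactory) {
        ByteBuffer in = ByteBuffer.wrap(data);
        int listSize = in.getInt();
        List<T> result = listFactory.apply(listSize);
        for (int i = 0; i < listSize; i++) {
            // A null fixedSize means each entry's size was encoded individually.
            int entrySize = (fixedSize == null) ? in.getInt() : fixedSize;
            byte[] payload = new byte[entrySize];
            in.get(payload);
            result.add(inner.apply(payload));
        }
        return result;
    }
}
```

Passing `ArrayList::new` as the factory pre-sizes the list from the entry count, while a type without a capacity constructor, such as `LinkedList`, can simply ignore the argument.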
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> For encoding the list-type: I see John's point about
>>> re-encoding the
>>>>>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea
>>> that the
>>>>>>>>>>>>>>>>> Deserializer returns a fixed type...
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Maybe it's best to allow users to specify the target list type
>>> on
>>>>>>>>>>>>>>>>> deserialization via config?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to
>>> encode the
>>>>>>>>>>>>>>>>> type size, but users could specify the type on the
>>> deserializer (via a
>>>>>>>>>>>>>>>>> config again)?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I
>>> doubt
>>>>>>>>>>>>>>>>> we can
>>>>>>>>>>>>>>>>> support this and a cast will be necessary at some point in
>>> the user
>>>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for looking at it!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Something like your screenshot is more along the lines of
>>> what I was
>>>>>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how
>>> would that not
>>>>>>>>>>>>>>>>> be "vanilla java"?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Unfortunately the deserializer needs more information,
>>> though. For
>>>>>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>?
>>> The serde
>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still
>>> need an
>>>>>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> =
>>> Serdes.listSerde(
>>>>>>>>>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> And in configuration, it's something like:
>>>>>>>>>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I read about TypeReference. It could work for the list
>>> serde.
>>>>>>>>>>>>>>>>> However, it is not directly
>>>>>>>>>>>>>>>>> supported:
>>>>>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>>>>>>>>>> The only way is to pass an actual class object into the
>>> constructor,
>>>>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you
>>> think of my
>>>>>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As
>>> described
>>>>>>>>>>>>>>>>> previously)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit
>>> over
>>>>>>>>>>>>>>>>> engineered :)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I also wanted to see feedback from Matthias, since
>>> he gave
>>>>>>>>>>>>>>>>> me an idea about storing fixed/variable size entries.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> That's a very clever solution!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> One observation is that, now, this is what we might call a
>>>>>>>>>>>>>>>>> polymorphic
>>>>>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type
>>> and then
>>>>>>>>>>>>>>>>> promising to produce the exact same concrete type on read.
>>>>>>>>>>>>>>>>> There are
>>>>>>>>>>>>>>>>> some inherent problems with this approach, which in general
>>>>>>>>>>>>>>>>> require
>>>>>>>>>>>>>>>>> some kind of  schema registry (not necessarily Schema
>>>>>>>>>>>>>>>>> Registry, just
>>>>>>>>>>>>>>>>> any registry for schemas) to solve.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Notice that every serialized record has quite a bit of
>>> duplicated
>>>>>>>>>>>>>>>>> information: the concrete type as well as a byte to indicate
>>>>>>>>>>>>>>>>> whether
>>>>>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to
>>>>>>>>>>>>>>>>> indicate the
>>>>>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because
>>> they
>>>>>>>>>>>>>>>>> tell us
>>>>>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately,
>>> this
>>>>>>>>>>>>>>>>> information is completely redundant. In all likelihood, the
>>>>>>>>>>>>>>>>> information will be exactly the same for every record in the
>>>>>>>>>>>>>>>>> topic.
>>>>>>>>>>>>>>>>> This problem is essentially the core motivation for
>>> serializations
>>>>>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization
>>>>>>>>>>>>>>>>> itself, so
>>>>>>>>>>>>>>>>> that the records won't contain so much redundant
>>> information.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to
>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>> like what you had earlier in which you don't support
>>> perfectly
>>>>>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead
>>> just
>>>>>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer
>>> full,
>>>>>>>>>>>>>>>>> perfect, type preservation to serdes that have an external
>>>>>>>>>>>>>>>>> system in
>>>>>>>>>>>>>>>>> which to register their type information.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> There does exist an alternative, if we really do want to
>>>>>>>>>>>>>>>>> preserve the
>>>>>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>>>>>>>>>>>>> configuration option specifically for the serde to configure
>>>>>>>>>>>>>>>>> what the
>>>>>>>>>>>>>>>>> list type will be, and maybe what the element type is, as
>>> well.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> As far as "related work" goes, you might be interested to
>>> take
>>>>>>>>>>>>>>>>> a look
>>>>>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a
>>> specific,
>>>>>>>>>>>>>>>>> arbitrarily nested, generically parameterized class
>>> structure.
>>>>>>>>>>>>>>>>> Specifically, you might find
>>>>>>>>>>>>>>>>> 
>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>>>>>>>>>> interesting.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> bump
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
> 



Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks!

Glad we are on the same page on how to address the cyclic dependency issue.


-Matthias

On 7/24/19 8:09 AM, Development wrote:
> KIP-466 is updated and new commit is pushed.
> 
> Thank you guys!
> 
>> On Jul 24, 2019, at 10:53 AM, John Roesler <jo...@confluent.io> wrote:
>>
>> Ah, thanks for setting me straight, Matthias.
>>
>> Given the choice between defining the Serde in the streams module (hence it
>> would not be in the Serdes "menu" class) or defining the configuration
>> property in CommonClientConfigs, I think I'm leaning toward the latter.
>>
>> Really good catch on the ProducerConfig; otherwise, I think we should go
>> ahead and add the serializer/deserializer configs as discussed to
>> ProducerConfig and ConsumerConfig. It's just cleaner and more uniform that
>> way.
>>
>> Thanks again,
>> -John
>>
>> On Tue, Jul 23, 2019 at 8:08 PM Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>>>>> Just to make sure I understand the problem you're highlighting:
>>>>> I guess the difference is that the serializer and deserializer that are
>>>>> nested inside the serde also need to be configured? So, by default I'd
>>> have
>>>>> to specify all six configs when I'm using Streams?
>>>
>>> That is not the problem. And you actually describe the solution for it
>>> yourself:
>>>
>>>>> I guess in the Serde, it could make use of a package-protected
>>> constructor
>>>>> for the serializer and deserializer that fixes the list type and inner
>>> type
>>>>> to the serde-configured ones. Then, when you're configuring Streams, you
>>>>> only need to specify the StreamsConfigs.
>>>
>>>
>>>
>>>
>>> The problem is that `ListSerde` is in package `clients` and thus
>>> `ListSerde` cannot access `StreamsConfig`, and hence cannot use
>>> `StreamsConfig#DEFAULT_LIST_KEY_SERDE_TYPE` (and others). Therefore, we
>>> either need to hard-code string literals for the config names (which does
>>> not sound right) or add `CommonClientConfigs#DEFAULT_LIST_KEY_SERDE_TYPE`
>>> (and others).
>>>
>>> In StreamsConfig we would just redefine them for convenience:
>>>
>>>> public static final String DEFAULT_LIST_KEY_SERDE_TYPE =
>>> CommonClientConfigs.DEFAULT_LIST_KEY_SERDE_TYPE;
>>>
>>>
>>> Note that `TimeWindowedSerde` is contained in the `streams` package and thus
>>> it can access `StreamsConfig` and
>>> `StreamsConfig#DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS`.
>>>
>>>
>>>
>>>
>>> Btw: I just realized that we actually don't need `ProducerConfig`
>>>
>>>> list.key/value.serializer.type
>>>
>>> because the list-type is irrelevant on write. We only need `inner` config.
>>>
>>>
>>>
>>> -Matthias
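The dependency direction described in this message can be sketched as follows, assuming the config names are declared once in the lower-level module and merely aliased in the Streams module. All three classes are hypothetical mirrors for illustration. The real constants would live in `CommonClientConfigs` and `StreamsConfig`, and the string values shown are the names proposed in this thread.

```java
// Hypothetical mirror of the clients-module config holder: the single source of truth.
final class CommonClientConfigsSketch {
    static final String DEFAULT_LIST_KEY_SERDE_TYPE = "default.list.key.serde.type";
    static final String DEFAULT_LIST_VALUE_SERDE_TYPE = "default.list.value.serde.type";

    private CommonClientConfigsSketch() { }
}

// Hypothetical mirror of the streams-module config holder: it only re-exports the names.
final class StreamsConfigSketch {
    static final String DEFAULT_LIST_KEY_SERDE_TYPE =
        CommonClientConfigsSketch.DEFAULT_LIST_KEY_SERDE_TYPE;
    static final String DEFAULT_LIST_VALUE_SERDE_TYPE =
        CommonClientConfigsSketch.DEFAULT_LIST_VALUE_SERDE_TYPE;

    private StreamsConfigSketch() { }
}

// Lives next to the serde in the clients module, so it may only reference the clients-side class.
final class ListSerdeConfigLookup {
    static String typeProperty(boolean isKey) {
        return isKey
            ? CommonClientConfigsSketch.DEFAULT_LIST_KEY_SERDE_TYPE
            : CommonClientConfigsSketch.DEFAULT_LIST_VALUE_SERDE_TYPE;
    }

    private ListSerdeConfigLookup() { }
}
```

With this arrangement the serde never imports anything from the streams module, while Streams users can still refer to the property through the familiar config class.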
>>>
>>>
>>> On 7/23/19 1:30 PM, John Roesler wrote:
>>>> Hmm, that's a tricky situation.
>>>>
>>>> I think Daniyar was on the right track... Producer only cares about
>>>> serializer configs, and Consumer only cares about deserializer configs.
>>>>
>>>> I didn't see the problem with your proposal:
>>>>
>>>> ProducerConfig:
>>>>> list.key/value.serializer.type
>>>>> list.key/value.serializer.inner
>>>>> ConsumerConfig:
>>>>> list.key/value.deserializer.type
>>>>> list.key/value.deserializer.inner
>>>>> StreamsConfig:
>>>>> default.list.key/value.serde.type
>>>>> default.list.key/value.serde.inner
>>>>
>>>>
>>>> It seems like the key/value serde configs are a better analogy than the
>>>> windowed serde.
>>>> ProducerConfig: key.serializer
>>>> ConsumerConfig: key.deserializer
>>>> StreamsConfig: default.key.serde
>>>>
>>>> Just to make sure I understand the problem you're highlighting:
>>>> I guess the difference is that the serializer and deserializer that are
>>>> nested inside the serde also need to be configured? So, by default I'd
>>> have
>>>> to specify all six configs when I'm using Streams?
>>>>
>>>> I guess in the Serde, it could make use of a package-protected
>>> constructor
>>>> for the serializer and deserializer that fixes the list type and inner
>>> type
>>>> to the serde-configured ones. Then, when you're configuring Streams, you
>>>> only need to specify the StreamsConfigs.
>>>>
>>>> Does that work?
>>>> -John
>>>>
>>>>
>>>> On Tue, Jul 23, 2019 at 11:39 AM Development <de...@yeralin.net> wrote:
>>>>
>>>>> Bump
>>>>>
>>>>>> On Jul 22, 2019, at 11:26 AM, Development <de...@yeralin.net> wrote:
>>>>>>
>>>>>> Hey Matthias,
>>>>>>
>>>>>> It looks a little confusing, but I don’t have enough expertise to judge
>>>>> on the configuration placement.
>>>>>>
>>>>>> If you think it is fine, I’ll go ahead with this approach.
>>>>>>
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>>
>>>>>>> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <ma...@confluent.io>
>>>>> wrote:
>>>>>>>
>>>>>>> Good point.
>>>>>>>
>>>>>>> I guess the simplest solution is, to actually add
>>>>>>>
>>>>>>>>> default.list.key/value.serde.type
>>>>>>>>> default.list.key/value.serde.inner
>>>>>>>
>>>>>>> to both `CommonClientConfigs` and `StreamsConfig`.
>>>>>>>
>>>>>>> It's not super clean, but I think it's the best we can do. Thoughts?
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 7/19/19 1:23 PM, Development wrote:
>>>>>>>> Hi Matthias,
>>>>>>>>
>>>>>>>> I agree, ConsumerConfig did not seem like the right place for these
>>>>> configurations.
>>>>>>>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.
>>>>>>>>
>>>>>>>> However, I have a question. What should I do in the
>>>>>>>> `configure(Map<String, ?> configs, boolean isKey)` methods? Which
>>>>>>>> configurations should I try to locate? I was comparing my
>>>>>>>> (de)serializer implementations with the SessionWindowed(De)serializer
>>>>>>>> classes, and they use the StreamsConfig class to get either
>>>>>>>> StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS or
>>>>>>>> StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS, depending on isKey.
>>>>>>>>
>>>>>>>> In my case, as I mentioned earlier, StreamsConfig class is not
>>>>> accessible from org.apache.kafka.common.serialization package. So, I
>>> can’t
>>>>> utilize it. Any suggestions here?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Daniyar Yeralin
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> One minor question about the configs. The KIP adds three classes, a
>>>>>>>>> Serializer, a Deserializer, and a Serde.
>>>>>>>>>
>>>>>>>>> Hence, would it make sense to add the corresponding configs to
>>>>>>>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using
>>> slightly
>>>>>>>>> different names each time?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Something like this:
>>>>>>>>>
>>>>>>>>> ProducerConfig:
>>>>>>>>>
>>>>>>>>> list.key/value.serializer.type
>>>>>>>>> list.key/value.serializer.inner
>>>>>>>>>
>>>>>>>>> ConsumerConfig:
>>>>>>>>>
>>>>>>>>> list.key/value.deserializer.type
>>>>>>>>> list.key/value.deserializer.inner
>>>>>>>>>
>>>>>>>>> StreamsConfig:
>>>>>>>>>
>>>>>>>>> default.list.key/value.serde.type
>>>>>>>>> default.list.key/value.serde.inner
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound
>>>>>>>>> right to me. Also note that it seems better to avoid the `default.`
>>>>>>>>> prefix for consumers and producers because there is only one
>>>>>>>>> Serializer or Deserializer anyway. Only for Streams, there are
>>>>>>>>> multiple, and StreamsConfig specifies the default one if an operator
>>>>>>>>> does not overwrite it.
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also, the KIP should explicitly mention to what classes certain
>>>>>>>>> configs are added. Atm, the KIP only lists parameter names, but does
>>>>>>>>> not state where those are added.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
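
[Illustration] The Streams-level names proposed in the message above can be sketched as plain Java properties. This is only a sketch under assumptions: the keys are the names floated in this thread and may differ from what finally ships, the helper class `ListSerdeConfigSketch` is hypothetical, and the values passed in are placeholder class names.

```java
import java.util.Properties;

// Sketch only: keys follow the naming proposed in this thread
// (default.list.key/value.serde.type and .inner); not a Kafka class.
final class ListSerdeConfigSketch {

    // Builds the two proposed Streams-level settings for a list-typed key or value.
    static Properties streamsListConfig(boolean isKey, String listType, String innerSerde) {
        String side = isKey ? "key" : "value";
        Properties props = new Properties();
        // Which List implementation the deserializer should instantiate.
        props.setProperty("default.list." + side + ".serde.type", listType);
        // Which serde handles the individual list elements.
        props.setProperty("default.list." + side + ".serde.inner", innerSerde);
        return props;
    }
}
```

Calling `streamsListConfig(true, "java.util.ArrayList", "com.example.MyRecordSerde")` would yield the two `default.list.key.serde.*` entries; the producer and consumer variants discussed above would drop the `default.` prefix and use `serializer` or `deserializer` in place of `serde`.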
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 7/16/19 1:11 PM, Development wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Yes, totally forgot about the statement. KIP-466 is updated.
>>>>>>>>>>
>>>>>>>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie
>>> Blee-Goldman
>>>>> for your valuable input!
>>>>>>>>>>
>>>>>>>>>> I hope I did not cause too much trouble :)
>>>>>>>>>>
>>>>>>>>>> I’ll start the vote now.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io>
>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for that update. I took a look, and I think this is in good
>>>>> shape.
>>>>>>>>>>>
>>>>>>>>>>> One note, the statement "New method public static <T> Serde<List<T>>
>>>>>>>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
>>>>>>>>>>> (infers list implementation and inner serde from config file)" is
>>>>>>>>>>> still present in the KIP, although I see it was removed from the PR.
>>>>>>>>>>>
>>>>>>>>>>> Once you remove that statement from the KIP, then I think this KIP
>>>>> is
>>>>>>>>>>> ready to go up for a vote! Then, we can really review the PR in
>>>>>>>>>>> earnest and get this thing merged.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -john
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net>
>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Pushed new changes under my PR:
>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>
>>>>>>>>>>>> Feel free to put any comments in there.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net>
>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I knew I was missing something. Yes, that makes sense now, I
>>>>> removed all `listSerde()` methods, and left empty constructors instead.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for `CommonClientConfigs`: I looked at the class, and it doesn’t
>>>>>>>>>>>>> have any properties related to serdes, which bothers me a little.
>>>>>>>>>>>>>
>>>>>>>>>>>>> All properties like `default.key.serde` and
>>>>>>>>>>>>> `default.windowed.key.serde.*` are located in StreamsConfig. I don’t
>>>>>>>>>>>>> want to create confusion.
>>>>>>>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and its
>>>>>>>>>>>>> (de)serializers are not located in
>>>>>>>>>>>>> org.apache.kafka.common.serialization. I guess it kind of makes sense,
>>>>>>>>>>>>> since windowed serdes are only available for Kafka Streams, not vice versa.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If everyone is okay to put list properties in
>>>>> `CommonClientConfigs` class, I’ll go ahead and do that then.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for your input!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io>
>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regarding the placement, you might as well move the constants
>>> to
>>>>> `org.apache.kafka.clients.CommonClientConfigs`, so that the constants
>>> and
>>>>> the configs and the code are in the same module.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regarding the constructor... What Matthias said is correct: The
>>>>> serde, serializer, and deserializer all need to have zero-arg
>>> constructors
>>>>> so they can be instantiated reflectively by Kafka. However, the factory
>>>>> method you proposed "New method public static <T> Serde<List<T>>
>>>>> ListSerde()" is not a constructor, and is not required. It would be used
>>>>> purely from the Java interface, but has the drawbacks I listed above.
>>> This
>>>>> method, not the constructor, is what I proposed to remove from the KIP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> -John
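
[Illustration] The zero-arg constructor requirement described above can be sketched as follows. This is a minimal sketch, not Kafka's actual loading code: `ConfigurableSketch`, `ListSerdeSketch`, and `ReflectiveLoaderSketch` are hypothetical stand-ins, and the config keys are the names proposed in this thread. It shows why a class named only in a config file must be constructible without arguments and must pick up its settings in `configure(...)`.

```java
import java.util.Map;

// Hypothetical stand-in for a configurable serde type; not a Kafka interface.
interface ConfigurableSketch {
    void configure(Map<String, ?> configs, boolean isKey);
}

// Has only the implicit zero-arg constructor, so it can be created from a class name.
final class ListSerdeSketch implements ConfigurableSketch {
    String listType; // stays null until configure() is called

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Choose the key-side or value-side property, as the windowed serdes do.
        String name = isKey ? "default.list.key.serde.type" : "default.list.value.serde.type";
        Object value = configs.get(name);
        listType = value == null ? null : value.toString();
    }
}

final class ReflectiveLoaderSketch {
    // Instantiates the named class via its zero-arg constructor, then configures it.
    static ConfigurableSketch load(String className, Map<String, ?> configs, boolean isKey) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            ConfigurableSketch loaded = (ConfigurableSketch) instance;
            loaded.configure(configs, isKey);
            return loaded;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

An instance created directly with `new ListSerdeSketch()` and never configured keeps `listType == null`, which mirrors the NPE risk mentioned elsewhere in this thread for manually instantiated serdes.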
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>> One problem though.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar,
>>> I’m
>>>>> trying to mimic the implementation of my ListSerde accordingly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I created a couple of constants under StreamsConfig:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And I am trying to use a similar construct:
>>>>>>>>>>>>>> final String propertyName = isKey ?
>>>>> StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS :
>>>>> StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>>>>>>>>>>>>>> But then found out that StreamsConfig is not accessible from
>>>>> org.apache.kafka.common.serialization package while window serde
>>>>> (de)serializers are located under org.apache.kafka.streams.kstream
>>> package.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What should I do? Should I move my classes under
>>>>> org.apache.kafka.streams.kstream package instead?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for your input.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I updated the KIP, made it a little more readable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think the configuration parameters strategy is finalized
>>> then.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you have any other questions/concerns regarding this KIP?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and
>>> commit
>>>>> them under my PR.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Daniyar,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thanks for the update to the KIP. It's in really good shape
>>>>> and well
>>>>>>>>>>>>>>>> written.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> About the default constructor question:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default
>>>>>>>>>>>>>>>> constructor to create them easily via reflection when specified in
>>>>>>>>>>>>>>>> a config. I understand that it is not super user friendly, but all
>>>>>>>>>>>>>>>> existing code works this way. Hence, it seems best to stick with the
>>>>>>>>>>>>>>>> established pattern.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
>>>>>>>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve user
>>>>>>>>>>>>>>>> experience that addresses the exact issue John raised. (cf
>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/7067)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Note, that if a user would instantiate the Serde manually,
>>> the
>>>>> user
>>>>>>>>>>>>>>>> would also need to call `configure()` to setup the inner
>>>>> serdes. Kafka
>>>>>>>>>>>>>>>> Streams would not setup those automatically and one might
>>> most
>>>>> likely
>>>>>>>>>>>>>>>> end-up with an NPE.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Coming back to the KIP, and the parameter names. `WindowedSerdes` are
>>>>>>>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>>>>>>>>>>>>>>> we use the following parameter names:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - default.windowed.key.serde.inner
>>>>>>>>>>>>>>>> - default.windowed.value.serde.inner
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It might be good to align the naming pattern. I would also
>>>>> suggest to
>>>>>>>>>>>>>>>> use `type` instead of `impl`?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>>>>>>>>>>>>>>> default.value.list.serde.impl  ->
>>>>> default.list.value.serde.type
>>>>>>>>>>>>>>>> default.key.list.serde.element  ->
>>>>> default.list.key.serde.inner
>>>>>>>>>>>>>>>> default.value.list.serde.element  ->
>>>>> default.list.value.serde.inner
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is that it
>>>>>>>>>>>>>>>>> goes against what Matthias suggested earlier:
>>>>>>>>>>>>>>>>> "I think that ... `ListSerde` should have a default constructor and it
>>>>>>>>>>>>>>>>> should be possible to pass in the `Class listClass` information via a
>>>>>>>>>>>>>>>>> configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What do you think about that? I hope I’m not confusing
>>>>> anything.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks
>>> for
>>>>> the update, too.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Just one more super-small question, do we need this
>>> variant:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in
>>>>> org.apache.kafka.common.serialization.Serdes class (infers list
>>>>> implementation and inner serde from config file)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It seems like this situation implies my config file is
>>>>> already set up for the list serde, so passing this serde (e.g., in
>>>>> Produced) would have the same effect as not specifying it.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I guess that it could be the case that you have the
>>>>> `default.key/value.serde` set to something else, like StringSerde, but
>>> you
>>>>> still have the `default.key/value.list.serde.impl/element` set. This
>>> seems
>>>>> like it would result in more confusion than convenience, so my gut
>>> instinct
>>>>> is maybe we shouldn't introduce the `ListSerde()` variant until people
>>>>> actually request it later on.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully
>>>>> source-code-driven, not half/half.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I hope everyone had a great long weekend.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you
>>>>> correctly, but I think I already listed them:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So for Produced, you would use it in the following
>>> fashion,
>>>>> for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class,
>>>>> Serdes.Integer()))
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization
>>>>> Strategy” where I describe our logic of conditional serialization based
>>> on
>>>>> the type of an inner serde.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the update, Daniyar!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In addition to specifying the config interface, can you
>>>>> also specify
>>>>>>>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance
>>>>> of this
>>>>>>>>>>>>>>>>>>> serde in to the DSL directly, as in Produced,
>>> Materialized,
>>>>> etc., what
>>>>>>>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the
>>>>> Serializer
>>>>>>>>>>>>>>>>>>> and Deserializer. I don't think you need to specify the
>>>>> implementation
>>>>>>>>>>>>>>>>>>> logic, since we've already discussed it here.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If you also want to specify the serialized format of the
>>>>> data records
>>>>>>>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as
>>>>> letting us
>>>>>>>>>>>>>>>>>>> verify the schema for forward/backward compatibility
>>>>> concerns, etc.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Finally made updates to the KIP:
>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Sorry for the delay :)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank You!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, something like this. I did not think about good
>>>>> configuration
>>>>>>>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand
>>> all
>>>>> proposed
>>>>>>>>>>>>>>>>>>> configs atm. But all configs should be listed and
>>> explained
>>>>> in the KIP
>>>>>>>>>>>>>>>>>>> anyway, and we can discuss further after you have updated
>>>>> the KIP (I can
>>>>>>>>>>>>>>>>>>> ask more detailed question if I have any).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to
>>>>> have in a list
>>>>>>>>>>>>>>>>>>> of primitives.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As for the default constructor and configurability, just
>>>>> want to make
>>>>>>>>>>>>>>>>>>> sure. Is this what you have on your mind?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the update!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>>>>>>>>>>>>>>>> should have a default constructor and it should be possible to pass in
>>>>>>>>>>>>>>>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>>>>>>>>>>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>>>>>>>>>>>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>>>>>>>>>>>>>>>>>> single `byte` (note, that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in
>>>>> the KIP
>>>>>>>>>>>>>>>>>>> discussion.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Can you also update the KIP?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I made and pushed the necessary commits, so we can review the final
>>>>>>>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I also need some advice on writing tests for this new
>>>>> serde. So far I
>>>>>>>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload),
>>> I’m
>>>>> not sure
>>>>>>>>>>>>>>>>>>> if it is enough.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>>> Hey John and Matthias,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>>>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>>>>>>>>>>>>>>> realized that the type is not really needed in ListSerializer, but
>>>>>>>>>>>>>>>>>>> only in ListDeserializer:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if
>>>>> serializer is
>>>>>>>>>>>>>>>>>>> not a primitive serializer:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Then, in deserializer, we persist passed list type, so
>>> that
>>>>> during
>>>>>>>>>>>>>>>>>>> deserialization we could create an instance of it with
>>>>> predefined
>>>>>>>>>>>>>>>>>>> listSize for better performance.
>>>>>>>>>>>>>>>>>>> We also try to locate a primitiveSize based on passed
>>>>> deserializer.
>>>>>>>>>>>>>>>>>>> If it is not there, then primitiveSize will be null. Which
>>>>> means
>>>>>>>>>>>>>>>>>>> that each entry’s size was encoded individually.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This looks much cleaner and more concise.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
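
[Illustration] The conditional size encoding described in the message above can be sketched as follows. The thread does not spell out the exact byte layout, so the layout here is an assumption: a 4-byte element count, then either raw fixed-width elements (Integer as the example of a primitive) or a 4-byte size before each element (String as the example of a variable-size type). `ListEncodingSketch` is a hypothetical helper, and it always returns an `ArrayList` rather than a configurable list type.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ListEncodingSketch {

    // Fixed-width elements: [count][e0][e1]... with no per-entry size prefix.
    static byte[] encodeInts(List<Integer> values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.size() * Integer.BYTES);
        buf.putInt(values.size());
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    static List<Integer> decodeInts(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        List<Integer> out = new ArrayList<>(count); // pre-sized from the encoded count
        for (int i = 0; i < count; i++) {
            out.add(buf.getInt());
        }
        return out;
    }

    // Variable-width elements: [count]([size][bytes])... so each entry carries its own size.
    static byte[] encodeStrings(List<String> values) {
        List<byte[]> encoded = new ArrayList<>(values.size());
        int total = Integer.BYTES;
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            total += Integer.BYTES + bytes.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(values.size());
        for (byte[] bytes : encoded) {
            buf.putInt(bytes.length);
            buf.put(bytes);
        }
        return buf.array();
    }

    static List<String> decodeStrings(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[buf.getInt()];
            buf.get(bytes);
            out.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

With this layout, three integers take 4 + 3 * 4 bytes, while each string pays an extra 4 bytes for its size; that overhead is what the primitive special case avoids.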
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For encoding the list-type: I see John's point about
>>>>> re-encoding the
>>>>>>>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea
>>>>> that the
>>>>>>>>>>>>>>>>>>> Deserializer returns a fixed type...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>>>>>>>>>>>>>>> deserialization via config?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to
>>>>> encode the
>>>>>>>>>>>>>>>>>>> type size, but users could specify the type on the
>>>>> deserializer (via a
>>>>>>>>>>>>>>>>>>> config again)?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence,
>>> I
>>>>> doubt
>>>>>>>>>>>>>>>>>>> we can
>>>>>>>>>>>>>>>>>>> support this and a cast will be necessary at some point in
>>>>> the user
>>>>>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for looking at it!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Something like your screenshot is more along the lines of
>>>>> what I was
>>>>>>>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how
>>>>> would that not
>>>>>>>>>>>>>>>>>>> be "vanilla java"?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Unfortunately the deserializer needs more information,
>>>>> though. For
>>>>>>>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>?
>>>>> The serde
>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd
>>> still
>>>>> need an
>>>>>>>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T>
>>> innerSerde).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> =
>>>>> Serdes.listSerde(
>>>>>>>>>>>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> And in configuration, it's something like:
>>>>>>>>>>>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I read about TypeReference. It could work for the list serde.
>>>>>>>>>>>>>>>>>>> However, it is not directly supported:
>>>>>>>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>>>>>>>>>>>> The only way is to pass an actual class object into the
>>>>> constructor,
>>>>>>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you
>>>>> think of my
>>>>>>>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As
>>>>> described
>>>>>>>>>>>>>>>>>>> previously)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit
>>>>> over
>>>>>>>>>>>>>>>>>>> engineered :)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I also wanted to see feedback from Matthias, since he gave
>>>>>>>>>>>>>>>>>>> me an idea about storing fixed/variable size entries.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> That's a very clever solution!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> One observation is that, now, this is what we might call a
>>>>>>>>>>>>>>>>>>> polymorphic
>>>>>>>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type
>>>>> and then
>>>>>>>>>>>>>>>>>>> promising to produce the exact same concrete type on read.
>>>>>>>>>>>>>>>>>>> There are
>>>>>>>>>>>>>>>>>>> some inherent problems with this approach, which in
>>> general
>>>>>>>>>>>>>>>>>>> require
>>>>>>>>>>>>>>>>>>> some kind of schema registry (not necessarily Schema
>>>>>>>>>>>>>>>>>>> Registry, just
>>>>>>>>>>>>>>>>>>> any registry for schemas) to solve.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Notice that every serialized record has quite a bit of
>>>>> duplicated
>>>>>>>>>>>>>>>>>>> information: the concrete type as well as a byte to
>>> indicate
>>>>>>>>>>>>>>>>>>> whether
>>>>>>>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to
>>>>>>>>>>>>>>>>>>> indicate the
>>>>>>>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because
>>>>> they
>>>>>>>>>>>>>>>>>>> tell us
>>>>>>>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately,
>>>>> this
>>>>>>>>>>>>>>>>>>> information is completely redundant. In all likelihood,
>>> the
>>>>>>>>>>>>>>>>>>> information will be exactly the same for every record in
>>> the
>>>>>>>>>>>>>>>>>>> topic.
>>>>>>>>>>>>>>>>>>> This problem is essentially the core motivation for
>>>>> serializations
>>>>>>>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization
>>>>>>>>>>>>>>>>>>> itself, so
>>>>>>>>>>>>>>>>>>> that the records won't contain so much redundant
>>>>> information.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back
>>> to
>>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>> like what you had earlier in which you don't support
>>>>> perfectly
>>>>>>>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead
>>>>> just
>>>>>>>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could
>>> defer
>>>>> full,
>>>>>>>>>>>>>>>>>>> perfect, type preservation to serdes that have an external
>>>>>>>>>>>>>>>>>>> system in
>>>>>>>>>>>>>>>>>>> which to register their type information.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> There does exist an alternative, if we really do want to preserve the
>>>>>>>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>>>>>>>>>>>>>>> configuration option specifically for the serde to configure what the
>>>>>>>>>>>>>>>>>>> list type will be, and maybe what the element type is, as well.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As far as "related work" goes, you might be interested to take a look
>>>>>>>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a specific,
>>>>>>>>>>>>>>>>>>> arbitrarily nested, generically parameterized class structure.
>>>>>>>>>>>>>>>>>>> Specifically, you might find
>>>>>>>>>>>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>>>>>>>>>>>> interesting.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> bump


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
KIP-466 is updated and a new commit is pushed.

Thank you guys!

> On Jul 24, 2019, at 10:53 AM, John Roesler <jo...@confluent.io> wrote:
> 
> Ah, thanks for setting me straight, Matthias.
> 
> Given the choice between defining the Serde in the streams module (hence it
> would not be in the Serdes "menu" class) or defining the configuration
> property in CommonClientConfigs, I think I'm leaning toward the latter.
> 
> Really good catch on the ProducerConfig; otherwise, I think we should go
> ahead and add the serializer/deserializer configs as discussed to
> ProducerConfig and ConsumerConfig. It's just cleaner and more uniform that
> way.
> 
> Thanks again,
> -John
> 
> On Tue, Jul 23, 2019 at 8:08 PM Matthias J. Sax <ma...@confluent.io> wrote:
> 
>>>> Just to make sure I understand the problem you're highlighting:
>>>> I guess the difference is that the serializer and deserializer that are
>>>> nested inside the serde also need to be configured? So, by default I'd have
>>>> to specify all six configs when I'm using Streams?
>> 
>> That is not the problem. And you actually describe the solution for it
>> yourself:
>> 
>>>> I guess in the Serde, it could make use of a package-protected constructor
>>>> for the serializer and deserializer that fixes the list type and inner type
>>>> to the serde-configured ones. Then, when you're configuring Streams, you
>>>> only need to specify the StreamsConfigs.
>> 
>> 
>> 
>> 
>> The problem is that `ListSerde` is in the `clients` module and thus
>> `ListSerde` cannot access `StreamsConfig`, and hence cannot use
>> `StreamsConfig#DEFAULT_LIST_KEY_SERDE_TYPE` (and others). Therefore, we
>> either need to hard-code string literals for the config names (which does
>> not sound right) or add `CommonClientConfigs#DEFAULT_LIST_KEY_SERDE_TYPE`
>> (and others).
>> 
>> In StreamsConfig we would just redefine them for convenience:
>> 
>>> public static final String DEFAULT_LIST_KEY_SERDE_TYPE =
>>>     CommonClientConfigs.DEFAULT_LIST_KEY_SERDE_TYPE;
>> 
>> 
>> Note that `TimeWindowedSerde` is contained in the `streams` module and thus
>> it can access `StreamsConfig` and
>> `StreamsConfig#DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS`.
>> 
>> 
>> 
>> 
>> Btw: I just realized that we actually don't need `ProducerConfig`
>> 
>>> list.key/value.serializer.type
>> 
>> because the list-type is irrelevant on write. We only need `inner` config.
>> 
>> 
>> 
>> -Matthias
>> 
>> 
>> On 7/23/19 1:30 PM, John Roesler wrote:
>>> Hmm, that's a tricky situation.
>>> 
>>> I think Daniyar was on the right track... Producer only cares about
>>> serializer configs, and Consumer only cares about deserializer configs.
>>> 
>>> I didn't see the problem with your proposal:
>>> 
>>>> ProducerConfig:
>>>> list.key/value.serializer.type
>>>> list.key/value.serializer.inner
>>>> ConsumerConfig:
>>>> list.key/value.deserializer.type
>>>> list.key/value.deserializer.inner
>>>> StreamsConfig:
>>>> default.list.key/value.serde.type
>>>> default.list.key/value.serde.inner
>>> 
>>> 
>>> It seems like the key/value serde configs are a better analogy than the
>>> windowed serde.
>>> ProducerConfig: key.serializer
>>> ConsumerConfig: key.deserializer
>>> StreamsConfig: default.key.serde
>>> 
>>> Just to make sure I understand the problem you're highlighting:
>>> I guess the difference is that the serializer and deserializer that are
>>> nested inside the serde also need to be configured? So, by default I'd have
>>> to specify all six configs when I'm using Streams?
>>> 
>>> I guess in the Serde, it could make use of a package-protected constructor
>>> for the serializer and deserializer that fixes the list type and inner type
>>> to the serde-configured ones. Then, when you're configuring Streams, you
>>> only need to specify the StreamsConfigs.
>>> 
>>> Does that work?
>>> -John
>>> 
>>> 
>>> On Tue, Jul 23, 2019 at 11:39 AM Development <de...@yeralin.net> wrote:
>>> 
>>>> Bump
>>>> 
>>>>> On Jul 22, 2019, at 11:26 AM, Development <de...@yeralin.net> wrote:
>>>>> 
>>>>> Hey Matthias,
>>>>> 
>>>>> It looks a little confusing, but I don’t have enough expertise to judge
>>>>> the configuration placement.
>>>>> 
>>>>> If you think it is fine, I’ll go ahead with this approach.
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>>> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>>>>>> 
>>>>>> Good point.
>>>>>> 
>>>>>> I guess the simplest solution is to actually add
>>>>>> 
>>>>>>>> default.list.key/value.serde.type
>>>>>>>> default.list.key/value.serde.inner
>>>>>> 
>>>>>> to both `CommonClientConfigs` and `StreamsConfig`.
>>>>>> 
>>>>>> It's not super clean, but I think it's the best we can do. Thoughts?
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> On 7/19/19 1:23 PM, Development wrote:
>>>>>>> Hi Matthias,
>>>>>>> 
>>>>>>> I agree, ConsumerConfig did not seem like the right place for these
>>>>>>> configurations.
>>>>>>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.
>>>>>>> 
>>>>>>> However, I have a question. What should I do in the
>>>>>>> "configure(Map<String, ?> configs, boolean isKey)" methods? Which
>>>>>>> configurations should I try to locate? I was comparing my (de)serializer
>>>>>>> implementations with the SessionWindowed(De)serializer classes, and they
>>>>>>> use the StreamsConfig class to get either
>>>>>>> StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS or
>>>>>>> StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS.
>>>>>>> 
>>>>>>> In my case, as I mentioned earlier, the StreamsConfig class is not
>>>>>>> accessible from the org.apache.kafka.common.serialization package, so I
>>>>>>> can’t use it. Any suggestions here?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>> 
>>>>>>>> Thanks!
>>>>>>>> 
>>>>>>>> One minor question about the configs. The KIP adds three classes, a
>>>>>>>> Serializer, a Deserializer, and a Serde.
>>>>>>>> 
>>>>>>>> Hence, would it make sense to add the corresponding configs to
>>>>>>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly
>>>>>>>> different names each time?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Something like this:
>>>>>>>> 
>>>>>>>> ProducerConfig:
>>>>>>>> 
>>>>>>>> list.key/value.serializer.type
>>>>>>>> list.key/value.serializer.inner
>>>>>>>> 
>>>>>>>> ConsumerConfig:
>>>>>>>> 
>>>>>>>> list.key/value.deserializer.type
>>>>>>>> list.key/value.deserializer.inner
>>>>>>>> 
>>>>>>>> StreamsConfig:
>>>>>>>> 
>>>>>>>> default.list.key/value.serde.type
>>>>>>>> default.list.key/value.serde.inner
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound right
>>>>>>>> to me. Also note that it seems better to avoid the `default.` prefix
>>>>>>>> for consumers and producers because there is only one Serializer or
>>>>>>>> Deserializer anyway. Only for Streams, there are multiple and
>>>>>>>> StreamsConfig specifies the default one if an operator does not
>>>>>>>> overwrite it.
>>>>>>>> 
>>>>>>>> Thoughts?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Also, the KIP should explicitly mention to which classes certain configs
>>>>>>>> are added. Atm, the KIP only lists parameter names, but does not state
>>>>>>>> where those are added.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -Matthias
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 7/16/19 1:11 PM, Development wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> Yes, totally forgot about the statement. KIP-466 is updated.
>>>>>>>>> 
>>>>>>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman
>>>>>>>>> for your valuable input!
>>>>>>>>> 
>>>>>>>>> I hope I did not cause too much trouble :)
>>>>>>>>> 
>>>>>>>>> I’ll start the vote now.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Daniyar,
>>>>>>>>>> 
>>>>>>>>>> Thanks for that update. I took a look, and I think this is in good shape.
>>>>>>>>>> 
>>>>>>>>>> One note, the statement "New method public static <T> Serde<List<T>>
>>>>>>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
>>>>>>>>>> (infers list implementation and inner serde from config file)" is
>>>>>>>>>> still present in the KIP, although I do think it was removed from the PR.
>>>>>>>>>> 
>>>>>>>>>> Once you remove that statement from the KIP, then I think this KIP is
>>>>>>>>>> ready to go up for a vote! Then, we can really review the PR in
>>>>>>>>>> earnest and get this thing merged.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> -john
>>>>>>>>>> 
>>>>>>>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
>>>>>>>>>>> 
>>>>>>>>>>> Feel free to put any comments in there.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi John,
>>>>>>>>>>>> 
>>>>>>>>>>>> I knew I was missing something. Yes, that makes sense now. I removed all
>>>>>>>>>>>> `listSerde()` methods, and left empty constructors instead.
>>>>>>>>>>>> 
>>>>>>>>>>>> As for `CommonClientConfigs`, I looked at the class. It doesn’t have any
>>>>>>>>>>>> properties related to serdes, and that bothers me a little.
>>>>>>>>>>>> 
>>>>>>>>>>>> All properties like `default.key.serde` and `default.windowed.key.serde.*`
>>>>>>>>>>>> are located in StreamsConfig. I don’t want to create confusion.
>>>>>>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and its
>>>>>>>>>>>> (de)serializers are not located in org.apache.kafka.common.serialization.
>>>>>>>>>>>> I guess it kind of makes sense since windowed serdes are only available
>>>>>>>>>>>> for Kafka Streams, not vice versa.
>>>>>>>>>>>> 
>>>>>>>>>>>> If everyone is okay with putting the list properties in the
>>>>>>>>>>>> `CommonClientConfigs` class, I’ll go ahead and do that then.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thank you for your input!
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regarding the placement, you might as well move the constants to
>>>>>>>>>>>>> `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and
>>>>>>>>>>>>> the configs and the code are in the same module.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regarding the constructor... What Matthias said is correct: The serde,
>>>>>>>>>>>>> serializer, and deserializer all need to have zero-arg constructors so
>>>>>>>>>>>>> they can be instantiated reflectively by Kafka. However, the factory
>>>>>>>>>>>>> method you proposed "New method public static <T> Serde<List<T>>
>>>>>>>>>>>>> ListSerde()" is not a constructor, and is not required. It would be used
>>>>>>>>>>>>> purely from the Java interface, but has the drawbacks I listed above. This
>>>>>>>>>>>>> method, not the constructor, is what I proposed to remove from the KIP.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>> One problem though.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m
>>>>>>>>>>>>> trying to mimic the implementation of my ListSerde accordingly.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I created a couple of constants under StreamsConfig:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And I am trying to use a similar construct:
>>>>>>>>>>>>> final String propertyName = isKey ?
>>>>>>>>>>>>>     StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS :
>>>>>>>>>>>>>     StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>>>>>>>>>>>>> But then I found out that StreamsConfig is not accessible from the
>>>>>>>>>>>>> org.apache.kafka.common.serialization package, while the windowed serde
>>>>>>>>>>>>> (de)serializers are located under the org.apache.kafka.streams.kstream
>>>>>>>>>>>>> package.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What should I do? Should I move my classes under the
>>>>>>>>>>>>> org.apache.kafka.streams.kstream package instead?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thank you for your input.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I updated the KIP, made it a little more readable.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think the configuration parameters strategy is finalized then.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Do you have any other questions/concerns regarding this KIP?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit
>>>>>>>>>>>>>> them under my PR.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Daniyar,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> thanks for the update to the KIP. It's in really good shape and well
>>>>>>>>>>>>>>> written.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> About the default constructor question:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default constructor to
>>>>>>>>>>>>>>> create them easily via reflection when specified in a config. I
>>>>>>>>>>>>>>> understand that it is not super user friendly, but all existing code
>>>>>>>>>>>>>>> works this way. Hence, it seems best to stick with the established pattern.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
>>>>>>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve user
>>>>>>>>>>>>>>> experience that addresses the exact issue John raised (cf.
>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/7067).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Note that if a user would instantiate the Serde manually, the user
>>>>>>>>>>>>>>> would also need to call `configure()` to set up the inner serdes. Kafka
>>>>>>>>>>>>>>> Streams would not set up those automatically and one would most likely
>>>>>>>>>>>>>>> end up with an NPE.
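The lifecycle described above (a zero-arg constructor for reflective creation, followed by a mandatory `configure()` call) can be sketched roughly as follows. This is a simplified illustration, not the Kafka implementation: `ConfigurableListFactory` and `createFromConfig` are hypothetical names, and the config key is taken from the naming proposed in this thread.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a configurable list (de)serializer.
// It is not the Kafka ListDeserializer; it only shows the create/configure lifecycle.
final class ConfigurableListFactory {
    // Config name following the naming proposed in this thread (assumption).
    static final String LIST_TYPE_CONFIG = "default.list.key.serde.type";

    private Class<?> listClass; // remains null until configure() runs

    // Zero-arg constructor so the object can be created reflectively from a class name.
    ConfigurableListFactory() { }

    void configure(Map<String, ?> configs) {
        Object value = configs.get(LIST_TYPE_CONFIG);
        if (value == null) {
            throw new IllegalArgumentException("Missing config: " + LIST_TYPE_CONFIG);
        }
        try {
            Class<?> candidate = Class.forName(value.toString());
            if (!List.class.isAssignableFrom(candidate)) {
                throw new IllegalArgumentException(candidate + " is not a java.util.List");
            }
            listClass = candidate;
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown list type: " + value, e);
        }
    }

    @SuppressWarnings("unchecked")
    <T> List<T> newList() {
        if (listClass == null) {
            // Fail with a clear message instead of a bare NullPointerException.
            throw new IllegalStateException("configure() must be called first");
        }
        try {
            return (List<T>) listClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + listClass, e);
        }
    }

    // Mimics what a framework does: create from a class name, then configure.
    static ConfigurableListFactory createFromConfig(String className, Map<String, ?> configs) {
        try {
            ConfigurableListFactory factory = (ConfigurableListFactory)
                Class.forName(className).getDeclaredConstructor().newInstance();
            factory.configure(configs);
            return factory;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

An instance created directly with `new ConfigurableListFactory()` and never configured fails on first use, which mirrors the manual-instantiation pitfall mentioned above.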
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Coming back to the KIP, and the parameter names. `WindowedSerdes` are
>>>>>>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>>>>>>>>>>>>>> we use the following parameter names:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - default.windowed.key.serde.inner
>>>>>>>>>>>>>>> - default.windowed.value.serde.inner
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> It might be good to align the naming pattern. I would also suggest
>>>>>>>>>>>>>>> using `type` instead of `impl`?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> default.key.list.serde.impl       ->  default.list.key.serde.type
>>>>>>>>>>>>>>> default.value.list.serde.impl     ->  default.list.value.serde.type
>>>>>>>>>>>>>>> default.key.list.serde.element    ->  default.list.key.serde.inner
>>>>>>>>>>>>>>> default.value.list.serde.element  ->  default.list.value.serde.inner
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is that it
>>>>>>>>>>>>>>>> goes against what Matthias suggested earlier:
>>>>>>>>>>>>>>>> "I think that ... `ListSerde` should have a default constructor and it
>>>>>>>>>>>>>>>> should be possible to pass in the `Class listClass` information via a
>>>>>>>>>>>>>>>> configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> What do you think about that? I hope I’m not confusing anything.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the
>>>>>>>>>>>>>>>>> update, too.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Just one more super-small question, do we need this variant:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in
>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes class (infers list
>>>>>>>>>>>>>>>>>> implementation and inner serde from config file)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> It seems like this situation implies my config file is already set up
>>>>>>>>>>>>>>>>> for the list serde, so passing this serde (e.g., in Produced) would have
>>>>>>>>>>>>>>>>> the same effect as not specifying it.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I guess that it could be the case that you have the
>>>>>>>>>>>>>>>>> `default.key/value.serde` set to something else, like StringSerde, but you
>>>>>>>>>>>>>>>>> still have the `default.key/value.list.serde.impl/element` set. This seems
>>>>>>>>>>>>>>>>> like it would result in more confusion than convenience, so my gut instinct
>>>>>>>>>>>>>>>>> is maybe we shouldn't introduce the `ListSerde()` variant until people
>>>>>>>>>>>>>>>>> actually request it later on.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully
>>>>>>>>>>>>>>>>> source-code-driven, not half/half.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I hope everyone had a great long weekend.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you correctly, but I
>>>>>>>>>>>>>>>>>> think I already listed them:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So for Produced, you would use it in the following fashion, for example:
>>>>>>>>>>>>>>>>>> Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization Strategy”
>>>>>>>>>>>>>>>>>> where I describe our logic of conditional serialization based on the
>>>>>>>>>>>>>>>>>> type of an inner serde.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks for the update, Daniyar!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> In addition to specifying the config interface, can you also specify
>>>>>>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance of this
>>>>>>>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>>>>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the Serializer
>>>>>>>>>>>>>>>>>> and Deserializer. I don't think you need to specify the implementation
>>>>>>>>>>>>>>>>>> logic, since we've already discussed it here.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> If you also want to specify the serialized format of the data records
>>>>>>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as letting us
>>>>>>>>>>>>>>>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Finally made updates to the KIP:
>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>>>>>>> Sorry for the delay :)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thank You!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Yes, something like this. I did not think about good configuration
>>>>>>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all proposed
>>>>>>>>>>>>>>>>>> configs atm. But all configs should be listed and explained in the KIP
>>>>>>>>>>>>>>>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>>>>>>>>>>>>>>>> ask more detailed questions if I have any).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Yes, you are right. BytesSerializer is not what I need to have in a list
>>>>>>>>>>>>>>>>>> of primitives.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> As for the default constructor and configurability, I just want to make
>>>>>>>>>>>>>>>>>> sure. Is this what you have in mind?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks for the update!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>>>>>>>>>>>>>>> should have a default constructor and it should be possible to pass in
>>>>>>>>>>>>>>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>>>>>>>>>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>>>>>>>>>>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>>>>>>>>>>>>>>>>> single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>>>>>>>>>>>>>>> discussion.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Can you also update the KIP?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I made and pushed necessary commits, so we could review the final
>>>>>>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>>>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>>>>>>>>>>>>>>> if it is enough.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>> Hey John and Matthias,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>>>>>>>>>>>>>> realized that the type is not really needed in ListSerializer, but
>>>>>>>>>>>>>>>>>> only in ListDeserializer:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if the serializer is
>>>>>>>>>>>>>>>>>> not a primitive serializer:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Then, in the deserializer, we persist the passed list type, so that
>>>>>>>>>>>>>>>>>> during deserialization we could create an instance of it with a
>>>>>>>>>>>>>>>>>> predefined listSize for better performance.
>>>>>>>>>>>>>>>>>> We also try to locate a primitiveSize based on the passed deserializer.
>>>>>>>>>>>>>>>>>> If it is not there, then primitiveSize will be null, which means
>>>>>>>>>>>>>>>>>> that each entry’s size was encoded individually.
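The conditional size encoding described above can be sketched roughly as follows. This is only an illustration under assumptions that the thread does not confirm: a 4-byte entry count at the start of the payload, a 4-byte length prefix per entry in the variable-size case, and entries that were already turned into `byte[]` by an inner serializer. `ListWireFormatSketch` is a hypothetical name, and the actual format in the KIP and PR may differ.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a list payload layout; not the format used by Kafka.
final class ListWireFormatSketch {

    // fixedSize == null means entries vary in length, so each one gets a length prefix.
    static byte[] serialize(List<byte[]> entries, Integer fixedSize) {
        int total = Integer.BYTES; // leading entry count
        for (byte[] entry : entries) {
            if (fixedSize != null && entry.length != fixedSize) {
                throw new IllegalArgumentException("Entry does not match the fixed size");
            }
            total += (fixedSize == null ? Integer.BYTES : 0) + entry.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        buffer.putInt(entries.size());
        for (byte[] entry : entries) {
            if (fixedSize == null) {
                buffer.putInt(entry.length); // size stored only for non-primitive entries
            }
            buffer.put(entry);
        }
        return buffer.array();
    }

    // The caller supplies the same fixedSize (or null) that was used when writing.
    static List<byte[]> deserialize(byte[] data, Integer fixedSize) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int count = buffer.getInt();
        List<byte[]> entries = new ArrayList<>(count); // presized from the stored count
        for (int i = 0; i < count; i++) {
            int size = fixedSize != null ? fixedSize : buffer.getInt();
            byte[] entry = new byte[size];
            buffer.get(entry);
            entries.add(entry);
        }
        return entries;
    }

    private ListWireFormatSketch() { }
}
```

In this sketch the fixed-size path saves one 4-byte length prefix per entry, which is the kind of redundancy the discussion above is trying to avoid.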
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> This looks much cleaner and more concise.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>>>>>>>>>>>>>> Deserializer returns a fixed type...
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>>>>>>>>>>>>>> deserialization via config?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>>>>>>>>>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>>>>>>>>>>>>>> config again)?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>>>>>>>>>>>>>>>>>> support this and a cast will be necessary at some point in the user
>>>>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks for looking at it!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>>>>>>>>>>>>>> be "vanilla java"?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>? The serde could
>>>>>>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>>>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>>>>>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> And in configuration, it's something like:
>>>>>>>>>>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>>>>>>>>>>>>> 
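A minimal sketch of the shape John describes above: the caller supplies the list type and an inner (de)serializer, and the deserializer instantiates that list type reflectively. The class name ListCodecSketch, the use of java.util.function.Function in place of Kafka's Serializer/Deserializer interfaces, and the wire format (element count, then a length prefix per element) are all assumptions for illustration, not the format the KIP settled on.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Kafka.
final class ListCodecSketch {
    private ListCodecSketch() { }

    // Assumed wire format: [int count], then per element [int length][bytes].
    // Note that the concrete list type is not written anywhere.
    static <T> byte[] serialize(List<T> data, Function<T, byte[]> innerSerializer) {
        List<byte[]> encoded = new ArrayList<>(data.size());
        int total = Integer.BYTES;
        for (T element : data) {
            byte[] bytes = innerSerializer.apply(element);
            encoded.add(bytes);
            total += Integer.BYTES + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        buffer.putInt(data.size());
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        return buffer.array();
    }

    // The target list type is chosen by the reader, e.g. LinkedList.class.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static <T> List<T> deserialize(byte[] data,
                                   Class<? extends List> listClass,
                                   Function<byte[], T> innerDeserializer) {
        List<T> result;
        try {
            // Requires the list class to have an accessible zero-arg constructor.
            result = (List<T>) listClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + listClass, e);
        }
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int count = buffer.getInt();
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            result.add(innerDeserializer.apply(bytes));
        }
        return result;
    }
}
```

Because the list type lives with the reader rather than in the bytes, the same payload can be read back as an ArrayList by one consumer and as a LinkedList by another.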
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I read up on TypeReference. It could work for the list serde.
>>>>>>>>>>>>>>>>>> However, it is not directly supported:
>>>>>>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>>>>>>>>>>> The only way is to pass an actual class object into the constructor,
>>>>>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you
>>>> think of my
>>>>>>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As
>>>> described
>>>>>>>>>>>>>>>>>> previously)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit
>>>> over
>>>>>>>>>>>>>>>>>> engineered :)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I also wanted to see feedback from Matthias, since he gave
>>>>>>>>>>>>>>>>>> me the idea about storing fixed/variable size entries.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> That's a very clever solution!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> One observation is that, now, this is what we might call a
>>>>>>>>>>>>>>>>>> polymorphic
>>>>>>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type
>>>> and then
>>>>>>>>>>>>>>>>>> promising to produce the exact same concrete type on read.
>>>>>>>>>>>>>>>>>> There are
>>>>>>>>>>>>>>>>>> some inherent problems with this approach, which in
>> general
>>>>>>>>>>>>>>>>>> require
>>>>>>>>>>>>>>>>>> some kind of  schema registry (not necessarily Schema
>>>>>>>>>>>>>>>>>> Registry, just
>>>>>>>>>>>>>>>>>> any registry for schemas) to solve.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Notice that every serialized record has quite a bit of
>>>> duplicated
>>>>>>>>>>>>>>>>>> information: the concrete type as well as a byte to
>> indicate
>>>>>>>>>>>>>>>>>> whether
>>>>>>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to
>>>>>>>>>>>>>>>>>> indicate the
>>>>>>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because
>>>> they
>>>>>>>>>>>>>>>>>> tell us
>>>>>>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately,
>>>> this
>>>>>>>>>>>>>>>>>> information is completely redundant. In all likelihood,
>> the
>>>>>>>>>>>>>>>>>> information will be exactly the same for every record in
>> the
>>>>>>>>>>>>>>>>>> topic.
>>>>>>>>>>>>>>>>>> This problem is essentially the core motivation for
>>>> serializations
>>>>>>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization
>>>>>>>>>>>>>>>>>> itself, so
>>>>>>>>>>>>>>>>>> that the records won't contain so much redundant
>>>> information.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back
>> to
>>>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>> like what you had earlier in which you don't support
>>>> perfectly
>>>>>>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead
>>>> just
>>>>>>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could
>> defer
>>>> full,
>>>>>>>>>>>>>>>>>> perfect, type preservation to serdes that have an external
>>>>>>>>>>>>>>>>>> system in
>>>>>>>>>>>>>>>>>> which to register their type information.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> There does exist an alternative, if we really do want to
>>>>>>>>>>>>>>>>>> preserve the
>>>>>>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add
>> a
>>>>>>>>>>>>>>>>>> configuration option specifically for the serde to
>> configure
>>>>>>>>>>>>>>>>>> what the
>>>>>>>>>>>>>>>>>> list type will be, and maybe what the element type is, as
>>>> well.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> As far as "related work" goes, you might be interested to
>>>> take
>>>>>>>>>>>>>>>>>> a look
>>>>>>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a
>>>> specific,
>>>>>>>>>>>>>>>>>> arbitrarily nested, generically parameterized class
>>>> structure.
>>>>>>>>>>>>>>>>>> Specifically, you might find
>>>>>>>>>>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>>>>>>>>>>> interesting.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> bump
>>>>>>>>>>>>>>>>>> 



Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Ah, thanks for setting me straight, Matthias.

Given the choice between defining the Serde in the streams module (hence it
would not be in the Serdes "menu" class) or defining the configuration
property in CommonClientConfigs, I think I'm leaning toward the latter.

Really good catch on the ProducerConfig; otherwise, I think we should go
ahead and add the serializer/deserializer configs as discussed to
ProducerConfig and ConsumerConfig. It's just cleaner and more uniform that
way.
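
A minimal sketch of the layering being discussed, assuming the constant names from the thread (DEFAULT_LIST_KEY_SERDE_TYPE and friends) and the config strings proposed earlier. The classes below are hypothetical stand-ins for CommonClientConfigs, StreamsConfig, and the lookup a ListSerde in the clients module might do in configure(); none of this is a released Kafka API.

```java
import java.util.Map;

// Hypothetical stand-in for org.apache.kafka.clients.CommonClientConfigs.
// The constants live here so that code in the clients module can see them.
final class CommonClientConfigsSketch {
    static final String DEFAULT_LIST_KEY_SERDE_TYPE = "default.list.key.serde.type";
    static final String DEFAULT_LIST_VALUE_SERDE_TYPE = "default.list.value.serde.type";
    static final String DEFAULT_LIST_KEY_SERDE_INNER_CLASS = "default.list.key.serde.inner";
    static final String DEFAULT_LIST_VALUE_SERDE_INNER_CLASS = "default.list.value.serde.inner";

    private CommonClientConfigsSketch() { }
}

// Hypothetical stand-in for StreamsConfig: it only re-exports the names
// for convenience, so Streams users keep a single place to look.
final class StreamsConfigSketch {
    static final String DEFAULT_LIST_KEY_SERDE_TYPE =
        CommonClientConfigsSketch.DEFAULT_LIST_KEY_SERDE_TYPE;
    static final String DEFAULT_LIST_VALUE_SERDE_TYPE =
        CommonClientConfigsSketch.DEFAULT_LIST_VALUE_SERDE_TYPE;

    private StreamsConfigSketch() { }
}

// What a serde in the clients module could do inside
// configure(Map<String, ?> configs, boolean isKey), without touching StreamsConfig.
final class ListSerdeConfigLookup {
    private ListSerdeConfigLookup() { }

    static String listTypeProperty(boolean isKey) {
        return isKey
            ? CommonClientConfigsSketch.DEFAULT_LIST_KEY_SERDE_TYPE
            : CommonClientConfigsSketch.DEFAULT_LIST_VALUE_SERDE_TYPE;
    }

    static String innerProperty(boolean isKey) {
        return isKey
            ? CommonClientConfigsSketch.DEFAULT_LIST_KEY_SERDE_INNER_CLASS
            : CommonClientConfigsSketch.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
    }

    static Object lookupListType(Map<String, ?> configs, boolean isKey) {
        return configs.get(listTypeProperty(isKey));
    }
}
```

The point of the indirection is only about module visibility: the strings are defined once, and the streams-side constants alias them rather than repeat the literals.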

Thanks again,
-John

On Tue, Jul 23, 2019 at 8:08 PM Matthias J. Sax <ma...@confluent.io>
wrote:

> >> Just to make sure I understand the problem you're highlighting:
> >> I guess the difference is that the serializer and deserializer that are
> >> nested inside the serde also need to be configured? So, by default I'd
> have
> >> to specify all six configs when I'm using Streams?
>
> That is not the problem. And you actually describe the solution for it
> yourself:
>
> >> I guess in the Serde, it could make use of a package-protected
> constructor
> >> for the serializer and deserializer that fixes the list type and inner
> type
> >> to the serde-configured ones. Then, when you're configuring Streams, you
> >> only need to specify the StreamsConfigs.
>
>
>
>
> The problem is that `ListSerde` is in package `clients` and thus
> `ListSerde` cannot access `StreamsConfig`, and hence cannot use
> `StreamsConfig#DEFAULT_LIST_KEY_SERDE_TYPE` (and others). Therefore, we
> either need to hard-code string literals for the config names (which does
> not sound right) or add `CommonClientConfigs#DEFAULT_LIST_KEY_SERDE_TYPE`
> (and others).
>
> In StreamsConfig we would just redefine them for convenience:
>
> > public static final String DEFAULT_LIST_KEY_SERDE_TYPE = CommonClientConfigs.DEFAULT_LIST_KEY_SERDE_TYPE;
>
>
> Note that `TimeWindowedSerde` is contained in the `streams` package and thus
> it can access `StreamsConfig` and
> `StreamsConfig#DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS`.
>
>
>
>
> Btw: I just realized that we actually don't need `ProducerConfig`
>
> > list.key/value.serializer.type
>
> because the list-type is irrelevant on write. We only need `inner` config.
>
>
>
> -Matthias
>
>
> On 7/23/19 1:30 PM, John Roesler wrote:
> > Hmm, that's a tricky situation.
> >
> > I think Daniyar was on the right track... Producer only cares about
> > serializer configs, and Consumer only cares about deserializer configs.
> >
> > I didn't see the problem with your proposal:
> >
> > ProducerConfig:
> >> list.key/value.serializer.type
> >> list.key/value.serializer.inner
> >> ConsumerConfig:
> >> list.key/value.deserializer.type
> >> list.key/value.deserializer.inner
> >> StreamsConfig:
> >> default.list.key/value.serde.type
> >> default.list.key/value.serde.inner
> >
> >
> > It seems like the key/value serde configs are a better analogy than the
> > windowed serde.
> > ProducerConfig: key.serializer
> > ConsumerConfig: key.deserializer
> > StreamsConfig: default.key.serde
> >
> > Just to make sure I understand the problem you're highlighting:
> > I guess the difference is that the serializer and deserializer that are
> > nested inside the serde also need to be configured? So, by default I'd
> have
> > to specify all six configs when I'm using Streams?
> >
> > I guess in the Serde, it could make use of a package-protected
> constructor
> > for the serializer and deserializer that fixes the list type and inner
> type
> > to the serde-configured ones. Then, when you're configuring Streams, you
> > only need to specify the StreamsConfigs.
> >
> > Does that work?
> > -John
> >
> >
> > On Tue, Jul 23, 2019 at 11:39 AM Development <de...@yeralin.net> wrote:
> >
> >> Bump
> >>
> >>> On Jul 22, 2019, at 11:26 AM, Development <de...@yeralin.net> wrote:
> >>>
> >>> Hey Matthias,
> >>>
> >>> It looks a little confusing, but I don’t have enough expertise to judge
> >>> the configuration placement.
> >>>
> >>> If you think it is fine, I’ll go ahead with this approach.
> >>>
> >>> Best,
> >>> Daniyar Yeralin
> >>>
> >>>> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <ma...@confluent.io>
> >> wrote:
> >>>>
> >>>> Good point.
> >>>>
> >>>> I guess the simplest solution is, to actually add
> >>>>
> >>>>>> default.list.key/value.serde.type
> >>>>>> default.list.key/value.serde.inner
> >>>>
> >>>> to both `CommonClientConfigs` and `StreamsConfig`.
> >>>>
> >>>> It's not super clean, but I think it's the best we can do. Thoughts?
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>> On 7/19/19 1:23 PM, Development wrote:
> >>>>> Hi Matthias,
> >>>>>
> >>>>> I agree, ConsumerConfig did not seem like the right place for these
> >>>>> configurations.
> >>>>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.
> >>>>>
> >>>>> However, I have a question. What should I do in the
> >>>>> "configure(Map<String, ?> configs, boolean isKey)" methods? Which
> >>>>> configurations should I try to locate? I was comparing my (de)serializer
> >>>>> implementations with the SessionWindowed(De)serializer classes, and they
> >>>>> use the StreamsConfig class to get either
> >>>>> StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS or
> >>>>> StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS.
> >>>>>
> >>>>> In my case, as I mentioned earlier, StreamsConfig class is not
> >> accessible from org.apache.kafka.common.serialization package. So, I
> can’t
> >> utilize it. Any suggestions here?
> >>>>>
> >>>>> Best,
> >>>>> Daniyar Yeralin
> >>>>>
> >>>>>
> >>>>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <matthias@confluent.io
> >
> >> wrote:
> >>>>>>
> >>>>>> Thanks!
> >>>>>>
> >>>>>> One minor question about the configs. The KIP adds three classes, a
> >>>>>> Serializer, a Deserializer, and a Serde.
> >>>>>>
> >>>>>> Hence, would it make sense to add the corresponding configs to
> >>>>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using
> slightly
> >>>>>> different names each time?
> >>>>>>
> >>>>>>
> >>>>>> Something like this:
> >>>>>>
> >>>>>> ProducerConfig:
> >>>>>>
> >>>>>> list.key/value.serializer.type
> >>>>>> list.key/value.serializer.inner
> >>>>>>
> >>>>>> ConsumerConfig:
> >>>>>>
> >>>>>> list.key/value.deserializer.type
> >>>>>> list.key/value.deserializer.inner
> >>>>>>
> >>>>>> StreamsConfig:
> >>>>>>
> >>>>>> default.list.key/value.serde.type
> >>>>>> default.list.key/value.serde.inner
> >>>>>>
> >>>>>>
> >>>>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound right
> >>>>>> to me. Also note that it seems better to avoid the `default.` prefix
> >>>>>> for consumers and producers because there is only one Serializer or
> >>>>>> Deserializer anyway. Only for Streams are there multiple, and
> >>>>>> StreamsConfig specifies the default one if an operator does not
> >>>>>> overwrite it.
> >>>>>>
> >>>>>> Thoughts?
> >>>>>>
> >>>>>>
> >>>>>> Also, the KIP should explicitly mention to which classes certain configs
> >>>>>> are added. Atm, the KIP only lists parameter names, but does not state
> >>>>>> where those are added.
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 7/16/19 1:11 PM, Development wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Yes, totally forgot about the statement. KIP-466 is updated.
> >>>>>>>
> >>>>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie
> Blee-Goldman
> >> for your valuable input!
> >>>>>>>
> >>>>>>> I hope I did not cause too much trouble :)
> >>>>>>>
> >>>>>>> I’ll start the vote now.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io>
> >> wrote:
> >>>>>>>>
> >>>>>>>> Hi Daniyar,
> >>>>>>>>
> >>>>>>>> Thanks for that update. I took a look, and I think this is in good
> >> shape.
> >>>>>>>>
> >>>>>>>> One note, the statement "New method public static <T> Serde<List<T>>
> >>>>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
> >>>>>>>> (infers list implementation and inner serde from config file)" is
> >>>>>>>> still present in the KIP, although I do see it was removed from the PR.
> >>>>>>>>
> >>>>>>>> Once you remove that statement from the KIP, then I think this KIP
> >> is
> >>>>>>>> ready to go up for a vote! Then, we can really review the PR in
> >>>>>>>> earnest and get this thing merged.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> -john
> >>>>>>>>
> >>>>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net>
> >> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
> >>>>>>>>>
> >>>>>>>>> Feel free to put any comments in there.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Daniyar Yeralin
> >>>>>>>>>
> >>>>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net>
> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi John,
> >>>>>>>>>>
> >>>>>>>>>> I knew I was missing something. Yes, that makes sense now, I
> >> removed all `listSerde()` methods, and left empty constructors instead.
> >>>>>>>>>>
> >>>>>>>>>> As per `CommonClientConfigs` I looked at the class, it doesn’t
> >> have any properties related to serdes, and that bothers me a little.
> >>>>>>>>>>
> >>>>>>>>>> All properties like `default.key.serde` and
> >>>>>>>>>> `default.windowed.key.serde.*` are located in StreamsConfig. I don’t
> >>>>>>>>>> want to create confusion.
> >>>>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and
> >> its (de)serializers are not located in
> >> org.apache.kafka.common.serialization. I guess it kind of makes sense
> since
> >> windowed serdes are only available for kafka streams, not vice versa.
> >>>>>>>>>>
> >>>>>>>>>> If everyone is okay to put list properties in
> >> `CommonClientConfigs` class, I’ll go ahead and do that then.
> >>>>>>>>>>
> >>>>>>>>>> Thank you for your input!
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>
> >>>>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io>
> >> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding the placement, you might as well move the constants
> to
> >> `org.apache.kafka.clients.CommonClientConfigs`, so that the constants
> and
> >> the configs and the code are in the same module.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding the constructor... What Matthias said is correct: The
> >> serde, serializer, and deserializer all need to have zero-arg
> constructors
> >> so they can be instantiated reflectively by Kafka. However, the factory
> >> method you proposed "New method public static <T> Serde<List<T>>
> >> ListSerde()" is not a constructor, and is not required. It would be used
> >> purely from the Java interface, but has the drawbacks I listed above.
> This
> >> method, not the constructor, is what I proposed to remove from the KIP.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> -John
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
> >>>>>>>>>>> One problem though.
> >>>>>>>>>>>
> >>>>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar,
> I’m
> >> trying to mimic the implementation of my ListSerde accordingly.
> >>>>>>>>>>>
> >>>>>>>>>>> I created a couple of constants under StreamsConfig:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> And I am trying to use a similar construct:
> >>>>>>>>>>> final String propertyName = isKey ?
> >> StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS :
> >> StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
> >>>>>>>>>>> But then found out that StreamsConfig is not accessible from
> >> org.apache.kafka.common.serialization package while window serde
> >> (de)serializers are located under org.apache.kafka.streams.kstream
> package.
> >>>>>>>>>>>
> >>>>>>>>>>> What should I do? Should I move my classes under
> >> org.apache.kafka.streams.kstream package instead?
> >>>>>>>>>>>
> >>>>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Matthias,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you for your input.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I updated the KIP, made it a little more readable.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I think the configuration parameters strategy is finalized
> then.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Do you have any other questions/concerns regarding this KIP?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and
> commit
> >> them under my PR.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Daniyar,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> thanks for the update to the KIP. It's in really good shape
> >> and well
> >>>>>>>>>>>>> written.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> About the default constructor question:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default constructor
> >>>>>>>>>>>>> so that they can be created easily via reflection when specified in a
> >>>>>>>>>>>>> config. I understand that it is not super user friendly, but all
> >>>>>>>>>>>>> existing code works this way. Hence, it seems best to stick with the
> >>>>>>>>>>>>> established pattern.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
> >>>>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve user
> >>>>>>>>>>>>> experience that addresses the exact issue John raised (cf.
> >>>>>>>>>>>>> https://github.com/apache/kafka/pull/7067).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Note that if a user instantiates the Serde manually, the user
> >>>>>>>>>>>>> would also need to call `configure()` to set up the inner serdes. Kafka
> >>>>>>>>>>>>> Streams would not set those up automatically and one would most likely
> >>>>>>>>>>>>> end up with an NPE.
> >>>>>>>>>>>>>
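A minimal sketch of the pattern Matthias describes: a zero-arg constructor so the class can be created reflectively, with the inner component only resolved once configure() is called. The class name ConfigurableListSerdeSketch is hypothetical, the config keys are the ones proposed in this thread, and a plain Object stands in for the inner serde. The explicit check in inner() is an addition for illustration; without it a caller that skipped configure() would hit the NPE mentioned above.

```java
import java.util.Map;

// Hypothetical stand-in for a list serde that is created via reflection.
final class ConfigurableListSerdeSketch {
    // Stays null until configure() runs.
    private Object inner;

    // Zero-arg constructor: the only one reflection-based creation can rely on.
    ConfigurableListSerdeSketch() { }

    void configure(Map<String, ?> configs, boolean isKey) {
        String property = isKey
            ? "default.list.key.serde.inner"
            : "default.list.value.serde.inner";
        Object value = configs.get(property);
        if (value == null) {
            throw new IllegalStateException("Missing config " + property);
        }
        try {
            // Accept either a Class object or a fully qualified class name.
            Class<?> innerClass = value instanceof Class
                ? (Class<?>) value
                : Class.forName(value.toString());
            inner = innerClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create inner class for " + property, e);
        }
    }

    Object inner() {
        if (inner == null) {
            throw new IllegalStateException("configure() was not called");
        }
        return inner;
    }
}
```

A user who creates the object by hand therefore has two steps, construction and then configure(), whereas a config-driven setup gets both from the framework.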
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Coming back to the KIP and the parameter names: `WindowedSerdes` are
> >>>>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
> >>>>>>>>>>>>> we use the following parameter names:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - default.windowed.key.serde.inner
> >>>>>>>>>>>>> - default.windowed.value.serde.inner
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It might be good to align the naming pattern. I would also
> >> suggest to
> >>>>>>>>>>>>> use `type` instead of `impl`?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
> >>>>>>>>>>>>> default.value.list.serde.impl  ->
> >> default.list.value.serde.type
> >>>>>>>>>>>>> default.key.list.serde.element  ->
> >> default.list.key.serde.inner
> >>>>>>>>>>>>> default.value.list.serde.element  ->
> >> default.list.value.serde.inner
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
> >>>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is that it
> >>>>>>>>>>>>>> goes against what Matthias suggested earlier:
> >>>>>>>>>>>>>> "I think that ... `ListSerde` should have a default constructor and it
> >>>>>>>>>>>>>> should be possible to pass in the `Class listClass` information via a
> >>>>>>>>>>>>>> configuration. Otherwise, KafkaStreams cannot use it as default serde."
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What do you think about that? I hope I’m not confusing
> >> anything.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks
> for
> >> the update, too.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Just one more super-small question, do we need this
> variant:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in
> >> org.apache.kafka.common.serialization.Serdes class (infers list
> >> implementation and inner serde from config file)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It seems like this situation implies my config file is
> >> already set up for the list serde, so passing this serde (e.g., in
> >> Produced) would have the same effect as not specifying it.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I guess that it could be the case that you have the
> >> `default.key/value.serde` set to something else, like StringSerde, but
> you
> >> still have the `default.key/value.list.serde.impl/element` set. This
> seems
> >> like it would result in more confusion than convenience, so my gut
> instinct
> >> is maybe we shouldn't introduce the `ListSerde()` variant until people
> >> actually request it later on.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully
> >> source-code-driven, not half/half.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I hope everyone had a great long weekend.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you
> >> correctly, but I think I already listed them:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> So for Produced, you would use it in the following
> fashion,
> >> for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class,
> >> Serdes.Integer()))
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization
> >> Strategy” where I describe our logic of conditional serialization based
> on
> >> the type of an inner serde.
> >>>>>>>>>>>>>>>>
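A minimal sketch of what conditional serialization based on the inner type could look like for a fixed-width element type. The thread does not spell out the byte layout, so the format below (element count followed by 4 bytes per Integer, with no per-element length prefix) and the class name FixedSizeListSketch are assumptions for illustration only.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kafka.
final class FixedSizeListSketch {
    private FixedSizeListSketch() { }

    // Assumed layout for a fixed-width inner type: [int count][4 bytes per element].
    // A variable-width inner type would additionally need a length before each element.
    static byte[] serializeInts(List<Integer> data) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + data.size() * Integer.BYTES);
        buffer.putInt(data.size());
        for (int value : data) {
            buffer.putInt(value);
        }
        return buffer.array();
    }

    // The reader knows the element width from the inner type it was configured with,
    // so nothing about the width has to be stored in the record itself.
    static List<Integer> deserializeInts(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int count = buffer.getInt();
        List<Integer> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add(buffer.getInt());
        }
        return result;
    }
}
```

For three integers this layout uses 16 bytes, where a length-prefixed layout would use 28, which is the kind of saving the fixed/variable distinction is after.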
> >>>>>>>>>>>>>>>> Thank you!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for the update, Daniyar!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> In addition to specifying the config interface, can you
> >> also specify
> >>>>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance
> >> of this
> >>>>>>>>>>>>>>>> serde in to the DSL directly, as in Produced,
> Materialized,
> >> etc., what
> >>>>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the Serializer
> >>>>>>>>>>>>>>>> and Deserializer. I don't think you need to specify the implementation
> >>>>>>>>>>>>>>>> logic, since we've already discussed it here.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If you also want to specify the serialized format of the
> >> data records
> >>>>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as
> >> letting us
> >>>>>>>>>>>>>>>> verify the schema for forward/backward compatibility
> >> concerns, etc.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> John
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hey,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Finally made updates to the KIP:
> >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>>>>>>>>>>>>>>> Sorry for the delay :)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thank You!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Yes, something like this. I did not think about good
> >> configuration
> >>>>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand
> all
> >> proposed
> >>>>>>>>>>>>>>>> configs atm. But all configs should be listed and
> explained
> >> in the KIP
> >>>>>>>>>>>>>>>> anyway, and we can discuss further after you have updated
> >> the KIP (I can
> >>>>>>>>>>>>>>>> ask more detailed question if I have any).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to
> >> have in a list
> >>>>>>>>>>>>>>>> of primitives.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> As for the default constructor and configurability, just
> >> want to make
> >>>>>>>>>>>>>>>> sure. Is this what you have on your mind?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <ma...@confluent.io> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for the update!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and
> >> `ListSerde`
> >>>>>>>>>>>>>>>> should have a default constructor and it should be
> >> possible to pass in
> >>>>>>>>>>>>>>>> the `Class listClass` information via a configuration.
> >> Otherwise,
> >>>>>>>>>>>>>>>> KafkaStreams cannot use it as default serde.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not
> >> primitive IMHO,
> >>>>>>>>>>>>>>>> as it is for `byte[]` with variable length -- it's for
> >> arrays, not for
> >>>>>>>>>>>>>>>> single `byte` (note, that `Bytes` is a Kafka class
> wrapping
> >> `byte[]`).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in
> >> the KIP
> >>>>>>>>>>>>>>>> discussion.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Can you also update the KIP?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I made and pushed necessary commits, so we could review the final
> >>>>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I also need some advice on writing tests for this new
> >> serde. So far I
> >>>>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload),
> I’m
> >> not sure
> >>>>>>>>>>>>>>>> if it is enough.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thank y’all for your help in this KIP :)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <jo...@confluent.io> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hey Daniyar,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Looks good to me! Thanks for considering it.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <de...@yeralin.net> wrote:
> >>>>>>>>>>>>>>>> Hey John and Matthias,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant
> >> information.
> >>>>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
> >>>>>>>>>>>>>>>> realized that the type is not really needed in ListSerializer, but
> >>>>>>>>>>>>>>>> only in ListDeserializer:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if
> >> serializer is
> >>>>>>>>>>>>>>>> not a primitive serializer:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Then, in deserializer, we persist passed list type, so
> that
> >> during
> >>>>>>>>>>>>>>>> deserialization we could create an instance of it with
> >> predefined
> >>>>>>>>>>>>>>>> listSize for better performance.
> >>>>>>>>>>>>>>>> We also try to locate a primitiveSize based on passed
> >> deserializer.
> >>>>>>>>>>>>>>>> If it is not there, then primitiveSize will be null. Which
> >> means
> >>>>>>>>>>>>>>>> that each entry’s size was encoded individually.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This looks much cleaner and more concise.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <ma...@confluent.io> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For encoding the list-type: I see John's point about
> >> re-encoding the
> >>>>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea
> >> that the
> >>>>>>>>>>>>>>>> Deserializer returns a fixed type...
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Maybe it's best allow users to specify the target list
> type
> >> on
> >>>>>>>>>>>>>>>> deserialization via config?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to
> >> encode the
> >>>>>>>>>>>>>>>> type size, but users could specify the type on the
> >> deserializer (via a
> >>>>>>>>>>>>>>>> config again)?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence,
> I
> >> doubt
> >>>>>>>>>>>>>>>> we can
> >>>>>>>>>>>>>>>> support this and a cast will be necessary at some point in
> >> the user
> >>>>>>>>>>>>>>>> code.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hey Daniyar,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for looking at it!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Something like your screenshot is more along the lines of
> >> what I was
> >>>>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how
> >> would that not
> >>>>>>>>>>>>>>>> be "vanilla java"?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Unfortunately the deserializer needs more information,
> >> though. For
> >>>>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>?
> >> The serde
> >>>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd
> still
> >> need an
> >>>>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T>
> innerSerde).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> =
> >> Serdes.listSerde(
> >>>>>>>>>>>>>>>> /**list type**/ LinkedList.class,
> >>>>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
> >>>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> And in configuration, it's something like:
> >>>>>>>>>>>>>>>> default.key.serde: org...ListSerde
> >>>>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
> >>>>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <de...@yeralin.net> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hey John,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I read about TypeReference. It could work for the list serde.
> >>>>>>>>>>>>>>>> However, it is not directly supported:
> >>>>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
> >>>>>>>>>>>>>>>> The only way is to pass an actual class object into the
> >> constructor,
> >>>>>>>>>>>>>>>> something like:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you
> >> think of my
> >>>>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As
> >> described
> >>>>>>>>>>>>>>>> previously)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <de...@yeralin.net> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit
> >>>>>>>>>>>>>>>> over-engineered :)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I also wanted to see feedback from Matthias, since he gave
> >>>>>>>>>>>>>>>> me an idea about storing fixed/variable size entries.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <jo...@confluent.io> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Daniyar,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> That's a very clever solution!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> One observation is that, now, this is what we might call a
> >>>>>>>>>>>>>>>> polymorphic
> >>>>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type
> >> and then
> >>>>>>>>>>>>>>>> promising to produce the exact same concrete type on read.
> >>>>>>>>>>>>>>>> There are
> >>>>>>>>>>>>>>>> some inherent problems with this approach, which in
> general
> >>>>>>>>>>>>>>>> require
> >>>>>>>>>>>>>>>> some kind of  schema registry (not necessarily Schema
> >>>>>>>>>>>>>>>> Registry, just
> >>>>>>>>>>>>>>>> any registry for schemas) to solve.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Notice that every serialized record has quite a bit of
> >> duplicated
> >>>>>>>>>>>>>>>> information: the concrete type as well as a byte to
> indicate
> >>>>>>>>>>>>>>>> whether
> >>>>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to
> >>>>>>>>>>>>>>>> indicate the
> >>>>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because
> >> they
> >>>>>>>>>>>>>>>> tell us
> >>>>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately,
> >> this
> >>>>>>>>>>>>>>>> information is completely redundant. In all likelihood,
> the
> >>>>>>>>>>>>>>>> information will be exactly the same for every record in
> the
> >>>>>>>>>>>>>>>> topic.
> >>>>>>>>>>>>>>>> This problem is essentially the core motivation for
> >> serializations
> >>>>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization
> >>>>>>>>>>>>>>>> itself, so
> >>>>>>>>>>>>>>>> that the records won't contain so much redundant
> >> information.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back
> to
> >>>>>>>>>>>>>>>> something
> >>>>>>>>>>>>>>>> like what you had earlier in which you don't support
> >> perfectly
> >>>>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead
> >> just
> >>>>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could
> defer
> >> full,
> >>>>>>>>>>>>>>>> perfect, type preservation to serdes that have an external
> >>>>>>>>>>>>>>>> system in
> >>>>>>>>>>>>>>>> which to register their type information.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> There does exist an alternative, if we really do want to
> >>>>>>>>>>>>>>>> preserve the
> >>>>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add
> a
> >>>>>>>>>>>>>>>> configuration option specifically for the serde to
> configure
> >>>>>>>>>>>>>>>> what the
> >>>>>>>>>>>>>>>> list type will be, and maybe what the element type is, as
> >> well.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> As far as "related work" goes, you might be interested to
> >> take
> >>>>>>>>>>>>>>>> a look
> >>>>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a
> >> specific,
> >>>>>>>>>>>>>>>> arbitrarily nested, generically parameterized class
> >> structure.
> >>>>>>>>>>>>>>>> Specifically, you might find
> >>>>>>>>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> >>>>>>>>>>>>>>>> interesting.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <de...@yeralin.net> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> bump
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
>
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
>> Just to make sure I understand the problem you're highlighting:
>> I guess the difference is that the serializer and deserializer that are
>> nested inside the serde also need to be configured? So, by default I'd have
>> to specify all six configs when I'm using Streams?

That is not the problem. And you actually describe the solution for it
yourself:

>> I guess in the Serde, it could make use of a package-protected constructor
>> for the serializer and deserializer that fixes the list type and inner type
>> to the serde-configured ones. Then, when you're configuring Streams, you
>> only need to specify the StreamsConfigs.




The problem is that `ListSerde` is in the `clients` module and thus
`ListSerde` cannot access `StreamsConfig`, and hence cannot use
`StreamsConfig#DEFAULT_LIST_KEY_SERDE_TYPE` (and others). Therefore, we
either need to hard-code string literals for the config names (which does
not sound right) or add `CommonClientConfigs#DEFAULT_LIST_KEY_SERDE_TYPE`
(and others).

In StreamsConfig we would just redefine them for convenience:

> public static final String DEFAULT_LIST_KEY_SERDE_TYPE = CommonClientConfigs.DEFAULT_LIST_KEY_SERDE_TYPE;


Note that `TimeWindowedSerde` is contained in the `streams` module and thus
it can access `StreamsConfig` and
`StreamsConfig#DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS`.
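
To make the layering concrete, here is a minimal sketch of the idea, not the code from the PR. All three class names (`SketchCommonClientConfigs`, `SketchStreamsConfig`, `SketchListDeserializer`) are hypothetical stand-ins so that the example compiles without Kafka on the classpath, and the config key strings simply follow the naming proposed in this thread. The deserializer only ever refers to the clients-side constants, while the Streams-side class re-exports the same names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a config class in the clients module.
final class SketchCommonClientConfigs {
    static final String DEFAULT_LIST_KEY_SERDE_TYPE = "default.list.key.serde.type";
    static final String DEFAULT_LIST_VALUE_SERDE_TYPE = "default.list.value.serde.type";

    private SketchCommonClientConfigs() { }
}

// Hypothetical stand-in for StreamsConfig: it only redefines the names for convenience.
final class SketchStreamsConfig {
    static final String DEFAULT_LIST_KEY_SERDE_TYPE =
        SketchCommonClientConfigs.DEFAULT_LIST_KEY_SERDE_TYPE;
    static final String DEFAULT_LIST_VALUE_SERDE_TYPE =
        SketchCommonClientConfigs.DEFAULT_LIST_VALUE_SERDE_TYPE;

    private SketchStreamsConfig() { }
}

// Hypothetical deserializer skeleton living next to the clients-side constants.
final class SketchListDeserializer {
    private Class<?> listClass;

    // Looks up the list type using only the clients-side constant names.
    void configure(Map<String, ?> configs, boolean isKey) {
        String name = isKey
            ? SketchCommonClientConfigs.DEFAULT_LIST_KEY_SERDE_TYPE
            : SketchCommonClientConfigs.DEFAULT_LIST_VALUE_SERDE_TYPE;
        Object value = configs.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing config: " + name);
        }
        if (value instanceof Class) {
            listClass = (Class<?>) value;
        } else {
            try {
                listClass = Class.forName(value.toString());
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Unknown list class: " + value, e);
            }
        }
        if (!List.class.isAssignableFrom(listClass)) {
            throw new IllegalArgumentException(listClass + " is not a java.util.List");
        }
    }

    // Creates an empty instance of the configured list type via its no-arg constructor.
    List<Object> newList() {
        try {
            @SuppressWarnings("unchecked")
            List<Object> list = (List<Object>) listClass.getDeclaredConstructor().newInstance();
            return list;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + listClass, e);
        }
    }
}
```

A Streams user could then set the value through the Streams-side constant, and the deserializer would still find it, because both constants resolve to the same string.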




Btw: I just realized that we actually don't need `ProducerConfig`

> list.key/value.serializer.type

because the list-type is irrelevant on write. We only need `inner` config.
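
A small illustration of why the list type does not matter on write. This is a sketch under assumptions: `SketchIntListSerializer` is a hypothetical class, and the wire format (a 4-byte element count followed by 4 bytes per integer) is chosen only for the example and is not claimed to be the format used by the KIP or the PR. The serializer only iterates over the `List` interface, so the concrete implementation never reaches the bytes.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical serializer for List<Integer>; the wire format is an assumption for this sketch.
final class SketchIntListSerializer {
    private SketchIntListSerializer() { }

    // Writes the element count, then each element as a fixed-size 4-byte integer.
    static byte[] serialize(List<Integer> data) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES * (1 + data.size()));
        buffer.putInt(data.size());
        for (int element : data) {
            buffer.putInt(element);
        }
        return buffer.array();
    }
}
```

Serializing an `ArrayList` and a `LinkedList` with the same elements yields identical byte arrays, which is why only the deserializer needs to be told which list class to create.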



-Matthias


On 7/23/19 1:30 PM, John Roesler wrote:
> Hmm, that's a tricky situation.
> 
> I think Daniyar was on the right track... Producer only cares about
> serializer configs, and Consumer only cares about deserializer configs.
> 
> I didn't see the problem with your proposal:
> 
> ProducerConfig:
>> list.key/value.serializer.type
>> list.key/value.serializer.inner
>> ConsumerConfig:
>> list.key/value.deserializer.type
>> list.key/value.deserializer.inner
>> StreamsConfig:
>> default.list.key/value.serde.type
>> default.list.key/value.serde.inner
> 
> 
> It seems like the key/value serde configs are a better analogy than the
> windowed serde.
> ProducerConfig: key.serializer
> ConsumerConfig: key.deserializer
> StreamsConfig: default.key.serde
> 
> Just to make sure I understand the problem you're highlighting:
> I guess the difference is that the serializer and deserializer that are
> nested inside the serde also need to be configured? So, by default I'd have
> to specify all six configs when I'm using Streams?
> 
> I guess in the Serde, it could make use of a package-protected constructor
> for the serializer and deserializer that fixes the list type and inner type
> to the serde-configured ones. Then, when you're configuring Streams, you
> only need to specify the StreamsConfigs.
> 
> Does that work?
> -John
> 
> 
> On Tue, Jul 23, 2019 at 11:39 AM Development <de...@yeralin.net> wrote:
> 
>> Bump
>>
>>> On Jul 22, 2019, at 11:26 AM, Development <de...@yeralin.net> wrote:
>>>
>>> Hey Matthias,
>>>
>>> It looks a little confusing, but I don’t have enough expertise to judge
>> on the configuration placement.
>>>
>>> If you think it is fine, I’ll go ahead with this approach.
>>>
>>> Best,
>>> Daniyar Yeralin
>>>
>>>> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>>>
>>>> Good point.
>>>>
>>>> I guess the simplest solution is to actually add
>>>>
>>>>>> default.list.key/value.serde.type
>>>>>> default.list.key/value.serde.inner
>>>>
>>>> to both `CommonClientConfigs` and `StreamsConfig`.
>>>>
>>>> It's not super clean, but I think it's the best we can do. Thoughts?
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 7/19/19 1:23 PM, Development wrote:
>>>>> Hi Matthias,
>>>>>
>>>>> I agree, ConsumerConfig did not seem like the right place for these configurations.
>>>>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.
>>>>>
>>>>> However, I have a question. What should I do in the `configure(Map<String, ?> configs, boolean isKey)` methods? Which configurations should I try to locate? I was comparing my (de)serializer implementations with the SessionWindowed(De)serializer classes, and they use the StreamsConfig class to get either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS or StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS.
>>>>>
>>>>> In my case, as I mentioned earlier, StreamsConfig class is not
>> accessible from org.apache.kafka.common.serialization package. So, I can’t
>> utilize it. Any suggestions here?
>>>>>
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>>
>>>>>
>>>>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> One minor question about the configs. The KIP adds three classes, a
>>>>>> Serializer, a Deserializer, and a Serde.
>>>>>>
>>>>>> Hence, would it make sense to add the corresponding configs to
>>>>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly
>>>>>> different names each time?
>>>>>>
>>>>>>
>>>>>> Something like this:
>>>>>>
>>>>>> ProducerConfig:
>>>>>>
>>>>>> list.key/value.serializer.type
>>>>>> list.key/value.serializer.inner
>>>>>>
>>>>>> ConsumerConfig:
>>>>>>
>>>>>> list.key/value.deserializer.type
>>>>>> list.key/value.deserializer.inner
>>>>>>
>>>>>> StreamsConfig:
>>>>>>
>>>>>> default.list.key/value.serde.type
>>>>>> default.list.key/value.serde.inner
>>>>>>
>>>>>>
>>>>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound right
>>>>>> to me. Also note that it seems better to avoid the `default.` prefix
>>>>>> for consumers and producers because there is only one Serializer or
>>>>>> Deserializer anyway. Only for Streams are there multiple, and
>>>>>> StreamsConfig specifies the default one if an operator does not
>>>>>> overwrite it.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>>
>>>>>> Also, the KIP should explicitly mention to which classes certain configs
>>>>>> are added. Atm, the KIP only lists parameter names, but does not state
>>>>>> where those are added.
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 7/16/19 1:11 PM, Development wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Yes, totally forgot about the statement. KIP-466 is updated.
>>>>>>>
>>>>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman
>> for your valuable input!
>>>>>>>
>>>>>>> I hope I did not cause too much trouble :)
>>>>>>>
>>>>>>> I’ll start the vote now.
>>>>>>>
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>>
>>>>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io>
>> wrote:
>>>>>>>>
>>>>>>>> Hi Daniyar,
>>>>>>>>
>>>>>>>> Thanks for that update. I took a look, and I think this is in good
>> shape.
>>>>>>>>
>>>>>>>> One note, the statement "New method public static <T> Serde<List<T>>
>>>>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
>>>>>>>> (infers list implementation and inner serde from config file)" is
>>>>>>>> still present in the KIP, although I do think it was removed from the PR.
>>>>>>>>
>>>>>>>> Once you remove that statement from the KIP, then I think this KIP
>> is
>>>>>>>> ready to go up for a vote! Then, we can really review the PR in
>>>>>>>> earnest and get this thing merged.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -john
>>>>>>>>
>>>>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net>
>> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
>>>>>>>>>
>>>>>>>>> Feel free to put any comments in there.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>>
>>>>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> I knew I was missing something. Yes, that makes sense now, I
>> removed all `listSerde()` methods, and left empty constructors instead.
>>>>>>>>>>
>>>>>>>>>> As per `CommonClientConfigs` I looked at the class, it doesn’t
>> have any properties related to serdes, and that bothers me a little.
>>>>>>>>>>
>>>>>>>>>> All properties like `default.key.serde` and `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want to create confusion.
>>>>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and its (de)serializers are not located in org.apache.kafka.common.serialization. I guess it kind of makes sense since windowed serdes are only available for Kafka Streams, not vice versa.
>>>>>>>>>>
>>>>>>>>>> If everyone is okay to put list properties in
>> `CommonClientConfigs` class, I’ll go ahead and do that then.
>>>>>>>>>>
>>>>>>>>>> Thank you for your input!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io>
>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Regarding the placement, you might as well move the constants to
>> `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and
>> the configs and the code are in the same module.
>>>>>>>>>>>
>>>>>>>>>>> Regarding the constructor... What Matthias said is correct: The
>> serde, serializer, and deserializer all need to have zero-arg constructors
>> so they can be instantiated reflectively by Kafka. However, the factory
>> method you proposed "New method public static <T> Serde<List<T>>
>> ListSerde()" is not a constructor, and is not required. It would be used
>> purely from the Java interface, but has the drawbacks I listed above. This
>> method, not the constructor, is what I proposed to remove from the KIP.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <de...@yeralin.net> wrote:
>>>>>>>>>>> One problem though.
>>>>>>>>>>>
>>>>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m
>> trying to mimic the implementation of my ListSerde accordingly.
>>>>>>>>>>>
>>>>>>>>>>> I created a couple of constants under StreamsConfig:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> And I am trying to use a similar construct:
>>>>>>>>>>> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>>>>>>>>>>> But then found out that StreamsConfig is not accessible from
>> org.apache.kafka.common.serialization package while window serde
>> (de)serializers are located under org.apache.kafka.streams.kstream package.
>>>>>>>>>>>
>>>>>>>>>>> What should I do? Should I move my classes under
>> org.apache.kafka.streams.kstream package instead?
>>>>>>>>>>>
>>>>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <de...@yeralin.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your input.
>>>>>>>>>>>>
>>>>>>>>>>>> I updated the KIP, made it a little more readable.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the configuration parameters strategy is finalized then.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have any other questions/concerns regarding this KIP?
>>>>>>>>>>>>
>>>>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit
>> them under my PR.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Daniyar,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for the update to the KIP. It's in really good shape
>> and well
>>>>>>>>>>>>> written.
>>>>>>>>>>>>>
>>>>>>>>>>>>> About the default constructor question:
>>>>>>>>>>>>>
>>>>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default constructor so
>>>>>>>>>>>>> they can be created easily via reflection when specified in a config. I
>>>>>>>>>>>>> understand that it is not super user friendly, but all existing code
>>>>>>>>>>>>> works this way. Hence, it seems best to stick with the established pattern.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
>>>>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve user
>>>>>>>>>>>>> experience that addresses the exact issue John raised. (cf
>>>>>>>>>>>>> https://github.com/apache/kafka/pull/7067)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that if a user would instantiate the Serde manually, the user
>>>>>>>>>>>>> would also need to call `configure()` to set up the inner serdes. Kafka
>>>>>>>>>>>>> Streams would not set up those automatically and one would most likely
>>>>>>>>>>>>> end up with an NPE.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Coming back to the KIP, and the parameter names. `WindowedSerdes` are
>>>>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>>>>>>>>>>>> we use the following parameter names:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - default.windowed.key.serde.inner
>>>>>>>>>>>>> - default.windowed.value.serde.inner
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It might be good to align the naming pattern. I would also
>> suggest to
>>>>>>>>>>>>> use `type` instead of `impl`?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>>>>>>>>>>>> default.value.list.serde.impl  ->  default.list.value.serde.type
>>>>>>>>>>>>> default.key.list.serde.element  ->  default.list.key.serde.inner
>>>>>>>>>>>>> default.value.list.serde.element  ->  default.list.value.serde.inner
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is
>> that it goes against what Matthias suggested earlier:
>>>>>>>>>>>>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think about that? I hope I’m not confusing
>> anything.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for
>> the update, too.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just one more super-small question, do we need this variant:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in
>> org.apache.kafka.common.serialization.Serdes class (infers list
>> implementation and inner serde from config file)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It seems like this situation implies my config file is
>> already set up for the list serde, so passing this serde (e.g., in
>> Produced) would have the same effect as not specifying it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I guess that it could be the case that you have the
>> `default.key/value.serde` set to something else, like StringSerde, but you
>> still have the `default.key/value.list.serde.impl/element` set. This seems
>> like it would result in more confusion than convenience, so my gut instinct
>> is maybe we shouldn't introduce the `ListSerde()` variant until people
>> actually request it later on.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully
>> source-code-driven, not half/half.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I hope everyone had a great long weekend.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you
>> correctly, but I think I already listed them:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So for Produced, you would use it in the following fashion,
>> for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class,
>> Serdes.Integer()))
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization
>> Strategy” where I describe our logic of conditional serialization based on
>> the type of an inner serde.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the update, Daniyar!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition to specifying the config interface, can you
>> also specify
>>>>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance
>> of this
>>>>>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized,
>> etc., what
>>>>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the
>> Serializer
>>>>>>>>>>>>>>>> and Deserializer. I don't think you need to specify the
>> implementation
>>>>>>>>>>>>>>>> logic, since we've already discussed it here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you also want to specify the serialized format of the
>> data records
>>>>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as
>> letting us
>>>>>>>>>>>>>>>> verify the schema for forward/backward compatibility
>> concerns, etc.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Finally made updates to the KIP:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>>>>> Sorry for the delay :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank You!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, something like this. I did not think about good
>> configuration
>>>>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all
>> proposed
>>>>>>>>>>>>>>>> configs atm. But all configs should be listed and explained
>> in the KIP
>>>>>>>>>>>>>>>> anyway, and we can discuss further after you have updated
>> the KIP (I can
>>>>>>>>>>>>>>>> ask more detailed question if I have any).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to
>> have in a list
>>>>>>>>>>>>>>>> of primitives.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As for the default constructor and configurability, just
>> want to make
>>>>>>>>>>>>>>>> sure. Is this what you have on your mind?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the update!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and
>> `ListSerde`
>>>>>>>>>>>>>>>> should have a default constructor and it should be
>> possible to pass in
>>>>>>>>>>>>>>>> the `Class listClass` information via a configuration.
>> Otherwise,
>>>>>>>>>>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not
>> primitive IMHO,
>>>>>>>>>>>>>>>> as it is for `byte[]` with variable length -- it's for
>> arrays, not for
>>>>>>>>>>>>>>>> single `byte` (note, that `Bytes` is a Kafka class wrapping
>> `byte[]`).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in
>> the KIP
>>>>>>>>>>>>>>>> discussion.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you also update the KIP?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I made and pushed necessary commits, so we could review the
>> final
>>>>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I also need some advice on writing tests for this new
>> serde. So far I
>>>>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m
>> not sure
>>>>>>>>>>>>>>>> if it is enough.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>> Hey John and Matthias,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant
>> information.
>>>>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list
>> type. I
>>>>>>>>>>>>>>>> realized that the type is not really needed in
>> ListSerializer, but
>>>>>>>>>>>>>>>> only in ListDeserializer:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if
>> serializer is
>>>>>>>>>>>>>>>> not a primitive serializer:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Then, in deserializer, we persist passed list type, so that
>> during
>>>>>>>>>>>>>>>> deserialization we could create an instance of it with
>> predefined
>>>>>>>>>>>>>>>> listSize for better performance.
>>>>>>>>>>>>>>>> We also try to locate a primitiveSize based on passed
>> deserializer.
>>>>>>>>>>>>>>>> If it is not there, then primitiveSize will be null. Which
>> means
>>>>>>>>>>>>>>>> that each entry’s size was encoded individually.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This looks much cleaner and more concise.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For encoding the list-type: I see John's point about
>> re-encoding the
>>>>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea
>> that the
>>>>>>>>>>>>>>>> Deserializer returns a fixed type...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maybe it's best to allow users to specify the target list type
>> on
>>>>>>>>>>>>>>>> deserialization via config?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to
>> encode the
>>>>>>>>>>>>>>>> type size, but users could specify the type on the
>> deserializer (via a
>>>>>>>>>>>>>>>> config again)?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I
>> doubt
>>>>>>>>>>>>>>>> we can
>>>>>>>>>>>>>>>> support this and a cast will be necessary at some point in
>> the user
>>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for looking at it!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Something like your screenshot is more along the lines of
>> what I was
>>>>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how
>> would that not
>>>>>>>>>>>>>>>> be "vanilla java"?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unfortunately the deserializer needs more information,
>> though. For
>>>>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>?
>> The serde
>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still
>> need an
>>>>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> =
>> Serdes.listSerde(
>>>>>>>>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And in configuration, it's something like:
>>>>>>>>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I read about TypeReference. It could work for the list
>> serde.
>>>>>>>>>>>>>>>> However, it is not directly
>>>>>>>>>>>>>>>> supported:
>>>>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>>>>>>>>> The only way is to pass an actual class object into the
>> constructor,
>>>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you
>> think of my
>>>>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As
>> described
>>>>>>>>>>>>>>>> previously)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit
>> over
>>>>>>>>>>>>>>>> engineered :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I also wanted to see feedback from Matthias as well, since
>> he gave
>>>>>>>>>>>>>>>> me an idea about storing fixed/variable size entries.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That's a very clever solution!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One observation is that, now, this is what we might call a
>>>>>>>>>>>>>>>> polymorphic
>>>>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type
>> and then
>>>>>>>>>>>>>>>> promising to produce the exact same concrete type on read.
>>>>>>>>>>>>>>>> There are
>>>>>>>>>>>>>>>> some inherent problems with this approach, which in general
>>>>>>>>>>>>>>>> require
>>>>>>>>>>>>>>>> some kind of schema registry (not necessarily Schema
>>>>>>>>>>>>>>>> Registry, just
>>>>>>>>>>>>>>>> any registry for schemas) to solve.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Notice that every serialized record has quite a bit of
>> duplicated
>>>>>>>>>>>>>>>> information: the concrete type as well as a byte to indicate
>>>>>>>>>>>>>>>> whether
>>>>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to
>>>>>>>>>>>>>>>> indicate the
>>>>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because
>> they
>>>>>>>>>>>>>>>> tell us
>>>>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately,
>> this
>>>>>>>>>>>>>>>> information is completely redundant. In all likelihood, the
>>>>>>>>>>>>>>>> information will be exactly the same for every record in the
>>>>>>>>>>>>>>>> topic.
>>>>>>>>>>>>>>>> This problem is essentially the core motivation for
>> serializations
>>>>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization
>>>>>>>>>>>>>>>> itself, so
>>>>>>>>>>>>>>>> that the records won't contain so much redundant
>> information.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to
>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>> like what you had earlier in which you don't support
>> perfectly
>>>>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead
>> just
>>>>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer
>> full,
>>>>>>>>>>>>>>>> perfect, type preservation to serdes that have an external
>>>>>>>>>>>>>>>> system in
>>>>>>>>>>>>>>>> which to register their type information.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There does exist an alternative, if we really do want to
>>>>>>>>>>>>>>>> preserve the
>>>>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>>>>>>>>>>>> configuration option specifically for the serde to configure
>>>>>>>>>>>>>>>> what the
>>>>>>>>>>>>>>>> list type will be, and maybe what the element type is, as
>> well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As far as "related work" goes, you might be interested to
>> take
>>>>>>>>>>>>>>>> a look
>>>>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a
>> specific,
>>>>>>>>>>>>>>>> arbitrarily nested, generically parameterized class
>> structure.
>>>>>>>>>>>>>>>> Specifically, you might find
>>>>>>>>>>>>>>>>
>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>>>>>>>>> interesting.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> bump
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Hmm, that's a tricky situation.

I think Daniyar was on the right track... Producer only cares about
serializer configs, and Consumer only cares about deserializer configs.

I didn't see the problem with your proposal:

> ProducerConfig:
> list.key/value.serializer.type
> list.key/value.serializer.inner
> ConsumerConfig:
> list.key/value.deserializer.type
> list.key/value.deserializer.inner
> StreamsConfig:
> default.list.key/value.serde.type
> default.list.key/value.serde.inner


It seems like the key/value serde configs are a better analogy than the
windowed serde.
ProducerConfig: key.serializer
ConsumerConfig: key.deserializer
StreamsConfig: default.key.serde
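
To make the analogy concrete, here is a rough sketch of how a config key
could be picked per client and per key/value role. The helper and its names
(ListConfigKeys, Client, typeKey, innerKey) are made up for illustration and
are not Kafka API; only the key strings are taken from the proposal quoted
above.

```java
/** Hypothetical helper, not part of Kafka: maps (client, key/value role) to a proposed config name. */
final class ListConfigKeys {

    /** The three config classes discussed in the thread. */
    enum Client { PRODUCER, CONSUMER, STREAMS }

    /** Name of the config that would carry the list implementation class. */
    static String typeKey(Client client, boolean isKey) {
        return key(client, isKey, "type");
    }

    /** Name of the config that would carry the inner (element) serde class. */
    static String innerKey(Client client, boolean isKey) {
        return key(client, isKey, "inner");
    }

    private static String key(Client client, boolean isKey, String suffix) {
        String role = isKey ? "key" : "value";
        switch (client) {
            case PRODUCER:
                return "list." + role + ".serializer." + suffix;
            case CONSUMER:
                return "list." + role + ".deserializer." + suffix;
            default:
                // Streams: only here does the `default.` prefix apply.
                return "default.list." + role + ".serde." + suffix;
        }
    }
}
```

For example, ListConfigKeys.typeKey(Client.CONSUMER, true) would yield
list.key.deserializer.type under this naming.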

Just to make sure I understand the problem you're highlighting:
I guess the difference is that the serializer and deserializer that are
nested inside the serde also need to be configured? So, by default I'd have
to specify all six configs when I'm using Streams?

I guess the Serde could make use of a package-private constructor
for the serializer and deserializer that fixes the list type and inner type
to the serde-configured ones. Then, when you're configuring Streams, you
only need to specify the StreamsConfigs.
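
Roughly what I have in mind, as a sketch only. The Sketch* classes below are
stand-ins I made up so the example compiles on its own; the real classes
would implement Kafka's Serializer, Deserializer, and Serde interfaces, and
the sketch assumes the config values arrive as class-name strings. The
serializer takes only the inner type, since the list type was said earlier
in the thread to matter only for deserialization.

```java
import java.util.Map;

/** Stand-in for the list serializer (hypothetical, not the PR's class). */
final class SketchListSerializer {
    final String innerClassName;

    // Zero-arg constructor for reflective creation; would be set up via producer configs (not shown).
    public SketchListSerializer() { this(null); }

    // Package-private: lets the serde fix the inner type directly.
    SketchListSerializer(String innerClassName) { this.innerClassName = innerClassName; }
}

/** Stand-in for the list deserializer (hypothetical, not the PR's class). */
final class SketchListDeserializer {
    final String listClassName;
    final String innerClassName;

    // Zero-arg constructor for reflective creation; would be set up via consumer configs (not shown).
    public SketchListDeserializer() { this(null, null); }

    // Package-private: lets the serde fix the list type and inner type directly.
    SketchListDeserializer(String listClassName, String innerClassName) {
        this.listClassName = listClassName;
        this.innerClassName = innerClassName;
    }
}

/** Stand-in for the serde: reads only the Streams-level configs and hands them down. */
final class SketchListSerde {
    private SketchListSerializer serializer;
    private SketchListDeserializer deserializer;

    public SketchListSerde() { }

    void configure(Map<String, ?> configs, boolean isKey) {
        String role = isKey ? "key" : "value";
        String listType = require(configs, "default.list." + role + ".serde.type");
        String inner = require(configs, "default.list." + role + ".serde.inner");
        // The nested serializer/deserializer never look at their own list.* configs here.
        serializer = new SketchListSerializer(inner);
        deserializer = new SketchListDeserializer(listType, inner);
    }

    SketchListSerializer serializer() { return serializer; }

    SketchListDeserializer deserializer() { return deserializer; }

    private static String require(Map<String, ?> configs, String key) {
        Object value = configs.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing config: " + key);
        }
        return value.toString();
    }
}
```

With that shape, a Streams user would set just the two default.list.*
entries per role, and the producer/consumer-level list.* configs would only
matter when the serializer or deserializer is used on its own.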

Does that work?
-John


On Tue, Jul 23, 2019 at 11:39 AM Development <de...@yeralin.net> wrote:

> Bump
>
> > On Jul 22, 2019, at 11:26 AM, Development <de...@yeralin.net> wrote:
> >
> > Hey Matthias,
> >
> > It looks a little confusing, but I don’t have enough expertise to judge
> on the configuration placement.
> >
> > If you think it is fine, I’ll go ahead with this approach.
> >
> > Best,
> > Daniyar Yeralin
> >
> >> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> >>
> >> Good point.
> >>
> >> I guess the simplest solution is to actually add
> >>
> >>>> default.list.key/value.serde.type
> >>>> default.list.key/value.serde.inner
> >>
> >> to both `CommonClientConfigs` and `StreamsConfig`.
> >>
> >> It's not super clean, but I think it's the best we can do. Thoughts?
> >>
> >>
> >> -Matthias
> >>
> >> On 7/19/19 1:23 PM, Development wrote:
> >>> Hi Matthias,
> >>>
> >>> I agree, ConsumerConfig did not seem like the right place for these
> configurations.
> >>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.
> >>>
> >>> However, I have a question. What should I do in "configure(Map<String,
> ?> configs, boolean isKey)” methods? Which configurations should I try to
> locate? I was comparing my (de)serializer implementations with
> SessionWindowed(De)serializer classes, and they use StreamsConfig class to
> get either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS or
> StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS.
> >>>
> >>> In my case, as I mentioned earlier, StreamsConfig class is not
> accessible from org.apache.kafka.common.serialization package. So, I can’t
> utilize it. Any suggestions here?
> >>>
> >>> Best,
> >>> Daniyar Yeralin
> >>>
> >>>
> >>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> >>>>
> >>>> Thanks!
> >>>>
> >>>> One minor question about the configs. The KIP adds three classes, a
> >>>> Serializer, a Deserializer, and a Serde.
> >>>>
> >>>> Hence, would it make sense to add the corresponding configs to
> >>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly
> >>>> different names each time?
> >>>>
> >>>>
> >>>> Something like this:
> >>>>
> >>>> ProducerConfig:
> >>>>
> >>>> list.key/value.serializer.type
> >>>> list.key/value.serializer.inner
> >>>>
> >>>> ConsumerConfig:
> >>>>
> >>>> list.key/value.deserializer.type
> >>>> list.key/value.deserializer.inner
> >>>>
> >>>> StreamsConfig:
> >>>>
> >>>> default.list.key/value.serde.type
> >>>> default.list.key/value.serde.inner
> >>>>
> >>>>
> >>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound
> right
> >>>> to me. Also note that it seems better to avoid the `default.` prefix
> >>>> for consumers and producers because there is only one Serializer or
> >>>> Deserializer anyway. Only for Streams, there are multiple and
> >>>> StreamsConfig specifies the default one if an operator does not
> >>>> overwrite it.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>>
> >>>> Also, the KIP should explicitly mention to which classes certain
> configs
> >>>> are added. Atm, the KIP only lists parameter names, but does not state
> >>>> where those are added.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 7/16/19 1:11 PM, Development wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Yes, totally forgot about the statement. KIP-466 is updated.
> >>>>>
> >>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman
> for your valuable input!
> >>>>>
> >>>>> I hope I did not cause too much trouble :)
> >>>>>
> >>>>> I’ll start the vote now.
> >>>>>
> >>>>> Best,
> >>>>> Daniyar Yeralin
> >>>>>
> >>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io>
> wrote:
> >>>>>>
> >>>>>> Hi Daniyar,
> >>>>>>
> >>>>>> Thanks for that update. I took a look, and I think this is in good
> shape.
> >>>>>>
> >>>>>> One note, the statement "New method public static <T> Serde<List<T>>
> >>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
> >>>>>> (infers list implementation and inner serde from config file)" is
> >>>>>> still present in the KIP, although I do it is was removed from the
> PR.
> >>>>>>
> >>>>>> Once you remove that statement from the KIP, then I think this KIP
> is
> >>>>>> ready to go up for a vote! Then, we can really review the PR in
> >>>>>> earnest and get this thing merged.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -john
> >>>>>>
> >>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net>
> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Pushed new changes under my PR:
> https://github.com/apache/kafka/pull/6592
> >>>>>>>
> >>>>>>> Feel free to put any comments in there.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
> >>>>>>>>
> >>>>>>>> Hi John,
> >>>>>>>>
> >>>>>>>> I knew I was missing something. Yes, that makes sense now, I
> removed all `listSerde()` methods, and left empty constructors instead.
> >>>>>>>>
> >>>>>>>> As per `CommonClientConfigs` I looked at the class, it doesn’t
> have any properties related to serdes, and that bothers me a little.
> >>>>>>>>
> >>>>>>>> All properties like `default.key.serde`
> `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want
> to create confusion.
> >>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and
> its (de)serializers are not located in
> org.apache.kafka.common.serialization. I guess it kind of makes sense since
> windowed serdes are only available for kafka streams, not vice versa.
> >>>>>>>>
> >>>>>>>> If everyone is okay to put list properties in
> `CommonClientConfigs` class, I’ll go ahead and do that then.
> >>>>>>>>
> >>>>>>>> Thank you for your input!
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Daniyar Yeralin
> >>>>>>>>
> >>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io>
> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> Regarding the placement, you might as well move the constants to
> `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and
> the configs and the code are in the same module.
> >>>>>>>>>
> >>>>>>>>> Regarding the constructor... What Matthias said is correct: The
> serde, serializer, and deserializer all need to have zero-arg constructors
> so they can be instantiated reflectively by Kafka. However, the factory
> method you proposed "New method public static <T> Serde<List<T>>
> ListSerde()" is not a constructor, and is not required. It would be used
> purely from the Java interface, but has the drawbacks I listed above. This
> method, not the constructor, is what I proposed to remove from the KIP.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> -John
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
> >>>>>>>>> One problem though.
> >>>>>>>>>
> >>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m
> trying to mimic the implementation of my ListSerde accordingly.
> >>>>>>>>>
> >>>>>>>>> I created a couple of constants under StreamsConfig:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And I am trying to use a similar construct:
> >>>>>>>>> final String propertyName = isKey ?
> StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS :
> StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
> >>>>>>>>> But then found out that StreamsConfig is not accessible from
> org.apache.kafka.common.serialization package while window serde
> (de)serializers are located under org.apache.kafka.streams.kstream package.
> >>>>>>>>>
> >>>>>>>>> What should I do? Should I move my classes under
> org.apache.kafka.streams.kstream package instead?
> >>>>>>>>>
> >>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Matthias,
> >>>>>>>>>>
> >>>>>>>>>> Thank you for your input.
> >>>>>>>>>>
> >>>>>>>>>> I updated the KIP, made it a little more readable.
> >>>>>>>>>>
> >>>>>>>>>> I think the configuration parameters strategy is finalized then.
> >>>>>>>>>>
> >>>>>>>>>> Do you have any other questions/concerns regarding this KIP?
> >>>>>>>>>>
> >>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit
> them under my PR.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>
> >>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Daniyar,
> >>>>>>>>>>>
> >>>>>>>>>>> thanks for the update to the KIP. It's in really good shape
> and well
> >>>>>>>>>>> written.
> >>>>>>>>>>>
> >>>>>>>>>>> About the default constructor question:
> >>>>>>>>>>>
> >>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default
> constructor to
> >>>>>>>>>>> create them easily via reflection when specified in a config.
> I
> >>>>>>>>>>> understand that it is not super user friendly, but all
> existing code
> >>>>>>>>>>> works this way. Hence, it seems best to stick with the
> established pattern.
> >>>>>>>>>>>
> >>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
> >>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to
> improve user
> >>>>>>>>>>> experience that addresses the exact issue John raised. (cf
> >>>>>>>>>>> https://github.com/apache/kafka/pull/7067)
> >>>>>>>>>>>
> >>>>>>>>>>> Note, that if a user would instantiate the Serde manually, the
> user
> >>>>>>>>>>> would also need to call `configure()` to setup the inner
> serdes. Kafka
> >>>>>>>>>>> Streams would not setup those automatically and one might most
> likely
> >>>>>>>>>>> end-up with an NPE.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Coming back to the KIP and the parameter names. `WindowedSerdes`
> are
> >>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For
> `WindowedSerdes`,
> >>>>>>>>>>> we use the following parameter names:
> >>>>>>>>>>>
> >>>>>>>>>>> - default.windowed.key.serde.inner
> >>>>>>>>>>> - default.windowed.value.serde.inner
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> It might be good to align the naming pattern. I would also
> suggest to
> >>>>>>>>>>> use `type` instead of `impl`?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
> >>>>>>>>>>> default.value.list.serde.impl  ->  default.list.value.serde.type
> >>>>>>>>>>> default.key.list.serde.element  ->  default.list.key.serde.inner
> >>>>>>>>>>> default.value.list.serde.element  ->  default.list.value.serde.inner
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
> >>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is
> that it goes against what Matthias suggested earlier:
> >>>>>>>>>>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you think about that? I hope I’m not confusing
> anything.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for
> the update, too.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Just one more super-small question, do we need this variant:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in
> org.apache.kafka.common.serialization.Serdes class (infers list
> implementation and inner serde from config file)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It seems like this situation implies my config file is
> already set up for the list serde, so passing this serde (e.g., in
> Produced) would have the same effect as not specifying it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I guess that it could be the case that you have the
> `default.key/value.serde` set to something else, like StringSerde, but you
> still have the `default.key/value.list.serde.impl/element` set. This seems
> like it would result in more confusion than convenience, so my gut instinct
> is maybe we shouldn't introduce the `ListSerde()` variant until people
> actually request it later on.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully
> source-code-driven, not half/half.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> -John
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I hope everyone had a great long weekend.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you
> correctly, but I think I already listed them:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So for Produced, you would use it in the following fashion, for example:
> >>>>>>>>>>>>>> Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization
> Strategy” where I describe our logic of conditional serialization based on
> the type of an inner serde.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank you!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for the update, Daniyar!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In addition to specifying the config interface, can you
> also specify
> >>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance
> of this
> >>>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized,
> etc., what
> >>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the
> Serializer
> >>>>>>>>>>>>>> and Deserializer. I don't think you need to specify the
> implementation
> >>>>>>>>>>>>>> logic, since we've already discussed it here.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> If you also want to specify the serialized format of the
> data records
> >>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as
> letting us
> >>>>>>>>>>>>>> verify the schema for forward/backward compatibility
> concerns, etc.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Finally made updates to the KIP:
> >>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>>>>>>>>>>>>> Sorry for the delay :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank You!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, something like this. I did not think about good
> configuration
> >>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all
> proposed
> >>>>>>>>>>>>>> configs atm. But all configs should be listed and explained
> in the KIP
> >>>>>>>>>>>>>> anyway, and we can discuss further after you have updated
> the KIP (I can
> >>>>>>>>>>>>>> ask more detailed question if I have any).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to
> have in a list
> >>>>>>>>>>>>>> of primitives.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As for the default constructor and configurability, just
> want to make
> >>>>>>>>>>>>>> sure. Is this what you have on your mind?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for the update!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and
> `ListSerde`
> >>>>>>>>>>>>>> should have a default constructor and it should be
> possible to pass in
> >>>>>>>>>>>>>> the `Class listClass` information via a configuration.
> Otherwise,
> >>>>>>>>>>>>>> KafkaStreams cannot use it as default serde.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not
> primitive IMHO,
> >>>>>>>>>>>>>> as it is for `byte[]` with variable length -- it's for
> arrays, not for
> >>>>>>>>>>>>>> single `byte` (note, that `Bytes` is a Kafka class wrapping
> `byte[]`).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in
> the KIP
> >>>>>>>>>>>>>> discussion.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can you also update the KIP?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I made and pushed necessary commits, so we could review the
> final
> >>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I also need some advice on writing tests for this new
> serde. So far I
> >>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m
> not sure
> >>>>>>>>>>>>>> if it is enough.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank y’all for your help in this KIP :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey Daniyar,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Looks good to me! Thanks for considering it.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
> >>>>>>>>>>>>>> Hey John and Matthias,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant
> information.
> >>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list
> type. I
> >>>>>>>>>>>>>> realized that the type is not really needed in
> ListSerializer, but
> >>>>>>>>>>>>>> only in ListDeserializer:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if
> serializer is
> >>>>>>>>>>>>>> not a primitive serializer:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Then, in deserializer, we persist passed list type, so that
> during
> >>>>>>>>>>>>>> deserialization we could create an instance of it with
> predefined
> >>>>>>>>>>>>>> listSize for better performance.
> >>>>>>>>>>>>>> We also try to locate a primitiveSize based on passed
> deserializer.
> >>>>>>>>>>>>>> If it is not there, then primitiveSize will be null. Which
> means
> >>>>>>>>>>>>>> that each entry’s size was encoded individually.
> >>>>>>>>>>>>>>
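The size-encoding strategy described above can be sketched in plain Java. This is only a minimal illustration, not the code from the PR: the class `ListEncodingSketch` and its methods are hypothetical, and writing the element count as a leading 4-byte int is an assumption. Fixed-size elements (ints here) are written without per-entry sizes, while variable-size elements (strings here) each carry their own length prefix.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the ListSerializer/ListDeserializer from the PR.
final class ListEncodingSketch {

    // Fixed-size elements (4-byte ints): write the element count once,
    // then the raw values. No per-entry size is stored.
    static byte[] encodeInts(List<Integer> values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.size() * Integer.BYTES);
        buf.putInt(values.size());
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    static List<Integer> decodeInts(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int size = buf.getInt();
        List<Integer> out = new ArrayList<>(size); // pre-sized from the stored count
        for (int i = 0; i < size; i++) {
            out.add(buf.getInt());
        }
        return out;
    }

    // Variable-size elements (UTF-8 strings): each entry is prefixed with its own length.
    static byte[] encodeStrings(List<String> values) {
        List<byte[]> encoded = new ArrayList<>(values.size());
        int total = Integer.BYTES;
        for (String s : values) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            total += Integer.BYTES + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(values.size());
        for (byte[] b : encoded) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }

    static List<String> decodeStrings(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int size = buf.getInt();
        List<String> out = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            byte[] b = new byte[buf.getInt()];
            buf.get(b);
            out.add(new String(b, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

With this layout a three-element integer list takes 4 + 3 * 4 = 16 bytes, while the per-entry variant costs an extra 4 bytes for every element.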
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This looks much cleaner and more concise.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For encoding the list-type: I see John's point about
> re-encoding the
> >>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea
> that the
> >>>>>>>>>>>>>> Deserializer returns a fixed type...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Maybe it's best to allow users to specify the target list type
> on
> >>>>>>>>>>>>>> deserialization via config?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to
> encode the
> >>>>>>>>>>>>>> type size, but users could specify the type on the
> deserializer (via a
> >>>>>>>>>>>>>> config again)?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I
> doubt
> >>>>>>>>>>>>>> we can
> >>>>>>>>>>>>>> support this and a cast will be necessary at some point in
> the user
> >>>>>>>>>>>>>> code.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey Daniyar,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks for looking at it!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Something like your screenshot is more along the lines of
> what I was
> >>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how
> would that not
> >>>>>>>>>>>>>> be "vanilla java"?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Unfortunately the deserializer needs more information,
> though. For
> >>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>?
> The serde
> >>>>>>>>>>>>>> could
> >>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still
> need an
> >>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
> >>>>>>>>>>>>>> /**list type**/ LinkedList.class,
> >>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
> >>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> And in configuration, it's something like:
> >>>>>>>>>>>>>> default.key.serde: org...ListSerde
> >>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
> >>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey John,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I read about TypeReference. It could work for the list
> serde.
> >>>>>>>>>>>>>> However, it is not directly
> >>>>>>>>>>>>>> supported:
> >>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
> >>>>>>>>>>>>>> The only way is to pass an actual class object into the
> constructor,
> >>>>>>>>>>>>>> something like:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you
> think of my
> >>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As
> described
> >>>>>>>>>>>>>> previously)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit
> over
> >>>>>>>>>>>>>> engineered :)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I also wanted to see feedback from Matthias as well, since
> he gave
> >>>>>>>>>>>>>> me an idea about storing fixed/variable size entries.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Daniyar,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> That's a very clever solution!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> One observation is that, now, this is what we might call a
> >>>>>>>>>>>>>> polymorphic
> >>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type
> and then
> >>>>>>>>>>>>>> promising to produce the exact same concrete type on read.
> >>>>>>>>>>>>>> There are
> >>>>>>>>>>>>>> some inherent problems with this approach, which in general
> >>>>>>>>>>>>>> require
> >>>>>>>>>>>>>> some kind of  schema registry (not necessarily Schema
> >>>>>>>>>>>>>> Registry, just
> >>>>>>>>>>>>>> any registry for schemas) to solve.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Notice that every serialized record has quite a bit of
> duplicated
> >>>>>>>>>>>>>> information: the concrete type as well as a byte to indicate
> >>>>>>>>>>>>>> whether
> >>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to
> >>>>>>>>>>>>>> indicate the
> >>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because
> they
> >>>>>>>>>>>>>> tell us
> >>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately,
> this
> >>>>>>>>>>>>>> information is completely redundant. In all likelihood, the
> >>>>>>>>>>>>>> information will be exactly the same for every record in the
> >>>>>>>>>>>>>> topic.
> >>>>>>>>>>>>>> This problem is essentially the core motivation for
> serializations
> >>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization
> >>>>>>>>>>>>>> itself, so
> >>>>>>>>>>>>>> that the records won't contain so much redundant
> information.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to
> >>>>>>>>>>>>>> something
> >>>>>>>>>>>>>> like what you had earlier in which you don't support
> perfectly
> >>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead
> just
> >>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer
> full,
> >>>>>>>>>>>>>> perfect, type preservation to serdes that have an external
> >>>>>>>>>>>>>> system in
> >>>>>>>>>>>>>> which to register their type information.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> There does exist an alternative, if we really do want to
> >>>>>>>>>>>>>> preserve the
> >>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a
> >>>>>>>>>>>>>> configuration option specifically for the serde to configure
> >>>>>>>>>>>>>> what the
> >>>>>>>>>>>>>> list type will be, and maybe what the element type is, as
> well.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As far as "related work" goes, you might be interested to
> take
> >>>>>>>>>>>>>> a look
> >>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a
> specific,
> >>>>>>>>>>>>>> arbitrarily nested, generically parameterized class
> structure.
> >>>>>>>>>>>>>> Specifically, you might find
> >>>>>>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> >>>>>>>>>>>>>> interesting.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> bump
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
>
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Bump

> On Jul 22, 2019, at 11:26 AM, Development <de...@yeralin.net> wrote:
> 
> Hey Matthias,
> 
> It looks a little confusing, but I don’t have enough expertise to judge the configuration placement.
> 
> If you think it is fine, I’ll go ahead with this approach.
> 
> Best,
> Daniyar Yeralin
> 
>> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>> 
>> Good point.
>> 
>> I guess the simplest solution is to actually add
>> 
>>>> default.list.key/value.serde.type
>>>> default.list.key/value.serde.inner
>> 
>> to both `CommonClientConfigs` and `StreamsConfig`.
>> 
>> It's not super clean, but I think it's the best we can do. Thoughts?
>> 
>> 
>> -Matthias
>> 
>> On 7/19/19 1:23 PM, Development wrote:
>>> Hi Matthias,
>>> 
>>> I agree, ConsumerConfig did not seem like the right place for these configurations.
>>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.
>>> 
>>> However, I have a question. What should I do in the `configure(Map<String, ?> configs, boolean isKey)` methods? Which configurations should I try to locate? I was comparing my (de)serializer implementations with the SessionWindowed(De)serializer classes, and they use the StreamsConfig class to get either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS or StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS, depending on `isKey`.
>>> 
>>> In my case, as I mentioned earlier, StreamsConfig class is not accessible from org.apache.kafka.common.serialization package. So, I can’t utilize it. Any suggestions here?
>>> 
>>> Best,
>>> Daniyar Yeralin
>>> 
>>> 
>>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>>>> 
>>>> Thanks!
>>>> 
>>>> One minor question about the configs. The KIP adds three classes, a
>>>> Serializer, a Deserializer, and a Serde.
>>>> 
>>>> Hence, would it make sense to add the corresponding configs to
>>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly
>>>> different names each time?
>>>> 
>>>> 
>>>> Something like this:
>>>> 
>>>> ProducerConfig:
>>>> 
>>>> list.key/value.serializer.type
>>>> list.key/value.serializer.inner
>>>> 
>>>> ConsumerConfig:
>>>> 
>>>> list.key/value.deserializer.type
>>>> list.key/value.deserializer.inner
>>>> 
>>>> StreamsConfig:
>>>> 
>>>> default.list.key/value.serde.type
>>>> default.list.key/value.serde.inner
>>>> 
>>>> 
>>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound right
>>>> to me. Also note that it seems better to avoid the `default.` prefix
>>>> for consumers and producers because there is only one Serializer or
>>>> Deserializer anyway. Only for Streams, there are multiple and
>>>> StreamsConfig specifies the default one if an operator does not
>>>> overwrite it.
>>>> 
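The naming scheme proposed above expands into concrete property names as in the following sketch. It only builds strings: the class `ListConfigNames`, the enum `ClientKind`, and the method `name` are hypothetical helpers, and the names themselves are the proposal from this message, not necessarily what the KIP ends up with.

```java
// Hypothetical helper that spells out the property names proposed in this message.
final class ListConfigNames {

    enum ClientKind {
        PRODUCER("", "serializer"),
        CONSUMER("", "deserializer"),
        STREAMS("default.", "serde"); // only Streams gets the `default.` prefix

        final String prefix;
        final String role;

        ClientKind(String prefix, String role) {
            this.prefix = prefix;
            this.role = role;
        }
    }

    // suffix is "type" (the list implementation) or "inner" (the element serde class).
    static String name(ClientKind kind, boolean isKey, String suffix) {
        return kind.prefix + "list." + (isKey ? "key" : "value") + "." + kind.role + "." + suffix;
    }
}
```

For instance, `name(ClientKind.STREAMS, true, "inner")` yields `default.list.key.serde.inner`, and `name(ClientKind.PRODUCER, false, "type")` yields `list.value.serializer.type`.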
>>>> Thoughts?
>>>> 
>>>> 
>>>> Also, the KIP should explicitly mention to which classes certain configs
>>>> are added. Atm, the KIP only lists parameter names, but does not state
>>>> where those are added.
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 7/16/19 1:11 PM, Development wrote:
>>>>> Hi,
>>>>> 
>>>>> Yes, totally forgot about the statement. KIP-466 is updated.
>>>>> 
>>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman for your valuable input!
>>>>> 
>>>>> I hope I did not cause too much trouble :)
>>>>> 
>>>>> I’ll start the vote now.
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>> 
>>>>>> Hi Daniyar,
>>>>>> 
>>>>>> Thanks for that update. I took a look, and I think this is in good shape.
>>>>>> 
>>>>>> One note, the statement "New method public static <T> Serde<List<T>>
>>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
>>>>>> (infers list implementation and inner serde from config file)" is
>>>>>> still present in the KIP, although I do see it was removed from the PR.
>>>>>> 
>>>>>> Once you remove that statement from the KIP, then I think this KIP is
>>>>>> ready to go up for a vote! Then, we can really review the PR in
>>>>>> earnest and get this thing merged.
>>>>>> 
>>>>>> Thanks,
>>>>>> -john
>>>>>> 
>>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
>>>>>>> 
>>>>>>> Feel free to put any comments in there.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
>>>>>>>> 
>>>>>>>> Hi John,
>>>>>>>> 
>>>>>>>> I knew I was missing something. Yes, that makes sense now, I removed all `listSerde()` methods, and left empty constructors instead.
>>>>>>>> 
>>>>>>>> As for `CommonClientConfigs`, I looked at the class; it doesn’t have any properties related to serdes, and that bothers me a little.
>>>>>>>> 
>>>>>>>> All properties like `default.key.serde` and `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want to create confusion.
>>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and its (de)serializers are not located in org.apache.kafka.common.serialization. I guess it kind of makes sense since windowed serdes are only available for kafka streams, not vice versa.
>>>>>>>> 
>>>>>>>> If everyone is okay to put list properties in `CommonClientConfigs` class, I’ll go ahead and do that then.
>>>>>>>> 
>>>>>>>> Thank you for your input!
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Daniyar Yeralin
>>>>>>>> 
>>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> Regarding the placement, you might as well move the constants to `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and the configs and the code are in the same module.
>>>>>>>>> 
>>>>>>>>> Regarding the constructor... What Matthias said is correct: The serde, serializer, and deserializer all need to have zero-arg constructors so they can be instantiated reflectively by Kafka. However, the factory method you proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a constructor, and is not required. It would be used purely from the Java interface, but has the drawbacks I listed above. This method, not the constructor, is what I proposed to remove from the KIP.
>>>>>>>>> 
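The distinction between the zero-arg constructor and the factory method can be illustrated with a small, self-contained sketch. `ToySerde` and `ToyListSerde` are hypothetical stand-ins, not Kafka classes; the sketch only shows why a zero-arg constructor is needed to create an instance from a class name alone, and why `configure()` has to run before the instance is usable. The property names are the ones discussed in this thread.

```java
import java.util.Map;

// Hypothetical illustration; ToySerde is NOT Kafka's Serde interface.
final class ReflectiveSerdeSketch {

    interface ToySerde {
        void configure(Map<String, ?> configs, boolean isKey);
    }

    static final class ToyListSerde implements ToySerde {
        String innerClassName; // stays null until configure() is called

        // Zero-arg constructor: the only one a reflective caller can rely on.
        public ToyListSerde() { }

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {
            String property = isKey
                    ? "default.list.key.serde.inner"
                    : "default.list.value.serde.inner";
            innerClassName = (String) configs.get(property);
        }
    }

    // Roughly what a config-driven framework does: class name in, configured instance out.
    static ToySerde instantiate(String className, Map<String, ?> configs, boolean isKey) {
        try {
            ToySerde serde = (ToySerde) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
            serde.configure(configs, isKey);
            return serde;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

Creating `new ToyListSerde()` directly and skipping `configure()` leaves `innerClassName` null, which mirrors the NPE risk raised elsewhere in the thread.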
>>>>>>>>> Thanks,
>>>>>>>>> -John
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
>>>>>>>>> One problem though.
>>>>>>>>> 
>>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m trying to mimic the implementation of my ListSerde accordingly.
>>>>>>>>> 
>>>>>>>>> I created a couple of constants under StreamsConfig:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> And I am trying to use a similar construct:
>>>>>>>>> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>>>>>>>>> But then found out that StreamsConfig is not accessible from org.apache.kafka.common.serialization package while window serde (de)serializers are located under org.apache.kafka.streams.kstream package.
>>>>>>>>> 
>>>>>>>>> What should I do? Should I move my classes under org.apache.kafka.streams.kstream package instead?
>>>>>>>>> 
>>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Matthias,
>>>>>>>>>> 
>>>>>>>>>> Thank you for your input.
>>>>>>>>>> 
>>>>>>>>>> I updated the KIP, made it a little more readable.
>>>>>>>>>> 
>>>>>>>>>> I think the configuration parameters strategy is finalized then.
>>>>>>>>>> 
>>>>>>>>>> Do you have any other questions/concerns regarding this KIP?
>>>>>>>>>> 
>>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit them under my PR.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>> 
>>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Daniyar,
>>>>>>>>>>> 
>>>>>>>>>>> thanks for the update to the KIP. It's in really good shape and well
>>>>>>>>>>> written.
>>>>>>>>>>> 
>>>>>>>>>>> About the default constructor question:
>>>>>>>>>>> 
>>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default constructor to
>>>>>>>>>>> create them easily via reflection when specified in a config. I
>>>>>>>>>>> understand that it is not super user friendly, but all existing code
>>>>>>>>>>> works this way. Hence, it seems best to stick with the established pattern.
>>>>>>>>>>> 
>>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
>>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve user
>>>>>>>>>>> experience that addresses the exact issue John raised. (cf
>>>>>>>>>>> https://github.com/apache/kafka/pull/7067)
>>>>>>>>>>> 
>>>>>>>>>>> Note that if a user instantiates the Serde manually, the user
>>>>>>>>>>> would also need to call `configure()` to set up the inner serdes. Kafka
>>>>>>>>>>> Streams would not set those up automatically, and one would most likely
>>>>>>>>>>> end up with an NPE.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Coming back to the KIP and the parameter names: `WindowedSerdes` are
>>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>>>>>>>>>> we use the following parameter names:
>>>>>>>>>>> 
>>>>>>>>>>> - default.windowed.key.serde.inner
>>>>>>>>>>> - default.windowed.value.serde.inner
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> It might be good to align the naming pattern. I would also suggest to
>>>>>>>>>>> use `type` instead of `impl`?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>>>>>>>>>> default.value.list.serde.impl  ->  default.list.value.serde.type
>>>>>>>>>>> default.key.list.serde.element  ->  default.list.key.serde.inner
>>>>>>>>>>> default.value.list.serde.element  ->  default.list.value.serde.inner
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> -Matthias
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>>>>>>>>> Hi John,
>>>>>>>>>>>> 
>>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
>>>>>>>>>>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>>>>>>>>>>> 
>>>>>>>>>>>> What do you think about that? I hope I’m not confusing anything.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Just one more super-small question, do we need this variant:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I hope everyone had a great long weekend.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for the update, Daniyar!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In addition to specifying the config interface, can you also specify
>>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance of this
>>>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the Serializer
>>>>>>>>>>>>>> and Deserializer. I don't think you need to specify the implementation
>>>>>>>>>>>>>> logic, since we've already discussed it here.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If you also want to specify the serialized format of the data records
>>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as letting us
>>>>>>>>>>>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>>> Sorry for the delay :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thank You!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, something like this. I did not think about good configuration
>>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all proposed
>>>>>>>>>>>>>> configs atm. But all configs should be listed and explained in the KIP
>>>>>>>>>>>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>>>>>>>>>>>> ask more detailed question if I have any).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to have in a list
>>>>>>>>>>>>>> of primitives.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As for the default constructor and configurability, just want to make
>>>>>>>>>>>>>> sure. Is this what you have on your mind?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for the update!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>>>>>>>>>>> should have a default constructor and it should be possible to pass in
>>>>>>>>>>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>>>>>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>>>>>>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>>>>>>>>>>>>> a single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>>>>>>>>>>> discussion.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Can you also update the KIP?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I made and pushed necessary commits, so we could review the final
>>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>>>>>>>>>>> if it is enough.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>> Hey John and Matthias,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>>>>>>>>>> realized that the type is not really needed in ListSerializer, but
>>>>>>>>>>>>>> only in ListDeserializer:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if the serializer is
>>>>>>>>>>>>>> not a primitive serializer:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Then, in the deserializer, we persist the passed list type, so that during
>>>>>>>>>>>>>> deserialization we can create an instance of it with a predefined
>>>>>>>>>>>>>> listSize for better performance.
>>>>>>>>>>>>>> We also try to locate a primitiveSize based on the passed deserializer.
>>>>>>>>>>>>>> If it is not there, then primitiveSize will be null, which means
>>>>>>>>>>>>>> that each entry’s size was encoded individually.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This looks much cleaner and more concise.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>>>>>>>>>> Deserializer returns a fixed type...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>>>>>>>>>> deserialization via config?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>>>>>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>>>>>>>>>> config again)?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>>>>>>>>>>>>>> support this and a cast will be necessary at some point in the user code.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for looking at it!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>>>>>>>>>> be "vanilla java"?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>? The serde could
>>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>>>>>>> )
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> And in configuration, it's something like:
>>>>>>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I read about TypeReference. It could work for the list serde.
>>>>>>>>>>>>>> However, it is not directly supported:
>>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>>>>>>> The only way is to pass an actual class object into the constructor,
>>>>>>>>>>>>>> something like:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you think of my
>>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As described
>>>>>>>>>>>>>> previously)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit
>>>>>>>>>>>>>> over-engineered :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I also wanted to see feedback from Matthias, since he gave
>>>>>>>>>>>>>> me an idea about storing fixed/variable size entries.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> That's a very clever solution!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> One observation is that, now, this is what we might call a polymorphic
>>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type and then
>>>>>>>>>>>>>> promising to produce the exact same concrete type on read. There are
>>>>>>>>>>>>>> some inherent problems with this approach, which in general require
>>>>>>>>>>>>>> some kind of schema registry (not necessarily Schema Registry, just
>>>>>>>>>>>>>> any registry for schemas) to solve.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Notice that every serialized record has quite a bit of duplicated
>>>>>>>>>>>>>> information: the concrete type as well as a byte to indicate whether
>>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to indicate the
>>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because they tell us
>>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately, this
>>>>>>>>>>>>>> information is completely redundant. In all likelihood, the
>>>>>>>>>>>>>> information will be exactly the same for every record in the topic.
>>>>>>>>>>>>>> This problem is essentially the core motivation for serializations
>>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization itself, so
>>>>>>>>>>>>>> that the records won't contain so much redundant information.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to something
>>>>>>>>>>>>>> like what you had earlier in which you don't support perfectly
>>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead just
>>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer full,
>>>>>>>>>>>>>> perfect, type preservation to serdes that have an external system in
>>>>>>>>>>>>>> which to register their type information.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There does exist an alternative, if we really do want to preserve the
>>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>>>>>>>>>> configuration option specifically for the serde to configure what the
>>>>>>>>>>>>>> list type will be, and maybe what the element type is, as well.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As far as "related work" goes, you might be interested to take a look
>>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a specific,
>>>>>>>>>>>>>> arbitrarily nested, generically parameterized class structure.
>>>>>>>>>>>>>> Specifically, you might find
>>>>>>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>>>>>>> interesting.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> bump
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
> 
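
For reference, the size-encoding strategy discussed in the quoted thread above (per-entry sizes are written only when the inner serializer is not a primitive, fixed-size one) could look roughly like the following. This is a minimal sketch under assumptions, not the code from the PR: the class `ListWireFormatSketch` and its methods are hypothetical, it hard-codes `Integer` and `String` as the fixed-size and variable-size cases, and it does not handle null lists or null entries.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Kafka or the PR.
final class ListWireFormatSketch {

    // Fixed-size elements (here: int). Layout: [count][e0][e1]...
    // No per-entry size is stored because every entry is Integer.BYTES long.
    static byte[] serializeInts(List<Integer> values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.size() * Integer.BYTES);
        buf.putInt(values.size());
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    static List<Integer> deserializeInts(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        List<Integer> out = new ArrayList<>(count); // pre-sized from the encoded count
        for (int i = 0; i < count; i++) {
            out.add(buf.getInt());
        }
        return out;
    }

    // Variable-size elements (here: UTF-8 strings). Layout: [count]([size][bytes])...
    // Each entry carries its own size because the lengths differ.
    static byte[] serializeStrings(List<String> values) {
        List<byte[]> encoded = new ArrayList<>(values.size());
        int total = Integer.BYTES;
        for (String s : values) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            total += Integer.BYTES + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(values.size());
        for (byte[] b : encoded) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }

    static List<String> deserializeStrings(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] b = new byte[buf.getInt()];
            buf.get(b);
            out.add(new String(b, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

With this layout, a list of three ints takes 4 + 3 * 4 = 16 bytes, whereas a list of strings pays an extra 4-byte size per entry.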


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hey Matthias,

It looks a little confusing, but I don’t have enough expertise to judge the configuration placement.

If you think it is fine, I’ll go ahead with this approach.
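
A rough sketch of how a `configure(Map<String, ?> configs, boolean isKey)` lookup could work without referencing `StreamsConfig` is shown here. It is plain Java with no Kafka dependency and rests on assumptions: the class `ListSerdeConfigSketch` and its constant names are hypothetical, and only the key strings follow the `default.list.key/value.serde.type` naming proposed in this thread.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a list (de)serializer; it only demonstrates the config lookup.
final class ListSerdeConfigSketch {

    // Key strings as proposed in the thread; the Java constant names are placeholders.
    static final String LIST_KEY_TYPE = "default.list.key.serde.type";
    static final String LIST_VALUE_TYPE = "default.list.value.serde.type";

    private Class<?> listClass; // stays null until configure() is called

    // Zero-arg constructor so the class can be instantiated reflectively from a config.
    ListSerdeConfigSketch() { }

    void configure(Map<String, ?> configs, boolean isKey) {
        // Pick the key-side or value-side property, mirroring the windowed serdes.
        String propertyName = isKey ? LIST_KEY_TYPE : LIST_VALUE_TYPE;
        Object value = configs.get(propertyName);
        if (value == null) {
            throw new IllegalArgumentException("Missing config: " + propertyName);
        }
        Class<?> resolved;
        try {
            // Accept either a Class object or a fully qualified class name.
            resolved = value instanceof Class ? (Class<?>) value : Class.forName(value.toString());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown list class: " + value, e);
        }
        if (!List.class.isAssignableFrom(resolved)) {
            throw new IllegalArgumentException(resolved.getName() + " is not a java.util.List");
        }
        listClass = resolved;
    }

    @SuppressWarnings("unchecked")
    <T> List<T> newList() {
        // Fail with a clear message instead of an NPE when configure() was skipped.
        if (listClass == null) {
            throw new IllegalStateException("configure() must be called first");
        }
        try {
            return (List<T>) listClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + listClass.getName(), e);
        }
    }
}
```

Keeping the key strings as constants inside the serialization package (or in whichever config class is finally chosen) avoids the dependency on the streams module.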

Best,
Daniyar Yeralin

> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <ma...@confluent.io> wrote:
> 
> Good point.
> 
> I guess the simplest solution is to actually add
> 
>>> default.list.key/value.serde.type
>>> default.list.key/value.serde.inner
> 
> to both `CommonClientConfigs` and `StreamsConfig`.
> 
> It's not super clean, but I think it's the best we can do. Thoughts?
> 
> 
> -Matthias
> 
> On 7/19/19 1:23 PM, Development wrote:
>> Hi Matthias,
>> 
>> I agree, CommonClientConfigs did not seem like the right place for these configurations.
>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.
>> 
>> However, I have a question. What should I do in the "configure(Map<String, ?> configs, boolean isKey)" methods? Which configurations should I try to locate? I was comparing my (de)serializer implementations with the SessionWindowed(De)serializer classes, and they use the StreamsConfig class to get either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS or StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS.
>> 
>> In my case, as I mentioned earlier, the StreamsConfig class is not accessible from the org.apache.kafka.common.serialization package, so I can’t use it. Any suggestions here?
>> 
>> Best,
>> Daniyar Yeralin
>> 
>> 
>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>>> 
>>> Thanks!
>>> 
>>> One minor question about the configs. The KIP adds three classes, a
>>> Serializer, a Deserializer, and a Serde.
>>> 
>>> Hence, would it make sense to add the corresponding configs to
>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly
>>> different names each time?
>>> 
>>> 
>>> Something like this:
>>> 
>>> ProducerConfig:
>>> 
>>> list.key/value.serializer.type
>>> list.key/value.serializer.inner
>>> 
>>> ConsumerConfig:
>>> 
>>> list.key/value.deserializer.type
>>> list.key/value.deserializer.inner
>>> 
>>> StreamsConfig:
>>> 
>>> default.list.key/value.serde.type
>>> default.list.key/value.serde.inner
>>> 
>>> 
>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound right
>>> to me. Also note that it seems better to avoid the `default.` prefix
>>> for consumers and producers because there is only one Serializer or
>>> Deserializer anyway. Only for Streams are there multiple, and
>>> StreamsConfig specifies the default one if an operator does not
>>> overwrite it.
>>> 
>>> Thoughts?
>>> 
>>> 
>>> Also, the KIP should explicitly mention to which classes certain configs
>>> are added. Atm, the KIP only lists parameter names, but does not state
>>> where those are added.
>>> 
>>> 
>>> -Matthias
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 7/16/19 1:11 PM, Development wrote:
>>>> Hi,
>>>> 
>>>> Yes, totally forgot about the statement. KIP-466 is updated.
>>>> 
>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman for your valuable input!
>>>> 
>>>> I hope I did not cause too much trouble :)
>>>> 
>>>> I’ll start the vote now.
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io> wrote:
>>>>> 
>>>>> Hi Daniyar,
>>>>> 
>>>>> Thanks for that update. I took a look, and I think this is in good shape.
>>>>> 
>>>>> One note, the statement "New method public static <T> Serde<List<T>>
>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
>>>>> (infers list implementation and inner serde from config file)" is
>>>>> still present in the KIP, although I do see it was removed from the PR.
>>>>> 
>>>>> Once you remove that statement from the KIP, then I think this KIP is
>>>>> ready to go up for a vote! Then, we can really review the PR in
>>>>> earnest and get this thing merged.
>>>>> 
>>>>> Thanks,
>>>>> -john
>>>>> 
>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
>>>>>> 
>>>>>> Feel free to put any comments in there.
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
>>>>>>> 
>>>>>>> Hi John,
>>>>>>> 
>>>>>>> I knew I was missing something. Yes, that makes sense now, I removed all `listSerde()` methods, and left empty constructors instead.
>>>>>>> 
>>>>>>> As for `CommonClientConfigs`, I looked at the class; it doesn’t have any properties related to serdes, and that bothers me a little.
>>>>>>> 
>>>>>>> All properties like `default.key.serde` and `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want to create confusion.
>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and its (de)serializers are not located in org.apache.kafka.common.serialization. I guess it kind of makes sense since windowed serdes are only available for Kafka Streams, not vice versa.
>>>>>>> 
>>>>>>> If everyone is okay with putting the list properties in the `CommonClientConfigs` class, I’ll go ahead and do that.
>>>>>>> 
>>>>>>> Thank you for your input!
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> Regarding the placement, you might as well move the constants to `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and the configs and the code are in the same module.
>>>>>>>> 
>>>>>>>> Regarding the constructor... What Matthias said is correct: The serde, serializer, and deserializer all need to have zero-arg constructors so they can be instantiated reflectively by Kafka. However, the factory method you proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a constructor, and is not required. It would be used purely from the Java interface, but has the drawbacks I listed above. This method, not the constructor, is what I proposed to remove from the KIP.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> -John
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
>>>>>>>> One problem though.
>>>>>>>> 
>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) is so similar, I’m trying to model my ListSerde implementation on it.
>>>>>>>> 
>>>>>>>> I created a couple of constants under StreamsConfig:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> And I’m trying to use a similar construct:
>>>>>>>> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>>>>>>>> But then I found out that StreamsConfig is not accessible from the org.apache.kafka.common.serialization package, while the windowed serde (de)serializers are located under the org.apache.kafka.streams.kstream package.
>>>>>>>> 
>>>>>>>> What should I do? Should I move my classes under org.apache.kafka.streams.kstream package instead?
>>>>>>>> 
>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Matthias,
>>>>>>>>> 
>>>>>>>>> Thank you for your input.
>>>>>>>>> 
>>>>>>>>> I updated the KIP, made it a little more readable.
>>>>>>>>> 
>>>>>>>>> I think the configuration parameters strategy is finalized then.
>>>>>>>>> 
>>>>>>>>> Do you have any other questions/concerns regarding this KIP?
>>>>>>>>> 
>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit them under my PR.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>> 
>>>>>>>>>> Daniyar,
>>>>>>>>>> 
>>>>>>>>>> thanks for the update to the KIP. It's in really good shape and well
>>>>>>>>>> written.
>>>>>>>>>> 
>>>>>>>>>> About the default constructor question:
>>>>>>>>>> 
>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default constructor so
>>>>>>>>>> that they can be created via reflection when specified in a config. I
>>>>>>>>>> understand that it is not super user friendly, but all existing code
>>>>>>>>>> works this way. Hence, it seems best to stick with the established pattern.
>>>>>>>>>> 
>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve the user
>>>>>>>>>> experience that addresses the exact issue John raised. (cf
>>>>>>>>>> https://github.com/apache/kafka/pull/7067)
>>>>>>>>>> 
>>>>>>>>>> Note that if a user instantiates the Serde manually, the user
>>>>>>>>>> also needs to call `configure()` to set up the inner serdes. Kafka
>>>>>>>>>> Streams would not set those up automatically, and one would most likely
>>>>>>>>>> end up with an NPE.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Coming back to the KIP and the parameter names: `WindowedSerdes` are
>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>>>>>>>>> we use the following parameter names:
>>>>>>>>>> 
>>>>>>>>>> - default.windowed.key.serde.inner
>>>>>>>>>> - default.windowed.value.serde.inner
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> It might be good to align the naming pattern. I would also suggest to
>>>>>>>>>> use `type` instead of `impl`?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>>>>>>>>> default.value.list.serde.impl  ->  default.list.value.serde.type
>>>>>>>>>> default.key.list.serde.element  ->  default.list.key.serde.inner
>>>>>>>>>> default.value.list.serde.element  ->  default.list.value.serde.inner
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -Matthias
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>>>>>>>> Hi John,
>>>>>>>>>>> 
>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
>>>>>>>>>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>>>>>>>>>> 
>>>>>>>>>>> What do you think about that? I hope I’m not confusing anything.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>>>>>>>>>>> 
>>>>>>>>>>>> Just one more super-small question, do we need this variant:
>>>>>>>>>>>> 
>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>>>>>>>>>>> 
>>>>>>>>>>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it.
>>>>>>>>>>>> 
>>>>>>>>>>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>>>>>>>>>>> 
>>>>>>>>>>>> What do you think?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -John
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I hope everyone had a great long weekend.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for the update, Daniyar!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In addition to specifying the config interface, can you also specify
>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance of this
>>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the Serializer
>>>>>>>>>>>>> and Deserializer. I don't think you need to specify the implementation
>>>>>>>>>>>>> logic, since we've already discussed it here.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If you also want to specify the serialized format of the data records
>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as letting us
>>>>>>>>>>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> John
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>> Sorry for the delay :)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank You!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, something like this. I did not think about good configuration
>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all proposed
>>>>>>>>>>>>> configs atm. But all configs should be listed and explained in the KIP
>>>>>>>>>>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>>>>>>>>>>> ask more detailed question if I have any).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to have in a list
>>>>>>>>>>>>> of primitives.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> As for the default constructor and configurability, just want to make
>>>>>>>>>>>>> sure. Is this what you have on your mind?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for the update!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>>>>>>>>>> should have a default constructor and it should be possible to pass in
>>>>>>>>>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>>>>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>>>>>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for a
>>>>>>>>>>>>> single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>>>>>>>>>> discussion.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Can you also update the KIP?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I made and pushed necessary commits, so we could review the final
>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>>>>>>>>>> if it is enough.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>> Hey John and Matthias,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>>>>>>>>> realized that the type is not really needed in ListSerializer, but
>>>>>>>>>>>>> only in ListDeserializer:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if serializer is
>>>>>>>>>>>>> not a primitive serializer:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Then, in the deserializer, we persist the passed list type, so that during
>>>>>>>>>>>>> deserialization we can create an instance of it with a predefined
>>>>>>>>>>>>> listSize for better performance.
>>>>>>>>>>>>> We also try to locate a primitiveSize based on the passed deserializer.
>>>>>>>>>>>>> If it is not there, then primitiveSize will be null, which means
>>>>>>>>>>>>> that each entry’s size was encoded individually.
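
The strategy described above can be sketched with plain JDK classes. This is only an illustration under stated assumptions, not the code from the PR: the byte layout (a 4-byte list size, then either raw fixed-size payloads or length-prefixed entries) and all method names are made up for the example. The list type is passed to the deserializer rather than encoded in the bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class ListWireFormatSketch {

    // Fixed-size elements (4-byte ints): list size, then raw payloads, no per-entry size.
    public static byte[] serializeInts(List<Integer> list) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + list.size() * Integer.BYTES);
        buf.putInt(list.size());
        for (int v : list) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Variable-size elements (UTF-8 strings): list size, then (entry size, payload) pairs.
    public static byte[] serializeStrings(List<String> list) {
        List<byte[]> encoded = new ArrayList<>(list.size());
        int total = Integer.BYTES;
        for (String s : list) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            total += Integer.BYTES + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(list.size());
        for (byte[] b : encoded) {
            buf.putInt(b.length);
            buf.put(b);
        }
        return buf.array();
    }

    // The target list class comes from the caller, not from the serialized bytes.
    public static List<Integer> deserializeInts(byte[] data, Class<?> listClass) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int size = buf.getInt();
        List<Integer> list = newList(listClass);
        for (int i = 0; i < size; i++) {
            list.add(buf.getInt());
        }
        return list;
    }

    public static List<String> deserializeStrings(byte[] data, Class<?> listClass) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int size = buf.getInt();
        List<String> list = newList(listClass);
        for (int i = 0; i < size; i++) {
            byte[] b = new byte[buf.getInt()]; // per-entry size prefix
            buf.get(b);
            list.add(new String(b, StandardCharsets.UTF_8));
        }
        return list;
    }

    @SuppressWarnings("unchecked")
    private static <T> List<T> newList(Class<?> listClass) {
        try {
            return (List<T>) listClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + listClass, e);
        }
    }

    public static void main(String[] args) {
        byte[] ints = serializeInts(Arrays.asList(1, 2, 3));
        byte[] strs = serializeStrings(Arrays.asList("a", "bc"));
        System.out.println(deserializeInts(ints, LinkedList.class));
        System.out.println(deserializeStrings(strs, ArrayList.class));
    }
}
```

In this layout, three ints take 16 bytes (4 for the list size plus 3 x 4), while each string costs an extra 4 bytes for its own size prefix.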
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This looks much cleaner and more concise.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>>>>>>>>> Deserializer returns a fixed type...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>>>>>>>>> deserialization via config?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>>>>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>>>>>>>>> config again)?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>>>>>>>>>>>>> support this and a cast will be necessary at some point in the user code.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for looking at it!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>>>>>>>>> be "vanilla java"?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>? The serde could
>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>>>>>> )
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And in configuration, it's something like:
>>>>>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I read about TypeReference. It could work for the list serde.
>>>>>>>>>>>>> However, it is not directly supported:
>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>>>>>> The only way is to pass an actual class object into the constructor,
>>>>>>>>>>>>> something like:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you think of my
>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As described
>>>>>>>>>>>>> previously)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit over
>>>>>>>>>>>>> engineered :)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I also wanted to see feedback from Matthias as well, since he gave
>>>>>>>>>>>>> me the idea about storing fixed/variable size entries.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That's a very clever solution!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> One observation is that, now, this is what we might call a polymorphic
>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type and then
>>>>>>>>>>>>> promising to produce the exact same concrete type on read. There are
>>>>>>>>>>>>> some inherent problems with this approach, which in general require
>>>>>>>>>>>>> some kind of schema registry (not necessarily Schema Registry, just
>>>>>>>>>>>>> any registry for schemas) to solve.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Notice that every serialized record has quite a bit of duplicated
>>>>>>>>>>>>> information: the concrete type as well as a byte to indicate whether
>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to indicate the
>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because they tell us
>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately, this
>>>>>>>>>>>>> information is completely redundant. In all likelihood, the
>>>>>>>>>>>>> information will be exactly the same for every record in the topic.
>>>>>>>>>>>>> This problem is essentially the core motivation for serializations
>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization itself, so
>>>>>>>>>>>>> that the records won't contain so much redundant information.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to something
>>>>>>>>>>>>> like what you had earlier in which you don't support perfectly
>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead just
>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer full,
>>>>>>>>>>>>> perfect, type preservation to serdes that have an external system in
>>>>>>>>>>>>> which to register their type information.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There does exist an alternative, if we really do want to preserve the
>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>>>>>>>>> configuration option specifically for the serde to configure what the
>>>>>>>>>>>>> list type will be, and maybe what the element type is, as well.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> As far as "related work" goes, you might be interested to take a look
>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a specific,
>>>>>>>>>>>>> arbitrarily nested, generically parameterized class structure.
>>>>>>>>>>>>> Specifically, you might find
>>>>>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>>>>>> interesting.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> bump
>>>>>>>>>>>>> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Good point.

I guess the simplest solution is to actually add

>> default.list.key/value.serde.type
>> default.list.key/value.serde.inner

to both `CommonClientConfigs` and `StreamsConfig`.

It's not super clean, but I think it's the best we can do. Thoughts?


-Matthias

On 7/19/19 1:23 PM, Development wrote:
> Hi Matthias,
> 
> I agree, ConsumerConfig did not seem like the right place for these configurations.
> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.
> 
> However, I have a question. What should I do in the "configure(Map<String, ?> configs, boolean isKey)" methods? Which configurations should I try to locate? I was comparing my (de)serializer implementations with the SessionWindowed(De)serializer classes, and they use the StreamsConfig class to get either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS or StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS.
> 
> In my case, as I mentioned earlier, StreamsConfig class is not accessible from org.apache.kafka.common.serialization package. So, I can’t utilize it. Any suggestions here?
> 
> Best,
> Daniyar Yeralin
> 
> 
>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>>
>> Thanks!
>>
>> One minor question about the configs. The KIP adds three classes, a
>> Serializer, a Deserializer, and a Serde.
>>
>> Hence, would it make sense to add the corresponding configs to
>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly
>> different names each time?
>>
>>
>> Something like this:
>>
>> ProducerConfig:
>>
>> list.key/value.serializer.type
>> list.key/value.serializer.inner
>>
>> ConsumerConfig:
>>
>> list.key/value.deserializer.type
>> list.key/value.deserializer.inner
>>
>> StreamsConfig:
>>
>> default.list.key/value.serde.type
>> default.list.key/value.serde.inner
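
As a rough illustration of the naming scheme proposed above, the key variants could be set as plain properties like this. The config names are the ones proposed in this thread (expanded from the `key/value` shorthand to the key variant only) and may differ from what is finally merged; the inner classes are existing Kafka Integer (de)serializer and serde classes, referenced here only by name.

```java
import java.util.Properties;

public class ListConfigNamesSketch {

    // Producer side: no "default." prefix, since there is only one key serializer.
    public static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("list.key.serializer.type", "java.util.ArrayList");
        p.setProperty("list.key.serializer.inner",
                "org.apache.kafka.common.serialization.IntegerSerializer");
        return p;
    }

    // Consumer side: mirrors the producer names with "deserializer".
    public static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("list.key.deserializer.type", "java.util.ArrayList");
        p.setProperty("list.key.deserializer.inner",
                "org.apache.kafka.common.serialization.IntegerDeserializer");
        return p;
    }

    // Streams side: "default." prefix, because operators may override the serde.
    public static Properties streamsProps() {
        Properties p = new Properties();
        p.setProperty("default.list.key.serde.type", "java.util.ArrayList");
        p.setProperty("default.list.key.serde.inner",
                "org.apache.kafka.common.serialization.Serdes$IntegerSerde");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(producerProps());
        System.out.println(consumerProps());
        System.out.println(streamsProps());
    }
}
```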
>>
>>
>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound right
>> to me. Also note that it seems better to avoid the `default.` prefix
>> for consumers and producers because there is only one Serializer or
>> Deserializer anyway. Only for Streams are there multiple, and
>> StreamsConfig specifies the default one if an operator does not
>> overwrite it.
>>
>> Thoughts?
>>
>>
>> Also, the KIP should explicitly mention to which classes certain configs
>> are added. Atm, the KIP only lists parameter names, but does not state
>> where those are added.
>>
>>
>> -Matthias
>>
>>
>>
>>
>>
>> On 7/16/19 1:11 PM, Development wrote:
>>> Hi,
>>>
>>> Yes, totally forgot about the statement. KIP-466 is updated.
>>>
>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman for your valuable input!
>>>
>>> I hope I did not cause too much trouble :)
>>>
>>> I’ll start the vote now.
>>>
>>> Best,
>>> Daniyar Yeralin
>>>
>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io> wrote:
>>>>
>>>> Hi Daniyar,
>>>>
>>>> Thanks for that update. I took a look, and I think this is in good shape.
>>>>
>>>> One note, the statement "New method public static <T> Serde<List<T>>
>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
>>>> (infers list implementation and inner serde from config file)" is
>>>> still present in the KIP, although I believe it was removed from the PR.
>>>>
>>>> Once you remove that statement from the KIP, then I think this KIP is
>>>> ready to go up for a vote! Then, we can really review the PR in
>>>> earnest and get this thing merged.
>>>>
>>>> Thanks,
>>>> -john
>>>>
>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
>>>>>
>>>>> Feel free to put any comments in there.
>>>>>
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>>
>>>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
>>>>>>
>>>>>> Hi John,
>>>>>>
>>>>>> I knew I was missing something. Yes, that makes sense now, I removed all `listSerde()` methods, and left empty constructors instead.
>>>>>>
>>>>>> As for `CommonClientConfigs`, I looked at the class; it doesn’t have any properties related to serdes, and that bothers me a little.
>>>>>>
>>>>>> All properties like `default.key.serde` and `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want to create confusion.
>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and its (de)serializers are not located in org.apache.kafka.common.serialization. I guess it kind of makes sense since windowed serdes are only available for Kafka Streams, not vice versa.
>>>>>>
>>>>>> If everyone is okay to put list properties in `CommonClientConfigs` class, I’ll go ahead and do that then.
>>>>>>
>>>>>> Thank you for your input!
>>>>>>
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>>
>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Regarding the placement, you might as well move the constants to `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and the configs and the code are in the same module.
>>>>>>>
>>>>>>> Regarding the constructor... What Matthias said is correct: The serde, serializer, and deserializer all need to have zero-arg constructors so they can be instantiated reflectively by Kafka. However, the factory method you proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a constructor, and is not required. It would be used purely from the Java interface, but has the drawbacks I listed above. This method, not the constructor, is what I proposed to remove from the KIP.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
>>>>>>> One problem though.
>>>>>>>
>>>>>>> Since WindowedSerde (Windowed(De)Serializer) is so similar, I’m trying to model the implementation of my ListSerde on it.
>>>>>>>
>>>>>>> I created a couple of constants under StreamsConfig:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> And I am trying to use a similar construct:
>>>>>>> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>>>>>>> But then I found out that StreamsConfig is not accessible from the org.apache.kafka.common.serialization package, while the windowed serde (de)serializers are located under the org.apache.kafka.streams.kstream package.
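
One way to picture the `isKey` lookup without depending on StreamsConfig is to declare the property names next to the (de)serializer. This is a JDK-only sketch under that assumption; the class name, constant names and property strings are illustrative and follow the names discussed in this thread, not a final API.

```java
import java.util.HashMap;
import java.util.Map;

public class InnerConfigLookupSketch {

    // Hypothetical constants, declared locally instead of in StreamsConfig.
    static final String LIST_KEY_INNER = "default.list.key.serde.inner";
    static final String LIST_VALUE_INNER = "default.list.value.serde.inner";

    /** Picks the key or value property and returns the configured class name. */
    public static String innerClassName(Map<String, ?> configs, boolean isKey) {
        final String propertyName = isKey ? LIST_KEY_INNER : LIST_VALUE_INNER;
        Object value = configs.get(propertyName);
        if (value == null) {
            throw new IllegalArgumentException("Missing config: " + propertyName);
        }
        // Configs may carry either a Class object or a class name string.
        return value instanceof Class ? ((Class<?>) value).getName() : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        configs.put(LIST_KEY_INNER, "com.example.KeyInnerSerde");   // made-up class name
        configs.put(LIST_VALUE_INNER, String.class);                // Class object form
        System.out.println(innerClassName(configs, true));
        System.out.println(innerClassName(configs, false));
    }
}
```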
>>>>>>>
>>>>>>> What should I do? Should I move my classes under org.apache.kafka.streams.kstream package instead?
>>>>>>>
>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>>>>>>>>
>>>>>>>> Hi Matthias,
>>>>>>>>
>>>>>>>> Thank you for your input.
>>>>>>>>
>>>>>>>> I updated the KIP, made it a little more readable.
>>>>>>>>
>>>>>>>> I think the configuration parameters strategy is finalized then.
>>>>>>>>
>>>>>>>> Do you have any other questions/concerns regarding this KIP?
>>>>>>>>
>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit them under my PR.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Daniyar Yeralin
>>>>>>>>
>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>> Daniyar,
>>>>>>>>>
>>>>>>>>> thanks for the update to the KIP. It's in really good shape and well
>>>>>>>>> written.
>>>>>>>>>
>>>>>>>>> About the default constructor question:
>>>>>>>>>
>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default constructor so
>>>>>>>>> that they can be created via reflection when specified in a config. I
>>>>>>>>> understand that it is not super user friendly, but all existing code
>>>>>>>>> works this way. Hence, it seems best to stick with the established pattern.
>>>>>>>>>
>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve user
>>>>>>>>> experience that addresses the exact issue John raised (cf.
>>>>>>>>> https://github.com/apache/kafka/pull/7067).
>>>>>>>>>
>>>>>>>>> Note that if a user instantiates the Serde manually, the user
>>>>>>>>> also needs to call `configure()` to set up the inner serdes. Kafka
>>>>>>>>> Streams would not set those up automatically, and one would most likely
>>>>>>>>> end up with an NPE.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Coming back to the KIP and the parameter names: `WindowedSerdes` are
>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>>>>>>>> we use the following parameter names:
>>>>>>>>>
>>>>>>>>> - default.windowed.key.serde.inner
>>>>>>>>> - default.windowed.value.serde.inner
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It might be good to align the naming pattern. I would also suggest
>>>>>>>>> using `type` instead of `impl`:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>>>>>>>> default.value.list.serde.impl  ->  default.list.value.serde.type
>>>>>>>>> default.key.list.serde.element  ->  default.list.key.serde.inner
>>>>>>>>> default.value.list.serde.element  ->  default.list.value.serde.inner
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
>>>>>>>>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>>>>>>>>>
>>>>>>>>>> What do you think about that? I hope I’m not confusing anything.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>>>>>>>>>>
>>>>>>>>>>> Just one more super-small question, do we need this variant:
>>>>>>>>>>>
>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>>>>>>>>>>
>>>>>>>>>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it.
>>>>>>>>>>>
>>>>>>>>>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>>>>>>>>>>
>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>>>>>>>>>>
>>>>>>>>>>> What do you think?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>
>>>>>>>>>>>> I hope everyone had a great long weekend.
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>>>>>>>>>>
>>>>>>>>>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>>>>>>>>>>
>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the update, Daniyar!
>>>>>>>>>>>>
>>>>>>>>>>>> In addition to specifying the config interface, can you also specify
>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance of this
>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>>>>>>>>>> constructor(s) would I have available? Likewise with the Serializer
>>>>>>>>>>>> and Deserializer. I don't think you need to specify the implementation
>>>>>>>>>>>> logic, since we've already discussed it here.
>>>>>>>>>>>>
>>>>>>>>>>>> If you also want to specify the serialized format of the data records
>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as letting us
>>>>>>>>>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hey,
>>>>>>>>>>>>
>>>>>>>>>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>> Sorry for the delay :)
>>>>>>>>>>>>
>>>>>>>>>>>> Thank You!
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, something like this. I did not think about good configuration
>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all proposed
>>>>>>>>>>>> configs atm. But all configs should be listed and explained in the KIP
>>>>>>>>>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>>>>>>>>>> ask more detailed questions if I have any).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, you are right. BytesSerializer is not what I need to have in a list
>>>>>>>>>>>> of primitives.
>>>>>>>>>>>>
>>>>>>>>>>>> As for the default constructor and configurability, just want to make
>>>>>>>>>>>> sure. Is this what you have on your mind?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the update!
>>>>>>>>>>>>
>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>>>>>>>>> should have a default constructor and it should be possible to pass in
>>>>>>>>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>>>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>>>>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>>>>>>>>>>> single `byte` (note, that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>>>>>>>>> discussion.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Can you also update the KIP?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I made and pushed necessary commits, so we could review the final
>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>
>>>>>>>>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>>>>>>>>> if it is enough.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>
>>>>>>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -John
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>> Hey John and Matthias,
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>>>>>>>> realized that the type is not really needed in ListSerializer, but
>>>>>>>>>>>> only in ListDeserializer:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In ListSerializer we will start storing sizes only if serializer is
>>>>>>>>>>>> not a primitive serializer:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Then, in the deserializer, we persist the passed list type, so that
>>>>>>>>>>>> during deserialization we can create an instance of it with a
>>>>>>>>>>>> predefined listSize for better performance.
>>>>>>>>>>>> We also try to locate a primitiveSize based on the passed deserializer.
>>>>>>>>>>>> If it is not there, then primitiveSize will be null, which means
>>>>>>>>>>>> that each entry’s size was encoded individually.
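The layout described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the class ListLayoutSketch, the Function-based inner (de)serializers, and the nullable fixedSize parameter are assumptions made to keep the example free of Kafka dependencies.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the list layout discussed in this thread.
final class ListLayoutSketch {

    // fixedSize == null means entries are variable length,
    // so each entry is prefixed with its own size.
    static <T> byte[] serialize(List<T> data, Function<T, byte[]> inner, Integer fixedSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt(out, data.size()); // list size always comes first
        for (T entry : data) {
            byte[] encoded = inner.apply(entry);
            if (fixedSize == null) {
                writeInt(out, encoded.length);
            }
            out.write(encoded, 0, encoded.length);
        }
        return out.toByteArray();
    }

    static <T> List<T> deserialize(byte[] payload, Function<byte[], T> inner, Integer fixedSize) {
        ByteBuffer in = ByteBuffer.wrap(payload);
        int listSize = in.getInt();
        // Pre-size the list; a configurable list type would be created here instead.
        List<T> result = new ArrayList<>(listSize);
        for (int i = 0; i < listSize; i++) {
            int entrySize = (fixedSize != null) ? fixedSize : in.getInt();
            byte[] encoded = new byte[entrySize];
            in.get(encoded);
            result.add(inner.apply(encoded));
        }
        return result;
    }

    // Writes a big-endian int, matching ByteBuffer's default byte order.
    private static void writeInt(ByteArrayOutputStream out, int value) {
        byte[] bytes = ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
        out.write(bytes, 0, bytes.length);
    }
}
```

With a fixed-size inner type such as a 4-byte integer, only the list size is stored in addition to the entries; with a variable-size inner type such as a string, every entry carries a 4-byte length prefix.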
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This looks much cleaner and more concise.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>>>>>>>> Deserializer returns a fixed type...
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>>>>>>>> deserialization via config?
>>>>>>>>>>>>
>>>>>>>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>>>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>>>>>>>> config again)?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt
>>>>>>>>>>>> we can
>>>>>>>>>>>> support this and a cast will be necessary at some point in the user
>>>>>>>>>>>> code.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for looking at it!
>>>>>>>>>>>>
>>>>>>>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>>>>>>>> be "vanilla java"?
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>>>>>>>> example, what if the inner type is a Map<String,String>? The serde
>>>>>>>>>>>> could
>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>>>>>>>
>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>>>>> )
>>>>>>>>>>>>
>>>>>>>>>>>> And in configuration, it's something like:
>>>>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
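As a rough illustration of how the config-driven variant could resolve the list type, the sketch below instantiates the configured class reflectively through its zero-arg constructor. The key name is copied from the example above; the class ReflectiveListFactorySketch and its newList method are hypothetical and not part of Kafka.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolve the list implementation from a config map.
final class ReflectiveListFactorySketch {

    // Key name taken from the configuration example in this thread.
    static final String LIST_TYPE_KEY = "default.key.list.serde.type";

    @SuppressWarnings("unchecked")
    static <T> List<T> newList(Map<String, ?> configs) {
        String className = String.valueOf(configs.get(LIST_TYPE_KEY));
        try {
            Class<?> listClass = Class.forName(className);
            if (!List.class.isAssignableFrom(listClass)) {
                throw new IllegalArgumentException(className + " is not a java.util.List");
            }
            // Requires a public zero-arg constructor on the list type.
            return (List<T>) listClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate list type " + className, e);
        }
    }
}
```

The inner serde class from the third config line could be resolved the same way, which is why all of the involved classes need a zero-arg constructor.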
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think?
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -John
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>
>>>>>>>>>>>> I read up on TypeReference. It could work for the list serde.
>>>>>>>>>>>> However, it is not directly
>>>>>>>>>>>> supported:
>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>>>>> The only way is to pass an actual class object into the constructor,
>>>>>>>>>>>> something like:
>>>>>>>>>>>>
>>>>>>>>>>>> It could be an option, but not a pretty one. What do you think of my
>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As described
>>>>>>>>>>>> previously)
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit over
>>>>>>>>>>>> engineered :)
>>>>>>>>>>>>
>>>>>>>>>>>> I also wanted to see feedback from Matthias as well since he gave
>>>>>>>>>>>> me an idea about storing fixed/variable size entries.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>
>>>>>>>>>>>> That's a very clever solution!
>>>>>>>>>>>>
>>>>>>>>>>>> One observation is that, now, this is what we might call a
>>>>>>>>>>>> polymorphic
>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type and then
>>>>>>>>>>>> promising to produce the exact same concrete type on read.
>>>>>>>>>>>> There are
>>>>>>>>>>>> some inherent problems with this approach, which in general
>>>>>>>>>>>> require
>>>>>>>>>>>> some kind of  schema registry (not necessarily Schema
>>>>>>>>>>>> Registry, just
>>>>>>>>>>>> any registry for schemas) to solve.
>>>>>>>>>>>>
>>>>>>>>>>>> Notice that every serialized record has quite a bit of duplicated
>>>>>>>>>>>> information: the concrete type as well as a byte to indicate
>>>>>>>>>>>> whether
>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to
>>>>>>>>>>>> indicate the
>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because they
>>>>>>>>>>>> tell us
>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately, this
>>>>>>>>>>>> information is completely redundant. In all likelihood, the
>>>>>>>>>>>> information will be exactly the same for every record in the
>>>>>>>>>>>> topic.
>>>>>>>>>>>> This problem is essentially the core motivation for serializations
>>>>>>>>>>>> like Avro: to move the schema outside of the serialization
>>>>>>>>>>>> itself, so
>>>>>>>>>>>> that the records won't contain so much redundant information.
>>>>>>>>>>>>
>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to
>>>>>>>>>>>> something
>>>>>>>>>>>> like what you had earlier in which you don't support perfectly
>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead just
>>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer full,
>>>>>>>>>>>> perfect, type preservation to serdes that have an external
>>>>>>>>>>>> system in
>>>>>>>>>>>> which to register their type information.
>>>>>>>>>>>>
>>>>>>>>>>>> There does exist an alternative, if we really do want to
>>>>>>>>>>>> preserve the
>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>>>>>>>> configuration option specifically for the serde to configure
>>>>>>>>>>>> what the
>>>>>>>>>>>> list type will be, and maybe what the element type is, as well.
>>>>>>>>>>>>
>>>>>>>>>>>> As far as "related work" goes, you might be interested to take
>>>>>>>>>>>> a look
>>>>>>>>>>>> at how Jackson can be configured to deserialize into a specific,
>>>>>>>>>>>> arbitrarily nested, generically parameterized class structure.
>>>>>>>>>>>> Specifically, you might find
>>>>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>>>>> interesting.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -John
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> bump
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
> 
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi Matthias,

I agree, CommonClientConfigs did not seem like the right place for these configurations.
I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig.

However, I have a question. What should I do in the "configure(Map<String, ?> configs, boolean isKey)" methods? Which configurations should I try to locate? I was comparing my (de)serializer implementations with the SessionWindowed(De)serializer classes, and they use the StreamsConfig class to get either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS or StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS, depending on isKey.

In my case, as I mentioned earlier, StreamsConfig class is not accessible from org.apache.kafka.common.serialization package. So, I can’t utilize it. Any suggestions here?
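
One option, sketched below under stated assumptions: declare the key names as plain string constants next to the (de)serializer, so that configure() can pick the key or value variant from isKey without importing StreamsConfig. The class name ListDeserializerConfigSketch, the constant names, and the keys (taken from the ConsumerConfig naming proposed in this thread) are hypothetical and not part of any released Kafka API.

```java
import java.util.Map;

// Hypothetical sketch: config lookup without a dependency on StreamsConfig.
final class ListDeserializerConfigSketch {

    // Assumed key names, following the naming proposed for ConsumerConfig.
    static final String LIST_KEY_DESERIALIZER_TYPE = "list.key.deserializer.type";
    static final String LIST_VALUE_DESERIALIZER_TYPE = "list.value.deserializer.type";
    static final String LIST_KEY_DESERIALIZER_INNER = "list.key.deserializer.inner";
    static final String LIST_VALUE_DESERIALIZER_INNER = "list.value.deserializer.inner";

    private String listType;
    private String innerClass;

    // Same shape as Deserializer#configure(Map<String, ?>, boolean).
    void configure(Map<String, ?> configs, boolean isKey) {
        String typeKey = isKey ? LIST_KEY_DESERIALIZER_TYPE : LIST_VALUE_DESERIALIZER_TYPE;
        String innerKey = isKey ? LIST_KEY_DESERIALIZER_INNER : LIST_VALUE_DESERIALIZER_INNER;
        listType = require(configs, typeKey);
        innerClass = require(configs, innerKey);
    }

    // Config values may arrive as a class name or as a Class object.
    private static String require(Map<String, ?> configs, String key) {
        Object value = configs.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing required config: " + key);
        }
        return (value instanceof Class) ? ((Class<?>) value).getName() : value.toString();
    }

    String listType() {
        return listType;
    }

    String innerClass() {
        return innerClass;
    }
}
```

The serializer and the serde could use the same pattern with their own key names, which would keep everything inside org.apache.kafka.common.serialization.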

Best,
Daniyar Yeralin


> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <ma...@confluent.io> wrote:
> 
> Thanks!
> 
> One minor question about the configs. The KIP adds three classes, a
> Serializer, a Deserializer, and a Serde.
> 
> Hence, would it make sense to add the corresponding configs to
> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly
> different names each time?
> 
> 
> Something like this:
> 
> ProducerConfig:
> 
> list.key/value.serializer.type
> list.key/value.serializer.inner
> 
> ConsumerConfig:
> 
> list.key/value.deserializer.type
> list.key/value.deserializer.inner
> 
> StreamsConfig:
> 
> default.list.key/value.serde.type
> default.list.key/value.serde.inner
> 
> 
> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound right
> to me. Also note that it seems better to avoid the `default.` prefix
> for consumers and producers because there is only one Serializer or
> Deserializer anyway. Only for Streams, there are multiple and
> StreamsConfig specifies the default one if an operator does not
> overwrite it.
> 
> Thoughts?
> 
> 
> Also, the KIP should explicitly mention to which classes certain configs
> are added. Atm, the KIP only lists parameter names, but does not state
> where those are added.
> 
> 
> -Matthias
> 
> 
> 
> 
> 
> On 7/16/19 1:11 PM, Development wrote:
>> Hi,
>> 
>> Yes, totally forgot about the statement. KIP-466 is updated.
>> 
>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman for your valuable input!
>> 
>> I hope I did not cause too much trouble :)
>> 
>> I’ll start the vote now.
>> 
>> Best,
>> Daniyar Yeralin
>> 
>>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io> wrote:
>>> 
>>> Hi Daniyar,
>>> 
>>> Thanks for that update. I took a look, and I think this is in good shape.
>>> 
>>> One note, the statement "New method public static <T> Serde<List<T>>
>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
>>> (infers list implementation and inner serde from config file)" is
>>> still present in the KIP, although I do see it was removed from the PR.
>>> 
>>> Once you remove that statement from the KIP, then I think this KIP is
>>> ready to go up for a vote! Then, we can really review the PR in
>>> earnest and get this thing merged.
>>> 
>>> Thanks,
>>> -john
>>> 
>>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
>>>> 
>>>> Feel free to put any comments in there.
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
>>>>> 
>>>>> Hi John,
>>>>> 
>>>>> I knew I was missing something. Yes, that makes sense now, I removed all `listSerde()` methods, and left empty constructors instead.
>>>>> 
>>>>> As for `CommonClientConfigs`: I looked at the class, and it doesn’t have any properties related to serdes, which bothers me a little.
>>>>> 
>>>>> All properties like `default.key.serde` and `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want to create confusion.
>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and its (de)serializers are not located in org.apache.kafka.common.serialization. I guess it kind of makes sense since windowed serdes are only available for kafka streams, not vice versa.
>>>>> 
>>>>> If everyone is okay to put list properties in `CommonClientConfigs` class, I’ll go ahead and do that then.
>>>>> 
>>>>> Thank you for your input!
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> Regarding the placement, you might as well move the constants to `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and the configs and the code are in the same module.
>>>>>> 
>>>>>> Regarding the constructor... What Matthias said is correct: The serde, serializer, and deserializer all need to have zero-arg constructors so they can be instantiated reflectively by Kafka. However, the factory method you proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a constructor, and is not required. It would be used purely from the Java interface, but has the drawbacks I listed above. This method, not the constructor, is what I proposed to remove from the KIP.
>>>>>> 
>>>>>> Thanks,
>>>>>> -John
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
>>>>>> One problem though.
>>>>>> 
>>>>>> Since the WindowedSerdes (and their Windowed(De)serializers) are so similar, I’m trying to model my ListSerde implementation on them.
>>>>>> 
>>>>>> I created a couple of constants under StreamsConfig:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> and am trying to use a similar construct:
>>>>>> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>>>>>> But then found out that StreamsConfig is not accessible from org.apache.kafka.common.serialization package while window serde (de)serializers are located under org.apache.kafka.streams.kstream package.
>>>>>> 
>>>>>> What should I do? Should I move my classes under org.apache.kafka.streams.kstream package instead?
>>>>>> 
>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>>>>>>> 
>>>>>>> Hi Matthias,
>>>>>>> 
>>>>>>> Thank you for your input.
>>>>>>> 
>>>>>>> I updated the KIP, made it a little more readable.
>>>>>>> 
>>>>>>> I think the configuration parameters strategy is finalized then.
>>>>>>> 
>>>>>>> Do you have any other questions/concerns regarding this KIP?
>>>>>>> 
>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit them under my PR.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>> 
>>>>>>>> Daniyar,
>>>>>>>> 
>>>>>>>> thanks for the update to the KIP. It's in really good shape and well
>>>>>>>> written.
>>>>>>>> 
>>>>>>>> About the default constructor question:
>>>>>>>> 
>>>>>>>> All Serdes/Serializer/Deserializer classes need a default constructor to
>>>>>>>> create them easily via reflection when specified in a config. I
>>>>>>>> understand that it is not super user friendly, but all existing code
>>>>>>>> works this way. Hence, it seems best to stick with the established pattern.
>>>>>>>> 
>>>>>>>> We have a similar issue with `TimeWindowedSerde` and
>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve user
>>>>>>>> experience that addresses the exact issue John raised. (cf
>>>>>>>> https://github.com/apache/kafka/pull/7067)
>>>>>>>> 
>>>>>>>> Note, that if a user would instantiate the Serde manually, the user
>>>>>>>> would also need to call `configure()` to setup the inner serdes. Kafka
>>>>>>>> Streams would not setup those automatically and one might most likely
>>>>>>>> end-up with an NPE.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Coming back to the KIP, and the parameter names. `WindowedSerdes` are
>>>>>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>>>>>>> we use the following parameter names:
>>>>>>>> 
>>>>>>>> - default.windowed.key.serde.inner
>>>>>>>> - default.windowed.value.serde.inner
>>>>>>>> 
>>>>>>>> 
>>>>>>>> It might be good to align the naming pattern. I would also suggest to
>>>>>>>> use `type` instead of `impl`?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>>>>>>> default.value.list.serde.impl  ->  default.list.value.serde.type
>>>>>>>> default.key.list.serde.element  ->  default.list.key.serde.inner
>>>>>>>> default.value.list.serde.element  ->  default.list.value.serde.inner
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -Matthias
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>>>>>> Hi John,
>>>>>>>>> 
>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
>>>>>>>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>>>>>>>> 
>>>>>>>>> What do you think about that? I hope I’m not confusing anything.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>> 
>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>>>>>>>>> 
>>>>>>>>>> Just one more super-small question, do we need this variant:
>>>>>>>>>> 
>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>>>>>>>>> 
>>>>>>>>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it.
>>>>>>>>>> 
>>>>>>>>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>>>>>>>>> 
>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>>>>>>>>> 
>>>>>>>>>> What do you think?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> -John
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi John,
>>>>>>>>>>> 
>>>>>>>>>>> I hope everyone had a great long weekend.
>>>>>>>>>>> 
>>>>>>>>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>>>>>>>>> 
>>>>>>>>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>>>>>>>>> 
>>>>>>>>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>>>>>>>>> 
>>>>>>>>>>> Thank you!
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the update, Daniyar!
>>>>>>>>>>> 
>>>>>>>>>>> In addition to specifying the config interface, can you also specify
>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance of this
>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>>>>>>>>> constructor(s) would I have available? Likewise with the Serializer
>>>>>>>>>>> and Deserializer. I don't think you need to specify the implementation
>>>>>>>>>>> logic, since we've already discussed it here.
>>>>>>>>>>> 
>>>>>>>>>>> If you also want to specify the serialized format of the data records
>>>>>>>>>>> in the KIP, it could be useful documentation, as well as letting us
>>>>>>>>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> John
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Hey,
>>>>>>>>>>> 
>>>>>>>>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>> Sorry for the delay :)
>>>>>>>>>>> 
>>>>>>>>>>> Thank You!
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Yes, something like this. I did not think about good configuration
>>>>>>>>>>> parameter names yet. I am also not sure if I understand all proposed
>>>>>>>>>>> configs atm. But all configs should be listed and explained in the KIP
>>>>>>>>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>>>>>>>>> ask more detailed question if I have any).
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> -Matthias
>>>>>>>>>>> 
>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to have in a list
>>>>>>>>>>> of primitives.
>>>>>>>>>>> 
>>>>>>>>>>> As for the default constructor and configurability, just want to make
>>>>>>>>>>> sure. Is this what you have on your mind?
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the update!
>>>>>>>>>>> 
>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>>>>>>>> should have a default constructor and it should be possible to pass in
>>>>>>>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>>>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>>>>>>>>>> a single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>>>>>>>> discussion.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Can you also update the KIP?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> -Matthias
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I made and pushed necessary commits, so we could review the final
>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>>>>>> 
>>>>>>>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>>>>>>>> if it is enough.
>>>>>>>>>>> 
>>>>>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>> 
>>>>>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>> Hey John and Matthias,
>>>>>>>>>>> 
>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>>>>>>> realized that the type is not really needed in ListSerializer, but
>>>>>>>>>>> only in ListDeserializer:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> In ListSerializer we will start storing sizes only if serializer is
>>>>>>>>>>> not a primitive serializer:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Then, in deserializer, we persist passed list type, so that during
>>>>>>>>>>> deserialization we could create an instance of it with predefined
>>>>>>>>>>> listSize for better performance.
>>>>>>>>>>> We also try to locate a primitiveSize based on passed deserializer.
>>>>>>>>>>> If it is not there, then primitiveSize will be null. Which means
>>>>>>>>>>> that each entry’s size was encoded individually.
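
The size-encoding idea quoted above could look roughly like the following. This is only a minimal sketch in plain Java, not the code from the PR: the class name `ListSizeEncodingSketch`, the `fixedSize` parameter (standing in for `primitiveSize`), and the exact byte layout are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; not the ListSerializer/ListDeserializer from the PR.
class ListSizeEncodingSketch {

    // Assumed layout: [int entryCount], then per entry [int size][payload].
    // The per-entry size is omitted when fixedSize is non-null ("primitive" case).
    static <T> byte[] serialize(List<T> list, Function<T, byte[]> inner, Integer fixedSize) {
        List<byte[]> payloads = new ArrayList<>(list.size());
        int total = Integer.BYTES; // room for the entry count
        for (T entry : list) {
            byte[] payload = inner.apply(entry);
            if (fixedSize != null && payload.length != fixedSize) {
                throw new IllegalArgumentException("entry does not have the fixed size " + fixedSize);
            }
            payloads.add(payload);
            total += payload.length + (fixedSize == null ? Integer.BYTES : 0);
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        out.putInt(list.size());
        for (byte[] payload : payloads) {
            if (fixedSize == null) {
                out.putInt(payload.length); // size stored only for variable-size entries
            }
            out.put(payload);
        }
        return out.array();
    }

    static <T> List<T> deserialize(byte[] data, Function<byte[], T> inner, Integer fixedSize) {
        ByteBuffer in = ByteBuffer.wrap(data);
        int count = in.getInt();
        List<T> result = new ArrayList<>(count); // pre-sized from the encoded count
        for (int i = 0; i < count; i++) {
            int size = (fixedSize != null) ? fixedSize : in.getInt();
            byte[] payload = new byte[size];
            in.get(payload);
            result.add(inner.apply(payload));
        }
        return result;
    }

    public static void main(String[] args) {
        // Fixed-size entries (ints): no per-entry size is written.
        byte[] ints = serialize(Arrays.asList(1, 2, 3),
                i -> ByteBuffer.allocate(Integer.BYTES).putInt(i).array(), Integer.BYTES);
        List<Integer> intsBack = deserialize(ints, b -> ByteBuffer.wrap(b).getInt(), Integer.BYTES);
        System.out.println(ints.length + " bytes -> " + intsBack);

        // Variable-size entries (strings): each entry carries its own size.
        byte[] strings = serialize(Arrays.asList("a", "bc"),
                s -> s.getBytes(StandardCharsets.UTF_8), null);
        List<String> stringsBack = deserialize(strings,
                b -> new String(b, StandardCharsets.UTF_8), null);
        System.out.println(strings.length + " bytes -> " + stringsBack);
    }
}
```

The sketch always returns an `ArrayList`; choosing the list implementation from a configured class is a separate concern and is left out here.
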
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> This looks much cleaner and more concise.
>>>>>>>>>>> 
>>>>>>>>>>> What do you think?
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>>>>>>> Deserializer returns a fixed type...
>>>>>>>>>>> 
>>>>>>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>>>>>>> deserialization via config?
>>>>>>>>>>> 
>>>>>>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>>>>>>> config again)?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we
>>>>>>>>>>> can support this and a cast will be necessary at some point in the
>>>>>>>>>>> user code.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> -Matthias
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for looking at it!
>>>>>>>>>>> 
>>>>>>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>>>>>>> be "vanilla java"?
>>>>>>>>>>> 
>>>>>>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>>>>>>> example, what if the inner type is a Map<String,String>? The serde
>>>>>>>>>>> could only be used to produce a LinkedList<Map>, thus, we'd still
>>>>>>>>>>> need an inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>>>>>> 
>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>>>> )
>>>>>>>>>>> 
>>>>>>>>>>> And in configuration, it's something like:
>>>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> What do you think?
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hey John,
>>>>>>>>>>> 
>>>>>>>>>>> I read up on TypeReference. It could work for the list serde.
>>>>>>>>>>> However, it is not directly supported:
>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>>>> The only way is to pass an actual class object into the constructor,
>>>>>>>>>>> something like:
>>>>>>>>>>> 
>>>>>>>>>>> It could be an option, but not a pretty one. What do you think of my
>>>>>>>>>>> approach to use vanilla java and canonical class name? (As described
>>>>>>>>>>> previously)
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi John,
>>>>>>>>>>> 
>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit over
>>>>>>>>>>> engineered :)
>>>>>>>>>>> 
>>>>>>>>>>> I also wanted to get feedback from Matthias, since he gave me the
>>>>>>>>>>> idea about storing fixed/variable size entries.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>> 
>>>>>>>>>>> That's a very clever solution!
>>>>>>>>>>> 
>>>>>>>>>>> One observation is that, now, this is what we might call a polymorphic
>>>>>>>>>>> serde. That is, you're detecting the actual concrete type and then
>>>>>>>>>>> promising to produce the exact same concrete type on read. There are
>>>>>>>>>>> some inherent problems with this approach, which in general require
>>>>>>>>>>> some kind of schema registry (not necessarily Schema Registry, just
>>>>>>>>>>> any registry for schemas) to solve.
>>>>>>>>>>> 
>>>>>>>>>>> Notice that every serialized record has quite a bit of duplicated
>>>>>>>>>>> information: the concrete type as well as a byte to indicate whether
>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to indicate the
>>>>>>>>>>> actual size. These constitute a schema, of sorts, because they tell us
>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately, this
>>>>>>>>>>> information is completely redundant. In all likelihood, the
>>>>>>>>>>> information will be exactly the same for every record in the topic.
>>>>>>>>>>> This problem is essentially the core motivation for serializations
>>>>>>>>>>> like Avro: to move the schema outside of the serialization itself, so
>>>>>>>>>>> that the records won't contain so much redundant information.
>>>>>>>>>>> 
>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to something
>>>>>>>>>>> like what you had earlier in which you don't support perfectly
>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead just
>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer full,
>>>>>>>>>>> perfect, type preservation to serdes that have an external system in
>>>>>>>>>>> which to register their type information.
>>>>>>>>>>> 
>>>>>>>>>>> There does exist an alternative, if we really do want to preserve the
>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>>>>>>> configuration option specifically for the serde to configure what the
>>>>>>>>>>> list type will be, and maybe what the element type is, as well.
>>>>>>>>>>> 
>>>>>>>>>>> As far as "related work" goes, you might be interested to take a look
>>>>>>>>>>> at how Jackson can be configured to deserialize into a specific,
>>>>>>>>>>> arbitrarily nested, generically parameterized class structure.
>>>>>>>>>>> Specifically, you might find
>>>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>>>> interesting.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> bump
>>>>>>>>>>> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks!

One minor question about the configs. The KIP adds three classes, a
Serializer, a Deserializer, and a Serde.

Hence, would it make sense to add the corresponding configs to
`ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly
different names each time?


Something like this:

ProducerConfig:

list.key/value.serializer.type
list.key/value.serializer.inner

ConsumerConfig:

list.key/value.deserializer.type
list.key/value.deserializer.inner

StreamsConfig:

default.list.key/value.serde.type
default.list.key/value.serde.inner


Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs` does not sound right
to me. Also note that it seems better to avoid the `default.` prefix
for consumers and producers because there is only one Serializer or
Deserializer anyway. Only for Streams are there multiple, and
StreamsConfig specifies the default one if an operator does not
overwrite it.
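
To make the shorthand above concrete, here is a small sketch that just expands `key/value` and `type/inner` into full parameter names. The class and method names are made up for illustration, and the config names themselves are only the proposal from this message, not existing constants in `ProducerConfig`, `ConsumerConfig`, or `StreamsConfig`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only spells out the proposed names as strings.
class ListConfigNamesSketch {

    // prefix is "" for producer/consumer and "default." for Streams;
    // component is "serializer", "deserializer", or "serde".
    static List<String> names(String prefix, String component) {
        List<String> result = new ArrayList<>();
        for (String side : new String[] {"key", "value"}) {
            for (String attribute : new String[] {"type", "inner"}) {
                result.add(prefix + "list." + side + "." + component + "." + attribute);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("ProducerConfig: " + names("", "serializer"));
        System.out.println("ConsumerConfig: " + names("", "deserializer"));
        System.out.println("StreamsConfig:  " + names("default.", "serde"));
    }
}
```

That gives four names per config class, with the `default.` prefix appearing only on the Streams side.
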

Thoughts?


Also, the KIP should explicitly mention to which classes certain configs
are added. Atm, the KIP only lists parameter names, but does not state
where those are added.


-Matthias





On 7/16/19 1:11 PM, Development wrote:
> Hi,
> 
> Yes, totally forgot about the statement. KIP-466 is updated.
> 
> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman for your valuable input!
> 
> I hope I did not cause too much trouble :)
> 
> I’ll start the vote now.
> 
> Best,
> Daniyar Yeralin
> 
>> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io> wrote:
>>
>> Hi Daniyar,
>>
>> Thanks for that update. I took a look, and I think this is in good shape.
>>
>> One note, the statement "New method public static <T> Serde<List<T>>
>> ListSerde() in org.apache.kafka.common.serialization.Serdes class
>> (infers list implementation and inner serde from config file)" is
>> still present in the KIP, although I do think it was removed from the PR.
>>
>> Once you remove that statement from the KIP, then I think this KIP is
>> ready to go up for a vote! Then, we can really review the PR in
>> earnest and get this thing merged.
>>
>> Thanks,
>> -john
>>
>> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net> wrote:
>>>
>>> Hi,
>>>
>>> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
>>>
>>> Feel free to put any comments in there.
>>>
>>> Best,
>>> Daniyar Yeralin
>>>
>>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
>>>>
>>>> Hi John,
>>>>
>>>> I knew I was missing something. Yes, that makes sense now, I removed all `listSerde()` methods, and left empty constructors instead.
>>>>
>>>> As per `CommonClientConfigs` I looked at the class, it doesn’t have any properties related to serdes, and that bothers me a little.
>>>>
>>>> All properties like `default.key.serde` and `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want to create confusion.
>>>> What also doesn’t make sense to me is that `WindowedSerdes` and its (de)serializers are not located in org.apache.kafka.common.serialization. I guess it kind of makes sense since windowed serdes are only available for kafka streams, not vice versa.
>>>>
>>>> If everyone is okay to put list properties in `CommonClientConfigs` class, I’ll go ahead and do that then.
>>>>
>>>> Thank you for your input!
>>>>
>>>> Best,
>>>> Daniyar Yeralin
>>>>
>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Regarding the placement, you might as well move the constants to `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and the configs and the code are in the same module.
>>>>>
>>>>> Regarding the constructor... What Matthias said is correct: The serde, serializer, and deserializer all need to have zero-arg constructors so they can be instantiated reflectively by Kafka. However, the factory method you proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a constructor, and is not required. It would be used purely from the Java interface, but has the drawbacks I listed above. This method, not the constructor, is what I proposed to remove from the KIP.
>>>>>
>>>>> Thanks,
>>>>> -John
>>>>>
>>>>>
>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
>>>>> One problem though.
>>>>>
>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m trying to mimic the implementation of my ListSerde accordingly.
>>>>>
>>>>> I created a couple of constants under StreamsConfig:
>>>>>
>>>>>
>>>>>
>>>>> And I am trying to use a similar construct:
>>>>> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>>>>> But then found out that StreamsConfig is not accessible from org.apache.kafka.common.serialization package while window serde (de)serializers are located under org.apache.kafka.streams.kstream package.
>>>>>
>>>>> What should I do? Should I move my classes under org.apache.kafka.streams.kstream package instead?
>>>>>
>>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>>>>>>
>>>>>> Hi Matthias,
>>>>>>
>>>>>> Thank you for your input.
>>>>>>
>>>>>> I updated the KIP, made it a little more readable.
>>>>>>
>>>>>> I think the configuration parameters strategy is finalized then.
>>>>>>
>>>>>> Do you have any other questions/concerns regarding this KIP?
>>>>>>
>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit them under my PR.
>>>>>>
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>>
>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>
>>>>>>> Daniyar,
>>>>>>>
>>>>>>> thanks for the update to the KIP. It's in really good shape and well
>>>>>>> written.
>>>>>>>
>>>>>>> About the default constructor question:
>>>>>>>
>>>>>>> All Serdes/Serializer/Deserializer classes need a default constructor to
>>>>>>> create them easily via reflection when specified in a config. I
>>>>>>> understand that it is not super user friendly, but all existing code
>>>>>>> works this way. Hence, it seems best to stick with the established pattern.
>>>>>>>
>>>>>>> We have a similar issue with `TimeWindowedSerde` and
>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve user
>>>>>>> experience that addresses the exact issue John raised. (cf
>>>>>>> https://github.com/apache/kafka/pull/7067)
>>>>>>>
>>>>>>> Note, that if a user would instantiate the Serde manually, the user
>>>>>>> would also need to call `configure()` to setup the inner serdes. Kafka
>>>>>>> Streams would not setup those automatically and one might most likely
>>>>>>> end-up with an NPE.
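
The pattern described above (zero-arg constructor, reflective instantiation, then `configure()`) can be sketched in plain Java as follows. This is a stand-in under stated assumptions, not Kafka code: `WrappingListDeserializer` is a hypothetical class that does not implement Kafka's `Deserializer` interface, and the config name `default.list.key.serde.type` is only the name proposed in this thread.

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a config-driven wrapper (de)serializer.
class ReflectiveConfigureSketch {

    public static class WrappingListDeserializer {
        private Class<?> listClass; // stays null until configure() is called

        public WrappingListDeserializer() { } // needed for reflective creation

        public void configure(Map<String, ?> configs, boolean isKey) {
            // Config names are the ones proposed in this thread (assumption).
            String name = isKey ? "default.list.key.serde.type" : "default.list.value.serde.type";
            try {
                listClass = Class.forName((String) configs.get(name));
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Unknown list class for " + name, e);
            }
        }

        @SuppressWarnings("unchecked")
        public List<Object> newList() {
            try {
                // Throws NullPointerException if configure() was never called.
                return (List<Object>) listClass.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // What a framework would do: instantiate by class name, then configure.
    static WrappingListDeserializer createFromConfig(String className, Map<String, ?> configs, boolean isKey) {
        try {
            WrappingListDeserializer instance = (WrappingListDeserializer)
                    Class.forName(className).getDeclaredConstructor().newInstance();
            instance.configure(configs, isKey);
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        configs.put("default.list.key.serde.type", LinkedList.class.getName());

        WrappingListDeserializer configured =
                createFromConfig(WrappingListDeserializer.class.getName(), configs, true);
        System.out.println(configured.newList().getClass().getName());

        try {
            new WrappingListDeserializer().newList(); // manual creation, configure() forgotten
        } catch (NullPointerException e) {
            System.out.println("not configured: NullPointerException");
        }
    }
}
```

The second half of `main` shows the failure mode mentioned above: an instance created by hand without a `configure()` call has no list class to work with.
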
>>>>>>>
>>>>>>>
>>>>>>> Coming back to the KIP, and the parameter names. `WindowedSerdes` are
>>>>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>>>>>> we use the following parameter names:
>>>>>>>
>>>>>>> - default.windowed.key.serde.inner
>>>>>>> - default.windowed.value.serde.inner
>>>>>>>
>>>>>>>
>>>>>>> It might be good to align the naming pattern. I would also suggest to
>>>>>>> use `type` instead of `impl`?
>>>>>>>
>>>>>>>
>>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>>>>>> default.value.list.serde.impl  ->  default.list.value.serde.type
>>>>>>> default.key.list.serde.element  ->  default.list.key.serde.inner
>>>>>>> default.value.list.serde.element  ->  default.list.value.serde.inner
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
>>>>>>>> "I think that ... `ListSerde` should have an default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde.”
>>>>>>>>
>>>>>>>> What do you think about that? I hope I’m not confusing anything.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Daniyar Yeralin
>>>>>>>>
>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>>>>>>>>
>>>>>>>>> Just one more super-small question, do we need this variant:
>>>>>>>>>
>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>>>>>>>>
>>>>>>>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it.
>>>>>>>>>
>>>>>>>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>>>>>>>>
>>>>>>>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> -John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> I hope everyone had a great long weekend.
>>>>>>>>>>
>>>>>>>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>>>>>>>>
>>>>>>>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>>>>>>>>
>>>>>>>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>>>>>>>>
>>>>>>>>>> Thank you!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for the update, Daniyar!
>>>>>>>>>>
>>>>>>>>>> In addition to specifying the config interface, can you also specify
>>>>>>>>>> the Java interface? Namely, if I need to pass an instance of this
>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>>>>>>>> constructor(s) would I have available? Likewise with the Serializer
>>>>>>>>>> and Deserializer. I don't think you need to specify the implementation
>>>>>>>>>> logic, since we've already discussed it here.
>>>>>>>>>>
>>>>>>>>>> If you also want to specify the serialized format of the data records
>>>>>>>>>> in the KIP, it could be useful documentation, as well as letting us
>>>>>>>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hey,
>>>>>>>>>>
>>>>>>>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>> Sorry for the delay :)
>>>>>>>>>>
>>>>>>>>>> Thank You!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, something like this. I did not think about good configuration
>>>>>>>>>> parameter names yet. I am also not sure if I understand all proposed
>>>>>>>>>> configs atm. But all configs should be listed and explained in the KIP
>>>>>>>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>>>>>>>> ask more detailed question if I have any).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to have in a list
>>>>>>>>>> of primitives.
>>>>>>>>>>
>>>>>>>>>> As for the default constructor and configurability, just want to make
>>>>>>>>>> sure. Is this what you have on your mind?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for the update!
>>>>>>>>>>
>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>>>>>>> should have a default constructor and it should be possible to pass in
>>>>>>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>>>>>>>>> a single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>>>>>>> discussion.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can you also update the KIP?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>>>>>
>>>>>>>>>> I made and pushed necessary commits, so we could review the final
>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>>>>>
>>>>>>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>>>>>>> if it is enough.
>>>>>>>>>>
>>>>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>
>>>>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -John
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>> Hey John and Matthias,
>>>>>>>>>>
>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>>>>>> realized that the type is not really needed in ListSerializer, but
>>>>>>>>>> only in ListDeserializer:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In ListSerializer we will start storing sizes only if serializer is
>>>>>>>>>> not a primitive serializer:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Then, in deserializer, we persist passed list type, so that during
>>>>>>>>>> deserialization we could create an instance of it with predefined
>>>>>>>>>> listSize for better performance.
>>>>>>>>>> We also try to locate a primitiveSize based on passed deserializer.
>>>>>>>>>> If it is not there, then primitiveSize will be null. Which means
>>>>>>>>>> that each entry’s size was encoded individually.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This looks much cleaner and more concise.
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>>>>>> Deserializer returns a fixed type...
>>>>>>>>>>
>>>>>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>>>>>> deserialization via config?
>>>>>>>>>>
>>>>>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>>>>>> config again)?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we
>>>>>>>>>> can support this, and a cast will be necessary at some point in the
>>>>>>>>>> user code.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>>>
>>>>>>>>>> Hey Daniyar,
>>>>>>>>>>
>>>>>>>>>> Thanks for looking at it!
>>>>>>>>>>
>>>>>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>>>>>> be "vanilla java"?
>>>>>>>>>>
>>>>>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>>>>>> example, what if the inner type is a Map<String,String>? The serde
>>>>>>>>>> could only be used to produce a LinkedList<Map>, thus, we'd still
>>>>>>>>>> need an inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>>>>>
>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>>> )
>>>>>>>>>>
>>>>>>>>>> And in configuration, it's something like:
>>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>>> Thanks,
>>>>>>>>>> -John
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>
>>>>>>>>>> Hey John,
>>>>>>>>>>
>>>>>>>>>> I read about TypeReference. It could work for the list serde.
>>>>>>>>>> However, it is not directly supported:
>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>>> The only way is to pass an actual class object into the constructor,
>>>>>>>>>> something like:
>>>>>>>>>>
>>>>>>>>>> It could be an option, but not a pretty one. What do you think of my
>>>>>>>>>> approach to use vanilla java and canonical class name? (As described
>>>>>>>>>> previously)
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit
>>>>>>>>>> over-engineered :)
>>>>>>>>>>
>>>>>>>>>> I also wanted to get feedback from Matthias, since he gave me the
>>>>>>>>>> idea of storing fixed/variable size entries.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>
>>>>>>>>>> That's a very clever solution!
>>>>>>>>>>
>>>>>>>>>> One observation is that, now, this is what we might call a polymorphic
>>>>>>>>>> serde. That is, you're detecting the actual concrete type and then
>>>>>>>>>> promising to produce the exact same concrete type on read. There are
>>>>>>>>>> some inherent problems with this approach, which in general require
>>>>>>>>>> some kind of schema registry (not necessarily Schema Registry, just
>>>>>>>>>> any registry for schemas) to solve.
>>>>>>>>>>
>>>>>>>>>> Notice that every serialized record has quite a bit of duplicated
>>>>>>>>>> information: the concrete type as well as a byte to indicate whether
>>>>>>>>>> the value type is a fixed size, and, if so, an integer to indicate the
>>>>>>>>>> actual size. These constitute a schema, of sorts, because they tell us
>>>>>>>>>> later how exactly to deserialize the data. Unfortunately, this
>>>>>>>>>> information is completely redundant. In all likelihood, the
>>>>>>>>>> information will be exactly the same for every record in the topic.
>>>>>>>>>> This problem is essentially the core motivation for serializations
>>>>>>>>>> like Avro: to move the schema outside of the serialization itself, so
>>>>>>>>>> that the records won't contain so much redundant information.
>>>>>>>>>>
>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to something
>>>>>>>>>> like what you had earlier in which you don't support perfectly
>>>>>>>>>> preserving the concrete type for _this_ serde, but instead just
>>>>>>>>>> support deserializing to _some_ List. Then, you could defer full,
>>>>>>>>>> perfect, type preservation to serdes that have an external system in
>>>>>>>>>> which to register their type information.
>>>>>>>>>>
>>>>>>>>>> There does exist an alternative, if we really do want to preserve the
>>>>>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>>>>>> configuration option specifically for the serde to configure what the
>>>>>>>>>> list type will be, and maybe what the element type is, as well.
>>>>>>>>>>
>>>>>>>>>> As far as "related work" goes, you might be interested to take a look
>>>>>>>>>> at how Jackson can be configured to deserialize into a specific,
>>>>>>>>>> arbitrarily nested, generically parameterized class structure.
>>>>>>>>>> Specifically, you might find
>>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>>> interesting.
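
The idea behind a type reference of this kind can be illustrated in plain Java with the "super type token" pattern: an anonymous subclass records its generic superclass, so a nested type such as LinkedList<Map<String, String>> survives erasure and can be inspected at runtime. The sketch below is only an illustration of that pattern. `TypeToken` is a hypothetical name and is neither Jackson's `TypeReference` API nor anything proposed in the KIP.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical class: shows the idea behind a type reference, not Jackson's API.
abstract class TypeToken<T> {
    // The full generic type argument, e.g. LinkedList<Map<String, String>>.
    final Type type;

    protected TypeToken() {
        // An anonymous subclass keeps its generic superclass in the class file,
        // so the type argument can be read back through reflection.
        Type superclass = getClass().getGenericSuperclass();
        if (!(superclass instanceof ParameterizedType)) {
            throw new IllegalStateException(
                "TypeToken must be created as an anonymous subclass with a type argument");
        }
        this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
    }
}
```

A caller would write `new TypeToken<LinkedList<Map<String, String>>>() {}.type` and then walk the resulting `ParameterizedType` to find the raw list class and the element type. Note that this only captures the type; an inner serde for the elements would still be needed.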
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -John
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> bump


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi,

Yes, totally forgot about the statement. KIP-466 is updated.

Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman for your valuable input!

I hope I did not cause too much trouble :)

I’ll start the vote now.

Best,
Daniyar Yeralin

> On Jul 16, 2019, at 3:17 PM, John Roesler <jo...@confluent.io> wrote:
> 
> Hi Daniyar,
> 
> Thanks for that update. I took a look, and I think this is in good shape.
> 
> One note, the statement "New method public static <T> Serde<List<T>>
> ListSerde() in org.apache.kafka.common.serialization.Serdes class
> (infers list implementation and inner serde from config file)" is
> still present in the KIP, although I do think it was removed from the PR.
> 
> Once you remove that statement from the KIP, then I think this KIP is
> ready to go up for a vote! Then, we can really review the PR in
> earnest and get this thing merged.
> 
> Thanks,
> -john
> 
> On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net> wrote:
>> 
>> Hi,
>> 
>> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
>> 
>> Feel free to put any comments in there.
>> 
>> Best,
>> Daniyar Yeralin
>> 
>>> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
>>> 
>>> Hi John,
>>> 
>>> I knew I was missing something. Yes, that makes sense now. I removed all `listSerde()` methods and left empty constructors instead.
>>> 
>>> As for `CommonClientConfigs`: I looked at the class, and it doesn’t have any properties related to serdes, which bothers me a little.
>>> 
>>> All properties like `default.key.serde` and `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want to create confusion.
>>> What also doesn’t make sense to me is that `WindowedSerdes` and its (de)serializers are not located in org.apache.kafka.common.serialization. I guess it kind of makes sense, since windowed serdes are only available for Kafka Streams, not vice versa.
>>> 
>>> If everyone is okay with putting the list properties in the `CommonClientConfigs` class, I’ll go ahead and do that.
>>> 
>>> Thank you for your input!
>>> 
>>> Best,
>>> Daniyar Yeralin
>>> 
>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> Regarding the placement, you might as well move the constants to `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and the configs and the code are in the same module.
>>>> 
>>>> Regarding the constructor... What Matthias said is correct: The serde, serializer, and deserializer all need to have zero-arg constructors so they can be instantiated reflectively by Kafka. However, the factory method you proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a constructor, and is not required. It would be used purely from the Java interface, but has the drawbacks I listed above. This method, not the constructor, is what I proposed to remove from the KIP.
>>>> 
>>>> Thanks,
>>>> -John
>>>> 
>>>> 
>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
>>>> One problem though.
>>>> 
>>>> Since WindowedSerde (and its Windowed(De)Serializer) is so similar, I’m trying to model the implementation of my ListSerde on it.
>>>> 
>>>> I created a couple of constants under StreamsConfig:
>>>> 
>>>> 
>>>> 
>>>> And I am trying to use a similar construct:
>>>> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>>>> But then I found out that StreamsConfig is not accessible from the org.apache.kafka.common.serialization package, while the windowed (de)serializers are located under the org.apache.kafka.streams.kstream package.
>>>> 
>>>> What should I do? Should I move my classes under org.apache.kafka.streams.kstream package instead?
>>>> 
>>>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>>>>> 
>>>>> Hi Matthias,
>>>>> 
>>>>> Thank you for your input.
>>>>> 
>>>>> I updated the KIP, made it a little more readable.
>>>>> 
>>>>> I think the configuration parameters strategy is finalized then.
>>>>> 
>>>>> Do you have any other questions/concerns regarding this KIP?
>>>>> 
>>>>> Meanwhile I’ll start doing appropriate code changes, and commit them under my PR.
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>> 
>>>>>> Daniyar,
>>>>>> 
>>>>>> thanks for the update to the KIP. It's in really good shape and well
>>>>>> written.
>>>>>> 
>>>>>> About the default constructor question:
>>>>>> 
>>>>>> All Serdes/Serializer/Deserializer classes need a default constructor so
>>>>>> that they can be created easily via reflection when specified in a config.
>>>>>> I understand that it is not super user friendly, but all existing code
>>>>>> works this way. Hence, it seems best to stick with the established pattern.
>>>>>> 
>>>>>> We have a similar issue with `TimeWindowedSerde` and
>>>>>> `SessionWindowedSerde`, and I just recently did a PR to improve user
>>>>>> experience that addresses the exact issue John raised. (cf
>>>>>> https://github.com/apache/kafka/pull/7067)
>>>>>> 
>>>>>> Note that if a user instantiates the Serde manually, the user
>>>>>> also needs to call `configure()` to set up the inner serdes. Kafka
>>>>>> Streams would not set those up automatically, and one would most likely
>>>>>> end up with an NPE.
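
The pattern described above (a zero-argument constructor for reflective creation, followed by a `configure()` call that fills in the missing pieces) can be sketched in plain Java as below. This is a simplified illustration under stated assumptions, not Kafka code: `ConfigurableListFactory` and `createFromConfig` are hypothetical names, the two property names are taken from the renaming suggested in this thread, and the sketch throws an IllegalStateException where an unconfigured serde would more likely fail with an NPE.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a config-driven list deserializer.
final class ConfigurableListFactory {
    // Property names follow the renaming suggested in this thread (assumed here).
    static final String KEY_TYPE = "default.list.key.serde.type";
    static final String VALUE_TYPE = "default.list.value.serde.type";

    private Class<?> listClass; // stays null until configure() is called

    // Zero-arg constructor, so the class can be created reflectively.
    public ConfigurableListFactory() { }

    // Reads the list implementation for the key or the value side from the config.
    public void configure(Map<String, ?> configs, boolean isKey) {
        final String propertyName = isKey ? KEY_TYPE : VALUE_TYPE;
        final Object value = configs.get(propertyName);
        if (value == null) {
            throw new IllegalArgumentException("Missing config: " + propertyName);
        }
        try {
            Class<?> candidate = Class.forName(value.toString());
            if (!List.class.isAssignableFrom(candidate)) {
                throw new IllegalArgumentException(value + " is not a java.util.List");
            }
            listClass = candidate;
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown list class: " + value, e);
        }
    }

    // Fails fast with a clear message when configure() was skipped.
    @SuppressWarnings("unchecked")
    public <T> List<T> newList() {
        if (listClass == null) {
            throw new IllegalStateException("configure() must be called first");
        }
        try {
            return (List<T>) listClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + listClass.getName(), e);
        }
    }

    // Mimics what a framework does: reflective creation, then configure().
    static ConfigurableListFactory createFromConfig(Map<String, ?> configs, boolean isKey) {
        try {
            ConfigurableListFactory factory =
                ConfigurableListFactory.class.getDeclaredConstructor().newInstance();
            factory.configure(configs, isKey);
            return factory;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective instantiation failed", e);
        }
    }
}
```

A user who calls `new ConfigurableListFactory()` directly and forgets `configure()` hits the guard in `newList()`, which is the manual-instantiation pitfall mentioned above.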
>>>>>> 
>>>>>> 
>>>>>> Coming back to the KIP and the parameter names. `WindowedSerdes` are
>>>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>>>>> we use the following parameter names:
>>>>>> 
>>>>>> - default.windowed.key.serde.inner
>>>>>> - default.windowed.value.serde.inner
>>>>>> 
>>>>>> 
>>>>>> It might be good to align the naming pattern. I would also suggest
>>>>>> using `type` instead of `impl`.
>>>>>> 
>>>>>> 
>>>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>>>>> default.value.list.serde.impl  ->  default.list.value.serde.type
>>>>>> default.key.list.serde.element  ->  default.list.key.serde.inner
>>>>>> default.value.list.serde.element  ->  default.list.value.serde.inner
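
With the renamed parameters, a key-side configuration might be assembled as in the sketch below. It only builds a `java.util.Properties` object and does not touch any Kafka class. `ListSerdeConfigSketch` is a hypothetical name, the serde class name is left elided ("org...ListSerde") exactly as it appears earlier in this thread, and `com.mycompany.MyRecordSerde` is the placeholder inner serde from John's example.

```java
import java.util.Properties;

// Hypothetical helper: shows the renamed property keys side by side.
final class ListSerdeConfigSketch {
    static Properties listKeyConfig() {
        Properties props = new Properties();
        // The fully qualified serde class is elided in the thread and kept elided here.
        props.setProperty("default.key.serde", "org...ListSerde");
        // Which List implementation the deserializer should create.
        props.setProperty("default.list.key.serde.type", "java.util.LinkedList");
        // Which serde handles the individual list entries (placeholder class name).
        props.setProperty("default.list.key.serde.inner", "com.mycompany.MyRecordSerde");
        return props;
    }
}
```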
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> 
>>>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>>>> Hi John,
>>>>>>> 
>>>>>>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
>>>>>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>>>>>> 
>>>>>>> What do you think about that? I hope I’m not confusing anything.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>> 
>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>>>>>>> 
>>>>>>>> Just one more super-small question, do we need this variant:
>>>>>>>> 
>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>>>>>>> 
>>>>>>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it.
>>>>>>>> 
>>>>>>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>>>>>>> 
>>>>>>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>>>>>>> 
>>>>>>>> What do you think?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> -John
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>>>> 
>>>>>>>>> Hi John,
>>>>>>>>> 
>>>>>>>>> I hope everyone had a great long weekend.
>>>>>>>>> 
>>>>>>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>>>>>>> 
>>>>>>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>>>>>>> 
>>>>>>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>>>>>>> 
>>>>>>>>> Thank you!
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks for the update, Daniyar!
>>>>>>>>> 
>>>>>>>>> In addition to specifying the config interface, can you also specify
>>>>>>>>> the Java interface? Namely, if I need to pass an instance of this
>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>>>>>>> constructor(s) would I have available? Likewise with the Serializer
>>>>>>>>> and Deserializer. I don't think you need to specify the implementation
>>>>>>>>> logic, since we've already discussed it here.
>>>>>>>>> 
>>>>>>>>> If you also want to specify the serialized format of the data records
>>>>>>>>> in the KIP, it could be useful documentation, as well as letting us
>>>>>>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> John
>>>>>>>>> 
>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hey,
>>>>>>>>> 
>>>>>>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>> Sorry for the delay :)
>>>>>>>>> 
>>>>>>>>> Thank You!
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>> 
>>>>>>>>> Yes, something like this. I did not think about good configuration
>>>>>>>>> parameter names yet. I am also not sure if I understand all proposed
>>>>>>>>> configs atm. But all configs should be listed and explained in the KIP
>>>>>>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>>>>>>> ask more detailed question if I have any).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -Matthias
>>>>>>>>> 
>>>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>>>> 
>>>>>>>>> Yes, you are right. BytesSerializer is not what I need to have in a list
>>>>>>>>> of primitives.
>>>>>>>>> 
>>>>>>>>> As for the default constructor and configurability, I just want to make
>>>>>>>>> sure: is this what you have in mind?
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks for the update!
>>>>>>>>> 
>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>>>>>> should have a default constructor and it should be possible to pass in
>>>>>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for a
>>>>>>>>> single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>>>>>> discussion.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Can you also update the KIP?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -Matthias
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>>>> 
>>>>>>>>> I made and pushed necessary commits, so we could review the final
>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>>>> 
>>>>>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>>>>>> if it is enough.
>>>>>>>>> 
>>>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>> 
>>>>>>>>> Hey Daniyar,
>>>>>>>>> 
>>>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> -John
>>>>>>>>> 
>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>>>> Hey John and Matthias,
>>>>>>>>> 
>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>>>>> realized that the type is not really needed in ListSerializer, but
>>>>>>>>> only in ListDeserializer:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> In ListSerializer we will start storing sizes only if the serializer
>>>>>>>>> is not a primitive serializer:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Then, in the deserializer, we persist the passed list type, so that
>>>>>>>>> during deserialization we can create an instance of it with a
>>>>>>>>> predefined listSize for better performance.
>>>>>>>>> We also try to locate a primitiveSize based on the passed deserializer.
>>>>>>>>> If it is not there, then primitiveSize will be null, which means
>>>>>>>>> that each entry’s size was encoded individually.
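
The strategy described above (write the list size once, and write a per-entry size only when the inner serializer is not a fixed-size primitive one) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: `ListCodecSketch`, its method signatures, and the nullable `fixedSize` argument standing in for the primitiveSize lookup are all assumptions made for the example, and it always builds an ArrayList rather than a configurable list type.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the size-conditional list encoding; not the PR's code.
final class ListCodecSketch {

    // fixedSize == null means variable-size entries: each one gets a 4-byte length prefix.
    static <T> byte[] serialize(List<T> list, Function<T, byte[]> inner, Integer fixedSize) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(list.size());              // the list size is written exactly once
            for (T entry : list) {
                byte[] payload = inner.apply(entry);
                if (fixedSize == null) {
                    out.writeInt(payload.length);   // per-entry size only for non-primitives
                }
                out.write(payload);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("In-memory write failed", e);
        }
    }

    static <T> List<T> deserialize(byte[] data, Function<byte[], T> inner, Integer fixedSize) {
        ByteBuffer in = ByteBuffer.wrap(data);
        int listSize = in.getInt();
        List<T> result = new ArrayList<>(listSize); // pre-sized from the encoded list size
        for (int i = 0; i < listSize; i++) {
            // Use the known primitive size if there is one, else read the stored size.
            int entrySize = (fixedSize == null) ? in.getInt() : fixedSize;
            byte[] payload = new byte[entrySize];
            in.get(payload);
            result.add(inner.apply(payload));
        }
        return result;
    }
}
```

Under these assumptions, three Integer entries with `fixedSize = 4` take 4 + 3 * 4 = 16 bytes, while the strings "a" and "bc" with `fixedSize = null` take 4 + (4 + 1) + (4 + 2) = 15 bytes, so the per-entry overhead is only paid for variable-size element types.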
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> This looks much cleaner and more concise.
>>>>>>>>> 
>>>>>>>>> What do you think?
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>>>> 
>>>>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>>>>> Deserializer returns a fixed type...
>>>>>>>>> 
>>>>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>>>>> deserialization via config?
>>>>>>>>> 
>>>>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>>>>> config again)?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we
>>>>>>>>> can support this, and a cast will be necessary at some point in the
>>>>>>>>> user code.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -Matthias
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>> 
>>>>>>>>> Hey Daniyar,
>>>>>>>>> 
>>>>>>>>> Thanks for looking at it!
>>>>>>>>> 
>>>>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>>>>> be "vanilla java"?
>>>>>>>>> 
>>>>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>>>>> example, what if the inner type is a Map<String,String>? The serde
>>>>>>>>> could only be used to produce a LinkedList<Map>, thus, we'd still
>>>>>>>>> need an inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>>>> 
>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>> )
>>>>>>>>> 
>>>>>>>>> And in configuration, it's something like:
>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> What do you think?
>>>>>>>>> Thanks,
>>>>>>>>> -John
>>>>>>>>> 
>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>>>> 
>>>>>>>>> Hey John,
>>>>>>>>> 
>>>>>>>>> I read about TypeReference. It could work for the list serde.
>>>>>>>>> However, it is not directly supported:
>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>> The only way is to pass an actual class object into the constructor,
>>>>>>>>> something like:
>>>>>>>>> 
>>>>>>>>> It could be an option, but not a pretty one. What do you think of my
>>>>>>>>> approach to use vanilla java and canonical class name? (As described
>>>>>>>>> previously)
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>>>>>> 
>>>>>>>>> Hi John,
>>>>>>>>> 
>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit
>>>>>>>>> over-engineered :)
>>>>>>>>> 
>>>>>>>>> I also wanted to get feedback from Matthias, since he gave me the
>>>>>>>>> idea of storing fixed/variable size entries.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Daniyar,
>>>>>>>>> 
>>>>>>>>> That's a very clever solution!
>>>>>>>>> 
>>>>>>>>> One observation is that, now, this is what we might call a polymorphic
>>>>>>>>> serde. That is, you're detecting the actual concrete type and then
>>>>>>>>> promising to produce the exact same concrete type on read. There are
>>>>>>>>> some inherent problems with this approach, which in general require
>>>>>>>>> some kind of schema registry (not necessarily Schema Registry, just
>>>>>>>>> any registry for schemas) to solve.
>>>>>>>>> 
>>>>>>>>> Notice that every serialized record has quite a bit of duplicated
>>>>>>>>> information: the concrete type as well as a byte to indicate whether
>>>>>>>>> the value type is a fixed size, and, if so, an integer to indicate the
>>>>>>>>> actual size. These constitute a schema, of sorts, because they tell us
>>>>>>>>> later how exactly to deserialize the data. Unfortunately, this
>>>>>>>>> information is completely redundant. In all likelihood, the
>>>>>>>>> information will be exactly the same for every record in the topic.
>>>>>>>>> This problem is essentially the core motivation for serializations
>>>>>>>>> like Avro: to move the schema outside of the serialization itself, so
>>>>>>>>> that the records won't contain so much redundant information.
>>>>>>>>> 
>>>>>>>>> In this light, I'm wondering if it makes sense to go back to something
>>>>>>>>> like what you had earlier in which you don't support perfectly
>>>>>>>>> preserving the concrete type for _this_ serde, but instead just
>>>>>>>>> support deserializing to _some_ List. Then, you could defer full,
>>>>>>>>> perfect, type preservation to serdes that have an external system in
>>>>>>>>> which to register their type information.
>>>>>>>>> 
>>>>>>>>> There does exist an alternative, if we really do want to preserve the
>>>>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>>>>> configuration option specifically for the serde to configure what the
>>>>>>>>> list type will be, and maybe what the element type is, as well.
>>>>>>>>> 
>>>>>>>>> As far as "related work" goes, you might be interested to take a look
>>>>>>>>> at how Jackson can be configured to deserialize into a specific,
>>>>>>>>> arbitrarily nested, generically parameterized class structure.
>>>>>>>>> Specifically, you might find
>>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>> interesting.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> -John
>>>>>>>>> 
>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> bump
>>>>>>>>> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Hi Daniyar,

Thanks for that update. I took a look, and I think this is in good shape.

One note: the statement "New method public static <T> Serde<List<T>>
ListSerde() in org.apache.kafka.common.serialization.Serdes class
(infers list implementation and inner serde from config file)" is
still present in the KIP, although I believe it was removed from the PR.

Once you remove that statement from the KIP, then I think this KIP is
ready to go up for a vote! Then, we can really review the PR in
earnest and get this thing merged.

Thanks,
-john

On Tue, Jul 16, 2019 at 2:05 PM Development <de...@yeralin.net> wrote:
>
> Hi,
>
> Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592
>
> Feel free to put any comments in there.
>
> Best,
> Daniyar Yeralin
>
> > On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
> >
> > Hi John,
> >
> > I knew I was missing something. Yes, that makes sense now, I removed all `listSerde()` methods, and left empty constructors instead.
> >
> > As for `CommonClientConfigs`: I looked at the class, and it doesn’t have any properties related to serdes, which bothers me a little.
> >
> > All properties like `default.key.serde` and `default.windowed.key.serde.*` are located in StreamsConfig, and I don’t want to create confusion.
> > What also doesn’t make sense to me is that `WindowedSerdes` and its (de)serializers are not located in org.apache.kafka.common.serialization. I guess it kind of makes sense, since windowed serdes are only available for Kafka Streams, not vice versa.
> >
> > If everyone is okay to put list properties in `CommonClientConfigs` class, I’ll go ahead and do that then.
> >
> > Thank you for your input!
> >
> > Best,
> > Daniyar Yeralin
> >
> >> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io> wrote:
> >>
> >> Hi all,
> >>
> >> Regarding the placement, you might as well move the constants to `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and the configs and the code are in the same module.
> >>
> >> Regarding the constructor... What Matthias said is correct: The serde, serializer, and deserializer all need to have zero-arg constructors so they can be instantiated reflectively by Kafka. However, the factory method you proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a constructor, and is not required. It would be used purely from the Java interface, but has the drawbacks I listed above. This method, not the constructor, is what I proposed to remove from the KIP.
> >>
> >> Thanks,
> >> -John
> >>
> >>
> >> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
> >> One problem though.
> >>
> >> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m trying to mimic the implementation of my ListSerde accordingly.
> >>
> >> I created a couple of constants under StreamsConfig:
> >>
> >>
> >>
> >> And I’m trying to use a similar construct:
> >> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
> >> But then I found out that StreamsConfig is not accessible from the org.apache.kafka.common.serialization package, whereas the windowed serde (de)serializers are located under the org.apache.kafka.streams.kstream package.
> >>
> >> What should I do? Should I move my classes under org.apache.kafka.streams.kstream package instead?
> >>
> >>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
> >>>
> >>> Hi Matthias,
> >>>
> >>> Thank you for your input.
> >>>
> >>> I updated the KIP, made it a little more readable.
> >>>
> >>> I think the configuration parameters strategy is finalized then.
> >>>
> >>> Do you have any other questions/concerns regarding this KIP?
> >>>
> >>> Meanwhile I’ll start doing appropriate code changes, and commit them under my PR.
> >>>
> >>> Best,
> >>> Daniyar Yeralin
> >>>
> >>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
> >>>>
> >>>> Daniyar,
> >>>>
> >>>> thanks for the update to the KIP. It's in really good shape and well
> >>>> written.
> >>>>
> >>>> About the default constructor question:
> >>>>
> >>>> All Serdes/Serializer/Deserializer classes need a default constructor so
> >>>> that they can be created via reflection when specified in a config. I
> >>>> understand that it is not super user friendly, but all existing code
> >>>> works this way. Hence, it seems best to stick with the established pattern.
> >>>>
> >>>> We have a similar issue with `TimeWindowedSerde` and
> >>>> `SessionWindowedSerde`, and I just recently did a PR to improve the user
> >>>> experience that addresses the exact issue John raised (cf.
> >>>> https://github.com/apache/kafka/pull/7067).
> >>>>
> >>>> Note that if a user instantiates the Serde manually, the user
> >>>> also needs to call `configure()` to set up the inner serdes. Kafka
> >>>> Streams would not set those up automatically, and one would most likely
> >>>> end up with an NPE.
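
A minimal sketch of the pattern described above, using only the JDK: a class with a zero-argument constructor is created reflectively from its class name and then configured. `WrappingSerde`, `instantiate`, and `inner()` are hypothetical names for illustration, not Kafka classes; the property names are the ones suggested in this thread, and `com.mycompany.MyRecordSerde` is the placeholder used elsewhere in the discussion.

```java
import java.util.HashMap;
import java.util.Map;

final class ReflectiveSerdeSketch {

    /** Hypothetical stand-in for a serde that wraps an inner serde. */
    static final class WrappingSerde {
        private String innerSerdeClass; // stays null until configure() is called

        // Zero-argument constructor: needed so the class can be created by name.
        WrappingSerde() { }

        void configure(Map<String, ?> configs, boolean isKey) {
            // Assumed property names, taken from the naming proposal in this thread.
            String property = isKey
                    ? "default.list.key.serde.inner"
                    : "default.list.value.serde.inner";
            Object value = configs.get(property);
            if (value == null) {
                throw new IllegalArgumentException("Missing config: " + property);
            }
            innerSerdeClass = value.toString();
        }

        String inner() {
            return innerSerdeClass;
        }
    }

    /** Config-driven path: create the class by name, then configure it. */
    static WrappingSerde instantiate(String className, Map<String, ?> configs, boolean isKey) {
        try {
            WrappingSerde serde = Class.forName(className)
                    .asSubclass(WrappingSerde.class)
                    .getDeclaredConstructor()
                    .newInstance();
            serde.configure(configs, isKey);
            return serde;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        configs.put("default.list.key.serde.inner", "com.mycompany.MyRecordSerde");

        WrappingSerde configured = instantiate(WrappingSerde.class.getName(), configs, true);
        System.out.println(configured.inner());

        // Manual instantiation without configure(): the inner serde is still unset.
        System.out.println(new WrappingSerde().inner());
    }
}
```

In this sketch, an instance created by hand and never configured has no inner serde, which is the situation in which a real wrapper would hit a NullPointerException on first use.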
> >>>>
> >>>>
> >>>> Coming back to the KIP and the parameter names: `WindowedSerdes` are
> >>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
> >>>> we use the following parameter names:
> >>>>
> >>>> - default.windowed.key.serde.inner
> >>>> - default.windowed.value.serde.inner
> >>>>
> >>>>
> >>>> It might be good to align the naming pattern. I would also suggest to
> >>>> use `type` instead of `impl`?
> >>>>
> >>>>
> >>>> default.key.list.serde.impl  ->  default.list.key.serde.type
> >>>> default.value.list.serde.impl  ->  default.list.value.serde.type
> >>>> default.key.list.serde.element  ->  default.list.key.serde.inner
> >>>> default.value.list.serde.element  ->  default.list.value.serde.inner
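
As a rough illustration of how the renamed `type` parameters could be consumed, the following JDK-only sketch picks the key or value property and instantiates the configured list class reflectively. `ListTypeConfigSketch` and `newList` are hypothetical names, and this is not the code in the PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ListTypeConfigSketch {
    // Property names as proposed in the renaming above.
    static final String KEY_TYPE = "default.list.key.serde.type";
    static final String VALUE_TYPE = "default.list.value.serde.type";

    /** Creates an empty list of the class named in the key or value property. */
    @SuppressWarnings("unchecked")
    static <T> List<T> newList(Map<String, ?> configs, boolean isKey) {
        String property = isKey ? KEY_TYPE : VALUE_TYPE;
        Object className = configs.get(property);
        if (className == null) {
            throw new IllegalArgumentException("Missing config: " + property);
        }
        try {
            // The list class needs an accessible zero-argument constructor.
            return (List<T>) Class.forName(className.toString())
                    .asSubclass(List.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate list type " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        configs.put(KEY_TYPE, "java.util.LinkedList");
        List<Integer> keys = newList(configs, true);
        keys.add(42);
        System.out.println(keys.getClass().getName() + " " + keys);
    }
}
```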
> >>>>
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>> On 7/10/19 8:52 AM, Development wrote:
> >>>>> Hi John,
> >>>>>
> >>>>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
> >>>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
> >>>>>
> >>>>> What do you think about that? I hope I’m not confusing anything.
> >>>>>
> >>>>> Best,
> >>>>> Daniyar Yeralin
> >>>>>
> >>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
> >>>>>>
> >>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
> >>>>>>
> >>>>>> Just one more super-small question, do we need this variant:
> >>>>>>
> >>>>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
> >>>>>>
> >>>>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it.
> >>>>>>
> >>>>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
> >>>>>>
> >>>>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
> >>>>>>
> >>>>>> What do you think?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -John
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
> >>>>>>>
> >>>>>>> Hi John,
> >>>>>>>
> >>>>>>> I hope everyone had a great long weekend.
> >>>>>>>
> >>>>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
> >>>>>>>
> >>>>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
> >>>>>>>
> >>>>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
> >>>>>>>
> >>>>>>> Thank you!
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
> >>>>>>>
> >>>>>>> Thanks for the update, Daniyar!
> >>>>>>>
> >>>>>>> In addition to specifying the config interface, can you also specify
> >>>>>>> the Java interface? Namely, if I need to pass an instance of this
> >>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
> >>>>>>> constructor(s) would I have available? Likewise with the Serializer
> >>>>>>> and Deserializer. I don't think you need to specify the implementation
> >>>>>>> logic, since we've already discussed it here.
> >>>>>>>
> >>>>>>> If you also want to specify the serialized format of the data records
> >>>>>>> in the KIP, it could be useful documentation, as well as letting us
> >>>>>>> verify the schema for forward/backward compatibility concerns, etc.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> John
> >>>>>>>
> >>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Hey,
> >>>>>>>
> >>>>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>>>>>> Sorry for the delay :)
> >>>>>>>
> >>>>>>> Thank You!
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
> >>>>>>>
> >>>>>>> Yes, something like this. I did not think about good configuration
> >>>>>>> parameter names yet. I am also not sure if I understand all proposed
> >>>>>>> configs atm. But all configs should be listed and explained in the KIP
> >>>>>>> anyway, and we can discuss further after you have updated the KIP (I can
> >>>>>>> ask more detailed questions if I have any).
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>> On 6/21/19 2:05 PM, Development wrote:
> >>>>>>>
> >>>>>>> Yes, you are right. `BytesSerializer` is not what I need to have in a list
> >>>>>>> of primitives.
> >>>>>>>
> >>>>>>> As for the default constructor and configurability, I just want to make
> >>>>>>> sure: is this what you have in mind?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
> >>>>>>>
> >>>>>>> Thanks for the update!
> >>>>>>>
> >>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
> >>>>>>> should have a default constructor and it should be possible to pass in
> >>>>>>> the `Class listClass` information via a configuration. Otherwise,
> >>>>>>> KafkaStreams cannot use it as default serde.
> >>>>>>>
> >>>>>>>
> >>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
> >>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
> >>>>>>> a single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
> >>>>>>>
> >>>>>>>
> >>>>>>> For tests, we can comment on the PR. No need to do this in the KIP
> >>>>>>> discussion.
> >>>>>>>
> >>>>>>>
> >>>>>>> Can you also update the KIP?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 6/21/19 11:29 AM, Development wrote:
> >>>>>>>
> >>>>>>> I made and pushed the necessary commits, so we can review the final
> >>>>>>> version under PR https://github.com/apache/kafka/pull/6592
> >>>>>>>
> >>>>>>> I also need some advice on writing tests for this new serde. So far I
> >>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
> >>>>>>> if it is enough.
> >>>>>>>
> >>>>>>> Thank y’all for your help in this KIP :)
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>>
> >>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
> >>>>>>>
> >>>>>>> Hey Daniyar,
> >>>>>>>
> >>>>>>> Looks good to me! Thanks for considering it.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> -John
> >>>>>>>
> >>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
> >>>>>>> Hey John and Matthias,
> >>>>>>>
> >>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
> >>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
> >>>>>>> realized that the type is not really needed in ListSerializer, but
> >>>>>>> only in ListDeserializer:
> >>>>>>>
> >>>>>>>
> >>>>>>> In ListSerializer we will start storing sizes only if serializer is
> >>>>>>> not a primitive serializer:
> >>>>>>>
> >>>>>>>
> >>>>>>> Then, in deserializer, we persist passed list type, so that during
> >>>>>>> deserialization we could create an instance of it with predefined
> >>>>>>> listSize for better performance.
> >>>>>>> We also try to locate a primitiveSize based on passed deserializer.
> >>>>>>> If it is not there, then primitiveSize will be null. Which means
> >>>>>>> that each entry’s size was encoded individually.
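
The snippets from the original message appear not to have survived in this archive. The sketch below is an independent, JDK-only illustration of the encoding described above, not the PR's code: the list size is written first, and each entry is prefixed with its own size only when no fixed entry size is known. `ListEncodingSketch`, `serialize`, `deserialize`, and the `fixedSize` parameter are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class ListEncodingSketch {

    /** fixedSize == null means entries vary in size and each one gets a size prefix. */
    static <T> byte[] serialize(List<T> data, Function<T, byte[]> inner, Integer fixedSize) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(data.size());            // number of entries
            for (T entry : data) {
                byte[] encoded = inner.apply(entry);
                if (fixedSize == null) {
                    out.writeInt(encoded.length); // per-entry size, variable-size case only
                }
                out.write(encoded);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static <T> List<T> deserialize(byte[] data, Function<byte[], T> inner, Integer fixedSize) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int listSize = buffer.getInt();
        List<T> result = new ArrayList<>(listSize); // pre-sized from the encoded list size
        for (int i = 0; i < listSize; i++) {
            int entrySize = fixedSize != null ? fixedSize : buffer.getInt();
            byte[] encoded = new byte[entrySize];
            buffer.get(encoded);
            result.add(inner.apply(encoded));
        }
        return result;
    }

    public static void main(String[] args) {
        // Fixed-size entries (4-byte ints): no per-entry size is stored.
        byte[] ints = serialize(Arrays.asList(1, 2, 3),
                i -> ByteBuffer.allocate(4).putInt(i).array(), 4);
        List<Integer> intsBack = deserialize(ints, b -> ByteBuffer.wrap(b).getInt(), 4);
        System.out.println(ints.length + " bytes: " + intsBack);

        // Variable-size entries (UTF-8 strings): each entry carries its own size.
        byte[] strings = serialize(Arrays.asList("a", "bc"),
                s -> s.getBytes(StandardCharsets.UTF_8), null);
        List<String> stringsBack = deserialize(strings,
                b -> new String(b, StandardCharsets.UTF_8), null);
        System.out.println(strings.length + " bytes: " + stringsBack);
    }
}
```

Here the list type is fixed to `ArrayList` for brevity; in the design discussed in this thread the target list class would come from the deserializer's configuration instead.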
> >>>>>>>
> >>>>>>>
> >>>>>>> This looks much cleaner and more concise.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
> >>>>>>>
> >>>>>>> For encoding the list-type: I see John's point about re-encoding the
> >>>>>>> list-type redundantly. However, I also don't like the idea that the
> >>>>>>> Deserializer returns a fixed type...
> >>>>>>>
> >>>>>>> Maybe it's best to allow users to specify the target list type on
> >>>>>>> deserialization via config?
> >>>>>>>
> >>>>>>> Similar for the primitive types: I don't think we need to encode the
> >>>>>>> type size, but users could specify the type on the deserializer (via a
> >>>>>>> config again)?
> >>>>>>>
> >>>>>>>
> >>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
> >>>>>>> support this and a cast will be necessary at some point in the user code.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
> >>>>>>>
> >>>>>>> Hey Daniyar,
> >>>>>>>
> >>>>>>> Thanks for looking at it!
> >>>>>>>
> >>>>>>> Something like your screenshot is more along the lines of what I was
> >>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
> >>>>>>> be "vanilla java"?
> >>>>>>>
> >>>>>>> Unfortunately the deserializer needs more information, though. For
> >>>>>>> example, what if the inner type is a Map<String,String>? The serde could
> >>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
> >>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
> >>>>>>>
> >>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
> >>>>>>> /**list type**/ LinkedList.class,
> >>>>>>> /**inner serde**/ new MyRecordSerde()
> >>>>>>> )
> >>>>>>>
> >>>>>>> And in configuration, it's something like:
> >>>>>>> default.key.serde: org...ListSerde
> >>>>>>> default.key.list.serde.type: java.util.LinkedList
> >>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
> >>>>>>>
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>> Thanks,
> >>>>>>> -John
> >>>>>>>
> >>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
> >>>>>>>
> >>>>>>> Hey John,
> >>>>>>>
> >>>>>>> I read about TypeReference. It could work for the list serde.
> >>>>>>> However, it is not directly supported:
> >>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
> >>>>>>> The only way is to pass an actual class object into the constructor,
> >>>>>>> something like:
> >>>>>>>
> >>>>>>> It could be an option, but not a pretty one. What do you think of my
> >>>>>>> approach to use vanilla java and canonical class name? (As described
> >>>>>>> previously)
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
> >>>>>>>
> >>>>>>> Hi John,
> >>>>>>>
> >>>>>>> Thank you for your input! Yes, my idea looks a little bit
> >>>>>>> over-engineered :)
> >>>>>>>
> >>>>>>> I also wanted to get feedback from Matthias, since he gave
> >>>>>>> me the idea of storing fixed/variable size entries.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
> >>>>>>>
> >>>>>>> Hi Daniyar,
> >>>>>>>
> >>>>>>> That's a very clever solution!
> >>>>>>>
> >>>>>>> One observation is that, now, this is what we might call a polymorphic
> >>>>>>> serde. That is, you're detecting the actual concrete type and then
> >>>>>>> promising to produce the exact same concrete type on read. There are
> >>>>>>> some inherent problems with this approach, which in general require
> >>>>>>> some kind of schema registry (not necessarily Schema Registry, just
> >>>>>>> any registry for schemas) to solve.
> >>>>>>>
> >>>>>>> Notice that every serialized record has quite a bit of duplicated
> >>>>>>> information: the concrete type as well as a byte to indicate whether
> >>>>>>> the value type is a fixed size, and, if so, an integer to indicate the
> >>>>>>> actual size. These constitute a schema, of sorts, because they tell us
> >>>>>>> later how exactly to deserialize the data. Unfortunately, this
> >>>>>>> information is completely redundant. In all likelihood, the
> >>>>>>> information will be exactly the same for every record in the topic.
> >>>>>>> This problem is essentially the core motivation for serializations
> >>>>>>> like Avro: to move the schema outside of the serialization itself, so
> >>>>>>> that the records won't contain so much redundant information.
> >>>>>>>
> >>>>>>> In this light, I'm wondering if it makes sense to go back to something
> >>>>>>> like what you had earlier in which you don't support perfectly
> >>>>>>> preserving the concrete type for _this_ serde, but instead just
> >>>>>>> support deserializing to _some_ List. Then, you could defer full,
> >>>>>>> perfect, type preservation to serdes that have an external system in
> >>>>>>> which to register their type information.
> >>>>>>>
> >>>>>>> There does exist an alternative, if we really do want to preserve the
> >>>>>>> concrete type (which does seem kind of nice). You can add a
> >>>>>>> configuration option specifically for the serde to configure what the
> >>>>>>> list type will be, and maybe what the element type is, as well.
> >>>>>>>
> >>>>>>> As far as "related work" goes, you might be interested to take a look
> >>>>>>> at how Jackson can be configured to deserialize into a specific,
> >>>>>>> arbitrarily nested, generically parameterized class structure.
> >>>>>>> Specifically, you might find
> >>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> >>>>>>> interesting.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> -John
> >>>>>>>
> >>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> bump
> >>>>>>>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi,

Pushed new changes under my PR: https://github.com/apache/kafka/pull/6592

Feel free to put any comments in there.

Best,
Daniyar Yeralin

> On Jul 15, 2019, at 1:06 PM, Development <de...@yeralin.net> wrote:
> 
> Hi John,
> 
> I knew I was missing something. Yes, that makes sense now, I removed all `listSerde()` methods, and left empty constructors instead.
> 
> As for `CommonClientConfigs`: I looked at the class, and it doesn’t have any properties related to serdes, which bothers me a little.
> 
> All properties like `default.key.serde` and `default.windowed.key.serde.*` are located in StreamsConfig, and I don’t want to create confusion.
> What also doesn’t make sense to me is that `WindowedSerdes` and its (de)serializers are not located in org.apache.kafka.common.serialization. I guess it kind of makes sense, since windowed serdes are only available for Kafka Streams, not vice versa.
> 
> If everyone is okay to put list properties in `CommonClientConfigs` class, I’ll go ahead and do that then.
> 
> Thank you for your input!
> 
> Best,
> Daniyar Yeralin
> 
>> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io> wrote:
>> 
>> Hi all,
>> 
>> Regarding the placement, you might as well move the constants to `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and the configs and the code are in the same module.
>> 
>> Regarding the constructor... What Matthias said is correct: The serde, serializer, and deserializer all need to have zero-arg constructors so they can be instantiated reflectively by Kafka. However, the factory method you proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a constructor, and is not required. It would be used purely from the Java interface, but has the drawbacks I listed above. This method, not the constructor, is what I proposed to remove from the KIP.
>> 
>> Thanks,
>> -John
>> 
>> 
>> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
>> One problem though. 
>> 
>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m trying to mimic the implementation of my ListSerde accordingly.
>> 
>> I created a couple of constants under StreamsConfig:
>> 
>> 
>> 
>> And I’m trying to use a similar construct:
>> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>> But then I found out that StreamsConfig is not accessible from the org.apache.kafka.common.serialization package, whereas the windowed serde (de)serializers are located under the org.apache.kafka.streams.kstream package.
>> 
>> What should I do? Should I move my classes under org.apache.kafka.streams.kstream package instead?
>> 
>>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>>> 
>>> Hi Matthias,
>>> 
>>> Thank you for your input.
>>> 
>>> I updated the KIP, made it a little more readable.
>>> 
>>> I think the configuration parameters strategy is finalized then.
>>> 
>>> Do you have any other questions/concerns regarding this KIP?
>>> 
>>> Meanwhile I’ll start doing appropriate code changes, and commit them under my PR.
>>> 
>>> Best,
>>> Daniyar Yeralin
>>> 
>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>> 
>>>> Daniyar,
>>>> 
>>>> thanks for the update to the KIP. It's in really good shape and well
>>>> written.
>>>> 
>>>> About the default constructor question:
>>>> 
>>>> All Serdes/Serializer/Deserializer classes need a default constructor so
>>>> that they can be created via reflection when specified in a config. I
>>>> understand that it is not super user friendly, but all existing code
>>>> works this way. Hence, it seems best to stick with the established pattern.
>>>> 
>>>> We have a similar issue with `TimeWindowedSerde` and
>>>> `SessionWindowedSerde`, and I just recently did a PR to improve the user
>>>> experience that addresses the exact issue John raised (cf.
>>>> https://github.com/apache/kafka/pull/7067).
>>>> 
>>>> Note that if a user instantiates the Serde manually, the user
>>>> also needs to call `configure()` to set up the inner serdes. Kafka
>>>> Streams would not set those up automatically, and one would most likely
>>>> end up with an NPE.
>>>> 
>>>> 
>>>> Coming back the KIP, and the parameter names. `WindowedSerdes` are
>>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>>> we use the following parameter names:
>>>> 
>>>> - default.windowed.key.serde.inner
>>>> - default.windowed.value.serde.inner
>>>> 
>>>> 
>>>> It might be good to align the naming pattern. I would also suggest to
>>>> use `type` instead of `impl`?
>>>> 
>>>> 
>>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>>> default.value.list.serde.impl  ->  default.list.value.serde.type
>>>> default.key.list.serde.element  ->  default.list.key.serde.inner
>>>> default.value.list.serde.element  ->  default.list.value.serde.inner
>>>> 
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> 
>>>> On 7/10/19 8:52 AM, Development wrote:
>>>>> Hi John,
>>>>> 
>>>>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
>>>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>>>> 
>>>>> What do you think about that? I hope I’m not confusing anything.
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>>> 
>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>>>>> 
>>>>>> Just one more super-small question, do we need this variant: 
>>>>>> 
>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>>>>> 
>>>>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it. 
>>>>>> 
>>>>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>>>>> 
>>>>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>>>>> 
>>>>>> What do you think?
>>>>>> 
>>>>>> Thanks,
>>>>>> -John
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>>> 
>>>>>>> Hi John,
>>>>>>> 
>>>>>>> I hope everyone had a great long weekend.
>>>>>>> 
>>>>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>>>>> 
>>>>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>>>>> 
>>>>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>>>>> 
>>>>>>> Thank you!
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>>> 
>>>>>>> Thanks for the update, Daniyar!
>>>>>>> 
>>>>>>> In addition to specifying the config interface, can you also specify
>>>>>>> the Java interface? Namely, if I need to pass an instance of this
>>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>>>>> constructor(s) would I have available? Likewise with the Serializer
>>>>>>> and Deserializer. I don't think you need to specify the implementation
>>>>>>> logic, since we've already discussed it here.
>>>>>>> 
>>>>>>> If you also want to specify the serialized format of the data records
>>>>>>> in the KIP, it could be useful documentation, as well as letting us
>>>>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> John
>>>>>>> 
>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Hey,
>>>>>>> 
>>>>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>> Sorry for the delay :)
>>>>>>> 
>>>>>>> Thank You!
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>> 
>>>>>>> Yes, something like this. I did not think about good configuration
>>>>>>> parameter names yet. I am also not sure if I understand all proposed
>>>>>>> configs atm. But all configs should be listed and explained in the KIP
>>>>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>>>>> ask more detailed questions if I have any).
>>>>>>> 
>>>>>>> 
>>>>>>> -Matthias
>>>>>>> 
>>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>>> 
>>>>>>> Yes, you are right. BytesSerializer is not what I need to have in a list
>>>>>>> of primitives.
>>>>>>> 
>>>>>>> As for the default constructor and configurability, just want to make
>>>>>>> sure. Is this what you have on your mind?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>> 
>>>>>>> Thanks for the update!
>>>>>>> 
>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>>>> should have a default constructor and it should be possible to pass in
>>>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>>>> KafkaStreams cannot use it as default serde.
>>>>>>> 
>>>>>>> 
>>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>>>>>> a single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>>>> 
>>>>>>> 
>>>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>>>> discussion.
>>>>>>> 
>>>>>>> 
>>>>>>> Can you also update the KIP?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -Matthias
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>>> 
>>>>>>> I made and pushed necessary commits, so we could review the final
>>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>>> 
>>>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>>>> if it is enough.
>>>>>>> 
>>>>>>> Thank y’all for your help in this KIP :)
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>> 
>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>>> 
>>>>>>> Hey Daniyar,
>>>>>>> 
>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>> 
>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>>>> Hey John and Matthias,
>>>>>>> 
>>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>>> realized that the type is not really needed in ListSerializer, but
>>>>>>> only in ListDeserializer:
>>>>>>> 
>>>>>>> 
>>>>>>> In ListSerializer we will store entry sizes only if the inner serializer is
>>>>>>> not a primitive serializer:
>>>>>>> 
>>>>>>> 
>>>>>>> Then, in the deserializer, we persist the passed list type, so that during
>>>>>>> deserialization we can create an instance of it with a predefined
>>>>>>> listSize for better performance.
>>>>>>> We also try to look up a primitiveSize based on the passed deserializer.
>>>>>>> If it is not there, then primitiveSize will be null, which means
>>>>>>> that each entry’s size was encoded individually.
>>>>>>> 
>>>>>>> 
>>>>>>> This looks much cleaner and more concise.
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>>> 
>>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>>> Deserializer returns a fixed type...
>>>>>>> 
>>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>>> deserialization via config?
>>>>>>> 
>>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>>> config again)?
>>>>>>> 
>>>>>>> 
>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>>>>>>> support this and a cast will be necessary at some point in the user code.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -Matthias
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>> 
>>>>>>> Hey Daniyar,
>>>>>>> 
>>>>>>> Thanks for looking at it!
>>>>>>> 
>>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>>> be "vanilla java"?
>>>>>>> 
>>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>>> example, what if the inner type is a Map<String,String>? The serde could
>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>> 
>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>> /**list type**/ LinkedList.class,
>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>> )
>>>>>>> 
>>>>>>> And in configuration, it's something like:
>>>>>>> default.key.serde: org...ListSerde
>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>> 
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>> 
>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>>>> 
>>>>>>> Hey John,
>>>>>>> 
>>>>>>> I read about TypeReference. It could work for the list serde.
>>>>>>> However, it is not directly supported:
>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>> The only way is to pass an actual class object into the constructor,
>>>>>>> something like:
>>>>>>> 
>>>>>>> It could be an option, but not a pretty one. What do you think of my
>>>>>>> approach to use vanilla java and canonical class name? (As described
>>>>>>> previously)
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net <ma...@yeralin.net> <mailto:dev@yeralin.net <ma...@yeralin.net>>
>>>>>>> <mailto:dev@yeralin.net <ma...@yeralin.net> <mailto:dev@yeralin.net <ma...@yeralin.net>>> <mailto:dev@yeralin.net <ma...@yeralin.net> <mailto:dev@yeralin.net <ma...@yeralin.net>>>
>>>>>>> <mailto:dev@yeralin.net <ma...@yeralin.net> <mailto:dev@yeralin.net <ma...@yeralin.net>> <mailto:dev@yeralin.net <ma...@yeralin.net> <mailto:dev@yeralin.net <ma...@yeralin.net>>>>> wrote:
>>>>>>> 
>>>>>>> Hi John,
>>>>>>> 
>>>>>>> Thank you for your input! Yes, my idea looks a little bit
>>>>>>> over-engineered :)
>>>>>>> 
>>>>>>> I also wanted to see feedback from Matthias as well, since he gave
>>>>>>> me an idea about storing fixed/variable size entries.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>>>> 
>>>>>>> Hi Daniyar,
>>>>>>> 
>>>>>>> That's a very clever solution!
>>>>>>> 
>>>>>>> One observation is that, now, this is what we might call a polymorphic
>>>>>>> serde. That is, you're detecting the actual concrete type and then
>>>>>>> promising to produce the exact same concrete type on read. There are
>>>>>>> some inherent problems with this approach, which in general require
>>>>>>> some kind of schema registry (not necessarily Schema Registry, just
>>>>>>> any registry for schemas) to solve.
>>>>>>> 
>>>>>>> Notice that every serialized record has quite a bit of duplicated
>>>>>>> information: the concrete type as well as a byte to indicate whether
>>>>>>> the value type is a fixed size, and, if so, an integer to indicate the
>>>>>>> actual size. These constitute a schema, of sorts, because they tell us
>>>>>>> later how exactly to deserialize the data. Unfortunately, this
>>>>>>> information is completely redundant. In all likelihood, the
>>>>>>> information will be exactly the same for every record in the topic.
>>>>>>> This problem is essentially the core motivation for serializations
>>>>>>> like Avro: to move the schema outside of the serialization itself, so
>>>>>>> that the records won't contain so much redundant information.
>>>>>>> 
>>>>>>> In this light, I'm wondering if it makes sense to go back to something
>>>>>>> like what you had earlier in which you don't support perfectly
>>>>>>> preserving the concrete type for _this_ serde, but instead just
>>>>>>> support deserializing to _some_ List. Then, you could defer full,
>>>>>>> perfect, type preservation to serdes that have an external system in
>>>>>>> which to register their type information.
>>>>>>> 
>>>>>>> There does exist an alternative, if we really do want to preserve the
>>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>>> configuration option specifically for the serde to configure what the
>>>>>>> list type will be, and maybe what the element type is, as well.
>>>>>>> 
>>>>>>> As far as "related work" goes, you might be interested to take a look
>>>>>>> at how Jackson can be configured to deserialize into a specific,
>>>>>>> arbitrarily nested, generically parameterized class structure.
>>>>>>> Specifically, you might find
>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>> interesting.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>> 
>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> bump
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi John,

I knew I was missing something. Yes, that makes sense now. I removed all `listSerde()` methods and left the empty constructors instead.

As for `CommonClientConfigs`: I looked at the class, and it doesn’t have any properties related to serdes, which bothers me a little.

All properties like `default.key.serde` and `default.windowed.key.serde.*` are located in `StreamsConfig`. I don’t want to create confusion.
What also didn’t make sense to me at first is that `WindowedSerdes` and its (de)serializers are not located in org.apache.kafka.common.serialization. I guess it kind of makes sense, since windowed serdes are only relevant to Kafka Streams.

If everyone is okay with putting the list properties in the `CommonClientConfigs` class, I’ll go ahead and do that.
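
To make the config-driven setup concrete, here is a minimal stand-alone sketch. It is not the code from the PR: `ConfiguredListSketch` and its members are hypothetical names, it uses only the JDK instead of the real Kafka `Deserializer` interface, and the property names are the ones suggested earlier in this thread, assumed here to be plain string constants wherever they end up living.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a list deserializer; not the class from the PR.
final class ConfiguredListSketch {
    // Property names proposed in this thread. Which config class should hold
    // these constants is exactly the open question above.
    static final String LIST_KEY_TYPE = "default.list.key.serde.type";
    static final String LIST_VALUE_TYPE = "default.list.value.serde.type";

    private Class<?> listClass; // stays null until configure() is called

    // Zero-arg constructor so the class can be instantiated reflectively.
    ConfiguredListSketch() { }

    void configure(Map<String, ?> configs, boolean isKey) {
        String propertyName = isKey ? LIST_KEY_TYPE : LIST_VALUE_TYPE;
        Object value = configs.get(propertyName);
        if (value == null) {
            throw new IllegalArgumentException("Missing config: " + propertyName);
        }
        try {
            // Accept either a Class object or a fully qualified class name.
            Class<?> resolved = value instanceof Class
                    ? (Class<?>) value
                    : Class.forName(value.toString());
            if (!List.class.isAssignableFrom(resolved)) {
                throw new IllegalArgumentException(resolved + " is not a List");
            }
            listClass = resolved;
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown list class: " + value, e);
        }
    }

    @SuppressWarnings("unchecked")
    <T> List<T> newList() {
        if (listClass == null) {
            // Fail with a clear message instead of a later NullPointerException.
            throw new IllegalStateException("configure() has not been called");
        }
        try {
            return (List<T>) listClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + listClass, e);
        }
    }
}
```

With this shape, an instance created through the empty constructor is unusable until `configure()` has run, which is the same caveat that applies to manually instantiated windowed serdes.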

Thank you for your input!

Best,
Daniyar Yeralin

> On Jul 15, 2019, at 11:45 AM, John Roesler <jo...@confluent.io> wrote:
> 
> Hi all,
> 
> Regarding the placement, you might as well move the constants to `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and the configs and the code are in the same module.
> 
> Regarding the constructor... What Matthias said is correct: The serde, serializer, and deserializer all need to have zero-arg constructors so they can be instantiated reflectively by Kafka. However, the factory method you proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a constructor, and is not required. It would be used purely from the Java interface, but has the drawbacks I listed above. This method, not the constructor, is what I proposed to remove from the KIP.
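
The zero-arg constructor requirement can be illustrated with a small JDK-only sketch. The class names below are hypothetical, and the real Kafka code path for creating configured instances is more involved; this only shows why a class that offers nothing but an argument-taking constructor cannot be created from a class name alone.

```java
import java.lang.reflect.Constructor;

final class ReflectiveInstantiationSketch {
    // Hypothetical serde-like class with a zero-arg constructor.
    static final class WithDefaultConstructor {
        WithDefaultConstructor() { }
    }

    // Hypothetical serde-like class that can only be built with arguments.
    static final class WithoutDefaultConstructor {
        final Class<?> listClass;

        WithoutDefaultConstructor(Class<?> listClass) {
            this.listClass = listClass;
        }
    }

    // Roughly what a framework can do when all it has is a class name
    // taken from a config value.
    static Object instantiate(String className) throws ReflectiveOperationException {
        Class<?> clazz = Class.forName(className);
        Constructor<?> constructor = clazz.getDeclaredConstructor(); // zero-arg lookup
        return constructor.newInstance();
    }

    // Returns false when the zero-arg lookup or the instantiation fails.
    static boolean hasUsableDefaultConstructor(String className) {
        try {
            instantiate(className);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }
}
```

A static factory method such as the proposed `ListSerde()` plays no part in this lookup, which is why it can be dropped without affecting the config-driven path.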
> 
> Thanks,
> -John
> 
> 
> On Mon, Jul 15, 2019 at 10:15 AM Development <dev@yeralin.net> wrote:
> One problem though. 
> 
> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m trying to mimic the implementation of my ListSerde accordingly.
> 
> I created a couple of constants under StreamsConfig:
> 
> 
> 
> And I am trying to use a similar construct:
> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
> But then I found out that StreamsConfig is not accessible from the org.apache.kafka.common.serialization package, while the windowed serde (de)serializers are located under the org.apache.kafka.streams.kstream package.
> 
> What should I do? Should I move my classes under org.apache.kafka.streams.kstream package instead?
> 
>> On Jul 15, 2019, at 10:45 AM, Development <dev@yeralin.net> wrote:
>> 
>> Hi Matthias,
>> 
>> Thank you for your input.
>> 
>> I updated the KIP, made it a little more readable.
>> 
>> I think the configuration parameters strategy is finalized then.
>> 
>> Do you have any other questions/concerns regarding this KIP?
>> 
>> Meanwhile I’ll start doing appropriate code changes, and commit them under my PR.
>> 
>> Best,
>> Daniyar Yeralin
>> 
>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>> 
>>> Daniyar,
>>> 
>>> thanks for the update to the KIP. It's in really good shape and well
>>> written.
>>> 
>>> About the default constructor question:
>>> 
>>> All Serdes/Serializer/Deserializer classes need a default constructor so
>>> that they can be created easily via reflection when specified in a config. I
>>> understand that it is not super user friendly, but all existing code
>>> works this way. Hence, it seems best to stick with the established pattern.
>>> 
>>> We have a similar issue with `TimeWindowedSerde` and
>>> `SessionWindowedSerde`, and I just recently did a PR to improve user
>>> experience that addresses the exact issue John raised (cf.
>>> https://github.com/apache/kafka/pull/7067).
>>> 
>>> Note that if a user instantiates the Serde manually, the user
>>> would also need to call `configure()` to set up the inner serdes. Kafka
>>> Streams would not set those up automatically, and one would most likely
>>> end up with an NPE.
>>> 
>>> 
>>> Coming back to the KIP and the parameter names: `WindowedSerdes` are
>>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>>> we use the following parameter names:
>>> 
>>> - default.windowed.key.serde.inner
>>> - default.windowed.value.serde.inner
>>> 
>>> 
>>> It might be good to align the naming pattern. I would also suggest
>>> using `type` instead of `impl`:
>>> 
>>> 
>>> default.key.list.serde.impl  ->  default.list.key.serde.type
>>> default.value.list.serde.impl  ->  default.list.value.serde.type
>>> default.key.list.serde.element  ->  default.list.key.serde.inner
>>> default.value.list.serde.element  ->  default.list.value.serde.inner
>>> 
>>> 
>>> 
>>> -Matthias
>>> 
>>> 
>>> On 7/10/19 8:52 AM, Development wrote:
>>>> Hi John,
>>>> 
>>>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
>>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>>> 
>>>> What do you think about that? I hope I’m not confusing anything.
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <john@confluent.io> wrote:
>>>>> 
>>>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>>>> 
>>>>> Just one more super-small question, do we need this variant: 
>>>>> 
>>>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>>>> 
>>>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it. 
>>>>> 
>>>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>>>> 
>>>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>>>> 
>>>>> What do you think?
>>>>> 
>>>>> Thanks,
>>>>> -John
>>>>> 
>>>>> 
>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>>> 
>>>>>> Hi John,
>>>>>> 
>>>>>> I hope everyone had a great long weekend.
>>>>>> 
>>>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>>>> 
>>>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>>>> 
>>>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>>>> 
>>>>>> Thank you!
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>>> 
>>>>>> Thanks for the update, Daniyar!
>>>>>> 
>>>>>> In addition to specifying the config interface, can you also specify
>>>>>> the Java interface? Namely, if I need to pass an instance of this
>>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>>>> constructor(s) would I have available? Likewise with the Serializer
>>>>>> and Deserializer. I don't think you need to specify the implementation
>>>>>> logic, since we've already discussed it here.
>>>>>> 
>>>>>> If you also want to specify the serialized format of the data records
>>>>>> in the KIP, it could be useful documentation, as well as letting us
>>>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>>>> 
>>>>>> Thanks,
>>>>>> John
>>>>>> 
>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>>> 
>>>>>> 
>>>>>> Hey,
>>>>>> 
>>>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>> Sorry for the delay :)
>>>>>> 
>>>>>> Thank You!
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>> 
>>>>>> Yes, something like this. I did not think about good configuration
>>>>>> parameter names yet. I am also not sure if I understand all proposed
>>>>>> configs atm. But all configs should be listed and explained in the KIP
>>>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>>>> ask more detailed questions if I have any).
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>>> 
>>>>>> Yes, you are right. BytesSerializer is not what I need to have in a list
>>>>>> of primitives.
>>>>>> 
>>>>>> As for the default constructor and configurability, just want to make
>>>>>> sure. Is this what you have on your mind?
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>>> 
>>>>>> Thanks for the update!
>>>>>> 
>>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>>> should have a default constructor and it should be possible to pass in
>>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>>> KafkaStreams cannot use it as default serde.
>>>>>> 
>>>>>> 
>>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>>>>> a single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>>> 
>>>>>> 
>>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>>> discussion.
>>>>>> 
>>>>>> 
>>>>>> Can you also update the KIP?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>> 
>>>>>> I made and pushed necessary commits, so we could review the final
>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>> 
>>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>>> if it is enough.
>>>>>> 
>>>>>> Thank y’all for your help in this KIP :)
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>> 
>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>>> 
>>>>>> Hey Daniyar,
>>>>>> 
>>>>>> Looks good to me! Thanks for considering it.
>>>>>> 
>>>>>> Thanks,
>>>>>> -John
>>>>>> 
>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <de...@yeralin.net> wrote:
>>>>>> Hey John and Matthias,
>>>>>> 
>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>> realized that the type is not really needed in ListSerializer, but
>>>>>> only in ListDeserializer:
>>>>>> 
>>>>>> 
>>>>>> In ListSerializer we will start storing sizes only if serializer is
>>>>>> not a primitive serializer:
>>>>>> 
>>>>>> 
>>>>>> Then, in deserializer, we persist passed list type, so that during
>>>>>> deserialization we could create an instance of it with predefined
>>>>>> listSize for better performance.
>>>>>> We also try to locate a primitiveSize based on passed deserializer.
>>>>>> If it is not there, then primitiveSize will be null. Which means
>>>>>> that each entry’s size was encoded individually.
>>>>>> 
>>>>>> 
>>>>>> This looks much cleaner and more concise.
>>>>>> 
>>>>>> What do you think?
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>>>>>> 
>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>> Deserializer returns a fixed type...
>>>>>> 
>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>> deserialization via config?
>>>>>> 
>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>> config again)?
>>>>>> 
>>>>>> 
>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>>>>>> support this and a cast will be necessary at some point in the user code.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>> 
>>>>>> Hey Daniyar,
>>>>>> 
>>>>>> Thanks for looking at it!
>>>>>> 
>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>> be "vanilla java"?
>>>>>> 
>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>> example, what if the inner type is a Map<String,String>? The serde could
>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>> 
>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>> /**list type**/ LinkedList.class,
>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>> )
>>>>>> 
>>>>>> And in configuration, it's something like:
>>>>>> default.key.serde: org...ListSerde
>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>> 
>>>>>> 
>>>>>> What do you think?
>>>>>> Thanks,
>>>>>> -John
>>>>>> 
>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <de...@yeralin.net> wrote:
>>>>>> 
>>>>>> Hey John,
>>>>>> 
>>>>>> I read about TypeReference. It could work for the list serde.
>>>>>> However, it is not directly supported:
>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>> The only way is to pass an actual class object into the constructor,
>>>>>> something like:
>>>>>> 
>>>>>> It could be an option, but not a pretty one. What do you think of my
>>>>>> approach to use vanilla java and canonical class name? (As described
>>>>>> previously)
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>> On Jun 20, 2019, at 2:45 PM, Development <de...@yeralin.net> wrote:
>>>>>> 
>>>>>> Hi John,
>>>>>> 
>>>>>> Thank you for your input! Yes, my idea looks a little bit over
>>>>>> engineered :)
>>>>>> 
>>>>>> I also wanted to see feedback from Matthias, since he gave
>>>>>> me an idea about storing fixed/variable size entries.
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>> 
>>>>>> Hi Daniyar,
>>>>>> 
>>>>>> That's a very clever solution!
>>>>>> 
>>>>>> One observation is that, now, this is what we might call a polymorphic
>>>>>> serde. That is, you're detecting the actual concrete type and then
>>>>>> promising to produce the exact same concrete type on read. There are
>>>>>> some inherent problems with this approach, which in general require
>>>>>> some kind of schema registry (not necessarily Schema Registry, just
>>>>>> any registry for schemas) to solve.
>>>>>>
>>>>>> Notice that every serialized record has quite a bit of duplicated
>>>>>> information: the concrete type as well as a byte to indicate whether
>>>>>> the value type is a fixed size, and, if so, an integer to indicate the
>>>>>> actual size. These constitute a schema, of sorts, because they tell us
>>>>>> later how exactly to deserialize the data. Unfortunately, this
>>>>>> information is completely redundant. In all likelihood, the
>>>>>> information will be exactly the same for every record in the topic.
>>>>>> This problem is essentially the core motivation for serializations
>>>>>> like Avro: to move the schema outside of the serialization itself, so
>>>>>> that the records won't contain so much redundant information.
>>>>>>
>>>>>> In this light, I'm wondering if it makes sense to go back to something
>>>>>> like what you had earlier in which you don't support perfectly
>>>>>> preserving the concrete type for _this_ serde, but instead just
>>>>>> support deserializing to _some_ List. Then, you could defer full,
>>>>>> perfect, type preservation to serdes that have an external system in
>>>>>> which to register their type information.
>>>>>>
>>>>>> There does exist an alternative, if we really do want to preserve the
>>>>>> concrete type (which does seem kind of nice). You can add a
>>>>>> configuration option specifically for the serde to configure what the
>>>>>> list type will be, and maybe what the element type is, as well.
>>>>>>
>>>>>> As far as "related work" goes, you might be interested to take a look
>>>>>> at how Jackson can be configured to deserialize into a specific,
>>>>>> arbitrarily nested, generically parameterized class structure.
>>>>>> Specifically, you might find
>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>> interesting.
>>>>>> 
>>>>>> Thanks,
>>>>>> -John
>>>>>> 
>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <de...@yeralin.net> wrote:
>>>>>> 
>>>>>> 
>>>>>> bump


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Hi all,

Regarding the placement, you might as well move the constants to
`org.apache.kafka.clients.CommonClientConfigs`, so that the constants and
the configs and the code are in the same module.
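
As a rough illustration of the placement point (a sketch only: the holder class name below is hypothetical, and the key strings follow the `default.list.*.serde.inner` pattern Matthias suggested rather than any finalized names), the property lookup can rely on plain string constants that live next to the serde code, with no dependency on StreamsConfig:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the constants; it stands in for wherever they end up
// in the clients module (for example alongside CommonClientConfigs).
class ListSerdeConfigSketch {
    // Assumed key names, following the naming pattern proposed in this thread.
    static final String DEFAULT_LIST_KEY_SERDE_INNER_CLASS = "default.list.key.serde.inner";
    static final String DEFAULT_LIST_VALUE_SERDE_INNER_CLASS = "default.list.value.serde.inner";

    // Same key/value switch as in the quoted snippet, minus StreamsConfig.
    static String innerSerdeProperty(boolean isKey) {
        return isKey ? DEFAULT_LIST_KEY_SERDE_INNER_CLASS : DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
    }

    // Looks up the configured inner serde class name, failing loudly if it is absent.
    static String innerSerdeClassName(Map<String, ?> configs, boolean isKey) {
        String propertyName = innerSerdeProperty(isKey);
        Object value = configs.get(propertyName);
        if (value == null) {
            throw new IllegalArgumentException("Missing required config: " + propertyName);
        }
        return value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        configs.put(DEFAULT_LIST_KEY_SERDE_INNER_CLASS, "com.mycompany.MyRecordSerde");
        System.out.println(innerSerdeClassName(configs, true)); // prints com.mycompany.MyRecordSerde
    }
}
```

With the constants in the same module as the (de)serializers, the classes would not need to move under org.apache.kafka.streams.kstream.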

Regarding the constructor... What Matthias said is correct: The serde,
serializer, and deserializer all need to have zero-arg constructors so they
can be instantiated reflectively by Kafka. However, the factory method you
proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a
constructor, and is not required. It would be used purely from the Java
interface, but has the drawbacks I listed above. This method, not the
constructor, is what I proposed to remove from the KIP.
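
To make the zero-arg constructor requirement concrete, here is a minimal, self-contained sketch. It deliberately does not use Kafka's real interfaces: `ConfigurableSerde` and `SketchListSerde` are hypothetical stand-ins, and the config key is only the name proposed in this thread. It shows the general pattern of creating an instance from a class name and then handing it the config map:

```java
import java.util.HashMap;
import java.util.Map;

class ReflectiveSerdeSketch {

    // Hypothetical stand-in for a configure-style contract; not Kafka's actual interface.
    interface ConfigurableSerde {
        void configure(Map<String, ?> configs, boolean isKey);
    }

    // Hypothetical list serde: zero-arg constructor, list type supplied later via configure().
    static class SketchListSerde implements ConfigurableSerde {
        Class<?> listClass; // stays null until configure() runs

        public SketchListSerde() { }

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {
            // Assumed key names, following the pattern proposed in this thread.
            String key = isKey ? "default.list.key.serde.type" : "default.list.value.serde.type";
            Object value = configs.get(key);
            if (value == null) {
                throw new IllegalStateException("Missing config: " + key);
            }
            try {
                listClass = Class.forName(value.toString());
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Unknown list class: " + value, e);
            }
        }
    }

    // Roughly what a framework does when it only has a class name from a config file:
    // call the zero-arg constructor reflectively, then configure the new instance.
    static ConfigurableSerde instantiate(String className, Map<String, ?> configs, boolean isKey) {
        try {
            ConfigurableSerde serde = (ConfigurableSerde) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
            serde.configure(configs, isKey);
            return serde;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        configs.put("default.list.key.serde.type", "java.util.LinkedList");
        SketchListSerde serde = (SketchListSerde) instantiate(
                SketchListSerde.class.getName(), configs, true);
        System.out.println(serde.listClass.getName()); // prints java.util.LinkedList
    }
}
```

In this sketch, an instance created with `new SketchListSerde()` and never configured keeps `listClass` as null, which mirrors the NPE risk Matthias mentions for manually instantiated serdes.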

Thanks,
-John


On Mon, Jul 15, 2019 at 10:15 AM Development <de...@yeralin.net> wrote:

> One problem though.
>
> Since the WindowedSerdes (Windowed(De)Serializer) are so similar, I’m trying to
> model the implementation of my ListSerde on them.
>
> I created a couple of constants under StreamsConfig:
>
>
>
> And I am trying to use a similar construct:
>
> final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>
> But then I found out that StreamsConfig is not accessible from the
> org.apache.kafka.common.serialization package, while the windowed serde
> (de)serializers are located under the org.apache.kafka.streams.kstream
> package.
>
> What should I do? Should I move my classes under the
> org.apache.kafka.streams.kstream package instead?
>
> On Jul 15, 2019, at 10:45 AM, Development <de...@yeralin.net> wrote:
>
> Hi Matthias,
>
> Thank you for your input.
>
> I updated the KIP, made it a little more readable.
>
> I think the configuration parameters strategy is finalized then.
>
> Do you have any other questions/concerns regarding this KIP?
>
> Meanwhile I’ll start doing appropriate code changes, and commit them under
> my PR.
>
> Best,
> Daniyar Yeralin
>
> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> Daniyar,
>
> thanks for the update to the KIP. It's in really good shape and well
> written.
>
> About the default constructor question:
>
> All Serdes/Serializer/Deserializer classes need a default constructor to
> create them easily via reflection when specified in a config. I
> understand that it is not super user friendly, but all existing code
> works this way. Hence, it seems best to stick with the established pattern.
>
> We have a similar issue with `TimeWindowedSerde` and
> `SessionWindowedSerde`, and I just recently did a PR to improve user
> experience that addresses the exact issue John raised. (cf
> https://github.com/apache/kafka/pull/7067)
>
> Note, that if a user would instantiate the Serde manually, the user
> would also need to call `configure()` to set up the inner serdes. Kafka
> Streams would not set up those automatically and one would most likely
> end up with an NPE.
>
>
> Coming back to the KIP, and the parameter names. `WindowedSerdes` are
> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
> we use the following parameter names:
>
> - default.windowed.key.serde.inner
> - default.windowed.value.serde.inner
>
>
> It might be good to align the naming pattern. I would also suggest to
> use `type` instead of `impl`?
>
>
> default.key.list.serde.impl  ->  default.list.key.serde.type
> default.value.list.serde.impl  ->  default.list.value.serde.type
> default.key.list.serde.element  ->  default.list.key.serde.inner
> default.value.list.serde.element  ->  default.list.value.serde.inner
>
>
>
> -Matthias
>
>
> On 7/10/19 8:52 AM, Development wrote:
>
> Hi John,
>
> Yes, I do agree. That totally makes sense. The only thing is that it goes
> against what Matthias suggested earlier:
> "I think that ... `ListSerde` should have a default constructor and it
> should be possible to pass in the `Class listClass` information via a
> configuration. Otherwise, KafkaStreams cannot use it as default serde.”
>
> What do you think about that? I hope I’m not confusing anything.
>
> Best,
> Daniyar Yeralin
>
> On Jul 9, 2019, at 5:56 PM, John Roesler <jo...@confluent.io> wrote:
>
> Ah, my apologies, I must have just overlooked it. Thanks for the update,
> too.
>
> Just one more super-small question, do we need this variant:
>
> New method public static <T> Serde<List<T>> ListSerde() in
> org.apache.kafka.common.serialization.Serdes class (infers list
> implementation and inner serde from config file)
>
>
> It seems like this situation implies my config file is already set up for
> the list serde, so passing this serde (e.g., in Produced) would have the
> same effect as not specifying it.
>
> I guess that it could be the case that you have the
> `default.key/value.serde` set to something else, like StringSerde, but you
> still have the `default.key/value.list.serde.impl/element` set. This seems
> like it would result in more confusion than convenience, so my gut instinct
> is maybe we shouldn't introduce the `ListSerde()` variant until people
> actually request it later on.
>
> Thus, we'd just stick with fully config-driven or fully
> source-code-driven, not half/half.
>
> What do you think?
>
> Thanks,
> -John
>
>
> On Tue, Jul 9, 2019 at 9:58 AM Development <de...@yeralin.net> wrote:
>
>
> Hi John,
>
> I hope everyone had a great long weekend.
>
> Regarding Java interfaces, I may not understand you correctly, but I think
> I already listed them:
>
> So for Produced, you would use it in the following fashion, for example:
> Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>
> I also updated the KIP, and added a section “Serialization Strategy” where
> I describe our logic of conditional serialization based on the type of an
> inner serde.
>
> Thank you!
>
> Best,
> Daniyar Yeralin
>
> On Jun 26, 2019, at 11:44 AM, John Roesler <jo...@confluent.io> wrote:
>
> Thanks for the update, Daniyar!
>
> In addition to specifying the config interface, can you also specify
> the Java interface? Namely, if I need to pass an instance of this
> serde in to the DSL directly, as in Produced, Materialized, etc., what
> constructor(s) would I have available? Likewise with the Serializer
> and Deserializer. I don't think you need to specify the implementation
> logic, since we've already discussed it here.
>
> If you also want to specify the serialized format of the data records
> in the KIP, it could be useful documentation, as well as letting us
> verify the schema for forward/backward compatibility concerns, etc.
>
> Thanks,
> John
>
> On Wed, Jun 26, 2019 at 10:33 AM Development <de...@yeralin.net> wrote:
>
>
> Hey,
>
> Finally made updates to the KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> Sorry for the delay :)
>
> Thank You!
>
> Best,
> Daniyar Yeralin
>
> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <ma...@confluent.io> wrote:
>
> Yes, something like this. I did not think about good configuration
> parameter names yet. I am also not sure if I understand all proposed
> configs atm. But all configs should be listed and explained in the KIP
> anyway, and we can discuss further after you have updated the KIP (I can
> ask more detailed question if I have any).
>
>
> -Matthias
>
> On 6/21/19 2:05 PM, Development wrote:
>
> Yes, you are right. ByteSerializer is not what I need to have in a list
> of primitives.
>
> As for the default constructor and configurability, just want to make
> sure. Is this what you have on your mind?
>
> Best,
> Daniyar Yeralin
>
>
>
> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>
> Thanks for the update!
>
> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
> should have a default constructor and it should be possible to pass in
> the `Class listClass` information via a configuration. Otherwise,
> KafkaStreams cannot use it as default serde.
>
>
> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
> as it is for `byte[]` with variable length -- it's for arrays, not for
> single `byte` (note, that `Bytes` is a Kafka class wrapping `byte[]`).
>
>
> For tests, we can comment on the PR. No need to do this in the KIP
> discussion.
>
>
> Can you also update the KIP?
>
>
>
> -Matthias
>
>
>
>
>
> On 6/21/19 11:29 AM, Development wrote:
>
> I made and pushed necessary commits, so we could review the final
> version under PR https://github.com/apache/kafka/pull/6592
>
> I also need some advice on writing tests for this new serde. So far I
> only have two test cases (roundtrip and empty payload), I’m not sure
> if it is enough.
>
> Thank y’all for your help in this KIP :)
>
> Best,
> Daniyar Yeralin
>
>
> On Jun 21, 2019, at 1:44 PM, John Roesler <jo...@confluent.io> wrote:
>
> Hey Daniyar,
>
> Looks good to me! Thanks for considering it.
>
> Thanks,
> -John
>
> On Fri, Jun 21, 2019 at 9:04 AM Development <de...@yeralin.net> wrote:
> Hey John and Matthias,
>
> Yes, now I see it all. I’m storing lots of redundant information.
> Here is my final idea. Yes, now a user should pass a list type. I
> realized that the type is not really needed in ListSerializer, but
> only in ListDeserializer:
>
>
> In ListSerializer we will start storing sizes only if serializer is
> not a primitive serializer:
>
>
> Then, in deserializer, we persist passed list type, so that during
> deserialization we could create an instance of it with predefined
> listSize for better performance.
> We also try to locate a primitiveSize based on passed deserializer.
> If it is not there, then primitiveSize will be null. Which means
> that each entry’s size was encoded individually.
>
>
> This looks much cleaner and more concise.
>
> What do you think?
>
> Best,
> Daniyar Yeralin
>
> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>
> For encoding the list-type: I see John's point about re-encoding the
> list-type redundantly. However, I also don't like the idea that the
> Deserializer returns a fixed type...
>
> Maybe it's best to allow users to specify the target list type on
> deserialization via config?
>
> Similar for the primitive types: I don't think we need to encode the
> type size, but users could specify the type on the deserializer (via a
> config again)?
>
>
> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
> support this and a cast will be necessary at some point in the user code.
>
>
>
> -Matthias
>
>
>
> On 6/20/19 1:21 PM, John Roesler wrote:
>
> Hey Daniyar,
>
> Thanks for looking at it!
>
> Something like your screenshot is more along the lines of what I was
> thinking. Sorry, but I didn't follow what you mean, how would that not
> be "vanilla java"?
>
> Unfortunately the deserializer needs more information, though. For
> example, what if the inner type is a Map<String,String>? The serde could
> only be used to produce a LinkedList<Map>, thus, we'd still need an
> inner serde, like you have in the KIP (Serde<T> innerSerde).
>
> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
> /**list type**/ LinkedList.class,
> /**inner serde**/ new MyRecordSerde()
> )
>
> And in configuration, it's something like:
> default.key.serde: org...ListSerde
> default.key.list.serde.type: java.util.LinkedList
> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>
>
> What do you think?
> Thanks,
> -John
>
> On Thu, Jun 20, 2019 at 2:46 PM Development <de...@yeralin.net> wrote:
>
> Hey John,
>
> I read about TypeReference. It could work for the list serde.
> However, it is not directly supported:
> https://github.com/FasterXML/jackson-databind/issues/1490
> The only way is to pass an actual class object into the constructor,
> something like:
>
> It could be an option, but not a pretty one. What do you think of my
> approach to use vanilla java and canonical class name? (As described
> previously)
>
> Best,
> Daniyar Yeralin
>
> On Jun 20, 2019, at 2:45 PM, Development <de...@yeralin.net> wrote:
>
> Hi John,
>
> Thank you for your input! Yes, my idea looks a little bit over
> engineered :)
>
> I also wanted to see feedback from Matthias, since he gave
> me an idea about storing fixed/variable size entries.
>
> Best,
> Daniyar Yeralin
>
> On Jun 18, 2019, at 6:06 PM, John Roesler <jo...@confluent.io> wrote:
>
> Hi Daniyar,
>
> That's a very clever solution!
>
> One observation is that, now, this is what we might call a polymorphic
> serde. That is, you're detecting the actual concrete type and then
> promising to produce the exact same concrete type on read. There are
> some inherent problems with this approach, which in general require
> some kind of schema registry (not necessarily Schema Registry, just
> any registry for schemas) to solve.
>
> Notice that every serialized record has quite a bit of duplicated
> information: the concrete type as well as a byte to indicate whether
> the value type is a fixed size, and, if so, an integer to indicate the
> actual size. These constitute a schema, of sorts, because they tell us
> later how exactly to deserialize the data. Unfortunately, this
> information is completely redundant. In all likelihood, the
> information will be exactly the same for every record in the topic.
> This problem is essentially the core motivation for serializations
> like Avro: to move the schema outside of the serialization itself, so
> that the records won't contain so much redundant information.
>
> In this light, I'm wondering if it makes sense to go back to something
> like what you had earlier in which you don't support perfectly
> preserving the concrete type for _this_ serde, but instead just
> support deserializing to _some_ List. Then, you could defer full,
> perfect, type preservation to serdes that have an external system in
> which to register their type information.
>
> There does exist an alternative, if we really do want to preserve the
> concrete type (which does seem kind of nice). You can add a
> configuration option specifically for the serde to configure what the
> list type will be, and maybe what the element type is, as well.
>
> As far as "related work" goes, you might be interested to take a look
> at how Jackson can be configured to deserialize into a specific,
> arbitrarily nested, generically parameterized class structure.
> Specifically, you might find
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> interesting.
>
> Thanks,
> -John
>
> On Mon, Jun 17, 2019 at 12:38 PM Development <de...@yeralin.net> wrote:
>
>
> bump

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
One problem though. 

Since the WindowedSerdes (Windowed(De)Serializer) are so similar, I’m trying to model the implementation of my ListSerde on them.

I created a couple of constants under StreamsConfig:



And I am trying to use a similar construct:

final String propertyName = isKey ? StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;

But then I found out that StreamsConfig is not accessible from the org.apache.kafka.common.serialization package, while the windowed serde (de)serializers are located under the org.apache.kafka.streams.kstream package.

What should I do? Should I move my classes under the org.apache.kafka.streams.kstream package instead?

> On Jul 15, 2019, at 10:45 AM, Development <de...@yeralin.net> wrote:
> 
> Hi Matthias,
> 
> Thank you for your input.
> 
> I updated the KIP, made it a little more readable.
> 
> I think the configuration parameters strategy is finalized then.
> 
> Do you have any other questions/concerns regarding this KIP?
> 
> Meanwhile I’ll start doing appropriate code changes, and commit them under my PR.
> 
> Best,
> Daniyar Yeralin
> 
>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>> 
>> Daniyar,
>> 
>> thanks for the update to the KIP. It's in really good shape and well
>> written.
>> 
>> About the default constructor question:
>> 
>> All Serdes/Serializer/Deserializer classes need a default constructor to
>> create them easily via reflection when specified in a config. I
>> understand that it is not super user friendly, but all existing code
>> works this way. Hence, it seems best to stick with the established pattern.
>>
>> We have a similar issue with `TimeWindowedSerde` and
>> `SessionWindowedSerde`, and I just recently did a PR to improve user
>> experience that addresses the exact issue John raised. (cf
>> https://github.com/apache/kafka/pull/7067)
>>
>> Note, that if a user would instantiate the Serde manually, the user
>> would also need to call `configure()` to set up the inner serdes. Kafka
>> Streams would not set up those automatically and one would most likely
>> end up with an NPE.
>>
>>
>> Coming back to the KIP, and the parameter names. `WindowedSerdes` are
>> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
>> we use the following parameter names:
>> 
>> - default.windowed.key.serde.inner
>> - default.windowed.value.serde.inner
>> 
>> 
>> It might be good to align the naming pattern. I would also suggest to
>> use `type` instead of `impl`?
>> 
>> 
>> default.key.list.serde.impl  ->  default.list.key.serde.type
>> default.value.list.serde.impl  ->  default.list.value.serde.type
>> default.key.list.serde.element  ->  default.list.key.serde.inner
>> default.value.list.serde.element  ->  default.list.value.serde.inner
>> 
>> 
>> 
>> -Matthias
>> 
>> 
>> On 7/10/19 8:52 AM, Development wrote:
>>> Hi John,
>>> 
>>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
>>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
>>> 
>>> What do you think about that? I hope I’m not confusing anything.
>>> 
>>> Best,
>>> Daniyar Yeralin
>>> 
>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <jo...@confluent.io> wrote:
>>>> 
>>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>>> 
>>>> Just one more super-small question, do we need this variant: 
>>>> 
>>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>>> 
>>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it. 
>>>> 
>>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>>> 
>>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>>> 
>>>> What do you think?
>>>> 
>>>> Thanks,
>>>> -John
>>>> 
>>>> 
>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>>> 
>>>>> Hi John,
>>>>> 
>>>>> I hope everyone had a great long weekend.
>>>>> 
>>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>>> 
>>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>>> 
>>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>>> 
>>>>> Thank you!
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>>> 
>>>>> Thanks for the update, Daniyar!
>>>>> 
>>>>> In addition to specifying the config interface, can you also specify
>>>>> the Java interface? Namely, if I need to pass an instance of this
>>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>>> constructor(s) would I have available? Likewise with the Serializer
>>>>> and Deserializer. I don't think you need to specify the implementation
>>>>> logic, since we've already discussed it here.
>>>>> 
>>>>> If you also want to specify the serialized format of the data records
>>>>> in the KIP, it could be useful documentation, as well as letting us
>>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>>> 
>>>>> Thanks,
>>>>> John
>>>>> 
>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>>> 
>>>>> 
>>>>> Hey,
>>>>> 
>>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>> Sorry for the delay :)
>>>>> 
>>>>> Thank You!
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>> 
>>>>> Yes, something like this. I did not think about good configuration
>>>>> parameter names yet. I am also not sure if I understand all proposed
>>>>> configs atm. But all configs should be listed and explained in the KIP
>>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>>> ask more detailed question if I have any).
>>>>> 
>>>>> 
>>>>> -Matthias
>>>>> 
>>>>> On 6/21/19 2:05 PM, Development wrote:
>>>>> 
>>>>> Yes, you are right. BytesSerializer is not what I need to have in a list
>>>>> of primitives.
>>>>> 
>>>>> As for the default constructor and configurability, I just want to make
>>>>> sure: is this what you have in mind?
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>> 
>>>>> 
>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>> 
>>>>> Thanks for the update!
>>>>> 
>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>> should have a default constructor and it should be possible to pass in
>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>> KafkaStreams cannot use it as default serde.
>>>>> 
>>>>> 
>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>>>> a single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>> 
>>>>> 
>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>> discussion.
>>>>> 
>>>>> 
>>>>> Can you also update the KIP?
>>>>> 
>>>>> 
>>>>> 
>>>>> -Matthias
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>> 
>>>>> I made and pushed necessary commits, so we could review the final
>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>> 
>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>> if it is enough.
>>>>> 
>>>>> Thank y’all for your help in this KIP :)
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>> 
>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>>> 
>>>>> Hey Daniyar,
>>>>> 
>>>>> Looks good to me! Thanks for considering it.
>>>>> 
>>>>> Thanks,
>>>>> -John
>>>>> 
>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>>> Hey John and Matthias,
>>>>> 
>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>> realized that the type is not really needed in ListSerializer, but
>>>>> only in ListDeserializer:
>>>>> 
>>>>> 
>>>>> In ListSerializer we will start storing sizes only if serializer is
>>>>> not a primitive serializer:
>>>>> 
>>>>> 
>>>>> Then, in the deserializer, we keep the passed list type, so that during
>>>>> deserialization we can create an instance of it with a predefined
>>>>> listSize for better performance.
>>>>> We also try to look up a primitiveSize based on the passed deserializer.
>>>>> If there is none, primitiveSize will be null, which means
>>>>> that each entry’s size was encoded individually.
>>>>> 
>>>>> 
>>>>> This looks much cleaner and more concise.
>>>>> 
>>>>> What do you think?
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>>> 
>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>> Deserializer returns a fixed type...
>>>>> 
>>>>> Maybe it's best to allow users to specify the target list type on
>>>>> deserialization via config?
>>>>> 
>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>> type size, but users could specify the type on the deserializer (via a
>>>>> config again)?
>>>>> 
>>>>> 
>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>>>>> support this, and a cast will be necessary at some point in the user code.
>>>>> 
>>>>> 
>>>>> 
>>>>> -Matthias
>>>>> 
>>>>> 
>>>>> 
>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>> 
>>>>> Hey Daniyar,
>>>>> 
>>>>> Thanks for looking at it!
>>>>> 
>>>>> Something like your screenshot is more along the lines of what I was
>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>> be "vanilla java"?
>>>>> 
>>>>> Unfortunately the deserializer needs more information, though. For
>>>>> example, what if the inner type is a Map<String,String>? The serde could
>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>> 
>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>> /**list type**/ LinkedList.class,
>>>>> /**inner serde**/ new MyRecordSerde()
>>>>> )
>>>>> 
>>>>> And in configuration, it's something like:
>>>>> default.key.serde: org...ListSerde
>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>> 
>>>>> 
>>>>> What do you think?
>>>>> Thanks,
>>>>> -John
>>>>> 
>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>>> 
>>>>> Hey John,
>>>>> 
>>>>> I read about TypeReference. It could work for the list serde.
>>>>> However, it is not directly supported:
>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>> The only way is to pass an actual class object into the constructor,
>>>>> something like:
>>>>> 
>>>>> It could be an option, but not a pretty one. What do you think of my
>>>>> approach to use vanilla java and canonical class name? (As described
>>>>> previously)
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>>> 
>>>>> Hi John,
>>>>> 
>>>>> Thank you for your input! Yes, my idea looks a little bit
>>>>> over-engineered :)
>>>>> 
>>>>> I also wanted to get feedback from Matthias, since he gave
>>>>> me an idea about storing fixed/variable size entries.
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>>> 
>>>>> Hi Daniyar,
>>>>> 
>>>>> That's a very clever solution!
>>>>> 
>>>>> One observation is that, now, this is what we might call a polymorphic
>>>>> serde. That is, you're detecting the actual concrete type and then
>>>>> promising to produce the exact same concrete type on read. There are
>>>>> some inherent problems with this approach, which in general require
>>>>> some kind of schema registry (not necessarily Schema Registry, just
>>>>> any registry for schemas) to solve.
>>>>> 
>>>>> Notice that every serialized record has quite a bit of duplicated
>>>>> information: the concrete type as well as a byte to indicate whether
>>>>> the value type is a fixed size, and, if so, an integer to indicate the
>>>>> actual size. These constitute a schema, of sorts, because they tell us
>>>>> later how exactly to deserialize the data. Unfortunately, this
>>>>> information is completely redundant. In all likelihood, the
>>>>> information will be exactly the same for every record in the topic.
>>>>> This problem is essentially the core motivation for serializations
>>>>> like Avro: to move the schema outside of the serialization itself, so
>>>>> that the records won't contain so much redundant information.
>>>>> 
>>>>> In this light, I'm wondering if it makes sense to go back to something
>>>>> like what you had earlier in which you don't support perfectly
>>>>> preserving the concrete type for _this_ serde, but instead just
>>>>> support deserializing to _some_ List. Then, you could defer full,
>>>>> perfect, type preservation to serdes that have an external system in
>>>>> which to register their type information.
>>>>> 
>>>>> There does exist an alternative, if we really do want to preserve the
>>>>> concrete type (which does seem kind of nice). You can add a
>>>>> configuration option specifically for the serde to configure what the
>>>>> list type will be, and maybe what the element type is, as well.
>>>>> 
>>>>> As far as "related work" goes, you might be interested to take a look
>>>>> at how Jackson can be configured to deserialize into a specific,
>>>>> arbitrarily nested, generically parameterized class structure.
>>>>> Specifically, you might find
>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>> interesting.
>>>>> 
>>>>> Thanks,
>>>>> -John
>>>>> 
>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>>> 
>>>>> 
>>>>> bump
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi Matthias,

Thank you for your input.

I updated the KIP, made it a little more readable.

I think the configuration parameters strategy is finalized then.

Do you have any other questions/concerns regarding this KIP?

Meanwhile, I’ll start making the appropriate code changes and commit them under my PR.

Best,
Daniyar Yeralin

> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <ma...@confluent.io> wrote:
> 
> Daniyar,
> 
> thanks for the update to the KIP. It's in really good shape and well
> written.
> 
> About the default constructor question:
> 
> All Serdes/Serializer/Deserializer classes need a default constructor so
> that they can be created via reflection when specified in a config. I
> understand that it is not super user friendly, but all existing code
> works this way. Hence, it seems best to stick with the established pattern.
> 
> We have a similar issue with `TimeWindowedSerde` and
> `SessionWindowedSerde`, and I just recently did a PR to improve the user
> experience that addresses the exact issue John raised (cf.
> https://github.com/apache/kafka/pull/7067).
> 
> Note that if a user instantiates the Serde manually, the user
> also needs to call `configure()` to set up the inner serdes. Kafka
> Streams would not set those up automatically, and one would most likely
> end up with an NPE.
> 
> 
> Coming back to the KIP and the parameter names: `WindowedSerdes` are
> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
> we use the following parameter names:
> 
> - default.windowed.key.serde.inner
> - default.windowed.value.serde.inner
> 
> 
> It might be good to align the naming pattern. I would also suggest
> using `type` instead of `impl`.
> 
> 
> default.key.list.serde.impl  ->  default.list.key.serde.type
> default.value.list.serde.impl  ->  default.list.value.serde.type
> default.key.list.serde.element  ->  default.list.key.serde.inner
> default.value.list.serde.element  ->  default.list.value.serde.inner
> 
> 
> 
> -Matthias
> 
> 
> On 7/10/19 8:52 AM, Development wrote:
>> Hi John,
>> 
>> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
>> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
>> 
>> What do you think about that? I hope I’m not confusing anything.
>> 
>> Best,
>> Daniyar Yeralin
>> 
>>> On Jul 9, 2019, at 5:56 PM, John Roesler <jo...@confluent.io> wrote:
>>> 
>>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>> 
>>> Just one more super-small question, do we need this variant: 
>>> 
>>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>> 
>>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it. 
>>> 
>>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>> 
>>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>> 
>>> What do you think?
>>> 
>>> Thanks,
>>> -John
>>> 
>>> 
>>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>> 
>>>> Hi John,
>>>> 
>>>> I hope everyone had a great long weekend.
>>>> 
>>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>> 
>>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>> 
>>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>> 
>>>> Thank you!
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>> 
>>>> Thanks for the update, Daniyar!
>>>> 
>>>> In addition to specifying the config interface, can you also specify
>>>> the Java interface? Namely, if I need to pass an instance of this
>>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>>> constructor(s) would I have available? Likewise with the Serializer
>>>> and Deserializer. I don't think you need to specify the implementation
>>>> logic, since we've already discussed it here.
>>>> 
>>>> If you also want to specify the serialized format of the data records
>>>> in the KIP, it could be useful documentation, as well as letting us
>>>> verify the schema for forward/backward compatibility concerns, etc.
>>>> 
>>>> Thanks,
>>>> John
>>>> 
>>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>> 
>>>> 
>>>> Hey,
>>>> 
>>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>> Sorry for the delay :)
>>>> 
>>>> Thank You!
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>> 
>>>> Yes, something like this. I did not think about good configuration
>>>> parameter names yet. I am also not sure if I understand all proposed
>>>> configs atm. But all configs should be listed and explained in the KIP
>>>> anyway, and we can discuss further after you have updated the KIP (I can
>>>> ask more detailed question if I have any).
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> On 6/21/19 2:05 PM, Development wrote:
>>>> 
>>>> Yes, you are right. BytesSerializer is not what I need to have in a list
>>>> of primitives.
>>>> 
>>>> As for the default constructor and configurability, I just want to make
>>>> sure: is this what you have in mind?
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>> 
>>>> 
>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>> 
>>>> Thanks for the update!
>>>> 
>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>> should have a default constructor and it should be possible to pass in
>>>> the `Class listClass` information via a configuration. Otherwise,
>>>> KafkaStreams cannot use it as default serde.
>>>> 
>>>> 
>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>>> a single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>> 
>>>> 
>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>> discussion.
>>>> 
>>>> 
>>>> Can you also update the KIP?
>>>> 
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 6/21/19 11:29 AM, Development wrote:
>>>> 
>>>> I made and pushed necessary commits, so we could review the final
>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>> 
>>>> I also need some advice on writing tests for this new serde. So far I
>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>> if it is enough.
>>>> 
>>>> Thank y’all for your help in this KIP :)
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>> 
>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>> 
>>>> Hey Daniyar,
>>>> 
>>>> Looks good to me! Thanks for considering it.
>>>> 
>>>> Thanks,
>>>> -John
>>>> 
>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>>> Hey John and Matthias,
>>>> 
>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>> realized that the type is not really needed in ListSerializer, but
>>>> only in ListDeserializer:
>>>> 
>>>> 
>>>> In ListSerializer we will start storing sizes only if serializer is
>>>> not a primitive serializer:
>>>> 
>>>> 
>>>> Then, in the deserializer, we keep the passed list type, so that during
>>>> deserialization we can create an instance of it with a predefined
>>>> listSize for better performance.
>>>> We also try to look up a primitiveSize based on the passed deserializer.
>>>> If there is none, primitiveSize will be null, which means
>>>> that each entry’s size was encoded individually.
>>>> 
>>>> 
>>>> This looks much cleaner and more concise.
>>>> 
>>>> What do you think?
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>> 
>>>> For encoding the list-type: I see John's point about re-encoding the
>>>> list-type redundantly. However, I also don't like the idea that the
>>>> Deserializer returns a fixed type...
>>>> 
>>>> Maybe it's best to allow users to specify the target list type on
>>>> deserialization via config?
>>>> 
>>>> Similar for the primitive types: I don't think we need to encode the
>>>> type size, but users could specify the type on the deserializer (via a
>>>> config again)?
>>>> 
>>>> 
>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>>>> support this, and a cast will be necessary at some point in the user code.
>>>> 
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> 
>>>> 
>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>> 
>>>> Hey Daniyar,
>>>> 
>>>> Thanks for looking at it!
>>>> 
>>>> Something like your screenshot is more along the lines of what I was
>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>> be "vanilla java"?
>>>> 
>>>> Unfortunately the deserializer needs more information, though. For
>>>> example, what if the inner type is a Map<String,String>? The serde could
>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>> 
>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>> /**list type**/ LinkedList.class,
>>>> /**inner serde**/ new MyRecordSerde()
>>>> )
>>>> 
>>>> And in configuration, it's something like:
>>>> default.key.serde: org...ListSerde
>>>> default.key.list.serde.type: java.util.LinkedList
>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>> 
>>>> 
>>>> What do you think?
>>>> Thanks,
>>>> -John
>>>> 
>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>> 
>>>> Hey John,
>>>> 
>>>> I read about TypeReference. It could work for the list serde.
>>>> However, it is not directly supported:
>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>> The only way is to pass an actual class object into the constructor,
>>>> something like:
>>>> 
>>>> It could be an option, but not a pretty one. What do you think of my
>>>> approach to use vanilla java and canonical class name? (As described
>>>> previously)
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>> On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>> 
>>>> Hi John,
>>>> 
>>>> Thank you for your input! Yes, my idea looks a little bit
>>>> over-engineered :)
>>>> 
>>>> I also wanted to get feedback from Matthias, since he gave
>>>> me an idea about storing fixed/variable size entries.
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>> 
>>>> Hi Daniyar,
>>>> 
>>>> That's a very clever solution!
>>>> 
>>>> One observation is that, now, this is what we might call a polymorphic
>>>> serde. That is, you're detecting the actual concrete type and then
>>>> promising to produce the exact same concrete type on read. There are
>>>> some inherent problems with this approach, which in general require
>>>> some kind of schema registry (not necessarily Schema Registry, just
>>>> any registry for schemas) to solve.
>>>> 
>>>> Notice that every serialized record has quite a bit of duplicated
>>>> information: the concrete type as well as a byte to indicate whether
>>>> the value type is a fixed size, and, if so, an integer to indicate the
>>>> actual size. These constitute a schema, of sorts, because they tell us
>>>> later how exactly to deserialize the data. Unfortunately, this
>>>> information is completely redundant. In all likelihood, the
>>>> information will be exactly the same for every record in the topic.
>>>> This problem is essentially the core motivation for serializations
>>>> like Avro: to move the schema outside of the serialization itself, so
>>>> that the records won't contain so much redundant information.
>>>> 
>>>> In this light, I'm wondering if it makes sense to go back to something
>>>> like what you had earlier in which you don't support perfectly
>>>> preserving the concrete type for _this_ serde, but instead just
>>>> support deserializing to _some_ List. Then, you could defer full,
>>>> perfect, type preservation to serdes that have an external system in
>>>> which to register their type information.
>>>> 
>>>> There does exist an alternative, if we really do want to preserve the
>>>> concrete type (which does seem kind of nice). You can add a
>>>> configuration option specifically for the serde to configure what the
>>>> list type will be, and maybe what the element type is, as well.
>>>> 
>>>> As far as "related work" goes, you might be interested to take a look
>>>> at how Jackson can be configured to deserialize into a specific,
>>>> arbitrarily nested, generically parameterized class structure.
>>>> Specifically, you might find
>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>> interesting.
>>>> 
>>>> Thanks,
>>>> -John
>>>> 
>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>> 
>>>> 
>>>> bump
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
> 
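
The serialization strategy discussed in the quoted June 21 messages (store per-entry sizes only when the inner serializer is not a fixed-size primitive one, and let the deserializer decide the list type) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class `ListWireFormatSketch`, its method signatures, the leading element count, and the use of `Function`/`Supplier` in place of Kafka's `Serializer`/`Deserializer` are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

public class ListWireFormatSketch {

    // Assumed layout: [element count][(entry size)? entry bytes]*.
    // fixedSize == null means variable-size entries, so each entry is length-prefixed.
    public static <T> byte[] serialize(List<T> list, Function<T, byte[]> inner, Integer fixedSize) {
        List<byte[]> entries = new ArrayList<>(list.size());
        int total = Integer.BYTES; // room for the element count
        for (T item : list) {
            byte[] bytes = inner.apply(item);
            if (fixedSize != null && bytes.length != fixedSize) {
                throw new IllegalArgumentException("entry is not of the declared fixed size");
            }
            entries.add(bytes);
            total += bytes.length + (fixedSize == null ? Integer.BYTES : 0);
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        out.putInt(list.size());
        for (byte[] bytes : entries) {
            if (fixedSize == null) {
                out.putInt(bytes.length); // size prefix only for variable-size entries
            }
            out.put(bytes);
        }
        return out.array();
    }

    // The caller chooses the concrete list type; nothing about it is stored in the bytes.
    public static <T> List<T> deserialize(byte[] data, Function<byte[], T> inner,
                                          Integer fixedSize, Supplier<List<T>> listFactory) {
        ByteBuffer in = ByteBuffer.wrap(data);
        int count = in.getInt();
        List<T> result = listFactory.get();
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[fixedSize == null ? in.getInt() : fixedSize];
            in.get(bytes);
            result.add(inner.apply(bytes));
        }
        return result;
    }

    public static void main(String[] args) {
        // Fixed-size entries (4-byte ints): no per-entry size is written.
        List<Integer> ints = Arrays.asList(1, 2, 3);
        byte[] fixed = serialize(ints, i -> ByteBuffer.allocate(Integer.BYTES).putInt(i).array(), Integer.BYTES);
        List<Integer> intsBack = deserialize(fixed, b -> ByteBuffer.wrap(b).getInt(), Integer.BYTES, ArrayList::new);
        System.out.println(fixed.length + " bytes, decoded " + intsBack);

        // Variable-size entries (UTF-8 strings): each entry carries its own size.
        List<String> words = Arrays.asList("a", "bcd");
        byte[] variable = serialize(words, s -> s.getBytes(StandardCharsets.UTF_8), null);
        List<String> wordsBack = deserialize(variable, b -> new String(b, StandardCharsets.UTF_8), null, LinkedList::new);
        System.out.println(variable.length + " bytes, decoded " + wordsBack);
    }
}
```

Because neither the list type nor the entry size is written per record, the redundancy John pointed out disappears; the price is that the reader must be configured consistently with the writer.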


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Daniyar,

thanks for the update to the KIP. It's in really good shape and well
written.

About the default constructor question:

All Serdes/Serializer/Deserializer classes need a default constructor to
create them easily via reflections when specifies in a config. I
understand that it is not super user friendly, but all existing code
works this way. Hence, it seems best to stick with the established pattern.

We have a similar issue with `TimeWindowedSerde` and
`SessionWindowedSerde`, and I just recently did a PR to improve the user
experience that addresses the exact issue John raised (cf.
https://github.com/apache/kafka/pull/7067).

Note that if a user instantiates the Serde manually, the user also needs
to call `configure()` to set up the inner serdes. Kafka Streams would not
set those up automatically, and one would most likely end up with an NPE.
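
To make the two points above concrete, here is a minimal, self-contained
sketch. It does not use the real Kafka classes: `ConfigurableSerde` and
`SketchListSerde` are hypothetical stand-ins, `com.example.MyRecordSerde`
is a made-up class name, and the config key is just the name suggested
in this mail. It only shows a class being created reflectively through
its default constructor, and what happens if `configure()` is never
called on a manually created instance.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultConstructorSketch {

    /** Hypothetical stand-in for a configurable serde; NOT the real Kafka interface. */
    public interface ConfigurableSerde {
        void configure(Map<String, ?> configs);
        String describe();
    }

    /** Hypothetical wrapping serde whose inner serde is only known after configure(). */
    public static class SketchListSerde implements ConfigurableSerde {
        private String innerSerdeClassName; // null until configure() runs

        public SketchListSerde() { } // default constructor, needed for reflective creation

        @Override
        public void configure(Map<String, ?> configs) {
            // Key name as proposed in this thread; an assumption, not a final config name.
            innerSerdeClassName = (String) configs.get("default.list.key.serde.inner");
        }

        @Override
        public String describe() {
            // Dereferences the field, so this throws an NPE if configure() was skipped.
            return "list of " + innerSerdeClassName.trim();
        }
    }

    /** Creates an instance from a class name, as a config-driven framework would. */
    public static ConfigurableSerde instantiate(String className) {
        try {
            return (ConfigurableSerde) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> configs = new HashMap<>();
        configs.put("default.list.key.serde.inner", "com.example.MyRecordSerde");

        // Config-driven path: reflective creation followed by configure().
        ConfigurableSerde serde = instantiate(SketchListSerde.class.getName());
        serde.configure(configs);
        System.out.println(serde.describe());

        // Manual path without configure(): the inner serde was never set.
        try {
            new SketchListSerde().describe();
        } catch (NullPointerException e) {
            System.out.println("NPE: configure() was never called");
        }
    }
}
```

The sketch keeps the inner serde as a plain class name to stay short; a
real serde would instantiate the inner serde inside `configure()`.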


Coming back to the KIP and the parameter names: `WindowedSerdes` are
similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
we use the following parameter names:

- default.windowed.key.serde.inner
- default.windowed.value.serde.inner


It might be good to align the naming pattern. I would also suggest using
`type` instead of `impl`:


default.key.list.serde.impl  ->  default.list.key.serde.type
default.value.list.serde.impl  ->  default.list.value.serde.type
default.key.list.serde.element  ->  default.list.key.serde.inner
default.value.list.serde.element  ->  default.list.value.serde.inner
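
With the renamed parameters, a configuration that makes a list serde the
default key serde might look roughly like the sketch below. The
`default.list.*` names are only the ones proposed above and are not
final, `com.example.ListSerde` is a placeholder because the final class
name is not settled in this thread, and `com.example.MyRecordSerde` is a
made-up inner serde.

```java
import java.util.Properties;

public class ListSerdeConfigSketch {

    /** Builds a config using the parameter names proposed in this mail (not final). */
    public static Properties keyListSerdeConfig() {
        Properties props = new Properties();
        // Existing Kafka Streams setting that selects the default key serde.
        // The class name is a placeholder, not the real location of ListSerde.
        props.setProperty("default.key.serde", "com.example.ListSerde");
        // Proposed: which List implementation the deserializer should create.
        props.setProperty("default.list.key.serde.type", "java.util.ArrayList");
        // Proposed: which serde handles the individual list elements (hypothetical class).
        props.setProperty("default.list.key.serde.inner", "com.example.MyRecordSerde");
        return props;
    }

    public static void main(String[] args) {
        Properties props = keyListSerdeConfig();
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " = " + props.getProperty(name));
        }
    }
}
```

The value-side settings would follow the same pattern with
`default.list.value.serde.type` and `default.list.value.serde.inner`.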



-Matthias


On 7/10/19 8:52 AM, Development wrote:
> Hi John,
> 
> Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
> "I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."
> 
> What do you think about that? I hope I’m not confusing anything.
> 
> Best,
> Daniyar Yeralin
> 
>> On Jul 9, 2019, at 5:56 PM, John Roesler <jo...@confluent.io> wrote:
>>
>> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
>>
>> Just one more super-small question, do we need this variant: 
>>
>>> New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
>>
>> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it. 
>>
>> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
>>
>> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
>>
>> What do you think?
>>
>> Thanks,
>> -John
>>
>>
>> On Tue, Jul 9, 2019 at 9:58 AM Development <dev@yeralin.net> wrote:
>>>
>>> Hi John,
>>>
>>> I hope everyone had a great long weekend.
>>>
>>> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>>>
>>> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>>>
>>> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>>>
>>> Thank you!
>>>
>>> Best,
>>> Daniyar Yeralin
>>>
>>> On Jun 26, 2019, at 11:44 AM, John Roesler <john@confluent.io> wrote:
>>>
>>> Thanks for the update, Daniyar!
>>>
>>> In addition to specifying the config interface, can you also specify
>>> the Java interface? Namely, if I need to pass an instance of this
>>> serde in to the DSL directly, as in Produced, Materialized, etc., what
>>> constructor(s) would I have available? Likewise with the Serializer
>>> and Deserializer. I don't think you need to specify the implementation
>>> logic, since we've already discussed it here.
>>>
>>> If you also want to specify the serialized format of the data records
>>> in the KIP, it could be useful documentation, as well as letting us
>>> verify the schema for forward/backward compatibility concerns, etc.
>>>
>>> Thanks,
>>> John
>>>
>>> On Wed, Jun 26, 2019 at 10:33 AM Development <dev@yeralin.net> wrote:
>>>
>>>
>>> Hey,
>>>
>>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>> Sorry for the delay :)
>>>
>>> Thank You!
>>>
>>> Best,
>>> Daniyar Yeralin
>>>
>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>
>>> Yes, something like this. I did not think about good configuration
>>> parameter names yet. I am also not sure if I understand all proposed
>>> configs atm. But all configs should be listed and explained in the KIP
>>> anyway, and we can discuss further after you have updated the KIP (I can
>>> ask more detailed questions if I have any).
>>>
>>>
>>> -Matthias
>>>
>>> On 6/21/19 2:05 PM, Development wrote:
>>>
>>> Yes, you are right. BytesSerializer is not what I need to have in a list
>>> of primitives.
>>>
>>> As for the default constructor and configurability, I just want to make
>>> sure. Is this what you have in mind?
>>>
>>> Best,
>>> Daniyar Yeralin
>>>
>>>
>>>
>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>
>>> Thanks for the update!
>>>
>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>> should have a default constructor and it should be possible to pass in
>>> the `Class listClass` information via a configuration. Otherwise,
>>> KafkaStreams cannot use it as default serde.
>>>
>>>
>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>> as it is for `byte[]` with variable length -- it's for arrays, not for
>>> a single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>>
>>>
>>> For tests, we can comment on the PR. No need to do this in the KIP
>>> discussion.
>>>
>>>
>>> Can you also update the KIP?
>>>
>>>
>>>
>>> -Matthias
>>>
>>>
>>>
>>>
>>>
>>> On 6/21/19 11:29 AM, Development wrote:
>>>
>>> I made and pushed necessary commits, so we could review the final
>>> version under PR https://github.com/apache/kafka/pull/6592
>>>
>>> I also need some advice on writing tests for this new serde. So far I
>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>> if it is enough.
>>>
>>> Thank y’all for your help in this KIP :)
>>>
>>> Best,
>>> Daniyar Yeralin
>>>
>>>
>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io> wrote:
>>>
>>> Hey Daniyar,
>>>
>>> Looks good to me! Thanks for considering it.
>>>
>>> Thanks,
>>> -John
>>>
>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net> wrote:
>>> Hey John and Matthias,
>>>
>>> Yes, now I see it all. I’m storing lots of redundant information.
>>> Here is my final idea. Yes, now a user should pass a list type. I
>>> realized that the type is not really needed in ListSerializer, but
>>> only in ListDeserializer:
>>>
>>>
>>> In ListSerializer we will start storing sizes only if the inner
>>> serializer is not a primitive serializer:
>>>
>>>
>>> Then, in the deserializer, we persist the passed list type, so that
>>> during deserialization we can create an instance of it with a predefined
>>> listSize for better performance.
>>> We also try to locate a primitiveSize based on the passed deserializer.
>>> If it is not there, then primitiveSize will be null, which means
>>> that each entry’s size was encoded individually.
>>>
>>>
>>> This looks much cleaner and more concise.
>>>
>>> What do you think?
>>>
>>> Best,
>>> Daniyar Yeralin
>>>
>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io> wrote:
>>>
>>> For encoding the list-type: I see John's point about re-encoding the
>>> list-type redundantly. However, I also don't like the idea that the
>>> Deserializer returns a fixed type...
>>>
>>> Maybe it's best to allow users to specify the target list type on
>>> deserialization via config?
>>>
>>> Similar for the primitive types: I don't think we need to encode the
>>> type size, but users could specify the type on the deserializer (via a
>>> config again)?
>>>
>>>
>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>>> support this and a cast will be necessary at some point in the user code.
>>>
>>>
>>>
>>> -Matthias
>>>
>>>
>>>
>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>
>>> Hey Daniyar,
>>>
>>> Thanks for looking at it!
>>>
>>> Something like your screenshot is more along the lines of what I was
>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>> be "vanilla java"?
>>>
>>> Unfortunately the deserializer needs more information, though. For
>>> example, what if the inner type is a Map<String,String>? The serde
>>> could
>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>
>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>> /**list type**/ LinkedList.class,
>>> /**inner serde**/ new MyRecordSerde()
>>> )
>>>
>>> And in configuration, it's something like:
>>> default.key.serde: org...ListSerde
>>> default.key.list.serde.type: java.util.LinkedList
>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>
>>>
>>> What do you think?
>>> Thanks,
>>> -John
>>>
>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net> wrote:
>>>
>>>  Hey John,
>>>
>>>  I read about TypeReference. It could work for the list serde.
>>>  However, it is not directly supported:
>>>  https://github.com/FasterXML/jackson-databind/issues/1490
>>>  The only way is to pass an actual class object into the constructor,
>>>  something like:
>>>
>>>  It could be an option, but not a pretty one. What do you think of my
>>>  approach to use vanilla java and canonical class name? (As described
>>>  previously)
>>>
>>>  Best,
>>>  Daniyar Yeralin
>>>
>>>  On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net> wrote:
>>>
>>>  Hi John,
>>>
>>>  Thank you for your input! Yes, my idea looks a little bit
>>>  over-engineered :)
>>>
>>>  I also wanted to get feedback from Matthias, since he gave
>>>  me the idea of storing fixed/variable size entries.
>>>
>>>  Best,
>>>  Daniyar Yeralin
>>>
>>>  On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io> wrote:
>>>
>>>  Hi Daniyar,
>>>
>>>  That's a very clever solution!
>>>
>>>  One observation is that, now, this is what we might call a polymorphic
>>>  serde. That is, you're detecting the actual concrete type and then
>>>  promising to produce the exact same concrete type on read. There are
>>>  some inherent problems with this approach, which in general require
>>>  some kind of schema registry (not necessarily Schema Registry, just
>>>  any registry for schemas) to solve.
>>>
>>>  Notice that every serialized record has quite a bit of duplicated
>>>  information: the concrete type as well as a byte to indicate whether
>>>  the value type is a fixed size, and, if so, an integer to indicate the
>>>  actual size. These constitute a schema, of sorts, because they tell us
>>>  later how exactly to deserialize the data. Unfortunately, this
>>>  information is completely redundant. In all likelihood, the
>>>  information will be exactly the same for every record in the topic.
>>>  This problem is essentially the core motivation for serializations
>>>  like Avro: to move the schema outside of the serialization itself, so
>>>  that the records won't contain so much redundant information.
>>>
>>>  In this light, I'm wondering if it makes sense to go back to something
>>>  like what you had earlier in which you don't support perfectly
>>>  preserving the concrete type for _this_ serde, but instead just
>>>  support deserializing to _some_ List. Then, you could defer full,
>>>  perfect, type preservation to serdes that have an external system in
>>>  which to register their type information.
>>>
>>>  There does exist an alternative, if we really do want to preserve the
>>>  concrete type (which does seem kind of nice). You can add a
>>>  configuration option specifically for the serde to configure what the
>>>  list type will be, and maybe what the element type is, as well.
>>>
>>>  As far as "related work" goes, you might be interested to take a look
>>>  at how Jackson can be configured to deserialize into a specific,
>>>  arbitrarily nested, generically parameterized class structure.
>>>  Specifically, you might find
>>>  https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>  interesting.
>>>
>>>  Thanks,
>>>  -John
>>>
>>>  On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net> wrote:
>>>
>>>
>>>  bump


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi John,

Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier:
"I think that ... `ListSerde` should have a default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde."

What do you think about that? I hope I’m not confusing anything.

Best,
Daniyar Yeralin

> On Jul 9, 2019, at 5:56 PM, John Roesler <jo...@confluent.io> wrote:
> 
> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
> 
> Just one more super-small question, do we need this variant: 
> 
> > New method public static <T> Serde<List<T>> ListSerde() in org.apache.kafka.common.serialization.Serdes class (infers list implementation and inner serde from config file)
> 
> It seems like this situation implies my config file is already set up for the list serde, so passing this serde (e.g., in Produced) would have the same effect as not specifying it. 
> 
> I guess that it could be the case that you have the `default.key/value.serde` set to something else, like StringSerde, but you still have the `default.key/value.list.serde.impl/element` set. This seems like it would result in more confusion than convenience, so my gut instinct is maybe we shouldn't introduce the `ListSerde()` variant until people actually request it later on.
> 
> Thus, we'd just stick with fully config-driven or fully source-code-driven, not half/half.
> 
> What do you think?
> 
> Thanks,
> -John
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Ah, my apologies, I must have just overlooked it. Thanks for the update,
too.

Just one more super-small question, do we need this variant:

> New method public static <T> Serde<List<T>> ListSerde() in
> org.apache.kafka.common.serialization.Serdes class (infers list
> implementation and inner serde from config file)

It seems like this situation implies my config file is already set up for
the list serde, so passing this serde (e.g., in Produced) would have the
same effect as not specifying it.

I guess that it could be the case that you have the
`default.key/value.serde` set to something else, like StringSerde, but you
still have the `default.key/value.list.serde.impl/element` set. This seems
like it would result in more confusion than convenience, so my gut instinct
is maybe we shouldn't introduce the `ListSerde()` variant until people
actually request it later on.

Thus, we'd just stick with fully config-driven or fully source-code-driven,
not half/half.
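
For reference, the fully source-code-driven style could be sketched as
below: the list type and the inner serde are both passed in code, in the
spirit of the two-argument form quoted further down, and nothing is read
from a config. This is only an illustration under stated assumptions,
not the implementation from the PR: `ExplicitListSerdeSketch` and
`ElementCodec` are hypothetical names, and the wire format (element
count followed by length-prefixed entries) is chosen just to keep the
example short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class ExplicitListSerdeSketch<T> {

    /** Hypothetical stand-in for an inner serde; NOT the Kafka Serde interface. */
    public interface ElementCodec<T> {
        byte[] toBytes(T value);
        T fromBytes(byte[] bytes);
    }

    private final Class<? extends List> listClass;
    private final ElementCodec<T> inner;

    // Everything the serde needs is passed in code; no config lookup happens.
    public ExplicitListSerdeSketch(Class<? extends List> listClass, ElementCodec<T> inner) {
        this.listClass = listClass;
        this.inner = inner;
    }

    /** Assumed format: int element count, then (int length, bytes) per element. */
    public byte[] serialize(List<T> data) {
        List<byte[]> encoded = new ArrayList<>();
        int total = Integer.BYTES;
        for (T element : data) {
            byte[] bytes = inner.toBytes(element);
            encoded.add(bytes);
            total += Integer.BYTES + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        buffer.putInt(data.size());
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        return buffer.array();
    }

    /** Rebuilds the list using the list class given to the constructor. */
    @SuppressWarnings("unchecked")
    public List<T> deserialize(byte[] data) {
        List<T> result;
        try {
            result = (List<T>) listClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + listClass, e);
        }
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int count = buffer.getInt();
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            result.add(inner.fromBytes(bytes));
        }
        return result;
    }

    /** UTF-8 string codec, used only for the demo below. */
    public static ElementCodec<String> stringCodec() {
        return new ElementCodec<String>() {
            @Override
            public byte[] toBytes(String value) {
                return value.getBytes(StandardCharsets.UTF_8);
            }

            @Override
            public String fromBytes(byte[] bytes) {
                return new String(bytes, StandardCharsets.UTF_8);
            }
        };
    }

    public static void main(String[] args) {
        ExplicitListSerdeSketch<String> serde =
                new ExplicitListSerdeSketch<>(LinkedList.class, stringCodec());
        List<String> roundTripped = serde.deserialize(serde.serialize(Arrays.asList("a", "bc")));
        System.out.println(roundTripped + " as " + roundTripped.getClass().getSimpleName());
    }
}
```

Because both arguments are explicit, such an instance never depends on
the `default.key/value.list.serde.*` settings, which is the separation
argued for above.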

What do you think?

Thanks,
-John


On Tue, Jul 9, 2019 at 9:58 AM Development <de...@yeralin.net> wrote:
>
> Hi John,
>
> I hope everyone had a great long weekend.
>
> Regarding Java interfaces, I may not understand you correctly, but I think I already listed them:
>
> So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>
> I also updated the KIP, and added a section “Serialization Strategy” where I describe our logic of conditional serialization based on the type of an inner serde.
>
> Thank you!
>
> Best,
> Daniyar Yeralin
>
> On Jun 26, 2019, at 11:44 AM, John Roesler <jo...@confluent.io> wrote:
>
> Thanks for the update, Daniyar!
>
> In addition to specifying the config interface, can you also specify
> the Java interface? Namely, if I need to pass an instance of this
> serde in to the DSL directly, as in Produced, Materialized, etc., what
> constructor(s) would I have available? Likewise with the Serializer
> and Deserializer. I don't think you need to specify the implementation
> logic, since we've already discussed it here.
>
> If you also want to specify the serialized format of the data records
> in the KIP, it could be useful documentation, as well as letting us
> verify the schema for forward/backward compatibility concerns, etc.
>
> Thanks,
> John
>
> On Wed, Jun 26, 2019 at 10:33 AM Development <de...@yeralin.net> wrote:
>
>
> Hey,
>
> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> Sorry for the delay :)
>
> Thank You!
>
> Best,
> Daniyar Yeralin
>
> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <ma...@confluent.io> wrote:
>
> Yes, something like this. I did not think about good configuration
> parameter names yet. I am also not sure if I understand all proposed
> configs atm. But all configs should be listed and explained in the KIP
> anyway, and we can discuss further after you have updated the KIP (I can
> ask more detailed question if I have any).
>
>
> -Matthias
>
> On 6/21/19 2:05 PM, Development wrote:
>
> Yes, you are right. ByteSerializer is not what I need to have in a list
> of primitives.
>
> As for the default constructor and configurability, just want to make
> sure. Is this what you have on your mind?
>
> Best,
> Daniyar Yeralin
>
>
>
> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io
> <ma...@confluent.io>> wrote:
>
> Thanks for the update!
>
> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
> should have a default constructor and it should be possible to pass in
> the `Class listClass` information via a configuration. Otherwise,
> KafkaStreams cannot use it as the default serde.
>
>
> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
> as it is for `byte[]` with variable length -- it's for arrays, not for
> a single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>
>
> For tests, we can comment on the PR. No need to do this in the KIP
> discussion.
>
>
> Can you also update the KIP?
>
>
>
> -Matthias
>
>
>
>
>
> On 6/21/19 11:29 AM, Development wrote:
>
> I made and pushed necessary commits, so we could review the final
> version under PR https://github.com/apache/kafka/pull/6592
>
> I also need some advice on writing tests for this new serde. So far I
> only have two test cases (roundtrip and empty payload), I’m not sure
> if it is enough.
>
> Thank y’all for your help in this KIP :)
>
> Best,
> Daniyar Yeralin
>
>
> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io
> <ma...@confluent.io>> wrote:
>
> Hey Daniyar,
>
> Looks good to me! Thanks for considering it.
>
> Thanks,
> -John
>
> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net
> <ma...@yeralin.net> <ma...@yeralin.net>> wrote:
> Hey John and Matthias,
>
> Yes, now I see it all. I’m storing lots of redundant information.
> Here is my final idea. Yes, now a user should pass a list type. I
> realized that the type is not really needed in ListSerializer, but
> only in ListDeserializer:
>
>
> In ListSerializer we will start storing sizes only if serializer is
> not a primitive serializer:
>
>
> Then, in the deserializer, we keep the passed list type, so that during
> deserialization we can create an instance of it with a predefined
> listSize for better performance.
> We also try to look up a primitiveSize based on the passed deserializer.
> If it is not there, then primitiveSize will be null, which means
> that each entry’s size was encoded individually.
>
>
> This looks much cleaner and more concise.
>
> What do you think?
>
> Best,
> Daniyar Yeralin
>
> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io
> <ma...@confluent.io> <ma...@confluent.io>> wrote:
>
> For encoding the list-type: I see John's point about re-encoding the
> list-type redundantly. However, I also don't like the idea that the
> Deserializer returns a fixed type...
>
> Maybe it's best to allow users to specify the target list type on
> deserialization via config?
>
> Similar for the primitive types: I don't think we need to encode the
> type size, but users could specify the type on the deserializer (via a
> config again)?
>
>
> About generics: nesting could be arbitrarily deep. Hence, I doubt we
> can support this and a cast will be necessary at some point in the
> user code.
>
>
>
> -Matthias
>
>
>
> On 6/20/19 1:21 PM, John Roesler wrote:
>
> Hey Daniyar,
>
> Thanks for looking at it!
>
> Something like your screenshot is more along the lines of what I was
> thinking. Sorry, but I didn't follow what you mean, how would that not
> be "vanilla java"?
>
> Unfortunately the deserializer needs more information, though. For
> example, what if the inner type is a Map<String,String>? The serde
> could only be used to produce a LinkedList<Map>, thus, we'd still
> need an inner serde, like you have in the KIP (Serde<T> innerSerde).
>
> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
> /**list type**/ LinkedList.class,
> /**inner serde**/ new MyRecordSerde()
> )
>
> And in configuration, it's something like:
> default.key.serde: org...ListSerde
> default.key.list.serde.type: java.util.LinkedList
> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>
>
> What do you think?
> Thanks,
> -John
>
> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net
> <ma...@yeralin.net> <ma...@yeralin.net>
> <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>
>  Hey John,
>
>  I read about TypeReference. It could work for the list serde.
>  However, it is not directly supported:
>  https://github.com/FasterXML/jackson-databind/issues/1490
>  The only way is to pass an actual class object into the constructor,
>  something like:
>
>  It could be an option, but not a pretty one. What do you think of my
>  approach to use vanilla java and canonical class name? (As described
>  previously)
>
>  Best,
>  Daniyar Yeralin
>
>  On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net
> <ma...@yeralin.net> <ma...@yeralin.net>
>  <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>
>  Hi John,
>
>  Thank you for your input! Yes, my idea looks a little bit over
>  engineered :)
>
>  I also wanted to get feedback from Matthias, since he gave me the
>  idea of storing fixed/variable size entries.
>
>  Best,
>  Daniyar Yeralin
>
>  On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io
> <ma...@confluent.io> <ma...@confluent.io>
>  <mailto:john@confluent.io <ma...@confluent.io>>> wrote:
>
>  Hi Daniyar,
>
>  That's a very clever solution!
>
>  One observation is that, now, this is what we might call a
>  polymorphic serde. That is, you're detecting the actual concrete
>  type and then promising to produce the exact same concrete type on
>  read. There are some inherent problems with this approach, which in
>  general require some kind of schema registry (not necessarily Schema
>  Registry, just any registry for schemas) to solve.
>
>  Notice that every serialized record has quite a bit of duplicated
>  information: the concrete type as well as a byte to indicate whether
>  the value type is a fixed size, and, if so, an integer to indicate
>  the actual size. These constitute a schema, of sorts, because they
>  tell us later how exactly to deserialize the data. Unfortunately,
>  this information is completely redundant. In all likelihood, the
>  information will be exactly the same for every record in the topic.
>  This problem is essentially the core motivation for serializations
>  like Avro: to move the schema outside of the serialization itself,
>  so that the records won't contain so much redundant information.
>
>  In this light, I'm wondering if it makes sense to go back to
>  something like what you had earlier in which you don't support
>  perfectly preserving the concrete type for _this_ serde, but instead
>  just support deserializing to _some_ List. Then, you could defer
>  full, perfect, type preservation to serdes that have an external
>  system in which to register their type information.
>
>  There does exist an alternative, if we really do want to preserve
>  the concrete type (which does seem kind of nice). You can add a
>  configuration option specifically for the serde to configure what
>  the list type will be, and maybe what the element type is, as well.
>
>  As far as "related work" goes, you might be interested to take a
>  look at how Jackson can be configured to deserialize into a
>  specific, arbitrarily nested, generically parameterized class
>  structure. Specifically, you might find
>  https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>  interesting.
>
>  Thanks,
>  -John
>
>  On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net
> <ma...@yeralin.net> <ma...@yeralin.net>
>  <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>
>
>  bump
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi John,

I hope everyone had a great long weekend.

Regarding the Java interfaces, I may have misunderstood you, but I think I already listed them:

So for Produced, you would use it in the following fashion, for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))

I also updated the KIP and added a “Serialization Strategy” section, where I describe our logic of conditional serialization based on the type of the inner serde.
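
For readers without the KIP at hand, the conditional strategy can be sketched roughly as follows: fixed-width elements are written back to back, while variable-width elements each carry their own size prefix. This is only an illustration in plain Java with no Kafka dependency. The class name ListCodecSketch, the method names, and the exact byte layout are assumptions made for this sketch, not the format defined in the KIP or the PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the KIP-466 wire format.
final class ListCodecSketch {

    // Fixed-size elements: write the entry count, then 4 bytes per int.
    // No per-entry size prefix is needed because every entry has the same width.
    static byte[] serializeInts(List<Integer> values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.size() * Integer.BYTES);
        buf.putInt(values.size());
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    static List<Integer> deserializeInts(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        // The entry count is known up front, so the list can be pre-sized.
        List<Integer> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            out.add(buf.getInt());
        }
        return out;
    }

    // Variable-size elements: each entry is prefixed with its own byte length.
    static byte[] serializeStrings(List<String> values) {
        List<byte[]> encoded = new ArrayList<>(values.size());
        int total = Integer.BYTES; // room for the entry count
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            total += Integer.BYTES + bytes.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(values.size());
        for (byte[] bytes : encoded) {
            buf.putInt(bytes.length);
            buf.put(bytes);
        }
        return buf.array();
    }

    static List<String> deserializeStrings(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[buf.getInt()];
            buf.get(bytes);
            out.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

In this sketch a list of three integers takes 4 + 3 * 4 bytes, while a list of strings pays an extra 4 bytes per entry for the size prefix. That extra cost is what the conditional logic avoids for primitive inner serdes.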

Thank you!

Best,
Daniyar Yeralin

> On Jun 26, 2019, at 11:44 AM, John Roesler <jo...@confluent.io> wrote:
> 
> Thanks for the update, Daniyar!
> 
> In addition to specifying the config interface, can you also specify
> the Java interface? Namely, if I need to pass an instance of this
> serde in to the DSL directly, as in Produced, Materialized, etc., what
> constructor(s) would I have available? Likewise with the Serializer
> and Deserializer. I don't think you need to specify the implementation
> logic, since we've already discussed it here.
> 
> If you also want to specify the serialized format of the data records
> in the KIP, it could be useful documentation, as well as letting us
> verify the schema for forward/backward compatibility concerns, etc.
> 
> Thanks,
> John
> 
> On Wed, Jun 26, 2019 at 10:33 AM Development <de...@yeralin.net> wrote:
>> 
>> Hey,
>> 
>> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>> Sorry for the delay :)
>> 
>> Thank You!
>> 
>> Best,
>> Daniyar Yeralin
>> 
>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <ma...@confluent.io> wrote:
>>> 
>>> Yes, something like this. I did not think about good configuration
>>> parameter names yet. I am also not sure if I understand all proposed
>>> configs atm. But all configs should be listed and explained in the KIP
>>> anyway, and we can discuss further after you have updated the KIP (I can
>>> ask more detailed question if I have any).
>>> 
>>> 
>>> -Matthias
>>> 
>>> On 6/21/19 2:05 PM, Development wrote:
>>>> Yes, you are right. ByteSerializer is not what I need to have in a list
>>>> of primitives.
>>>> 
>>>> As for the default constructor and configurability, just want to make
>>>> sure. Is this what you have on your mind?
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>> 
>>>> 
>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io
>>>>> <ma...@confluent.io>> wrote:
>>>>> 
>>>>> Thanks for the update!
>>>>> 
>>>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>>>> should have an default constructor and it should be possible to pass in
>>>>> the `Class listClass` information via a configuration. Otherwise,
>>>>> KafkaStreams cannot use it as default serde.
>>>>> 
>>>>> 
>>>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>>>> as is it for `byte[]` with variable length -- it's for arrays, not for
>>>>> single `byte` (note, that `Bytes` is a Kafka class wrapping `byte[]`).
>>>>> 
>>>>> 
>>>>> For tests, we can comment on the PR. No need to do this in the KIP
>>>>> discussion.
>>>>> 
>>>>> 
>>>>> Can you also update the KIP?
>>>>> 
>>>>> 
>>>>> 
>>>>> -Matthias
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 6/21/19 11:29 AM, Development wrote:
>>>>>> I made and pushed necessary commits, so we could review the final
>>>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>>>> 
>>>>>> I also need some advice on writing tests for this new serde. So far I
>>>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>>>> if it is enough.
>>>>>> 
>>>>>> Thank y’all for your help in this KIP :)
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>> 
>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io
>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>> 
>>>>>>> Hey Daniyar,
>>>>>>> 
>>>>>>> Looks good to me! Thanks for considering it.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>> 
>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net
>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>> wrote:
>>>>>>> Hey John and Matthias,
>>>>>>> 
>>>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>>>> realized that’s the type is not really needed in ListSerializer, but
>>>>>>> only in ListDeserializer:
>>>>>>> 
>>>>>>> 
>>>>>>> In ListSerializer we will start storing sizes only if serializer is
>>>>>>> not a primitive serializer:
>>>>>>> 
>>>>>>> 
>>>>>>> Then, in deserializer, we persist passed list type, so that during
>>>>>>> deserialization we could create an instance of it with predefined
>>>>>>> listSize for better performance.
>>>>>>> We also try to locate a primitiveSize based on passed deserializer.
>>>>>>> If it is not there, then primitiveSize will be null. Which means
>>>>>>> that each entry’s size was encoded individually.
>>>>>>> 
>>>>>>> 
>>>>>>> This looks much cleaner and more concise.
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io
>>>>>>>> <ma...@confluent.io> <ma...@confluent.io>> wrote:
>>>>>>>> 
>>>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>>>> Deserializer returns a fixed type...
>>>>>>>> 
>>>>>>>> Maybe it's best allow users to specify the target list type on
>>>>>>>> deserialization via config?
>>>>>>>> 
>>>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>>>> config again)?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt
>>>>>>>> we can
>>>>>>>> support this and a cast will be necessary at some point in the user
>>>>>>>> code.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -Matthias
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>>>> Hey Daniyar,
>>>>>>>>> 
>>>>>>>>> Thanks for looking at it!
>>>>>>>>> 
>>>>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>>>>> be "vanilla java"?
>>>>>>>>> 
>>>>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>>>>> example, what if the inner type is a Map<String,String>? The serde
>>>>>>>>> could
>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>>>> 
>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>>>> /**list type**/ LinkedList.class,
>>>>>>>>> /**inner serde**/ new MyRecordSerde()
>>>>>>>>> )
>>>>>>>>> 
>>>>>>>>> And in configuration, it's something like:
>>>>>>>>> default.key.serde: org...ListSerde
>>>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> What do you think?
>>>>>>>>> Thanks,
>>>>>>>>> -John
>>>>>>>>> 
>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net
>>>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
>>>>>>>>> <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>>>> 
>>>>>>>>>  Hey John,
>>>>>>>>> 
>>>>>>>>>  I gave read about TypeReference. It could work for the list serde.
>>>>>>>>>  However, it is not directly
>>>>>>>>>  supported:
>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490>
>>>>>>>>>  The only way is to pass an actual class object into the constructor,
>>>>>>>>>  something like:
>>>>>>>>> 
>>>>>>>>>  It could be an option, but not a pretty one. What do you think of my
>>>>>>>>>  approach to use vanilla java and canonical class name? (As described
>>>>>>>>>  previously)
>>>>>>>>> 
>>>>>>>>>  Best,
>>>>>>>>>  Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>>>  On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net
>>>>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
>>>>>>>>>>  <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>  Hi John,
>>>>>>>>>> 
>>>>>>>>>>  Thank you for your input! Yes, my idea looks a little bit over
>>>>>>>>>>  engineered :)
>>>>>>>>>> 
>>>>>>>>>>  I also wanted to see a feedback from Mathias as well since he gave
>>>>>>>>>>  me an idea about storing fixed/variable size entries.
>>>>>>>>>> 
>>>>>>>>>>  Best,
>>>>>>>>>>  Daniyar Yeralin
>>>>>>>>>> 
>>>>>>>>>>>  On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io
>>>>>>>>>>> <ma...@confluent.io> <ma...@confluent.io>
>>>>>>>>>>>  <mailto:john@confluent.io <ma...@confluent.io>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>  Hi Daniyar,
>>>>>>>>>>> 
>>>>>>>>>>>  That's a very clever solution!
>>>>>>>>>>> 
>>>>>>>>>>>  One observation is that, now, this is what we might call a
>>>>>>>>>>>  polymorphic
>>>>>>>>>>>  serde. That is, you're detecting the actual concrete type and then
>>>>>>>>>>>  promising to produce the exact same concrete type on read.
>>>>>>>>>>> There are
>>>>>>>>>>>  some inherent problems with this approach, which in general
>>>>>>>>>>> require
>>>>>>>>>>>  some kind of  schema registry (not necessarily Schema
>>>>>>>>>>> Registry, just
>>>>>>>>>>>  any registry for schemas) to solve.
>>>>>>>>>>> 
>>>>>>>>>>>  Notice that every serialized record has quite a bit of duplicated
>>>>>>>>>>>  information: the concrete type as well as a byte to indicate
>>>>>>>>>>> whether
>>>>>>>>>>>  the value type is a fixed size, and, if so, an integer to
>>>>>>>>>>>  indicate the
>>>>>>>>>>>  actual size. These constitute a schema, of sorts, because they
>>>>>>>>>>>  tell us
>>>>>>>>>>>  later how exactly to deserialize the data. Unfortunately, this
>>>>>>>>>>>  information is completely redundant. In all likelihood, the
>>>>>>>>>>>  information will be exactly the same for every record in the
>>>>>>>>>>> topic.
>>>>>>>>>>>  This problem is essentially the core motivation for serializations
>>>>>>>>>>>  like Avro: to move the schema outside of the serialization
>>>>>>>>>>> itself, so
>>>>>>>>>>>  that the records won't contain so much redundant information.
>>>>>>>>>>> 
>>>>>>>>>>>  In this light, I'm wondering if it makes sense to go back to
>>>>>>>>>>>  something
>>>>>>>>>>>  like what you had earlier in which you don't support perfectly
>>>>>>>>>>>  preserving the concrete type for _this_ serde, but instead just
>>>>>>>>>>>  support deserializing to _some_ List. Then, you could defer full,
>>>>>>>>>>>  perfect, type preservation to serdes that have an external
>>>>>>>>>>> system in
>>>>>>>>>>>  which to register their type information.
>>>>>>>>>>> 
>>>>>>>>>>>  There does exist an alternative, if we really do want to
>>>>>>>>>>> preserve the
>>>>>>>>>>>  concrete type (which does seem kind of nice). You can add a
>>>>>>>>>>>  configuration option specifically for the serde to configure
>>>>>>>>>>> what the
>>>>>>>>>>>  list type will be, and maybe what the element type is, as well.
>>>>>>>>>>> 
>>>>>>>>>>>  As far as "related work" goes, you might be interested to take
>>>>>>>>>>> a look
>>>>>>>>>>>  at how Jackson can be configured to deserialize into a specific,
>>>>>>>>>>>  arbitrarily nested, generically parameterized class structure.
>>>>>>>>>>>  Specifically, you might find
>>>>>>>>>>>  https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
>>>>>>>>>>>  interesting.
>>>>>>>>>>> 
>>>>>>>>>>>  Thanks,
>>>>>>>>>>>  -John
>>>>>>>>>>> 
>>>>>>>>>>>  On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net
>>>>>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
>>>>>>>>>>>  <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>  bump


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Thanks for the update, Daniyar!

In addition to specifying the config interface, can you also specify
the Java interface? Namely, if I need to pass an instance of this
serde into the DSL directly, as in Produced, Materialized, etc., what
constructor(s) would I have available? Likewise with the Serializer
and Deserializer. I don't think you need to specify the implementation
logic, since we've already discussed it here.

If you also want to specify the serialized format of the data records
in the KIP, it could be useful documentation, as well as letting us
verify the schema for forward/backward compatibility concerns, etc.

Thanks,
John

On Wed, Jun 26, 2019 at 10:33 AM Development <de...@yeralin.net> wrote:
>
> Hey,
>
> Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> Sorry for the delay :)
>
> Thank You!
>
> Best,
> Daniyar Yeralin
>
> > On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <ma...@confluent.io> wrote:
> >
> > Yes, something like this. I did not think about good configuration
> > parameter names yet. I am also not sure if I understand all proposed
> > configs atm. But all configs should be listed and explained in the KIP
> > anyway, and we can discuss further after you have updated the KIP (I can
> > ask more detailed question if I have any).
> >
> >
> > -Matthias
> >
> > On 6/21/19 2:05 PM, Development wrote:
> >> Yes, you are right. ByteSerializer is not what I need to have in a list
> >> of primitives.
> >>
> >> As for the default constructor and configurability, just want to make
> >> sure. Is this what you have on your mind?
> >>
> >> Best,
> >> Daniyar Yeralin
> >>
> >>
> >>
> >>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io
> >>> <ma...@confluent.io>> wrote:
> >>>
> >>> Thanks for the update!
> >>>
> >>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
> >>> should have an default constructor and it should be possible to pass in
> >>> the `Class listClass` information via a configuration. Otherwise,
> >>> KafkaStreams cannot use it as default serde.
> >>>
> >>>
> >>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
> >>> as is it for `byte[]` with variable length -- it's for arrays, not for
> >>> single `byte` (note, that `Bytes` is a Kafka class wrapping `byte[]`).
> >>>
> >>>
> >>> For tests, we can comment on the PR. No need to do this in the KIP
> >>> discussion.
> >>>
> >>>
> >>> Can you also update the KIP?
> >>>
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 6/21/19 11:29 AM, Development wrote:
> >>>> I made and pushed necessary commits, so we could review the final
> >>>> version under PR https://github.com/apache/kafka/pull/6592
> >>>>
> >>>> I also need some advice on writing tests for this new serde. So far I
> >>>> only have two test cases (roundtrip and empty payload), I’m not sure
> >>>> if it is enough.
> >>>>
> >>>> Thank y’all for your help in this KIP :)
> >>>>
> >>>> Best,
> >>>> Daniyar Yeralin
> >>>>
> >>>>
> >>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io
> >>>>> <ma...@confluent.io>> wrote:
> >>>>>
> >>>>> Hey Daniyar,
> >>>>>
> >>>>> Looks good to me! Thanks for considering it.
> >>>>>
> >>>>> Thanks,
> >>>>> -John
> >>>>>
> >>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net
> >>>>> <ma...@yeralin.net> <ma...@yeralin.net>> wrote:
> >>>>> Hey John and Matthias,
> >>>>>
> >>>>> Yes, now I see it all. I’m storing lots of redundant information.
> >>>>> Here is my final idea. Yes, now a user should pass a list type. I
> >>>>> realized that’s the type is not really needed in ListSerializer, but
> >>>>> only in ListDeserializer:
> >>>>>
> >>>>>
> >>>>> In ListSerializer we will start storing sizes only if serializer is
> >>>>> not a primitive serializer:
> >>>>>
> >>>>>
> >>>>> Then, in deserializer, we persist passed list type, so that during
> >>>>> deserialization we could create an instance of it with predefined
> >>>>> listSize for better performance.
> >>>>> We also try to locate a primitiveSize based on passed deserializer.
> >>>>> If it is not there, then primitiveSize will be null. Which means
> >>>>> that each entry’s size was encoded individually.
> >>>>>
> >>>>>
> >>>>> This looks much cleaner and more concise.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>> Best,
> >>>>> Daniyar Yeralin
> >>>>>
> >>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io
> >>>>>> <ma...@confluent.io> <ma...@confluent.io>> wrote:
> >>>>>>
> >>>>>> For encoding the list-type: I see John's point about re-encoding the
> >>>>>> list-type redundantly. However, I also don't like the idea that the
> >>>>>> Deserializer returns a fixed type...
> >>>>>>
> >>>>>> Maybe it's best allow users to specify the target list type on
> >>>>>> deserialization via config?
> >>>>>>
> >>>>>> Similar for the primitive types: I don't think we need to encode the
> >>>>>> type size, but users could specify the type on the deserializer (via a
> >>>>>> config again)?
> >>>>>>
> >>>>>>
> >>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt
> >>>>>> we can
> >>>>>> support this and a cast will be necessary at some point in the user
> >>>>>> code.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
> >>>>>>> Hey Daniyar,
> >>>>>>>
> >>>>>>> Thanks for looking at it!
> >>>>>>>
> >>>>>>> Something like your screenshot is more along the lines of what I was
> >>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
> >>>>>>> be "vanilla java"?
> >>>>>>>
> >>>>>>> Unfortunately the deserializer needs more information, though. For
> >>>>>>> example, what if the inner type is a Map<String,String>? The serde
> >>>>>>> could
> >>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
> >>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
> >>>>>>>
> >>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
> >>>>>>>  /**list type**/ LinkedList.class,
> >>>>>>>  /**inner serde**/ new MyRecordSerde()
> >>>>>>> )
> >>>>>>>
> >>>>>>> And in configuration, it's something like:
> >>>>>>> default.key.serde: org...ListSerde
> >>>>>>> default.key.list.serde.type: java.util.LinkedList
> >>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
> >>>>>>>
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>> Thanks,
> >>>>>>> -John
> >>>>>>>
> >>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net
> >>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
> >>>>>>> <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
> >>>>>>>
> >>>>>>>   Hey John,
> >>>>>>>
> >>>>>>>   I gave read about TypeReference. It could work for the list serde.
> >>>>>>>   However, it is not directly
> >>>>>>>   supported:
> >>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
> >>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490>
> >>>>>>>   The only way is to pass an actual class object into the constructor,
> >>>>>>>   something like:
> >>>>>>>
> >>>>>>>   It could be an option, but not a pretty one. What do you think of my
> >>>>>>>   approach to use vanilla java and canonical class name? (As described
> >>>>>>>   previously)
> >>>>>>>
> >>>>>>>   Best,
> >>>>>>>   Daniyar Yeralin
> >>>>>>>
> >>>>>>>>   On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net
> >>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
> >>>>>>>>   <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
> >>>>>>>>
> >>>>>>>>   Hi John,
> >>>>>>>>
> >>>>>>>>   Thank you for your input! Yes, my idea looks a little bit over
> >>>>>>>>   engineered :)
> >>>>>>>>
> >>>>>>>>   I also wanted to see a feedback from Mathias as well since he gave
> >>>>>>>>   me an idea about storing fixed/variable size entries.
> >>>>>>>>
> >>>>>>>>   Best,
> >>>>>>>>   Daniyar Yeralin
> >>>>>>>>
> >>>>>>>>>   On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io
> >>>>>>>>> <ma...@confluent.io> <ma...@confluent.io>
> >>>>>>>>>   <mailto:john@confluent.io <ma...@confluent.io>>> wrote:
> >>>>>>>>>
> >>>>>>>>>   Hi Daniyar,
> >>>>>>>>>
> >>>>>>>>>   That's a very clever solution!
> >>>>>>>>>
> >>>>>>>>>   One observation is that, now, this is what we might call a
> >>>>>>>>>   polymorphic
> >>>>>>>>>   serde. That is, you're detecting the actual concrete type and then
> >>>>>>>>>   promising to produce the exact same concrete type on read.
> >>>>>>>>> There are
> >>>>>>>>>   some inherent problems with this approach, which in general
> >>>>>>>>> require
> >>>>>>>>>   some kind of  schema registry (not necessarily Schema
> >>>>>>>>> Registry, just
> >>>>>>>>>   any registry for schemas) to solve.
> >>>>>>>>>
> >>>>>>>>>   Notice that every serialized record has quite a bit of duplicated
> >>>>>>>>>   information: the concrete type as well as a byte to indicate
> >>>>>>>>> whether
> >>>>>>>>>   the value type is a fixed size, and, if so, an integer to
> >>>>>>>>>   indicate the
> >>>>>>>>>   actual size. These constitute a schema, of sorts, because they
> >>>>>>>>>   tell us
> >>>>>>>>>   later how exactly to deserialize the data. Unfortunately, this
> >>>>>>>>>   information is completely redundant. In all likelihood, the
> >>>>>>>>>   information will be exactly the same for every record in the
> >>>>>>>>> topic.
> >>>>>>>>>   This problem is essentially the core motivation for serializations
> >>>>>>>>>   like Avro: to move the schema outside of the serialization
> >>>>>>>>> itself, so
> >>>>>>>>>   that the records won't contain so much redundant information.
> >>>>>>>>>
> >>>>>>>>>   In this light, I'm wondering if it makes sense to go back to
> >>>>>>>>>   something
> >>>>>>>>>   like what you had earlier in which you don't support perfectly
> >>>>>>>>>   preserving the concrete type for _this_ serde, but instead just
> >>>>>>>>>   support deserializing to _some_ List. Then, you could defer full,
> >>>>>>>>>   perfect, type preservation to serdes that have an external
> >>>>>>>>> system in
> >>>>>>>>>   which to register their type information.
> >>>>>>>>>
> >>>>>>>>>   There does exist an alternative, if we really do want to
> >>>>>>>>> preserve the
> >>>>>>>>>   concrete type (which does seem kind of nice). You can add a
> >>>>>>>>>   configuration option specifically for the serde to configure
> >>>>>>>>> what the
> >>>>>>>>>   list type will be, and maybe what the element type is, as well.
> >>>>>>>>>
> >>>>>>>>>   As far as "related work" goes, you might be interested to take
> >>>>>>>>> a look
> >>>>>>>>>   at how Jackson can be configured to deserialize into a specific,
> >>>>>>>>>   arbitrarily nested, generically parameterized class structure.
> >>>>>>>>>   Specifically, you might find
> >>>>>>>>>   https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> >>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
> >>>>>>>>>   interesting.
> >>>>>>>>>
> >>>>>>>>>   Thanks,
> >>>>>>>>>   -John
> >>>>>>>>>
> >>>>>>>>>   On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net
> >>>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
> >>>>>>>>>   <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>   bump
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hey, 

I finally made the updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
Sorry for the delay :)

Thank You!

Best,
Daniyar Yeralin

> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <ma...@confluent.io> wrote:
> 
> Yes, something like this. I have not thought about good configuration
> parameter names yet. I am also not sure I understand all of the proposed
> configs at the moment. But all configs should be listed and explained in
> the KIP anyway, and we can discuss further after you have updated the KIP
> (I can ask more detailed questions if I have any).
> 
> 
> -Matthias
> 
> On 6/21/19 2:05 PM, Development wrote:
>> Yes, you are right. BytesSerializer does not belong in my list of
>> primitive serializers.
>> 
>> As for the default constructor and configurability, I just want to make
>> sure: is this what you have in mind?
>> 
>> Best,
>> Daniyar Yeralin
>> 
>> 
>> 
>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io
>>> <ma...@confluent.io>> wrote:
>>> 
>>> Thanks for the update!
>>> 
>>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>>> should have a default constructor, and it should be possible to pass in
>>> the `Class listClass` information via a configuration. Otherwise,
>>> KafkaStreams cannot use it as a default serde.
>>> 
>>> 
>>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>>> as it is for `byte[]` with variable length -- it's for arrays, not for a
>>> single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>> 
>>> 
>>> For tests, we can comment on the PR. No need to do this in the KIP
>>> discussion.
>>> 
>>> 
>>> Can you also update the KIP?
>>> 
>>> 
>>> 
>>> -Matthias
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 6/21/19 11:29 AM, Development wrote:
>>>> I made and pushed necessary commits, so we could review the final
>>>> version under PR https://github.com/apache/kafka/pull/6592
>>>> 
>>>> I also need some advice on writing tests for this new serde. So far I
>>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>>> if it is enough.
>>>> 
>>>> Thank y’all for your help in this KIP :)
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>> 
>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io
>>>>> <ma...@confluent.io>> wrote:
>>>>> 
>>>>> Hey Daniyar,
>>>>> 
>>>>> Looks good to me! Thanks for considering it.
>>>>> 
>>>>> Thanks,
>>>>> -John
>>>>> 
>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net
>>>>> <ma...@yeralin.net> <ma...@yeralin.net>> wrote:
>>>>> Hey John and Matthias,
>>>>> 
>>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>>> realized that the type is not really needed in ListSerializer, but
>>>>> only in ListDeserializer:
>>>>> 
>>>>> 
>>>>> In ListSerializer, we store entry sizes only if the inner serializer
>>>>> is not a primitive serializer:
>>>>> 
>>>>> 
>>>>> Then, in the deserializer, we keep the passed list type, so that during
>>>>> deserialization we can create an instance of it with a predefined
>>>>> listSize for better performance.
>>>>> We also try to look up a primitiveSize based on the passed deserializer.
>>>>> If there is none, primitiveSize will be null, which means that each
>>>>> entry’s size was encoded individually.
>>>>> 
>>>>> 
>>>>> This looks much cleaner and more concise.
>>>>> 
>>>>> What do you think?
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io
>>>>>> <ma...@confluent.io> <ma...@confluent.io>> wrote:
>>>>>> 
>>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>>> Deserializer returns a fixed type...
>>>>>> 
>>>>>> Maybe it's best to allow users to specify the target list type on
>>>>>> deserialization via config?
>>>>>> 
>>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>>> type size, but users could specify the type on the deserializer (via a
>>>>>> config again)?
>>>>>> 
>>>>>> 
>>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt
>>>>>> we can
>>>>>> support this and a cast will be necessary at some point in the user
>>>>>> code.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>>> Hey Daniyar,
>>>>>>> 
>>>>>>> Thanks for looking at it!
>>>>>>> 
>>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>>> be "vanilla java"?
>>>>>>> 
>>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>>> example, what if the inner type is a Map<String,String>? The serde
>>>>>>> could
>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>> 
>>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>>  /**list type**/ LinkedList.class,
>>>>>>>  /**inner serde**/ new MyRecordSerde()
>>>>>>> )
>>>>>>> 
>>>>>>> And in configuration, it's something like:
>>>>>>> default.key.serde: org...ListSerde
>>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>> 
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> Thanks,
>>>>>>> -John
>>>>>>> 
>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net
>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
>>>>>>> <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>> 
>>>>>>>   Hey John,
>>>>>>> 
>>>>>>>   I read about TypeReference. It could work for the list serde.
>>>>>>>   However, it is not directly
>>>>>>>   supported:
>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490>
>>>>>>>   The only way is to pass an actual class object into the constructor,
>>>>>>>   something like:
>>>>>>> 
>>>>>>>   It could be an option, but not a pretty one. What do you think of my
>>>>>>>   approach to use vanilla java and canonical class name? (As described
>>>>>>>   previously)
>>>>>>> 
>>>>>>>   Best,
>>>>>>>   Daniyar Yeralin
>>>>>>> 
>>>>>>>>   On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net
>>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
>>>>>>>>   <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>>> 
>>>>>>>>   Hi John,
>>>>>>>> 
>>>>>>>>   Thank you for your input! Yes, my idea looks a little bit over
>>>>>>>>   engineered :)
>>>>>>>> 
>>>>>>>>   I also wanted to get feedback from Matthias, since he gave
>>>>>>>>   me an idea about storing fixed/variable size entries.
>>>>>>>> 
>>>>>>>>   Best,
>>>>>>>>   Daniyar Yeralin
>>>>>>>> 
>>>>>>>>>   On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io
>>>>>>>>> <ma...@confluent.io> <ma...@confluent.io>
>>>>>>>>>   <mailto:john@confluent.io <ma...@confluent.io>>> wrote:
>>>>>>>>> 
>>>>>>>>>   Hi Daniyar,
>>>>>>>>> 
>>>>>>>>>   That's a very clever solution!
>>>>>>>>> 
>>>>>>>>>   One observation is that, now, this is what we might call a
>>>>>>>>>   polymorphic
>>>>>>>>>   serde. That is, you're detecting the actual concrete type and then
>>>>>>>>>   promising to produce the exact same concrete type on read.
>>>>>>>>> There are
>>>>>>>>>   some inherent problems with this approach, which in general
>>>>>>>>> require
>>>>>>>>>   some kind of  schema registry (not necessarily Schema
>>>>>>>>> Registry, just
>>>>>>>>>   any registry for schemas) to solve.
>>>>>>>>> 
>>>>>>>>>   Notice that every serialized record has quite a bit of duplicated
>>>>>>>>>   information: the concrete type as well as a byte to indicate
>>>>>>>>> whether
>>>>>>>>>   the value type is a fixed size, and, if so, an integer to
>>>>>>>>>   indicate the
>>>>>>>>>   actual size. These constitute a schema, of sorts, because they
>>>>>>>>>   tell us
>>>>>>>>>   later how exactly to deserialize the data. Unfortunately, this
>>>>>>>>>   information is completely redundant. In all likelihood, the
>>>>>>>>>   information will be exactly the same for every record in the
>>>>>>>>> topic.
>>>>>>>>>   This problem is essentially the core motivation for serializations
>>>>>>>>>   like Avro: to move the schema outside of the serialization
>>>>>>>>> itself, so
>>>>>>>>>   that the records won't contain so much redundant information.
>>>>>>>>> 
>>>>>>>>>   In this light, I'm wondering if it makes sense to go back to
>>>>>>>>>   something
>>>>>>>>>   like what you had earlier in which you don't support perfectly
>>>>>>>>>   preserving the concrete type for _this_ serde, but instead just
>>>>>>>>>   support deserializing to _some_ List. Then, you could defer full,
>>>>>>>>>   perfect, type preservation to serdes that have an external
>>>>>>>>> system in
>>>>>>>>>   which to register their type information.
>>>>>>>>> 
>>>>>>>>>   There does exist an alternative, if we really do want to
>>>>>>>>> preserve the
>>>>>>>>>   concrete type (which does seem kind of nice). You can add a
>>>>>>>>>   configuration option specifically for the serde to configure
>>>>>>>>> what the
>>>>>>>>>   list type will be, and maybe what the element type is, as well.
>>>>>>>>> 
>>>>>>>>>   As far as "related work" goes, you might be interested to take
>>>>>>>>> a look
>>>>>>>>>   at how Jackson can be configured to deserialize into a specific,
>>>>>>>>>   arbitrarily nested, generically parameterized class structure.
>>>>>>>>>   Specifically, you might find
>>>>>>>>>   https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
>>>>>>>>>   interesting.
>>>>>>>>> 
>>>>>>>>>   Thanks,
>>>>>>>>>   -John
>>>>>>>>> 
>>>>>>>>>   On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net
>>>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
>>>>>>>>>   <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>   bump
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Yes, something like this. I have not thought about good configuration
parameter names yet. I am also not sure I understand all of the proposed
configs at the moment. But all configs should be listed and explained in
the KIP anyway, and we can discuss further after you have updated the KIP
(I can ask more detailed questions if I have any).


-Matthias

On 6/21/19 2:05 PM, Development wrote:
> Yes, you are right. BytesSerializer does not belong in my list of
> primitive serializers.
> 
> As for the default constructor and configurability, I just want to make
> sure: is this what you have in mind?
> 
> Best,
> Daniyar Yeralin
> 
> 
> 
>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matthias@confluent.io
>> <ma...@confluent.io>> wrote:
>>
>> Thanks for the update!
>>
>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
>> should have a default constructor, and it should be possible to pass in
>> the `Class listClass` information via a configuration. Otherwise,
>> KafkaStreams cannot use it as a default serde.
>>
>>
>> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
>> as it is for `byte[]` with variable length -- it's for arrays, not for a
>> single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
>>
>>
>> For tests, we can comment on the PR. No need to do this in the KIP
>> discussion.
>>
>>
>> Can you also update the KIP?
>>
>>
>>
>> -Matthias
>>
>>
>>
>>
>>
>> On 6/21/19 11:29 AM, Development wrote:
>>> I made and pushed necessary commits, so we could review the final
>>> version under PR https://github.com/apache/kafka/pull/6592
>>>
>>> I also need some advice on writing tests for this new serde. So far I
>>> only have two test cases (roundtrip and empty payload), I’m not sure
>>> if it is enough.
>>>
>>> Thank y’all for your help in this KIP :)
>>>
>>> Best,
>>> Daniyar Yeralin
>>>
>>>
>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <john@confluent.io
>>>> <ma...@confluent.io>> wrote:
>>>>
>>>> Hey Daniyar,
>>>>
>>>> Looks good to me! Thanks for considering it.
>>>>
>>>> Thanks,
>>>> -John
>>>>
>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net
>>>> <ma...@yeralin.net> <ma...@yeralin.net>> wrote:
>>>> Hey John and Matthias,
>>>>
>>>> Yes, now I see it all. I’m storing lots of redundant information.
>>>> Here is my final idea. Yes, now a user should pass a list type. I
>>>> realized that the type is not really needed in ListSerializer, but
>>>> only in ListDeserializer:
>>>>
>>>>
>>>> In ListSerializer, we store entry sizes only if the inner serializer
>>>> is not a primitive serializer:
>>>>
>>>>
>>>> Then, in the deserializer, we keep the passed list type, so that during
>>>> deserialization we can create an instance of it with a predefined
>>>> listSize for better performance.
>>>> We also try to look up a primitiveSize based on the passed deserializer.
>>>> If there is none, primitiveSize will be null, which means that each
>>>> entry’s size was encoded individually.
>>>>
>>>>
>>>> This looks much cleaner and more concise.
>>>>
>>>> What do you think?
>>>>
>>>> Best,
>>>> Daniyar Yeralin
>>>>
>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io
>>>>> <ma...@confluent.io> <ma...@confluent.io>> wrote:
>>>>>
>>>>> For encoding the list-type: I see John's point about re-encoding the
>>>>> list-type redundantly. However, I also don't like the idea that the
>>>>> Deserializer returns a fixed type...
>>>>>
>>>>> Maybe it's best to allow users to specify the target list type on
>>>>> deserialization via config?
>>>>>
>>>>> Similar for the primitive types: I don't think we need to encode the
>>>>> type size, but users could specify the type on the deserializer (via a
>>>>> config again)?
>>>>>
>>>>>
>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt
>>>>> we can
>>>>> support this and a cast will be necessary at some point in the user
>>>>> code.
>>>>>
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>>
>>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>>> Hey Daniyar,
>>>>>>
>>>>>> Thanks for looking at it!
>>>>>>
>>>>>> Something like your screenshot is more along the lines of what I was
>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>>> be "vanilla java"?
>>>>>>
>>>>>> Unfortunately the deserializer needs more information, though. For
>>>>>> example, what if the inner type is a Map<String,String>? The serde
>>>>>> could
>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>>>
>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>>  /**list type**/ LinkedList.class,
>>>>>>  /**inner serde**/ new MyRecordSerde()
>>>>>> )
>>>>>>
>>>>>> And in configuration, it's something like:
>>>>>> default.key.serde: org...ListSerde
>>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>>>
>>>>>>
>>>>>> What do you think?
>>>>>> Thanks,
>>>>>> -John
>>>>>>
>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net
>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
>>>>>> <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>
>>>>>>   Hey John,
>>>>>>
>>>>>>   I read about TypeReference. It could work for the list serde.
>>>>>>   However, it is not directly
>>>>>>   supported:
>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490
>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490>
>>>>>>   The only way is to pass an actual class object into the constructor,
>>>>>>   something like:
>>>>>>
>>>>>>   It could be an option, but not a pretty one. What do you think of my
>>>>>>   approach to use vanilla java and canonical class name? (As described
>>>>>>   previously)
>>>>>>
>>>>>>   Best,
>>>>>>   Daniyar Yeralin
>>>>>>
>>>>>>>   On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net
>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
>>>>>>>   <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>>
>>>>>>>   Hi John,
>>>>>>>
>>>>>>>   Thank you for your input! Yes, my idea looks a little bit over
>>>>>>>   engineered :)
>>>>>>>
>>>>>>>   I also wanted to get feedback from Matthias, since he gave
>>>>>>>   me an idea about storing fixed/variable size entries.
>>>>>>>
>>>>>>>   Best,
>>>>>>>   Daniyar Yeralin
>>>>>>>
>>>>>>>>   On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io
>>>>>>>> <ma...@confluent.io> <ma...@confluent.io>
>>>>>>>>   <mailto:john@confluent.io <ma...@confluent.io>>> wrote:
>>>>>>>>
>>>>>>>>   Hi Daniyar,
>>>>>>>>
>>>>>>>>   That's a very clever solution!
>>>>>>>>
>>>>>>>>   One observation is that, now, this is what we might call a
>>>>>>>>   polymorphic
>>>>>>>>   serde. That is, you're detecting the actual concrete type and then
>>>>>>>>   promising to produce the exact same concrete type on read.
>>>>>>>> There are
>>>>>>>>   some inherent problems with this approach, which in general
>>>>>>>> require
>>>>>>>>   some kind of  schema registry (not necessarily Schema
>>>>>>>> Registry, just
>>>>>>>>   any registry for schemas) to solve.
>>>>>>>>
>>>>>>>>   Notice that every serialized record has quite a bit of duplicated
>>>>>>>>   information: the concrete type as well as a byte to indicate
>>>>>>>> whether
>>>>>>>>   the value type is a fixed size, and, if so, an integer to
>>>>>>>>   indicate the
>>>>>>>>   actual size. These constitute a schema, of sorts, because they
>>>>>>>>   tell us
>>>>>>>>   later how exactly to deserialize the data. Unfortunately, this
>>>>>>>>   information is completely redundant. In all likelihood, the
>>>>>>>>   information will be exactly the same for every record in the
>>>>>>>> topic.
>>>>>>>>   This problem is essentially the core motivation for serializations
>>>>>>>>   like Avro: to move the schema outside of the serialization
>>>>>>>> itself, so
>>>>>>>>   that the records won't contain so much redundant information.
>>>>>>>>
>>>>>>>>   In this light, I'm wondering if it makes sense to go back to
>>>>>>>>   something
>>>>>>>>   like what you had earlier in which you don't support perfectly
>>>>>>>>   preserving the concrete type for _this_ serde, but instead just
>>>>>>>>   support deserializing to _some_ List. Then, you could defer full,
>>>>>>>>   perfect, type preservation to serdes that have an external
>>>>>>>> system in
>>>>>>>>   which to register their type information.
>>>>>>>>
>>>>>>>>   There does exist an alternative, if we really do want to
>>>>>>>> preserve the
>>>>>>>>   concrete type (which does seem kind of nice). You can add a
>>>>>>>>   configuration option specifically for the serde to configure
>>>>>>>> what the
>>>>>>>>   list type will be, and maybe what the element type is, as well.
>>>>>>>>
>>>>>>>>   As far as "related work" goes, you might be interested to take
>>>>>>>> a look
>>>>>>>>   at how Jackson can be configured to deserialize into a specific,
>>>>>>>>   arbitrarily nested, generically parameterized class structure.
>>>>>>>>   Specifically, you might find
>>>>>>>>   https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
>>>>>>>>   interesting.
>>>>>>>>
>>>>>>>>   Thanks,
>>>>>>>>   -John
>>>>>>>>
>>>>>>>>   On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net
>>>>>>>> <ma...@yeralin.net> <ma...@yeralin.net>
>>>>>>>>   <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>>>>
>>>>>>>>>   bump
>>>>
>>>
>>>
>>
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Yes, you are right. BytesSerializer does not belong in my list of primitive serializers.

As for the default constructor and configurability, I just want to make sure: is this what you have in mind?


Best,
Daniyar Yeralin



> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <ma...@confluent.io> wrote:
> 
> Thanks for the update!
> 
> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
> should have a default constructor, and it should be possible to pass in
> the `Class listClass` information via a configuration. Otherwise,
> KafkaStreams cannot use it as a default serde.
> 
> 
> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
> as it is for `byte[]` with variable length -- it's for arrays, not for a
> single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
> 
> 
> For tests, we can comment on the PR. No need to do this in the KIP
> discussion.
> 
> 
> Can you also update the KIP?
> 
> 
> 
> -Matthias
> 
> 
> 
> 
> 
> On 6/21/19 11:29 AM, Development wrote:
>> I made and pushed necessary commits, so we could review the final version under PR https://github.com/apache/kafka/pull/6592
>> 
>> I also need some advice on writing tests for this new serde. So far I only have two test cases (roundtrip and empty payload), I’m not sure if it is enough.
>> 
>> Thank y’all for your help in this KIP :)
>> 
>> Best,
>> Daniyar Yeralin
>> 
>> 
>>> On Jun 21, 2019, at 1:44 PM, John Roesler <jo...@confluent.io> wrote:
>>> 
>>> Hey Daniyar,
>>> 
>>> Looks good to me! Thanks for considering it.
>>> 
>>> Thanks,
>>> -John
>>> 
>>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
>>> Hey John and Matthias,
>>> 
>>> Yes, now I see it all. I’m storing lots of redundant information.
>>> Here is my final idea. Yes, now a user should pass a list type. I realized that the type is not really needed in ListSerializer, but only in ListDeserializer:
>>> 
>>> 
>>> In ListSerializer, we store entry sizes only if the inner serializer is not a primitive serializer:
>>> 
>>> 
>>> Then, in the deserializer, we keep the passed list type, so that during deserialization we can create an instance of it with a predefined listSize for better performance.
>>> We also try to look up a primitiveSize based on the passed deserializer. If there is none, primitiveSize will be null, which means that each entry’s size was encoded individually.
>>> 
>>> 
>>> This looks much cleaner and more concise.
>>> 
>>> What do you think?
>>> 
>>> Best,
>>> Daniyar Yeralin 
>>> 
>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io <ma...@confluent.io>> wrote:
>>>> 
>>>> For encoding the list-type: I see John's point about re-encoding the
>>>> list-type redundantly. However, I also don't like the idea that the
>>>> Deserializer returns a fixed type...
>>>> 
>>>> Maybe it's best to allow users to specify the target list type on
>>>> deserialization via config?
>>>> 
>>>> Similar for the primitive types: I don't think we need to encode the
>>>> type size, but users could specify the type on the deserializer (via a
>>>> config again)?
>>>> 
>>>> 
>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>>>> support this and a cast will be necessary at some point in the user code.
>>>> 
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> 
>>>> 
>>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>>> Hey Daniyar,
>>>>> 
>>>>> Thanks for looking at it!
>>>>> 
>>>>> Something like your screenshot is more along the lines of what I was
>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>>> be "vanilla java"?
>>>>> 
>>>>> Unfortunately the deserializer needs more information, though. For
>>>>> example, what if the inner type is a Map<String,String>? The serde could
>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>> 
>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>>  /**list type**/ LinkedList.class,
>>>>>  /**inner serde**/ new MyRecordSerde()
>>>>> )
>>>>> 
>>>>> And in configuration, it's something like:
>>>>> default.key.serde: org...ListSerde
>>>>> default.key.list.serde.type: java.util.LinkedList
>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>> 
>>>>> 
>>>>> What do you think?
>>>>> Thanks,
>>>>> -John
>>>>> 
>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net <ma...@yeralin.net>
>>>>> <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>> 
>>>>>   Hey John,
>>>>> 
>>>>>   I read about TypeReference. It could work for the list serde.
>>>>>   However, it is not directly
>>>>>   supported: https://github.com/FasterXML/jackson-databind/issues/1490 <https://github.com/FasterXML/jackson-databind/issues/1490>
>>>>>   The only way is to pass an actual class object into the constructor,
>>>>>   something like:
>>>>> 
>>>>>   It could be an option, but not a pretty one. What do you think of my
>>>>>   approach to use vanilla java and canonical class name? (As described
>>>>>   previously)
>>>>> 
>>>>>   Best,
>>>>>   Daniyar Yeralin
>>>>> 
>>>>>>   On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net <ma...@yeralin.net>
>>>>>>   <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>> 
>>>>>>   Hi John,
>>>>>> 
>>>>>>   Thank you for your input! Yes, my idea looks a little bit over
>>>>>>   engineered :)
>>>>>> 
>>>>>>   I also wanted to get feedback from Matthias, since he gave
>>>>>>   me an idea about storing fixed/variable size entries.
>>>>>> 
>>>>>>   Best,
>>>>>>   Daniyar Yeralin
>>>>>> 
>>>>>>>   On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io <ma...@confluent.io>
>>>>>>>   <mailto:john@confluent.io <ma...@confluent.io>>> wrote:
>>>>>>> 
>>>>>>>   Hi Daniyar,
>>>>>>> 
>>>>>>>   That's a very clever solution!
>>>>>>> 
>>>>>>>   One observation is that, now, this is what we might call a
>>>>>>>   polymorphic
>>>>>>>   serde. That is, you're detecting the actual concrete type and then
>>>>>>>   promising to produce the exact same concrete type on read. There are
>>>>>>>   some inherent problems with this approach, which in general require
>>>>>>>   some kind of  schema registry (not necessarily Schema Registry, just
>>>>>>>   any registry for schemas) to solve.
>>>>>>> 
>>>>>>>   Notice that every serialized record has quite a bit of duplicated
>>>>>>>   information: the concrete type as well as a byte to indicate whether
>>>>>>>   the value type is a fixed size, and, if so, an integer to
>>>>>>>   indicate the
>>>>>>>   actual size. These constitute a schema, of sorts, because they
>>>>>>>   tell us
>>>>>>>   later how exactly to deserialize the data. Unfortunately, this
>>>>>>>   information is completely redundant. In all likelihood, the
>>>>>>>   information will be exactly the same for every record in the topic.
>>>>>>>   This problem is essentially the core motivation for serializations
>>>>>>>   like Avro: to move the schema outside of the serialization itself, so
>>>>>>>   that the records won't contain so much redundant information.
>>>>>>> 
>>>>>>>   In this light, I'm wondering if it makes sense to go back to
>>>>>>>   something
>>>>>>>   like what you had earlier in which you don't support perfectly
>>>>>>>   preserving the concrete type for _this_ serde, but instead just
>>>>>>>   support deserializing to _some_ List. Then, you could defer full,
>>>>>>>   perfect, type preservation to serdes that have an external system in
>>>>>>>   which to register their type information.
>>>>>>> 
>>>>>>>   There does exist an alternative, if we really do want to preserve the
>>>>>>>   concrete type (which does seem kind of nice). You can add a
>>>>>>>   configuration option specifically for the serde to configure what the
>>>>>>>   list type will be, and maybe what the element type is, as well.
>>>>>>> 
>>>>>>>   As far as "related work" goes, you might be interested to take a look
>>>>>>>   at how Jackson can be configured to deserialize into a specific,
>>>>>>>   arbitrarily nested, generically parameterized class structure.
>>>>>>>   Specifically, you might find
>>>>>>>   https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
>>>>>>>   interesting.
>>>>>>> 
>>>>>>>   Thanks,
>>>>>>>   -John
>>>>>>> 
>>>>>>>   On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net <ma...@yeralin.net>
>>>>>>>   <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>>> 
>>>>>>>>   bump
>>> 
>> 
>> 
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks for the update!

I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
should have a default constructor, and it should be possible to pass in
the `Class listClass` information via a configuration. Otherwise,
KafkaStreams cannot use it as a default serde.
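
A minimal sketch of why the default constructor matters, under stated assumptions: the class name `ConfigurableListFactory` is hypothetical and not taken from the PR, and the config key is the tentative `default.key.list.serde.type` name floated earlier in this thread, not a final one. A class named only by a config string has to be created reflectively with no arguments, so the list type can only arrive afterwards through a `configure`-style call:

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the configurable part of a list deserializer.
// It is NOT the class from the PR; it only illustrates the two-step setup.
class ConfigurableListFactory {
    // Tentative key name from earlier in this thread (an assumption, not final).
    static final String LIST_TYPE_CONFIG = "default.key.list.serde.type";

    private Class<?> listClass;

    // Default constructor: needed when the class is instantiated from a config string.
    ConfigurableListFactory() { }

    // Reads the list type from the configs; accepts a Class object or a class name.
    void configure(Map<String, ?> configs) {
        Object value = configs.get(LIST_TYPE_CONFIG);
        if (value == null) {
            throw new IllegalArgumentException("Missing config: " + LIST_TYPE_CONFIG);
        }
        try {
            Class<?> candidate = (value instanceof Class)
                    ? (Class<?>) value
                    : Class.forName(value.toString());
            if (!List.class.isAssignableFrom(candidate)) {
                throw new IllegalArgumentException(candidate.getName() + " is not a List");
            }
            listClass = candidate;
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown list class: " + value, e);
        }
    }

    // Creates an empty list of the configured type via its no-arg constructor.
    @SuppressWarnings("unchecked")
    <T> List<T> newList() {
        if (listClass == null) {
            throw new IllegalStateException("configure() was not called");
        }
        try {
            return (List<T>) listClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + listClass.getName(), e);
        }
    }
}
```

With `default.key.list.serde.type` set to `java.util.LinkedList`, `newList()` would hand back a `LinkedList`; a real deserializer would then fill it with the decoded entries.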


For the primitive serializers: `BytesSerializer` is not primitive IMHO,
as it is for `byte[]` with variable length -- it's for arrays, not for a
single `byte` (note that `Bytes` is a Kafka class wrapping `byte[]`).
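
To make the fixed-size versus variable-length distinction concrete, here is a rough sketch. The byte layout and the class name `ListEncodingSketch` are assumptions for illustration only and are not taken from the PR: a list of `Integer` needs only an entry count followed by 4 bytes per entry, while a list of `byte[]` needs a length prefix in front of every entry, which is why a `byte[]` serializer cannot be treated as primitive.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative layout only (an assumption, not the wire format from the PR).
class ListEncodingSketch {

    // Fixed-size entries: [count][int][int]... with no per-entry size.
    static byte[] encodeInts(List<Integer> values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.size() * Integer.BYTES);
        buf.putInt(values.size());
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Variable-length entries: [count]([length][bytes])... one prefix per entry.
    static byte[] encodeByteArrays(List<byte[]> values) {
        int total = Integer.BYTES;
        for (byte[] v : values) {
            total += Integer.BYTES + v.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(values.size());
        for (byte[] v : values) {
            buf.putInt(v.length);
            buf.put(v);
        }
        return buf.array();
    }

    // Reverses encodeByteArrays by reading each length prefix before its payload.
    static List<byte[]> decodeByteArrays(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        List<byte[]> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] v = new byte[buf.getInt()];
            buf.get(v);
            out.add(v);
        }
        return out;
    }
}
```

In this layout, three integers take 4 + 3 * 4 bytes, whereas two byte arrays of lengths 1 and 3 take 4 + (4 + 1) + (4 + 3) bytes because each entry carries its own size.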


For tests, we can comment on the PR. No need to do this in the KIP
discussion.


Can you also update the KIP?



-Matthias





On 6/21/19 11:29 AM, Development wrote:
> I made and pushed necessary commits, so we could review the final version under PR https://github.com/apache/kafka/pull/6592
> 
> I also need some advice on writing tests for this new serde. So far I only have two test cases (roundtrip and empty payload), I’m not sure if it is enough.
> 
> Thank y’all for your help in this KIP :)
> 
> Best,
> Daniyar Yeralin
> 
> 
>> On Jun 21, 2019, at 1:44 PM, John Roesler <jo...@confluent.io> wrote:
>>
>> Hey Daniyar,
>>
>> Looks good to me! Thanks for considering it.
>>
>> Thanks,
>> -John
>>
>> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
>> Hey John and Matthias,
>>
>> Yes, now I see it all. I’m storing lots of redundant information.
>> Here is my final idea. Yes, now a user should pass a list type. I realized that the type is not really needed in ListSerializer, but only in ListDeserializer:
>>
>>
>> In ListSerializer, we store entry sizes only if the inner serializer is not a primitive serializer:
>>
>>
>> Then, in the deserializer, we keep the passed list type, so that during deserialization we can create an instance of it with a predefined listSize for better performance.
>> We also try to look up a primitiveSize based on the passed deserializer. If there is none, primitiveSize will be null, which means that each entry’s size was encoded individually.
>>
>>
>> This looks much cleaner and more concise.
>>
>> What do you think?
>>
>> Best,
>> Daniyar Yeralin 
>>
>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io <ma...@confluent.io>> wrote:
>>>
>>> For encoding the list-type: I see John's point about re-encoding the
>>> list-type redundantly. However, I also don't like the idea that the
>>> Deserializer returns a fixed type...
>>>
>>> Maybe it's best to allow users to specify the target list type on
>>> deserialization via config?
>>>
>>> Similar for the primitive types: I don't think we need to encode the
>>> type size, but users could specify the type on the deserializer (via a
>>> config again)?
>>>
>>>
>>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>>> support this and a cast will be necessary at some point in the user code.
>>>
>>>
>>>
>>> -Matthias
>>>
>>>
>>>
>>> On 6/20/19 1:21 PM, John Roesler wrote:
>>>> Hey Daniyar,
>>>>
>>>> Thanks for looking at it!
>>>>
>>>> Something like your screenshot is more along the lines of what I was
>>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>>> be "vanilla java"?
>>>>
>>>> Unfortunately the deserializer needs more information, though. For
>>>> example, what if the inner type is a Map<String,String>? The serde could
>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>>>
>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>>   /**list type**/ LinkedList.class,
>>>>   /**inner serde**/ new MyRecordSerde()
>>>> )
>>>>
>>>> And in configuration, it's something like:
>>>> default.key.serde: org...ListSerde
>>>> default.key.list.serde.type: java.util.LinkedList
>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>>>
>>>>
>>>> What do you think?
>>>> Thanks,
>>>> -John
>>>>
>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net <ma...@yeralin.net>
>>>> <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>
>>>>    Hey John,
>>>>
>>>>    I read about TypeReference. It could work for the list serde.
>>>>    However, it is not directly
>>>>    supported: https://github.com/FasterXML/jackson-databind/issues/1490 <https://github.com/FasterXML/jackson-databind/issues/1490>
>>>>    The only way is to pass an actual class object into the constructor,
>>>>    something like:
>>>>
>>>>    It could be an option, but not a pretty one. What do you think of my
>>>>    approach to use vanilla java and canonical class name? (As described
>>>>    previously)
>>>>
>>>>    Best,
>>>>    Daniyar Yeralin
>>>>
>>>>>    On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net <ma...@yeralin.net>
>>>>>    <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>
>>>>>    Hi John,
>>>>>
>>>>>    Thank you for your input! Yes, my idea looks a little bit over
>>>>>    engineered :)
>>>>>
>>>>>    I also wanted to see feedback from Matthias as well since he gave
>>>>>    me an idea about storing fixed/variable size entries.
>>>>>
>>>>>    Best,
>>>>>    Daniyar Yeralin
>>>>>
>>>>>>    On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io <ma...@confluent.io>
>>>>>>    <mailto:john@confluent.io <ma...@confluent.io>>> wrote:
>>>>>>
>>>>>>    Hi Daniyar,
>>>>>>
>>>>>>    That's a very clever solution!
>>>>>>
>>>>>>    One observation is that, now, this is what we might call a
>>>>>>    polymorphic
>>>>>>    serde. That is, you're detecting the actual concrete type and then
>>>>>>    promising to produce the exact same concrete type on read. There are
>>>>>>    some inherent problems with this approach, which in general require
>>>>>>    some kind of  schema registry (not necessarily Schema Registry, just
>>>>>>    any registry for schemas) to solve.
>>>>>>
>>>>>>    Notice that every serialized record has quite a bit of duplicated
>>>>>>    information: the concrete type as well as a byte to indicate whether
>>>>>>    the value type is a fixed size, and, if so, an integer to
>>>>>>    indicate the
>>>>>>    actual size. These constitute a schema, of sorts, because they
>>>>>>    tell us
>>>>>>    later how exactly to deserialize the data. Unfortunately, this
>>>>>>    information is completely redundant. In all likelihood, the
>>>>>>    information will be exactly the same for every record in the topic.
>>>>>>    This problem is essentially the core motivation for serializations
>>>>>>    like Avro: to move the schema outside of the serialization itself, so
>>>>>>    that the records won't contain so much redundant information.
>>>>>>
>>>>>>    In this light, I'm wondering if it makes sense to go back to
>>>>>>    something
>>>>>>    like what you had earlier in which you don't support perfectly
>>>>>>    preserving the concrete type for _this_ serde, but instead just
>>>>>>    support deserializing to _some_ List. Then, you could defer full,
>>>>>>    perfect, type preservation to serdes that have an external system in
>>>>>>    which to register their type information.
>>>>>>
>>>>>>    There does exist an alternative, if we really do want to preserve the
>>>>>>    concrete type (which does seem kind of nice). You can add a
>>>>>>    configuration option specifically for the serde to configure what the
>>>>>>    list type will be, and maybe what the element type is, as well.
>>>>>>
>>>>>>    As far as "related work" goes, you might be interested to take a look
>>>>>>    at how Jackson can be configured to deserialize into a specific,
>>>>>>    arbitrarily nested, generically parameterized class structure.
>>>>>>    Specifically, you might find
>>>>>>    https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
>>>>>>    interesting.
>>>>>>
>>>>>>    Thanks,
>>>>>>    -John
>>>>>>
>>>>>>    On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net <ma...@yeralin.net>
>>>>>>    <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>>>
>>>>>>>    bump
>>
> 
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
I made and pushed the necessary commits, so we can review the final version in PR https://github.com/apache/kafka/pull/6592

I also need some advice on writing tests for this new serde. So far I only have two test cases (roundtrip and empty payload), and I’m not sure that is enough.

Thank y’all for your help with this KIP :)

Best,
Daniyar Yeralin


> On Jun 21, 2019, at 1:44 PM, John Roesler <jo...@confluent.io> wrote:
> 
> Hey Daniyar,
> 
> Looks good to me! Thanks for considering it.
> 
> Thanks,
> -John
> 
> On Fri, Jun 21, 2019 at 9:04 AM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
> Hey John and Matthias,
> 
> Yes, now I see it all. I’m storing lots of redundant information.
> Here is my final idea. Yes, now a user should pass a list type. I realized that the type is not really needed in ListSerializer, but only in ListDeserializer:
> 
> 
> In ListSerializer we will start storing sizes only if serializer is not a primitive serializer:
> 
> 
> Then, in deserializer, we persist passed list type, so that during deserialization we could create an instance of it with predefined listSize for better performance.
> We also try to locate a primitiveSize based on passed deserializer. If it is not there, then primitiveSize will be null. Which means that each entry’s size was encoded individually.
> 
> 
> This looks much cleaner and more concise.
> 
> What do you think?
> 
> Best,
> Daniyar Yeralin 
> 
>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matthias@confluent.io <ma...@confluent.io>> wrote:
>> 
>> For encoding the list-type: I see John's point about re-encoding the
>> list-type redundantly. However, I also don't like the idea that the
>> Deserializer returns a fixed type...
>> 
>> Maybe it's best to allow users to specify the target list type on
>> deserialization via config?
>> 
>> Similar for the primitive types: I don't think we need to encode the
>> type size, but users could specify the type on the deserializer (via a
>> config again)?
>> 
>> 
>> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
>> support this and a cast will be necessary at some point in the user code.
>> 
>> 
>> 
>> -Matthias
>> 
>> 
>> 
>> On 6/20/19 1:21 PM, John Roesler wrote:
>>> Hey Daniyar,
>>> 
>>> Thanks for looking at it!
>>> 
>>> Something like your screenshot is more along the lines of what I was
>>> thinking. Sorry, but I didn't follow what you mean, how would that not
>>> be "vanilla java"?
>>> 
>>> Unfortunately the deserializer needs more information, though. For
>>> example, what if the inner type is a Map<String,String>? The serde could
>>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>>> 
>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>>   /**list type**/ LinkedList.class,
>>>   /**inner serde**/ new MyRecordSerde()
>>> )
>>> 
>>> And in configuration, it's something like:
>>> default.key.serde: org...ListSerde
>>> default.key.list.serde.type: java.util.LinkedList
>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>>> 
>>> 
>>> What do you think?
>>> Thanks,
>>> -John
>>> 
>>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net <ma...@yeralin.net>
>>> <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>> 
>>>    Hey John,
>>> 
>>>    I read about TypeReference. It could work for the list serde.
>>>    However, it is not directly
>>>    supported: https://github.com/FasterXML/jackson-databind/issues/1490 <https://github.com/FasterXML/jackson-databind/issues/1490>
>>>    The only way is to pass an actual class object into the constructor,
>>>    something like:
>>> 
>>>    It could be an option, but not a pretty one. What do you think of my
>>>    approach to use vanilla java and canonical class name? (As described
>>>    previously)
>>> 
>>>    Best,
>>>    Daniyar Yeralin
>>> 
>>>>    On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net <ma...@yeralin.net>
>>>>    <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>> 
>>>>    Hi John,
>>>> 
>>>>    Thank you for your input! Yes, my idea looks a little bit over
>>>>    engineered :)
>>>> 
>>>>    I also wanted to see feedback from Matthias as well since he gave
>>>>    me an idea about storing fixed/variable size entries.
>>>> 
>>>>    Best,
>>>>    Daniyar Yeralin
>>>> 
>>>>>    On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io <ma...@confluent.io>
>>>>>    <mailto:john@confluent.io <ma...@confluent.io>>> wrote:
>>>>> 
>>>>>    Hi Daniyar,
>>>>> 
>>>>>    That's a very clever solution!
>>>>> 
>>>>>    One observation is that, now, this is what we might call a
>>>>>    polymorphic
>>>>>    serde. That is, you're detecting the actual concrete type and then
>>>>>    promising to produce the exact same concrete type on read. There are
>>>>>    some inherent problems with this approach, which in general require
>>>>>    some kind of  schema registry (not necessarily Schema Registry, just
>>>>>    any registry for schemas) to solve.
>>>>> 
>>>>>    Notice that every serialized record has quite a bit of duplicated
>>>>>    information: the concrete type as well as a byte to indicate whether
>>>>>    the value type is a fixed size, and, if so, an integer to
>>>>>    indicate the
>>>>>    actual size. These constitute a schema, of sorts, because they
>>>>>    tell us
>>>>>    later how exactly to deserialize the data. Unfortunately, this
>>>>>    information is completely redundant. In all likelihood, the
>>>>>    information will be exactly the same for every record in the topic.
>>>>>    This problem is essentially the core motivation for serializations
>>>>>    like Avro: to move the schema outside of the serialization itself, so
>>>>>    that the records won't contain so much redundant information.
>>>>> 
>>>>>    In this light, I'm wondering if it makes sense to go back to
>>>>>    something
>>>>>    like what you had earlier in which you don't support perfectly
>>>>>    preserving the concrete type for _this_ serde, but instead just
>>>>>    support deserializing to _some_ List. Then, you could defer full,
>>>>>    perfect, type preservation to serdes that have an external system in
>>>>>    which to register their type information.
>>>>> 
>>>>>    There does exist an alternative, if we really do want to preserve the
>>>>>    concrete type (which does seem kind of nice). You can add a
>>>>>    configuration option specifically for the serde to configure what the
>>>>>    list type will be, and maybe what the element type is, as well.
>>>>> 
>>>>>    As far as "related work" goes, you might be interested to take a look
>>>>>    at how Jackson can be configured to deserialize into a specific,
>>>>>    arbitrarily nested, generically parameterized class structure.
>>>>>    Specifically, you might find
>>>>>    https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
>>>>>    interesting.
>>>>> 
>>>>>    Thanks,
>>>>>    -John
>>>>> 
>>>>>    On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net <ma...@yeralin.net>
>>>>>    <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>>> 
>>>>>>    bump
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Hey Daniyar,

Looks good to me! Thanks for considering it.

Thanks,
-John

On Fri, Jun 21, 2019 at 9:04 AM Development <de...@yeralin.net> wrote:

> Hey John and Matthias,
>
> Yes, now I see it all. I’m storing lots of redundant information.
> Here is my final idea. Yes, now a user should pass a list type. I realized
> that the type is not really needed in ListSerializer, but only in
> ListDeserializer:
>
> In ListSerializer we will start storing sizes only if serializer is not a
> primitive serializer:
>
> Then, in deserializer, we persist passed list type, so that during
> deserialization we could create an instance of it with predefined listSize
> for better performance.
> We also try to locate a primitiveSize based on passed deserializer. If it
> is not there, then primitiveSize will be null. Which means that each
> entry’s size was encoded individually.
>
> This looks much cleaner and more concise.
>
> What do you think?
>
> Best,
> Daniyar Yeralin
>
> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> For encoding the list-type: I see John's point about re-encoding the
> list-type redundantly. However, I also don't like the idea that the
> Deserializer returns a fixed type...
>
> Maybe it's best to allow users to specify the target list type on
> deserialization via config?
>
> Similar for the primitive types: I don't think we need to encode the
> type size, but users could specify the type on the deserializer (via a
> config again)?
>
>
> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
> support this and a cast will be necessary at some point in the user code.
>
>
>
> -Matthias
>
>
>
> On 6/20/19 1:21 PM, John Roesler wrote:
>
> Hey Daniyar,
>
> Thanks for looking at it!
>
> Something like your screenshot is more along the lines of what I was
> thinking. Sorry, but I didn't follow what you mean, how would that not
> be "vanilla java"?
>
> Unfortunately the deserializer needs more information, though. For
> example, what if the inner type is a Map<String,String>? The serde could
> only be used to produce a LinkedList<Map>, thus, we'd still need an
> inner serde, like you have in the KIP (Serde<T> innerSerde).
>
> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>   /**list type**/ LinkedList.class,
>   /**inner serde**/ new MyRecordSerde()
> )
>
> And in configuration, it's something like:
> default.key.serde: org...ListSerde
> default.key.list.serde.type: java.util.LinkedList
> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>
>
> What do you think?
> Thanks,
> -John
>
> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net
> <mailto:dev@yeralin.net <de...@yeralin.net>>> wrote:
>
>    Hey John,
>
>    I read about TypeReference. It could work for the list serde.
>    However, it is not directly
>    supported: https://github.com/FasterXML/jackson-databind/issues/1490
>    The only way is to pass an actual class object into the constructor,
>    something like:
>
>    It could be an option, but not a pretty one. What do you think of my
>    approach to use vanilla java and canonical class name? (As described
>    previously)
>
>    Best,
>    Daniyar Yeralin
>
>    On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net
>    <mailto:dev@yeralin.net <de...@yeralin.net>>> wrote:
>
>    Hi John,
>
>    Thank you for your input! Yes, my idea looks a little bit over
>    engineered :)
>
>    I also wanted to see feedback from Matthias as well since he gave
>    me an idea about storing fixed/variable size entries.
>
>    Best,
>    Daniyar Yeralin
>
>    On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io
>    <mailto:john@confluent.io <jo...@confluent.io>>> wrote:
>
>    Hi Daniyar,
>
>    That's a very clever solution!
>
>    One observation is that, now, this is what we might call a
>    polymorphic
>    serde. That is, you're detecting the actual concrete type and then
>    promising to produce the exact same concrete type on read. There are
>    some inherent problems with this approach, which in general require
>    some kind of  schema registry (not necessarily Schema Registry, just
>    any registry for schemas) to solve.
>
>    Notice that every serialized record has quite a bit of duplicated
>    information: the concrete type as well as a byte to indicate whether
>    the value type is a fixed size, and, if so, an integer to
>    indicate the
>    actual size. These constitute a schema, of sorts, because they
>    tell us
>    later how exactly to deserialize the data. Unfortunately, this
>    information is completely redundant. In all likelihood, the
>    information will be exactly the same for every record in the topic.
>    This problem is essentially the core motivation for serializations
>    like Avro: to move the schema outside of the serialization itself, so
>    that the records won't contain so much redundant information.
>
>    In this light, I'm wondering if it makes sense to go back to
>    something
>    like what you had earlier in which you don't support perfectly
>    preserving the concrete type for _this_ serde, but instead just
>    support deserializing to _some_ List. Then, you could defer full,
>    perfect, type preservation to serdes that have an external system in
>    which to register their type information.
>
>    There does exist an alternative, if we really do want to preserve the
>    concrete type (which does seem kind of nice). You can add a
>    configuration option specifically for the serde to configure what the
>    list type will be, and maybe what the element type is, as well.
>
>    As far as "related work" goes, you might be interested to take a look
>    at how Jackson can be configured to deserialize into a specific,
>    arbitrarily nested, generically parameterized class structure.
>    Specifically, you might find
>
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>    interesting.
>
>    Thanks,
>    -John
>
>    On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net
>    <mailto:dev@yeralin.net <de...@yeralin.net>>> wrote:
>
>
>    bump
>
>
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hey John and Matthias,

Yes, now I see it all. I’m storing lots of redundant information.
Here is my final idea. Yes, now a user should pass a list type. I realized that the type is not really needed in ListSerializer, but only in ListDeserializer:


In ListSerializer, we will store entry sizes only if the inner serializer is not a primitive serializer:


Then, in the deserializer, we keep the passed list type, so that during deserialization we can create an instance of it with a predefined listSize for better performance.
We also try to look up a primitiveSize based on the passed deserializer. If there is none, primitiveSize will be null, which means that each entry’s size was encoded individually.
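
The byte layout described here could be sketched roughly as follows. This is only an illustration under assumptions, not the code from the PR: the class name ListLayoutSketch, the method signatures, and the use of plain functions in place of Kafka serializers are all hypothetical. A non-null fixedSize stands in for primitiveSize; null means each entry carries its own size prefix.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the wire layout discussed in this thread; not the PR code.
class ListLayoutSketch {

    // fixedSize plays the role of primitiveSize: null means entries are
    // variable-length, so each entry is prefixed with its own 4-byte size.
    static <T> byte[] serialize(List<T> list, Function<T, byte[]> inner, Integer fixedSize) {
        List<byte[]> entries = new ArrayList<>(list.size());
        int total = Integer.BYTES; // 4-byte header holding the list size
        for (T item : list) {
            byte[] bytes = inner.apply(item);
            entries.add(bytes);
            total += bytes.length + (fixedSize == null ? Integer.BYTES : 0);
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        buffer.putInt(list.size());
        for (byte[] bytes : entries) {
            if (fixedSize == null) {
                buffer.putInt(bytes.length); // per-entry size, variable-length only
            }
            buffer.put(bytes);
        }
        return buffer.array();
    }

    static <T> List<T> deserialize(byte[] data, Function<byte[], T> inner, Integer fixedSize) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int listSize = buffer.getInt();
        List<T> result = new ArrayList<>(listSize); // pre-sized from the header
        for (int i = 0; i < listSize; i++) {
            int entrySize = fixedSize != null ? fixedSize : buffer.getInt();
            byte[] entry = new byte[entrySize];
            buffer.get(entry);
            result.add(inner.apply(entry));
        }
        return result;
    }

    public static void main(String[] args) {
        // Fixed-size entries: 4-byte header + 3 * 4 bytes, no per-entry sizes.
        List<Integer> ints = Arrays.asList(1, 2, 3);
        byte[] fixed = serialize(ints, i -> ByteBuffer.allocate(Integer.BYTES).putInt(i).array(), Integer.BYTES);
        List<Integer> intsBack = deserialize(fixed, b -> ByteBuffer.wrap(b).getInt(), Integer.BYTES);
        System.out.println(fixed.length + " bytes, " + intsBack); // 16 bytes, [1, 2, 3]

        // Variable-size entries: 4-byte header + (4 + 1) + (4 + 2) bytes.
        List<String> words = Arrays.asList("a", "bb");
        byte[] variable = serialize(words, s -> s.getBytes(StandardCharsets.UTF_8), null);
        List<String> wordsBack = deserialize(variable, b -> new String(b, StandardCharsets.UTF_8), null);
        System.out.println(variable.length + " bytes, " + wordsBack); // 15 bytes, [a, bb]
    }
}
```

The sketch always builds an ArrayList on read to stay short; choosing the concrete list type is a separate concern and is left out here.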


This looks much cleaner and more concise.

What do you think?

Best,
Daniyar Yeralin 

> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <ma...@confluent.io> wrote:
> 
> For encoding the list-type: I see John's point about re-encoding the
> list-type redundantly. However, I also don't like the idea that the
> Deserializer returns a fixed type...
> 
> Maybe it's best to allow users to specify the target list type on
> deserialization via config?
> 
> Similar for the primitive types: I don't think we need to encode the
> type size, but users could specify the type on the deserializer (via a
> config again)?
> 
> 
> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
> support this and a cast will be necessary at some point in the user code.
> 
> 
> 
> -Matthias
> 
> 
> 
> On 6/20/19 1:21 PM, John Roesler wrote:
>> Hey Daniyar,
>> 
>> Thanks for looking at it!
>> 
>> Something like your screenshot is more along the lines of what I was
>> thinking. Sorry, but I didn't follow what you mean, how would that not
>> be "vanilla java"?
>> 
>> Unfortunately the deserializer needs more information, though. For
>> example, what if the inner type is a Map<String,String>? The serde could
>> only be used to produce a LinkedList<Map>, thus, we'd still need an
>> inner serde, like you have in the KIP (Serde<T> innerSerde).
>> 
>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>>   /**list type**/ LinkedList.class,
>>   /**inner serde**/ new MyRecordSerde()
>> )
>> 
>> And in configuration, it's something like:
>> default.key.serde: org...ListSerde
>> default.key.list.serde.type: java.util.LinkedList
>> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>> 
>> 
>> What do you think?
>> Thanks,
>> -John
>> 
>> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net
>> <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>> 
>>    Hey John,
>> 
>>    I read about TypeReference. It could work for the list serde.
>>    However, it is not directly
>>    supported: https://github.com/FasterXML/jackson-databind/issues/1490 <https://github.com/FasterXML/jackson-databind/issues/1490>
>>    The only way is to pass an actual class object into the constructor,
>>    something like:
>> 
>>    It could be an option, but not a pretty one. What do you think of my
>>    approach to use vanilla java and canonical class name? (As described
>>    previously)
>> 
>>    Best,
>>    Daniyar Yeralin
>> 
>>>    On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net <ma...@yeralin.net>
>>>    <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>> 
>>>    Hi John,
>>> 
>>>    Thank you for your input! Yes, my idea looks a little bit over
>>>    engineered :)
>>> 
>>>    I also wanted to see feedback from Matthias as well since he gave
>>>    me an idea about storing fixed/variable size entries.
>>> 
>>>    Best,
>>>    Daniyar Yeralin
>>> 
>>>>    On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io <ma...@confluent.io>
>>>>    <mailto:john@confluent.io <ma...@confluent.io>>> wrote:
>>>> 
>>>>    Hi Daniyar,
>>>> 
>>>>    That's a very clever solution!
>>>> 
>>>>    One observation is that, now, this is what we might call a
>>>>    polymorphic
>>>>    serde. That is, you're detecting the actual concrete type and then
>>>>    promising to produce the exact same concrete type on read. There are
>>>>    some inherent problems with this approach, which in general require
>>>>    some kind of  schema registry (not necessarily Schema Registry, just
>>>>    any registry for schemas) to solve.
>>>> 
>>>>    Notice that every serialized record has quite a bit of duplicated
>>>>    information: the concrete type as well as a byte to indicate whether
>>>>    the value type is a fixed size, and, if so, an integer to
>>>>    indicate the
>>>>    actual size. These constitute a schema, of sorts, because they
>>>>    tell us
>>>>    later how exactly to deserialize the data. Unfortunately, this
>>>>    information is completely redundant. In all likelihood, the
>>>>    information will be exactly the same for every record in the topic.
>>>>    This problem is essentially the core motivation for serializations
>>>>    like Avro: to move the schema outside of the serialization itself, so
>>>>    that the records won't contain so much redundant information.
>>>> 
>>>>    In this light, I'm wondering if it makes sense to go back to
>>>>    something
>>>>    like what you had earlier in which you don't support perfectly
>>>>    preserving the concrete type for _this_ serde, but instead just
>>>>    support deserializing to _some_ List. Then, you could defer full,
>>>>    perfect, type preservation to serdes that have an external system in
>>>>    which to register their type information.
>>>> 
>>>>    There does exist an alternative, if we really do want to preserve the
>>>>    concrete type (which does seem kind of nice). You can add a
>>>>    configuration option specifically for the serde to configure what the
>>>>    list type will be, and maybe what the element type is, as well.
>>>> 
>>>>    As far as "related work" goes, you might be interested to take a look
>>>>    at how Jackson can be configured to deserialize into a specific,
>>>>    arbitrarily nested, generically parameterized class structure.
>>>>    Specifically, you might find
>>>>    https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
>>>>    interesting.
>>>> 
>>>>    Thanks,
>>>>    -John
>>>> 
>>>>    On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net <ma...@yeralin.net>
>>>>    <mailto:dev@yeralin.net <ma...@yeralin.net>>> wrote:
>>>>> 
>>>>>    bump


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
For encoding the list-type: I see John's point about re-encoding the
list-type redundantly. However, I also don't like the idea that the
Deserializer returns a fixed type...

Maybe it's best to allow users to specify the target list type on
deserialization via config?

Similar for the primitive types: I don't think we need to encode the
type size, but users could specify the type on the deserializer (via a
config again)?
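
To make the config idea concrete, here is a minimal sketch of resolving the target list type from a config entry on the deserializer side. Everything in it is an assumption for illustration: the class name ListTypeConfigSketch, the helper newListFromConfig, the ArrayList fallback, and the config key (borrowed from the naming floated elsewhere in this thread, not a released Kafka config).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pick the list implementation from configuration.
class ListTypeConfigSketch {

    // Assumed key name, taken from the proposal in this thread.
    static final String LIST_TYPE_CONFIG = "default.key.list.serde.type";

    static <T> List<T> newListFromConfig(Map<String, ?> configs) {
        Object value = configs.get(LIST_TYPE_CONFIG);
        if (value == null) {
            return new ArrayList<>(); // assumed fallback when nothing is configured
        }
        try {
            // Accept either a Class object or a fully qualified class name.
            Class<?> clazz = value instanceof Class ? (Class<?>) value : Class.forName(value.toString());
            if (!List.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(clazz.getName() + " is not a java.util.List");
            }
            // The element type cannot be checked at runtime, hence the unchecked cast.
            @SuppressWarnings("unchecked")
            List<T> list = (List<T>) clazz.getDeclaredConstructor().newInstance();
            return list;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate list type " + value, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        configs.put(LIST_TYPE_CONFIG, "java.util.LinkedList");
        List<String> list = newListFromConfig(configs);
        list.add("a");
        System.out.println(list.getClass().getName()); // java.util.LinkedList
    }
}
```

With this shape nothing about the list type has to be written into each record; the trade-off is that the list class needs an accessible no-argument constructor.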


About generics: nesting could be arbitrarily deep. Hence, I doubt we can
support this and a cast will be necessary at some point in the user code.



-Matthias



On 6/20/19 1:21 PM, John Roesler wrote:
> Hey Daniyar,
> 
> Thanks for looking at it!
> 
> Something like your screenshot is more along the lines of what I was
> thinking. Sorry, but I didn't follow what you mean, how would that not
> be "vanilla java"?
> 
> Unfortunately the deserializer needs more information, though. For
> example, what if the inner type is a Map<String,String>? The serde could
> only be used to produce a LinkedList<Map>, thus, we'd still need an
> inner serde, like you have in the KIP (Serde<T> innerSerde).
> 
> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>   /**list type**/ LinkedList.class,
>   /**inner serde**/ new MyRecordSerde()
> )
> 
> And in configuration, it's something like:
> default.key.serde: org...ListSerde
> default.key.list.serde.type: java.util.LinkedList
> default.key.list.serde.inner: com.mycompany.MyRecordSerde
> 
> 
> What do you think?
> Thanks,
> -John
> 
> On Thu, Jun 20, 2019 at 2:46 PM Development <dev@yeralin.net
> <ma...@yeralin.net>> wrote:
> 
>     Hey John,
> 
>     I read about TypeReference. It could work for the list serde.
>     However, it is not directly
>     supported: https://github.com/FasterXML/jackson-databind/issues/1490
>     The only way is to pass an actual class object into the constructor,
>     something like:
> 
>     It could be an option, but not a pretty one. What do you think of my
>     approach to use vanilla java and canonical class name? (As described
>     previously)
> 
>     Best,
>     Daniyar Yeralin
> 
>>     On Jun 20, 2019, at 2:45 PM, Development <dev@yeralin.net
>>     <ma...@yeralin.net>> wrote:
>>
>>     Hi John,
>>
>>     Thank you for your input! Yes, my idea looks a little bit over
>>     engineered :)
>>
>>     I also wanted to see feedback from Matthias as well since he gave
>>     me an idea about storing fixed/variable size entries.
>>
>>     Best,
>>     Daniyar Yeralin
>>
>>>     On Jun 18, 2019, at 6:06 PM, John Roesler <john@confluent.io
>>>     <ma...@confluent.io>> wrote:
>>>
>>>     Hi Daniyar,
>>>
>>>     That's a very clever solution!
>>>
>>>     One observation is that, now, this is what we might call a
>>>     polymorphic
>>>     serde. That is, you're detecting the actual concrete type and then
>>>     promising to produce the exact same concrete type on read. There are
>>>     some inherent problems with this approach, which in general require
>>>     some kind of  schema registry (not necessarily Schema Registry, just
>>>     any registry for schemas) to solve.
>>>
>>>     Notice that every serialized record has quite a bit of duplicated
>>>     information: the concrete type as well as a byte to indicate whether
>>>     the value type is a fixed size, and, if so, an integer to
>>>     indicate the
>>>     actual size. These constitute a schema, of sorts, because they
>>>     tell us
>>>     later how exactly to deserialize the data. Unfortunately, this
>>>     information is completely redundant. In all likelihood, the
>>>     information will be exactly the same for every record in the topic.
>>>     This problem is essentially the core motivation for serializations
>>>     like Avro: to move the schema outside of the serialization itself, so
>>>     that the records won't contain so much redundant information.
>>>
>>>     In this light, I'm wondering if it makes sense to go back to
>>>     something
>>>     like what you had earlier in which you don't support perfectly
>>>     preserving the concrete type for _this_ serde, but instead just
>>>     support deserializing to _some_ List. Then, you could defer full,
>>>     perfect, type preservation to serdes that have an external system in
>>>     which to register their type information.
>>>
>>>     There does exist an alternative, if we really do want to preserve the
>>>     concrete type (which does seem kind of nice). You can add a
>>>     configuration option specifically for the serde to configure what the
>>>     list type will be, and maybe what the element type is, as well.
>>>
>>>     As far as "related work" goes, you might be interested to take a look
>>>     at how Jackson can be configured to deserialize into a specific,
>>>     arbitrarily nested, generically parameterized class structure.
>>>     Specifically, you might find
>>>     https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>>     interesting.
>>>
>>>     Thanks,
>>>     -John
>>>
>>>     On Mon, Jun 17, 2019 at 12:38 PM Development <dev@yeralin.net
>>>     <ma...@yeralin.net>> wrote:
>>>>
>>>>     bump
>>
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Hey Daniyar,

Thanks for looking at it!

Something like your screenshot is more along the lines of what I was
thinking. Sorry, but I didn't follow what you mean, how would that not be
"vanilla java"?

Unfortunately, the deserializer needs more information, though. For example,
what if the inner type is a Map<String,String>? The serde could only be
used to produce a LinkedList<Map>; thus, we'd still need an inner serde,
like you have in the KIP (Serde<T> innerSerde).

Something more like:

Serde<LinkedList<MyRecord>> serde = Serdes.listSerde(
  /* list type */ LinkedList.class,
  /* inner serde */ new MyRecordSerde()
);

And in configuration, it's something like:
default.key.serde: org...ListSerde
default.key.list.serde.type: java.util.LinkedList
default.key.list.serde.inner: com.mycompany.MyRecordSerde
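
To make the configuration idea concrete, here is a minimal sketch, not the KIP-466 API. It assumes a plain size-prefixed wire format, uses java.util.function.Function as a stand-in for the inner (de)serializer, and takes the list class as a name, the way a config value would supply it. ListCodecSketch and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; the names here are not part of Kafka or KIP-466.
final class ListCodecSketch {

    // Writes the entry count, then each entry as (length, bytes).
    static <T> byte[] serialize(List<T> data, Function<T, byte[]> innerSerializer) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeInt(data.size());
            for (T entry : data) {
                byte[] bytes = innerSerializer.apply(entry);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // The list type comes from configuration (a class name), not from the bytes.
    @SuppressWarnings("unchecked")
    static <T> List<T> deserialize(byte[] data, String listClassName,
                                   Function<byte[], T> innerDeserializer) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            // Requires a public no-argument constructor on the configured list class.
            List<T> list = (List<T>) Class.forName(listClassName)
                    .asSubclass(List.class)
                    .getDeclaredConstructor()
                    .newInstance();
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload);
                list.add(innerDeserializer.apply(payload));
            }
            return list;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + listClassName, e);
        }
    }
}
```

With the class name set to java.util.LinkedList, deserialize returns a LinkedList regardless of which list type was serialized; the type lives in configuration rather than in every record.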


What do you think?
Thanks,
-John

On Thu, Jun 20, 2019 at 2:46 PM Development <de...@yeralin.net> wrote:

> Hey John,
>
> I read up on TypeReference. It could work for the list serde.
> However, it is not directly supported:
> https://github.com/FasterXML/jackson-databind/issues/1490
> The only way is to pass an actual class object into the constructor,
> something like:
>
> It could be an option, but not a pretty one. What do you think of my
> approach to use vanilla java and canonical class name? (As described
> previously)
>
> Best,
> Daniyar Yeralin
>
> On Jun 20, 2019, at 2:45 PM, Development <de...@yeralin.net> wrote:
>
> Hi John,
>
> Thank you for your input! Yes, my idea looks a little bit over engineered
> :)
>
> I also wanted to get feedback from Matthias as well, since he gave me the
> idea about storing fixed/variable size entries.
>
> Best,
> Daniyar Yeralin
>
> On Jun 18, 2019, at 6:06 PM, John Roesler <jo...@confluent.io> wrote:
>
> Hi Daniyar,
>
> That's a very clever solution!
>
> One observation is that, now, this is what we might call a polymorphic
> serde. That is, you're detecting the actual concrete type and then
> promising to produce the exact same concrete type on read. There are
> some inherent problems with this approach, which in general require
> some kind of  schema registry (not necessarily Schema Registry, just
> any registry for schemas) to solve.
>
> Notice that every serialized record has quite a bit of duplicated
> information: the concrete type as well as a byte to indicate whether
> the value type is a fixed size, and, if so, an integer to indicate the
> actual size. These constitute a schema, of sorts, because they tell us
> later how exactly to deserialize the data. Unfortunately, this
> information is completely redundant. In all likelihood, the
> information will be exactly the same for every record in the topic.
> This problem is essentially the core motivation for serializations
> like Avro: to move the schema outside of the serialization itself, so
> that the records won't contain so much redundant information.
>
> In this light, I'm wondering if it makes sense to go back to something
> like what you had earlier in which you don't support perfectly
> preserving the concrete type for _this_ serde, but instead just
> support deserializing to _some_ List. Then, you could defer full,
> perfect, type preservation to serdes that have an external system in
> which to register their type information.
>
> There does exist an alternative, if we really do want to preserve the
> concrete type (which does seem kind of nice). You can add a
> configuration option specifically for the serde to configure what the
> list type will be, and maybe what the element type is, as well.
>
> As far as "related work" goes, you might be interested to take a look
> at how Jackson can be configured to deserialize into a specific,
> arbitrarily nested, generically parameterized class structure.
> Specifically, you might find
>
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> interesting.
>
> Thanks,
> -John
>
> On Mon, Jun 17, 2019 at 12:38 PM Development <de...@yeralin.net> wrote:
>
>
> bump
>
>
>
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hey John,

I read up on TypeReference. It could work for the list serde. However, it is not directly supported: https://github.com/FasterXML/jackson-databind/issues/1490
The only way is to pass an actual class object into the constructor, something like:

[screenshot not preserved in this plain-text archive]

It could be an option, but not a pretty one. What do you think of my approach of using vanilla Java and the canonical class name? (As described previously.)

Best,
Daniyar Yeralin

> On Jun 20, 2019, at 2:45 PM, Development <de...@yeralin.net> wrote:
> 
> Hi John,
> 
> Thank you for your input! Yes, my idea looks a little bit over engineered :)
> 
> I also wanted to get feedback from Matthias as well, since he gave me the idea about storing fixed/variable size entries.
> 
> Best,
> Daniyar Yeralin
> 
>> On Jun 18, 2019, at 6:06 PM, John Roesler <jo...@confluent.io> wrote:
>> 
>> Hi Daniyar,
>> 
>> That's a very clever solution!
>> 
>> One observation is that, now, this is what we might call a polymorphic
>> serde. That is, you're detecting the actual concrete type and then
>> promising to produce the exact same concrete type on read. There are
>> some inherent problems with this approach, which in general require
>> some kind of  schema registry (not necessarily Schema Registry, just
>> any registry for schemas) to solve.
>> 
>> Notice that every serialized record has quite a bit of duplicated
>> information: the concrete type as well as a byte to indicate whether
>> the value type is a fixed size, and, if so, an integer to indicate the
>> actual size. These constitute a schema, of sorts, because they tell us
>> later how exactly to deserialize the data. Unfortunately, this
>> information is completely redundant. In all likelihood, the
>> information will be exactly the same for every record in the topic.
>> This problem is essentially the core motivation for serializations
>> like Avro: to move the schema outside of the serialization itself, so
>> that the records won't contain so much redundant information.
>> 
>> In this light, I'm wondering if it makes sense to go back to something
>> like what you had earlier in which you don't support perfectly
>> preserving the concrete type for _this_ serde, but instead just
>> support deserializing to _some_ List. Then, you could defer full,
>> perfect, type preservation to serdes that have an external system in
>> which to register their type information.
>> 
>> There does exist an alternative, if we really do want to preserve the
>> concrete type (which does seem kind of nice). You can add a
>> configuration option specifically for the serde to configure what the
>> list type will be, and maybe what the element type is, as well.
>> 
>> As far as "related work" goes, you might be interested to take a look
>> at how Jackson can be configured to deserialize into a specific,
>> arbitrarily nested, generically parameterized class structure.
>> Specifically, you might find
>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>> interesting.
>> 
>> Thanks,
>> -John
>> 
>> On Mon, Jun 17, 2019 at 12:38 PM Development <de...@yeralin.net> wrote:
>>> 
>>> bump
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Also, reposting my reply to:
- did we consider to make the return type (i.e., ArrayList vs.
LinkedList) configurable or encode it in the serialized bytes?

Since the formatting got removed, I’ll repost it as screenshots.

FYI, I’m researching Jackson’s TypeReference and how it can help in this case.

In my ListSerializer.java I have the following:

[screenshot not preserved in this plain-text archive]

I’m essentially just encoding the list’s canonical class name into the byte array.

Then in ListDeserializer.java:

[screenshot not preserved in this plain-text archive]

Here, instead, I’m extracting the class name, then trying to load the class and create an instance of it.

It works just fine, except for one edge case. If you pass the result of Arrays.asList(...), which looks like an ArrayList instance, to the (de)serializer, it will throw java.lang.ClassNotFoundException: java.util.Arrays.ArrayList, because the deserializer is expecting java.util.ArrayList.
The problem is described here: https://stackoverflow.com/questions/28851652/java-lang-classcastexception-java-util-arraysarraylist-cannot-be-cast-to-java
Arrays.asList() produces an instance of a List implementation (java.util.Arrays$ArrayList) that is not java.util.ArrayList.
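
The mismatch can be reproduced with the JDK alone. A small sketch, assuming the serializer records getCanonicalName() as described above: for a nested class the canonical name uses a dot, while Class.forName needs the binary name with a dollar sign. The class ListClassNames and the ArrayList fallback are illustrative assumptions, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; illustrates canonical vs. binary class names.
final class ListClassNames {

    // For Arrays.asList(...) this is "java.util.Arrays.ArrayList".
    static String canonicalName(List<?> list) {
        return list.getClass().getCanonicalName();
    }

    // For Arrays.asList(...) this is "java.util.Arrays$ArrayList".
    static String binaryName(List<?> list) {
        return list.getClass().getName();
    }

    // True if Class.forName can load the given name.
    static boolean resolvable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Assumed fallback: use the named class if it is a List with a public
    // no-argument constructor, otherwise fall back to ArrayList.
    static <T> List<T> newListOrDefault(String className) {
        try {
            Object instance = Class.forName(className).getConstructor().newInstance();
            @SuppressWarnings("unchecked")
            List<T> list = (List<T>) instance;
            return list;
        } catch (ReflectiveOperationException | ClassCastException e) {
            return new ArrayList<>();
        }
    }
}
```

Even with the binary name, java.util.Arrays$ArrayList has no public no-argument (or int-capacity) constructor, so a serde that encodes the concrete class would still need some fallback for this case.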

Best,
Daniyar Yeralin

> On Jun 20, 2019, at 2:45 PM, Development <de...@yeralin.net> wrote:
> 
> Hi John,
> 
> Thank you for your input! Yes, my idea looks a little bit over engineered :)
> 
> I also wanted to get feedback from Matthias as well, since he gave me the idea about storing fixed/variable size entries.
> 
> Best,
> Daniyar Yeralin
> 
>> On Jun 18, 2019, at 6:06 PM, John Roesler <jo...@confluent.io> wrote:
>> 
>> Hi Daniyar,
>> 
>> That's a very clever solution!
>> 
>> One observation is that, now, this is what we might call a polymorphic
>> serde. That is, you're detecting the actual concrete type and then
>> promising to produce the exact same concrete type on read. There are
>> some inherent problems with this approach, which in general require
>> some kind of  schema registry (not necessarily Schema Registry, just
>> any registry for schemas) to solve.
>> 
>> Notice that every serialized record has quite a bit of duplicated
>> information: the concrete type as well as a byte to indicate whether
>> the value type is a fixed size, and, if so, an integer to indicate the
>> actual size. These constitute a schema, of sorts, because they tell us
>> later how exactly to deserialize the data. Unfortunately, this
>> information is completely redundant. In all likelihood, the
>> information will be exactly the same for every record in the topic.
>> This problem is essentially the core motivation for serializations
>> like Avro: to move the schema outside of the serialization itself, so
>> that the records won't contain so much redundant information.
>> 
>> In this light, I'm wondering if it makes sense to go back to something
>> like what you had earlier in which you don't support perfectly
>> preserving the concrete type for _this_ serde, but instead just
>> support deserializing to _some_ List. Then, you could defer full,
>> perfect, type preservation to serdes that have an external system in
>> which to register their type information.
>> 
>> There does exist an alternative, if we really do want to preserve the
>> concrete type (which does seem kind of nice). You can add a
>> configuration option specifically for the serde to configure what the
>> list type will be, and maybe what the element type is, as well.
>> 
>> As far as "related work" goes, you might be interested to take a look
>> at how Jackson can be configured to deserialize into a specific,
>> arbitrarily nested, generically parameterized class structure.
>> Specifically, you might find
>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>> interesting.
>> 
>> Thanks,
>> -John
>> 
>> On Mon, Jun 17, 2019 at 12:38 PM Development <de...@yeralin.net> wrote:
>>> 
>>> bump
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi John,

Thank you for your input! Yes, my idea looks a little bit over-engineered :)

I also wanted to get feedback from Matthias as well, since he gave me the idea about storing fixed/variable size entries.

Best,
Daniyar Yeralin

> On Jun 18, 2019, at 6:06 PM, John Roesler <jo...@confluent.io> wrote:
> 
> Hi Daniyar,
> 
> That's a very clever solution!
> 
> One observation is that, now, this is what we might call a polymorphic
> serde. That is, you're detecting the actual concrete type and then
> promising to produce the exact same concrete type on read. There are
> some inherent problems with this approach, which in general require
> some kind of  schema registry (not necessarily Schema Registry, just
> any registry for schemas) to solve.
> 
> Notice that every serialized record has quite a bit of duplicated
> information: the concrete type as well as a byte to indicate whether
> the value type is a fixed size, and, if so, an integer to indicate the
> actual size. These constitute a schema, of sorts, because they tell us
> later how exactly to deserialize the data. Unfortunately, this
> information is completely redundant. In all likelihood, the
> information will be exactly the same for every record in the topic.
> This problem is essentially the core motivation for serializations
> like Avro: to move the schema outside of the serialization itself, so
> that the records won't contain so much redundant information.
> 
> In this light, I'm wondering if it makes sense to go back to something
> like what you had earlier in which you don't support perfectly
> preserving the concrete type for _this_ serde, but instead just
> support deserializing to _some_ List. Then, you could defer full,
> perfect, type preservation to serdes that have an external system in
> which to register their type information.
> 
> There does exist an alternative, if we really do want to preserve the
> concrete type (which does seem kind of nice). You can add a
> configuration option specifically for the serde to configure what the
> list type will be, and maybe what the element type is, as well.
> 
> As far as "related work" goes, you might be interested to take a look
> at how Jackson can be configured to deserialize into a specific,
> arbitrarily nested, generically parameterized class structure.
> Specifically, you might find
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> interesting.
> 
> Thanks,
> -John
> 
> On Mon, Jun 17, 2019 at 12:38 PM Development <de...@yeralin.net> wrote:
>> 
>> bump


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Hi Daniyar,

That's a very clever solution!

One observation is that, now, this is what we might call a polymorphic
serde. That is, you're detecting the actual concrete type and then
promising to produce the exact same concrete type on read. There are
some inherent problems with this approach, which in general require
some kind of schema registry (not necessarily Schema Registry, just
any registry for schemas) to solve.

Notice that every serialized record has quite a bit of duplicated
information: the concrete type as well as a byte to indicate whether
the value type is a fixed size, and, if so, an integer to indicate the
actual size. These constitute a schema, of sorts, because they tell us
later how exactly to deserialize the data. Unfortunately, this
information is completely redundant. In all likelihood, the
information will be exactly the same for every record in the topic.
This problem is essentially the core motivation for serializations
like Avro: to move the schema outside of the serialization itself, so
that the records won't contain so much redundant information.
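
A rough size check of that redundancy, assuming the header is laid out as the class name via writeUTF, one flag byte, and one int for the entry size. The exact layout in the PR may differ, and HeaderOverhead is a hypothetical helper.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper; measures the assumed per-record header.
final class HeaderOverhead {

    // Bytes spent on: class name (writeUTF) + fixed-size flag + entry size.
    static int headerBytes(String listClassName) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeUTF(listClassName); // 2-byte length prefix + encoded characters
            out.writeByte(1);            // flag: entries are fixed size
            out.writeInt(Long.BYTES);    // size of each entry
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.size();
    }
}
```

For java.util.ArrayList this comes to 26 bytes per record (2-byte length prefix + 19 characters + 1 + 4), which already exceeds the 24-byte payload of a list of three longs.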

In this light, I'm wondering if it makes sense to go back to something
like what you had earlier in which you don't support perfectly
preserving the concrete type for _this_ serde, but instead just
support deserializing to _some_ List. Then, you could defer full,
perfect, type preservation to serdes that have an external system in
which to register their type information.

There does exist an alternative, if we really do want to preserve the
concrete type (which does seem kind of nice). You can add a
configuration option specifically for the serde to configure what the
list type will be, and maybe what the element type is, as well.

As far as "related work" goes, you might be interested to take a look
at how Jackson can be configured to deserialize into a specific,
arbitrarily nested, generically parameterized class structure.
Specifically, you might find
https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
interesting.

Thanks,
-John

On Mon, Jun 17, 2019 at 12:38 PM Development <de...@yeralin.net> wrote:
>
> bump

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
bump

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi Matthias,

Here is what I have for (2), fixed/variable-size encoding:

[pic. 1: screenshot not preserved in this plain-text archive]

Here is what I’m doing:
1. Identify whether the generic list entry type T is a primitive.
2. If it is, execute the fixedEntrySizeSerialization method:

[screenshot not preserved in this plain-text archive]

Here, I’m getting the size of a known primitive from my generated Map<PrimitiveClass, PrimitiveSize> (see pic. 1).
Then I write the first byte, either 0 or 1 (where ListSerde.FIXED_SIZE_TYPE == 1), identifying whether the payload is a fixed-size or variable-size list.
Essentially, my header will look like this:

[screenshot not preserved in this plain-text archive]

Or, if an entry is not a primitive:

[screenshot not preserved in this plain-text archive]

Then, in my ListDeserializer, we read the first byte and act accordingly:


I hope it all makes sense.
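
As a rough sketch of the layout described above (not the code from the pictures): it assumes the flag byte comes first, followed by the entry size for fixed-size types, then the entry count. Only the FIXED_SIZE_TYPE == 1 value comes from this email; the class name, the VARIABLE_SIZE_TYPE constant, and the method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a flag-prefixed list encoding.
final class FixedSizeListSketch {
    static final byte VARIABLE_SIZE_TYPE = 0; // assumed counterpart of the flag below
    static final byte FIXED_SIZE_TYPE = 1;

    // Fixed-size path: the entry size is written once, in the header.
    static byte[] serializeLongs(List<Long> data) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeByte(FIXED_SIZE_TYPE);
            out.writeInt(Long.BYTES);
            out.writeInt(data.size());
            for (long value : data) {
                out.writeLong(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Variable-size path: every entry carries its own length prefix.
    static byte[] serializeStrings(List<String> data) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeByte(VARIABLE_SIZE_TYPE);
            out.writeInt(data.size());
            for (String value : data) {
                byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Reads the flag first, then returns the raw bytes of each entry.
    static List<byte[]> readEntries(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            boolean fixed = in.readByte() == FIXED_SIZE_TYPE;
            int entrySize = fixed ? in.readInt() : -1;
            int count = in.readInt();
            List<byte[]> entries = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] payload = new byte[fixed ? entrySize : in.readInt()];
                in.readFully(payload);
                entries.add(payload);
            }
            return entries;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under this layout a list of three longs takes 33 bytes (1 + 4 + 4 + 24); with a length prefix on every entry it would take 41 (1 + 4 + 3 x 12).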

This is not a final implementation, just a sketch of my idea.
Feel free to ask questions or request modifications.

Best,
Daniyar Yeralin

> On Jun 12, 2019, at 10:29 AM, Development <de...@yeralin.net> wrote:
> 
> Hmm the formatting got removed unfortunately. I’m sorry, it got harder to read my email.
> 
>> On Jun 12, 2019, at 10:27 AM, Development <de...@yeralin.net> wrote:
>> 
>> Hi Matthias,
>> 
>> Indeed, you are right. I missed your email, I had a problem with my mail server, so I guess I didn’t receive it.
>> 
>> 1) Here is what I came up with
>> In my ListSerializer.java I have the following:
>> try (final DataOutputStream out = new DataOutputStream(baos)) {
>>   // I’m encoding the class name at the top of the byte array
>>   out.writeUTF(data.getClass().getCanonicalName());
>>   out.writeInt(size);
>>   for (T entry : data) {
>>       final byte[] bytes = serializer.serialize(topic, entry);
>>       out.writeInt(bytes.length);
>>       out.write(bytes);
>>   }
>>   return baos.toByteArray();
>> }
>> Then in ListDeserializer.java:
>> try (final DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
>>   final String listClassName = dis.readUTF();
>>   final int size = dis.readInt();
>>   List<T> deserializedList = getListInstance(listClassName, size);
>>   for (int i = 0; i < size; i++) {
>>       byte[] payload = new byte[dis.readInt()];
>>       dis.readFully(payload); // read(byte[]) is not guaranteed to fill the array
>>       deserializedList.add(deserializer.deserialize(topic, payload));
>>   }
>>   return deserializedList;
>> }
>> private List<T> getListInstance(String listClassName, int listSize) {
>>   try {
>>       Class<?> listClass = Class.forName(listClassName);
>>       Constructor<?> listConstructor = listClass.getConstructor(Integer.TYPE);
>>       return (List<T>) listConstructor.newInstance(listSize);
>>   } catch (Exception e) {
>>       throw new RuntimeException("Could not construct a list instance of \"" + listClassName + "\"", e);
>>   }
>> }
>> It works just fine, except for one edge case. If you try to pass Arrays.asList(...) to the (de)serializer which is ArrayList instance, it will throw: java.lang.ClassNotFoundException: java.util.Arrays.ArrayList
>> Because it is expecting java.util.ArrayList.
>> The problem is described here: https://stackoverflow.com/questions/28851652/java-lang-classcastexception-java-util-arraysarraylist-cannot-be-cast-to-java
>> Arrays.asList() produces an instance of a List implementation (java.util.Arrays$ArrayList) that is not java.util.ArrayList
>> 
>> What do you think of this? Based on your experience, what is the best approach in this case? Maybe we modify the String? Are there any other cases where a similar problem happens?
>> 
>> Still working on (2)
>> 
>> Thank you Matthias!
>> 
>> Best,
>> Daniyar Yeralin
>> 
>>> On Jun 11, 2019, at 6:05 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>>> 
>>> Seems you missed my reply from 31/6. C&P below:
>>> 
>>>> (1) The current PR suggests to always instantiate an `ArrayList` --
>>>> however, if a user wants to use any other list implementation, they have
>>>> no way to specify this. It might be good to either allow users to
>>>> specify the list-type on the deserializer, or encode the list type
>>>> directly in the bytes, and hence, whatever type the serialized list was,
>>>> the same type will be used on deserialization (might only work for Java
>>>> built-in list types).
>>>> 
>>>> Personally, I think it's better/more flexible to specify the list-type
>>>> on the deserializer, as it also allows plugging in any custom list types.
>>>> 
>>>> This could of course be opt-in and for the case users don't care, we
>>>> just default to `ArrayList`.
>>>> 
>>>> 
>>>> (2) For Java built-in types, we could check the type via `instanceof` --
>>>> if the type is unknown, we fall back to per-element length encoding. As
>>>> an alternative, we could also add a constructor taking an `enum` with
>>>> two values `fixed-size` and `variable-size`, or a config instead of a
>>>> constructor element.
>>>> 
>>>> 
>>>> Just bouncing off ideas -- maybe there are good reasons (too
>>>> complicated?) to not support either of them.
>>>> 
>>> 
>>> 
>>> -Matthias
>>> 
>>> On 6/10/19 8:44 AM, Development wrote:
>>>> Bump
>>>> 
>>>>> On May 24, 2019, at 2:09 PM, Development <de...@yeralin.net> wrote:
>>>>> 
>>>>> Hey,
>>>>> 
>>>>> - did we consider to make the return type (i.e., ArrayList vs.
>>>>> LinkedList) configurable or encode it in the serialized bytes?
>>>>> 
>>>>> Not sure about this one. Could you elaborate?
>>>>> 
>>>>> - atm the size of each element is encoded individually; did we consider
>>>>> an optimization for fixed size elements (like Long) to avoid this overhead?
>>>>> 
>>>>> I cannot think of any clean way to do so. How would you see it?
>>>>> 
>>>>> Btw I resolved all your comments under PR
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>>> On May 24, 2019, at 12:01 AM, Matthias J. Sax <ma...@confluent.io> wrote:
>>>>>> 
>>>>>> Thanks for the KIP. I also had a look into the PR and have two follow up
>>>>>> question:
>>>>>> 
>>>>>> 
>>>>>> - did we consider to make the return type (i.e., ArrayList vs.
>>>>>> LinkedList) configurable or encode it in the serialized bytes?
>>>>>> 
>>>>>> - atm the size of each element is encoded individually; did we consider
>>>>>> an optimization for fixed size elements (like Long) to avoid this overhead?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -Matthias
>>>>>> 
>>>>>> On 5/15/19 6:05 PM, John Roesler wrote:
>>>>>>> Sounds good!
>>>>>>> 
>>>>>>> On Tue, May 14, 2019 at 9:21 AM Development <de...@yeralin.net> wrote:
>>>>>>>> 
>>>>>>>> Hey,
>>>>>>>> 
>>>>>>>> I think the proposal is finalized; no one raised any concerns. Shall we call for a [VOTE]?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Daniyar Yeralin
>>>>>>>> 
>>>>>>>>> On May 10, 2019, at 10:17 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>> 
>>>>>>>>> Good observation, Daniyar.
>>>>>>>>> 
>>>>>>>>> Maybe we should just not implement support for serdeFrom.
>>>>>>>>> 
>>>>>>>>> We can always add it later, but I think you're right, we need some
>>>>>>>>> kind of more sophisticated support, or at least a second argument for
>>>>>>>>> the inner class.
>>>>>>>>> 
>>>>>>>>> For now, it seems like most use cases would be satisfied without
>>>>>>>>> serdeFrom(...List...)
>>>>>>>>> 
>>>>>>>>> -John
>>>>>>>>> 
>>>>>>>>> On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I was trying to add some test cases for the list serde, and it led me to this class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on the method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`.
>>>>>>>>>> 
>>>>>>>>>> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
>>>>>>>>>> 
>>>>>>>>>> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
>>>>>>>>>> 
>>>>>>>>>> Any ideas?
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>> 
>>>>>>>>>>> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hey Sophie,
>>>>>>>>>>> 
>>>>>>>>>>> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
>>>>>>>>>>> 
>>>>>>>>>>> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Good point about serdes for other Collections. On the one hand I'd guess
>>>>>>>>>>>> that non-List Collections are probably relatively rare in practice (if
>>>>>>>>>>>> anyone disagrees please correct me!) but on the other hand, a) even if just
>>>>>>>>>>>> a small number of people benefit I think it's worth the extra effort and b)
>>>>>>>>>>>> if we do end up needing/wanting them in the future it would save us a KIP
>>>>>>>>>>>> to just add them now. Personally I feel it would make sense to expand the
>>>>>>>>>>>> scope of this KIP a bit to include all Collections as a logical unit, but
>>>>>>>>>>>> the ROI could be low..
>>>>>>>>>>>> 
>>>>>>>>>>>> (I know of at least one instance in the unit tests where a Set serde could
>>>>>>>>>>>> be useful, and there may be more)
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I don’t see any replies. Seems like this proposal can be finalized and
>>>>>>>>>>>>> called for a vote?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Also I’ve been thinking. Do we need more serdes for other Collections?
>>>>>>>>>>>>> Like queue or set for example
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> No worries about the procedural stuff. Prior experience with KIPs is
>>>>>>>>>>>>>> not required :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I was just trying to help you propose this stuff in a way that the
>>>>>>>>>>>>>> others will find easy to review.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for updating the KIP. Thanks to the others for helping out with
>>>>>>>>>>>>>> the syntax.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Given these updates, I'm curious if anyone else has feedback about
>>>>>>>>>>>>>> this proposal. Personally, I think it sounds fine!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
>>>>>>>>>>>>> snippet. I’ll update KIP again.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think you may want to tweak your syntax a little:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>>>>>>>>>>>>>> return new ListSerde<T>(innerSerde);
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Does that work?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <mailto:
>>>>>>>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I updated JIRA and KIP.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I didn’t know about the process, and created PR before I knew about
>>>>>>>>>>>>> KIPs :)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> As per static declaration, I don’t think Java allows that:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <mailto:
>>>>>>>>>>>>> john@confluent.io>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for that update. Do you mind making changes primarily on the
>>>>>>>>>>>>>>>>> KIP document ? (
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>> <
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> This is the design document that we have to agree on and vote for, the
>>>>>>>>>>>>>>>>> PR comes later. It can be nice to have an implementation to look at,
>>>>>>>>>>>>>>>>> but the KIP is the main artifact for this discussion.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> With this in mind, it will help get more reviewers to look at it if
>>>>>>>>>>>>>>>>> you can tidy up the KIP document so that it stands on its own. People
>>>>>>>>>>>>>>>>> shouldn't have to look at any other document to understand the
>>>>>>>>>>>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>>>>>>>>>>>>>>> see what the public API will look like. If it helps, you can take a
>>>>>>>>>>>>>>>>> look at some other recent KIPs.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>>>>>>>>>>>>>>> a zero-argument static factory method for it, but it seems you could
>>>>>>>>>>>>>>>>> still have a static method:
>>>>>>>>>>>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <mailto:
>>>>>>>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
>>>>>>>>>>>>> argument: https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thank you for your input John! I really appreciate it.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> What about this point I made:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 1. Since type for List serde needs to be declared before hand, I
>>>>>>>>>>>>> could not create a static method for List Serde under
>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like static public
>>>>>>>>>>>>> Serde<List<T>> List() {...} inorg.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>>> class cannot be added because type needs to be defined beforehand. That's
>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io
>>>>>>>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks for the reply Daniyar,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> That makes much more sense! I thought I must be missing something,
>>>>>>>>>>>>> but I
>>>>>>>>>>>>>>>>>>> couldn't for the life of me figure it out.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> What do you think about just taking an argument, instead of for a
>>>>>>>>>>>>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can
>>>>>>>>>>>>> control
>>>>>>>>>>>>>>>>>>> how exactly the inner data gets serialized, while also bounding the
>>>>>>>>>>>>> generic
>>>>>>>>>>>>>>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>>>>>>>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>>>>>>>>>>>>>>> strictly necessary to offer an option to sort the data during
>>>>>>>>>>>>> serialization.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net
>>>>>>>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from
>>>>>>>>>>>>> an old
>>>>>>>>>>>>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while
>>>>>>>>>>>>> editing
>>>>>>>>>>>>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for
>>>>>>>>>>>>> any
>>>>>>>>>>>>>>>>>>>> inconvenience .
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> As per comparator, I agree. Let’s make user be responsible for
>>>>>>>>>>>>>>>>>>>> implementing comparable interface. I was just thinking to make the
>>>>>>>>>>>>> serde a
>>>>>>>>>>>>>>>>>>>> little more flexible (i.e. let user decide in which order records
>>>>>>>>>>>>> is going
>>>>>>>>>>>>>>>>>>>> to be inserted into a change log topic).
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io
>>>>>>>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks for the proposal!
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> If I understand the point about the comparator, is it just to
>>>>>>>>>>>>> capture the
>>>>>>>>>>>>>>>>>>>>> generic type parameter? If so, then anything that implements a
>>>>>>>>>>>>> known
>>>>>>>>>>>>>>>>>>>>> interface would work just as well, right? I've been considering
>>>>>>>>>>>>> adding
>>>>>>>>>>>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in
>>>>>>>>>>>>> many
>>>>>>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Note that it's not necessary to actually require that the
>>>>>>>>>>>>> captured type
>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to
>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>>>>>>>> there is some method that makes use of the generic type
>>>>>>>>>>>>> parameter, to
>>>>>>>>>>>>>>>>>>>> force
>>>>>>>>>>>>>>>>>>>>> the compiler to capture the type.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a
>>>>>>>>>>>>> desire
>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is
>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> built-in UUID serde:
>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes#UUID,
>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use
>>>>>>>>>>>>> *lists of*
>>>>>>>>>>>>>>>>>>>>> UUIDs?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net
>>>>>>>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde.
>>>>>>>>>>>>> PR is
>>>>>>>>>>>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592 <
>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>>>>>>>>>>>>>>> 1. Since type for List serve needs to be declared before hand, I
>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>> create a static method for List Serde under
>>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in
>>>>>>>>>>>>> the KIP:
>>>>>>>>>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like
>>>>>>>>>>>>> static
>>>>>>>>>>>>>>>>>>>> public
>>>>>>>>>>>>>>>>>>>>>> Serde<List<T>> List() {...}
>>>>>>>>>>>>>>>>>>>> inorg.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>>>>>>>>>>>> class cannot be added because type needs to be defined
>>>>>>>>>>>>> beforehand.
>>>>>>>>>>>>>>>>>>>> That's
>>>>>>>>>>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno <
>>>>>>>>>>>>> https://github.com/miguno>> is questioning
>>>>>>>>>>>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This
>>>>>>>>>>>>> certainly
>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>>>>>>>>>> <https://github.com/apache/kafka/pull/6592#discussion_r281152067>
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <
>>>>>>>>>>>>> jira@apache.org <ma...@apache.org>>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>>>>>>>>>>>>>>>>> --------------------------------------
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> Summary: Add List<T> Serde
>>>>>>>>>>>>>>>>>>>>>>>    Key: KAFKA-8326
>>>>>>>>>>>>>>>>>>>>>>>    URL:
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326 <
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326>
>>>>>>>>>>>>>>>>>>>>>>> Project: Kafka
>>>>>>>>>>>>>>>>>>>>>>> Issue Type: Improvement
>>>>>>>>>>>>>>>>>>>>>>> Components: clients, streams
>>>>>>>>>>>>>>>>>>>>>>> Reporter: Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I propose adding serializers and deserializers for the
>>>>>>>>>>>>> java.util.List
>>>>>>>>>>>>>>>>>>>>>> class.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka
>>>>>>>>>>>>> message to
>>>>>>>>>>>>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte
>>>>>>>>>>>>> arrays
>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> use their associated Serdes, but it would be more convenient to
>>>>>>>>>>>>>>>>>>>> serialize
>>>>>>>>>>>>>>>>>>>>>> and deserialize UUIDs directly.
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> I believe there are many use cases where one would want to have
>>>>>>>>>>>>> a List
>>>>>>>>>>>>>>>>>>>>>> serde. Ex. [
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>>>>>> <
>>>>>>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>>>>>> <
>>>>>>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> KIP Link: [
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>> <
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>>>>>>>>>>>>>>>> (v7.6.3#76005)
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hmm, the formatting got stripped, unfortunately. Sorry, that makes my email harder to read.

> On Jun 12, 2019, at 10:27 AM, Development <de...@yeralin.net> wrote:
> 
> Hi Matthias,
> 
> Indeed, you are right, I missed your email. I had a problem with my mail server, so I guess I never received it.
> 
> 1) Here is what I came up with
> In my ListSerializer.java I have the following:
> try (final DataOutputStream out = new DataOutputStream(baos)) {
>    // I’m encoding the class name at the top of the byte array
>    out.writeUTF(data.getClass().getCanonicalName());
>    out.writeInt(size);
>    for (T entry : data) {
>        final byte[] bytes = serializer.serialize(topic, entry);
>        out.writeInt(bytes.length);
>        out.write(bytes);
>    }
>    return baos.toByteArray();
> }
> Then in ListDeserializer.java:
> try (final DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
>    final String listClassName = dis.readUTF();
>    final int size = dis.readInt();
>    List<T> deserializedList = getListInstance(listClassName, size);
>    for (int i = 0; i < size; i++) {
>        byte[] payload = new byte[dis.readInt()];
>        dis.readFully(payload); // read() may return fewer bytes than requested
>        deserializedList.add(deserializer.deserialize(topic, payload));
>    }
>    return deserializedList;
> }
> private List<T> getListInstance(String listClassName, int listSize) {
>    try {
>        Class<?> listClass = Class.forName(listClassName);
>        Constructor<?> listConstructor = listClass.getConstructor(Integer.TYPE);
>        return (List<T>) listConstructor.newInstance(listSize);
>    } catch (Exception e) {
>        throw new RuntimeException("Could not construct a list instance of \"" + listClassName + "\"", e);
>    }
> }
> It works just fine, except for one edge case. If you pass the result of Arrays.asList(...) to the serializer, the deserializer will throw: java.lang.ClassNotFoundException: java.util.Arrays.ArrayList
> That happens because Arrays.asList() produces an instance of a List implementation (java.util.Arrays$ArrayList) that is not java.util.ArrayList.
> The problem is described here: https://stackoverflow.com/questions/28851652/java-lang-classcastexception-java-util-arraysarraylist-cannot-be-cast-to-java
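A note on the exception above: it appears to come from getCanonicalName(), which reports a nested class with a dot (java.util.Arrays.ArrayList), while Class.forName() expects the binary name returned by getName() (java.util.Arrays$ArrayList). Even with the binary name, that class is private and has no public constructor, so some fallback is still needed. A minimal sketch, assuming a hypothetical helper class ListInstances (not part of the PR) that falls back to ArrayList when the encoded class cannot be instantiated:

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the KIP-466 PR.
public class ListInstances {

    // Binary name, e.g. "java.util.Arrays$ArrayList"; this is the form Class.forName() accepts.
    public static String encodedName(List<?> list) {
        return list.getClass().getName();
    }

    // Tries the public no-arg constructor of the named class and falls back to
    // ArrayList when the class cannot be loaded or instantiated.
    @SuppressWarnings("unchecked")
    public static <T> List<T> newList(String listClassName, int size) {
        try {
            Class<?> listClass = Class.forName(listClassName);
            Constructor<?> ctor = listClass.getConstructor();
            return (List<T>) ctor.newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return new ArrayList<>(size);
        }
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList("a", "b");
        System.out.println(fixed.getClass().getCanonicalName()); // dotted form
        System.out.println(encodedName(fixed));                  // binary form with '$'
        System.out.println(newList(encodedName(fixed), 2).getClass().getName());
        System.out.println(newList(encodedName(new LinkedList<String>()), 0).getClass().getName());
    }
}
```

Note that loading a class whose name comes from the message bytes means the payload chooses which class gets instantiated, which may be a reason to prefer configuring the list type on the deserializer instead.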
> 
> What do you think of this? Based on your experience, what is the best approach in this case? Maybe we could modify the class-name String? Are there any other cases where a similar problem happens?
> 
> Still working on (2)
> 
> Thank you Matthias!
> 
> Best,
> Daniyar Yeralin
> 
>> On Jun 11, 2019, at 6:05 PM, Matthias J. Sax <ma...@confluent.io> wrote:
>> 
>> Seems you missed my reply from 31/5. C&P below:
>> 
>>> (1) The current PR suggests to always instantiate an `ArrayList` --
>>> however, if a user wants to use any other list implementation, they have
>>> no way to specify this. It might be good to either allow users to
>>> specify the list-type on the deserializer, or encode the list type
>>> directly in the bytes, and hence, whatever type the serialized list was,
>>> the same type will be used on deserialization (might only work for Java
>>> built-in list types).
>>> 
>>> Personally, I think it's better/more flexible to specify the list-type
>>> on the deserializer, as it also allows plugging in any custom list types.
>>> 
>>> This could of course be opt-in and for the case users don't care, we
>>> just default to `ArrayList`.
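To make option (1) concrete, here is a rough sketch of "specify the list type on the deserializer, default to ArrayList". Everything in it is an assumption for illustration: the class name ListTypeOnDeserializer, the Supplier-based parameter, and the String-only wire format are not taken from the KIP or the PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; names and wire format are illustrative, not the KIP-466 API.
public class ListTypeOnDeserializer {

    // Writes the element count, then (length, UTF-8 bytes) for each element.
    public static byte[] write(List<String> data) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeInt(data.size());
            for (String entry : data) {
                byte[] bytes = entry.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // The caller picks the list implementation; nothing about it is stored in the bytes.
    public static List<String> read(byte[] data, Supplier<List<String>> listFactory) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            List<String> result = listFactory.get();
            for (int i = 0; i < size; i++) {
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload); // fills the whole array or throws EOFException
                result.add(new String(payload, StandardCharsets.UTF_8));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Default for callers that do not care about the list type.
    public static List<String> read(byte[] data) {
        return read(data, ArrayList::new);
    }
}
```

With this shape, read(bytes, LinkedList::new) yields a LinkedList and read(bytes) yields an ArrayList, and custom list classes work as long as the caller can supply a factory.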
>>> 
>>> 
>>> (2) For Java built-in types, we could check the type via `instanceof` --
>>> if the type is unknown, we fall back to per-element length encoding. As
>>> an alternative, we could also add a constructor taking an `enum` with
>>> two values `fixed-size` and `variable-size`, or a config instead of a
>>> constructor element.
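As an illustration of the saving in (2), here is a sketch of a fixed-size layout for Long elements, where the per-element length prefix is dropped entirely. The class name FixedSizeListEncoding and the layout ([int count][8 bytes per element]) are assumptions made for this example, not the format used in the PR.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a fixed-size element layout; not the KIP-466 wire format.
public class FixedSizeListEncoding {

    // Layout: [int count][8 bytes per element], with no per-element length prefix.
    public static byte[] encodeLongs(List<Long> data) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + data.size() * Long.BYTES);
        buf.putInt(data.size());
        for (long value : data) {
            buf.putLong(value);
        }
        return buf.array();
    }

    // Reads the count, then exactly that many 8-byte values.
    public static List<Long> decodeLongs(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        List<Long> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add(buf.getLong());
        }
        return result;
    }
}
```

For three Longs this layout takes 4 + 3 * 8 = 28 bytes, whereas a count plus a 4-byte length prefix per element takes 4 + 3 * (4 + 8) = 40 bytes.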
>>> 
>>> 
>>> Just bouncing ideas around -- maybe there are good reasons (too
>>> complicated?) to not support either of them.
>>> 
>> 
>> 
>> -Matthias
>> 
>> On 6/10/19 8:44 AM, Development wrote:
>>> Bump
>>> 
>>>> On May 24, 2019, at 2:09 PM, Development <de...@yeralin.net> wrote:
>>>> 
>>>> Hey,
>>>> 
>>>> - did we consider making the return type (i.e., ArrayList vs.
>>>> LinkedList) configurable, or encoding it in the serialized bytes?
>>>> 
>>>> Not sure about this one. Could you elaborate?
>>>> 
>>>> - atm the size of each element is encoded individually; did we consider
>>>> an optimization for fixed size elements (like Long) to avoid this overhead?
>>>> 
>>>> I cannot think of any clean way to do so. How would you see it?
>>>> 
>>>> Btw I resolved all your comments under PR
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>>> On May 24, 2019, at 12:01 AM, Matthias J. Sax <ma...@confluent.io> wrote:
>>>>> 
>>>>> Thanks for the KIP. I also had a look into the PR and have two follow-up
>>>>> questions:
>>>>> 
>>>>> 
>>>>> - did we consider making the return type (i.e., ArrayList vs.
>>>>> LinkedList) configurable, or encoding it in the serialized bytes?
>>>>> 
>>>>> - atm the size of each element is encoded individually; did we consider
>>>>> an optimization for fixed size elements (like Long) to avoid this overhead?
>>>>> 
>>>>> 
>>>>> 
>>>>> -Matthias
>>>>> 
>>>>> On 5/15/19 6:05 PM, John Roesler wrote:
>>>>>> Sounds good!
>>>>>> 
>>>>>> On Tue, May 14, 2019 at 9:21 AM Development <de...@yeralin.net> wrote:
>>>>>>> 
>>>>>>> Hey,
>>>>>>> 
>>>>>>> I think the proposal is finalized; no one raised any concerns. Shall we call a [VOTE]?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>>> On May 10, 2019, at 10:17 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>> 
>>>>>>>> Good observation, Daniyar.
>>>>>>>> 
>>>>>>>> Maybe we should just not implement support for serdeFrom.
>>>>>>>> 
>>>>>>>> We can always add it later, but I think you're right, we need some
>>>>>>>> kind of more sophisticated support, or at least a second argument for
>>>>>>>> the inner class.
>>>>>>>> 
>>>>>>>> For now, it seems like most use cases would be satisfied without
>>>>>>>> serdeFrom(...List...)
>>>>>>>> 
>>>>>>>> -John
>>>>>>>> 
>>>>>>>> On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I was trying to add some test cases for the list serde, and it led me to this class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on the method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`
>>>>>>>>> 
>>>>>>>>> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
>>>>>>>>> 
>>>>>>>>> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
>>>>>>>>> 
>>>>>>>>> Any ideas?
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>>> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hey Sophie,
>>>>>>>>>> 
>>>>>>>>>> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
>>>>>>>>>> 
>>>>>>>>>> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>> 
>>>>>>>>>>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Good point about serdes for other Collections. On the one hand I'd guess
>>>>>>>>>>> that non-List Collections are probably relatively rare in practice (if
>>>>>>>>>>> anyone disagrees please correct me!) but on the other hand, a) even if just
>>>>>>>>>>> a small number of people benefit I think it's worth the extra effort and b)
>>>>>>>>>>> if we do end up needing/wanting them in the future it would save us a KIP
>>>>>>>>>>> to just add them now. Personally I feel it would make sense to expand the
>>>>>>>>>>> scope of this KIP a bit to include all Collections as a logical unit, but
>>>>>>>>>>> the ROI could be low..
>>>>>>>>>>> 
>>>>>>>>>>> (I know of at least one instance in the unit tests where a Set serde could
>>>>>>>>>>> be useful, and there may be more)
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hey,
>>>>>>>>>>>> 
>>>>>>>>>>>> I don’t see any replies. Seems like this proposal can be finalized and
>>>>>>>>>>>> called for a vote?
>>>>>>>>>>>> 
>>>>>>>>>>>> Also I’ve been thinking. Do we need more serdes for other Collections?
>>>>>>>>>>>> Like queue or set for example
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> No worries about the procedural stuff. Prior experience with KIPs is
>>>>>>>>>>>>> not required :)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I was just trying to help you propose this stuff in a way that the
>>>>>>>>>>>>> others will find easy to review.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for updating the KIP. Thanks to the others for helping out with
>>>>>>>>>>>>> the syntax.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Given these updates, I'm curious if anyone else has feedback about
>>>>>>>>>>>>> this proposal. Personally, I think it sounds fine!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -John
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
>>>>>>>>>>>> snippet. I’ll update KIP again.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I think you may want to tweak your syntax a little:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>>>>>>>>>>>>> return new ListSerde<T>(innerSerde);
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Does that work?
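For readers following along, the point of the snippet above is that a static method cannot use the class's type parameter, so it has to declare its own with <T> before the return type. A self-contained sketch, using stand-in Serde, StringSerde, and ListSerde types defined here only for illustration (they are not the Kafka classes):

```java
import java.util.List;

// Stand-in types for illustration; these are not the real Kafka classes.
public class GenericFactoryDemo {

    public interface Serde<T> { }

    public static final class StringSerde implements Serde<String> { }

    public static final class ListSerde<T> implements Serde<List<T>> {
        public final Serde<T> inner;

        public ListSerde(Serde<T> inner) {
            this.inner = inner;
        }
    }

    // The leading <T> declares a type parameter scoped to this method. Without it,
    // T is an unknown symbol in a static context and the declaration does not compile.
    public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
        return new ListSerde<>(innerSerde);
    }

    public static void main(String[] args) {
        // T is inferred as String from the argument.
        Serde<List<String>> serde = List(new StringSerde());
        System.out.println(serde.getClass().getSimpleName());
    }
}
```

The compiler infers T from the inner serde argument, so callers do not need to spell out the type.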
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I updated JIRA and KIP.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I didn’t know about the process, and created PR before I knew about
>>>>>>>>>>>> KIPs :)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> As per static declaration, I don’t think Java allows that:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for that update. Do you mind making changes primarily on the
>>>>>>>>>>>>>>>> KIP document? (
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This is the design document that we have to agree on and vote for, the
>>>>>>>>>>>>>>>> PR comes later. It can be nice to have an implementation to look at,
>>>>>>>>>>>>>>>> but the KIP is the main artifact for this discussion.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> With this in mind, it will help get more reviewers to look at it if
>>>>>>>>>>>>>>>> you can tidy up the KIP document so that it stands on its own. People
>>>>>>>>>>>>>>>> shouldn't have to look at any other document to understand the
>>>>>>>>>>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>>>>>>>>>>>>>> see what the public API will look like. If it helps, you can take a
>>>>>>>>>>>>>>>> look at some other recent KIPs.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>>>>>>>>>>>>>> a zero-argument static factory method for it, but it seems you could
>>>>>>>>>>>>>>>> still have a static method:
>>>>>>>>>>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Absolutely agree. Already pushed another commit to remove the comparator
>>>>>>>>>>>>>>>>> argument: https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thank you for your input John! I really appreciate it.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> What about this point I made:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 1. Since the type for the List serde needs to be declared beforehand, I
>>>>>>>>>>>>>>>>> could not create a static method for the List serde under
>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>>>>>>>>> P.S. A static method corresponding to ListSerde (something like
>>>>>>>>>>>>>>>>> static public Serde<List<T>> List() {...} in
>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes) cannot be added because the
>>>>>>>>>>>>>>>>> type needs to be defined beforehand. That's why one needs to create the
>>>>>>>>>>>>>>>>> List serde in the following fashion:
>>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(), Comparator.comparing(String::length));
>>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks for the reply Daniyar,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> That makes much more sense! I thought I must be missing something, but I
>>>>>>>>>>>>>>>>>> couldn't for the life of me figure it out.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> What do you think about taking an argument for the Serde of the inner
>>>>>>>>>>>>>>>>>> type, instead of for a Comparator? That way, the user can control
>>>>>>>>>>>>>>>>>> how exactly the inner data gets serialized, while also bounding the generic
>>>>>>>>>>>>>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>>>>>>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>>>>>>>>>>>>>> strictly necessary to offer an option to sort the data during serialization.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from an old
>>>>>>>>>>>>>>>>>>> one about introducing a UUID Serde, and I guess I was too hasty while editing
>>>>>>>>>>>>>>>>>>> the copy to notice the mistake. I just edited the ticket. Sorry for any
>>>>>>>>>>>>>>>>>>> inconvenience.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> As for the comparator, I agree. Let’s make the user responsible for
>>>>>>>>>>>>>>>>>>> implementing the Comparable interface. I was just thinking of making the serde a
>>>>>>>>>>>>>>>>>>> little more flexible (i.e., letting the user decide in which order records are
>>>>>>>>>>>>>>>>>>> going to be inserted into a changelog topic).
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks for the proposal!
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> If I understand the point about the comparator, is it just to capture the
>>>>>>>>>>>>>>>>>>>> generic type parameter? If so, then anything that implements a known
>>>>>>>>>>>>>>>>>>>> interface would work just as well, right? I've been considering adding
>>>>>>>>>>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in many other
>>>>>>>>>>>>>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Note that it's not necessary to actually require that the captured type is
>>>>>>>>>>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to make sure
>>>>>>>>>>>>>>>>>>>> there is some method that makes use of the generic type parameter, to force
>>>>>>>>>>>>>>>>>>>> the compiler to capture the type.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a desire to
>>>>>>>>>>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is a
>>>>>>>>>>>>>>>>>>>> built-in UUID serde: org.apache.kafka.common.serialization.Serdes#UUID, and
>>>>>>>>>>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use *lists of*
>>>>>>>>>>>>>>>>>>>> UUIDs?
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Starting a discussion for KIP-466, which adds support for a List Serde. The PR is
>>>>>>>>>>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>>>>>>>>>>>>>> 1. Since the type for the List serde needs to be declared beforehand, I could not
>>>>>>>>>>>>>>>>>>>>> create a static method for the List serde under
>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>>>>>>>>>>>>> P.S. A static method corresponding to ListSerde (something like
>>>>>>>>>>>>>>>>>>>>> static public Serde<List<T>> List() {...} in
>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes) cannot be added because the
>>>>>>>>>>>>>>>>>>>>> type needs to be defined beforehand. That's why one needs to create the
>>>>>>>>>>>>>>>>>>>>> List serde in the following fashion:
>>>>>>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(), Comparator.comparing(String::length));
>>>>>>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno> is questioning
>>>>>>>>>>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This certainly is
>>>>>>>>>>>>>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <jira@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>>>>>>>>>>>>>>>> --------------------------------------
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Summary: Add List<T> Serde
>>>>>>>>>>>>>>>>>>>>>>     Key: KAFKA-8326
>>>>>>>>>>>>>>>>>>>>>>     URL: https://issues.apache.org/jira/browse/KAFKA-8326
>>>>>>>>>>>>>>>>>>>>>> Project: Kafka
>>>>>>>>>>>>>>>>>>>>>> Issue Type: Improvement
>>>>>>>>>>>>>>>>>>>>>> Components: clients, streams
>>>>>>>>>>>>>>>>>>>>>> Reporter: Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I propose adding serializers and deserializers for the java.util.List
>>>>>>>>>>>>>>>>>>>>>> class.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka message to
>>>>>>>>>>>>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte arrays and
>>>>>>>>>>>>>>>>>>>>>> use their associated Serdes, but it would be more convenient to serialize
>>>>>>>>>>>>>>>>>>>>>> and deserialize UUIDs directly.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> I believe there are many use cases where one would want to have a List
>>>>>>>>>>>>>>>>>>>>>> serde. Ex. [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows],
>>>>>>>>>>>>>>>>>>>>>> [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api]
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> KIP Link: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization]
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>>>>>>>>>>>>>>> (v7.6.3#76005)


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi Matthias,

Indeed, you are right. I missed your email: I had a problem with my mail server, so I guess I never received it.

1) Here is what I came up with
In my ListSerializer.java I have the following:
try (final DataOutputStream out = new DataOutputStream(baos)) {
    // I’m encoding the class name at the top of the byte array
    out.writeUTF(data.getClass().getCanonicalName());
    out.writeInt(size);
    for (T entry : data) {
        final byte[] bytes = serializer.serialize(topic, entry);
        out.writeInt(bytes.length);
        out.write(bytes);
    }
    return baos.toByteArray();
}
Then in ListDeserializer.java:
try (final DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
    final String listClassName = dis.readUTF();
    final int size = dis.readInt();
    List<T> deserializedList = getListInstance(listClassName, size);
    for (int i = 0; i < size; i++) {
        byte[] payload = new byte[dis.readInt()];
        dis.readFully(payload); // read() may return before the buffer is full
        deserializedList.add(deserializer.deserialize(topic, payload));
    }
    return deserializedList;
}
private List<T> getListInstance(String listClassName, int listSize) {
    try {
        Class<?> listClass = Class.forName(listClassName);
        Constructor<?> listConstructor = listClass.getConstructor(Integer.TYPE);
        return (List<T>) listConstructor.newInstance(listSize);
    } catch (Exception e) {
        throw new RuntimeException("Could not construct a list instance of \"" + listClassName + "\"", e);
    }
}
It works just fine, except for one edge case. If you pass the result of Arrays.asList(...) to the serializer, deserialization throws: java.lang.ClassNotFoundException: java.util.Arrays.ArrayList
The deserializer cannot load that name, because Arrays.asList() does not return a java.util.ArrayList.
It returns an instance of a different List implementation (java.util.Arrays$ArrayList), and getCanonicalName() reports that nested class as java.util.Arrays.ArrayList, which Class.forName cannot resolve.
The difference between the two classes is described here: https://stackoverflow.com/questions/28851652/java-lang-classcastexception-java-util-arraysarraylist-cannot-be-cast-to-java
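
The edge case can be reproduced in isolation. The following is a minimal sketch, not code from the PR; the class name ListClassNameDemo and its two helper methods are made up for illustration. It checks which class names Class.forName can load and which list classes expose a public (int) constructor, the two things getListInstance relies on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;

// Hypothetical demo class, not part of Kafka or the PR.
final class ListClassNameDemo {

    // True if Class.forName can load the given class name.
    static boolean loadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // True if the class has a public constructor taking a single int.
    static boolean hasIntConstructor(Class<?> listClass) {
        try {
            listClass.getConstructor(Integer.TYPE);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Class<?> fromAsList = Arrays.asList(1, 2, 3).getClass();

        // Canonical name uses '.', binary name uses '$' for the nested class.
        System.out.println(fromAsList.getCanonicalName()); // java.util.Arrays.ArrayList
        System.out.println(fromAsList.getName());          // java.util.Arrays$ArrayList

        // Only the binary name can be passed to Class.forName.
        System.out.println(loadable(fromAsList.getCanonicalName())); // false
        System.out.println(loadable(fromAsList.getName()));          // true

        // Not every List has an (int) constructor either.
        System.out.println(hasIntConstructor(ArrayList.class));  // true
        System.out.println(hasIntConstructor(LinkedList.class)); // false
        System.out.println(hasIntConstructor(fromAsList));       // false
    }
}
```

So switching to getName() alone would not be enough for getListInstance as written: the constructor lookup would still fail for Arrays.asList() results and for LinkedList, so some fallback would be needed for those types.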

What do you think of this? Based on your experience, what is the best approach in this case? Maybe we could modify the class name string? Are there any other cases where a similar problem happens?

Still working on (2)
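
For (2), one possible shape of the optimization is sketched below, under assumptions: the class SizeEncodingSketch, the enum ElementSize and both method names are hypothetical, and the example is limited to Long so that it runs without Kafka on the classpath. In FIXED_SIZE mode the per-element length prefix is skipped, because every element is known to be Long.BYTES wide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the KIP-466 API.
final class SizeEncodingSketch {

    // Mirrors the `fixed-size` / `variable-size` idea from the discussion.
    enum ElementSize { FIXED_SIZE, VARIABLE_SIZE }

    // Layout: element count, then for each element an optional length prefix and 8 payload bytes.
    static byte[] serializeLongs(List<Long> data, ElementSize mode) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeInt(data.size());
            for (long value : data) {
                if (mode == ElementSize.VARIABLE_SIZE) {
                    out.writeInt(Long.BYTES); // per-element length prefix
                }
                out.writeLong(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Reads the layout written by serializeLongs; the mode must match the one used for writing.
    static List<Long> deserializeLongs(byte[] data, ElementSize mode) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            List<Long> result = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                if (mode == ElementSize.VARIABLE_SIZE) {
                    int length = in.readInt();
                    if (length != Long.BYTES) {
                        throw new IllegalStateException("Unexpected element length: " + length);
                    }
                }
                result.add(in.readLong());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, a list of n longs takes 4 + 8n bytes in FIXED_SIZE mode and 4 + 12n bytes in VARIABLE_SIZE mode, so the saving is 4 bytes per element.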

Thank you Matthias!

Best,
Daniyar Yeralin

> On Jun 11, 2019, at 6:05 PM, Matthias J. Sax <ma...@confluent.io> wrote:
> 
> Seems you missed my reply from 31/6. C&P below:
> 
>> (1) The current PR suggests always instantiating an `ArrayList` --
>> however, if a user wants to use any other list implementation, they have
>> no way to specify this. It might be good to either allow users to
>> specify the list type on the deserializer, or encode the list type
>> directly in the bytes, so that whatever type the serialized list was,
>> the same type will be used on deserialization (this might only work for
>> Java built-in list types).
>> 
>> Personally, I think it's better/more flexible to specify the list type
>> on the deserializer, as it also allows plugging in any custom list types.
>> 
>> This could of course be opt-in, and for the case where users don't care, we
>> just default to `ArrayList`.
>> 
>> 
>> (2) For Java built-in types, we could check the type via `instanceof` --
>> if the type is unknown, we fall back to per-element length encoding. As
>> an alternative, we could also add a constructor taking an `enum` with
>> two values `fixed-size` and `variable-size`, or a config instead of a
>> constructor argument.
>> 
>> 
>> Just bouncing ideas around -- maybe there are good reasons (too
>> complicated?) to not support either of them.
>> 
> 
> 
> -Matthias
> 
> On 6/10/19 8:44 AM, Development wrote:
>> Bump
>> 
>>> On May 24, 2019, at 2:09 PM, Development <de...@yeralin.net> wrote:
>>> 
>>> Hey,
>>> 
>>> - did we consider making the return type (i.e., ArrayList vs.
>>> LinkedList) configurable, or encoding it in the serialized bytes?
>>> 
>>> Not sure about this one. Could you elaborate?
>>> 
>>> - atm the size of each element is encoded individually; did we consider
>>> an optimization for fixed size elements (like Long) to avoid this overhead?
>>> 
>>> I cannot think of any clean way to do so. How would you see it?
>>> 
>>> Btw I resolved all your comments under PR
>>> 
>>> Best,
>>> Daniyar Yeralin
>>> 
>>>> On May 24, 2019, at 12:01 AM, Matthias J. Sax <ma...@confluent.io> wrote:
>>>> 
>>>> Thanks for the KIP. I also had a look into the PR and have two follow up
>>>> question:
>>>> 
>>>> 
>>>> - did we consider making the return type (i.e., ArrayList vs.
>>>> LinkedList) configurable, or encoding it in the serialized bytes?
>>>> 
>>>> - atm the size of each element is encoded individually; did we consider
>>>> an optimization for fixed size elements (like Long) to avoid this overhead?
>>>> 
>>>> 
>>>> 
>>>> -Matthias
>>>> 
>>>> On 5/15/19 6:05 PM, John Roesler wrote:
>>>>> Sounds good!
>>>>> 
>>>>> On Tue, May 14, 2019 at 9:21 AM Development <de...@yeralin.net> wrote:
>>>>>> 
>>>>>> Hey,
>>>>>> 
>>>>>> I think the proposal is finalized; no one raised any concerns. Shall we call for a [VOTE]?
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>>> On May 10, 2019, at 10:17 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>>> 
>>>>>>> Good observation, Daniyar.
>>>>>>> 
>>>>>>> Maybe we should just not implement support for serdeFrom.
>>>>>>> 
>>>>>>> We can always add it later, but I think you're right, we need some
>>>>>>> kind of more sophisticated support, or at least a second argument for
>>>>>>> the inner class.
>>>>>>> 
>>>>>>> For now, it seems like most use cases would be satisfied without
>>>>>>> serdeFrom(...List...)
>>>>>>> 
>>>>>>> -John
>>>>>>> 
>>>>>>> On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I was trying to add some test cases for the list serde, and it led me to this class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`.
>>>>>>>> 
>>>>>>>> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
>>>>>>>> 
>>>>>>>> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
>>>>>>>> 
>>>>>>>> Any ideas?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Daniyar Yeralin
>>>>>>>> 
>>>>>>>>> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
>>>>>>>>> 
>>>>>>>>> Hey Sophie,
>>>>>>>>> 
>>>>>>>>> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
>>>>>>>>> 
>>>>>>>>> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
>>>>>>>>>> 
>>>>>>>>>> Good point about serdes for other Collections. On the one hand I'd guess
>>>>>>>>>> that non-List Collections are probably relatively rare in practice (if
>>>>>>>>>> anyone disagrees please correct me!) but on the other hand, a) even if just
>>>>>>>>>> a small number of people benefit I think it's worth the extra effort and b)
>>>>>>>>>> if we do end up needing/wanting them in the future it would save us a KIP
>>>>>>>>>> to just add them now. Personally I feel it would make sense to expand the
>>>>>>>>>> scope of this KIP a bit to include all Collections as a logical unit, but
>>>>>>>>>> the ROI could be low..
>>>>>>>>>> 
>>>>>>>>>> (I know of at least one instance in the unit tests where a Set serde could
>>>>>>>>>> be useful, and there may be more)
>>>>>>>>>> 
>>>>>>>>>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hey,
>>>>>>>>>>> 
>>>>>>>>>>> I don’t see any replies. Seems like this proposal can be finalized and
>>>>>>>>>>> called for a vote?
>>>>>>>>>>> 
>>>>>>>>>>> Also I’ve been thinking. Do we need more serdes for other Collections?
>>>>>>>>>>> Like queue or set for example
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>> 
>>>>>>>>>>>> No worries about the procedural stuff. Prior experience with KIPs is
>>>>>>>>>>>> not required :)
>>>>>>>>>>>> 
>>>>>>>>>>>> I was just trying to help you propose this stuff in a way that the
>>>>>>>>>>>> others will find easy to review.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for updating the KIP. Thanks to the others for helping out with
>>>>>>>>>>>> the syntax.
>>>>>>>>>>>> 
>>>>>>>>>>>> Given these updates, I'm curious if anyone else has feedback about
>>>>>>>>>>>> this proposal. Personally, I think it sounds fine!
>>>>>>>>>>>> 
>>>>>>>>>>>> -John
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
>>>>>>>>>>> snippet. I’ll update KIP again.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I think you may want to tweak your syntax a little:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>>>>>>>>>>>> return new ListSerde<T>(innerSerde);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Does that work?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net> wrote:
>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I updated JIRA and KIP.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I didn’t know about the process, and created PR before I knew about
>>>>>>>>>>> KIPs :)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> As per static declaration, I don’t think Java allows that:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for that update. Do you mind making changes primarily on the
>>>>>>>>>>>>>>> KIP document? (https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This is the design document that we have to agree on and vote for, the
>>>>>>>>>>>>>>> PR comes later. It can be nice to have an implementation to look at,
>>>>>>>>>>>>>>> but the KIP is the main artifact for this discussion.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> With this in mind, it will help get more reviewers to look at it if
>>>>>>>>>>>>>>> you can tidy up the KIP document so that it stands on its own. People
>>>>>>>>>>>>>>> shouldn't have to look at any other document to understand the
>>>>>>>>>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>>>>>>>>>>>>> see what the public API will look like. If it helps, you can take a
>>>>>>>>>>>>>>> look at some other recent KIPs.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>>>>>>>>>>>>> a zero-argument static factory method for it, but it seems you could
>>>>>>>>>>>>>>> still have a static method:
>>>>>>>>>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <mailto:
>>>>>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
>>>>>>>>>>> argument: https://github.com/apache/kafka/pull/6592 <
>>>>>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thank you for your input John! I really appreciate it.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> What about this point I made:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 1. Since type for List serde needs to be declared before hand, I
>>>>>>>>>>> could not create a static method for List Serde under
>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like static public
>>>>>>>>>>> Serde<List<T>> List() {...} inorg.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>> class cannot be added because type needs to be defined beforehand. That's
>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io
>>>>>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for the reply Daniyar,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> That makes much more sense! I thought I must be missing something,
>>>>>>>>>>> but I
>>>>>>>>>>>>>>>>> couldn't for the life of me figure it out.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> What do you think about just taking an argument, instead of for a
>>>>>>>>>>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can
>>>>>>>>>>> control
>>>>>>>>>>>>>>>>> how exactly the inner data gets serialized, while also bounding the
>>>>>>>>>>> generic
>>>>>>>>>>>>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>>>>>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>>>>>>>>>>>>> strictly necessary to offer an option to sort the data during
>>>>>>>>>>> serialization.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net
>>>>>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from
>>>>>>>>>>> an old
>>>>>>>>>>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while
>>>>>>>>>>> editing
>>>>>>>>>>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for
>>>>>>>>>>> any
>>>>>>>>>>>>>>>>>> inconvenience .
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> As per comparator, I agree. Let’s make user be responsible for
>>>>>>>>>>>>>>>>>> implementing comparable interface. I was just thinking to make the
>>>>>>>>>>> serde a
>>>>>>>>>>>>>>>>>> little more flexible (i.e. let user decide in which order records
>>>>>>>>>>> is going
>>>>>>>>>>>>>>>>>> to be inserted into a change log topic).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io
>>>>>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks for the proposal!
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> If I understand the point about the comparator, is it just to
>>>>>>>>>>> capture the
>>>>>>>>>>>>>>>>>>> generic type parameter? If so, then anything that implements a
>>>>>>>>>>> known
>>>>>>>>>>>>>>>>>>> interface would work just as well, right? I've been considering
>>>>>>>>>>> adding
>>>>>>>>>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in
>>>>>>>>>>> many
>>>>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Note that it's not necessary to actually require that the
>>>>>>>>>>> captured type
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to
>>>>>>>>>>> make
>>>>>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>>>>>> there is some method that makes use of the generic type
>>>>>>>>>>> parameter, to
>>>>>>>>>>>>>>>>>> force
>>>>>>>>>>>>>>>>>>> the compiler to capture the type.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a
>>>>>>>>>>> desire
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is
>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> built-in UUID serde:
>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes#UUID,
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use
>>>>>>>>>>> *lists of*
>>>>>>>>>>>>>>>>>>> UUIDs?
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net
>>>>>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde.
>>>>>>>>>>> PR is
>>>>>>>>>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592 <
>>>>>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>>>>>>>>>>>>> 1. Since type for List serve needs to be declared before hand, I
>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>> create a static method for List Serde under
>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in
>>>>>>>>>>> the KIP:
>>>>>>>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like
>>>>>>>>>>> static
>>>>>>>>>>>>>>>>>> public
>>>>>>>>>>>>>>>>>>>> Serde<List<T>> List() {...}
>>>>>>>>>>>>>>>>>> inorg.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>>>>>>>>>> class cannot be added because type needs to be defined
>>>>>>>>>>> beforehand.
>>>>>>>>>>>>>>>>>> That's
>>>>>>>>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno <
>>>>>>>>>>> https://github.com/miguno>> is questioning
>>>>>>>>>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This
>>>>>>>>>>> certainly
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>>>>>>>> <https://github.com/apache/kafka/pull/6592#discussion_r281152067>
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <
>>>>>>>>>>> jira@apache.org <ma...@apache.org>>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>>>>>>>>>>>>>>> --------------------------------------
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>  Summary: Add List<T> Serde
>>>>>>>>>>>>>>>>>>>>>      Key: KAFKA-8326
>>>>>>>>>>>>>>>>>>>>>      URL:
>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326 <
>>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326>
>>>>>>>>>>>>>>>>>>>>>  Project: Kafka
>>>>>>>>>>>>>>>>>>>>> Issue Type: Improvement
>>>>>>>>>>>>>>>>>>>>> Components: clients, streams
>>>>>>>>>>>>>>>>>>>>> Reporter: Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I propose adding serializers and deserializers for the
>>>>>>>>>>> java.util.List
>>>>>>>>>>>>>>>>>>>> class.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka
>>>>>>>>>>> message to
>>>>>>>>>>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte
>>>>>>>>>>> arrays
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> use their associated Serdes, but it would be more convenient to
>>>>>>>>>>>>>>>>>> serialize
>>>>>>>>>>>>>>>>>>>> and deserialize UUIDs directly.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I believe there are many use cases where one would want to have
>>>>>>>>>>> a List
>>>>>>>>>>>>>>>>>>>> serde. Ex. [
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>>>> <
>>>>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>>>> <
>>>>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> KIP Link: [
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>> <
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>>>>>>>>>>>>>> (v7.6.3#76005)


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Seems you missed my reply from 31/6. C&P below:

> (1) The current PR suggests always instantiating an `ArrayList` --
> however, if a user wants to use any other list implementation, they have
> no way to specify this. It might be good to either allow users to
> specify the list type on the deserializer, or encode the list type
> directly in the bytes, so that whatever type the serialized list was,
> the same type will be used on deserialization (this might only work for
> Java built-in list types).
> 
> Personally, I think it's better/more flexible to specify the list type
> on the deserializer, as it also allows plugging in any custom list types.
> 
> This could of course be opt-in, and for the case where users don't care, we
> just default to `ArrayList`.
> 
> (2) For Java built-in types, we could check the type via `instanceof` --
> if the type is unknown, we fall back to per-element length encoding. As
> an alternative, we could also add a constructor taking an `enum` with
> two values `fixed-size` and `variable-size`, or a config instead of a
> constructor argument.
> 
> 
> Just bouncing ideas around -- maybe there are good reasons (too
> complicated?) to not support either of them.
> 
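
Option (1), with the list type chosen on the deserializer and `ArrayList` as the default, could look roughly like the sketch below. This is written under assumptions and is not the KIP's API: ListTypeSketch and its method names are hypothetical, and java.util.function.Function stands in for Kafka's inner Serializer/Deserializer so that the example runs without Kafka on the classpath. No class name is written to the bytes; the caller supplies a factory for the list instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch, not the KIP-466 API.
final class ListTypeSketch {

    // Layout: element count, then (length, bytes) for each element.
    static <T> byte[] serialize(List<T> data, Function<T, byte[]> inner) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeInt(data.size());
            for (T entry : data) {
                byte[] bytes = inner.apply(entry);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // The caller decides which List implementation is created.
    static <T> List<T> deserialize(byte[] data, Function<byte[], T> inner, Supplier<List<T>> listFactory) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            List<T> result = listFactory.get();
            for (int i = 0; i < size; i++) {
                byte[] payload = new byte[in.readInt()];
                in.readFully(payload);
                result.add(inner.apply(payload));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opt-in behaviour: callers who do not care get an ArrayList.
    static <T> List<T> deserialize(byte[] data, Function<byte[], T> inner) {
        return deserialize(data, inner, ArrayList::new);
    }
}
```

A caller who wants a LinkedList would pass `LinkedList::new` as the third argument. Because the factory is plain Java, custom list types work the same way, and no reflection on class names is involved.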


-Matthias

On 6/10/19 8:44 AM, Development wrote:
> Bump
> 
>> On May 24, 2019, at 2:09 PM, Development <de...@yeralin.net> wrote:
>>
>> Hey,
>>
>> - did we consider making the return type (i.e., ArrayList vs.
>> LinkedList) configurable, or encoding it in the serialized bytes?
>>
>> Not sure about this one. Could you elaborate?
>>
>> - atm the size of each element is encoded individually; did we consider
>> an optimization for fixed size elements (like Long) to avoid this overhead?
>>
>> I cannot think of any clean way to do so. How would you see it?
>>
>> Btw I resolved all your comments under PR
>>
>> Best,
>> Daniyar Yeralin
>>
>>> On May 24, 2019, at 12:01 AM, Matthias J. Sax <ma...@confluent.io> wrote:
>>>
>>> Thanks for the KIP. I also had a look into the PR and have two follow up
>>> question:
>>>
>>>
>>> - did we consider making the return type (i.e., ArrayList vs.
>>> LinkedList) configurable, or encoding it in the serialized bytes?
>>>
>>> - atm the size of each element is encoded individually; did we consider
>>> an optimization for fixed size elements (like Long) to avoid this overhead?
>>>
>>>
>>>
>>> -Matthias
>>>
>>> On 5/15/19 6:05 PM, John Roesler wrote:
>>>> Sounds good!
>>>>
>>>> On Tue, May 14, 2019 at 9:21 AM Development <de...@yeralin.net> wrote:
>>>>>
>>>>> Hey,
>>>>>
>>>>> I think the proposal is finalized; no one raised any concerns. Shall we call for a [VOTE]?
>>>>>
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>>
>>>>>> On May 10, 2019, at 10:17 AM, John Roesler <jo...@confluent.io> wrote:
>>>>>>
>>>>>> Good observation, Daniyar.
>>>>>>
>>>>>> Maybe we should just not implement support for serdeFrom.
>>>>>>
>>>>>> We can always add it later, but I think you're right, we need some
>>>>>> kind of more sophisticated support, or at least a second argument for
>>>>>> the inner class.
>>>>>>
>>>>>> For now, it seems like most use cases would be satisfied without
>>>>>> serdeFrom(...List...)
>>>>>>
>>>>>> -John
>>>>>>
>>>>>> On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was trying to add some test cases for the list serde, and it led me to this class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`.
>>>>>>>
>>>>>>> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
>>>>>>>
>>>>>>> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>>
>>>>>>>> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
>>>>>>>>
>>>>>>>> Hey Sophie,
>>>>>>>>
>>>>>>>> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
>>>>>>>>
>>>>>>>> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Daniyar Yeralin
>>>>>>>>
>>>>>>>>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>> Good point about serdes for other Collections. On the one hand I'd guess
>>>>>>>>> that non-List Collections are probably relatively rare in practice (if
>>>>>>>>> anyone disagrees please correct me!) but on the other hand, a) even if just
>>>>>>>>> a small number of people benefit I think it's worth the extra effort and b)
>>>>>>>>> if we do end up needing/wanting them in the future it would save us a KIP
>>>>>>>>> to just add them now. Personally I feel it would make sense to expand the
>>>>>>>>> scope of this KIP a bit to include all Collections as a logical unit, but
>>>>>>>>> the ROI could be low..
>>>>>>>>>
>>>>>>>>> (I know of at least one instance in the unit tests where a Set serde could
>>>>>>>>> be useful, and there may be more)
>>>>>>>>>
>>>>>>>>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
>>>>>>>>>
>>>>>>>>>> Hey,
>>>>>>>>>>
>>>>>>>>>> I don’t see any replies. Seems like this proposal can be finalized and
>>>>>>>>>> called for a vote?
>>>>>>>>>>
>>>>>>>>>> Also I’ve been thinking. Do we need more serdes for other Collections?
>>>>>>>>>> Like queue or set for example
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>
>>>>>>>>>>> No worries about the procedural stuff. Prior experience with KIPs is
>>>>>>>>>>> not required :)
>>>>>>>>>>>
>>>>>>>>>>> I was just trying to help you propose this stuff in a way that the
>>>>>>>>>>> others will find easy to review.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for updating the KIP. Thanks to the others for helping out with
>>>>>>>>>>> the syntax.
>>>>>>>>>>>
>>>>>>>>>>> Given these updates, I'm curious if anyone else has feedback about
>>>>>>>>>>> this proposal. Personally, I think it sounds fine!
>>>>>>>>>>>
>>>>>>>>>>> -John
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hey,
>>>>>>>>>>>>
>>>>>>>>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
>>>>>>>>>> snippet. I’ll update KIP again.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>
>>>>>>>>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you may want to tweak your syntax a little:
>>>>>>>>>>>>>
>>>>>>>>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>>>>>>>>>>> return new ListSerde<T>(innerSerde);
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does that work?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <mailto:
>>>>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I updated JIRA and KIP.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I didn’t know about the process, and created PR before I knew about
>>>>>>>>>> KIPs :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> As per static declaration, I don’t think Java allows that:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <mailto:
>>>>>>>>>> john@confluent.io>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for that update. Do you mind making changes primarily on the
>>>>>>>>>>>>>> KIP document ? (
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>> <
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>> )
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is the design document that we have to agree on and vote for, the
>>>>>>>>>>>>>> PR comes later. It can be nice to have an implementation to look at,
>>>>>>>>>>>>>> but the KIP is the main artifact for this discussion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With this in mind, it will help get more reviewers to look at it if
>>>>>>>>>>>>>> you can tidy up the KIP document so that it stands on its own. People
>>>>>>>>>>>>>> shouldn't have to look at any other document to understand the
>>>>>>>>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>>>>>>>>>>>> see what the public API will look like. If it helps, you can take a
>>>>>>>>>>>>>> look at some other recent KIPs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>>>>>>>>>>>> a zero-argument static factory method for it, but it seems you could
>>>>>>>>>>>>>> still have a static method:
>>>>>>>>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <mailto:
>>>>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
>>>>>>>>>> argument: https://github.com/apache/kafka/pull/6592 <
>>>>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for your input John! I really appreciate it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What about this point I made:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. Since type for List serde needs to be declared before hand, I
>>>>>>>>>> could not create a static method for List Serde under
>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like static public
>>>>>>>>>> Serde<List<T>> List() {...} inorg.apache.kafka.common.serialization.Serdes)
>>>>>>>>>> class cannot be added because type needs to be defined beforehand. That's
>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io
>>>>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the reply Daniyar,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That makes much more sense! I thought I must be missing something,
>>>>>>>>>> but I
>>>>>>>>>>>>>>>> couldn't for the life of me figure it out.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you think about just taking an argument, instead of for a
>>>>>>>>>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can
>>>>>>>>>> control
>>>>>>>>>>>>>>>> how exactly the inner data gets serialized, while also bounding the
>>>>>>>>>> generic
>>>>>>>>>>>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>>>>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>>>>>>>>>>>> strictly necessary to offer an option to sort the data during
>>>>>>>>>> serialization.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net
>>>>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from
>>>>>>>>>> an old
>>>>>>>>>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while
>>>>>>>>>> editing
>>>>>>>>>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for
>>>>>>>>>> any
>>>>>>>>>>>>>>>>> inconvenience .
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As per comparator, I agree. Let’s make user be responsible for
>>>>>>>>>>>>>>>>> implementing comparable interface. I was just thinking to make the
>>>>>>>>>> serde a
>>>>>>>>>>>>>>>>> little more flexible (i.e. let user decide in which order records
>>>>>>>>>> is going
>>>>>>>>>>>>>>>>> to be inserted into a change log topic).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io
>>>>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the proposal!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If I understand the point about the comparator, is it just to
>>>>>>>>>> capture the
>>>>>>>>>>>>>>>>>> generic type parameter? If so, then anything that implements a
>>>>>>>>>> known
>>>>>>>>>>>>>>>>>> interface would work just as well, right? I've been considering
>>>>>>>>>> adding
>>>>>>>>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in
>>>>>>>>>> many
>>>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Note that it's not necessary to actually require that the
>>>>>>>>>> captured type
>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to
>>>>>>>>>> make
>>>>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>>>>> there is some method that makes use of the generic type
>>>>>>>>>> parameter, to
>>>>>>>>>>>>>>>>> force
>>>>>>>>>>>>>>>>>> the compiler to capture the type.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a
>>>>>>>>>> desire
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is
>>>>>>>>>> a
>>>>>>>>>>>>>>>>>> built-in UUID serde:
>>>>>>>>>> org.apache.kafka.common.serialization.Serdes#UUID,
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use
>>>>>>>>>> *lists of*
>>>>>>>>>>>>>>>>>> UUIDs?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net
>>>>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde.
>>>>>>>>>> PR is
>>>>>>>>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592 <
>>>>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>>>>>>>>>>>> 1. Since type for List serde needs to be declared before hand, I
>>>>>>>>>> could
>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>> create a static method for List Serde under
>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in
>>>>>>>>>> the KIP:
>>>>>>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like
>>>>>>>>>> static
>>>>>>>>>>>>>>>>> public
>>>>>>>>>>>>>>>>>>> Serde<List<T>> List() {...}
>>>>>>>>>>>>>>>>> inorg.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>>>>>>>>> class cannot be added because type needs to be defined
>>>>>>>>>> beforehand.
>>>>>>>>>>>>>>>>> That's
>>>>>>>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno <
>>>>>>>>>> https://github.com/miguno>> is questioning
>>>>>>>>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This
>>>>>>>>>> certainly
>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>>>>>>> <https://github.com/apache/kafka/pull/6592#discussion_r281152067>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <
>>>>>>>>>> jira@apache.org <ma...@apache.org>>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>>>>>>>>>>>>>> --------------------------------------
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   Summary: Add List<T> Serde
>>>>>>>>>>>>>>>>>>>>       Key: KAFKA-8326
>>>>>>>>>>>>>>>>>>>>       URL:
>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326 <
>>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326>
>>>>>>>>>>>>>>>>>>>>   Project: Kafka
>>>>>>>>>>>>>>>>>>>> Issue Type: Improvement
>>>>>>>>>>>>>>>>>>>> Components: clients, streams
>>>>>>>>>>>>>>>>>>>>  Reporter: Daniyar Yeralin
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I propose adding serializers and deserializers for the
>>>>>>>>>> java.util.List
>>>>>>>>>>>>>>>>>>> class.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka
>>>>>>>>>> message to
>>>>>>>>>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte
>>>>>>>>>> arrays
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> use their associated Serdes, but it would be more convenient to
>>>>>>>>>>>>>>>>> serialize
>>>>>>>>>>>>>>>>>>> and deserialize UUIDs directly.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I believe there are many use cases where one would want to have
>>>>>>>>>> a List
>>>>>>>>>>>>>>>>>>> serde. Ex. [
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>>> <
>>>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>>> <
>>>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> KIP Link: [
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>> <
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>>>>>>>>>>>>> (v7.6.3#76005)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Bump

> On May 24, 2019, at 2:09 PM, Development <de...@yeralin.net> wrote:
> 
> Hey,
> 
> - did we consider making the return type (i.e., ArrayList vs.
> LinkedList) configurable, or encoding it in the serialized bytes?
> 
> Not sure about this one. Could you elaborate?
> 
> - atm the size of each element is encoded individually; did we consider
> an optimization for fixed size elements (like Long) to avoid this overhead?
> 
> I cannot think of any clean way to do so. How would you see it?
> 
> Btw I resolved all your comments under PR
> 
> Best,
> Daniyar Yeralin
> 
>> On May 24, 2019, at 12:01 AM, Matthias J. Sax <ma...@confluent.io> wrote:
>> 
>> Thanks for the KIP. I also had a look into the PR and have two follow-up
>> questions:
>> 
>> 
>> - did we consider making the return type (i.e., ArrayList vs.
>> LinkedList) configurable, or encoding it in the serialized bytes?
>> 
>> - atm the size of each element is encoded individually; did we consider
>> an optimization for fixed size elements (like Long) to avoid this overhead?
>> 
>> 
>> 
>> -Matthias
>> 
>> On 5/15/19 6:05 PM, John Roesler wrote:
>>> Sounds good!
>>> 
>>> On Tue, May 14, 2019 at 9:21 AM Development <de...@yeralin.net> wrote:
>>>> 
>>>> Hey,
>>>> 
>>>> I think the proposal is finalized; no one raised any concerns. Shall we call for a [VOTE]?
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>>> On May 10, 2019, at 10:17 AM, John Roesler <jo...@confluent.io> wrote:
>>>>> 
>>>>> Good observation, Daniyar.
>>>>> 
>>>>> Maybe we should just not implement support for serdeFrom.
>>>>> 
>>>>> We can always add it later, but I think you're right, we need some
>>>>> kind of more sophisticated support, or at least a second argument for
>>>>> the inner class.
>>>>> 
>>>>> For now, it seems like most use cases would be satisfied without
>>>>> serdeFrom(...List...)
>>>>> 
>>>>> -John
>>>>> 
>>>>> On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I was trying to add some test cases for the list serde, and it led me to the class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on the method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`.
>>>>>> 
>>>>>> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
>>>>>> 
>>>>>> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
>>>>>> 
>>>>>> Any ideas?
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>>> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
>>>>>>> 
>>>>>>> Hey Sophie,
>>>>>>> 
>>>>>>> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
>>>>>>> 
>>>>>>> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
>>>>>>>> 
>>>>>>>> Good point about serdes for other Collections. On the one hand I'd guess
>>>>>>>> that non-List Collections are probably relatively rare in practice (if
>>>>>>>> anyone disagrees please correct me!) but on the other hand, a) even if just
>>>>>>>> a small number of people benefit I think it's worth the extra effort and b)
>>>>>>>> if we do end up needing/wanting them in the future it would save us a KIP
>>>>>>>> to just add them now. Personally I feel it would make sense to expand the
>>>>>>>> scope of this KIP a bit to include all Collections as a logical unit, but
>>>>>>>> the ROI could be low..
>>>>>>>> 
>>>>>>>> (I know of at least one instance in the unit tests where a Set serde could
>>>>>>>> be useful, and there may be more)
>>>>>>>> 
>>>>>>>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
>>>>>>>> 
>>>>>>>>> Hey,
>>>>>>>>> 
>>>>>>>>> I don’t see any replies. Seems like this proposal can be finalized and
>>>>>>>>> called for a vote?
>>>>>>>>> 
>>>>>>>>> Also I’ve been thinking. Do we need more serdes for other Collections?
>>>>>>>>> Like queue or set for example
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Daniyar,
>>>>>>>>>> 
>>>>>>>>>> No worries about the procedural stuff. Prior experience with KIPs is
>>>>>>>>>> not required :)
>>>>>>>>>> 
>>>>>>>>>> I was just trying to help you propose this stuff in a way that the
>>>>>>>>>> others will find easy to review.
>>>>>>>>>> 
>>>>>>>>>> Thanks for updating the KIP. Thanks to the others for helping out with
>>>>>>>>>> the syntax.
>>>>>>>>>> 
>>>>>>>>>> Given these updates, I'm curious if anyone else has feedback about
>>>>>>>>>> this proposal. Personally, I think it sounds fine!
>>>>>>>>>> 
>>>>>>>>>> -John
>>>>>>>>>> 
>>>>>>>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hey,
>>>>>>>>>>> 
>>>>>>>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
>>>>>>>>> snippet. I’ll update KIP again.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>> 
>>>>>>>>>>>> I think you may want to tweak your syntax a little:
>>>>>>>>>>>> 
>>>>>>>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>>>>>>>>>>     return new ListSerde<T>(innerSerde);
>>>>>>>>>>>> }
>>>>>>>>>>>> 
>>>>>>>>>>>> Does that work?
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> 
>>>>>>>>>>>> Chris
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <mailto:
>>>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>>>> Hi John,
>>>>>>>>>>>> 
>>>>>>>>>>>> I updated JIRA and KIP.
>>>>>>>>>>>> 
>>>>>>>>>>>> I didn’t know about the process, and created PR before I knew about
>>>>>>>>> KIPs :)
>>>>>>>>>>>> 
>>>>>>>>>>>> As per static declaration, I don’t think Java allows that:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <mailto:
>>>>>>>>> john@confluent.io>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for that update. Do you mind making changes primarily on the
>>>>>>>>>>>>> KIP document ? (
>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>> <
>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>> )
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This is the design document that we have to agree on and vote for, the
>>>>>>>>>>>>> PR comes later. It can be nice to have an implementation to look at,
>>>>>>>>>>>>> but the KIP is the main artifact for this discussion.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> With this in mind, it will help get more reviewers to look at it if
>>>>>>>>>>>>> you can tidy up the KIP document so that it stands on its own. People
>>>>>>>>>>>>> shouldn't have to look at any other document to understand the
>>>>>>>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>>>>>>>>>>> see what the public API will look like. If it helps, you can take a
>>>>>>>>>>>>> look at some other recent KIPs.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>>>>>>>>>>> a zero-argument static factory method for it, but it seems you could
>>>>>>>>>>>>> still have a static method:
>>>>>>>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <mailto:
>>>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
>>>>>>>>> argument: https://github.com/apache/kafka/pull/6592 <
>>>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thank you for your input John! I really appreciate it.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What about this point I made:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 1. Since type for List serde needs to be declared before hand, I
>>>>>>>>> could not create a static method for List Serde under
>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like static public
>>>>>>>>> Serde<List<T>> List() {...} inorg.apache.kafka.common.serialization.Serdes)
>>>>>>>>> class cannot be added because type needs to be defined beforehand. That's
>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io
>>>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks for the reply Daniyar,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> That makes much more sense! I thought I must be missing something,
>>>>>>>>> but I
>>>>>>>>>>>>>>> couldn't for the life of me figure it out.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> What do you think about just taking an argument, instead of for a
>>>>>>>>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can
>>>>>>>>> control
>>>>>>>>>>>>>>> how exactly the inner data gets serialized, while also bounding the
>>>>>>>>> generic
>>>>>>>>>>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>>>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>>>>>>>>>>> strictly necessary to offer an option to sort the data during
>>>>>>>>> serialization.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net
>>>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from
>>>>>>>>> an old
>>>>>>>>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while
>>>>>>>>> editing
>>>>>>>>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for
>>>>>>>>> any
>>>>>>>>>>>>>>>> inconvenience .
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> As per comparator, I agree. Let’s make user be responsible for
>>>>>>>>>>>>>>>> implementing comparable interface. I was just thinking to make the
>>>>>>>>> serde a
>>>>>>>>>>>>>>>> little more flexible (i.e. let user decide in which order records
>>>>>>>>> is going
>>>>>>>>>>>>>>>> to be inserted into a change log topic).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io
>>>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks for the proposal!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> If I understand the point about the comparator, is it just to
>>>>>>>>> capture the
>>>>>>>>>>>>>>>>> generic type parameter? If so, then anything that implements a
>>>>>>>>> known
>>>>>>>>>>>>>>>>> interface would work just as well, right? I've been considering
>>>>>>>>> adding
>>>>>>>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in
>>>>>>>>> many
>>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Note that it's not necessary to actually require that the
>>>>>>>>> captured type
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to
>>>>>>>>> make
>>>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>>>> there is some method that makes use of the generic type
>>>>>>>>> parameter, to
>>>>>>>>>>>>>>>> force
>>>>>>>>>>>>>>>>> the compiler to capture the type.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a
>>>>>>>>> desire
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is
>>>>>>>>> a
>>>>>>>>>>>>>>>>> built-in UUID serde:
>>>>>>>>> org.apache.kafka.common.serialization.Serdes#UUID,
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use
>>>>>>>>> *lists of*
>>>>>>>>>>>>>>>>> UUIDs?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net
>>>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde.
>>>>>>>>> PR is
>>>>>>>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592 <
>>>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>>>>>>>>>>> 1. Since type for List serde needs to be declared before hand, I
>>>>>>>>> could
>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>> create a static method for List Serde under
>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in
>>>>>>>>> the KIP:
>>>>>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like
>>>>>>>>> static
>>>>>>>>>>>>>>>> public
>>>>>>>>>>>>>>>>>> Serde<List<T>> List() {...}
>>>>>>>>>>>>>>>> inorg.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>>>>>>>> class cannot be added because type needs to be defined
>>>>>>>>> beforehand.
>>>>>>>>>>>>>>>> That's
>>>>>>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno <
>>>>>>>>> https://github.com/miguno>> is questioning
>>>>>>>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This
>>>>>>>>> certainly
>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>>>>>> <https://github.com/apache/kafka/pull/6592#discussion_r281152067>
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <
>>>>>>>>> jira@apache.org <ma...@apache.org>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>>>>>>>>>>>>> --------------------------------------
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>   Summary: Add List<T> Serde
>>>>>>>>>>>>>>>>>>>       Key: KAFKA-8326
>>>>>>>>>>>>>>>>>>>       URL:
>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326 <
>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326>
>>>>>>>>>>>>>>>>>>>   Project: Kafka
>>>>>>>>>>>>>>>>>>> Issue Type: Improvement
>>>>>>>>>>>>>>>>>>> Components: clients, streams
>>>>>>>>>>>>>>>>>>>  Reporter: Daniyar Yeralin
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I propose adding serializers and deserializers for the
>>>>>>>>> java.util.List
>>>>>>>>>>>>>>>>>> class.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka
>>>>>>>>> message to
>>>>>>>>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte
>>>>>>>>> arrays
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> use their associated Serdes, but it would be more convenient to
>>>>>>>>>>>>>>>> serialize
>>>>>>>>>>>>>>>>>> and deserialize UUIDs directly.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I believe there are many use cases where one would want to have
>>>>>>>>> a List
>>>>>>>>>>>>>>>>>> serde. Ex. [
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>> <
>>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>>> 
>>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>> <
>>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> KIP Link: [
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>> <
>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>>>>>>>>>>>> (v7.6.3#76005)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> 
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hey,

- did we consider making the return type (i.e., ArrayList vs.
LinkedList) configurable, or encoding it in the serialized bytes?

Not sure about this one. Could you elaborate?
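
For readers following the thread, one possible reading of this question is
sketched below: the caller tells the deserializer which List implementation
to build. This is only an illustration under stated assumptions, not the code
in the PR. The class name, the method names, and the byte layout
([int count], then [int length][bytes] per element) are all hypothetical, and
plain functions stand in for Kafka's Serializer/Deserializer interfaces so
that the example runs without Kafka on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from Kafka or from the PR.
class ConfigurableListTypeSketch {

    // Assumed layout: [int count], then for every element [int length][bytes].
    static <T> byte[] serialize(List<T> list, Function<T, byte[]> inner) {
        List<byte[]> encoded = new ArrayList<>();
        int total = Integer.BYTES; // room for the element count
        for (T element : list) {
            byte[] bytes = inner.apply(element);
            encoded.add(bytes);
            total += Integer.BYTES + bytes.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(list.size());
        for (byte[] bytes : encoded) {
            buf.putInt(bytes.length).put(bytes);
        }
        return buf.array();
    }

    // The list implementation is chosen by the caller through listFactory,
    // instead of being hard-coded to ArrayList.
    static <T, L extends List<T>> L deserialize(byte[] data,
                                                Supplier<L> listFactory,
                                                Function<byte[], T> inner) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        L result = listFactory.get();
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[buf.getInt()];
            buf.get(bytes);
            result.add(inner.apply(bytes));
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = serialize(Arrays.asList("a", "bc"),
                s -> s.getBytes(StandardCharsets.UTF_8));
        LinkedList<String> back = deserialize(data, LinkedList::new,
                b -> new String(b, StandardCharsets.UTF_8));
        System.out.println(back.getClass().getSimpleName() + " " + back);
    }
}
```

The alternative raised in the question, encoding the list type in the bytes,
would instead write some type marker ahead of the element count, so the
reader would not need to be told which implementation to use.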

- atm the size of each element is encoded individually; did we consider
an optimization for fixed size elements (like Long) to avoid this overhead?

I cannot think of any clean way to do so. How would you see it?
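
To make the question concrete, here is a rough sketch of what such an
optimization could look like: when every element is known to have the same
encoded size (8 bytes for a Long), the per-element length prefix is dropped.
This is an assumption-laden illustration rather than a proposal for the PR.
The class name, the fixedSize parameter (0 meaning "variable size"), and the
byte layout are hypothetical, and plain functions stand in for the inner
serializer and deserializer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Kafka or from the PR.
class FixedSizeListSketch {

    // fixedSize > 0: all elements are exactly that many bytes, no length prefix.
    // fixedSize == 0: variable size, every element gets an int length prefix.
    static <T> byte[] serialize(List<T> list, Function<T, byte[]> inner, int fixedSize) {
        boolean fixed = fixedSize > 0;
        List<byte[]> encoded = new ArrayList<>();
        int total = Integer.BYTES; // room for the element count
        for (T element : list) {
            byte[] bytes = inner.apply(element);
            if (fixed && bytes.length != fixedSize) {
                throw new IllegalArgumentException("element is not " + fixedSize + " bytes");
            }
            encoded.add(bytes);
            total += (fixed ? 0 : Integer.BYTES) + bytes.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        buf.putInt(list.size());
        for (byte[] bytes : encoded) {
            if (!fixed) {
                buf.putInt(bytes.length);
            }
            buf.put(bytes);
        }
        return buf.array();
    }

    static <T> List<T> deserialize(byte[] data, Function<byte[], T> inner, int fixedSize) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        List<T> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // Read the length from the stream only when it was written.
            byte[] bytes = new byte[fixedSize > 0 ? fixedSize : buf.getInt()];
            buf.get(bytes);
            result.add(inner.apply(bytes));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L);
        Function<Long, byte[]> enc = v -> ByteBuffer.allocate(Long.BYTES).putLong(v).array();
        Function<byte[], Long> dec = b -> ByteBuffer.wrap(b).getLong();

        byte[] compact = serialize(values, enc, Long.BYTES);
        byte[] prefixed = serialize(values, enc, 0);
        System.out.println(compact.length + " vs " + prefixed.length + " bytes");
        System.out.println(deserialize(compact, dec, Long.BYTES));
    }
}
```

For three Longs this saves 4 bytes per element. The open design question is
how the serde would learn that the inner type is fixed-size: the sketch simply
takes it as an argument, which is the part that may not be clean in practice.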

By the way, I resolved all your comments on the PR.

Best,
Daniyar Yeralin

> On May 24, 2019, at 12:01 AM, Matthias J. Sax <ma...@confluent.io> wrote:
> 
> Thanks for the KIP. I also had a look at the PR and have two follow-up
> questions:
> 
> 
> - did we consider making the return type (i.e., ArrayList vs.
> LinkedList) configurable, or encoding it in the serialized bytes?
> 
> - at the moment the size of each element is encoded individually; did we
> consider an optimization for fixed-size elements (like Long) to avoid this overhead?
> 
> 
> 
> -Matthias
> 
> On 5/15/19 6:05 PM, John Roesler wrote:
>> Sounds good!
>> 
>> On Tue, May 14, 2019 at 9:21 AM Development <de...@yeralin.net> wrote:
>>> 
>>> Hey,
>>> 
>>> I think the proposal is finalized; no one has raised any concerns. Shall we call for a [VOTE]?
>>> 
>>> Best,
>>> Daniyar Yeralin
>>> 
>>>> On May 10, 2019, at 10:17 AM, John Roesler <jo...@confluent.io> wrote:
>>>> 
>>>> Good observation, Daniyar.
>>>> 
>>>> Maybe we should just not implement support for serdeFrom.
>>>> 
>>>> We can always add it later, but I think you're right, we need some
>>>> kind of more sophisticated support, or at least a second argument for
>>>> the inner class.
>>>> 
>>>> For now, it seems like most use cases would be satisfied without
>>>> serdeFrom(...List...)
>>>> 
>>>> -John
>>>> 
>>>> On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I was trying to add some test cases for the list serde, and it led me to this class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on the method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`.
>>>>> 
>>>>> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
>>>>> 
>>>>> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
>>>>> 
>>>>> Any ideas?
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>>> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
>>>>>> 
>>>>>> Hey Sophie,
>>>>>> 
>>>>>> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
>>>>>> 
>>>>>> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
>>>>>>> 
>>>>>>> Good point about serdes for other Collections. On the one hand I'd guess
>>>>>>> that non-List Collections are probably relatively rare in practice (if
>>>>>>> anyone disagrees please correct me!) but on the other hand, a) even if just
>>>>>>> a small number of people benefit I think it's worth the extra effort and b)
>>>>>>> if we do end up needing/wanting them in the future it would save us a KIP
>>>>>>> to just add them now. Personally I feel it would make sense to expand the
>>>>>>> scope of this KIP a bit to include all Collections as a logical unit, but
>>>>>>> the ROI could be low..
>>>>>>> 
>>>>>>> (I know of at least one instance in the unit tests where a Set serde could
>>>>>>> be useful, and there may be more)
>>>>>>> 
>>>>>>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
>>>>>>> 
>>>>>>>> Hey,
>>>>>>>> 
>>>>>>>> I don’t see any replies. Seems like this proposal can be finalized and
>>>>>>>> called for a vote?
>>>>>>>> 
>>>>>>>> Also I’ve been thinking. Do we need more serdes for other Collections?
>>>>>>>> Like queue or set for example
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Daniyar Yeralin
>>>>>>>> 
>>>>>>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Daniyar,
>>>>>>>>> 
>>>>>>>>> No worries about the procedural stuff. Prior experience with KIPs is
>>>>>>>>> not required :)
>>>>>>>>> 
>>>>>>>>> I was just trying to help you propose this stuff in a way that the
>>>>>>>>> others will find easy to review.
>>>>>>>>> 
>>>>>>>>> Thanks for updating the KIP. Thanks to the others for helping out with
>>>>>>>>> the syntax.
>>>>>>>>> 
>>>>>>>>> Given these updates, I'm curious if anyone else has feedback about
>>>>>>>>> this proposal. Personally, I think it sounds fine!
>>>>>>>>> 
>>>>>>>>> -John
>>>>>>>>> 
>>>>>>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hey,
>>>>>>>>>> 
>>>>>>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
>>>>>>>> snippet. I’ll update KIP again.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>> 
>>>>>>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>> 
>>>>>>>>>>> I think you may want to tweak your syntax a little:
>>>>>>>>>>> 
>>>>>>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>>>>>>>>> return new ListSerde<T>(innerSerde);
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> Does that work?
>>>>>>>>>>> 
>>>>>>>>>>> Cheers,
>>>>>>>>>>> 
>>>>>>>>>>> Chris
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <mailto:
>>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>>> Hi John,
>>>>>>>>>>> 
>>>>>>>>>>> I updated JIRA and KIP.
>>>>>>>>>>> 
>>>>>>>>>>> I didn’t know about the process, and created PR before I knew about
>>>>>>>> KIPs :)
>>>>>>>>>>> 
>>>>>>>>>>> As per static declaration, I don’t think Java allows that:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>> 
>>>>>>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <mailto:
>>>>>>>> john@confluent.io>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for that update. Do you mind making changes primarily on the
>>>>>>>>>>>> KIP document ? (
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>> <
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>> )
>>>>>>>>>>>> 
>>>>>>>>>>>> This is the design document that we have to agree on and vote for, the
>>>>>>>>>>>> PR comes later. It can be nice to have an implementation to look at,
>>>>>>>>>>>> but the KIP is the main artifact for this discussion.
>>>>>>>>>>>> 
>>>>>>>>>>>> With this in mind, it will help get more reviewers to look at it if
>>>>>>>>>>>> you can tidy up the KIP document so that it stands on its own. People
>>>>>>>>>>>> shouldn't have to look at any other document to understand the
>>>>>>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>>>>>>>>>> see what the public API will look like. If it helps, you can take a
>>>>>>>>>>>> look at some other recent KIPs.
>>>>>>>>>>>> 
>>>>>>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>>>>>>>>>> a zero-argument static factory method for it, but it seems you could
>>>>>>>>>>>> still have a static method:
>>>>>>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>>>>>>>>>> 
>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <mailto:
>>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
>>>>>>>> argument: https://github.com/apache/kafka/pull/6592 <
>>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you for your input John! I really appreciate it.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What about this point I made:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1. Since type for List serde needs to be declared before hand, I
>>>>>>>> could not create a static method for List Serde under
>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like static public
>>>>>>>> Serde<List<T>> List() {...} in org.apache.kafka.common.serialization.Serdes)
>>>>>>>> class cannot be added because type needs to be defined beforehand. That's
>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io
>>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks for the reply Daniyar,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> That makes much more sense! I thought I must be missing something,
>>>>>>>> but I
>>>>>>>>>>>>>> couldn't for the life of me figure it out.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What do you think about just taking an argument, instead of for a
>>>>>>>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can
>>>>>>>> control
>>>>>>>>>>>>>> how exactly the inner data gets serialized, while also bounding the
>>>>>>>> generic
>>>>>>>>>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>>>>>>>>>> strictly necessary to offer an option to sort the data during
>>>>>>>> serialization.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net
>>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from
>>>>>>>> an old
>>>>>>>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while
>>>>>>>> editing
>>>>>>>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for
>>>>>>>> any
>>>>>>>>>>>>>>> inconvenience .
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> As per comparator, I agree. Let’s make user be responsible for
>>>>>>>>>>>>>>> implementing comparable interface. I was just thinking to make the
>>>>>>>> serde a
>>>>>>>>>>>>>>> little more flexible (i.e. let user decide in which order records
>>>>>>>> is going
>>>>>>>>>>>>>>> to be inserted into a change log topic).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io
>>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks for the proposal!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> If I understand the point about the comparator, is it just to
>>>>>>>> capture the
>>>>>>>>>>>>>>>> generic type parameter? If so, then anything that implements a
>>>>>>>> known
>>>>>>>>>>>>>>>> interface would work just as well, right? I've been considering
>>>>>>>> adding
>>>>>>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in
>>>>>>>> many
>>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Note that it's not necessary to actually require that the
>>>>>>>> captured type
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to
>>>>>>>> make
>>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>>> there is some method that makes use of the generic type
>>>>>>>> parameter, to
>>>>>>>>>>>>>>> force
>>>>>>>>>>>>>>>> the compiler to capture the type.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a
>>>>>>>> desire
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is
>>>>>>>> a
>>>>>>>>>>>>>>>> built-in UUID serde:
>>>>>>>> org.apache.kafka.common.serialization.Serdes#UUID,
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use
>>>>>>>> *lists of*
>>>>>>>>>>>>>>>> UUIDs?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net
>>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde.
>>>>>>>> PR is
>>>>>>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592 <
>>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>>>>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
>>>>>>>> could
>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>> create a static method for List Serde under
>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in
>>>>>>>> the KIP:
>>>>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like
>>>>>>>> static
>>>>>>>>>>>>>>> public
>>>>>>>>>>>>>>>>> Serde<List<T>> List() {...}
>>>>>>>>>>>>>>> in org.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>>>>>>> class cannot be added because type needs to be defined
>>>>>>>> beforehand.
>>>>>>>>>>>>>>> That's
>>>>>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno <
>>>>>>>> https://github.com/miguno>> is questioning
>>>>>>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This
>>>>>>>> certainly
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>>>>> <https://github.com/apache/kafka/pull/6592#discussion_r281152067>
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <
>>>>>>>> jira@apache.org <ma...@apache.org>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>>>>>>>>>>>> --------------------------------------
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>    Summary: Add List<T> Serde
>>>>>>>>>>>>>>>>>>        Key: KAFKA-8326
>>>>>>>>>>>>>>>>>>        URL:
>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326 <
>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326>
>>>>>>>>>>>>>>>>>>    Project: Kafka
>>>>>>>>>>>>>>>>>> Issue Type: Improvement
>>>>>>>>>>>>>>>>>> Components: clients, streams
>>>>>>>>>>>>>>>>>>   Reporter: Daniyar Yeralin
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I propose adding serializers and deserializers for the
>>>>>>>> java.util.List
>>>>>>>>>>>>>>>>> class.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka
>>>>>>>> message to
>>>>>>>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte
>>>>>>>> arrays
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> use their associated Serdes, but it would be more convenient to
>>>>>>>>>>>>>>> serialize
>>>>>>>>>>>>>>>>> and deserialize UUIDs directly.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I believe there are many use cases where one would want to have
>>>>>>>> a List
>>>>>>>>>>>>>>>>> serde. Ex. [
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>> <
>>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>> 
>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>> <
>>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>> 
>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> KIP Link: [
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>> <
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>> 
>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>>>>>>>>>>> (v7.6.3#76005)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>> 
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks for the KIP. I also had a look at the PR and have two follow-up
questions:


 - did we consider making the return type (i.e., ArrayList vs.
LinkedList) configurable, or encoding it in the serialized bytes?

 - at the moment the size of each element is encoded individually; did we
consider an optimization for fixed-size elements (like Long) to avoid this overhead?
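
To make both questions concrete, here is a rough sketch in plain Java. It
uses no Kafka classes, and the class and method names (ListEncodingSketch,
encodeWithSizes, encodeFixed, decodeFixed) as well as the byte layouts are
made up purely for illustration; they are assumptions and not what the PR
implements. It compares a layout that writes a size prefix per element with
one that relies on a known fixed element size, and it passes the list
implementation in as a Supplier instead of hard-coding ArrayList.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration only; not the KIP-466 wire format.
public class ListEncodingSketch {

    // Layout with per-element size: [count][size][bytes][size][bytes]...
    static byte[] encodeWithSizes(List<Long> values) {
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + values.size() * (Integer.BYTES + Long.BYTES));
        buf.putInt(values.size());
        for (long v : values) {
            buf.putInt(Long.BYTES); // redundant: every Long is 8 bytes
            buf.putLong(v);
        }
        return buf.array();
    }

    // Layout for fixed-size elements: [count][bytes][bytes]...
    static byte[] encodeFixed(List<Long> values) {
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + values.size() * Long.BYTES);
        buf.putInt(values.size());
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // The caller chooses the list implementation via a factory.
    static List<Long> decodeFixed(byte[] data, Supplier<List<Long>> listFactory) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getInt();
        List<Long> out = listFactory.get();
        for (int i = 0; i < count; i++) {
            out.add(buf.getLong());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L);
        System.out.println(encodeWithSizes(values).length); // 4 + 3 * 12 = 40
        System.out.println(encodeFixed(values).length);     // 4 + 3 * 8  = 28
        List<Long> decoded = decodeFixed(encodeFixed(values), LinkedList::new);
        System.out.println(decoded + " " + decoded.getClass().getSimpleName());
    }
}
```

In this sketch the fixed-size layout saves four bytes per element, and the
deserialized list type is whatever the supplied factory returns.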



-Matthias

On 5/15/19 6:05 PM, John Roesler wrote:
> Sounds good!
> 
> On Tue, May 14, 2019 at 9:21 AM Development <de...@yeralin.net> wrote:
>>
>> Hey,
>>
>> I think the proposal is finalized; no one has raised any concerns. Shall we call for a [VOTE]?
>>
>> Best,
>> Daniyar Yeralin
>>
>>> On May 10, 2019, at 10:17 AM, John Roesler <jo...@confluent.io> wrote:
>>>
>>> Good observation, Daniyar.
>>>
>>> Maybe we should just not implement support for serdeFrom.
>>>
>>> We can always add it later, but I think you're right, we need some
>>> kind of more sophisticated support, or at least a second argument for
>>> the inner class.
>>>
>>> For now, it seems like most use cases would be satisfied without
>>> serdeFrom(...List...)
>>>
>>> -John
>>>
>>> On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I was trying to add some test cases for the list serde, and it led me to this class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on the method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`.
>>>>
>>>> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
>>>>
>>>> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
>>>>
>>>> Any ideas?
>>>>
>>>> Best,
>>>> Daniyar Yeralin
>>>>
>>>>> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
>>>>>
>>>>> Hey Sophie,
>>>>>
>>>>> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
>>>>>
>>>>> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
>>>>>
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>>
>>>>>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
>>>>>>
>>>>>> Good point about serdes for other Collections. On the one hand I'd guess
>>>>>> that non-List Collections are probably relatively rare in practice (if
>>>>>> anyone disagrees please correct me!) but on the other hand, a) even if just
>>>>>> a small number of people benefit I think it's worth the extra effort and b)
>>>>>> if we do end up needing/wanting them in the future it would save us a KIP
>>>>>> to just add them now. Personally I feel it would make sense to expand the
>>>>>> scope of this KIP a bit to include all Collections as a logical unit, but
>>>>>> the ROI could be low..
>>>>>>
>>>>>> (I know of at least one instance in the unit tests where a Set serde could
>>>>>> be useful, and there may be more)
>>>>>>
>>>>>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> I don’t see any replies. Seems like this proposal can be finalized and
>>>>>>> called for a vote?
>>>>>>>
>>>>>>> Also I’ve been thinking. Do we need more serdes for other Collections?
>>>>>>> Like queue or set for example
>>>>>>>
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>>
>>>>>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>>>>
>>>>>>>> Hi Daniyar,
>>>>>>>>
>>>>>>>> No worries about the procedural stuff. Prior experience with KIPs is
>>>>>>>> not required :)
>>>>>>>>
>>>>>>>> I was just trying to help you propose this stuff in a way that the
>>>>>>>> others will find easy to review.
>>>>>>>>
>>>>>>>> Thanks for updating the KIP. Thanks to the others for helping out with
>>>>>>>> the syntax.
>>>>>>>>
>>>>>>>> Given these updates, I'm curious if anyone else has feedback about
>>>>>>>> this proposal. Personally, I think it sounds fine!
>>>>>>>>
>>>>>>>> -John
>>>>>>>>
>>>>>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>>>>>>>>>
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
>>>>>>> snippet. I’ll update KIP again.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>>
>>>>>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>
>>>>>>>>>> I think you may want to tweak your syntax a little:
>>>>>>>>>>
>>>>>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>>>>>>>> return new ListSerde<T>(innerSerde);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Does that work?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Chris
>>>>>>>>>>
>>>>>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <mailto:
>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> I updated JIRA and KIP.
>>>>>>>>>>
>>>>>>>>>> I didn’t know about the process, and created PR before I knew about
>>>>>>> KIPs :)
>>>>>>>>>>
>>>>>>>>>> As per static declaration, I don’t think Java allows that:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>
>>>>>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <mailto:
>>>>>>> john@confluent.io>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thanks for that update. Do you mind making changes primarily on the
>>>>>>>>>>> KIP document ? (
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>> <
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>> )
>>>>>>>>>>>
>>>>>>>>>>> This is the design document that we have to agree on and vote for, the
>>>>>>>>>>> PR comes later. It can be nice to have an implementation to look at,
>>>>>>>>>>> but the KIP is the main artifact for this discussion.
>>>>>>>>>>>
>>>>>>>>>>> With this in mind, it will help get more reviewers to look at it if
>>>>>>>>>>> you can tidy up the KIP document so that it stands on its own. People
>>>>>>>>>>> shouldn't have to look at any other document to understand the
>>>>>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>>>>>>>>> see what the public API will look like. If it helps, you can take a
>>>>>>>>>>> look at some other recent KIPs.
>>>>>>>>>>>
>>>>>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>>>>>>>>> a zero-argument static factory method for it, but it seems you could
>>>>>>>>>>> still have a static method:
>>>>>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>>>>>>>>>
>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <mailto:
>>>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
>>>>>>> argument: https://github.com/apache/kafka/pull/6592 <
>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your input John! I really appreciate it.
>>>>>>>>>>>>
>>>>>>>>>>>> What about this point I made:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. Since type for List serde needs to be declared before hand, I
>>>>>>> could not create a static method for List Serde under
>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>> org.apache.kafka.common.serialization.Serdes (something like static public
>>>>>>> Serde<List<T>> List() {...} in org.apache.kafka.common.serialization.Serdes)
>>>>>>> class cannot be added because type needs to be defined beforehand. That's
>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>
>>>>>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io
>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the reply Daniyar,
>>>>>>>>>>>>>
>>>>>>>>>>>>> That makes much more sense! I thought I must be missing something,
>>>>>>> but I
>>>>>>>>>>>>> couldn't for the life of me figure it out.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think about just taking an argument, instead of for a
>>>>>>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can
>>>>>>> control
>>>>>>>>>>>>> how exactly the inner data gets serialized, while also bounding the
>>>>>>> generic
>>>>>>>>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>>>>>>>>> strictly necessary to offer an option to sort the data during
>>>>>>> serialization.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net
>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from
>>>>>>> an old
>>>>>>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while
>>>>>>> editing
>>>>>>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for
>>>>>>> any
>>>>>>>>>>>>>> inconvenience .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As per comparator, I agree. Let’s make user be responsible for
>>>>>>>>>>>>>> implementing comparable interface. I was just thinking to make the
>>>>>>> serde a
>>>>>>>>>>>>>> little more flexible (i.e. let user decide in which order records
>>>>>>> is going
>>>>>>>>>>>>>> to be inserted into a change log topic).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io
>>>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the proposal!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If I understand the point about the comparator, is it just to
>>>>>>> capture the
>>>>>>>>>>>>>>> generic type parameter? If so, then anything that implements a
>>>>>>> known
>>>>>>>>>>>>>>> interface would work just as well, right? I've been considering
>>>>>>> adding
>>>>>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in
>>>>>>> many
>>>>>>>>>>>>>> other
>>>>>>>>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Note that it's not necessary to actually require that the
>>>>>>> captured type
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to
>>>>>>> make
>>>>>>>>>>>>>> sure
>>>>>>>>>>>>>>> there is some method that makes use of the generic type
>>>>>>> parameter, to
>>>>>>>>>>>>>> force
>>>>>>>>>>>>>>> the compiler to capture the type.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a
>>>>>>> desire
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is
>>>>>>> a
>>>>>>>>>>>>>>> built-in UUID serde:
>>>>>>> org.apache.kafka.common.serialization.Serdes#UUID,
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use
>>>>>>> *lists of*
>>>>>>>>>>>>>>> UUIDs?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net
>>>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde.
>>>>>>> PR is
>>>>>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592 <
>>>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>>>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
>>>>>>> could
>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>> create a static method for List Serde under
>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in
>>>>>>> the KIP:
>>>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like
>>>>>>> static
>>>>>>>>>>>>>> public
>>>>>>>>>>>>>>>> Serde<List<T>> List() {...}
>>>>>>>>>>>>>> in org.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>>>>>> class cannot be added because type needs to be defined
>>>>>>> beforehand.
>>>>>>>>>>>>>> That's
>>>>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno <
>>>>>>> https://github.com/miguno>> is questioning
>>>>>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This
>>>>>>> certainly
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>>>> <https://github.com/apache/kafka/pull/6592#discussion_r281152067>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <
>>>>>>> jira@apache.org <ma...@apache.org>>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>>>>>>>>>>> --------------------------------------
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     Summary: Add List<T> Serde
>>>>>>>>>>>>>>>>>         Key: KAFKA-8326
>>>>>>>>>>>>>>>>>         URL:
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326 <
>>>>>>> https://issues.apache.org/jira/browse/KAFKA-8326>
>>>>>>>>>>>>>>>>>     Project: Kafka
>>>>>>>>>>>>>>>>>  Issue Type: Improvement
>>>>>>>>>>>>>>>>>  Components: clients, streams
>>>>>>>>>>>>>>>>>    Reporter: Daniyar Yeralin
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I propose adding serializers and deserializers for the
>>>>>>> java.util.List
>>>>>>>>>>>>>>>> class.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka
>>>>>>> message to
>>>>>>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte
>>>>>>> arrays
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> use their associated Serdes, but it would be more convenient to
>>>>>>>>>>>>>> serialize
>>>>>>>>>>>>>>>> and deserialize UUIDs directly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I believe there are many use cases where one would want to have
>>>>>>> a List
>>>>>>>>>>>>>>>> serde. Ex. [
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>> <
>>>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>>>>
>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>> [
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>> <
>>>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>>>>
>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> KIP Link: [
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>> <
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>>>>
>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>>>>>>>>>> (v7.6.3#76005)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
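
To make the proposal quoted above concrete, here is a minimal sketch of how a list serde could delegate to an inner serde. It is not the code from PR 6592: the names InnerCodec and ListCodecSketch and the wire layout (element count, then a length prefix per element) are assumptions chosen for illustration, and the classes deliberately avoid Kafka's Serde interface so the snippet runs on the JDK alone.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an inner serde (NOT Kafka's Serde<T>).
final class InnerCodec<T> {
    final Function<T, byte[]> encode;
    final Function<byte[], T> decode;

    InnerCodec(Function<T, byte[]> encode, Function<byte[], T> decode) {
        this.encode = encode;
        this.decode = decode;
    }

    // UTF-8 string codec used as the inner type in the example.
    static InnerCodec<String> utf8() {
        return new InnerCodec<String>(
            s -> s.getBytes(StandardCharsets.UTF_8),
            b -> new String(b, StandardCharsets.UTF_8));
    }
}

// Hypothetical list codec. Assumed layout: [count][len][bytes][len][bytes]...
final class ListCodecSketch<T> {
    private final InnerCodec<T> inner;

    ListCodecSketch(InnerCodec<T> inner) {
        this.inner = inner;
    }

    byte[] serialize(List<T> data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(data.size());            // number of elements
            for (T element : data) {
                byte[] encoded = inner.encode.apply(element);
                out.writeInt(encoded.length);     // per-element length prefix
                out.write(encoded);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    List<T> deserialize(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data); // big-endian, like DataOutputStream
        int count = buffer.getInt();
        List<T> result = new ArrayList<>(count);   // always an ArrayList in this sketch
        for (int i = 0; i < count; i++) {
            byte[] encoded = new byte[buffer.getInt()];
            buffer.get(encoded);
            result.add(inner.decode.apply(encoded));
        }
        return result;
    }
}
```

Elements are written in list order and read back in the same order, so nothing in this layout needs a Comparator.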


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Sounds good!

On Tue, May 14, 2019 at 9:21 AM Development <de...@yeralin.net> wrote:
>
> Hey,
>
> I think the proposal is finalized; no one raised any concerns. Shall we call for a [VOTE]?
>
> Best,
> Daniyar Yeralin
>
> > On May 10, 2019, at 10:17 AM, John Roesler <jo...@confluent.io> wrote:
> >
> > Good observation, Daniyar.
> >
> > Maybe we should just not implement support for serdeFrom.
> >
> > We can always add it later, but I think you're right, we need some
> > kind of more sophisticated support, or at least a second argument for
> > the inner class.
> >
> > For now, it seems like most use cases would be satisfied without
> > serdeFrom(...List...)
> >
> > -John
> >
> > On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
> >>
> >> Hi,
> >>
> >> I was trying to add some test cases for the list serde, and it led me to this class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`
> >>
> >> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
> >>
> >> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
> >>
> >> Any ideas?
> >>
> >> Best,
> >> Daniyar Yeralin
> >>
> >>> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
> >>>
> >>> Hey Sophie,
> >>>
> >>> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
> >>>
> >>> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
> >>>
> >>> Best,
> >>> Daniyar Yeralin
> >>>
> >>>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
> >>>>
> >>>> Good point about serdes for other Collections. On the one hand I'd guess
> >>>> that non-List Collections are probably relatively rare in practice (if
> >>>> anyone disagrees please correct me!) but on the other hand, a) even if just
> >>>> a small number of people benefit I think it's worth the extra effort and b)
> >>>> if we do end up needing/wanting them in the future it would save us a KIP
> >>>> to just add them now. Personally I feel it would make sense to expand the
> >>>> scope of this KIP a bit to include all Collections as a logical unit, but
> >>>> the ROI could be low..
> >>>>
> >>>> (I know of at least one instance in the unit tests where a Set serde could
> >>>> be useful, and there may be more)
> >>>>
> >>>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
> >>>>
> >>>>> Hey,
> >>>>>
> >>>>> I don’t see any replies. Seems like this proposal can be finalized and
> >>>>> called for a vote?
> >>>>>
> >>>>> Also I’ve been thinking. Do we need more serdes for other Collections?
> >>>>> Like queue or set for example
> >>>>>
> >>>>> Best,
> >>>>> Daniyar Yeralin
> >>>>>
> >>>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
> >>>>>>
> >>>>>> Hi Daniyar,
> >>>>>>
> >>>>>> No worries about the procedural stuff. Prior experience with KIPs is
> >>>>>> not required :)
> >>>>>>
> >>>>>> I was just trying to help you propose this stuff in a way that the
> >>>>>> others will find easy to review.
> >>>>>>
> >>>>>> Thanks for updating the KIP. Thanks to the others for helping out with
> >>>>>> the syntax.
> >>>>>>
> >>>>>> Given these updates, I'm curious if anyone else has feedback about
> >>>>>> this proposal. Personally, I think it sounds fine!
> >>>>>>
> >>>>>> -John
> >>>>>>
> >>>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
> >>>>>>>
> >>>>>>> Hey,
> >>>>>>>
> >>>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
> >>>>> snippet. I’ll update KIP again.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
> >>>>>>>>
> >>>>>>>> Hi Daniyar,
> >>>>>>>>
> >>>>>>>> I think you may want to tweak your syntax a little:
> >>>>>>>>
> >>>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
> >>>>>>>> return new ListSerde<T>(innerSerde);
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> Does that work?
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>>
> >>>>>>>> Chris
> >>>>>>>>
> >>>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <mailto:
> >>>>> dev@yeralin.net>> wrote:
> >>>>>>>> Hi John,
> >>>>>>>>
> >>>>>>>> I updated JIRA and KIP.
> >>>>>>>>
> >>>>>>>> I didn’t know about the process, and created PR before I knew about
> >>>>> KIPs :)
> >>>>>>>>
> >>>>>>>> As per static declaration, I don’t think Java allows that:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Daniyar Yeralin
> >>>>>>>>
> >>>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <mailto:
> >>>>> john@confluent.io>> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks for that update. Do you mind making changes primarily on the
> >>>>>>>>> KIP document ? (
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>>>> <
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>>>>> )
> >>>>>>>>>
> >>>>>>>>> This is the design document that we have to agree on and vote for, the
> >>>>>>>>> PR comes later. It can be nice to have an implementation to look at,
> >>>>>>>>> but the KIP is the main artifact for this discussion.
> >>>>>>>>>
> >>>>>>>>> With this in mind, it will help get more reviewers to look at it if
> >>>>>>>>> you can tidy up the KIP document so that it stands on its own. People
> >>>>>>>>> shouldn't have to look at any other document to understand the
> >>>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
> >>>>>>>>> see what the public API will look like. If it helps, you can take a
> >>>>>>>>> look at some other recent KIPs.
> >>>>>>>>>
> >>>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
> >>>>>>>>> a zero-argument static factory method for it, but it seems you could
> >>>>>>>>> still have a static method:
> >>>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <mailto:
> >>>>> dev@yeralin.net>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
> >>>>> argument: https://github.com/apache/kafka/pull/6592 <
> >>>>> https://github.com/apache/kafka/pull/6592> <
> >>>>> https://github.com/apache/kafka/pull/6592 <
> >>>>> https://github.com/apache/kafka/pull/6592>>
> >>>>>>>>>>
> >>>>>>>>>> Thank you for your input John! I really appreciate it.
> >>>>>>>>>>
> >>>>>>>>>> What about this point I made:
> >>>>>>>>>>
> >>>>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
> >>>>> could not create a static method for List Serde under
> >>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
> >>>>>>>>>> P.S. Static method corresponding to ListSerde under
> >>>>> org.apache.kafka.common.serialization.Serdes (something like static public
> >>>>> Serde<List<T>> List() {...} in org.apache.kafka.common.serialization.Serdes)
> >>>>> class cannot be added because type needs to be defined beforehand. That's
> >>>>> why one needs to create List Serde in the following fashion:
> >>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
> >>>>> Comparator.comparing(String::length));
> >>>>>>>>>> (can possibly be simplified by declaring import static
> >>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
> >>>>>>>>>>
> >>>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io
> >>>>> <ma...@confluent.io>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for the reply Daniyar,
> >>>>>>>>>>>
> >>>>>>>>>>> That makes much more sense! I thought I must be missing something,
> >>>>> but I
> >>>>>>>>>>> couldn't for the life of me figure it out.
> >>>>>>>>>>>
> >>>>>>>>>>> What do you think about just taking an argument, instead of for a
> >>>>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can
> >>>>> control
> >>>>>>>>>>> how exactly the inner data gets serialized, while also bounding the
> >>>>> generic
> >>>>>>>>>>> parameter properly. As for the order, since the list is already in a
> >>>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
> >>>>>>>>>>> strictly necessary to offer an option to sort the data during
> >>>>> serialization.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> -John
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net
> >>>>> <ma...@yeralin.net>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from
> >>>>> an old
> >>>>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while
> >>>>> editing
> >>>>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for
> >>>>> any
> >>>>>>>>>>>> inconvenience.
> >>>>>>>>>>>>
> >>>>>>>>>>>> As per comparator, I agree. Let’s make the user responsible for
> >>>>>>>>>>>> implementing the Comparable interface. I was just thinking of making
> >>>>>>>>>>>> the serde a little more flexible (i.e. letting the user decide in which
> >>>>>>>>>>>> order records are going to be inserted into a changelog topic).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io
> >>>>> <ma...@confluent.io>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi Daniyar,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for the proposal!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If I understand the point about the comparator, is it just to
> >>>>> capture the
> >>>>>>>>>>>>> generic type parameter? If so, then anything that implements a
> >>>>> known
> >>>>>>>>>>>>> interface would work just as well, right? I've been considering
> >>>>> adding
> >>>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in
> >>>>> many
> >>>>>>>>>>>> other
> >>>>>>>>>>>>> projects). Would this be a good time to do it?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Note that it's not necessary to actually require that the
> >>>>> captured type
> >>>>>>>>>>>> is
> >>>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to
> >>>>> make
> >>>>>>>>>>>> sure
> >>>>>>>>>>>>> there is some method that makes use of the generic type
> >>>>> parameter, to
> >>>>>>>>>>>> force
> >>>>>>>>>>>>> the compiler to capture the type.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a
> >>>>> desire
> >>>>>>>>>>>> to
> >>>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is
> >>>>> a
> >>>>>>>>>>>>> built-in UUID serde:
> >>>>> org.apache.kafka.common.serialization.Serdes#UUID,
> >>>>>>>>>>>> and
> >>>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use
> >>>>> *lists of*
> >>>>>>>>>>>>> UUIDs?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> -John
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net
> >>>>> <ma...@yeralin.net>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde.
> >>>>> PR is
> >>>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592 <
> >>>>> https://github.com/apache/kafka/pull/6592> <
> >>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
> >>>>> https://github.com/apache/kafka/pull/6592>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> There are two topics I would like to discuss:
> >>>>>>>>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
> >>>>> could
> >>>>>>>>>>>> not
> >>>>>>>>>>>>>> create a static method for List Serde under
> >>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in
> >>>>> the KIP:
> >>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
> >>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like
> >>>>> static
> >>>>>>>>>>>> public
> >>>>>>>>>>>>>> Serde<List<T>> List() {...}
> >>>>>>>>>>>> in org.apache.kafka.common.serialization.Serdes)
> >>>>>>>>>>>>>> class cannot be added because type needs to be defined
> >>>>> beforehand.
> >>>>>>>>>>>> That's
> >>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
> >>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
> >>>>>>>>>>>>>> Comparator.comparing(String::length));
> >>>>>>>>>>>>>> (can possibly be simplified by declaring import static
> >>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno <
> >>>>> https://github.com/miguno>> is questioning
> >>>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This
> >>>>> certainly
> >>>>>>>>>>>> is
> >>>>>>>>>>>>>> not required. Feel free to add your input:
> >>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
> >>>>> <https://github.com/apache/kafka/pull/6592#discussion_r281152067>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank you!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <
> >>>>> jira@apache.org <ma...@apache.org>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
> >>>>>>>>>>>>>>> --------------------------------------
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>     Summary: Add List<T> Serde
> >>>>>>>>>>>>>>>         Key: KAFKA-8326
> >>>>>>>>>>>>>>>         URL:
> >>>>> https://issues.apache.org/jira/browse/KAFKA-8326 <
> >>>>> https://issues.apache.org/jira/browse/KAFKA-8326>
> >>>>>>>>>>>>>>>     Project: Kafka
> >>>>>>>>>>>>>>>  Issue Type: Improvement
> >>>>>>>>>>>>>>>  Components: clients, streams
> >>>>>>>>>>>>>>>    Reporter: Daniyar Yeralin
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I propose adding serializers and deserializers for the
> >>>>> java.util.List
> >>>>>>>>>>>>>> class.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka
> >>>>> message to
> >>>>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte
> >>>>> arrays
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> use their associated Serdes, but it would be more convenient to
> >>>>>>>>>>>> serialize
> >>>>>>>>>>>>>> and deserialize UUIDs directly.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I believe there are many use cases where one would want to have
> >>>>> a List
> >>>>>>>>>>>>>> serde. Ex. [
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
> >>>>> <
> >>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
> >>>>>>
> >>>>>>>>>>>> ],
> >>>>>>>>>>>>>> [
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
> >>>>> <
> >>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
> >>>>>>
> >>>>>>>>>>>>>> ]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> KIP Link: [
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>>>> <
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>>>>>
> >>>>>>>>>>>>>> ]
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> This message was sent by Atlassian JIRA
> >>>>>>>>>>>>>>> (v7.6.3#76005)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >>
>
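
The generics point settled in the thread above can be shown in isolation. In this sketch, SketchSerde and SketchSerdes are hypothetical stand-ins, not Kafka classes. The only thing being demonstrated is that a static method has to declare its own type parameter (the `<T>` before the return type) before it can return a serde of `List<T>`, and that the inner serde argument is what binds `T` at the call site.

```java
import java.util.List;

// Hypothetical minimal serde interface; describe() just reports the handled type.
interface SketchSerde<T> {
    String describe();
}

// Hypothetical factory class mirroring the shape of a static serde factory.
final class SketchSerdes {
    private SketchSerdes() {}

    static SketchSerde<String> string() {
        return () -> "String";
    }

    // Without the leading <T>, the compiler cannot resolve T in a static context.
    // With it, T is inferred from the inner serde passed by the caller.
    static <T> SketchSerde<List<T>> list(SketchSerde<T> innerSerde) {
        return () -> "List<" + innerSerde.describe() + ">";
    }
}
```

A call such as `SketchSerdes.list(SketchSerdes.string())` then type-checks as a serde of `List<String>` with no explicit type argument. A class token cannot carry the same information, because `List<String>.class` is not valid Java and a `serdeFrom(Class<T>)`-style lookup would only ever see the raw `List`.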

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hey,

I think the proposal is finalized; no one raised any concerns. Shall we call for a [VOTE]?

Best,
Daniyar Yeralin

> On May 10, 2019, at 10:17 AM, John Roesler <jo...@confluent.io> wrote:
> 
> Good observation, Daniyar.
> 
> Maybe we should just not implement support for serdeFrom.
> 
> We can always add it later, but I think you're right, we need some
> kind of more sophisticated support, or at least a second argument for
> the inner class.
> 
> For now, it seems like most use cases would be satisfied without
> serdeFrom(...List...)
> 
> -John
> 
> On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
>> 
>> Hi,
>> 
>> I was trying to add some test cases for the list serde, and it led me to this class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`
>> 
>> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
>> 
>> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
>> 
>> Any ideas?
>> 
>> Best,
>> Daniyar Yeralin
>> 
>>> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
>>> 
>>> Hey Sophie,
>>> 
>>> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
>>> 
>>> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
>>> 
>>> Best,
>>> Daniyar Yeralin
>>> 
>>>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
>>>> 
>>>> Good point about serdes for other Collections. On the one hand I'd guess
>>>> that non-List Collections are probably relatively rare in practice (if
>>>> anyone disagrees please correct me!) but on the other hand, a) even if just
>>>> a small number of people benefit I think it's worth the extra effort and b)
>>>> if we do end up needing/wanting them in the future it would save us a KIP
>>>> to just add them now. Personally I feel it would make sense to expand the
>>>> scope of this KIP a bit to include all Collections as a logical unit, but
>>>> the ROI could be low..
>>>> 
>>>> (I know of at least one instance in the unit tests where a Set serde could
>>>> be useful, and there may be more)
>>>> 
>>>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
>>>> 
>>>>> Hey,
>>>>> 
>>>>> I don’t see any replies. Seems like this proposal can be finalized and
>>>>> called for a vote?
>>>>> 
>>>>> Also I’ve been thinking. Do we need more serdes for other Collections?
>>>>> Like queue or set for example
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
>>>>>> 
>>>>>> Hi Daniyar,
>>>>>> 
>>>>>> No worries about the procedural stuff. Prior experience with KIPs is
>>>>>> not required :)
>>>>>> 
>>>>>> I was just trying to help you propose this stuff in a way that the
>>>>>> others will find easy to review.
>>>>>> 
>>>>>> Thanks for updating the KIP. Thanks to the others for helping out with
>>>>>> the syntax.
>>>>>> 
>>>>>> Given these updates, I'm curious if anyone else has feedback about
>>>>>> this proposal. Personally, I think it sounds fine!
>>>>>> 
>>>>>> -John
>>>>>> 
>>>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>>>>>>> 
>>>>>>> Hey,
>>>>>>> 
>>>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
>>>>> snippet. I’ll update KIP again.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>>>>>>> 
>>>>>>>> Hi Daniyar,
>>>>>>>> 
>>>>>>>> I think you may want to tweak your syntax a little:
>>>>>>>> 
>>>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>>>>>> return new ListSerde<T>(innerSerde);
>>>>>>>> }
>>>>>>>> 
>>>>>>>> Does that work?
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> 
>>>>>>>> Chris
>>>>>>>> 
>>>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <mailto:
>>>>> dev@yeralin.net>> wrote:
>>>>>>>> Hi John,
>>>>>>>> 
>>>>>>>> I updated JIRA and KIP.
>>>>>>>> 
>>>>>>>> I didn’t know about the process, and created PR before I knew about
>>>>> KIPs :)
>>>>>>>> 
>>>>>>>> As per static declaration, I don’t think Java allows that:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Daniyar Yeralin
>>>>>>>> 
>>>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <mailto:
>>>>> john@confluent.io>> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks for that update. Do you mind making changes primarily on the
>>>>>>>>> KIP document ? (
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>> <
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>> )
>>>>>>>>> 
>>>>>>>>> This is the design document that we have to agree on and vote for, the
>>>>>>>>> PR comes later. It can be nice to have an implementation to look at,
>>>>>>>>> but the KIP is the main artifact for this discussion.
>>>>>>>>> 
>>>>>>>>> With this in mind, it will help get more reviewers to look at it if
>>>>>>>>> you can tidy up the KIP document so that it stands on its own. People
>>>>>>>>> shouldn't have to look at any other document to understand the
>>>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>>>>>>> see what the public API will look like. If it helps, you can take a
>>>>>>>>> look at some other recent KIPs.
>>>>>>>>> 
>>>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>>>>>>> a zero-argument static factory method for it, but it seems you could
>>>>>>>>> still have a static method:
>>>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>>>>>>> 
>>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <mailto:
>>>>> dev@yeralin.net>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
>>>>> argument: https://github.com/apache/kafka/pull/6592 <
>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>> 
>>>>>>>>>> Thank you for your input John! I really appreciate it.
>>>>>>>>>> 
>>>>>>>>>> What about this point I made:
>>>>>>>>>> 
>>>>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
>>>>> could not create a static method for List Serde under
>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>> org.apache.kafka.common.serialization.Serdes (something like static public
>>>>> Serde<List<T>> List() {...} in org.apache.kafka.common.serialization.Serdes)
>>>>> class cannot be added because type needs to be defined beforehand. That's
>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>> Comparator.comparing(String::length));
>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>> 
>>>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io
>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the reply Daniyar,
>>>>>>>>>>> 
>>>>>>>>>>> That makes much more sense! I thought I must be missing something,
>>>>> but I
>>>>>>>>>>> couldn't for the life of me figure it out.
>>>>>>>>>>> 
>>>>>>>>>>> What do you think about just taking an argument, instead of for a
>>>>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can
>>>>> control
>>>>>>>>>>> how exactly the inner data gets serialized, while also bounding the
>>>>> generic
>>>>>>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>>>>>>> strictly necessary to offer an option to sort the data during
>>>>> serialization.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net
>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi John,
>>>>>>>>>>>> 
>>>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from
>>>>> an old
>>>>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while
>>>>> editing
>>>>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for
>>>>> any
>>>>>>>>>>>> inconvenience.
>>>>>>>>>>>> 
>>>>>>>>>>>> As per comparator, I agree. Let’s make the user responsible for
>>>>>>>>>>>> implementing the Comparable interface. I was just thinking of making
>>>>>>>>>>>> the serde a little more flexible (i.e. letting the user decide in which
>>>>>>>>>>>> order records are going to be inserted into a changelog topic).
>>>>>>>>>>>> 
>>>>>>>>>>>> Thank you!
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io
>>>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for the proposal!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> If I understand the point about the comparator, is it just to
>>>>> capture the
>>>>>>>>>>>>> generic type parameter? If so, then anything that implements a
>>>>> known
>>>>>>>>>>>>> interface would work just as well, right? I've been considering
>>>>> adding
>>>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in
>>>>> many
>>>>>>>>>>>> other
>>>>>>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Note that it's not necessary to actually require that the
>>>>> captured type
>>>>>>>>>>>> is
>>>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to
>>>>> make
>>>>>>>>>>>> sure
>>>>>>>>>>>>> there is some method that makes use of the generic type
>>>>> parameter, to
>>>>>>>>>>>> force
>>>>>>>>>>>>> the compiler to capture the type.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a
>>>>> desire
>>>>>>>>>>>> to
>>>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is
>>>>> a
>>>>>>>>>>>>> built-in UUID serde:
>>>>> org.apache.kafka.common.serialization.Serdes#UUID,
>>>>>>>>>>>> and
>>>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use
>>>>> *lists of*
>>>>>>>>>>>>> UUIDs?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net
>>>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde.
>>>>> PR is
>>>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592 <
>>>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
>>>>> could
>>>>>>>>>>>> not
>>>>>>>>>>>>>> create a static method for List Serde under
>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in
>>>>> the KIP:
>>>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like
>>>>> static
>>>>>>>>>>>> public
>>>>>>>>>>>>>> Serde<List<T>> List() {...}
>>>>>>>>>>>> in org.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>>>> class cannot be added because type needs to be defined
>>>>> beforehand.
>>>>>>>>>>>> That's
>>>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno <
>>>>> https://github.com/miguno>> is questioning
>>>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This
>>>>> certainly
>>>>>>>>>>>> is
>>>>>>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>> <https://github.com/apache/kafka/pull/6592#discussion_r281152067>
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <
>>>>> jira@apache.org <ma...@apache.org>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>>>>>>>>> --------------------------------------
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>     Summary: Add List<T> Serde
>>>>>>>>>>>>>>>         Key: KAFKA-8326
>>>>>>>>>>>>>>>         URL:
>>>>> https://issues.apache.org/jira/browse/KAFKA-8326 <
>>>>> https://issues.apache.org/jira/browse/KAFKA-8326>
>>>>>>>>>>>>>>>     Project: Kafka
>>>>>>>>>>>>>>>  Issue Type: Improvement
>>>>>>>>>>>>>>>  Components: clients, streams
>>>>>>>>>>>>>>>    Reporter: Daniyar Yeralin
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I propose adding serializers and deserializers for the
>>>>> java.util.List
>>>>>>>>>>>>>> class.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka
>>>>> message to
>>>>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte
>>>>> arrays
>>>>>>>>>>>> and
>>>>>>>>>>>>>> use their associated Serdes, but it would be more convenient to
>>>>>>>>>>>> serialize
>>>>>>>>>>>>>> and deserialize UUIDs directly.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I believe there are many use cases where one would want to have
>>>>> a List
>>>>>>>>>>>>>> serde. Ex. [
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>> <
>>>>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>>>>>> 
>>>>>>>>>>>> ],
>>>>>>>>>>>>>> [
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>> <
>>>>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>>>> 
>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> KIP Link: [
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>> <
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>>>> 
>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>>>>>>>> (v7.6.3#76005)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Good observation, Daniyar.

Maybe we should just not implement support for serdeFrom.

We can always add it later, but I think you're right, we need some
kind of more sophisticated support, or at least a second argument for
the inner class.

For now, it seems like most use cases would be satisfied without
serdeFrom(...List...).
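
To illustrate why an explicit inner serde argument is enough on its own, here is a minimal, self-contained sketch. It is not the code from the PR: `ListCodecSketch`, `ofStrings`, and the wire format (a 4-byte element count followed by length-prefixed elements) are assumptions made up for this example, and plain `Function`s stand in for the inner Serializer/Deserializer so that it runs without the Kafka client on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a list serde; NOT the class proposed in KIP-466.
class ListCodecSketch<T> {
    private final Function<T, byte[]> innerSerializer;
    private final Function<byte[], T> innerDeserializer;

    // The element type T is fixed by the inner functions, so no Class<T> is needed.
    ListCodecSketch(Function<T, byte[]> innerSerializer, Function<byte[], T> innerDeserializer) {
        this.innerSerializer = innerSerializer;
        this.innerDeserializer = innerDeserializer;
    }

    // Assumed format: [count][len][bytes][len][bytes]...
    byte[] serialize(List<T> data) {
        List<byte[]> chunks = new ArrayList<>();
        int size = Integer.BYTES; // room for the element count
        for (T element : data) {
            byte[] bytes = innerSerializer.apply(element);
            chunks.add(bytes);
            size += Integer.BYTES + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        buffer.putInt(chunks.size());
        for (byte[] bytes : chunks) {
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        return buffer.array();
    }

    // Reads the count, then each length-prefixed element, preserving list order.
    List<T> deserialize(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int count = buffer.getInt();
        List<T> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            result.add(innerDeserializer.apply(bytes));
        }
        return result;
    }

    // Convenience factory for the example: UTF-8 strings as the inner type.
    static ListCodecSketch<String> ofStrings() {
        return new ListCodecSketch<>(
                s -> s.getBytes(StandardCharsets.UTF_8),
                b -> new String(b, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ListCodecSketch<String> codec = ofStrings();
        List<String> roundTrip = codec.deserialize(codec.serialize(Arrays.asList("a", "bc")));
        System.out.println(roundTrip);
    }
}
```

Nothing here needs a `Class` token for the list itself, which is why leaving serdeFrom out does not block the common case.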

-John

On Fri, May 10, 2019 at 8:57 AM Development <de...@yeralin.net> wrote:
>
> Hi,
>
> I was trying to add some test cases for the list serde, and it led me to the class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on the method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`.
>
> Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case?
>
> I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
>
> Any ideas?
>
> Best,
> Daniyar Yeralin
>
> > On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
> >
> > Hey Sophie,
> >
> > Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
> >
> > Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
> >
> > Best,
> > Daniyar Yeralin
> >
> >> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
> >>
> >> Good point about serdes for other Collections. On the one hand I'd guess
> >> that non-List Collections are probably relatively rare in practice (if
> >> anyone disagrees please correct me!) but on the other hand, a) even if just
> >> a small number of people benefit I think it's worth the extra effort and b)
> >> if we do end up needing/wanting them in the future it would save us a KIP
> >> to just add them now. Personally I feel it would make sense to expand the
> >> scope of this KIP a bit to include all Collections as a logical unit, but
> >> the ROI could be low..
> >>
> >> (I know of at least one instance in the unit tests where a Set serde could
> >> be useful, and there may be more)
> >>
> >> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
> >>
> >>> Hey,
> >>>
> >>> I don’t see any replies. Seems like this proposal can be finalized and
> >>> called for a vote?
> >>>
> >>> Also I’ve been thinking. Do we need more serdes for other Collections?
> >>> Like queue or set for example
> >>>
> >>> Best,
> >>> Daniyar Yeralin
> >>>
> >>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
> >>>>
> >>>> Hi Daniyar,
> >>>>
> >>>> No worries about the procedural stuff. Prior experience with KIPs is
> >>>> not required :)
> >>>>
> >>>> I was just trying to help you propose this stuff in a way that the
> >>>> others will find easy to review.
> >>>>
> >>>> Thanks for updating the KIP. Thanks to the others for helping out with
> >>>> the syntax.
> >>>>
> >>>> Given these updates, I'm curious if anyone else has feedback about
> >>>> this proposal. Personally, I think it sounds fine!
> >>>>
> >>>> -John
> >>>>
> >>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
> >>>>>
> >>>>> Hey,
> >>>>>
> >>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
> >>> snippet. I’ll update KIP again.
> >>>>>
> >>>>> Best,
> >>>>> Daniyar Yeralin
> >>>>>
> >>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
> >>>>>>
> >>>>>> Hi Daniyar,
> >>>>>>
> >>>>>> I think you may want to tweak your syntax a little:
> >>>>>>
> >>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
> >>>>>> return new ListSerde<T>(innerSerde);
> >>>>>> }
> >>>>>>
> >>>>>> Does that work?
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Chris
> >>>>>>
> >>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <mailto:
> >>> dev@yeralin.net>> wrote:
> >>>>>> Hi John,
> >>>>>>
> >>>>>> I updated JIRA and KIP.
> >>>>>>
> >>>>>> I didn’t know about the process, and created PR before I knew about
> >>> KIPs :)
> >>>>>>
> >>>>>> As per static declaration, I don’t think Java allows that:
> >>>>>>
> >>>>>>
> >>>>>> Best,
> >>>>>> Daniyar Yeralin
> >>>>>>
> >>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <mailto:
> >>> john@confluent.io>> wrote:
> >>>>>>>
> >>>>>>> Thanks for that update. Do you mind making changes primarily on the
> >>>>>>> KIP document ? (
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>> <
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>>> )
> >>>>>>>
> >>>>>>> This is the design document that we have to agree on and vote for, the
> >>>>>>> PR comes later. It can be nice to have an implementation to look at,
> >>>>>>> but the KIP is the main artifact for this discussion.
> >>>>>>>
> >>>>>>> With this in mind, it will help get more reviewers to look at it if
> >>>>>>> you can tidy up the KIP document so that it stands on its own. People
> >>>>>>> shouldn't have to look at any other document to understand the
> >>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
> >>>>>>> see what the public API will look like. If it helps, you can take a
> >>>>>>> look at some other recent KIPs.
> >>>>>>>
> >>>>>>> Given that the list serde needs an inner serde, I agree you can't have
> >>>>>>> a zero-argument static factory method for it, but it seems you could
> >>>>>>> still have a static method:
> >>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>>
> >>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <mailto:
> >>> dev@yeralin.net>> wrote:
> >>>>>>>>
> >>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
> >>> argument: https://github.com/apache/kafka/pull/6592 <
> >>> https://github.com/apache/kafka/pull/6592> <
> >>> https://github.com/apache/kafka/pull/6592 <
> >>> https://github.com/apache/kafka/pull/6592>>
> >>>>>>>>
> >>>>>>>> Thank you for your input John! I really appreciate it.
> >>>>>>>>
> >>>>>>>> What about this point I made:
> >>>>>>>>
> >>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
> >>> could not create a static method for List Serde under
> >>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
> >>>>>>>> P.S. Static method corresponding to ListSerde under
> >>> org.apache.kafka.common.serialization.Serdes (something like static public
> >>> Serde<List<T>> List() {...} in org.apache.kafka.common.serialization.Serdes)
> >>> class cannot be added because type needs to be defined beforehand. That's
> >>> why one needs to create List Serde in the following fashion:
> >>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
> >>> Comparator.comparing(String::length));
> >>>>>>>> (can possibly be simplified by declaring import static
> >>> org.apache.kafka.common.serialization.Serdes.ListSerde)
> >>>>>>>>
> >>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io
> >>> <ma...@confluent.io>> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks for the reply Daniyar,
> >>>>>>>>>
> >>>>>>>>> That makes much more sense! I thought I must be missing something,
> >>> but I
> >>>>>>>>> couldn't for the life of me figure it out.
> >>>>>>>>>
> >>>>>>>>> What do you think about just taking an argument, instead of for a
> >>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can
> >>> control
> >>>>>>>>> how exactly the inner data gets serialized, while also bounding the
> >>> generic
> >>>>>>>>> parameter properly. As for the order, since the list is already in a
> >>>>>>>>> specific order, which the user themselves controls, it doesn't seem
> >>>>>>>>> strictly necessary to offer an option to sort the data during
> >>> serialization.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> -John
> >>>>>>>>>
> >>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net
> >>> <ma...@yeralin.net>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi John,
> >>>>>>>>>>
> >>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from
> >>> an old
> >>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while
> >>> editing
> >>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for
> >>> any
> >>>>>>>>>> inconvenience.
> >>>>>>>>>>
> >>>>>>>>>> As per comparator, I agree. Let’s make user be responsible for
> >>>>>>>>>> implementing comparable interface. I was just thinking to make the
> >>> serde a
> >>>>>>>>>> little more flexible (i.e. let user decide in which order records
> >>> is going
> >>>>>>>>>> to be inserted into a change log topic).
> >>>>>>>>>>
> >>>>>>>>>> Thank you!
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io
> >>> <ma...@confluent.io>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Daniyar,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for the proposal!
> >>>>>>>>>>>
> >>>>>>>>>>> If I understand the point about the comparator, is it just to
> >>> capture the
> >>>>>>>>>>> generic type parameter? If so, then anything that implements a
> >>> known
> >>>>>>>>>>> interface would work just as well, right? I've been considering
> >>> adding
> >>>>>>>>>>> something like the Jackson TypeReference (or similar classes in
> >>> many
> >>>>>>>>>> other
> >>>>>>>>>>> projects). Would this be a good time to do it?
> >>>>>>>>>>>
> >>>>>>>>>>> Note that it's not necessary to actually require that the
> >>> captured type
> >>>>>>>>>> is
> >>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to
> >>> make
> >>>>>>>>>> sure
> >>>>>>>>>>> there is some method that makes use of the generic type
> >>> parameter, to
> >>>>>>>>>> force
> >>>>>>>>>>> the compiler to capture the type.
> >>>>>>>>>>>
> >>>>>>>>>>> Just to make sure I understand the motivation... You expressed a
> >>> desire
> >>>>>>>>>> to
> >>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is
> >>> a
> >>>>>>>>>>> built-in UUID serde:
> >>> org.apache.kafka.common.serialization.Serdes#UUID,
> >>>>>>>>>> and
> >>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use
> >>> *lists of*
> >>>>>>>>>>> UUIDs?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> -John
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net
> >>> <ma...@yeralin.net>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hello,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde.
> >>> PR is
> >>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592 <
> >>> https://github.com/apache/kafka/pull/6592> <
> >>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
> >>> https://github.com/apache/kafka/pull/6592>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> There are two topics I would like to discuss:
> >>>>>>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
> >>> could
> >>>>>>>>>> not
> >>>>>>>>>>>> create a static method for List Serde under
> >>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in
> >>> the KIP:
> >>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
> >>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like
> >>> static
> >>>>>>>>>> public
> >>>>>>>>>>>> Serde<List<T>> List() {...}
> >>>>>>>>>> in org.apache.kafka.common.serialization.Serdes)
> >>>>>>>>>>>> class cannot be added because type needs to be defined
> >>> beforehand.
> >>>>>>>>>> That's
> >>>>>>>>>>>> why one needs to create List Serde in the following fashion:
> >>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
> >>>>>>>>>>>> Comparator.comparing(String::length));
> >>>>>>>>>>>> (can possibly be simplified by declaring import static
> >>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno <
> >>> https://github.com/miguno>> is questioning
> >>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This
> >>> certainly
> >>>>>>>>>> is
> >>>>>>>>>>>> not required. Feel free to add your input:
> >>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
> >>> <https://github.com/apache/kafka/pull/6592#discussion_r281152067>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Daniyar Yeralin
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <
> >>> jira@apache.org <ma...@apache.org>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
> >>>>>>>>>>>>> --------------------------------------
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>       Summary: Add List<T> Serde
> >>>>>>>>>>>>>           Key: KAFKA-8326
> >>>>>>>>>>>>>           URL:
> >>> https://issues.apache.org/jira/browse/KAFKA-8326 <
> >>> https://issues.apache.org/jira/browse/KAFKA-8326>
> >>>>>>>>>>>>>       Project: Kafka
> >>>>>>>>>>>>>    Issue Type: Improvement
> >>>>>>>>>>>>>    Components: clients, streams
> >>>>>>>>>>>>>      Reporter: Daniyar Yeralin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I propose adding serializers and deserializers for the
> >>> java.util.List
> >>>>>>>>>>>> class.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I have many use cases where I want to set the key of a Kafka
> >>> message to
> >>>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte
> >>> arrays
> >>>>>>>>>> and
> >>>>>>>>>>>> use their associated Serdes, but it would be more convenient to
> >>>>>>>>>> serialize
> >>>>>>>>>>>> and deserialize UUIDs directly.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I believe there are many use cases where one would want to have
> >>> a List
> >>>>>>>>>>>> serde. Ex. [
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
> >>> <
> >>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
> >>>>
> >>>>>>>>>> ],
> >>>>>>>>>>>> [
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
> >>> <
> >>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
> >>>>
> >>>>>>>>>>>> ]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> KIP Link: [
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>> <
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>>>
> >>>>>>>>>>>> ]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> This message was sent by Atlassian JIRA
> >>>>>>>>>>>>> (v7.6.3#76005)
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>>
> >
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi,

I was trying to add some test cases for the list serde, and it led me to the class `org.apache.kafka.common.serialization.SerializationTest`. I saw that it relies on the method `org.apache.kafka.common.serialization.Serdes.serdeFrom(Class<T> type)`.

Now, I’m not sure how to adapt List<T> serde for this method, since it will be a “nested class”. What is the best approach in this case? 

I remember that in Jackson for example, one uses a TypeFactory, and constructs “collectionType” of two classes. For example, `constructCollectionType(List.class, String.class).getClass()`. I don’t think it applies here.
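
To make the problem concrete, here is a minimal sketch of why a single `Class<T>` token cannot describe `List<String>`. It is only an illustration: `ErasureDemo` and `sameRuntimeClass` are names made up for this example, and it relies on nothing beyond standard Java type erasure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it only shows what a Class token can and cannot carry.
class ErasureDemo {
    // Lists with different element types share one runtime class after erasure.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        // Both lists report java.util.ArrayList; the element type is gone at runtime.
        System.out.println(sameRuntimeClass());
        System.out.println(new ArrayList<String>().getClass().getName());
    }
}
```

So a lookup keyed on one class, which is how I read serdeFrom, has nothing to pick the inner serde from; it would need a second argument or some richer type token.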

Any ideas?

Best,
Daniyar Yeralin

> On May 9, 2019, at 2:10 PM, Development <de...@yeralin.net> wrote:
> 
> Hey Sophie,
> 
> Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.
> 
> Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
> 
> Best,
> Daniyar Yeralin
> 
>> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
>> 
>> Good point about serdes for other Collections. On the one hand I'd guess
>> that non-List Collections are probably relatively rare in practice (if
>> anyone disagrees please correct me!) but on the other hand, a) even if just
>> a small number of people benefit I think it's worth the extra effort and b)
>> if we do end up needing/wanting them in the future it would save us a KIP
>> to just add them now. Personally I feel it would make sense to expand the
>> scope of this KIP a bit to include all Collections as a logical unit, but
>> the ROI could be low..
>> 
>> (I know of at least one instance in the unit tests where a Set serde could
>> be useful, and there may be more)
>> 
>> On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:
>> 
>>> Hey,
>>> 
>>> I don’t see any replies. Seems like this proposal can be finalized and
>>> called for a vote?
>>> 
>>> Also I’ve been thinking. Do we need more serdes for other Collections?
>>> Like queue or set for example
>>> 
>>> Best,
>>> Daniyar Yeralin
>>> 
>>>> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
>>>> 
>>>> Hi Daniyar,
>>>> 
>>>> No worries about the procedural stuff. Prior experience with KIPs is
>>>> not required :)
>>>> 
>>>> I was just trying to help you propose this stuff in a way that the
>>>> others will find easy to review.
>>>> 
>>>> Thanks for updating the KIP. Thanks to the others for helping out with
>>>> the syntax.
>>>> 
>>>> Given these updates, I'm curious if anyone else has feedback about
>>>> this proposal. Personally, I think it sounds fine!
>>>> 
>>>> -John
>>>> 
>>>> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>>>>> 
>>>>> Hey,
>>>>> 
>>>>> That worked! I certainly lack Java generics knowledge. Thanks for the
>>> snippet. I’ll update KIP again.
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>>>>> 
>>>>>> Hi Daniyar,
>>>>>> 
>>>>>> I think you may want to tweak your syntax a little:
>>>>>> 
>>>>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>>>> return new ListSerde<T>(innerSerde);
>>>>>> }
>>>>>> 
>>>>>> Does that work?
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Chris
>>>>>> 
>>>>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <mailto:
>>> dev@yeralin.net>> wrote:
>>>>>> Hi John,
>>>>>> 
>>>>>> I updated JIRA and KIP.
>>>>>> 
>>>>>> I didn’t know about the process, and created PR before I knew about
>>> KIPs :)
>>>>>> 
>>>>>> As per static declaration, I don’t think Java allows that:
>>>>>> 
>>>>>> 
>>>>>> Best,
>>>>>> Daniyar Yeralin
>>>>>> 
>>>>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <mailto:
>>> john@confluent.io>> wrote:
>>>>>>> 
>>>>>>> Thanks for that update. Do you mind making changes primarily on the
>>>>>>> KIP document ? (
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>> <
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>> )
>>>>>>> 
>>>>>>> This is the design document that we have to agree on and vote for, the
>>>>>>> PR comes later. It can be nice to have an implementation to look at,
>>>>>>> but the KIP is the main artifact for this discussion.
>>>>>>> 
>>>>>>> With this in mind, it will help get more reviewers to look at it if
>>>>>>> you can tidy up the KIP document so that it stands on its own. People
>>>>>>> shouldn't have to look at any other document to understand the
>>>>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>>>>> see what the public API will look like. If it helps, you can take a
>>>>>>> look at some other recent KIPs.
>>>>>>> 
>>>>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>>>>> a zero-argument static factory method for it, but it seems you could
>>>>>>> still have a static method:
>>>>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> 
>>>>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <mailto:
>>> dev@yeralin.net>> wrote:
>>>>>>>> 
>>>>>>>> Absolutely agree. Already pushed another commit to remove comparator
>>> argument: https://github.com/apache/kafka/pull/6592 <
>>> https://github.com/apache/kafka/pull/6592> <
>>> https://github.com/apache/kafka/pull/6592 <
>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>> 
>>>>>>>> Thank you for your input John! I really appreciate it.
>>>>>>>> 
>>>>>>>> What about this point I made:
>>>>>>>> 
>>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
>>> could not create a static method for List Serde under
>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>> P.S. Static method corresponding to ListSerde under
>>> org.apache.kafka.common.serialization.Serdes (something like static public
>>> Serde<List<T>> List() {...} in org.apache.kafka.common.serialization.Serdes)
>>> class cannot be added because type needs to be defined beforehand. That's
>>> why one needs to create List Serde in the following fashion:
>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>> Comparator.comparing(String::length));
>>>>>>>> (can possibly be simplified by declaring import static
>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>> 
>>>>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io
>>> <ma...@confluent.io>> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks for the reply Daniyar,
>>>>>>>>> 
>>>>>>>>> That makes much more sense! I thought I must be missing something,
>>> but I
>>>>>>>>> couldn't for the life of me figure it out.
>>>>>>>>> 
>>>>>>>>> What do you think about just taking an argument, instead of for a
>>>>>>>>> Comparator, for the Serde of the inner type? That way, the user can
>>> control
>>>>>>>>> how exactly the inner data gets serialized, while also bounding the
>>> generic
>>>>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>>>>> strictly necessary to offer an option to sort the data during
>>> serialization.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> -John
>>>>>>>>> 
>>>>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net
>>> <ma...@yeralin.net>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi John,
>>>>>>>>>> 
>>>>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from
>>> an old
>>>>>>>>>> one about introducing UUID Serde, and I guess was too hasty while
>>> editing
>>>>>>>>>> the copy to notice the mistake. Just edited the ticket. Sorry for
>>> any
>>>>>>>>>> inconvenience.
>>>>>>>>>> 
>>>>>>>>>> As per comparator, I agree. Let’s make user be responsible for
>>>>>>>>>> implementing comparable interface. I was just thinking to make the
>>> serde a
>>>>>>>>>> little more flexible (i.e. let user decide in which order records
>>> is going
>>>>>>>>>> to be inserted into a change log topic).
>>>>>>>>>> 
>>>>>>>>>> Thank you!
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io
>>> <ma...@confluent.io>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Daniyar,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for the proposal!
>>>>>>>>>>> 
>>>>>>>>>>> If I understand the point about the comparator, is it just to
>>> capture the
>>>>>>>>>>> generic type parameter? If so, then anything that implements a
>>> known
>>>>>>>>>>> interface would work just as well, right? I've been considering
>>> adding
>>>>>>>>>>> something like the Jackson TypeReference (or similar classes in
>>> many
>>>>>>>>>> other
>>>>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>>>>> 
>>>>>>>>>>> Note that it's not necessary to actually require that the
>>> captured type
>>>>>>>>>> is
>>>>>>>>>>> Comparable (as this proposal currently does), it's just a way to
>>> make
>>>>>>>>>> sure
>>>>>>>>>>> there is some method that makes use of the generic type
>>> parameter, to
>>>>>>>>>> force
>>>>>>>>>>> the compiler to capture the type.
>>>>>>>>>>> 
>>>>>>>>>>> Just to make sure I understand the motivation... You expressed a
>>> desire
>>>>>>>>>> to
>>>>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is
>>> a
>>>>>>>>>>> built-in UUID serde:
>>> org.apache.kafka.common.serialization.Serdes#UUID,
>>>>>>>>>> and
>>>>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use
>>> *lists of*
>>>>>>>>>>> UUIDs?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net
>>> <ma...@yeralin.net>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> 
>>>>>>>>>>>> Starting a discussion for KIP-466 adding support for List Serde.
>>> PR is
>>>>>>>>>>>> created under https://github.com/apache/kafka/pull/6592 <
>>> https://github.com/apache/kafka/pull/6592> <
>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592 <
>>> https://github.com/apache/kafka/pull/6592>>
>>>>>>>>>>>> 
>>>>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>>>>> 1. Since type for List serde needs to be declared beforehand, I
>>> could
>>>>>>>>>> not
>>>>>>>>>>>> create a static method for List Serde under
>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in
>>> the KIP:
>>>>>>>>>>>> P.S. Static method corresponding to ListSerde under
>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes (something like
>>> static
>>>>>>>>>> public
>>>>>>>>>>>> Serde<List<T>> List() {...}
>>>>>>>>>> in org.apache.kafka.common.serialization.Serdes)
>>>>>>>>>>>> class cannot be added because type needs to be defined
>>> beforehand.
>>>>>>>>>> That's
>>>>>>>>>>>> why one needs to create List Serde in the following fashion:
>>>>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno <
>>> https://github.com/miguno>> is questioning
>>>>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This
>>> certainly
>>>>>>>>>> is
>>>>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>> <https://github.com/apache/kafka/pull/6592#discussion_r281152067>
>>>>>>>>>>>> 
>>>>>>>>>>>> Thank you!
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Daniyar Yeralin
>>>>>>>>>>>> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hey Sophie,

Thank you for your input. I think I’d rather finish this KIP as is, and then open a new one for the Collections (if everyone agrees). I don’t want to extend the current KIP-466, since most of the work is already done for it.

Meanwhile, I’ll start adding some test cases for this new list serde since this discussion seems to be approaching its logical end.
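For illustration only, a round-trip test for a list serde might look roughly like the sketch below. It assumes a hypothetical `ListSerdeSketch` class with a simple length-prefixed layout; the class, its `of` and `ofStrings` factories, and the byte layout are assumptions made for this example and are not taken from the KIP or the PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a list serde; NOT the KIP-466 implementation or wire format.
final class ListSerdeSketch<T> {
    private final Function<T, byte[]> toBytes;
    private final Function<byte[], T> fromBytes;

    private ListSerdeSketch(Function<T, byte[]> toBytes, Function<byte[], T> fromBytes) {
        this.toBytes = toBytes;
        this.fromBytes = fromBytes;
    }

    // Generic static factory: <T> is declared on the method itself,
    // so the element type is inferred from the inner (de)serializers.
    static <T> ListSerdeSketch<T> of(Function<T, byte[]> toBytes, Function<byte[], T> fromBytes) {
        return new ListSerdeSketch<>(toBytes, fromBytes);
    }

    // Convenience factory for lists of UTF-8 strings.
    static ListSerdeSketch<String> ofStrings() {
        return of((String s) -> s.getBytes(StandardCharsets.UTF_8),
                  (byte[] b) -> new String(b, StandardCharsets.UTF_8));
    }

    // Assumed layout: [element count][len][bytes][len][bytes]...
    byte[] serialize(List<T> data) {
        List<byte[]> encoded = new ArrayList<>();
        int size = Integer.BYTES;
        for (T element : data) {
            byte[] bytes = toBytes.apply(element);
            encoded.add(bytes);
            size += Integer.BYTES + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        buffer.putInt(encoded.size());
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        return buffer.array();
    }

    // Reads the layout written by serialize(); always returns an ArrayList.
    List<T> deserialize(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        int count = buffer.getInt();
        List<T> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] element = new byte[buffer.getInt()];
            buffer.get(element);
            result.add(fromBytes.apply(element));
        }
        return result;
    }
}
```

A test would then serialize a list, deserialize the bytes, and check that the result equals the input, including edge cases such as an empty list or empty elements.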

Best,
Daniyar Yeralin

> On May 9, 2019, at 1:35 PM, Sophie Blee-Goldman <so...@confluent.io> wrote:
> 
> Good point about serdes for other Collections. On the one hand I'd guess
> that non-List Collections are probably relatively rare in practice (if
> anyone disagrees please correct me!) but on the other hand, a) even if just
> a small number of people benefit I think it's worth the extra effort and b)
> if we do end up needing/wanting them in the future it would save us a KIP
> to just add them now. Personally I feel it would make sense to expand the
> scope of this KIP a bit to include all Collections as a logical unit, but
> the ROI could be low..
> 
> (I know of at least one instance in the unit tests where a Set serde could
> be useful, and there may be more)
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Sophie Blee-Goldman <so...@confluent.io>.
Good point about serdes for other Collections. On the one hand I'd guess
that non-List Collections are probably relatively rare in practice (if
anyone disagrees please correct me!) but on the other hand, a) even if just
a small number of people benefit I think it's worth the extra effort and b)
if we do end up needing/wanting them in the future it would save us a KIP
to just add them now. Personally I feel it would make sense to expand the
scope of this KIP a bit to include all Collections as a logical unit, but
the ROI could be low.

(I know of at least one instance in the unit tests where a Set serde could
be useful, and there may be more)
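
As a rough sketch of what treating Collections as one logical unit could mean, the example below parameterizes a serde by a collection factory, so the same code covers a Set as well as a List. Everything in it (the `CollectionSerdeSketch` class, the `stringTreeSet` factory, and the length-prefixed layout) is hypothetical and only meant to illustrate the idea; it is not part of KIP-466.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: names and byte layout are illustrative assumptions.
final class CollectionSerdeSketch<T, C extends Collection<T>> {
    private final Supplier<C> factory;           // creates the target collection on read
    private final Function<T, byte[]> toBytes;   // inner serializer
    private final Function<byte[], T> fromBytes; // inner deserializer

    CollectionSerdeSketch(Supplier<C> factory,
                          Function<T, byte[]> toBytes,
                          Function<byte[], T> fromBytes) {
        this.factory = factory;
        this.toBytes = toBytes;
        this.fromBytes = fromBytes;
    }

    // Example instantiation for a sorted set of UTF-8 strings.
    static CollectionSerdeSketch<String, TreeSet<String>> stringTreeSet() {
        return new CollectionSerdeSketch<String, TreeSet<String>>(
            () -> new TreeSet<String>(),
            (String s) -> s.getBytes(StandardCharsets.UTF_8),
            (byte[] b) -> new String(b, StandardCharsets.UTF_8));
    }

    // Writes elements in the collection's iteration order:
    // [element count][len][bytes][len][bytes]...
    byte[] serialize(C data) {
        List<byte[]> encoded = new ArrayList<>();
        int size = Integer.BYTES;
        for (T element : data) {
            byte[] bytes = toBytes.apply(element);
            encoded.add(bytes);
            size += Integer.BYTES + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        buffer.putInt(encoded.size());
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length);
            buffer.put(bytes);
        }
        return buffer.array();
    }

    // Rebuilds the collection using the caller-supplied factory.
    C deserialize(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        int count = buffer.getInt();
        C result = factory.get();
        for (int i = 0; i < count; i++) {
            byte[] element = new byte[buffer.getInt()];
            buffer.get(element);
            result.add(fromBytes.apply(element));
        }
        return result;
    }
}
```

Letting the caller supply the factory keeps the choice of implementation (for example TreeSet versus HashSet) out of the serialized bytes, at the cost of requiring both sides to agree on it.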

On Thu, May 9, 2019 at 7:27 AM Development <de...@yeralin.net> wrote:

> Hey,
>
> I don’t see any replies. Seems like this proposal can be finalized and
> called for a vote?
>
> Also I’ve been thinking. Do we need more serdes for other Collections?
> Like queue or set for example
>
> Best,
> Daniyar Yeralin
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hey,

I don’t see any further replies. It seems this proposal can be finalized and put to a vote?

I’ve also been wondering: do we need serdes for other Collections, such as Queue or Set?

Best,
Daniyar Yeralin

> On May 8, 2019, at 2:28 PM, John Roesler <jo...@confluent.io> wrote:
> 
> Hi Daniyar,
> 
> No worries about the procedural stuff. Prior experience with KIPs is
> not required :)
> 
> I was just trying to help you propose this stuff in a way that the
> others will find easy to review.
> 
> Thanks for updating the KIP. Thanks to the others for helping out with
> the syntax.
> 
> Given these updates, I'm curious if anyone else has feedback about
> this proposal. Personally, I think it sounds fine!
> 
> -John
> 
> On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>> 
>> Hey,
>> 
>> That worked! I certainly lack Java generics knowledge. Thanks for the snippet. I’ll update KIP again.
>> 
>> Best,
>> Daniyar Yeralin
>> 
>>> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
>>> 
>>> Hi Daniyar,
>>> 
>>> I think you may want to tweak your syntax a little:
>>> 
>>> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>>>  return new ListSerde<T>(innerSerde);
>>> }
>>> 
>>> Does that work?
>>> 
>>> Cheers,
>>> 
>>> Chris
>>> 
>>> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net> wrote:
>>> Hi John,
>>> 
>>> I updated JIRA and KIP.
>>> 
>>> I didn’t know about the process, and created PR before I knew about KIPs :)
>>> 
>>> As per static declaration, I don’t think Java allows that:
>>> 
>>> 
>>> Best,
>>> Daniyar Yeralin
>>> 
>>>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io> wrote:
>>>> 
>>>> Thanks for that update. Do you mind making changes primarily on the
>>>> KIP document? (https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization)
>>>> 
>>>> This is the design document that we have to agree on and vote for, the
>>>> PR comes later. It can be nice to have an implementation to look at,
>>>> but the KIP is the main artifact for this discussion.
>>>> 
>>>> With this in mind, it will help get more reviewers to look at it if
>>>> you can tidy up the KIP document so that it stands on its own. People
>>>> shouldn't have to look at any other document to understand the
>>>> motivation of the proposal, and they shouldn't have to look at a PR to
>>>> see what the public API will look like. If it helps, you can take a
>>>> look at some other recent KIPs.
>>>> 
>>>> Given that the list serde needs an inner serde, I agree you can't have
>>>> a zero-argument static factory method for it, but it seems you could
>>>> still have a static method:
>>>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>>>> 
>>>> Thoughts?
>>>> 
>>>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net> wrote:
>>>>> 
>>>>> Absolutely agree. I already pushed another commit to remove the comparator argument: https://github.com/apache/kafka/pull/6592
>>>>> 
>>>>> Thank you for your input John! I really appreciate it.
>>>>> 
>>>>> What about this point I made:
>>>>> 
>>>>> 1. Since the type for the List serde needs to be declared beforehand, I could not create a static method for the List serde under org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>> P.S. A static method corresponding to ListSerde under org.apache.kafka.common.serialization.Serdes (something like static public Serde<List<T>> List() {...} in the org.apache.kafka.common.serialization.Serdes class) cannot be added because the type needs to be defined beforehand. That's why one needs to create the List serde in the following fashion:
>>>>> new Serdes.ListSerde<String>(Serdes.String(), Comparator.comparing(String::length));
>>>>> (can possibly be simplified by declaring import static org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>> 
>>>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io> wrote:
>>>>>> 
>>>>>> Thanks for the reply Daniyar,
>>>>>> 
>>>>>> That makes much more sense! I thought I must be missing something, but I
>>>>>> couldn't for the life of me figure it out.
>>>>>> 
>>>>>> What do you think about taking the Serde of the inner type as an
>>>>>> argument, instead of a Comparator? That way, the user can control
>>>>>> how exactly the inner data gets serialized, while also bounding the generic
>>>>>> parameter properly. As for the order, since the list is already in a
>>>>>> specific order, which the user themselves controls, it doesn't seem
>>>>>> strictly necessary to offer an option to sort the data during serialization.
>>>>>> 
>>>>>> Thanks,
>>>>>> -John
>>>>>> 
>>>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
>>>>>> 
>>>>>>> Hi John,
>>>>>>> 
>>>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from an old
>>>>>>> one about introducing a UUID Serde, and I guess I was too hasty while editing
>>>>>>> the copy to notice the mistake. I just edited the ticket. Sorry for any
>>>>>>> inconvenience.
>>>>>>> 
>>>>>>> As for the comparator, I agree. Let’s make the user responsible for
>>>>>>> implementing the Comparable interface. I was just trying to make the serde a
>>>>>>> little more flexible (i.e. let the user decide in which order records are
>>>>>>> inserted into a changelog topic).
>>>>>>> 
>>>>>>> Thank you!
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>> 
>>>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io <ma...@confluent.io>> wrote:
>>>>>>>> 
>>>>>>>> Hi Daniyar,
>>>>>>>> 
>>>>>>>> Thanks for the proposal!
>>>>>>>> 
>>>>>>>> If I understand the point about the comparator, is it just to capture the
>>>>>>>> generic type parameter? If so, then anything that implements a known
>>>>>>>> interface would work just as well, right? I've been considering adding
>>>>>>>> something like the Jackson TypeReference (or similar classes in many other
>>>>>>>> projects). Would this be a good time to do it?
>>>>>>>> 
>>>>>>>> Note that it's not necessary to actually require that the captured type is
>>>>>>>> Comparable (as this proposal currently does), it's just a way to make sure
>>>>>>>> there is some method that makes use of the generic type parameter, to force
>>>>>>>> the compiler to capture the type.
>>>>>>>> 
>>>>>>>> Just to make sure I understand the motivation... You expressed a desire to
>>>>>>>> be able to serialize UUIDs, which I didn't follow, since there is a
>>>>>>>> built-in UUID serde: org.apache.kafka.common.serialization.Serdes#UUID, and
>>>>>>>> also, a UUID isn't a List. Did you mean that you need to use *lists of*
>>>>>>>> UUIDs?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> -John
>>>>>>>> 
>>>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
>>>>>>>> 
>>>>>>>>> Hello,
>>>>>>>>> 
>>>>>>>>> Starting a discussion for KIP-466, adding support for a List Serde. The PR is
>>>>>>>>> created under https://github.com/apache/kafka/pull/6592
>>>>>>>>> 
>>>>>>>>> There are two topics I would like to discuss:
>>>>>>>>> 1. Since the type for the List serde needs to be declared beforehand, I could
>>>>>>>>> not create a static method for the List serde under
>>>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>>>> P.S. A static method corresponding to ListSerde (something like static
>>>>>>>>> public Serde<List<T>> List() {...}) cannot be added to the
>>>>>>>>> org.apache.kafka.common.serialization.Serdes class because the type needs
>>>>>>>>> to be defined beforehand. That's why one needs to create the List serde in
>>>>>>>>> the following fashion:
>>>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>>>> Comparator.comparing(String::length));
>>>>>>>>> (can possibly be simplified by declaring import static
>>>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>>>> 
>>>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno> is questioning
>>>>>>>>> whether I need to pass a comparator to ListDeserializer. This certainly is
>>>>>>>>> not required. Feel free to add your input:
>>>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>>>>>> 
>>>>>>>>> Thank you!
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Daniyar Yeralin
>>>>>>>>> 
>>>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <jira@apache.org <ma...@apache.org>>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>>>> --------------------------------------
>>>>>>>>>> 
>>>>>>>>>>         Summary: Add List<T> Serde
>>>>>>>>>>             Key: KAFKA-8326
>>>>>>>>>>             URL: https://issues.apache.org/jira/browse/KAFKA-8326 <https://issues.apache.org/jira/browse/KAFKA-8326>
>>>>>>>>>>         Project: Kafka
>>>>>>>>>>      Issue Type: Improvement
>>>>>>>>>>      Components: clients, streams
>>>>>>>>>>        Reporter: Daniyar Yeralin
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I propose adding serializers and deserializers for the java.util.List class.
>>>>>>>>>> 
>>>>>>>>>> I have many use cases where I want to set the key of a Kafka message to
>>>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte arrays and
>>>>>>>>>> use their associated Serdes, but it would be more convenient to serialize
>>>>>>>>>> and deserialize UUIDs directly.
>>>>>>>>>> 
>>>>>>>>>> I believe there are many use cases where one would want to have a List
>>>>>>>>>> serde. Ex. [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows],
>>>>>>>>>> [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api]
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> KIP Link: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization]
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>>>> (v7.6.3#76005)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Hi Daniyar,

No worries about the procedural stuff. Prior experience with KIPs is
not required :)

I was just trying to help you propose this stuff in a way that the
others will find easy to review.

Thanks for updating the KIP. Thanks to the others for helping out with
the syntax.

Given these updates, I'm curious if anyone else has feedback about
this proposal. Personally, I think it sounds fine!

-John

On Wed, May 8, 2019 at 1:01 PM Development <de...@yeralin.net> wrote:
>
> Hey,
>
> That worked! I certainly lack Java generics knowledge. Thanks for the snippet. I’ll update the KIP again.
>
> Best,
> Daniyar Yeralin
>
> > On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
> >
> > Hi Daniyar,
> >
> > I think you may want to tweak your syntax a little:
> >
> > public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
> >   return new ListSerde<T>(innerSerde);
> > }
> >
> > Does that work?
> >
> > Cheers,
> >
> > Chris
> >
> > On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
> > Hi John,
> >
> > I updated JIRA and KIP.
> >
> > I didn’t know about the process, and created PR before I knew about KIPs :)
> >
> > As per static declaration, I don’t think Java allows that:
> >
> >
> > Best,
> > Daniyar Yeralin
> >
> >> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <ma...@confluent.io>> wrote:
> >>
> >> Thanks for that update. Do you mind making changes primarily on the
> >> KIP document? (https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization)
> >>
> >> This is the design document that we have to agree on and vote for, the
> >> PR comes later. It can be nice to have an implementation to look at,
> >> but the KIP is the main artifact for this discussion.
> >>
> >> With this in mind, it will help get more reviewers to look at it if
> >> you can tidy up the KIP document so that it stands on its own. People
> >> shouldn't have to look at any other document to understand the
> >> motivation of the proposal, and they shouldn't have to look at a PR to
> >> see what the public API will look like. If it helps, you can take a
> >> look at some other recent KIPs.
> >>
> >> Given that the list serde needs an inner serde, I agree you can't have
> >> a zero-argument static factory method for it, but it seems you could
> >> still have a static method:
> >> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
> >>
> >> Thoughts?
> >>
> >> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
> >>>
> >>> Absolutely agree. Already pushed another commit to remove comparator argument: https://github.com/apache/kafka/pull/6592
> >>>
> >>> Thank you for your input John! I really appreciate it.
> >>>
> >>> What about this point I made:
> >>>
> >>> 1. Since the type for the List serde needs to be declared beforehand, I could not create a static method for the List serde under org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
> >>> P.S. A static method corresponding to ListSerde (something like static public Serde<List<T>> List() {...}) cannot be added to the org.apache.kafka.common.serialization.Serdes class because the type needs to be defined beforehand. That's why one needs to create the List serde in the following fashion:
> >>> new Serdes.ListSerde<String>(Serdes.String(), Comparator.comparing(String::length));
> >>> (can possibly be simplified by declaring import static org.apache.kafka.common.serialization.Serdes.ListSerde)
> >>>
> >>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io <ma...@confluent.io>> wrote:
> >>>>
> >>>> Thanks for the reply Daniyar,
> >>>>
> >>>> That makes much more sense! I thought I must be missing something, but I
> >>>> couldn't for the life of me figure it out.
> >>>>
> >>>> What do you think about just taking an argument, instead of for a
> >>>> Comparator, for the Serde of the inner type? That way, the user can control
> >>>> how exactly the inner data gets serialized, while also bounding the generic
> >>>> parameter properly. As for the order, since the list is already in a
> >>>> specific order, which the user themselves controls, it doesn't seem
> >>>> strictly necessary to offer an option to sort the data during serialization.
> >>>>
> >>>> Thanks,
> >>>> -John
> >>>>
> >>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
> >>>>
> >>>>> Hi John,
> >>>>>
> >>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from an old
> >>>>> one about introducing a UUID Serde, and I guess I was too hasty while editing
> >>>>> the copy to notice the mistake. I just edited the ticket. Sorry for any
> >>>>> inconvenience.
> >>>>>
> >>>>> As for the comparator, I agree. Let’s make the user responsible for
> >>>>> implementing the Comparable interface. I was just trying to make the serde a
> >>>>> little more flexible (i.e. let the user decide in which order records are
> >>>>> inserted into a changelog topic).
> >>>>>
> >>>>> Thank you!
> >>>>>
> >>>>> Best,
> >>>>> Daniyar Yeralin
> >>>>>
> >>>>>
> >>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io <ma...@confluent.io>> wrote:
> >>>>>>
> >>>>>> Hi Daniyar,
> >>>>>>
> >>>>>> Thanks for the proposal!
> >>>>>>
> >>>>>> If I understand the point about the comparator, is it just to capture the
> >>>>>> generic type parameter? If so, then anything that implements a known
> >>>>>> interface would work just as well, right? I've been considering adding
> >>>>>> something like the Jackson TypeReference (or similar classes in many other
> >>>>>> projects). Would this be a good time to do it?
> >>>>>>
> >>>>>> Note that it's not necessary to actually require that the captured type is
> >>>>>> Comparable (as this proposal currently does), it's just a way to make sure
> >>>>>> there is some method that makes use of the generic type parameter, to force
> >>>>>> the compiler to capture the type.
> >>>>>>
> >>>>>> Just to make sure I understand the motivation... You expressed a desire to
> >>>>>> be able to serialize UUIDs, which I didn't follow, since there is a
> >>>>>> built-in UUID serde: org.apache.kafka.common.serialization.Serdes#UUID, and
> >>>>>> also, a UUID isn't a List. Did you mean that you need to use *lists of*
> >>>>>> UUIDs?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> -John
> >>>>>>
> >>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> Starting a discussion for KIP-466, adding support for a List Serde. The PR is
> >>>>>>> created under https://github.com/apache/kafka/pull/6592
> >>>>>>>
> >>>>>>> There are two topics I would like to discuss:
> >>>>>>> 1. Since the type for the List serde needs to be declared beforehand, I could
> >>>>>>> not create a static method for the List serde under
> >>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
> >>>>>>> P.S. A static method corresponding to ListSerde (something like static
> >>>>>>> public Serde<List<T>> List() {...}) cannot be added to the
> >>>>>>> org.apache.kafka.common.serialization.Serdes class because the type needs
> >>>>>>> to be defined beforehand. That's why one needs to create the List serde in
> >>>>>>> the following fashion:
> >>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
> >>>>>>> Comparator.comparing(String::length));
> >>>>>>> (can possibly be simplified by declaring import static
> >>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
> >>>>>>>
> >>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno> is questioning
> >>>>>>> whether I need to pass a comparator to ListDeserializer. This certainly is
> >>>>>>> not required. Feel free to add your input:
> >>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
> >>>>>>>
> >>>>>>> Thank you!
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Daniyar Yeralin
> >>>>>>>
> >>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <jira@apache.org <ma...@apache.org>>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Daniyar Yeralin created KAFKA-8326:
> >>>>>>>> --------------------------------------
> >>>>>>>>
> >>>>>>>>          Summary: Add List<T> Serde
> >>>>>>>>              Key: KAFKA-8326
> >>>>>>>>              URL: https://issues.apache.org/jira/browse/KAFKA-8326 <https://issues.apache.org/jira/browse/KAFKA-8326>
> >>>>>>>>          Project: Kafka
> >>>>>>>>       Issue Type: Improvement
> >>>>>>>>       Components: clients, streams
> >>>>>>>>         Reporter: Daniyar Yeralin
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I propose adding serializers and deserializers for the java.util.List class.
> >>>>>>>>
> >>>>>>>> I have many use cases where I want to set the key of a Kafka message to
> >>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte arrays and
> >>>>>>>> use their associated Serdes, but it would be more convenient to serialize
> >>>>>>>> and deserialize UUIDs directly.
> >>>>>>>>
> >>>>>>>> I believe there are many use cases where one would want to have a List
> >>>>>>>> serde. Ex. [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows],
> >>>>>>>> [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> KIP Link: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> This message was sent by Atlassian JIRA
> >>>>>>>> (v7.6.3#76005)
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>
> >
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hey,

That worked! I certainly lack Java generics knowledge. Thanks for the snippet. I’ll update the KIP again.

Best,
Daniyar Yeralin

> On May 8, 2019, at 1:39 PM, Chris Egerton <ch...@confluent.io> wrote:
> 
> Hi Daniyar,
> 
> I think you may want to tweak your syntax a little:
> 
> public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
>   return new ListSerde<T>(innerSerde);
> }
> 
> Does that work?
> 
> Cheers,
> 
> Chris
> 
> On Wed, May 8, 2019 at 10:29 AM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
> Hi John,
> 
> I updated JIRA and KIP.
> 
> I didn’t know about the process, and created PR before I knew about KIPs :) 
> 
> As per static declaration, I don’t think Java allows that:
> 
> 
> Best,
> Daniyar Yeralin
> 
>> On May 7, 2019, at 2:22 PM, John Roesler <john@confluent.io <ma...@confluent.io>> wrote:
>> 
>> Thanks for that update. Do you mind making changes primarily on the
>> KIP document? (https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization)
>> 
>> This is the design document that we have to agree on and vote for, the
>> PR comes later. It can be nice to have an implementation to look at,
>> but the KIP is the main artifact for this discussion.
>> 
>> With this in mind, it will help get more reviewers to look at it if
>> you can tidy up the KIP document so that it stands on its own. People
>> shouldn't have to look at any other document to understand the
>> motivation of the proposal, and they shouldn't have to look at a PR to
>> see what the public API will look like. If it helps, you can take a
>> look at some other recent KIPs.
>> 
>> Given that the list serde needs an inner serde, I agree you can't have
>> a zero-argument static factory method for it, but it seems you could
>> still have a static method:
>> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>> 
>> Thoughts?
>> 
>> On Tue, May 7, 2019 at 12:18 PM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
>>> 
>>> Absolutely agree. Already pushed another commit to remove comparator argument: https://github.com/apache/kafka/pull/6592
>>> 
>>> Thank you for your input John! I really appreciate it.
>>> 
>>> What about this point I made:
>>> 
>>> 1. Since the type for the List serde needs to be declared beforehand, I could not create a static method for the List serde under org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>> P.S. A static method corresponding to ListSerde (something like static public Serde<List<T>> List() {...}) cannot be added to the org.apache.kafka.common.serialization.Serdes class because the type needs to be defined beforehand. That's why one needs to create the List serde in the following fashion:
>>> new Serdes.ListSerde<String>(Serdes.String(), Comparator.comparing(String::length));
>>> (can possibly be simplified by declaring import static org.apache.kafka.common.serialization.Serdes.ListSerde)
>>> 
>>>> On May 7, 2019, at 11:50 AM, John Roesler <john@confluent.io <ma...@confluent.io>> wrote:
>>>> 
>>>> Thanks for the reply Daniyar,
>>>> 
>>>> That makes much more sense! I thought I must be missing something, but I
>>>> couldn't for the life of me figure it out.
>>>> 
>>>> What do you think about just taking an argument, instead of for a
>>>> Comparator, for the Serde of the inner type? That way, the user can control
>>>> how exactly the inner data gets serialized, while also bounding the generic
>>>> parameter properly. As for the order, since the list is already in a
>>>> specific order, which the user themselves controls, it doesn't seem
>>>> strictly necessary to offer an option to sort the data during serialization.
>>>> 
>>>> Thanks,
>>>> -John
>>>> 
>>>> On Mon, May 6, 2019 at 8:47 PM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
>>>> 
>>>>> Hi John,
>>>>> 
>>>>> I’m really sorry for the confusion. I cloned that JIRA ticket from an old
>>>>> one about introducing a UUID Serde, and I guess I was too hasty while editing
>>>>> the copy to notice the mistake. I just edited the ticket. Sorry for any
>>>>> inconvenience.
>>>>> 
>>>>> As for the comparator, I agree. Let’s make the user responsible for
>>>>> implementing the Comparable interface. I was just trying to make the serde a
>>>>> little more flexible (i.e. let the user decide in which order records are
>>>>> inserted into a changelog topic).
>>>>> 
>>>>> Thank you!
>>>>> 
>>>>> Best,
>>>>> Daniyar Yeralin
>>>>> 
>>>>> 
>>>>>> On May 6, 2019, at 5:37 PM, John Roesler <john@confluent.io <ma...@confluent.io>> wrote:
>>>>>> 
>>>>>> Hi Daniyar,
>>>>>> 
>>>>>> Thanks for the proposal!
>>>>>> 
>>>>>> If I understand the point about the comparator, is it just to capture the
>>>>>> generic type parameter? If so, then anything that implements a known
>>>>>> interface would work just as well, right? I've been considering adding
>>>>>> something like the Jackson TypeReference (or similar classes in many other
>>>>>> projects). Would this be a good time to do it?
>>>>>> 
>>>>>> Note that it's not necessary to actually require that the captured type is
>>>>>> Comparable (as this proposal currently does), it's just a way to make sure
>>>>>> there is some method that makes use of the generic type parameter, to force
>>>>>> the compiler to capture the type.
>>>>>> 
>>>>>> Just to make sure I understand the motivation... You expressed a desire to
>>>>>> be able to serialize UUIDs, which I didn't follow, since there is a
>>>>>> built-in UUID serde: org.apache.kafka.common.serialization.Serdes#UUID, and
>>>>>> also, a UUID isn't a List. Did you mean that you need to use *lists of*
>>>>>> UUIDs?
>>>>>> 
>>>>>> Thanks,
>>>>>> -John
>>>>>> 
>>>>>> On Mon, May 6, 2019 at 11:49 AM Development <dev@yeralin.net <ma...@yeralin.net>> wrote:
>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> Starting a discussion for KIP-466, adding support for a List Serde. The PR is
>>>>>>> created under https://github.com/apache/kafka/pull/6592
>>>>>>> 
>>>>>>> There are two topics I would like to discuss:
>>>>>>> 1. Since the type for the List serde needs to be declared beforehand, I could
>>>>>>> not create a static method for the List serde under
>>>>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>>>>> P.S. A static method corresponding to ListSerde (something like static
>>>>>>> public Serde<List<T>> List() {...}) cannot be added to the
>>>>>>> org.apache.kafka.common.serialization.Serdes class because the type needs
>>>>>>> to be defined beforehand. That's why one needs to create the List serde in
>>>>>>> the following fashion:
>>>>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>>>>> Comparator.comparing(String::length));
>>>>>>> (can possibly be simplified by declaring import static
>>>>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>>>>> 
>>>>>>> 2. @miguno Michael G. Noll <https://github.com/miguno> is questioning
>>>>>>> whether I need to pass a comparator to ListDeserializer. This certainly is
>>>>>>> not required. Feel free to add your input:
>>>>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>>>>> 
>>>>>>> Thank you!
>>>>>>> 
>>>>>>> Best,
>>>>>>> Daniyar Yeralin
>>>>>>> 
>>>>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <jira@apache.org <ma...@apache.org>>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>>>>> --------------------------------------
>>>>>>>> 
>>>>>>>>          Summary: Add List<T> Serde
>>>>>>>>              Key: KAFKA-8326
>>>>>>>>              URL: https://issues.apache.org/jira/browse/KAFKA-8326 <https://issues.apache.org/jira/browse/KAFKA-8326>
>>>>>>>>          Project: Kafka
>>>>>>>>       Issue Type: Improvement
>>>>>>>>       Components: clients, streams
>>>>>>>>         Reporter: Daniyar Yeralin
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I propose adding serializers and deserializers for the java.util.List class.
>>>>>>>> 
>>>>>>>> I have many use cases where I want to set the key of a Kafka message to
>>>>>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte arrays and
>>>>>>>> use their associated Serdes, but it would be more convenient to serialize
>>>>>>>> and deserialize UUIDs directly.
>>>>>>>> 
>>>>>>>> I believe there are many use cases where one would want to have a List
>>>>>>>> serde. Ex. [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows],
>>>>>>>> [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api]
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> KIP Link: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization]
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> This message was sent by Atlassian JIRA
>>>>>>>> (v7.6.3#76005)
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Chris Egerton <ch...@confluent.io>.
Hi Daniyar,

I think you may want to tweak your syntax a little:

public static <T> Serde<List<T>> List(Serde<T> innerSerde) {
  return new ListSerde<T>(innerSerde);
}

Does that work?

Cheers,

Chris
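
For readers following along, the difference between the two signatures in this thread is the leading <T>: a static method cannot see a class-level type parameter, so it has to declare its own before the return type. The sketch below illustrates that without depending on Kafka. SimpleSerde, StringSerde, listOf and the length-prefixed byte layout are all hypothetical stand-ins made up for this example; they are not Kafka's Serde interface and not the wire format proposed in KIP-466.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ListSerdeSketch {

    // Hypothetical stand-in for a serde; Kafka's real Serde exposes
    // serializer()/deserializer() rather than these two methods.
    interface SimpleSerde<T> {
        byte[] serialize(T value);
        T deserialize(byte[] bytes);
    }

    // Hypothetical inner serde for strings (UTF-8 bytes).
    static final class StringSerde implements SimpleSerde<String> {
        public byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }
        public String deserialize(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // List serde that delegates each element to the inner serde.
    // Assumed layout: element count, then (length, bytes) per element.
    static final class ListSerde<T> implements SimpleSerde<List<T>> {
        private final SimpleSerde<T> inner;

        ListSerde(SimpleSerde<T> inner) {
            this.inner = inner;
        }

        public byte[] serialize(List<T> values) {
            List<byte[]> encoded = new ArrayList<>();
            int size = Integer.BYTES;
            for (T value : values) {
                byte[] bytes = inner.serialize(value);
                encoded.add(bytes);
                size += Integer.BYTES + bytes.length;
            }
            ByteBuffer buffer = ByteBuffer.allocate(size);
            buffer.putInt(values.size());
            for (byte[] bytes : encoded) {
                buffer.putInt(bytes.length);
                buffer.put(bytes);
            }
            return buffer.array();
        }

        public List<T> deserialize(byte[] bytes) {
            ByteBuffer buffer = ByteBuffer.wrap(bytes);
            int count = buffer.getInt();
            List<T> result = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                byte[] element = new byte[buffer.getInt()];
                buffer.get(element);
                result.add(inner.deserialize(element));
            }
            return result;
        }
    }

    // The <T> before the return type declares a method-level type parameter.
    // Without it, T is an unknown symbol inside a static method.
    static <T> SimpleSerde<List<T>> listOf(SimpleSerde<T> inner) {
        return new ListSerde<>(inner);
    }

    // Serializes and deserializes a list of strings through the factory.
    static List<String> roundTrip(List<String> input) {
        SimpleSerde<List<String>> serde = listOf(new StringSerde());
        return serde.deserialize(serde.serialize(input));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("a", "bc", "")));
    }
}
```

At the call site the compiler infers T from the inner serde argument, so no explicit type argument or comparator is needed to pin down the element type.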

On Wed, May 8, 2019 at 10:29 AM Development <de...@yeralin.net> wrote:

> Hi John,
>
> I updated JIRA and KIP.
>
> I didn’t know about the process, and created PR before I knew about KIPs
> :)
>
> As per static declaration, I don’t think Java allows that:
>
> Best,
> Daniyar Yeralin
>
> On May 7, 2019, at 2:22 PM, John Roesler <jo...@confluent.io> wrote:
>
> Thanks for that update. Do you mind making changes primarily on the
> KIP document? (https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization)
>
> This is the design document that we have to agree on and vote for, the
> PR comes later. It can be nice to have an implementation to look at,
> but the KIP is the main artifact for this discussion.
>
> With this in mind, it will help get more reviewers to look at it if
> you can tidy up the KIP document so that it stands on its own. People
> shouldn't have to look at any other document to understand the
> motivation of the proposal, and they shouldn't have to look at a PR to
> see what the public API will look like. If it helps, you can take a
> look at some other recent KIPs.
>
> Given that the list serde needs an inner serde, I agree you can't have
> a zero-argument static factory method for it, but it seems you could
> still have a static method:
> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
>
> Thoughts?
>
> On Tue, May 7, 2019 at 12:18 PM Development <de...@yeralin.net> wrote:
>
>
> Absolutely agree. Already pushed another commit to remove comparator
> argument: https://github.com/apache/kafka/pull/6592
>
> Thank you for your input John! I really appreciate it.
>
> What about this point I made:
>
> 1. Since the type for the List serde needs to be declared beforehand, I could
> not create a static method for the List serde under
> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
> P.S. A static method corresponding to ListSerde (something like static public
> Serde<List<T>> List() {...}) cannot be added to the
> org.apache.kafka.common.serialization.Serdes class because the type needs to
> be defined beforehand. That's why one needs to create the List serde in the
> following fashion:
> new Serdes.ListSerde<String>(Serdes.String(),
> Comparator.comparing(String::length));
> (can possibly be simplified by declaring import static
> org.apache.kafka.common.serialization.Serdes.ListSerde)
>
> On May 7, 2019, at 11:50 AM, John Roesler <jo...@confluent.io> wrote:
>
> Thanks for the reply Daniyar,
>
> That makes much more sense! I thought I must be missing something, but I
> couldn't for the life of me figure it out.
>
> What do you think about just taking an argument, instead of for a
> Comparator, for the Serde of the inner type? That way, the user can control
> how exactly the inner data gets serialized, while also bounding the generic
> parameter properly. As for the order, since the list is already in a
> specific order, which the user themselves controls, it doesn't seem
> strictly necessary to offer an option to sort the data during
> serialization.
>
> Thanks,
> -John
>
> On Mon, May 6, 2019 at 8:47 PM Development <de...@yeralin.net> wrote:
>
> Hi John,
>
> I’m really sorry for the confusion. I cloned that JIRA ticket from an old
> one about introducing a UUID Serde, and I guess I was too hasty while editing
> the copy to notice the mistake. I just edited the ticket. Sorry for any
> inconvenience.
>
> As for the comparator, I agree. Let’s make the user responsible for
> implementing the Comparable interface. I was just trying to make the serde a
> little more flexible (i.e. let the user decide in which order records are
> inserted into a changelog topic).
>
> Thank you!
>
> Best,
> Daniyar Yeralin
>
>
> On May 6, 2019, at 5:37 PM, John Roesler <jo...@confluent.io> wrote:
>
> Hi Daniyar,
>
> Thanks for the proposal!
>
> If I understand the point about the comparator, is it just to capture the
> generic type parameter? If so, then anything that implements a known
> interface would work just as well, right? I've been considering adding
> something like the Jackson TypeReference (or similar classes in many other
> projects). Would this be a good time to do it?
>
> Note that it's not necessary to actually require that the captured type is
> Comparable (as this proposal currently does), it's just a way to make sure
> there is some method that makes use of the generic type parameter, to force
> the compiler to capture the type.
>
> Just to make sure I understand the motivation... You expressed a desire to
> be able to serialize UUIDs, which I didn't follow, since there is a
> built-in UUID serde: org.apache.kafka.common.serialization.Serdes#UUID, and
> also, a UUID isn't a List. Did you mean that you need to use *lists of*
> UUIDs?
>
> Thanks,
> -John
>
> On Mon, May 6, 2019 at 11:49 AM Development <de...@yeralin.net> wrote:
>
> Hello,
>
> Starting a discussion for KIP-466 adding support for List Serde. PR is
> created under https://github.com/apache/kafka/pull/6592 <
> https://github.com/apache/kafka/pull/6592>
>
> There are two topics I would like to discuss:
> 1. Since type for List serve needs to be declared before hand, I could
>
> not
>
> create a static method for List Serde under
> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
> P.S. Static method corresponding to ListSerde under
> org.apache.kafka.common.serialization.Serdes (something like static
>
> public
>
> Serde<List<T>> List() {...}
>
> inorg.apache.kafka.common.serialization.Serdes)
>
> class cannot be added because type needs to be defined beforehand.
>
> That's
>
> why one needs to create List Serde in the following fashion:
> new Serdes.ListSerde<String>(Serdes.String(),
> Comparator.comparing(String::length));
> (can possibly be simplified by declaring import static
> org.apache.kafka.common.serialization.Serdes.ListSerde)
>
> 2. @miguno Michael G. Noll <https://github.com/miguno> is questioning
> whether I need to pass a comparator to ListDeserializer. This certainly
>
> is
>
> not required. Feel free to add your input:
> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>
> Thank you!
>
> Best,
> Daniyar Yeralin
>
> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <ji...@apache.org>
>
> wrote:
>
>
> Daniyar Yeralin created KAFKA-8326:
> --------------------------------------
>
>          Summary: Add List<T> Serde
>              Key: KAFKA-8326
>              URL: https://issues.apache.org/jira/browse/KAFKA-8326
>          Project: Kafka
>       Issue Type: Improvement
>       Components: clients, streams
>         Reporter: Daniyar Yeralin
>
>
> I propose adding serializers and deserializers for the java.util.List
>
> class.
>
>
> I have many use cases where I want to set the key of a Kafka message to
>
> be a UUID. Currently, I need to turn UUIDs into strings or byte arrays
>
> and
>
> use their associated Serdes, but it would be more convenient to
>
> serialize
>
> and deserialize UUIDs directly.
>
>
> I believe there are many use cases where one would want to have a List
>
> serde. Ex. [
>
>
> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
> ],
>
> [
>
>
> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>
> ]
>
>
>
>
> KIP Link: [
>
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>
> ]
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
>
>
>
>
>
>
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Sophie Blee-Goldman <so...@confluent.io>.
Hi Daniyar,

Thanks for the KIP! I had to write my own List serde for testing a while
back, and this definitely would have saved me some time. Regarding the
static declaration, I believe you're missing a <T> between "public static"
and the return type "Serde<List<T>>". With that type parameter declared on
the method, Java should allow this.
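
As a minimal sketch of that point (not the KIP's actual API), the Java below
uses a hypothetical SketchSerde interface as a stand-in for Kafka's Serde so
that it compiles without Kafka on the classpath. SketchSerdes, string() and
list() are illustrative names only. The part that matters is the <T> declared
before the return type of the static factory.

```java
import java.util.List;

// Hypothetical stand-in for org.apache.kafka.common.serialization.Serde,
// reduced to one method so the example compiles without Kafka.
interface SketchSerde<T> {
    String describe();
}

final class SketchSerdes {
    private SketchSerdes() { }

    // Stand-in for Serdes.String().
    static SketchSerde<String> string() {
        return () -> "String";
    }

    // The leading <T> makes this a generic method. Without it the compiler
    // cannot resolve T, because a static method has no access to type
    // parameters it does not declare itself.
    static <T> SketchSerde<List<T>> list(SketchSerde<T> inner) {
        return () -> "List<" + inner.describe() + ">";
    }
}

class GenericFactoryDemo {
    public static void main(String[] args) {
        // T is inferred as String from the argument.
        SketchSerde<List<String>> serde = SketchSerdes.list(SketchSerdes.string());
        System.out.println(serde.describe()); // prints "List<String>"
    }
}
```

Because T is inferred at the call site, callers would not normally need to
spell out the element type, and the factory composes for nested lists.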

Cheers,
Sophie

On Wed, May 8, 2019 at 10:29 AM Development <de...@yeralin.net> wrote:

> Hi John,
>
> I updated the JIRA ticket and the KIP.
>
> I didn’t know about the process, and created the PR before I knew about KIPs :)
>
> As for the static declaration, I don’t think Java allows that:
>
> Best,
> Daniyar Yeralin
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi John,

I updated the JIRA ticket and the KIP.

I didn’t know about the process, and created the PR before I knew about KIPs :)

As for the static declaration, I don’t think Java allows that:


Best,
Daniyar Yeralin

> On May 7, 2019, at 2:22 PM, John Roesler <jo...@confluent.io> wrote:
> 
> Thanks for that update. Do you mind making changes primarily on the
> KIP document? (https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization)
> 
> This is the design document that we have to agree on and vote for; the
> PR comes later. It can be nice to have an implementation to look at,
> but the KIP is the main artifact for this discussion.
> 
> With this in mind, it will help get more reviewers to look at it if
> you can tidy up the KIP document so that it stands on its own. People
> shouldn't have to look at any other document to understand the
> motivation of the proposal, and they shouldn't have to look at a PR to
> see what the public API will look like. If it helps, you can take a
> look at some other recent KIPs.
> 
> Given that the list serde needs an inner serde, I agree you can't have
> a zero-argument static factory method for it, but it seems you could
> still have a static method:
> `public static Serde<List<T>> List(Serde<T> innerSerde)`.
> 
> Thoughts?
> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Thanks for that update. Do you mind making changes primarily on the
KIP document? (https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization)

This is the design document that we have to agree on and vote for; the
PR comes later. It can be nice to have an implementation to look at,
but the KIP is the main artifact for this discussion.

With this in mind, it will help get more reviewers to look at it if
you can tidy up the KIP document so that it stands on its own. People
shouldn't have to look at any other document to understand the
motivation of the proposal, and they shouldn't have to look at a PR to
see what the public API will look like. If it helps, you can take a
look at some other recent KIPs.

Given that the list serde needs an inner serde, I agree you can't have
a zero-argument static factory method for it, but it seems you could
still have a static method:
`public static Serde<List<T>> List(Serde<T> innerSerde)`.
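
To illustrate why the list serde needs an inner serde, here is a rough
sketch. It assumes a hypothetical InnerSerde interface in place of Kafka's
Serde, and an assumed wire format (element count, then length-prefixed
elements) that is not taken from the KIP or the PR. ListSerdeSketch and
Utf8StringSerde are illustrative names. Null lists and null elements are not
handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Kafka's Serde<T>; not the real interface.
interface InnerSerde<T> {
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
}

// Assumed format: [int count] then, per element, [int length][payload bytes].
final class ListSerdeSketch<T> implements InnerSerde<List<T>> {
    private final InnerSerde<T> inner;

    ListSerdeSketch(InnerSerde<T> inner) {
        this.inner = inner;
    }

    @Override
    public byte[] serialize(List<T> values) {
        List<byte[]> encoded = new ArrayList<>();
        int size = Integer.BYTES; // room for the element count
        for (T value : values) {
            byte[] bytes = inner.serialize(value); // delegate each element
            encoded.add(bytes);
            size += Integer.BYTES + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        buffer.putInt(values.size());
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length).put(bytes);
        }
        return buffer.array();
    }

    @Override
    public List<T> deserialize(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int count = buffer.getInt();
        List<T> values = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            values.add(inner.deserialize(bytes)); // elements keep their order
        }
        return values;
    }
}

// Simple element serde used only for the demonstration.
final class Utf8StringSerde implements InnerSerde<String> {
    @Override
    public byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

class ListSerdeDemo {
    public static void main(String[] args) {
        InnerSerde<List<String>> serde = new ListSerdeSketch<>(new Utf8StringSerde());
        List<String> original = Arrays.asList("b", "a", "ccc");
        List<String> roundTrip = serde.deserialize(serde.serialize(original));
        System.out.println(roundTrip.equals(original)); // prints "true"
    }
}
```

The list serde never needs to know how an element is encoded, and the
elements come back in the order they were written, so no comparator is
involved.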

Thoughts?

On Tue, May 7, 2019 at 12:18 PM Development <de...@yeralin.net> wrote:
>
> Absolutely agree. I already pushed another commit to remove the comparator argument: https://github.com/apache/kafka/pull/6592
>
> Thank you for your input John! I really appreciate it.
>
> What about this point I made:
>
> 1. Since the type for the List serde needs to be declared beforehand, I could not create a static method for the List serde under org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
> P.S. A static method corresponding to ListSerde (something like static public Serde<List<T>> List() {...}) cannot be added to the org.apache.kafka.common.serialization.Serdes class because the type needs to be defined beforehand. That's why one needs to create the List serde in the following fashion:
> new Serdes.ListSerde<String>(Serdes.String(), Comparator.comparing(String::length));
> (can possibly be simplified by declaring import static org.apache.kafka.common.serialization.Serdes.ListSerde)
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Absolutely agree. I already pushed another commit to remove the comparator argument: https://github.com/apache/kafka/pull/6592

Thank you for your input John! I really appreciate it.

What about this point I made:

1. Since the type for the List serde needs to be declared beforehand, I could not create a static method for the List serde under org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
P.S. A static method corresponding to ListSerde (something like static public Serde<List<T>> List() {...}) cannot be added to the org.apache.kafka.common.serialization.Serdes class because the type needs to be defined beforehand. That's why one needs to create the List serde in the following fashion:
new Serdes.ListSerde<String>(Serdes.String(), Comparator.comparing(String::length));
(can possibly be simplified by declaring import static org.apache.kafka.common.serialization.Serdes.ListSerde)

> On May 7, 2019, at 11:50 AM, John Roesler <jo...@confluent.io> wrote:
> 
> Thanks for the reply Daniyar,
> 
> That makes much more sense! I thought I must be missing something, but I
> couldn't for the life of me figure it out.
> 
> What do you think about just taking an argument, instead of for a
> Comparator, for the Serde of the inner type? That way, the user can control
> how exactly the inner data gets serialized, while also bounding the generic
> parameter properly. As for the order, since the list is already in a
> specific order, which the user themselves controls, it doesn't seem
> strictly necessary to offer an option to sort the data during serialization.
> 
> Thanks,
> -John
> 
> On Mon, May 6, 2019 at 8:47 PM Development <de...@yeralin.net> wrote:
> 
>> Hi John,
>> 
>> I’m really sorry for the confusion. I cloned that JIRA ticket from an old
>> one about introducing the UUID Serde, and I guess I was too hasty while editing
>> the copy to notice the mistake. I just edited the ticket. Sorry for any
>> inconvenience.
>> 
>> As for the comparator, I agree. Let’s make the user responsible for
>> implementing the Comparable interface. I was just thinking of making the serde a
>> little more flexible (i.e. letting the user decide in which order records are
>> inserted into a changelog topic).
>> 
>> Thank you!
>> 
>> Best,
>> Daniyar Yeralin
>> 
>> 
>>> On May 6, 2019, at 5:37 PM, John Roesler <jo...@confluent.io> wrote:
>>> 
>>> Hi Daniyar,
>>> 
>>> Thanks for the proposal!
>>> 
>>> If I understand the point about the comparator, is it just to capture the
>>> generic type parameter? If so, then anything that implements a known
>>> interface would work just as well, right? I've been considering adding
>>> something like the Jackson TypeReference (or similar classes in many other
>>> projects). Would this be a good time to do it?
>>> 
>>> Note that it's not necessary to actually require that the captured type is
>>> Comparable (as this proposal currently does), it's just a way to make sure
>>> there is some method that makes use of the generic type parameter, to force
>>> the compiler to capture the type.
>>> 
>>> Just to make sure I understand the motivation... You expressed a desire to
>>> be able to serialize UUIDs, which I didn't follow, since there is a
>>> built-in UUID serde: org.apache.kafka.common.serialization.Serdes#UUID, and
>>> also, a UUID isn't a List. Did you mean that you need to use *lists of*
>>> UUIDs?
>>> 
>>> Thanks,
>>> -John
>>> 
>>> On Mon, May 6, 2019 at 11:49 AM Development <de...@yeralin.net> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> Starting a discussion for KIP-466 adding support for List Serde. PR is
>>>> created under https://github.com/apache/kafka/pull/6592
>>>> 
>>>> There are two topics I would like to discuss:
>>>> 1. Since the type for the List serde needs to be declared beforehand, I
>>>> could not create a static method for the List serde under
>>>> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
>>>> P.S. A static method corresponding to ListSerde (something like
>>>> static public Serde<List<T>> List() {...}) cannot be added to the
>>>> org.apache.kafka.common.serialization.Serdes class because the type
>>>> needs to be defined beforehand. That's why one needs to create the
>>>> List serde in the following fashion:
>>>> new Serdes.ListSerde<String>(Serdes.String(),
>>>> Comparator.comparing(String::length));
>>>> (can possibly be simplified by declaring import static
>>>> org.apache.kafka.common.serialization.Serdes.ListSerde)
>>>> 
>>>> 2. @miguno Michael G. Noll <https://github.com/miguno> is questioning
>>>> whether I need to pass a comparator to ListDeserializer. This certainly
>> is
>>>> not required. Feel free to add your input:
>>>> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>>>> 
>>>> Thank you!
>>>> 
>>>> Best,
>>>> Daniyar Yeralin
>>>> 
>>>>> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <ji...@apache.org>
>>>> wrote:
>>>>> 
>>>>> Daniyar Yeralin created KAFKA-8326:
>>>>> --------------------------------------
>>>>> 
>>>>>           Summary: Add List<T> Serde
>>>>>               Key: KAFKA-8326
>>>>>               URL: https://issues.apache.org/jira/browse/KAFKA-8326
>>>>>           Project: Kafka
>>>>>        Issue Type: Improvement
>>>>>        Components: clients, streams
>>>>>          Reporter: Daniyar Yeralin
>>>>> 
>>>>> 
>>>>> I propose adding serializers and deserializers for the java.util.List
>>>> class.
>>>>> 
>>>>> I have many use cases where I want to set the key of a Kafka message to
>>>> be a UUID. Currently, I need to turn UUIDs into strings or byte arrays
>> and
>>>> use their associated Serdes, but it would be more convenient to
>> serialize
>>>> and deserialize UUIDs directly.
>>>>> 
>>>>> I believe there are many use cases where one would want to have a List
>>>> serde. Ex. [
>>>> 
>> https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows
>> ],
>>>> [
>>>> 
>> https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api
>>>> ]
>>>>> 
>>>>> 
>>>>> 
>>>>> KIP Link: [
>>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
>>>> ]
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> This message was sent by Atlassian JIRA
>>>>> (v7.6.3#76005)
>>>> 
>>>> 
>> 
>> 


Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Thanks for the reply Daniyar,

That makes much more sense! I thought I must be missing something, but I
couldn't for the life of me figure it out.

What do you think about taking the Serde of the inner type as an argument,
instead of a Comparator? That way, the user can control how exactly the
inner data gets serialized, while also bounding the generic parameter
properly. As for the order, since the list is already in a specific order,
which the user themselves controls, it doesn't seem strictly necessary to
offer an option to sort the data during serialization.
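
To illustrate the suggestion above, here is a minimal Java sketch of a list codec that takes the inner type's encoder and decoder as arguments instead of a Comparator. It is self-contained and deliberately does not use Kafka's Serde interface: ListCodecSketch, serialize, deserialize, and the length-prefixed byte layout are all assumptions made for illustration, not the code in the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: not the PR's ListSerde and not Kafka's Serde interface.
final class ListCodecSketch {

    private ListCodecSketch() { }

    // Writes the element count, then each element as (length, bytes).
    // The per-element encoding is delegated to the caller-supplied function,
    // and elements are written in the order the list already has.
    static <T> byte[] serialize(List<T> data, Function<T, byte[]> inner) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(data.size());
            for (T element : data) {
                byte[] encoded = inner.apply(element);
                out.writeInt(encoded.length);
                out.write(encoded);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the layout written by serialize and rebuilds the list in order.
    static <T> List<T> deserialize(byte[] data, Function<byte[], T> inner) {
        ByteBuffer in = ByteBuffer.wrap(data);
        int size = in.getInt();
        List<T> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            byte[] encoded = new byte[in.getInt()];
            in.get(encoded);
            result.add(inner.apply(encoded));
        }
        return result;
    }
}
```

In this sketch the generic parameter is bounded by the functions passed in, and a round trip through serialize and deserialize gives the elements back in their original order, since nothing is sorted along the way.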

Thanks,
-John

On Mon, May 6, 2019 at 8:47 PM Development <de...@yeralin.net> wrote:

> Hi John,
>
> I’m really sorry for the confusion. I cloned that JIRA ticket from an old
> one about introducing UUID Serde, and I guess I was too hasty while editing
> the copy to notice the mistake. Just edited the ticket. Sorry for any
> inconvenience.
>
> As for the comparator, I agree. Let’s make the user responsible for
> implementing the Comparable interface. I was just trying to make the serde a
> little more flexible (i.e. let the user decide in which order records are
> inserted into a changelog topic).
>
> Thank you!
>
> Best,
> Daniyar Yeralin
>
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hi John,

I’m really sorry for the confusion. I cloned that JIRA ticket from an old one about introducing UUID Serde, and I guess I was too hasty while editing the copy to notice the mistake. Just edited the ticket. Sorry for any inconvenience.

As for the comparator, I agree. Let’s make the user responsible for implementing the Comparable interface. I was just trying to make the serde a little more flexible (i.e. let the user decide in which order records are inserted into a changelog topic).

Thank you!

Best,
Daniyar Yeralin


> On May 6, 2019, at 5:37 PM, John Roesler <jo...@confluent.io> wrote:
> 
> Hi Daniyar,
> 
> Thanks for the proposal!
> 
> If I understand the point about the comparator, is it just to capture the
> generic type parameter? If so, then anything that implements a known
> interface would work just as well, right? I've been considering adding
> something like the Jackson TypeReference (or similar classes in many other
> projects). Would this be a good time to do it?
> 
> Note that it's not necessary to actually require that the captured type is
> Comparable (as this proposal currently does), it's just a way to make sure
> there is some method that makes use of the generic type parameter, to force
> the compiler to capture the type.
> 
> Just to make sure I understand the motivation... You expressed a desire to
> be able to serialize UUIDs, which I didn't follow, since there is a
> built-in UUID serde: org.apache.kafka.common.serialization.Serdes#UUID, and
> also, a UUID isn't a List. Did you mean that you need to use *lists of*
> UUIDs?
> 
> Thanks,
> -John
> 

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by John Roesler <jo...@confluent.io>.
Hi Daniyar,

Thanks for the proposal!

If I understand the point about the comparator, is it just to capture the
generic type parameter? If so, then anything that implements a known
interface would work just as well, right? I've been considering adding
something like the Jackson TypeReference (or similar classes in many other
projects). Would this be a good time to do it?

Note that it's not necessary to actually require that the captured type is
Comparable (as this proposal currently does); it's just a way to make sure
there is some method that makes use of the generic type parameter, to force
the compiler to capture the type.
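
As a rough illustration of the type-capture idea, here is a minimal Java sketch in the spirit of Jackson's TypeReference. TypeToken and its type() accessor are hypothetical names chosen for this example; this is not an existing Kafka class, and it only shows how an anonymous subclass lets reflection recover the generic argument.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical helper, loosely modeled on the "type reference" pattern.
abstract class TypeToken<T> {
    private final Type type;

    protected TypeToken() {
        // An anonymous subclass such as `new TypeToken<List<UUID>>() { }`
        // records its type argument in its generic superclass signature,
        // so it can be read back at runtime despite erasure.
        Type superclass = getClass().getGenericSuperclass();
        if (!(superclass instanceof ParameterizedType)) {
            throw new IllegalStateException("TypeToken must be created with a type argument");
        }
        this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
    }

    // The captured type argument, e.g. a ParameterizedType for List<UUID>.
    Type type() {
        return type;
    }
}
```

With this sketch, `new TypeToken<List<UUID>>() { }.type()` yields a ParameterizedType whose raw type is List and whose single argument is UUID, without requiring the element type to be Comparable.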

Just to make sure I understand the motivation... You expressed a desire to
be able to serialize UUIDs, which I didn't follow, since there is a
built-in UUID serde: org.apache.kafka.common.serialization.Serdes#UUID, and
also, a UUID isn't a List. Did you mean that you need to use *lists of*
UUIDs?

Thanks,
-John

On Mon, May 6, 2019 at 11:49 AM Development <de...@yeralin.net> wrote:

> Hello,
>
> Starting a discussion for KIP-466, which adds support for a List Serde. The
> PR is at https://github.com/apache/kafka/pull/6592
>
> There are two topics I would like to discuss:
> 1. Since the type for the List Serde needs to be declared beforehand, I could
> not create a static method for the List Serde under
> org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
> P.S. A static method corresponding to ListSerde under
> org.apache.kafka.common.serialization.Serdes (something like static public
> Serde<List<T>> List() {...}) cannot be added because the type needs to be
> defined beforehand. That's why one needs to create the List Serde in the
> following fashion:
> new Serdes.ListSerde<String>(Serdes.String(), Comparator.comparing(String::length));
> (can possibly be simplified by declaring import static
> org.apache.kafka.common.serialization.Serdes.ListSerde)
>
> 2. @miguno Michael G. Noll <https://github.com/miguno> is questioning
> whether I need to pass a comparator to ListDeserializer. This certainly is
> not required. Feel free to add your input:
> https://github.com/apache/kafka/pull/6592#discussion_r281152067
>
> Thank you!
>
> Best,
> Daniyar Yeralin
>

[DISCUSS] KIP-466: Add support for List serialization and deserialization

Posted by Development <de...@yeralin.net>.
Hello,

Starting a discussion for KIP-466, which adds support for a List Serde. The PR is at https://github.com/apache/kafka/pull/6592

There are two topics I would like to discuss:
1. Since the type for the List Serde needs to be declared beforehand, I could not create a static method for the List Serde under org.apache.kafka.common.serialization.Serdes. I addressed it in the KIP:
P.S. A static method corresponding to ListSerde under org.apache.kafka.common.serialization.Serdes (something like static public Serde<List<T>> List() {...}) cannot be added because the type needs to be defined beforehand. That's why one needs to create the List Serde in the following fashion:
new Serdes.ListSerde<String>(Serdes.String(), Comparator.comparing(String::length));
(can possibly be simplified by declaring import static org.apache.kafka.common.serialization.Serdes.ListSerde)

2. @miguno Michael G. Noll <https://github.com/miguno> is questioning whether I need to pass a comparator to ListDeserializer. This certainly is not required. Feel free to add your input:
https://github.com/apache/kafka/pull/6592#discussion_r281152067
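
To make the role of the comparator argument concrete, here is a minimal Java sketch of what sorting before serialization could look like. SortBeforeSerializeSketch and ordered are hypothetical names for illustration only; whether the PR applies the comparator this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders a copy of the list before it would be serialized.
final class SortBeforeSerializeSketch {

    private SortBeforeSerializeSketch() { }

    // Returns a sorted copy; the caller's list is left untouched.
    // List.sort is stable, so elements that compare equal keep their relative order.
    static <T> List<T> ordered(List<T> data, Comparator<? super T> comparator) {
        List<T> copy = new ArrayList<>(data);
        copy.sort(comparator);
        return copy;
    }
}
```

With Comparator.comparing(String::length), as in the constructor call above, shorter strings would be written first. If the serializer simply keeps the list's existing order, this step and the comparator argument are not needed.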

Thank you!

Best,
Daniyar Yeralin

> On May 6, 2019, at 11:59 AM, Daniyar Yeralin (JIRA) <ji...@apache.org> wrote:
> 
> Daniyar Yeralin created KAFKA-8326:
> --------------------------------------
> 
>             Summary: Add List<T> Serde
>                 Key: KAFKA-8326
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8326
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients, streams
>            Reporter: Daniyar Yeralin
> 
> 
> I propose adding serializers and deserializers for the java.util.List class.
> 
> I have many use cases where I want to set the key of a Kafka message to be a UUID. Currently, I need to turn UUIDs into strings or byte arrays and use their associated Serdes, but it would be more convenient to serialize and deserialize UUIDs directly.
> 
> I believe there are many use cases where one would want to have a List serde. Ex. [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows], [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api]
> 
>  
> 
> KIP Link: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization]
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)


Re: [jira] [Created] (KAFKA-8326) Add List Serde

Posted by Jan Filipiak <Ja...@trivago.com>.
I think this encourages bad decisions.
Let's just have people define repeated fields in Thrift, Avro, JSON, or
Protobuf. It's going to look nasty once you get to your 11th layer of lists.

If you really want to add lists, please do Map as well in one shot.

Best Jan

On 06.05.2019 17:59, Daniyar Yeralin (JIRA) wrote:
> Daniyar Yeralin created KAFKA-8326:
> --------------------------------------
> 
>              Summary: Add List<T> Serde
>                  Key: KAFKA-8326
>                  URL: https://issues.apache.org/jira/browse/KAFKA-8326
>              Project: Kafka
>           Issue Type: Improvement
>           Components: clients, streams
>             Reporter: Daniyar Yeralin
> 
> 
> I propose adding serializers and deserializers for the java.util.List class.
> 
> I have many use cases where I want to set the key of a Kafka message to be a UUID. Currently, I need to turn UUIDs into strings or byte arrays and use their associated Serdes, but it would be more convenient to serialize and deserialize UUIDs directly.
> 
> I believe there are many use cases where one would want to have a List serde. Ex. [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows], [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api]
> 
>  
> 
> KIP Link: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization]
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>