You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Anastasia Vela <av...@confluent.io> on 2020/09/24 22:42:36 UTC

[DISCUSS] KIP-673: Emit JSONs with new auto-generated schema

Hi all,

I'd like to discuss KIP-673:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-673%3A+Emit+JSONs+with+new+auto-generated+schema

This is a proposal to change the format of request and response traces to
JSON, which would be easier to load and parse, because the current format
is only JSON-like and not easily parsable.

Let me know what you think,
Anastasia

Re: [DISCUSS] KIP-673: Emit JSONs with new auto-generated schema

Posted by Tom Bentley <tb...@redhat.com>.
Hi Anastasia,

Is the concern that deserialization would have no guarantee as to the order
> it appears?
>

No, it's simply that when reading the logs screeds of unformatted JSON can
be hard to read, especially when more important fields come later in an
object (or after an array). This is also a problem with the existing format
too, of course. And there's now the option to put it through `jq` to indent
it. So I don't think it's worth obsessing over.


> I understand the performance concern. However, this would only be logged
> when the trace level logging is turned on, so this logging isn't done
> otherwise. Plus, the existing text-based logging is already very expensive,
> so it might not be that significant of a change.
>

Yeah, it's an implementation detail, so could always be optimised later
when it's known to be a problem.

Thanks again for the KIP, LGTM.

Tom

Re: [DISCUSS] KIP-673: Emit JSONs with new auto-generated schema

Posted by Anastasia Vela <av...@confluent.io>.
Hi Tom,

I'm glad it makes more sense. I modified the KIP just to make it a little
more clear as well.

Is the concern that deserialization would have no guarantee as to the order
it appears? Since ObjectNodes hold the tree structure in a LinkedHashMap,
the iterating order to deserialize would be defined in the order it was
inserted into the map.

I understand the performance concern. However, this would only be logged
when the trace level logging is turned on, so this logging isn't done
otherwise. Plus, the existing text-based logging is already very expensive,
so it might not be that significant of a change.

Thanks,
Anastasia

On Mon, Sep 28, 2020 at 3:06 AM Tom Bentley <tb...@redhat.com> wrote:

> Hi Anastasia,
>
> Thanks for those changes. I eventually figured out that the
> XYZJsonDataConverter are the new classes from the factoring-out of the
> Jackson dependency (KAFKA-10384), after which the proposal made much more
> sense to me. Apologies for being slow on the uptake there.
>
> I can see now that property order in the JSON in the log would be
> determined by the declared field in the RPC JSON definition. That might not
> always be ideal since it can be hard to visually parse large unindented
> JSON, but maybe that could be dealt with once we knew there was a real
> problem.
>
> It's an implementation detail, but I wonder whether constructing a whole
> tree of JsonNodes might cause performance issues. It would be more work,
> but the XYZJsonDataConverter could be generated to have a method which took
> a JsonGenerator, thus avoiding the need to instantiate the nodes just for
> the purposes of logging.
>
> Kind regards,
>
> Tom
>
> On Fri, Sep 25, 2020 at 7:05 PM Anastasia Vela <av...@confluent.io> wrote:
>
> > Hi Tom,
> >
> > Thanks for your input!
> >
> > 1. I'll add more details for the RequestConvertToJson and
> > XYZJsonDataConverter classes. Hopefully it will be more clear, but just
> to
> > answer your question, RequestConvertToJson does not return a
> > XYZJsonDataConverter, but rather it returns a JsonNode which will be
> > serialized. The JsonDataConverter is the new auto-generated schema for
> each
> > request/response type that contains the method to return the JsonNode to
> be
> > serialized.
> >
> > 2. There is no defined order of the properties, rather it's in the order
> > that it is set in. So if you first set key B, then key A, the properties
> > would appear with key B first. JsonNodes when serialized does not sort
> the
> > keys.
> >
> > 3. Yes, serialization is done via Jackson databind.
> >
> > Thanks again,
> > Anastasia
> >
> > On Fri, Sep 25, 2020 at 1:15 AM Tom Bentley <tb...@redhat.com> wrote:
> >
> > > Hi Anastasia,
> > >
> > > Thanks for the KIP, I can certainly see the benefit of this. I have a
> few
> > > questions:
> > >
> > > 1. I think it would be helpful to readers to explicitly illustrate the
> > > RequestConvertToJson and XYZJsonDataConverter classes (e.g. with method
> > > signatures for one or two methods), because currently it's not clear
> (to
> > me
> > > at least) exactly what's being proposed. Does the RequestConvertToJson
> > > return a XYZJsonDataConverter?
> > >
> > > 2. Does the serialization have a defined order of properties
> (alphabetic
> > > perhaps)? My concern here is that properties appearing in order
> according
> > > to how they are iterated in a hash map might harm human readability of
> > the
> > > logs.
> > >
> > > 3. Would the serialization be done via the Jackson databinding?
> > >
> > > Many thanks,
> > >
> > > Tom
> > >
> > > On Thu, Sep 24, 2020 at 11:49 PM Anastasia Vela <av...@confluent.io>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to discuss KIP-673:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-673%3A+Emit+JSONs+with+new+auto-generated+schema
> > > >
> > > > This is a proposal to change the format of request and response
> traces
> > to
> > > > JSON, which would be easier to load and parse, because the current
> > format
> > > > is only JSON-like and not easily parsable.
> > > >
> > > > Let me know what you think,
> > > > Anastasia
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-673: Emit JSONs with new auto-generated schema

Posted by Tom Bentley <tb...@redhat.com>.
Hi Anastasia,

Thanks for those changes. I eventually figured out that the
XYZJsonDataConverter are the new classes from the factoring-out of the
Jackson dependency (KAFKA-10384), after which the proposal made much more
sense to me. Apologies for being slow on the uptake there.

I can see now that property order in the JSON in the log would be
determined by the declared field in the RPC JSON definition. That might not
always be ideal since it can be hard to visually parse large unindented
JSON, but maybe that could be dealt with once we knew there was a real
problem.

It's an implementation detail, but I wonder whether constructing a whole
tree of JsonNodes might cause performance issues. It would be more work,
but the XYZJsonDataConverter could be generated to have a method which took
a JsonGenerator, thus avoiding the need to instantiate the nodes just for
the purposes of logging.

Kind regards,

Tom

On Fri, Sep 25, 2020 at 7:05 PM Anastasia Vela <av...@confluent.io> wrote:

> Hi Tom,
>
> Thanks for your input!
>
> 1. I'll add more details for the RequestConvertToJson and
> XYZJsonDataConverter classes. Hopefully it will be more clear, but just to
> answer your question, RequestConvertToJson does not return a
> XYZJsonDataConverter, but rather it returns a JsonNode which will be
> serialized. The JsonDataConverter is the new auto-generated schema for each
> request/response type that contains the method to return the JsonNode to be
> serialized.
>
> 2. There is no defined order of the properties, rather it's in the order
> that it is set in. So if you first set key B, then key A, the properties
> would appear with key B first. JsonNodes when serialized does not sort the
> keys.
>
> 3. Yes, serialization is done via Jackson databind.
>
> Thanks again,
> Anastasia
>
> On Fri, Sep 25, 2020 at 1:15 AM Tom Bentley <tb...@redhat.com> wrote:
>
> > Hi Anastasia,
> >
> > Thanks for the KIP, I can certainly see the benefit of this. I have a few
> > questions:
> >
> > 1. I think it would be helpful to readers to explicitly illustrate the
> > RequestConvertToJson and XYZJsonDataConverter classes (e.g. with method
> > signatures for one or two methods), because currently it's not clear (to
> me
> > at least) exactly what's being proposed. Does the RequestConvertToJson
> > return a XYZJsonDataConverter?
> >
> > 2. Does the serialization have a defined order of properties (alphabetic
> > perhaps)? My concern here is that properties appearing in order according
> > to how they are iterated in a hash map might harm human readability of
> the
> > logs.
> >
> > 3. Would the serialization be done via the Jackson databinding?
> >
> > Many thanks,
> >
> > Tom
> >
> > On Thu, Sep 24, 2020 at 11:49 PM Anastasia Vela <av...@confluent.io>
> > wrote:
> >
> > > Hi all,
> > >
> > > I'd like to discuss KIP-673:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-673%3A+Emit+JSONs+with+new+auto-generated+schema
> > >
> > > This is a proposal to change the format of request and response traces
> to
> > > JSON, which would be easier to load and parse, because the current
> format
> > > is only JSON-like and not easily parsable.
> > >
> > > Let me know what you think,
> > > Anastasia
> > >
> >
>

Re: [DISCUSS] KIP-673: Emit JSONs with new auto-generated schema

Posted by Anastasia Vela <av...@confluent.io>.
Hi Tom,

Thanks for your input!

1. I'll add more details for the RequestConvertToJson and
XYZJsonDataConverter classes. Hopefully it will be more clear, but just to
answer your question, RequestConvertToJson does not return a
XYZJsonDataConverter, but rather it returns a JsonNode which will be
serialized. The JsonDataConverter is the new auto-generated schema for each
request/response type that contains the method to return the JsonNode to be
serialized.

2. There is no defined order of the properties, rather it's in the order
that it is set in. So if you first set key B, then key A, the properties
would appear with key B first. JsonNodes when serialized does not sort the
keys.

3. Yes, serialization is done via Jackson databind.

Thanks again,
Anastasia

On Fri, Sep 25, 2020 at 1:15 AM Tom Bentley <tb...@redhat.com> wrote:

> Hi Anastasia,
>
> Thanks for the KIP, I can certainly see the benefit of this. I have a few
> questions:
>
> 1. I think it would be helpful to readers to explicitly illustrate the
> RequestConvertToJson and XYZJsonDataConverter classes (e.g. with method
> signatures for one or two methods), because currently it's not clear (to me
> at least) exactly what's being proposed. Does the RequestConvertToJson
> return a XYZJsonDataConverter?
>
> 2. Does the serialization have a defined order of properties (alphabetic
> perhaps)? My concern here is that properties appearing in order according
> to how they are iterated in a hash map might harm human readability of the
> logs.
>
> 3. Would the serialization be done via the Jackson databinding?
>
> Many thanks,
>
> Tom
>
> On Thu, Sep 24, 2020 at 11:49 PM Anastasia Vela <av...@confluent.io>
> wrote:
>
> > Hi all,
> >
> > I'd like to discuss KIP-673:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-673%3A+Emit+JSONs+with+new+auto-generated+schema
> >
> > This is a proposal to change the format of request and response traces to
> > JSON, which would be easier to load and parse, because the current format
> > is only JSON-like and not easily parsable.
> >
> > Let me know what you think,
> > Anastasia
> >
>

Re: [DISCUSS] KIP-673: Emit JSONs with new auto-generated schema

Posted by Tom Bentley <tb...@redhat.com>.
Hi Anastasia,

Thanks for the KIP, I can certainly see the benefit of this. I have a few
questions:

1. I think it would be helpful to readers to explicitly illustrate the
RequestConvertToJson and XYZJsonDataConverter classes (e.g. with method
signatures for one or two methods), because currently it's not clear (to me
at least) exactly what's being proposed. Does the RequestConvertToJson
return a XYZJsonDataConverter?

2. Does the serialization have a defined order of properties (alphabetic
perhaps)? My concern here is that properties appearing in order according
to how they are iterated in a hash map might harm human readability of the
logs.

3. Would the serialization be done via the Jackson databinding?

Many thanks,

Tom

On Thu, Sep 24, 2020 at 11:49 PM Anastasia Vela <av...@confluent.io> wrote:

> Hi all,
>
> I'd like to discuss KIP-673:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-673%3A+Emit+JSONs+with+new+auto-generated+schema
>
> This is a proposal to change the format of request and response traces to
> JSON, which would be easier to load and parse, because the current format
> is only JSON-like and not easily parsable.
>
> Let me know what you think,
> Anastasia
>