You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@metron.apache.org by JJ Meyer <jj...@gmail.com> on 2017/04/21 12:50:22 UTC

Reducing Warnings in Build

Hello everybody,

Currently our build has a significant amount of warnings (most from the
error prone plugin). A lot of this has been documented here:
https://issues.apache.org/jira/browse/METRON-617

I want to continue to work on this Jira. I really want to reduce the number
of warnings in our build. As the Jira points out, a lot of them are
unchecked casts or the implicit use of default charset.

When starting this, I want to start with a specific module. I plan on
starting with `metron-interface` as it's fairly self contained and is
pretty new. Below I want to layout what I plan on doing. Please let me know
what you all think:

1. Suppress warnings where generics are not supported or checking type is
not possible.
2. For all unit tests that require supplying a charset I'll supply the
UTF-8 charset.
3. Update the API to have a charset configuration for each resource that
needs one.
4. Remove @SuppressWarnings("ALL") on all unit tests. I found out error
prone doesn't support this level of suppression. Which is probably a good
thing. We will need to explicitly state what we want to suppress instead.

Two big questions came up right away when I was thinking about the above:

1. When suppressing warnings. I see we sometimes cast a JSONObject to
Map<String, Object>. I know it extends Map, but is it really safe to do
something like the following? Should we have a utility that truly builds a
map that uses generics? I plan on doing some testing around this, but if
anyone has any experience with this it would be greatly appreciated. Here
is an example of what I am talking about:

JSONObject message = ...;
@SuppressWarnings("unchecked")
Map<String, Object> state = (Map<String, Object>) message;


2. This one is about configuring charset (#3 above). Specifically in
metron-rest. Right now, I believe there are 3 sensor resources (index,
enrichment, and parser) that throw warnings due to implicitly using the
default charset. I propose that we have 3 configs (listed below). These
configs would take any valid charset, default, or not set. If not set then
UTF-8 would be default. Does this seem fair?

sensor:
  index.encoding: UTF-8
  enrichment.encoding: UTF-8
  parser.encoding: UTF-8


Does anyone see any problems that may occur if we go about it this way? Any
help on this would be very much appreciated.

Thanks,
JJ

Re: Reducing Warnings in Build

Posted by Justin Leet <ju...@gmail.com>.

First off, I would absolutely love to see the warnings reduced and the
quality of our code improved.  I'm in favor of all four of the points, and
I think it's a good start towards weeding out a lot of issues.

So regarding question 1:

That awkwardness comes about because org.simple.json is all untyped Maps
under the hood (
https://github.com/fangyidong/json-simple/blob/master/src/main/java/org/json/simple/JSONObject.java#L19).
So anything that touches those classes starts requiring casts everywhere. A
couple months ago a PR was made against the library that seems to have
added a lot of this stuff (
https://github.com/fangyidong/json-simple/pull/113).  Basically, it was
forked (https://github.com/cliftonlabs/json-simple and
https://cliftonlabs.github.io/json-simple/) and they tried to contribute it
back and it's been sitting there.

I strongly think we should consider swapping over to the fork.  We'd have
to do a bit of research, but it seems the intent was to be completely
backwards compatible while updating things like generics and some various
issues they presumably saw with the original.

For 2, and I'm not an expert in this by any means, but I would be in favor
of defaulting to UTF-8.  It's most likely what's happening under the hood
anyway (even though it's technically platform dependent), so defaulting
probably makes it both more explicit and removes potential for issues if a
given system has a different underlying default.

Justin

On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jj...@gmail.com> wrote:

> Hello everybody,
>
> Currently our build has a significant amount of warnings (most from the
> error prone plugin). A lot of this has been documented here:
> https://issues.apache.org/jira/browse/METRON-617
>
> I want to continue to work on this Jira. I really want to reduce the number
> of warnings in our build. As the Jira points out, a lot of them are
> unchecked casts or the implicit use of default charset.
>
> When starting this, I want to start with a specific module. I plan on
> starting with `metron-interface` as it's fairly self contained and is
> pretty new. Below I want to layout what I plan on doing. Please let me know
> what you all think:
>
> 1. Suppress warnings where generics are not supported or checking type is
> not possible.
> 2. For all unit tests that require supplying a charset I'll supply the
> UTF-8 charset.
> 3. Update the API to have a charset configuration for each resource that
> needs one.
> 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out error
> prone doesn't support this level of suppression. Which is probably a good
> thing. We will need to explicitly state what we want to suppress instead.
>
> Two big questions came up right away when I was thinking about the above:
>
> 1. When suppressing warnings. I see we sometimes cast a JSONObject to
> Map<String, Object>. I know it extends Map, but is it really safe to do
> something like the following? Should we have a utility that truly builds a
> map that uses generics? I plan on doing some testing around this, but if
> anyone has any experience with this it would be greatly appreciated. Here
> is an example of what I am talking about:
>
> JSONObject message = ...;
> @SuppressWarnings("unchecked")
> Map<String, Object> state = (Map<String, Object>) message;
>
>
> 2. This one is about configuring charset (#3 above). Specifically in
> metron-rest. Right now, I believe there are 3 sensor resources (index,
> enrichment, and parser) that throw warnings due to implicitly using the
> default charset. I propose that we have 3 configs (listed below). These
> configs would take any valid charset, default, or not set. If not set then
> UTF-8 would be default. Does this seem fair?
>
> sensor:
>   index.encoding: UTF-8
>   enrichment.encoding: UTF-8
>   parser.encoding: UTF-8
>
>
> Does anyone see any problems that may occur if we go about it this way? Any
> help on this would be very much appreciated.
>
> Thanks,
> JJ
>

Re: Reducing Warnings in Build

Posted by Otto Fowler <ot...@gmail.com>.

I am sorry,  I mean for encodings.  We should improve the warnings, and
design and plan
how to handle non utf-8 separately.



On April 21, 2017 at 13:28:09, Michael Miklavcic (
michael.miklavcic@gmail.com) wrote:

Otto, are you referring to the 80% plus case of encodings or the compiler
warnings in general? I'm not sure I agree about leaving the encoding as is,
but I do agree that we should split the work out from the other javac
compiler warnings. Right now we're at the whim of the OS default when we
haven't specified an encoding. Worse yet, we might specify it in some
places and not in others. That would also result in a similar problem to
what Nick described. The default nowadays is usually UTF-8 for Linux, but
users can change these defaults. And it's unclear to me what this would be
for Asian countries. It might be UTF-16?

Here are values for my Mac and CentOS 6
Mac:
{11:14}~ ➭ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

Centos 6
[root@not-sharing-the-ip: ~]
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Best,
Mike


On Fri, Apr 21, 2017 at 10:45 AM, Otto Fowler <ot...@gmail.com>
wrote:

> Until there is a use case supporting changing to other then the 80% plus
> case, this should not be changed.
> or a customer requirement etc. This would need some design and discussion
> time as well to map out all the
> implications for the complete pipeline.
>
> I think configurable charset should be separated from the cleanup.
>
>
> On April 21, 2017 at 12:26:36, JJ Meyer (jjmeyer0@gmail.com) wrote:
>
> Nick,
>
> You're not off base at all. This is exactly the push back I wanted to
hear.
> I wasn't really sure if what I was proposing was the proper way of going
> about it. My understanding of charsets is limited, and I've usually just
> defaulted to UTF-8. That being said, right or wrong, my thought process
for
> including multiple configs was potential use of multiple charsets that
may
> be compatible (if that's possible?). But as I'm typing this out, I'm
> probably over doing it. Especially since currently we usually take system
> default, which a lot of times is UTF-8. So, you are right, it is probably
> best to start with the path of least resistance and include one
> configuration that is set to UTF-8. Then expand if we ever see a need to
do
> so.
>
> Thanks,
> JJ
>
> On Fri, Apr 21, 2017 at 10:45 AM, Ryan Merriman <me...@gmail.com>
> wrote:
>
> > I think Nick brings up some good points. Would there ever be a reason
to
> > not use UTF8 as the default from parsing a message on? All the tools we
> > use for analytics work with UTF8 (am I wrong?).
> >
> > The only case I can see needing a configurable charset would be if a
> > message coming from a sensor were encoded in a charset other than UTF8.
> In
> > that case there would need to be a configurable charset per parser (in
> > parser config?) for decoding but the message could be encoded/decoded
> with
> > UTF8 after that. Or we could just make UTF encoding a prerequisite for
> > sending messages to Metron.
> >
> > On Fri, Apr 21, 2017 at 10:32 AM, Nick Allen <ni...@nickallen.org>
wrote:
> >
> > > Per (2), I think it makes sense to make the charset configurable, but
> > with
> > > the proposal of 3 separate settings, wouldn't things blow up horribly
> if
> > > the Parsers are producing UTF-8, but Enrichment is expecting UTF-16?
> > They
> > > are not even speaking the same language, no?
> > >
> > > This makes me think that we need to understand the scenarios under
> which
> > a
> > > user would want to change the charset, before we know what kinds of
> > levers
> > > to bake-in. What sort of use case would drive someone to change the
> > > charset? Would there be a particular sensor producing telemetry with
a
> > > different charset from others? If one source of telemetry is
different
> > > than the others, would the entire system even work with varying
> charsets?
> > >
> > > Without a good understanding of use cases, I think the only mildly
safe
> > > thing to do right now, is to have a single, global charset setting.
The
> > > user would have to ensure that their entire environment and all the
> JVMs
> > > driving it are set to use that charset.
> > >
> > > Perhaps my questioning comes from a lack of understanding of
charsets.
> I
> > > can't remember ever having to deviate from the default. Please chime
in
> > > and educate me, if I am off base.
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jj...@gmail.com> wrote:
> > >
> > > > Hello everybody,
> > > >
> > > > Currently our build has a significant amount of warnings (most from
> the
> > > > error prone plugin). A lot of this has been documented here:
> > > > https://issues.apache.org/jira/browse/METRON-617
> > > >
> > > > I want to continue to work on this Jira. I really want to reduce
the
> > > number
> > > > of warnings in our build. As the Jira points out, a lot of them are
> > > > unchecked casts or the implicit use of default charset.
> > > >
> > > > When starting this, I want to start with a specific module. I plan
on
> > > > starting with `metron-interface` as it's fairly self contained and
is
> > > > pretty new. Below I want to layout what I plan on doing. Please let
> me
> > > know
> > > > what you all think:
> > > >
> > > > 1. Suppress warnings where generics are not supported or checking
> type
> > is
> > > > not possible.
> > > > 2. For all unit tests that require supplying a charset I'll supply
> the
> > > > UTF-8 charset.
> > > > 3. Update the API to have a charset configuration for each resource
> > that
> > > > needs one.
> > > > 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out
> error
> > > > prone doesn't support this level of suppression. Which is probably
a
> > good
> > > > thing. We will need to explicitly state what we want to suppress
> > instead.
> > > >
> > > > Two big questions came up right away when I was thinking about the
> > above:
> > > >
> > > > 1. When suppressing warnings. I see we sometimes cast a JSONObject
to
> > > > Map<String, Object>. I know it extends Map, but is it really safe
to
> do
> > > > something like the following? Should we have a utility that truly
> > builds
> > > a
> > > > map that uses generics? I plan on doing some testing around this,
but
> > if
> > > > anyone has any experience with this it would be greatly
appreciated.
> > Here
> > > > is an example of what I am talking about:
> > > >
> > > > JSONObject message = ...;
> > > > @SuppressWarnings("unchecked")
> > > > Map<String, Object> state = (Map<String, Object>) message;
> > > >
> > > >
> > > > 2. This one is about configuring charset (#3 above). Specifically
in
> > > > metron-rest. Right now, I believe there are 3 sensor resources
> (index,
> > > > enrichment, and parser) that throw warnings due to implicitly using
> the
> > > > default charset. I propose that we have 3 configs (listed below).
> These
> > > > configs would take any valid charset, default, or not set. If not
set
> > > then
> > > > UTF-8 would be default. Does this seem fair?
> > > >
> > > > sensor:
> > > > index.encoding: UTF-8
> > > > enrichment.encoding: UTF-8
> > > > parser.encoding: UTF-8
> > > >
> > > >
> > > > Does anyone see any problems that may occur if we go about it this
> way?
> > > Any
> > > > help on this would be very much appreciated.
> > > >
> > > > Thanks,
> > > > JJ
> > > >
> > >
> >
>

Re: Reducing Warnings in Build

Posted by Michael Miklavcic <mi...@gmail.com>.

Otto, are you referring to the 80% plus case of encodings or the compiler
warnings in general? I'm not sure I agree about leaving the encoding as is,
but I do agree that we should split the work out from the other javac
compiler warnings. Right now we're at the whim of the OS default when we
haven't specified an encoding. Worse yet, we might specify it in some
places and not in others. That would also result in a similar problem to
what Nick described. The default nowadays is usually UTF-8 for Linux, but
users can change these defaults. And it's unclear to me what this would be
for Asian countries. It might be UTF-16?

Here are values for my Mac and CentOS 6
Mac:
{11:14}~ ➭ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

Centos 6
[root@not-sharing-the-ip: ~]
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Best,
Mike


On Fri, Apr 21, 2017 at 10:45 AM, Otto Fowler <ot...@gmail.com>
wrote:

> Until there is a use case supporting changing to other then the 80% plus
> case, this should not be changed.
> or a customer requirement etc.  This would need some design and discussion
> time as well to map out all the
> implications for the complete pipeline.
>
> I think configurable charset should be separated from the cleanup.
>
>
> On April 21, 2017 at 12:26:36, JJ Meyer (jjmeyer0@gmail.com) wrote:
>
> Nick,
>
> You're not off base at all. This is exactly the push back I wanted to hear.
> I wasn't really sure if what I was proposing was the proper way of going
> about it. My understanding of charsets is limited, and I've usually just
> defaulted to UTF-8. That being said, right or wrong, my thought process for
> including multiple configs was potential use of multiple charsets that may
> be compatible (if that's possible?). But as I'm typing this out, I'm
> probably over doing it. Especially since currently we usually take system
> default, which a lot of times is UTF-8. So, you are right, it is probably
> best to start with the path of least resistance and include one
> configuration that is set to UTF-8. Then expand if we ever see a need to do
> so.
>
> Thanks,
> JJ
>
> On Fri, Apr 21, 2017 at 10:45 AM, Ryan Merriman <me...@gmail.com>
> wrote:
>
> > I think Nick brings up some good points. Would there ever be a reason to
> > not use UTF8 as the default from parsing a message on? All the tools we
> > use for analytics work with UTF8 (am I wrong?).
> >
> > The only case I can see needing a configurable charset would be if a
> > message coming from a sensor were encoded in a charset other than UTF8.
> In
> > that case there would need to be a configurable charset per parser (in
> > parser config?) for decoding but the message could be encoded/decoded
> with
> > UTF8 after that. Or we could just make UTF encoding a prerequisite for
> > sending messages to Metron.
> >
> > On Fri, Apr 21, 2017 at 10:32 AM, Nick Allen <ni...@nickallen.org> wrote:
> >
> > > Per (2), I think it makes sense to make the charset configurable, but
> > with
> > > the proposal of 3 separate settings, wouldn't things blow up horribly
> if
> > > the Parsers are producing UTF-8, but Enrichment is expecting UTF-16?
> > They
> > > are not even speaking the same language, no?
> > >
> > > This makes me think that we need to understand the scenarios under
> which
> > a
> > > user would want to change the charset, before we know what kinds of
> > levers
> > > to bake-in. What sort of use case would drive someone to change the
> > > charset? Would there be a particular sensor producing telemetry with a
> > > different charset from others? If one source of telemetry is different
> > > than the others, would the entire system even work with varying
> charsets?
> > >
> > > Without a good understanding of use cases, I think the only mildly safe
> > > thing to do right now, is to have a single, global charset setting. The
> > > user would have to ensure that their entire environment and all the
> JVMs
> > > driving it are set to use that charset.
> > >
> > > Perhaps my questioning comes from a lack of understanding of charsets.
> I
> > > can't remember ever having to deviate from the default. Please chime in
> > > and educate me, if I am off base.
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jj...@gmail.com> wrote:
> > >
> > > > Hello everybody,
> > > >
> > > > Currently our build has a significant amount of warnings (most from
> the
> > > > error prone plugin). A lot of this has been documented here:
> > > > https://issues.apache.org/jira/browse/METRON-617
> > > >
> > > > I want to continue to work on this Jira. I really want to reduce the
> > > number
> > > > of warnings in our build. As the Jira points out, a lot of them are
> > > > unchecked casts or the implicit use of default charset.
> > > >
> > > > When starting this, I want to start with a specific module. I plan on
> > > > starting with `metron-interface` as it's fairly self contained and is
> > > > pretty new. Below I want to layout what I plan on doing. Please let
> me
> > > know
> > > > what you all think:
> > > >
> > > > 1. Suppress warnings where generics are not supported or checking
> type
> > is
> > > > not possible.
> > > > 2. For all unit tests that require supplying a charset I'll supply
> the
> > > > UTF-8 charset.
> > > > 3. Update the API to have a charset configuration for each resource
> > that
> > > > needs one.
> > > > 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out
> error
> > > > prone doesn't support this level of suppression. Which is probably a
> > good
> > > > thing. We will need to explicitly state what we want to suppress
> > instead.
> > > >
> > > > Two big questions came up right away when I was thinking about the
> > above:
> > > >
> > > > 1. When suppressing warnings. I see we sometimes cast a JSONObject to
> > > > Map<String, Object>. I know it extends Map, but is it really safe to
> do
> > > > something like the following? Should we have a utility that truly
> > builds
> > > a
> > > > map that uses generics? I plan on doing some testing around this, but
> > if
> > > > anyone has any experience with this it would be greatly appreciated.
> > Here
> > > > is an example of what I am talking about:
> > > >
> > > > JSONObject message = ...;
> > > > @SuppressWarnings("unchecked")
> > > > Map<String, Object> state = (Map<String, Object>) message;
> > > >
> > > >
> > > > 2. This one is about configuring charset (#3 above). Specifically in
> > > > metron-rest. Right now, I believe there are 3 sensor resources
> (index,
> > > > enrichment, and parser) that throw warnings due to implicitly using
> the
> > > > default charset. I propose that we have 3 configs (listed below).
> These
> > > > configs would take any valid charset, default, or not set. If not set
> > > then
> > > > UTF-8 would be default. Does this seem fair?
> > > >
> > > > sensor:
> > > > index.encoding: UTF-8
> > > > enrichment.encoding: UTF-8
> > > > parser.encoding: UTF-8
> > > >
> > > >
> > > > Does anyone see any problems that may occur if we go about it this
> way?
> > > Any
> > > > help on this would be very much appreciated.
> > > >
> > > > Thanks,
> > > > JJ
> > > >
> > >
> >
>

Re: Reducing Warnings in Build

Posted by Otto Fowler <ot...@gmail.com>.

Until there is a use case supporting changing to other then the 80% plus
case, this should not be changed.
or a customer requirement etc.  This would need some design and discussion
time as well to map out all the
implications for the complete pipeline.

I think configurable charset should be separated from the cleanup.


On April 21, 2017 at 12:26:36, JJ Meyer (jjmeyer0@gmail.com) wrote:

Nick,

You're not off base at all. This is exactly the push back I wanted to hear.
I wasn't really sure if what I was proposing was the proper way of going
about it. My understanding of charsets is limited, and I've usually just
defaulted to UTF-8. That being said, right or wrong, my thought process for
including multiple configs was potential use of multiple charsets that may
be compatible (if that's possible?). But as I'm typing this out, I'm
probably over doing it. Especially since currently we usually take system
default, which a lot of times is UTF-8. So, you are right, it is probably
best to start with the path of least resistance and include one
configuration that is set to UTF-8. Then expand if we ever see a need to do
so.

Thanks,
JJ

On Fri, Apr 21, 2017 at 10:45 AM, Ryan Merriman <me...@gmail.com>
wrote:

> I think Nick brings up some good points. Would there ever be a reason to
> not use UTF8 as the default from parsing a message on? All the tools we
> use for analytics work with UTF8 (am I wrong?).
>
> The only case I can see needing a configurable charset would be if a
> message coming from a sensor were encoded in a charset other than UTF8.
In
> that case there would need to be a configurable charset per parser (in
> parser config?) for decoding but the message could be encoded/decoded
with
> UTF8 after that. Or we could just make UTF encoding a prerequisite for
> sending messages to Metron.
>
> On Fri, Apr 21, 2017 at 10:32 AM, Nick Allen <ni...@nickallen.org> wrote:
>
> > Per (2), I think it makes sense to make the charset configurable, but
> with
> > the proposal of 3 separate settings, wouldn't things blow up horribly
if
> > the Parsers are producing UTF-8, but Enrichment is expecting UTF-16?
> They
> > are not even speaking the same language, no?
> >
> > This makes me think that we need to understand the scenarios under
which
> a
> > user would want to change the charset, before we know what kinds of
> levers
> > to bake-in. What sort of use case would drive someone to change the
> > charset? Would there be a particular sensor producing telemetry with a
> > different charset from others? If one source of telemetry is different
> > than the others, would the entire system even work with varying
charsets?
> >
> > Without a good understanding of use cases, I think the only mildly safe
> > thing to do right now, is to have a single, global charset setting. The
> > user would have to ensure that their entire environment and all the
JVMs
> > driving it are set to use that charset.
> >
> > Perhaps my questioning comes from a lack of understanding of charsets.
I
> > can't remember ever having to deviate from the default. Please chime in
> > and educate me, if I am off base.
> >
> >
> >
> >
> >
> >
> > On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jj...@gmail.com> wrote:
> >
> > > Hello everybody,
> > >
> > > Currently our build has a significant amount of warnings (most from
the
> > > error prone plugin). A lot of this has been documented here:
> > > https://issues.apache.org/jira/browse/METRON-617
> > >
> > > I want to continue to work on this Jira. I really want to reduce the
> > number
> > > of warnings in our build. As the Jira points out, a lot of them are
> > > unchecked casts or the implicit use of default charset.
> > >
> > > When starting this, I want to start with a specific module. I plan on
> > > starting with `metron-interface` as it's fairly self contained and is
> > > pretty new. Below I want to layout what I plan on doing. Please let
me
> > know
> > > what you all think:
> > >
> > > 1. Suppress warnings where generics are not supported or checking
type
> is
> > > not possible.
> > > 2. For all unit tests that require supplying a charset I'll supply
the
> > > UTF-8 charset.
> > > 3. Update the API to have a charset configuration for each resource
> that
> > > needs one.
> > > 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out
error
> > > prone doesn't support this level of suppression. Which is probably a
> good
> > > thing. We will need to explicitly state what we want to suppress
> instead.
> > >
> > > Two big questions came up right away when I was thinking about the
> above:
> > >
> > > 1. When suppressing warnings. I see we sometimes cast a JSONObject to
> > > Map<String, Object>. I know it extends Map, but is it really safe to
do
> > > something like the following? Should we have a utility that truly
> builds
> > a
> > > map that uses generics? I plan on doing some testing around this, but
> if
> > > anyone has any experience with this it would be greatly appreciated.
> Here
> > > is an example of what I am talking about:
> > >
> > > JSONObject message = ...;
> > > @SuppressWarnings("unchecked")
> > > Map<String, Object> state = (Map<String, Object>) message;
> > >
> > >
> > > 2. This one is about configuring charset (#3 above). Specifically in
> > > metron-rest. Right now, I believe there are 3 sensor resources
(index,
> > > enrichment, and parser) that throw warnings due to implicitly using
the
> > > default charset. I propose that we have 3 configs (listed below).
These
> > > configs would take any valid charset, default, or not set. If not set
> > then
> > > UTF-8 would be default. Does this seem fair?
> > >
> > > sensor:
> > > index.encoding: UTF-8
> > > enrichment.encoding: UTF-8
> > > parser.encoding: UTF-8
> > >
> > >
> > > Does anyone see any problems that may occur if we go about it this
way?
> > Any
> > > help on this would be very much appreciated.
> > >
> > > Thanks,
> > > JJ
> > >
> >
>

Re: Reducing Warnings in Build

Posted by JJ Meyer <jj...@gmail.com>.

Nick,

You're not off base at all. This is exactly the push back I wanted to hear.
I wasn't really sure if what I was proposing was the proper way of going
about it. My understanding of charsets is limited, and I've usually just
defaulted to UTF-8. That being said, right or wrong, my thought process for
including multiple configs was potential use of multiple charsets that may
be compatible (if that's possible?). But as I'm typing this out, I'm
probably over doing it. Especially since currently we usually take system
default, which a lot of times is UTF-8. So, you are right, it is probably
best to start with the path of least resistance and include one
configuration that is set to UTF-8. Then expand if we ever see a need to do
so.

Thanks,
JJ

On Fri, Apr 21, 2017 at 10:45 AM, Ryan Merriman <me...@gmail.com> wrote:

> I think Nick brings up some good points.  Would there ever be a reason to
> not use UTF8 as the default from parsing a message on?  All the tools we
> use for analytics work with UTF8 (am I wrong?).
>
> The only case I can see needing a configurable charset would be if a
> message coming from a sensor were encoded in a charset other than UTF8.  In
> that case there would need to be a configurable charset per parser (in
> parser config?) for decoding but the message could be encoded/decoded with
> UTF8 after that.  Or we could just make UTF encoding a prerequisite for
> sending messages to Metron.
>
> On Fri, Apr 21, 2017 at 10:32 AM, Nick Allen <ni...@nickallen.org> wrote:
>
> > Per (2), I think it makes sense to make the charset configurable, but
> with
> > the proposal of 3 separate settings, wouldn't things blow up horribly if
> > the Parsers are producing UTF-8, but Enrichment is expecting UTF-16?
> They
> > are not even speaking the same language, no?
> >
> > This makes me think that we need to understand the scenarios under which
> a
> > user would want to change the charset, before we know what kinds of
> levers
> > to bake-in.  What sort of use case would drive someone to change the
> > charset?  Would there be a particular sensor producing telemetry with a
> > different charset from others?  If one source of telemetry is different
> > than the others, would the entire system even work with varying charsets?
> >
> > Without a good understanding of use cases, I think the only mildly safe
> > thing to do right now, is to have a single, global charset setting.  The
> > user would have to ensure that their entire environment and all the JVMs
> > driving it are set to use that charset.
> >
> > Perhaps my questioning comes from a lack of understanding of charsets.  I
> > can't remember ever having to deviate from the default.  Please chime in
> > and educate me, if I am off base.
> >
> >
> >
> >
> >
> >
> > On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jj...@gmail.com> wrote:
> >
> > > Hello everybody,
> > >
> > > Currently our build has a significant amount of warnings (most from the
> > > error prone plugin). A lot of this has been documented here:
> > > https://issues.apache.org/jira/browse/METRON-617
> > >
> > > I want to continue to work on this Jira. I really want to reduce the
> > number
> > > of warnings in our build. As the Jira points out, a lot of them are
> > > unchecked casts or the implicit use of default charset.
> > >
> > > When starting this, I want to start with a specific module. I plan on
> > > starting with `metron-interface` as it's fairly self contained and is
> > > pretty new. Below I want to layout what I plan on doing. Please let me
> > know
> > > what you all think:
> > >
> > > 1. Suppress warnings where generics are not supported or checking type
> is
> > > not possible.
> > > 2. For all unit tests that require supplying a charset I'll supply the
> > > UTF-8 charset.
> > > 3. Update the API to have a charset configuration for each resource
> that
> > > needs one.
> > > 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out error
> > > prone doesn't support this level of suppression. Which is probably a
> good
> > > thing. We will need to explicitly state what we want to suppress
> instead.
> > >
> > > Two big questions came up right away when I was thinking about the
> above:
> > >
> > > 1. When suppressing warnings. I see we sometimes cast a JSONObject to
> > > Map<String, Object>. I know it extends Map, but is it really safe to do
> > > something like the following? Should we have a utility that truly
> builds
> > a
> > > map that uses generics? I plan on doing some testing around this, but
> if
> > > anyone has any experience with this it would be greatly appreciated.
> Here
> > > is an example of what I am talking about:
> > >
> > > JSONObject message = ...;
> > > @SuppressWarnings("unchecked")
> > > Map<String, Object> state = (Map<String, Object>) message;
> > >
> > >
> > > 2. This one is about configuring charset (#3 above). Specifically in
> > > metron-rest. Right now, I believe there are 3 sensor resources (index,
> > > enrichment, and parser) that throw warnings due to implicitly using the
> > > default charset. I propose that we have 3 configs (listed below). These
> > > configs would take any valid charset, default, or not set. If not set
> > then
> > > UTF-8 would be default. Does this seem fair?
> > >
> > > sensor:
> > >   index.encoding: UTF-8
> > >   enrichment.encoding: UTF-8
> > >   parser.encoding: UTF-8
> > >
> > >
> > > Does anyone see any problems that may occur if we go about it this way?
> > Any
> > > help on this would be very much appreciated.
> > >
> > > Thanks,
> > > JJ
> > >
> >
>

Re: Reducing Warnings in Build

Posted by Ryan Merriman <me...@gmail.com>.

I think Nick brings up some good points.  Would there ever be a reason to
not use UTF8 as the default from parsing a message on?  All the tools we
use for analytics work with UTF8 (am I wrong?).

The only case I can see needing a configurable charset would be if a
message coming from a sensor were encoded in a charset other than UTF8.  In
that case there would need to be a configurable charset per parser (in
parser config?) for decoding but the message could be encoded/decoded with
UTF8 after that.  Or we could just make UTF encoding a prerequisite for
sending messages to Metron.

On Fri, Apr 21, 2017 at 10:32 AM, Nick Allen <ni...@nickallen.org> wrote:

> Per (2), I think it makes sense to make the charset configurable, but with
> the proposal of 3 separate settings, wouldn't things blow up horribly if
> the Parsers are producing UTF-8, but Enrichment is expecting UTF-16?  They
> are not even speaking the same language, no?
>
> This makes me think that we need to understand the scenarios under which a
> user would want to change the charset, before we know what kinds of levers
> to bake-in.  What sort of use case would drive someone to change the
> charset?  Would there be a particular sensor producing telemetry with a
> different charset from others?  If one source of telemetry is different
> than the others, would the entire system even work with varying charsets?
>
> Without a good understanding of use cases, I think the only mildly safe
> thing to do right now, is to have a single, global charset setting.  The
> user would have to ensure that their entire environment and all the JVMs
> driving it are set to use that charset.
>
> Perhaps my questioning comes from a lack of understanding of charsets.  I
> can't remember ever having to deviate from the default.  Please chime in
> and educate me, if I am off base.
>
>
>
>
>
>
> On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jj...@gmail.com> wrote:
>
> > Hello everybody,
> >
> > Currently our build has a significant amount of warnings (most from the
> > error prone plugin). A lot of this has been documented here:
> > https://issues.apache.org/jira/browse/METRON-617
> >
> > I want to continue to work on this Jira. I really want to reduce the
> number
> > of warnings in our build. As the Jira points out, a lot of them are
> > unchecked casts or the implicit use of default charset.
> >
> > When starting this, I want to start with a specific module. I plan on
> > starting with `metron-interface` as it's fairly self contained and is
> > pretty new. Below I want to layout what I plan on doing. Please let me
> know
> > what you all think:
> >
> > 1. Suppress warnings where generics are not supported or checking type is
> > not possible.
> > 2. For all unit tests that require supplying a charset I'll supply the
> > UTF-8 charset.
> > 3. Update the API to have a charset configuration for each resource that
> > needs one.
> > 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out error
> > prone doesn't support this level of suppression. Which is probably a good
> > thing. We will need to explicitly state what we want to suppress instead.
> >
> > Two big questions came up right away when I was thinking about the above:
> >
> > 1. When suppressing warnings. I see we sometimes cast a JSONObject to
> > Map<String, Object>. I know it extends Map, but is it really safe to do
> > something like the following? Should we have a utility that truly builds
> a
> > map that uses generics? I plan on doing some testing around this, but if
> > anyone has any experience with this it would be greatly appreciated. Here
> > is an example of what I am talking about:
> >
> > JSONObject message = ...;
> > @SuppressWarnings("unchecked")
> > Map<String, Object> state = (Map<String, Object>) message;
> >
> >
> > 2. This one is about configuring charset (#3 above). Specifically in
> > metron-rest. Right now, I believe there are 3 sensor resources (index,
> > enrichment, and parser) that throw warnings due to implicitly using the
> > default charset. I propose that we have 3 configs (listed below). These
> > configs would take any valid charset, default, or not set. If not set
> then
> > UTF-8 would be default. Does this seem fair?
> >
> > sensor:
> >   index.encoding: UTF-8
> >   enrichment.encoding: UTF-8
> >   parser.encoding: UTF-8
> >
> >
> > Does anyone see any problems that may occur if we go about it this way?
> Any
> > help on this would be very much appreciated.
> >
> > Thanks,
> > JJ
> >
>

Re: Reducing Warnings in Build

Posted by Michael Miklavcic <mi...@gmail.com>.

Personally, I think it makes sense to have an inbound/outbound charset
setting for the parsers. We should generally standardize on UTF-8 across
Metron and its topologies, but accept potentially different charsets from
the inbound sensors. That is to say, standardize the defaults for the
things we do control, and handle edge cases for the things we don't. And
UTF-8 is fully compatible with ASCII without requiring a mapping, so this
should generally work.
http://stackoverflow.com/questions/15965811/why-utf8-is-compatible-with-ascii.
Even with defaults, I think we should still let users configure this to
whatever they want, but offer the warning of what will happen if the
expected charsets don't match up.

On Fri, Apr 21, 2017 at 9:32 AM, Nick Allen <ni...@nickallen.org> wrote:

> Per (2), I think it makes sense to make the charset configurable, but with
> the proposal of 3 separate settings, wouldn't things blow up horribly if
> the Parsers are producing UTF-8, but Enrichment is expecting UTF-16?  They
> are not even speaking the same language, no?
>
> This makes me think that we need to understand the scenarios under which a
> user would want to change the charset, before we know what kinds of levers
> to bake-in.  What sort of use case would drive someone to change the
> charset?  Would there be a particular sensor producing telemetry with a
> different charset from others?  If one source of telemetry is different
> than the others, would the entire system even work with varying charsets?
>
> Without a good understanding of use cases, I think the only mildly safe
> thing to do right now, is to have a single, global charset setting.  The
> user would have to ensure that their entire environment and all the JVMs
> driving it are set to use that charset.
>
> Perhaps my questioning comes from a lack of understanding of charsets.  I
> can't remember ever having to deviate from the default.  Please chime in
> and educate me, if I am off base.
>
>
>
>
>
>
> On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jj...@gmail.com> wrote:
>
> > Hello everybody,
> >
> > Currently our build has a significant amount of warnings (most from the
> > error prone plugin). A lot of this has been documented here:
> > https://issues.apache.org/jira/browse/METRON-617
> >
> > I want to continue to work on this Jira. I really want to reduce the
> number
> > of warnings in our build. As the Jira points out, a lot of them are
> > unchecked casts or the implicit use of default charset.
> >
> > When starting this, I want to start with a specific module. I plan on
> > starting with `metron-interface` as it's fairly self contained and is
> > pretty new. Below I want to layout what I plan on doing. Please let me
> know
> > what you all think:
> >
> > 1. Suppress warnings where generics are not supported or checking type is
> > not possible.
> > 2. For all unit tests that require supplying a charset I'll supply the
> > UTF-8 charset.
> > 3. Update the API to have a charset configuration for each resource that
> > needs one.
> > 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out error
> > prone doesn't support this level of suppression. Which is probably a good
> > thing. We will need to explicitly state what we want to suppress instead.
> >
> > Two big questions came up right away when I was thinking about the above:
> >
> > 1. When suppressing warnings. I see we sometimes cast a JSONObject to
> > Map<String, Object>. I know it extends Map, but is it really safe to do
> > something like the following? Should we have a utility that truly builds
> a
> > map that uses generics? I plan on doing some testing around this, but if
> > anyone has any experience with this it would be greatly appreciated. Here
> > is an example of what I am talking about:
> >
> > JSONObject message = ...;
> > @SuppressWarnings("unchecked")
> > Map<String, Object> state = (Map<String, Object>) message;
> >
> >
> > 2. This one is about configuring charset (#3 above). Specifically in
> > metron-rest. Right now, I believe there are 3 sensor resources (index,
> > enrichment, and parser) that throw warnings due to implicitly using the
> > default charset. I propose that we have 3 configs (listed below). These
> > configs would take any valid charset, default, or not set. If not set
> then
> > UTF-8 would be default. Does this seem fair?
> >
> > sensor:
> >   index.encoding: UTF-8
> >   enrichment.encoding: UTF-8
> >   parser.encoding: UTF-8
> >
> >
> > Does anyone see any problems that may occur if we go about it this way?
> Any
> > help on this would be very much appreciated.
> >
> > Thanks,
> > JJ
> >
>

Re: Reducing Warnings in Build

Posted by Nick Allen <ni...@nickallen.org>.

Per (2), I think it makes sense to make the charset configurable, but with
the proposal of 3 separate settings, wouldn't things blow up horribly if
the Parsers are producing UTF-8, but Enrichment is expecting UTF-16?  They
are not even speaking the same language, no?

This makes me think that we need to understand the scenarios under which a
user would want to change the charset, before we know what kinds of levers
to bake-in.  What sort of use case would drive someone to change the
charset?  Would there be a particular sensor producing telemetry with a
different charset from others?  If one source of telemetry is different
than the others, would the entire system even work with varying charsets?

Without a good understanding of use cases, I think the only mildly safe
thing to do right now, is to have a single, global charset setting.  The
user would have to ensure that their entire environment and all the JVMs
driving it are set to use that charset.

Perhaps my questioning comes from a lack of understanding of charsets.  I
can't remember ever having to deviate from the default.  Please chime in
and educate me, if I am off base.

On Fri, Apr 21, 2017 at 8:50 AM, JJ Meyer <jj...@gmail.com> wrote:

> Hello everybody,
>
> Currently our build has a significant amount of warnings (most from the
> error prone plugin). A lot of this has been documented here:
> https://issues.apache.org/jira/browse/METRON-617
>
> I want to continue to work on this Jira. I really want to reduce the number
> of warnings in our build. As the Jira points out, a lot of them are
> unchecked casts or the implicit use of default charset.
>
> When starting this, I want to start with a specific module. I plan on
> starting with `metron-interface` as it's fairly self contained and is
> pretty new. Below I want to layout what I plan on doing. Please let me know
> what you all think:
>
> 1. Suppress warnings where generics are not supported or checking type is
> not possible.
> 2. For all unit tests that require supplying a charset I'll supply the
> UTF-8 charset.
> 3. Update the API to have a charset configuration for each resource that
> needs one.
> 4. Remove @SuppressWarnings("ALL") on all unit tests. I found out error
> prone doesn't support this level of suppression. Which is probably a good
> thing. We will need to explicitly state what we want to suppress instead.
>
> Two big questions came up right away when I was thinking about the above:
>
> 1. When suppressing warnings. I see we sometimes cast a JSONObject to
> Map<String, Object>. I know it extends Map, but is it really safe to do
> something like the following? Should we have a utility that truly builds a
> map that uses generics? I plan on doing some testing around this, but if
> anyone has any experience with this it would be greatly appreciated. Here
> is an example of what I am talking about:
>
> JSONObject message = ...;
> @SuppressWarnings("unchecked")
> Map<String, Object> state = (Map<String, Object>) message;
>
>
> 2. This one is about configuring charset (#3 above). Specifically in
> metron-rest. Right now, I believe there are 3 sensor resources (index,
> enrichment, and parser) that throw warnings due to implicitly using the
> default charset. I propose that we have 3 configs (listed below). These
> configs would take any valid charset, default, or not set. If not set then
> UTF-8 would be default. Does this seem fair?
>
> sensor:
>   index.encoding: UTF-8
>   enrichment.encoding: UTF-8
>   parser.encoding: UTF-8
>
>
> Does anyone see any problems that may occur if we go about it this way? Any
> help on this would be very much appreciated.
>
> Thanks,
> JJ
>