Posted to dev@flume.apache.org by Attila Simon <sa...@cloudera.com> on 2016/08/23 12:26:28 UTC

Re: raw data in log messages is a security risk

Hi All,

To continue this discussion, please take a look at the diff:
https://reviews.apache.org/r/51182/. The review opened a debate which may
be interesting for a wider audience:


"Can we consider the event headers safe with respect to logging sensitive
data?"


Headers are key-value pairs (Map<String, String>), and I think the values
may contain sensitive data, so logging them should be avoided. It
gets interesting in the SimpleEvent implementation, which is logged
occasionally (it is the only implementation that overrides
Object.toString):
public String toString() {

  Integer bodyLen = null;
  if (body != null) bodyLen = body.length;
  return "[Event headers = " + headers + ", body.length = " + bodyLen + " ]";
}

Simply changing headers to headers.keySet() would result in a change
from [Event headers =
{{timestamp=2016.08.23T12:00:00},{hostname=example},{loginId=sati}},
body.length = 16 ]
to [Event headers = {timestamp,hostname,loginId}, body.length = 16 ]
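
For illustration, here is a minimal sketch of a key-only toString. KeyOnlyEvent is a hypothetical stand-in written for this mail, not the actual SimpleEvent class, and the null guard on headers is an added assumption. Note that with the standard java.util collections the key set is rendered in square brackets, so the exact punctuation differs slightly from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for SimpleEvent; not the actual Flume class.
class KeyOnlyEvent {
    private final Map<String, String> headers;
    private final byte[] body;

    KeyOnlyEvent(Map<String, String> headers, byte[] body) {
        this.headers = headers;
        this.body = body;
    }

    @Override
    public String toString() {
        Integer bodyLen = null;
        if (body != null) bodyLen = body.length;
        // Print header names only; header values never reach the log line.
        Object headerNames = (headers == null) ? null : headers.keySet();
        return "[Event headers = " + headerNames + ", body.length = " + bodyLen + " ]";
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("timestamp", "2016.08.23T12:00:00");
        headers.put("hostname", "example");
        headers.put("loginId", "sati");
        // prints: [Event headers = [timestamp, hostname, loginId], body.length = 16 ]
        System.out.println(new KeyOnlyEvent(headers, new byte[16]));
    }
}
```

The trade-off is that header values which are useful for debugging (for example the timestamp) disappear from the log together with the sensitive ones.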

This is a proposed change to SimpleEvent (in addition to whatever the patch
already contains). Do you have any objections? Please share your opinion!

Cheers,
Attila





*Attila Simon*
Software Engineer
Email:   sati@cloudera.com

On Wed, Jul 13, 2016 at 1:55 AM, Mike Percy <mp...@apache.org> wrote:

> Great, thanks for filing that JIRA Attila. Let's continue the discussion
> there.
>
> Mike
>
> On Tue, Jul 12, 2016 at 8:50 AM, Attila Simon <sa...@cloudera.com> wrote:
>
> > Hi Mike,
> >
> > I created a jira (https://issues.apache.org/jira/browse/FLUME-2954)
> > and would first like to look through the codebase to see where such
> > content logging was introduced. Based on the actual use cases we can
> > discuss further what the best approach would be. Thanks for the log4j
> > concerns; they actually shifted my standpoint a bit.
> >
> > Cheers,
> > Attila
> >
> >
> > Attila Simon
> > Software Engineer
> > Email:   sati@cloudera.com
> >
> >
> >
> >
> > On Sat, Jul 9, 2016 at 3:06 AM, Mike Percy <mp...@apache.org> wrote:
> > > Hi Attila,
> > > Thanks for bringing this up. I agree that we should prevent logging
> > > data unless it is explicitly enabled.
> > >
> > > One concern I have about the log4j approach is that many people have
> > > customized their log4j.properties file. As long as your proposal would
> > > keep logging of data disabled for all of the likely customizations
> > > people might have in place then it sounds good. However maybe you can
> > > be a little more specific about how it would look in the
> > > log4j.properties file and how it would look at the code level when
> > > writing to that named logger. I'm not totally sure I understand your
> > > exact proposal.
> > >
> > > Mike
> > >
> > > On Tue, Jul 5, 2016 at 9:44 AM, Attila Simon <sa...@cloudera.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> Flume has built-in functionality to log the data flowing through it,
> > >> mainly for debugging purposes. This functionality appears in several
> > >> places in the codebase. I think it raises security concerns in
> > >> production environments where sensitive information might be
> > >> ingested, so it is crucial that enabling it is as explicit as
> > >> possible (avoiding setup through implicit side effects).
> > >> E.g. setting the level of the root logger to debug/trace causes every
> > >> other logger to start logging at debug/trace, including the ones
> > >> logging raw data.
> > >>
> > >> Options to solve this issue:
> > >> 1) command line option to enable data logging
> > >> 2) configuration property to enable data logging globally
> > >> 3) implementing a single concept which is solely responsible for
> > >> logging, i.e. a single LoggerSink (which already exists) or Interceptor
> > >> 4) introduction of a new named logger instance which is configured OFF
> > >> in log4j config
> > >> 5) any other idea is welcome
> > >>
> > >> Considering the pros and cons of the usage and implementation I would
> > >> vote for 4) but I would like your opinion. I'm going to open a jira to
> > >> tackle this work (please let me know if there are some important
> > >> fields I have to set considering the 1.7 release).
> > >>
> > >> Cheers,
> > >> Attila
> > >>
> >
>
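
As a rough illustration of option 4) from the quoted thread (a dedicated named logger that stays OFF unless someone enables it explicitly), here is a minimal sketch. To keep it self-contained it uses java.util.logging rather than log4j, and the logger name org.apache.flume.rawdata as well as the class and method names are hypothetical; nothing in the thread fixes them.

```java
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch only: java.util.logging stands in for log4j here.
class DataLoggerSketch {
    // Hypothetical logger name, chosen for this example.
    static final String DATA_LOGGER_NAME = "org.apache.flume.rawdata";

    // Strong reference, because java.util.logging holds loggers weakly.
    static final Logger DATA_LOG = Logger.getLogger(DATA_LOGGER_NAME);

    static {
        // An explicit level on the named logger overrides whatever the
        // root logger is set to, so raw data stays out of the logs.
        DATA_LOG.setLevel(Level.OFF);
    }

    static boolean rawDataEnabled() {
        return DATA_LOG.isLoggable(Level.FINE);
    }

    static void logBody(byte[] body) {
        // Guard so the payload is not even converted to a String when disabled.
        if (rawDataEnabled()) {
            DATA_LOG.fine("Event body: " + new String(body, StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        // Turning the root logger all the way up does not enable data logging.
        Logger.getLogger("").setLevel(Level.ALL);
        System.out.println(rawDataEnabled()); // false

        // Only an explicit change to the named logger enables it.
        DATA_LOG.setLevel(Level.FINE);
        System.out.println(rawDataEnabled()); // true
    }
}
```

In a log4j 1.x properties file the corresponding default would be a line of the form log4j.logger.<logger name>=OFF. As Mike points out in the thread, a customized log4j.properties that lacks this line would fall back to the inherited level, so the code-level default matters as well.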