Posted to dev@flume.apache.org by "Attila Simon (JIRA)" <ji...@apache.org> on 2016/08/17 15:48:20 UTC

[jira] [Comment Edited] (FLUME-2954) make raw data appearing in log messages explicit

    [ https://issues.apache.org/jira/browse/FLUME-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424747#comment-15424747 ] 

Attila Simon edited comment on FLUME-2954 at 8/17/16 3:48 PM:
--------------------------------------------------------------

Changes made in the spirit of the discussion:
{noformat}
--------------------------------------------------------------------------------
flume-ng-channel                              ---
  flume-jdbc-channel                          ---
    JdbcChannelProviderImpl#98                <- fail properties <REMOVED>
    JdbcChannelProviderImpl#261 #431          <- fail properties: jdbc url might include password <KEPT><FOLLOWUP IN JIRA>
  flume-kafka-channel                         ---
    KafkaChannel#230 #253                     <- fail properties <REMOVED>
--------------------------------------------------------------------------------
flume-ng-configuration                        ---
  FlumeConfiguration#315 #372                 <- fail properties <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-core                                 ---
  SyslogAvroEventSerializer#150               <- fail data: SyslogEvent.message gets logged <DRIVE BY PROPERTY>
  GangliaServer#224 #245                      <- safe data: only flume component metrics data <KEPT>
  LoggerSink#95                               <- fail data: on purpose <KEPT>
  AvroSource#347                              <- fail data: log whole message <DRIVE BY PROPERTY>
  MultiportSyslogTCPSource#360                <- fail data: log whole message <DRIVE BY PROPERTY>
  BLOBHandler#70                              <- fail data: logs http request headers <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-embedded-agent                       ---
  EmbeddedAgent#155                           <- fail properties: printing all config <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-sinks                                ---
  flume-hive-sink                             ---
    HiveEndPoint has a URI field.             <- fail properties <KEPT><FOLLOWUP IN JIRA>
        The URI string may contain private data
        (e.g. a password), and it is logged
        extensively within this module
        (HiveSink#298 #342 #400 #403 #428,
        HiveWriter#210 #319 #330 #337 #353 #365 #368 #407...).
        HiveEndPoint is also attached to exception logs.
  flume-ng-hbase-sink                         ---
    AsyncHBaseSink#641                        <- safe data: error details get logged in case of failure <KEPT>
  flume-ng-kafka-sink                         ---
    KafkaSink#179                             <- fail data: log whole message <REMOVED>
    KafkaSink#304                             <- fail properties <REMOVED>
  flume-ng-morphline-solr-sink                ---
    BlobHandler#98 #113                       <- fail data: log http request headers <DRIVE BY PROPERTY>
    MorphlineSink#139                         <- fail data: logs event <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-sources                              ---
  flume-kafka-source                          ---
    KafkaSource#247                           <- fail data: log whole message <DRIVE BY PROPERTY>
  flume-twitter-source                        ---
    TwitterSource#110-113                     <- fail properties <REMOVED>
--------------------------------------------------------------------------------
{noformat}
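
To illustrate the entries marked <DRIVE BY PROPERTY>, here is a minimal sketch of a log statement guarded by an explicit opt-in. It assumes the tag means "driven by a dedicated property"; the class name RawDataLogGate and the property name example.flume.log.rawdata are hypothetical and are not taken from the patch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: raw event data is only rendered into a log message
// when a dedicated system property has been switched on explicitly.
final class RawDataLogGate {
    // Assumed property name, used here for illustration only.
    static final String RAW_DATA_PROP = "example.flume.log.rawdata";

    private RawDataLogGate() {}

    // True only if the JVM was started with -Dexample.flume.log.rawdata=true
    // (or the property was set to "true" programmatically).
    static boolean allowRawData() {
        return Boolean.getBoolean(RAW_DATA_PROP);
    }

    // Builds the text a log statement would emit for an event body.
    static String describeEvent(byte[] body) {
        if (allowRawData()) {
            return "Event body: " + new String(body, StandardCharsets.UTF_8);
        }
        // Without the opt-in only non-sensitive metadata is exposed.
        return "Event body: <" + body.length + " bytes, raw data logging disabled>";
    }

    public static void main(String[] args) {
        byte[] body = "secret".getBytes(StandardCharsets.UTF_8);
        System.out.println(describeEvent(body));
        System.setProperty(RAW_DATA_PROP, "true");
        System.out.println(describeEvent(body));
    }
}
```

With such a gate, raising a log level alone would not be enough to expose event contents; the operator would have to set the property as well.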



> make raw data appearing in log messages explicit
> ------------------------------------------------
>
>                 Key: FLUME-2954
>                 URL: https://issues.apache.org/jira/browse/FLUME-2954
>             Project: Flume
>          Issue Type: Improvement
>          Components: Channel, Configuration, Sinks+Sources
>    Affects Versions: v1.6.0
>            Reporter: Attila Simon
>            Assignee: Attila Simon
>            Priority: Critical
>             Fix For: v1.7.0
>
>         Attachments: FLUME-2954-1.patch, FLUME-2954-2.patch, FLUME-2954.patch
>
>
> Flume has built-in functionality to log the data flowing through it,
> mainly for debugging purposes. This functionality appears in several
> places in the codebase. I think it raises security concerns in
> production environments where sensitive information might be
> ingested, so it is crucial that enabling it is as explicit as
> possible (it should not be switched on as an implicit side effect of
> another setting).
> E.g. setting the level of the root logger to debug/trace causes every
> other logger to start logging at debug/trace as well, including the
> ones logging raw data.
> In this jira I would like to provide a patch capturing how I imagine
> solving this issue. It can be refined iteratively or used as a basis
> for a broader discussion.
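
The implicit side effect described above can be reproduced with a small sketch. It uses java.util.logging only to stay self-contained; Flume itself logs through a different framework, so this is an analogy rather than Flume's actual setup. The logger name example.sink.RawDataLogger is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Shows how a logger with no level of its own follows the root logger,
// so lowering the root level silently enables its debug output too.
final class RootLoggerInheritanceDemo {
    // Root logger of java.util.logging (its name is the empty string).
    static final Logger ROOT = Logger.getLogger("");
    // Hypothetical logger standing in for one that prints raw event data.
    // Held in a static field so it is not garbage collected.
    static final Logger RAW = Logger.getLogger("example.sink.RawDataLogger");

    private RootLoggerInheritanceDemo() {}

    // Whether a FINE (debug-like) message on the raw data logger passes
    // the logger's level check. Handler levels are not considered here.
    static boolean rawDataWouldBeLogged() {
        return RAW.isLoggable(Level.FINE);
    }

    public static void main(String[] args) {
        ROOT.setLevel(Level.INFO);
        System.out.println(rawDataWouldBeLogged()); // false

        // Only the root logger is touched, yet the raw data logger follows.
        ROOT.setLevel(Level.FINE);
        System.out.println(rawDataWouldBeLogged()); // true
    }
}
```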



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)