Posted to dev@flume.apache.org by "Attila Simon (JIRA)" <ji...@apache.org> on 2016/08/17 15:48:20 UTC
[jira] [Comment Edited] (FLUME-2954) make raw data appearing in log
messages explicit
[ https://issues.apache.org/jira/browse/FLUME-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424747#comment-15424747 ]
Attila Simon edited comment on FLUME-2954 at 8/17/16 3:48 PM:
--------------------------------------------------------------
Changes made in the spirit of what was discussed:
{noformat}
--------------------------------------------------------------------------------
flume-ng-channel ---
flume-jdbc-channel ---
JdbcChannelProviderImpl#98 <- fail properties <REMOVED>
JdbcChannelProviderImpl#261 #431 <- fail properties: JDBC URL might include password <KEPT><FOLLOWUP IN JIRA>
flume-kafka-channel ---
KafkaChannel#230 #253 <- fail properties <REMOVED>
--------------------------------------------------------------------------------
flume-ng-configuration ---
FlumeConfiguration#315 #372 <- fail properties <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-core ---
SyslogAvroEventSerializer#150 <- fail data: SyslogEvent.message gets logged <DRIVE BY PROPERTY>
GangliaServer#224 #245 <- safe data: only flume component metrics data <KEPT>
LoggerSink#95 <- fail data: on purpose <KEPT>
AvroSource#347 <- fail data: log whole message <DRIVE BY PROPERTY>
MultiportSyslogTCPSource#360 <- fail data: log whole message <DRIVE BY PROPERTY>
BLOBHandler#70 <- fail data: logs http request headers <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-embedded-agent ---
EmbeddedAgent#155 <- fail properties: printing all config <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-sinks ---
flume-hive-sink ---
HiveEndPoint has a URI field. <- fail properties <KEPT><FOLLOWUP IN JIRA>
It may contain private data
(the URI string may contain a password), and it is
logged extensively within this module
(HiveSink#298 #342 #400 #403 #428,
HiveWriter#210 #319 #330 #337 #353 #365 #368 #407...).
HiveEndPoint is also attached to exception logs.
flume-ng-hbase-sink ---
AsyncHBaseSink#641 <- safe data: error details get logged in case of failure <KEPT>
flume-ng-kafka-sink ---
KafkaSink#179 <- fail data: log whole message <REMOVED>
KafkaSink#304 <- fail properties <REMOVED>
flume-ng-morphline-solr-sink ---
BlobHandler#98 #113 <- fail data: logs http request headers <DRIVE BY PROPERTY>
MorphlineSink#139 <- fail data: logs event <DRIVE BY PROPERTY>
--------------------------------------------------------------------------------
flume-ng-sources ---
flume-kafka-source ---
KafkaSource#247 <- fail data: log whole message <DRIVE BY PROPERTY>
flume-twitter-source ---
TwitterSource#110-113 <- fail properties <REMOVED>
--------------------------------------------------------------------------------
{noformat}
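A minimal sketch of what a <DRIVE BY PROPERTY> gate could look like, assuming a JVM system property named `org.apache.flume.log.rawdata` serves as the opt-in switch. The property name and the `RawDataLogGate` helper are hypothetical here, not taken from the attached patches. Raw event data only reaches the log when the property is explicitly set to true; otherwise only the body size is logged.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical "drive by property" gate for raw data logging.
 * The property name is an assumption made for this sketch.
 */
public final class RawDataLogGate {
  // Assumed opt-in switch; unset (the default) means raw data is never logged.
  static final String RAW_DATA_PROPERTY = "org.apache.flume.log.rawdata";

  private RawDataLogGate() {
  }

  // Boolean.getBoolean returns true only if the system property exists and
  // equals "true" (ignoring case); any other value, or no value, yields false.
  static boolean allowLogRawData() {
    return Boolean.getBoolean(RAW_DATA_PROPERTY);
  }

  // What a component may put into a log line for an event body: the decoded
  // body when raw data logging was explicitly enabled, otherwise only its size.
  static String loggableBody(byte[] body) {
    if (allowLogRawData()) {
      return new String(body, StandardCharsets.UTF_8);
    }
    return "<" + body.length + " bytes, raw data logging disabled>";
  }

  public static void main(String[] args) {
    byte[] body = "user=alice password=s3cret".getBytes(StandardCharsets.UTF_8);
    // Prints only the size, assuming the property was not set at JVM startup.
    System.out.println(loggableBody(body));
    System.setProperty(RAW_DATA_PROPERTY, "true");
    // Now the explicit opt-in is present, so the full body is printed.
    System.out.println(loggableBody(body));
  }
}
```

At a call site such as AvroSource#347, the existing debug statement would additionally check `RawDataLogGate.allowLogRawData()`, and an operator would opt in by starting the agent with `-Dorg.apache.flume.log.rawdata=true` (again an assumed name). Using a dedicated property rather than a log level keeps the decision independent of whatever level the root logger happens to be set to.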
> make raw data appearing in log messages explicit
> ------------------------------------------------
>
> Key: FLUME-2954
> URL: https://issues.apache.org/jira/browse/FLUME-2954
> Project: Flume
> Issue Type: Improvement
> Components: Channel, Configuration, Sinks+Sources
> Affects Versions: v1.6.0
> Reporter: Attila Simon
> Assignee: Attila Simon
> Priority: Critical
> Fix For: v1.7.0
>
> Attachments: FLUME-2954-1.patch, FLUME-2954-2.patch, FLUME-2954.patch
>
>
> Flume has built-in functionality to log the data flowing through it,
> mainly for debugging purposes. This functionality appears in several
> places in the codebase. I think such functionality raises security
> concerns in production environments where sensitive information might
> be ingested, so it is crucial that enabling it is as explicit as
> possible (avoiding implicit side effects of the logging setup).
> E.g. setting the level of the root logger to debug/trace causes every
> other logger to start logging at debug/trace, including the ones
> logging raw data.
> In this JIRA I would like to provide a patch capturing how I imagine solving this issue. It can be refined iteratively or used as a basis for a broader discussion.
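To illustrate the root logger problem described in the issue, here is a small sketch using `java.util.logging`, chosen only because it ships with the JDK (Flume itself logs through SLF4J). The logger name `example.flume.RawDataSource` and the `org.apache.flume.log.rawdata` property are hypothetical. A check based only on the log level switches on as soon as the root logger is lowered to FINE (the java.util.logging counterpart of debug), while the explicit gate also requires the dedicated property.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public final class ImplicitDebugDemo {
  // Hypothetical logger standing in for a component that logs raw data.
  // Held in a static field so the logger is not garbage collected.
  static final Logger RAW_DATA_LOGGER = Logger.getLogger("example.flume.RawDataSource");
  // Assumed name of the dedicated opt-in property.
  static final String RAW_DATA_PROPERTY = "org.apache.flume.log.rawdata";

  private ImplicitDebugDemo() {
  }

  // Level-only gate: the logger has no level of its own, so it inherits the
  // root logger's level; lowering the root level silently turns this on.
  static boolean levelOnlyGate() {
    return RAW_DATA_LOGGER.isLoggable(Level.FINE);
  }

  // Explicit gate: the debug level alone is not enough, the operator must
  // also set the dedicated property to "true".
  static boolean explicitGate() {
    return levelOnlyGate() && Boolean.getBoolean(RAW_DATA_PROPERTY);
  }

  public static void main(String[] args) {
    // Simulate "set the root logger to debug" (FINE in java.util.logging).
    Logger.getLogger("").setLevel(Level.FINE);
    System.out.println("level-only gate: " + levelOnlyGate());
    System.out.println("explicit gate:   " + explicitGate());
  }
}
```

With the root logger at FINE, the level-only check opens for every logger, including ones that print raw data, whereas the explicit gate stays closed until someone deliberately sets the property.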
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)