Posted to hdfs-dev@hadoop.apache.org by Colin McCabe <cm...@alumni.cmu.edu> on 2015/05/05 22:25:26 UTC

Re: HDFS audit log

I think HDFS INotify is a better choice if you need:
* guaranteed backwards compatibility
* rapid and unambiguous parsing (via protobuf)
* clear Java API for retrieving the data (i.e. not rsync on a text file)
* ability to resume reading at a given point if the consumer process fails

We are using it in production for this purpose, via Cloudera Manager.  It
would work well with Kafka or Flume or whatever.
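
To illustrate the resume point in the list above, here is a minimal sketch of a
consumer that remembers the ID of the last batch it handled and picks up after
it on restart. Nothing in it is the Hadoop API: Batch, Source, drain and
inMemory are hypothetical stand-ins, and the event descriptions are made up. In
HDFS itself the stream comes from HdfsAdmin.getInotifyEventStream, which
accepts the last read transaction ID.

```java
import java.util.ArrayList;
import java.util.List;

public class ResumeSketch {
    /** Hypothetical stand-in for a batch of events tagged with a transaction ID. */
    static final class Batch {
        final long txid;
        final String description;
        Batch(long txid, String description) {
            this.txid = txid;
            this.description = description;
        }
    }

    /** Hypothetical event source: returns batches with txid greater than lastReadTxid, in order. */
    interface Source {
        List<Batch> readAfter(long lastReadTxid);
    }

    /** Processes everything after the checkpoint and returns the new checkpoint. */
    static long drain(Source source, long lastReadTxid, List<String> sink) {
        long checkpoint = lastReadTxid;
        for (Batch b : source.readAfter(lastReadTxid)) {
            sink.add(b.description);
            checkpoint = b.txid; // a real consumer would persist this durably
        }
        return checkpoint;
    }

    /** In-memory source used only for this demonstration. */
    static Source inMemory(List<Batch> log) {
        return lastReadTxid -> {
            List<Batch> out = new ArrayList<>();
            for (Batch b : log) {
                if (b.txid > lastReadTxid) {
                    out.add(b);
                }
            }
            return out;
        };
    }

    public static void main(String[] args) {
        List<Batch> log = new ArrayList<>();
        log.add(new Batch(101, "create /a"));
        log.add(new Batch(102, "rename /a -> /b"));
        Source source = inMemory(log);

        List<String> seen = new ArrayList<>();
        long checkpoint = drain(source, 100, seen);   // first run
        log.add(new Batch(103, "unlink /b"));         // arrives while the consumer is down
        checkpoint = drain(source, checkpoint, seen); // restart resumes after 102
        System.out.println(checkpoint + " " + seen);
    }
}
```

The point of the design is that the consumer, not the producer, owns the
checkpoint, so a crash costs at most a re-read from the last saved ID.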

The audit log is just a human readable log file.  Its format has never been
fixed or even formally specified.
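
Because nothing specifies the format, a parser for it has to be defensive. The
sketch below assumes tab-separated key=value pairs after a free-form log
prefix. That layout is an assumption for illustration, not a spec, and the
sample line and class name are made up. It keeps keys it does not recognize and
skips tokens that are not key=value, so an added field does not break it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AuditLineSketch {
    /** Splits the tab-separated key=value portion of a line into a map, keeping unknown keys. */
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : line.split("\t")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                continue; // skip tokens that are not key=value
            }
            // Use only the last space-separated word before '=' as the key,
            // so a log prefix such as "... audit: allowed" yields "allowed".
            String rawKey = token.substring(0, eq);
            String key = rawKey.substring(rawKey.lastIndexOf(' ') + 1);
            fields.put(key, token.substring(eq + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line; the field names and layout are assumptions.
        String line = "2015-04-25 07:58:00,000 INFO FSNamesystem.audit: allowed=true"
                + "\tugi=alice (auth:SIMPLE)\tip=/10.0.0.1\tcmd=mkdirs"
                + "\tsrc=/tmp/x\tdst=null\tperm=alice:hadoop:rwxr-xr-x";
        Map<String, String> fields = parse(line);
        System.out.println(fields.get("cmd") + " " + fields.get("src"));
    }
}
```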

Colin
On Apr 25, 2015 7:58 AM, "Allen Wittenauer" <aw...@altiscale.com> wrote:

>
>         I think we need to have a discussion about the HDFS audit log.
>
>         The purpose of the HDFS audit log* is for operations and security
> people to keep track of actual, bits-on-disk changes to HDFS and related
> metadata changes. It is not meant as a catch-all for any and all HDFS
> operations.  It is most definitely processed by code written by people.
> Its format is meant to be fixed; specifically no new fields and all fields
> should be present on every line. It’s meant to be extremely easy to parse
> for even junior admins.
>
>         For the past year, I’ve noticed an extremely disturbing trend:
>
>                 a) Changes to the log file that BREAK operations people.
> Part of the problem here is that the compatibility guidelines don’t specify
> that this file is locked.  We should fix this.
>
>                 b) An increasing number of “we should log this random NN
> operation”.  Unless it modifies the actual data, these are not AUDIT-worthy
> events.  Ask yourself, “would a security person care?”  If the answer is
> no, then don’t put it in the HDFS audit log and just keep an entry in the
> generic namenode log.  If the answer is yes, get a second opinion from
> someone else, preferably outside your team who actually does security.
>
>
> * - if anyone wants the full history, feel free to ask …