You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Dan Burkert (Code Review)" <ge...@cloudera.org> on 2016/12/21 03:01:54 UTC

[kudu-CR] KUDU-1812: redact row contents from logs

Hello Jean-Daniel Cryans, Adar Dembo, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/5555

to review the following change.

Change subject: KUDU-1812: redact row contents from logs
......................................................................

KUDU-1812: redact row contents from logs

Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
---
M src/kudu/cfile/cfile_util.h
M src/kudu/client/batcher.cc
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scanner-internal.h
M src/kudu/client/session-internal.cc
M src/kudu/client/write_op.h
M src/kudu/codegen/row_projector.h
M src/kudu/common/column_predicate.h
M src/kudu/common/encoded_key.h
M src/kudu/common/generic_iterators.cc
M src/kudu/common/partial_row.h
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/common/scan_spec.h
M src/kudu/common/schema.h
M src/kudu/common/types.h
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/mutation.h
M src/kudu/tablet/row_op.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.h
M src/kudu/tablet/transactions/alter_schema_transaction.cc
M src/kudu/tablet/transactions/alter_schema_transaction.h
M src/kudu/tablet/transactions/transaction.h
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tablet/transactions/write_transaction.h
M src/kudu/tools/tool_main.cc
M src/kudu/tserver/CMakeLists.txt
M src/kudu/tserver/tablet_service.cc
M src/kudu/util/flag_tags-test.cc
M src/kudu/util/flags-test.cc
M src/kudu/util/logging.h
M src/kudu/util/test_util.cc
44 files changed, 216 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/5555/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1812: redact row contents from logs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1812: redact row contents from logs
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/client/client.h
File src/kudu/client/client.h:

PS1, Line 1926:   // Internal Kudu Note: This method must not be used in log messages because
              :   // it contains sensitive predicate values. Use Scanner::Data::DebugString
              :   // instead.
> I made it with two slashes intentionally to avoid having it show up in the 
OK, I see now.

As for formatting/documenting, there is an option to enclose that note into the @internal/@endinternal directives:

http://www.doxygen.nl/manual/commands.html#cmdinternal

As for the implementation, I don't have a good idea here.


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812: redact row contents from logs

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1812: redact row contents from logs
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/client/client.h
File src/kudu/client/client.h:

PS1, Line 1926:   // Internal Kudu Note: This method must not be used in log messages because
              :   // it contains sensitive predicate values. Use Scanner::Data::DebugString
              :   // instead.
nit: since it's 'doxygenized' comment, please make the new note to conform that style as well -- check for @note ... in this file for examples.


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/client/write_op.h
File src/kudu/client/write_op.h:

PS1, Line 78:   // Note for internal Kudu use: this method does *NOT* redact row values. The
            :   // caller must check 'log_row_contents' themselves, if applicable.
nit: since it's 'doxygenized' comment, please make the new note to conform that style as well -- check for @note ... in this file for examples.


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

PS1, Line 461:   // Note for internal Kudu use: this method does *NOT* redact row values. The
             :   // caller must check 'log_row_contents' themselves, if applicable.
nit: since it's 'doxygenized' comment, please make the new note to conform that style as well -- check for @note ... in this file for examples.


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/common/schema.h
File src/kudu/common/schema.h:

PS1, Line 28: <gflags/gflags.h>
I'm not sure this is needed in this file.  If yes, then as a nit, please consider using <gflags/gflags_declare.h> instead.


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5555/2/src/kudu/common/row_operations.cc
File src/kudu/common/row_operations.cc:

Line 36:     case RowOperationsPB::UNKNOWN:
> Restore this comment.
Done


http://gerrit.cloudera.org:8080/#/c/5555/3/src/kudu/common/row_operations.cc
File src/kudu/common/row_operations.cc:

Line 48:       return Substitute("SPLIT_ROW $0", split_row->ToString());
> From here down should use KUDU_DISABLE_REDACTION.
Done


http://gerrit.cloudera.org:8080/#/c/5555/3/src/kudu/tablet/transactions/write_transaction.cc
File src/kudu/tablet/transactions/write_transaction.cc:

Line 357:   // NOTE: we'll eventually need to gate this by some flag if we want to avoid
> Remove this.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812: redact row contents from logs

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5555

to look at the new patch set (#2).

Change subject: KUDU-1812: redact row contents from logs
......................................................................

KUDU-1812: redact row contents from logs

Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
---
M src/kudu/cfile/cfile_util.h
M src/kudu/client/batcher.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scanner-internal.h
M src/kudu/client/session-internal.cc
M src/kudu/client/write_op.h
M src/kudu/codegen/row_projector.h
M src/kudu/common/column_predicate.h
M src/kudu/common/encoded_key.h
M src/kudu/common/generic_iterators.cc
M src/kudu/common/partial_row.h
M src/kudu/common/row_changelist.cc
M src/kudu/common/row_changelist.h
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/common/scan_spec.h
M src/kudu/common/schema.h
M src/kudu/common/types.h
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/compaction.cc
M src/kudu/tablet/compaction.h
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/mutation.h
M src/kudu/tablet/row_op.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.h
M src/kudu/tablet/transactions/alter_schema_transaction.cc
M src/kudu/tablet/transactions/alter_schema_transaction.h
M src/kudu/tablet/transactions/transaction.h
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tablet/transactions/write_transaction.h
M src/kudu/tools/tool_main.cc
M src/kudu/tserver/CMakeLists.txt
M src/kudu/tserver/tablet_service.cc
M src/kudu/util/flag_tags-test.cc
M src/kudu/util/flags-test.cc
M src/kudu/util/logging.h
M src/kudu/util/test_util.cc
43 files changed, 222 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/5555/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................


Patch Set 4:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/5555/1//COMMIT_MSG
Commit Message:

Line 7: KUDU-1812. Redact pretty-printed sensitive user data
A lot of decisions went into this patch, and I think they should be documented here. For example, redaction in ToString() methods vs. redaction via KUDU_REDACT().


http://gerrit.cloudera.org:8080/#/c/5555/4//COMMIT_MSG
Commit Message:

Line 53: 
Should also add a note about what it means to use KUDU_REDACT in the C++ client (namely, that it checks the gflag, but there's not yet a way for third party applications to change the value of the gflag).


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/cfile/cfile_util.h
File src/kudu/cfile/cfile_util.h:

PS1, Line 84:                     CFileIterator* it,
            :                     std::ostream* out,
I've spent some time thinking about this warning, and there's something about it that feels a little odd. Isn't the warning self-evident given the nature of the method?

If the goal is to leave a breadcrumb to "tag" methods that return strings containing row data, perhaps a more concise way to do that would be preferable here, where the warning itself is unnecessary?


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/client/batcher.cc
File src/kudu/client/batcher.cc:

Line 520:   // so that when the user calls Flush, we are ready to go.
Arguably this is just spam without the partition key.


Line 569: }
Likewise, maybe just remove the string, it's not really adding any value anymore.


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/client/scanner-internal.h
File src/kudu/client/scanner-internal.h:

Line 223:     return strings::Substitute("Scanner { table: $0, projection: $1, scan_spec: $2 }",
Nit: since there are a lot of callers, perhaps move to scanner-internal.cc?


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/common/partition-test.cc
File src/kudu/common/partition-test.cc:

Line 519:   // Ensure that redaction does not apply to partitions, since they are metadata.
Shouldn't this go above the CreatePartitions() call, for completeness?

Also, I think this could be clarified a bit: "Explicitly enable redaction. It should have no effect on the subsequent partition pretty printing tests, as partitions are metadata and thus not redacted".


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/common/partition.h
File src/kudu/common/partition.h:

PS4, Line 204:   // Returns a text description of this partition schema suitable for display in the web UI.
             :   // The format of this string is not guaranteed to be identical cross-version.
             :   //
             :   // 'range_partitions' should include the set of range partitions in the table,
             :   // as formatted by 'RangePartitionDebugString'.
             :   std::string DisplayString(const Schema& schema,
             :                             const std::vector<std::string>& range_partitions) const;
             : 
             :   // Returns header and entry HTML cells for the partition schema for the master
             :   // table web UI. This is an abstraction leak, but it's better than leaking the
             :   // internals of partitions to the master path handlers.
             :   std::string PartitionTableHeader(const Schema& schema) const;
             :   std::string PartitionTableEntry(const Schema& schema, const Partition& partition) const;
Should probably be annotated as being metadata and thus not redacting.


Line 250:   // Returns a text description of the encoded range key suitable for debug printing.
Likewise, for consistency with DebugStringComponents (also private).


Line 301:   void AppendRangeDebugStringComponentsOrMin(const KuduPartialRow& row,
Likewise.


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/common/row_operations.cc
File src/kudu/common/row_operations.cc:

Line 52:       return Substitute("SPLIT_ROW $0", split_row->ToString());
Isn't a split point is morally equivalent to a partition boundary?


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/tools/tool_main.cc
File src/kudu/tools/tool_main.cc:

Line 227:   FLAGS_log_redact_user_data = false;
Let's add a comment explaining this.


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/util/flag_tags-test.cc
File src/kudu/util/flag_tags-test.cc:

PS4, Line 54:     // Set to true via KuduTest, but explicitly unset here as this test deals
            :     // with unsafe and experimental flags.
Should update this comment to make it more clear that KuduTest overrides the default values of these flags, so we need to restore the default values here.


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/util/flags-test.cc
File src/kudu/util/flags-test.cc:

PS4, Line 44:     // Set to true via KuduTest, but explicitly unset here as this test parses
            :     // command line flags and fails if an unsafe or experimental flag is set.
Likewise.


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/util/logging-test.cc
File src/kudu/util/logging-test.cc:

Line 195:     });
Nit: indentation?


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/util/logging.h
File src/kudu/util/logging.h:

Line 154: extern const char* const kRedactionMessage;
This is duplicated from just before ScopedDisableRedaction.


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................


Patch Set 2:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/5555/2/src/kudu/client/batcher.cc
File src/kudu/client/batcher.cc:

Line 524:   VLOG(3) << "Looking up tablet for " << KUDU_REDACT(op->write_op->ToString());
> isn't this ToString() already safe?
This is not safe, because the KuduWriteOperation::ToString is not safe.  I'm going to switch everything to use the safe InFlightOp::ToString


Line 574:     << "Could not remove op " << KUDU_REDACT(op->ToString()) << " from in-flight list";
> same (and several below)
This one is safe because it's calling the InFlightOp::ToString, which is redacted.


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/client/client.cc
File src/kudu/client/client.cc:

Line 1284:   } else if (data_->last_response_.has_more_results()) {
> warning: don't use else after return [readability-else-after-return]
Ignoring.


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/client/client.h
File src/kudu/client/client.h:

PS1, Line 1926:   ///
              :   /// @internal This method must not be used in log messages because it contains
              :   ///   sensi
> OK, I see now.
That's perfect, I'll use that instead.


http://gerrit.cloudera.org:8080/#/c/5555/2/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 2115
> hrm, was this ToString never implemented?
correct


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/client/write_op.h
File src/kudu/client/write_op.h:

PS1, Line 78:   ///
            :   /// @internal this method does *NOT* redact row values. The
> nit: since it's 'doxygenized' comment, please make the new note to conform 
Done


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

Line 36: using std::get;
> warning: using decl 'all_of' is unused [misc-unused-using-decls]
Done


Line 468:     }
> warning: std::move of the const expression has no effect; remove std::move(
Done


http://gerrit.cloudera.org:8080/#/c/5555/2/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

Line 466:       VLOG(1) << "Pushing down predicate " << KUDU_REDACT(pred.ToString());
> would be nice to keep the column name, operator, etc here
Done


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

PS1, Line 461:   /// @internal this method does *NOT* redact row values. The
             :   ///   caller must check 'log_row_contents' themselves, if applicab
> nit: since it's 'doxygenized' comment, please make the new note to conform 
Done


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/common/schema.h
File src/kudu/common/schema.h:

PS1, Line 28: <glog/logging.h>
> I'm not sure this is needed in this file.  If yes, then as a nit, please co
Done


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/tablet/rowset.h
File src/kudu/tablet/rowset.h:

Line 119:   virtual Status DebugDump(vector<string> *lines = NULL) = 0;
> warning: default arguments on virtual or override methods are prohibited [g
Ignoring.


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/tablet/transactions/alter_schema_transaction.cc
File src/kudu/tablet/transactions/alter_schema_transaction.cc:

Line 43: 
> warning: using decl 'AlterSchemaResponsePB' is unused [misc-unused-using-de
Done


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/tablet/transactions/transaction.h
File src/kudu/tablet/transactions/transaction.h:

Line 122:   virtual void Finish(TransactionResult result) {}
> warning: parameter 'result' is unused [misc-unused-parameters]
ignoring.


http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/tools/tool_main.cc
File src/kudu/tools/tool_main.cc:

Line 46: using std::deque;
> warning: using decl 'cout' is unused [misc-unused-using-decls]
Done


http://gerrit.cloudera.org:8080/#/c/5555/2/src/kudu/util/logging.h
File src/kudu/util/logging.h:

Line 180: // as a macro so that the message can be lazily evaluated.
> I think lazy evaluation is good in the context of a release build, but I'm 
You have kept this as a macro in your version, so I'm assuming the default log reaction beign off is good enough.


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5555

to look at the new patch set (#5).

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................

KUDU-1812. Redact pretty-printed sensitive user data

This commit introduces a framework in logging.h to enable sensitive user
data to be redacted in the Kudu server, and to a lesser extent, in the
C++ client.

- The 'log_redact_user_data' gflag is added which controls whether row
  and predicate data is redacted from log, error and status messages. The
  flag defaults to true in the Kudu server binaries order to provide a
  safe default for Kudu users. The flag is explicitly set to false in
  tests and the 'kudu' CLI tool in order to aid debugging.

- There is a new thread-local boolean which indicates whether ToString
  functions should redact user data in the current thread.  This defaults
  to 'true', but it is always also combined with the
  'log_redact_user_data' flag to determine whether redaction will actually
  happen.

- A utility macro KUDU_DISABLE_REDACTION(...) and RAII equivalent
  ScopedDisableRedaction can disable redaction while evaluating a
  particular expression or scope, useful in contexts such as the web UI or
  tools where we don't want to redact.

- A macro KUDU_REDACT(expr) replaces its argument with '<redacted>' if
  redaction is enabled.

- ToString and equivalent calls have been changed to consult
  log_redact_user_data and the TLS flag to determine whether to perform
  redaction, where appropriate. Some ToString calls specifically disable
  redaction based on the type of data being stringified; this behavior
  has been explicitly called out in the header documentation, but this
  isn't expected to be a source of security issues in the future since
  in all such cases the data is not considered sensitive. A handful of
  ToString methods in our public API disable redaction in order to
  retain the same behavior; these methods should never be used for
  internal Kudu logging without manual redaction, and their doxygen has
  been updated to indicate this.

- Redaction in the C++ client uses the same mechanism as in the server,
  but the 'log_redact_user_data' is permanently set to 'true'. We can
  add a public API option to allow applications to turn off redaction in
  the future if it proves to be necessary.

The advantage of a TLS flag in addition to 'log_redact_user_data' is
that we're now able to apply KUDU_REDACT() at a very low level (the
stringification functions in types.cc which are used for stringifying
all user data). This means that we are by-default redacted anywhere that
stringifies a row, rather than having to look for all cases that may
lead to this stringification. Instead, we only have to find the places
that explicitly want to disable redaction, which should be the exception
rather than the rule.

Redaction of raw buffers and protobuf messages will be included in a
follow up commit.

Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
---
M src/kudu/client/batcher.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/meta_cache.h
M src/kudu/client/scanner-internal.h
M src/kudu/client/session-internal.cc
M src/kudu/client/write_op.h
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition-test.cc
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.h
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/schema-test.cc
M src/kudu/common/types-test.cc
M src/kudu/common/types.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/server/webserver.cc
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_main.cc
M src/kudu/util/flag_tags-test.cc
M src/kudu/util/flags-test.cc
M src/kudu/util/logging-test.cc
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/test_util.cc
32 files changed, 341 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/5555/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1812: redact row contents from logs

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1812: redact row contents from logs
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5555/1/src/kudu/client/client.h
File src/kudu/client/client.h:

PS1, Line 1926:   // Internal Kudu Note: This method must not be used in log messages because
              :   // it contains sensitive predicate values. Use Scanner::Data::DebugString
              :   // instead.
> nit: since it's 'doxygenized' comment, please make the new note to conform 
I made it with two slashes intentionally to avoid having it show up in the rendered doxygen, since it's for internal Kudu development eyes only.  Is there a better alternative?


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5555/3/src/kudu/common/row_changelist.cc
File src/kudu/common/row_changelist.cc:

Line 354:                                   this->raw_value.ToDebugString(), col.ToString()));
> Note to self: this Slice ToDebugString call needs a KUDU_REDACT
Done


http://gerrit.cloudera.org:8080/#/c/5555/3/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

Line 376:                   << Slice(base_data_->max_encoded_key_).ToDebugString()
> Note to self: redact this.
Done


Line 378:                   << spec->exclusive_upper_bound_key()->encoded_key().ToDebugString();
> Note to Self: redact this.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................


Patch Set 6:

(11 comments)

Woops, I accidentally pressed wrong button :(

My bad -- I'm sorry.  I was about to press the 'Post' button instead.

My comments are mostly nits, though, but I remember Todd about to look at the patch as well.

http://gerrit.cloudera.org:8080/#/c/5555/4//COMMIT_MSG
Commit Message:

PS4, Line 7: .
nit: strayed dot?  consider removing


PS4, Line 10:  
nit: it would be nice to note that it's about status output and logging.

May be, something like:

... to be redacted when output into logs or in status messages.  That's done in the Kudu server side and, ...


PS4, Line 14: e
nit: here and elsewhere in the commit message -- consider wrapping up at 72 chars, if possible


PS4, Line 42: doxygen has
nit: 'doxygen entries have been updated' or 'doxygen docs have been updated' ?


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/client/client.h
File src/kudu/client/client.h:

PS4, Line 1927: @internal
nit: I would expect that to be under '@attention' or '@note' flag.  Also, consider moving that before the @return: usually, @return is the last entry in those blocks.

I.e., something like that:

@internal
@attention This method ...
@endinternal

or

@internal
@note This method ...
@endinternal


http://gerrit.cloudera.org:8080/#/c/5555/5/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

Line 23: 
nit: consider adding <gflags/gflags_declare.h> and re-ordering the includes.


http://gerrit.cloudera.org:8080/#/c/5555/5/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

PS5, Line 462: NOTE:
nit: replace with @note for the doxygen-like style


http://gerrit.cloudera.org:8080/#/c/5555/5/src/kudu/common/partition-test.cc
File src/kudu/common/partition-test.cc:

PS5, Line 26: gflags.h
nit: gflags_declare.h would suffice


PS5, Line 457:   // Explicitly enable redaction. It should have no effect on the subsequent
             :   // partition pretty printing tests, as partitions are metadata and thus not
             :   // redacted.
I'm not sure the order of tests for this class is always the same as in the source file -- I know there might be cases when it's not so: there is explicit --gtest_shuffle flag which might be used.  Consider setting the flag explicitly in every test.


http://gerrit.cloudera.org:8080/#/c/5555/5/src/kudu/common/types-test.cc
File src/kudu/common/types-test.cc:

PS5, Line 37: TestTypes
nit: it seems the modified AppendDebugStringForValue() method has no appropriate coverage in this test for the case when reduction is enabled.  Is it worth adding a small test here -- just to verify how the method works in redaction-enabled case.


http://gerrit.cloudera.org:8080/#/c/5555/5/src/kudu/util/logging.h
File src/kudu/util/logging.h:

PS5, Line 63: "redacted"
nit: it seems to be "<redacted>"


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/5555/4//COMMIT_MSG
Commit Message:

Line 53: 
> Should also add a note about what it means to use KUDU_REDACT in the C++ cl
Done


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/common/partition-test.cc
File src/kudu/common/partition-test.cc:

Line 519:   // Ensure that redaction does not apply to partitions, since they are metadata.
> Shouldn't this go above the CreatePartitions() call, for completeness?
Done


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/common/partition.h
File src/kudu/common/partition.h:

PS4, Line 204:   // Returns a text description of this partition schema suitable for display in the web UI.
             :   // The format of this string is not guaranteed to be identical cross-version.
             :   //
             :   // 'range_partitions' should include the set of range partitions in the table,
             :   // as formatted by 'RangePartitionDebugString'.
             :   std::string DisplayString(const Schema& schema,
             :                             const std::vector<std::string>& range_partitions) const;
             : 
             :   // Returns header and entry HTML cells for the partition schema for the master
             :   // table web UI. This is an abstraction leak, but it's better than leaking the
             :   // internals of partitions to the master path handlers.
             :   std::string PartitionTableHeader(const Schema& schema) const;
             :   std::string PartitionTableEntry(const Schema& schema, const Partition& partition) const;
> Should probably be annotated as being metadata and thus not redacting.
DisplayString doesn't doesn't format sensitive data and it doesn't disable redaction, so I think that would be misleading documentation.

For the HTML ones: done


Line 250:   // Returns a text description of the encoded range key suitable for debug printing.
> Likewise, for consistency with DebugStringComponents (also private).
These don't disable redaction, so I think that would be misleading.


Line 301:   void AppendRangeDebugStringComponentsOrMin(const KuduPartialRow& row,
> Likewise.
Same here, this does not disable redaction.


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/common/row_operations.cc
File src/kudu/common/row_operations.cc:

Line 52:       return Substitute("SPLIT_ROW $0", split_row->ToString());
> Isn't a split point is morally equivalent to a partition boundary?
Done


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/tools/tool_main.cc
File src/kudu/tools/tool_main.cc:

Line 227:   FLAGS_log_redact_user_data = false;
> Let's add a comment explaining this.
Done


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/util/flag_tags-test.cc
File src/kudu/util/flag_tags-test.cc:

PS4, Line 54:     // Set to true via KuduTest, but explicitly unset here as this test deals
            :     // with unsafe and experimental flags.
> Should update this comment to make it more clear that KuduTest overrides th
Done


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/util/flags-test.cc
File src/kudu/util/flags-test.cc:

PS4, Line 44:     // Set to true via KuduTest, but explicitly unset here as this test parses
            :     // command line flags and fails if an unsafe or experimental flag is set.
> Likewise.
Done


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/util/logging-test.cc
File src/kudu/util/logging-test.cc:

Line 195:     });
> Nit: indentation?
Done


http://gerrit.cloudera.org:8080/#/c/5555/4/src/kudu/util/logging.h
File src/kudu/util/logging.h:

Line 154: extern const char* const kRedactionMessage;
> This is duplicated from just before ScopedDisableRedaction.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812: redact row contents from logs

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1812: redact row contents from logs
......................................................................


Patch Set 2:

(5 comments)

this looks pretty reasonable, but it's very hard to review for completeness. As we discussed on Slack, it would be nicer if the default behavior were to redact everything and we explicitly tagged the places where we stringified user data that we _didn't_ want redacted.

Let me do a little prototyping and see what I can come up with.

http://gerrit.cloudera.org:8080/#/c/5555/2/src/kudu/client/batcher.cc
File src/kudu/client/batcher.cc:

Line 524:   VLOG(3) << "Looking up tablet for " << KUDU_REDACT(op->write_op->ToString());
isn't this ToString() already safe?


Line 574:     << "Could not remove op " << KUDU_REDACT(op->ToString()) << " from in-flight list";
same (and several below)


http://gerrit.cloudera.org:8080/#/c/5555/2/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 2115
hrm, was this ToString never implemented?


http://gerrit.cloudera.org:8080/#/c/5555/2/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

Line 466:       VLOG(1) << "Pushing down predicate " << KUDU_REDACT(pred.ToString());
would be nice to keep the column name, operator, etc here


http://gerrit.cloudera.org:8080/#/c/5555/2/src/kudu/util/logging.h
File src/kudu/util/logging.h:

Line 180: // as a macro so that the message can be lazily evaluated.
I think lazy evaluation is good in the context of a release build, but I'm afraid we're not going to get great coverage of the non-redacted code path. In debug builds, maybe we should make this be a non-lazy function so that, if for example we tried to stringify a NULL pointer, we'd still get a crash in our DEBUG/ASAN test cases?

Or do you think the fact that we've defauled log-redaction to off in KuduTest is sufficient?


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5555

to look at the new patch set (#3).

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................

KUDU-1812. Redact pretty-printed sensitive user data

This commit introduces a framework in logging.h to enable sensitive user
data to be redacted in the Kudu server, and to a lesser extent, in the
C++ client.

- The 'log_redact_user_data' gflag is added which controls whether row
  and predicate data is redacted from log, error and status messages. The
  flag defaults to true in the Kudu server binaries order to provide a
  safe default for Kudu users. The flag is explicitly set to false in
  tests and the 'kudu' CLI tool in order to aid debugging.

- There is a new thread-local boolean which indicates whether ToString
  functions should redact user data in the current thread.  This defaults
  to 'true', but it is always also combined with the
  'log_redact_user_data' flag to determine whether redaction will actually
  happen.

- A utility macro KUDU_DISABLE_REDACTION(...) and RAII equivalent
  ScopedDisableRedaction can disable redaction while evaluating a
  particular expression or scope, useful in contexts such as the web UI or
  tools where we don't want to redact.

- A macro KUDU_REDACT(expr) replaces its argument with '<redacted>' if
  redaction is enabled.

- ToString and equivalent calls have been changed to consult
  log_redact_user_data and the TLS flag to determine whether to perform
  redaction, where appropriate. Some ToString calls specifically disable
  redaction based on the type of data being stringified; this behavior
  has been explicitly called out in the header documentation, but this
  isn't expected to be a source of security issues in the future since
  in all such cases the data is not considered sensitive. A handful of
  ToString methods in our public API disable redaction in order to
  retain the same behavior; these methods should never be used for
  internal Kudu logging without manual redaction, and their doxygen has
  been updated to indicate this.

The advantage of a TLS flag in addition to 'log_redact_user_data' is
that we're now able to apply KUDU_REDACT() at a very low level (the
stringification functions in types.cc which are used for stringifying
all user data). This means that we are by-default redacted anywhere that
stringifies a row, rather than having to look for all cases that may
lead to this stringification. Instead, we only have to find the places
that explicitly want to disable redaction, which should be the exception
rather than the rule.

Redaction of raw buffers and protobuf messages will be included in a
follow up commit.

Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
---
M src/kudu/client/batcher.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/meta_cache.h
M src/kudu/client/scanner-internal.h
M src/kudu/client/session-internal.cc
M src/kudu/client/write_op.h
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition-test.cc
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.h
M src/kudu/common/row_operations.h
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/schema-test.cc
M src/kudu/common/types-test.cc
M src/kudu/common/types.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/server/webserver.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_main.cc
M src/kudu/util/flag_tags-test.cc
M src/kudu/util/flags-test.cc
M src/kudu/util/logging-test.cc
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/test_util.cc
30 files changed, 322 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/5555/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................


Patch Set 2:

I've rolled Todd's TLS work into this, should be much simpler now.

-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/5555/3/src/kudu/common/row_changelist.cc
File src/kudu/common/row_changelist.cc:

Line 354:                                   this->raw_value.ToDebugString(), col.ToString()));
Note to self: this Slice ToDebugString call needs a KUDU_REDACT


http://gerrit.cloudera.org:8080/#/c/5555/2/src/kudu/common/row_operations.cc
File src/kudu/common/row_operations.cc:

Line 36:     case RowOperationsPB::UNKNOWN:
Restore this comment.


http://gerrit.cloudera.org:8080/#/c/5555/3/src/kudu/common/row_operations.cc
File src/kudu/common/row_operations.cc:

Line 48:       return Substitute("SPLIT_ROW $0", split_row->ToString());
From here down should use KUDU_DISABLE_REDACTION.


http://gerrit.cloudera.org:8080/#/c/5555/3/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

Line 376:                   << Slice(base_data_->max_encoded_key_).ToDebugString()
Note to self: redact this.


Line 378:                   << spec->exclusive_upper_bound_key()->encoded_key().ToDebugString();
Note to Self: redact this.


http://gerrit.cloudera.org:8080/#/c/5555/3/src/kudu/tablet/transactions/write_transaction.cc
File src/kudu/tablet/transactions/write_transaction.cc:

Line 357:   // NOTE: we'll eventually need to gate this by some flag if we want to avoid
Remove this.


http://gerrit.cloudera.org:8080/#/c/5555/3/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

Line 1358:                 << " from " << scan_pb.ShortDebugString();
Redact this.


-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged.

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................


KUDU-1812. Redact pretty-printed sensitive user data

This commit introduces a framework in logging.h to enable sensitive user
data to be redacted in the Kudu server, and to a lesser extent, in the
C++ client.

- The 'log_redact_user_data' gflag is added which controls whether row
  and predicate data is redacted from log, error and status messages. The
  flag defaults to true in the Kudu server binaries order to provide a
  safe default for Kudu users. The flag is explicitly set to false in
  tests and the 'kudu' CLI tool in order to aid debugging.

- There is a new thread-local boolean which indicates whether ToString
  functions should redact user data in the current thread.  This defaults
  to 'true', but it is always also combined with the
  'log_redact_user_data' flag to determine whether redaction will actually
  happen.

- A utility macro KUDU_DISABLE_REDACTION(...) and RAII equivalent
  ScopedDisableRedaction can disable redaction while evaluating a
  particular expression or scope, useful in contexts such as the web UI or
  tools where we don't want to redact.

- A macro KUDU_REDACT(expr) replaces its argument with '<redacted>' if
  redaction is enabled.

- ToString and equivalent calls have been changed to consult
  log_redact_user_data and the TLS flag to determine whether to perform
  redaction, where appropriate. Some ToString calls specifically disable
  redaction based on the type of data being stringified; this behavior
  has been explicitly called out in the header documentation, but this
  isn't expected to be a source of security issues in the future since
  in all such cases the data is not considered sensitive. A handful of
  ToString methods in our public API disable redaction in order to
  retain the same behavior; these methods should never be used for
  internal Kudu logging without manual redaction, and their doxygen has
  been updated to indicate this.

- Redaction in the C++ client uses the same mechanism as in the server,
  but the 'log_redact_user_data' is permanently set to 'true'. We can
  add a public API option to allow applications to turn off redaction in
  the future if it proves to be necessary.

The advantage of a TLS flag in addition to 'log_redact_user_data' is
that we're now able to apply KUDU_REDACT() at a very low level (the
stringification functions in types.cc which are used for stringifying
all user data). This means that we are by-default redacted anywhere that
stringifies a row, rather than having to look for all cases that may
lead to this stringification. Instead, we only have to find the places
that explicitly want to disable redaction, which should be the exception
rather than the rule.

Redaction of raw buffers and protobuf messages will be included in a
follow up commit.

Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Reviewed-on: http://gerrit.cloudera.org:8080/5555
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/client/batcher.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/meta_cache.h
M src/kudu/client/scanner-internal.h
M src/kudu/client/session-internal.cc
M src/kudu/client/write_op.h
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition-test.cc
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.h
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/schema-test.cc
M src/kudu/common/types-test.cc
M src/kudu/common/types.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/server/webserver.cc
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_main.cc
M src/kudu/util/flag_tags-test.cc
M src/kudu/util/flags-test.cc
M src/kudu/util/logging-test.cc
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/test_util.cc
32 files changed, 341 insertions(+), 81 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1812. Redact pretty-printed sensitive user data

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5555

to look at the new patch set (#4).

Change subject: KUDU-1812. Redact pretty-printed sensitive user data
......................................................................

KUDU-1812. Redact pretty-printed sensitive user data

This commit introduces a framework in logging.h to enable sensitive user
data to be redacted in the Kudu server, and to a lesser extent, in the
C++ client.

- The 'log_redact_user_data' gflag is added which controls whether row
  and predicate data is redacted from log, error and status messages. The
  flag defaults to true in the Kudu server binaries order to provide a
  safe default for Kudu users. The flag is explicitly set to false in
  tests and the 'kudu' CLI tool in order to aid debugging.

- There is a new thread-local boolean which indicates whether ToString
  functions should redact user data in the current thread.  This defaults
  to 'true', but it is always also combined with the
  'log_redact_user_data' flag to determine whether redaction will actually
  happen.

- A utility macro KUDU_DISABLE_REDACTION(...) and RAII equivalent
  ScopedDisableRedaction can disable redaction while evaluating a
  particular expression or scope, useful in contexts such as the web UI or
  tools where we don't want to redact.

- A macro KUDU_REDACT(expr) replaces its argument with '<redacted>' if
  redaction is enabled.

- ToString and equivalent calls have been changed to consult
  log_redact_user_data and the TLS flag to determine whether to perform
  redaction, where appropriate. Some ToString calls specifically disable
  redaction based on the type of data being stringified; this behavior
  has been explicitly called out in the header documentation, but this
  isn't expected to be a source of security issues in the future since
  in all such cases the data is not considered sensitive. A handful of
  ToString methods in our public API disable redaction in order to
  retain the same behavior; these methods should never be used for
  internal Kudu logging without manual redaction, and their doxygen has
  been updated to indicate this.

The advantage of a TLS flag in addition to 'log_redact_user_data' is
that we're now able to apply KUDU_REDACT() at a very low level (the
stringification functions in types.cc which are used for stringifying
all user data). This means that we are by-default redacted anywhere that
stringifies a row, rather than having to look for all cases that may
lead to this stringification. Instead, we only have to find the places
that explicitly want to disable redaction, which should be the exception
rather than the rule.

Redaction of raw buffers and protobuf messages will be included in a
follow up commit.

Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
---
M src/kudu/client/batcher.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/meta_cache.h
M src/kudu/client/scanner-internal.h
M src/kudu/client/session-internal.cc
M src/kudu/client/write_op.h
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/common/partition-test.cc
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.h
M src/kudu/common/row_operations.cc
M src/kudu/common/row_operations.h
M src/kudu/common/scan_spec-test.cc
M src/kudu/common/schema-test.cc
M src/kudu/common/types-test.cc
M src/kudu/common/types.cc
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/server/webserver.cc
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_main.cc
M src/kudu/util/flag_tags-test.cc
M src/kudu/util/flags-test.cc
M src/kudu/util/logging-test.cc
M src/kudu/util/logging.cc
M src/kudu/util/logging.h
M src/kudu/util/test_util.cc
32 files changed, 332 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/5555/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2b87a3065280116bb8af6f26f072dafdfd1ee077
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>