Posted to commits@impala.apache.org by tm...@apache.org on 2017/02/23 16:31:12 UTC

[4/7] incubator-impala git commit: IMPALA-3411 [DOCS] Rework Impala governance topics to be generic.

IMPALA-3411 [DOCS] Rework Impala governance topics to be generic.

This set of edits removes references and links to Cloudera Navigator
and Cloudera Manager from the auditing and lineage topics. Those
were either marked as 'hidden' or replaced with a generic suggestion
to use cluster management software with a focus on governance.

Some paragraphs with overflowing lines were also fixed.

Change-Id: I192bc2d1de89e55418c045d1a0e5433cf02cf782
Reviewed-on: http://gerrit.cloudera.org:8080/5957
Reviewed-by: Jim Apple <jb...@apache.org>
Tested-by: Impala Public Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/cede3762
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/cede3762
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/cede3762

Branch: refs/heads/master
Commit: cede3762fe1641342fa56e2d92b74fa93a98200a
Parents: 79b5e7d
Author: Ambreen Kazi <am...@cloudera.com>
Authored: Wed Feb 8 22:43:43 2017 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Feb 23 01:22:53 2017 +0000

----------------------------------------------------------------------
 docs/topics/impala_auditing.xml | 67 +++++++++++++++++++++++-------------
 docs/topics/impala_lineage.xml  | 25 +++++++-------
 2 files changed, 57 insertions(+), 35 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/cede3762/docs/topics/impala_auditing.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_auditing.xml b/docs/topics/impala_auditing.xml
index ecf03bf..8dd5b23 100644
--- a/docs/topics/impala_auditing.xml
+++ b/docs/topics/impala_auditing.xml
@@ -36,38 +36,43 @@ under the License.
   <conbody>
 
     <p>
-      To monitor how Impala data is being used within your organization, ensure that your Impala authorization and
-      authentication policies are effective, and detect attempts at intrusion or unauthorized access to Impala
+      To monitor how Impala data is being used within your organization, ensure
+      that your Impala authorization and authentication policies are effective.
+      To detect attempts at intrusion or unauthorized access to Impala
       data, you can use the auditing feature in Impala 1.2.1 and higher:
     </p>
 
     <ul>
       <li>
-        Enable auditing by including the option <codeph>-audit_event_log_dir=<varname>directory_path</varname></codeph>
-        in your <cmdname>impalad</cmdname> startup options for a cluster not managed by Cloudera Manager, or
-        <xref audience="integrated" href="cn_iu_audit_log.xml#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--7d6f/section_v25_lmy_bn">configuring Impala Daemon logging in Cloudera Manager</xref><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/cn_iu_service_audit.html" scope="external" format="html">configuring Impala Daemon logging in Cloudera Manager</xref>.
+        Enable auditing by including the option
+        <codeph>-audit_event_log_dir=<varname>directory_path</varname></codeph>
+        in your <cmdname>impalad</cmdname> startup options.
         The log directory must be a local directory on the
         server, not an HDFS directory.
+        <p audience="hidden">
+	  For a cluster managed by Cloudera Manager, see
+          <xref
+          href="cn_iu_audit_log.xml#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--7d6f/section_v25_lmy_bn"/>.
+        </p>
       </li>
 
       <li>
-        Decide how many queries will be represented in each log files. By default, Impala starts a new log file
-        every 5000 queries. To specify a different number, <ph
-          audience="standalone"
-          >include
-        the option <codeph>-max_audit_event_log_file_size=<varname>number_of_queries</varname></codeph> in the
-        <cmdname>impalad</cmdname> startup
-        options</ph><xref
-          href="cn_iu_audit_log.xml#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--7d6f/section_v25_lmy_bn"
-            audience="integrated"
-            >configure
-        Impala Daemon logging in Cloudera Manager</xref>.
+        Decide how many queries will be represented in each log file. By default,
+        Impala starts a new log file every 5000 queries. To specify a different number, <ph
+          audience="standalone">include
+        the option <codeph>-max_audit_event_log_file_size=<varname>number_of_queries</varname></codeph>
+        in the <cmdname>impalad</cmdname> startup options</ph>
+        <xref href="cn_iu_audit_log.xml#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--7d6f/section_v25_lmy_bn" audience="integrated">
+        configure Impala Daemon logging in Cloudera Manager</xref>.
       </li>
 
-      <li> Configure Cloudera Navigator to collect and consolidate the audit
-        logs from all the hosts in the cluster. </li>
+      <li> 
+        Use a cluster manager with governance capabilities to filter, visualize,
+        and produce reports based on the audit logs collected
+        from all the hosts in the cluster. 
+      </li>
 
-      <li>
+      <li audience="hidden">
         Use Cloudera Navigator or Cloudera Manager to filter, visualize, and produce reports based on the audit
         data. (The Impala auditing feature works with Cloudera Manager 4.7 to 5.1 and Cloudera Navigator 2.1 and
         higher.) Check the audit data to ensure that all activity is authorized and detect attempts at
@@ -101,9 +106,19 @@ under the License.
         <codeph>fsync()</codeph> system call) to avoid loss of audit data in case of a crash.
       </p>
 
-      <p> The runtime overhead of auditing applies to whichever host serves as the coordinator for the query, that is, the host you connect to when you issue the query. This might be the same host for all queries, or different applications or users might connect to and issue queries through different hosts. </p>
+      <p> 
+        The runtime overhead of auditing applies to whichever host serves as the coordinator
+        for the query, that is, the host you connect to when you issue the query. This might
+        be the same host for all queries, or different applications or users might connect to
+        and issue queries through different hosts. 
+      </p>
 
-      <p> To avoid excessive I/O overhead on busy coordinator hosts, Impala syncs the audit log data (using the <codeph>fsync()</codeph> system call) periodically rather than after every query. Currently, the <codeph>fsync()</codeph> calls are issued at a fixed interval, every 5 seconds. </p>
+      <p> 
+        To avoid excessive I/O overhead on busy coordinator hosts, Impala syncs the audit log
+        data (using the <codeph>fsync()</codeph> system call) periodically rather than after
+        every query. Currently, the <codeph>fsync()</codeph> calls are issued at a fixed
+        interval, every 5 seconds. 
+      </p>
 
       <p>
         By default, Impala avoids losing any audit log data in the case of an error during a logging operation
@@ -127,7 +142,13 @@ under the License.
 
     <conbody>
 
-      <p> The audit log files represent the query information in JSON format, one query per line. Typically, rather than looking at the log files themselves, you use the Cloudera Navigator product to consolidate the log data from all Impala hosts and filter and visualize the results in useful ways. (If you do examine the raw log data, you might run the files through a JSON pretty-printer first.) </p>
+      <p> 
+        The audit log files represent the query information in JSON format, one query per line.
+        Typically, rather than looking at the log files themselves, you should use cluster-management
+        software to consolidate the log data from all Impala hosts and filter and visualize the results
+        in useful ways. (If you do examine the raw log data, you might run the files through
+        a JSON pretty-printer first.) 
+     </p>
 
       <p>
         All the information about schema objects accessed by the query is encoded in a single nested record on the
@@ -255,7 +276,7 @@ Here is an excerpt from a sample audit log file:
     </conbody>
   </concept>
 
-  <concept id="auditing_reviewing">
+  <concept id="auditing_reviewing" audience="hidden">
 
     <title>Reviewing the Audit Logs</title>
   <prolog>

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/cede3762/docs/topics/impala_lineage.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_lineage.xml b/docs/topics/impala_lineage.xml
index b8b76b7..f444836 100644
--- a/docs/topics/impala_lineage.xml
+++ b/docs/topics/impala_lineage.xml
@@ -42,25 +42,25 @@ under the License.
     <p rev="2.2.0">
       <indexterm audience="hidden">lineage</indexterm>
       <indexterm audience="hidden">column lineage</indexterm>
-      <term>Lineage</term> is a feature in the Cloudera Navigator data
-      management component that helps you track where data originated, and how
+      <term>Lineage</term> is a feature that helps you track where data originated, and how
       data propagates through the system through SQL statements such as
         <codeph>SELECT</codeph>, <codeph>INSERT</codeph>, and <codeph>CREATE
-        TABLE AS SELECT</codeph>. Impala is covered by the Cloudera Navigator
-      lineage features in <keyword keyref="impala22_full"/> and higher. </p>
-
+        TABLE AS SELECT</codeph>.
+    </p>
     <p>
-      This type of tracking is important in high-security configurations, especially in highly regulated industries
-      such as healthcare, pharmaceuticals, financial services and intelligence. For such kinds of sensitive data, it is important to know all
+      This type of tracking is important in high-security configurations, especially in
+      highly regulated industries such as healthcare, pharmaceuticals, financial services and
+      intelligence. For such kinds of sensitive data, it is important to know all
       the places in the system that contain that data or other data derived from it; to verify who has accessed
       that data; and to be able to doublecheck that the data used to make a decision was processed correctly and
       not tampered with.
     </p>
 
-    <p>
+    <p audience="hidden">
       You interact with this feature through <term>lineage diagrams</term> showing relationships between tables and
       columns. For instructions about interpreting lineage diagrams, see
-      <xref audience="integrated" href="cn_iu_lineage.xml" /><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/cn_iu_lineage.html" scope="external" format="html"/>.
+      <xref audience="integrated" href="cn_iu_lineage.xml" />
+      <xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/cn_iu_lineage.html" scope="external" format="html"/>.
     </p>
 
     <section id="column_lineage">
@@ -118,10 +118,11 @@ under the License.
       </p>
 
       <p>
-        To enable or disable this feature on a system not managed by Cloudera Manager, set or remove the
-        <codeph>-lineage_event_log_dir</codeph> configuration option for the <cmdname>impalad</cmdname> daemon. For
+        To enable or disable this feature, set or remove the <codeph>-lineage_event_log_dir</codeph>
+        configuration option for the <cmdname>impalad</cmdname> daemon. <ph audience="hidden">For
         information about turning the lineage feature on and off through Cloudera Manager, see
-        <xref audience="integrated" href="datamgmt_impala_lineage_log.xml"/><xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/datamgmt_impala_lineage_log.html" scope="external" format="html"/>.
+        <xref audience="integrated" href="datamgmt_impala_lineage_log.xml"/>
+        <xref audience="standalone" href="http://www.cloudera.com/documentation/enterprise/latest/topics/datamgmt_impala_lineage_log.html" scope="external" format="html"/>.</ph>
       </p>
 
     </section>