Posted to commits@drill.apache.org by br...@apache.org on 2015/05/22 01:39:26 UTC

drill git commit: Submitting kris's and bridget's minor edits to docs

Repository: drill
Updated Branches:
  refs/heads/gh-pages 2070bbe78 -> 5808b09da


Submitting kris's and bridget's minor edits to docs


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/5808b09d
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/5808b09d
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/5808b09d

Branch: refs/heads/gh-pages
Commit: 5808b09da25a0a16318c1cb5e61c878c56b7d5cd
Parents: 2070bbe
Author: Bridget Bevens <bb...@maprtech.com>
Authored: Thu May 21 16:38:45 2015 -0700
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Thu May 21 16:38:45 2015 -0700

----------------------------------------------------------------------
 _docs/architecture/015-drill-query-execution.md     | 10 +++++-----
 .../070-configuring-user-impersonation.md           | 16 ++++++++--------
 .../075-configuring-user-authentication.md          | 13 +++++++------
 _docs/log-and-debug/002-error-messages.md           |  8 ++++----
 _docs/performance-tuning/020-partition-pruning.md   |  5 +++--
 .../020-physical-operators.md                       |  6 +++---
 6 files changed, 30 insertions(+), 28 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/5808b09d/_docs/architecture/015-drill-query-execution.md
----------------------------------------------------------------------
diff --git a/_docs/architecture/015-drill-query-execution.md b/_docs/architecture/015-drill-query-execution.md
index 730460a..8e04163 100755
--- a/_docs/architecture/015-drill-query-execution.md
+++ b/_docs/architecture/015-drill-query-execution.md
@@ -17,7 +17,7 @@ The Foreman sends the logical plan into a cost-based optimizer to optimize the o
 
 A parallelizer in the Foreman transforms the physical plan into multiple phases, called major and minor fragments. These fragments create a multi-level execution tree that rewrites the query and executes it in parallel against the configured data sources, sending the results back to the client or application.
 
-![]({{ site.baseurl }}/docs/img/execution-tree.png)  
+![]({{ site.baseurl }}/docs/img/execution-tree.PNG)  
 
 
 ## Major Fragments
@@ -42,13 +42,13 @@ The parallelizer in the Foreman creates one or more minor fragments from a major
 
 Drill executes each minor fragment in its own thread as quickly as possible based on its upstream data requirements. Drill schedules the minor fragments on nodes with data locality. Otherwise, Drill schedules them in a round-robin fashion on the existing, available Drillbits.
 
-Minor fragments contain one or more relational operators. An operator performs a relational operation, such as scan, filter, join, or group by. Each operator has a particular operator type and an OperatorID. Each OperatorID defines its relationship within the minor fragment to which it belongs.  
+Minor fragments contain one or more relational operators. An operator performs a relational operation, such as scan, filter, join, or group by. Each operator has a particular operator type and an OperatorID. Each OperatorID defines its relationship within the minor fragment to which it belongs. See [Physical Operators]({{ site.baseurl }}/docs/physical-operators/).
 
 ![]({{ site.baseurl }}/docs/img/operators.png)
 
 For example, when performing a hash aggregation of two files, Drill breaks the first phase dedicated to scanning into two minor fragments. Each minor fragment contains scan operators that scan the files. Drill breaks the second phase dedicated to aggregation into four minor fragments. Each of the four minor fragments contain hash aggregate operators that perform the hash  aggregation operations on the data. 
 
-You cannot modify the number of minor fragments within the execution plan. However, you can view the query profile in the Drill Web UI and modify some configuration options that change the behavior of minor fragments, such as the maximum number of slices. See [Configuration Options]({{ site.baseurl }}/docs/configuration-options-introduction/) for more information.
+You cannot modify the number of minor fragments within the execution plan. However, you can view the query profile in the Drill Web UI and modify some configuration options that change the behavior of minor fragments, such as the maximum number of slices. See [Configuration Options]({{ site.baseurl }}/docs/configuration-options-introduction/).
 
 ### Execution of Minor Fragments
 Minor fragments can run as root, intermediate, or leaf fragments. An execution tree contains only one root fragment. The coordinates of the execution tree are numbered from the root, with the root being zero. Data flows downstream from the leaf fragments to the root fragment.
@@ -57,9 +57,9 @@ The root fragment runs in the Foreman and receives incoming queries, reads metad
 
 Intermediate fragments start work when data is available or fed to them from other fragments. They perform operations on the data and then send the data downstream. They also pass the aggregated results to the root fragment, which performs further aggregation and provides the query results to the client or application.
 
-The leaf fragments scan tables in parallel and communicate with the storage layer or access data on local disk. The leaf fragments pass partial results to the intermediate fragments, which perform parallel operations on intermediate results.
+The leaf fragments scan tables in parallel and communicate with the storage layer or access data on local disk. The leaf fragments pass partial results to the intermediate fragments, which perform parallel operations on intermediate results.  
 
-![]({{ site.baseurl }}/docs/leaf-frag.png)
+![]({{ site.baseurl }}/docs/img/leaf-frag.png)    
 
 Drill only plans queries that have concurrent running fragments. For example, if 20 available slices exist in the cluster, Drill plans a query that runs no more than 20 minor fragments in a particular major fragment. Drill is optimistic and assumes that it can complete all of the work in parallel. All minor fragments for a particular major fragment start at the same time based on their upstream data dependency.
 

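The scheduling rule described in the query-execution text above (place a minor fragment on a node with data locality, otherwise fall back to round-robin over the available Drillbits) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Drill's actual scheduler: the function name `assign_fragments`, the fragment IDs, and the host names are hypothetical.

```python
from itertools import cycle


def assign_fragments(fragments, drillbits, locality):
    """Assign each minor fragment to a Drillbit.

    fragments: list of fragment IDs (hypothetical labels)
    drillbits: list of available Drillbit host names
    locality:  dict mapping a fragment ID to the host that holds its data;
               fragments without an entry have no locality preference
    """
    if not drillbits:
        raise ValueError("no Drillbits available")

    round_robin = cycle(drillbits)  # fallback order for fragments without locality
    assignment = {}
    for frag in fragments:
        preferred = locality.get(frag)
        if preferred in drillbits:
            # Data locality wins when the preferred node is available.
            assignment[frag] = preferred
        else:
            # Otherwise take the next Drillbit in round-robin order.
            assignment[frag] = next(round_robin)
    return assignment


if __name__ == "__main__":
    plan = assign_fragments(
        ["0-0", "0-1", "0-2", "0-3"],
        ["node-a", "node-b"],
        {"0-1": "node-b"},
    )
    for frag, host in plan.items():
        print(frag, "->", host)
```

The round-robin iterator advances only when a fragment has no usable locality hint, which is one possible reading of "otherwise"; a real scheduler would also weigh load and slice limits.
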
http://git-wip-us.apache.org/repos/asf/drill/blob/5808b09d/_docs/configure-drill/070-configuring-user-impersonation.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/070-configuring-user-impersonation.md b/_docs/configure-drill/070-configuring-user-impersonation.md
index dbcdaf0..90e391c 100644
--- a/_docs/configure-drill/070-configuring-user-impersonation.md
+++ b/_docs/configure-drill/070-configuring-user-impersonation.md
@@ -2,7 +2,7 @@
 title: "Configuring User Impersonation"
 parent: "Configure Drill"
 ---
-Impersonation allows a service to act on behalf of a client while performing the action requested by the client. By default, user impersonation is disabled in Drill. You can configure user impersonation in the drill-override.conf file.
+Impersonation allows a service to act on behalf of a client while performing the action requested by the client. By default, user impersonation is disabled in Drill. You can configure user impersonation in the <DRILLINSTALL_HOME>/conf/drill-override.conf file.
  
 When you enable impersonation, Drill executes client requests as the user logged in to the client. Drill passes the user credentials to the file system, and the file system checks to see if the user has permission to access the data. When you enable authentication, Drill uses the pluggable authentication module (PAM) to authenticate a user’s identity before the user can access the Drillbit process. See User Authentication.
  
@@ -13,7 +13,7 @@ If impersonation is not configured, Drill executes all of the client requests ag
 When impersonation is disabled and user Bob issues a query through the SQLLine client, SQLLine passes the query to the connecting Drillbit. The Drillbit executes the query as the system user that started the Drill process on the node. For the purpose of this example, we will assume that the system user has full access to the file system. Drill executes the query and returns the results back to the client.
 ![](http://i.imgur.com/4XxQK2I.png)
 
-When impersonation is enabled and user Bob issues a query through the SQLLine client, the Drillbit executes the query against the file system as Bob. The file system checks to see if Bob has permission to access the data. If so, Drill returns the query results to the client. If Bob does not have permission, Drill returns an error.
+When impersonation is enabled and user Bob issues a query through the SQLLine client, the Drillbit uses Bob's credentials to access data in the file system. The file system checks to see if Bob has permission to access the data. If so, Drill returns the query results to the client. If Bob does not have permission, Drill returns an error.
 ![](http://i.imgur.com/oigWqVg.png)
 
 ## Impersonation Support
@@ -27,8 +27,8 @@ The following table lists the clients, storage plugins, and types of queries tha
   </tr>
   <tr>
     <td>Clients</td>
-    <td>SQLLine ODBC JDBC</td>
-    <td>Drill Web UI REST API</td>
+    <td>SQLLine, ODBC, JDBC</td>
+    <td>Drill Web UI, REST API</td>
   </tr>
   <tr>
     <td>Storage Plugins</td>
@@ -45,7 +45,7 @@ The following table lists the clients, storage plugins, and types of queries tha
 ## Impersonation and Views
 You can use views with impersonation to provide granular access to data and protect sensitive information. When you create a view, Drill stores the view definition in a file and suffixes the file with .drill.view. For example, if you create a view named myview, Drill creates a view file named myview.drill.view and saves it in the current workspace or the workspace specified, such as dfs.views.myview. See [CREATE VIEW]({{site.baseurl}}/docs/create-view) Command.
 
-You can create a view and grant read permissions on the view to give other users access to the data that the view references. When a user queries the view, Drill impersonates the view owner to access the underlying data. A user with read access to a view can create new views from the originating view to further restrict access on data.
+You can create a view and grant read permissions on the view to give other users access to the data that the view references. When a user queries the view, Drill impersonates the view owner to access the underlying data. If the user tries to access the data directory, Drill returns a permission denied error. A user with read access to a view can create new views from the originating view to further restrict access on data.
 
 ### View Permissions
 A user must have write permission on a directory or workspace to create a view, as well as read access on the table(s) and/or view(s) that the view references. When a user creates a view, permission on the view is set to owner by default. Users can query an existing view or create new views from the view if they have read permissions on the view file and the directory or workspace where the view file is stored. 
@@ -79,7 +79,7 @@ After you set this parameter, Drill applies the same permissions on each view cr
 ## Chained Impersonation
 You can configure Drill to allow chained impersonation on views when you enable impersonation in the `drill-override.conf` file. Chained impersonation controls the number of identity transitions that Drill can make when a user queries a view. Each identity transition is equal to one hop.
  
-You can set the maximum number of hops on views to limit the number of times that Drill can impersonate a different user when a user queries a view. The default maximum number of hops is set at 3. When the maximum number of hops is set to 0, Drill does not allow impersonation chaining, and a user can only read data for which they have direct permission to access. You may set chain length to 0 to protect highly sensitive data. 
+You can set the maximum number of hops on views to limit the number of times that Drill can impersonate a different user when a user queries a view. The default maximum number of hops is 3. When the maximum number of hops is set to 0, Drill does not allow impersonation chaining, and a user can only read data that they have direct permission to access. An administrator may set the chain length to 0 to protect highly sensitive data. Only an administrator can change this setting.
  
 The following example depicts a scenario where the maximum hop number is set to 3, and Drill must impersonate three users to access data when Chad queries a view that Jane created:
 
@@ -127,13 +127,13 @@ drwx------      frank:hr     /user/frank/employees
 Each record in the employees table consists of the following information:
 emp_id, emp_name, emp_ssn, emp_salary, emp_addr, emp_phone, emp_mgr
  
-Frank needs to share a subset of this information with Joe who is an HR manager reporting to Frank. To share the employee data, Frank creates a view called emp_mgr_view that accesses a subset of the data. The emp_mgr_view filters out sensitive employee information, such as the employee social security numbers, and only shows data for the employees that report directly to Joe or the manager running the query on the view. Frank and Joe both belong to the mgr group. Managers have read permission on Frank’s directory.
+Frank needs to share a subset of this information with Joe who is an HR manager reporting to Frank. To share the employee data, Frank creates a view called emp_mgr_view that accesses a subset of the data. The emp_mgr_view filters out sensitive employee information, such as the employee social security numbers, and only shows data for the employees that report directly to Joe. Frank and Joe both belong to the mgr group. Managers have read permission on Frank’s directory.
  
 rwxr-----     frank:mgr   /user/frank/emp_mgr_view.drill.view
  
 The emp_mgr_view.drill.view file contains the following view definition:
 
-(view definition: SELECT emp_id, emp_name, emp_salary, emp_addr, emp_phone FROM \`/user/frank/employee\` WHERE emp_mgr = user())
+(view definition: SELECT emp_id, emp_name, emp_salary, emp_addr, emp_phone FROM \`/user/frank/employees\` WHERE emp_mgr = 'Joe')
  
 When Joe issues SELECT * FROM emp_mgr_view, Drill impersonates Frank when accessing the employee data, and the query returns the data that Joe has permission to see based on the view definition. The query results do not include any sensitive data because the view protects that information. If Joe tries to query the employees table directly, Drill returns an error or null values.
  

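The chained impersonation limit described in the diff above (each identity transition counts as one hop, with a default maximum of 3 and 0 disabling chaining) can be illustrated with a small sketch. The helper names `count_hops` and `check_chain`, the owner chain, and the choice to raise `PermissionError` are assumptions made for illustration; they do not reflect Drill's internal implementation.

```python
def count_hops(querying_user, owner_chain):
    """Count identity transitions along a chain of view/table owners.

    owner_chain lists the owner of each object Drill would touch, in order,
    starting with the view the user queries. A hop is counted whenever the
    owner differs from the identity Drill is currently using.
    """
    hops = 0
    current = querying_user
    for owner in owner_chain:
        if owner != current:
            hops += 1
            current = owner
    return hops


def check_chain(querying_user, owner_chain, max_hops=3):
    """Return the hop count, or raise if it exceeds max_hops (assumed behavior)."""
    hops = count_hops(querying_user, owner_chain)
    if hops > max_hops:
        raise PermissionError(
            f"{hops} identity transitions exceed the limit of {max_hops}"
        )
    return hops


if __name__ == "__main__":
    # Chad queries a view owned by Jane; the remaining owners are hypothetical.
    print(check_chain("chad", ["jane", "owner2", "owner3"]))
```

With `max_hops=0`, any chain that requires acting as another user is rejected, which matches the statement that users can then read only data they can access directly.
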
http://git-wip-us.apache.org/repos/asf/drill/blob/5808b09d/_docs/configure-drill/075-configuring-user-authentication.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/075-configuring-user-authentication.md b/_docs/configure-drill/075-configuring-user-authentication.md
index 07656a7..94a03d0 100755
--- a/_docs/configure-drill/075-configuring-user-authentication.md
+++ b/_docs/configure-drill/075-configuring-user-authentication.md
@@ -6,7 +6,9 @@ Authentication is the process of proving a user’s identity to access a process
  
 If user impersonation is enabled, Drill executes the client requests as the authenticated user. Otherwise, Drill executes client requests as the user that started the Drillbit process. You can enable both authentication and impersonation to improve Drill security. See [Configuring User Impersonation]({{site.baseurl}}/docs/configuring-user-impersonation/).
 
-When using PAM for authentication, each user that has permission to run Drill must exist in the list of users that resides on each Drill node in the cluster. The username (including uid) and password for each user must be identical across all of the Drill nodes. 
+When using PAM for authentication, each user that has permission to run Drill queries must exist in the list of users that resides on each Drill node in the cluster. The username (including uid) and password for each user must be identical across all of the Drill nodes. 
+
+If you use PAM with /etc/passwd for authentication, verify that the users with permission to start the Drill process are part of the shadow user group on all nodes in the cluster. This enables Drill to read the /etc/shadow file for authentication. 
 
 ## User Authentication Process
 
@@ -15,12 +17,11 @@ When user authentication is configured, each user that accesses the Drillbit pro
 When launching SQLLine, a user must include the `-n` and `-p` parameters with their username and password in the SQLLine argument:  
        `sqlline -u jdbc:drill:zk=10.10.11.112:5181 -n bob -p bobdrill`
 
- 
-When a user connects to Drill from a BI tool, such as Tableau, the MapR Drill ODBC driver prompts the user for their username and password:
+ When a user connects to Drill from a BI tool, such as Tableau, the MapR Drill ODBC driver prompts the user for their username and password:
 
 ![ODBC Driver]({{site.baseurl}}/docs/img/UserAuth_ODBC_Driver.png)
 
-The client passes the username and password to a Drillbit, which then passes the credentials to PAM. If PAM can verify that the user is authorized to access Drill, the user can connect to the Drillbit process from the client and issue queries against the file system or other storage plugins, such as Hive or HBase. However, if PAM cannot verify that the user is authorized to access Drill, the client returns an error.
+The client passes the username and password to a Drillbit as part of the connection request, and the Drillbit then passes the credentials to PAM. If PAM can verify that the user is authorized to access Drill, the connection is successful, and the user can issue queries against the file system or other storage plugins, such as Hive or HBase. However, if PAM cannot verify that the user is authorized to access Drill, the connection is terminated as AUTH_FAILED.
  
 The following image illustrates the user authentication process in Drill:
 
@@ -28,7 +29,7 @@ The following image illustrates the user authentication process in Drill:
 
 ### Installing and Configuring PAM
 
-Install and configure the provided Drill PAM. Drill only supports the PAM provided here.
+Install and configure the provided Drill PAM. Drill only supports the PAM provided here. Optionally, you can build and implement a custom authenticator using the instructions under "Implementing and Configuring a Custom Authenticator."
  
 Complete the following steps to install and configure PAM for Drill:
 
@@ -50,7 +51,7 @@ Complete the following steps to install and configure PAM for Drill:
            } 
           }
 
-5. (Optional) To add or remove different PAM profiles, add or delete the profile names in the `“pam_profiles”` array.  
+5. (Optional) To add or remove different PAM profiles, add or delete the profile names in the `"pam_profiles"` array shown above.  
 6. Restart the Drillbit process on each Drill node.
    * In a MapR cluster, run the following command:  
 

http://git-wip-us.apache.org/repos/asf/drill/blob/5808b09d/_docs/log-and-debug/002-error-messages.md
----------------------------------------------------------------------
diff --git a/_docs/log-and-debug/002-error-messages.md b/_docs/log-and-debug/002-error-messages.md
index eb0c827..d0c1f7b 100644
--- a/_docs/log-and-debug/002-error-messages.md
+++ b/_docs/log-and-debug/002-error-messages.md
@@ -9,7 +9,7 @@ Drill produces several types of error messages. You can ignore issues that conta
    * ChannelClosedException
    * Connection reset by peer
 
-These issues typically result from a problem outside of the query process. However, if you encounter a java.lang.OutOfMemoryError error, take action and give Drill as much memory as possible to resolve the issue. See Configuring Drill Memory.
+These issues typically result from a problem outside of the query process. However, if you encounter a java.lang.OutOfMemoryError error, take action and give Drill as much memory as possible to resolve the issue. See [Configuring Drill Memory]({{ site.baseurl }}/docs/configuring-drill-memory/).
 
 Drill assigns an ErrorId to each error that occurs. An ErrorID is a unique identifier for a particular error that tells you which node assigned the error. For example,
 [ 1ee8e004-2fce-420f-9790-5c6f8b7cad46 on 10.1.1.109:31010 ]. You can log into the node that assigned the error and grep the Drill log for the ErrorId to get more information about the error.
@@ -20,6 +20,6 @@ The following table provides descriptions for the IDs included in a thread:
 
 | ID Type         | Description                                                                                                                                                                                                                                  |
 |-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| QueryID         | The identifier assigned to the query. You can locate a query in Drill Web UI by the QueryID and then cancel the query if needed. See Query Profiles for more information.                                                                    |
-| MajorFragmentID | The identifier assigned to a major fragment. Major fragments map to the physical plan. You can see major fragment activity for a query in the Drill Web UI. See [Query Profiles]({{site.baseurl}}/docs/query-profiles) for more information. |
-| MinorFragmentID | The identifier assigned to the minor fragment. Minor fragments map to the parallelization of major fragments. See Query Profiles for more information.                                                                                       |
\ No newline at end of file
+| QueryID         | The identifier assigned to the query. You can locate a query in Drill Web UI by the QueryID and then cancel the query if needed. See [Query Profiles]({{ site.baseurl }}/docs/query-profiles/).                                                                    |
+| MajorFragmentID | The identifier assigned to a major fragment. Major fragments map to the physical plan. You can see major fragment activity for a query in the Drill Web UI. See [Query Profiles]({{ site.baseurl }}/docs/query-profiles) for more information. |
+| MinorFragmentID | The identifier assigned to the minor fragment. Minor fragments map to the parallelization of major fragments. See [Query Profiles]({{ site.baseurl }}/docs/query-profiles).                                                                                       |
\ No newline at end of file

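The error-messages text above says that an ErrorId identifies the node that assigned the error, in the form `[ <id> on <host>:<port> ]`. A small parser for that bracketed form might look like the sketch below. The regular expression is inferred from the single example in the doc, so the exact format and the function name `parse_error_ref` are assumptions.

```python
import re

# Pattern inferred from the one documented example; real messages may vary.
ERROR_REF = re.compile(
    r"\[\s*(?P<error_id>[0-9a-fA-F-]{36})\s+on\s+(?P<host>[^:\s]+):(?P<port>\d+)\s*\]"
)


def parse_error_ref(text):
    """Return (error_id, host, port) from an error message, or None if absent."""
    match = ERROR_REF.search(text)
    if match is None:
        return None
    return match.group("error_id"), match.group("host"), int(match.group("port"))


if __name__ == "__main__":
    sample = "[ 1ee8e004-2fce-420f-9790-5c6f8b7cad46 on 10.1.1.109:31010 ]"
    print(parse_error_ref(sample))
```

The extracted host tells you which node's Drill log to search, and the extracted ID is the string to grep for.
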
http://git-wip-us.apache.org/repos/asf/drill/blob/5808b09d/_docs/performance-tuning/020-partition-pruning.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/020-partition-pruning.md b/_docs/performance-tuning/020-partition-pruning.md
index 8babc8d..307110f 100755
--- a/_docs/performance-tuning/020-partition-pruning.md
+++ b/_docs/performance-tuning/020-partition-pruning.md
@@ -13,8 +13,9 @@ You can organize your data in such a way that maximizes partition pruning in Dri
  
 Partitioning data requires you to determine a partitioning scheme, or a logical way to store the data in a hierarchy of directories. You can then use CTAS to create Parquet files from the original data, specifying filter conditions, and then move the files into the correlating directories in the hierarchy. Once you have partitioned the data, you can create and query views on the data.
  
-Partitioning Example
-For example, if you have several text files with log data which span multiple years, and you want to partition the data by year and quarter, you could create the following hierarchy of directories:  
+### Partitioning Example  
+
+If you have several text files with log data which span multiple years, and you want to partition the data by year and quarter, you could create the following hierarchy of directories:  
        
        …/logs/1994/Q1  
        …/logs/1994/Q2  

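The year-and-quarter directory scheme from the partitioning example above can be sketched as a small path helper. The helper names and the relative root `logs` are hypothetical (the doc elides the real parent path), and the sketch only computes directory names; it does not run CTAS or move any files.

```python
from pathlib import PurePosixPath


def quarter_of(month):
    """Map a calendar month (1-12) to its quarter label, Q1 through Q4."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return f"Q{(month - 1) // 3 + 1}"


def partition_dir(root, year, month):
    """Build the <root>/<year>/<quarter> directory for a record's date."""
    return PurePosixPath(root) / str(year) / quarter_of(month)


if __name__ == "__main__":
    # "logs" stands in for the elided parent directory in the doc.
    for year, month in [(1994, 2), (1994, 5), (1995, 12)]:
        print(partition_dir("logs", year, month))
```

Keeping the year above the quarter in the hierarchy means a filter on year alone can still skip whole directory subtrees.
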
http://git-wip-us.apache.org/repos/asf/drill/blob/5808b09d/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md b/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
index 36d1928..6680509 100644
--- a/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
+++ b/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
@@ -95,9 +95,9 @@ Drill uses the following receiver operators:
 
 Drill uses the following sender operators:  
 
-| PartitionSender                                                                                                                                                |                                                                                                                |
-|----------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|
-| The PartitionSender operator maintains a queue for each outbound destination.  May be either the number of outbound minor fragments or the number of the nodes | depending on the use of muxxing operations.  Each queue may store up to 3 record batches for each destination. |  
+| Operator        | Description                                                                                                                                                                                                                                                                    |
+|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| PartitionSender | The PartitionSender operator maintains a queue for each outbound destination. The number of destinations may be either the number of outbound minor fragments or the number of nodes, depending on the use of muxing operations. Each queue may store up to 3 record batches for its destination. |
 
 ## File Writers