Posted to commits@drill.apache.org by br...@apache.org on 2016/06/03 22:17:26 UTC

drill git commit: add distribution operator descriptions

Repository: drill
Updated Branches:
  refs/heads/gh-pages b3b409c8b -> ea1aa1fa7


add distribution operator descriptions


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/ea1aa1fa
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/ea1aa1fa
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/ea1aa1fa

Branch: refs/heads/gh-pages
Commit: ea1aa1fa79b512ab88263bb06c582c057e4e21c5
Parents: b3b409c
Author: Bridget Bevens <bb...@maprtech.com>
Authored: Fri Jun 3 15:11:50 2016 -0700
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Fri Jun 3 15:11:50 2016 -0700

----------------------------------------------------------------------
 .../020-physical-operators.md                   | 234 ++++++++++---------
 1 file changed, 118 insertions(+), 116 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/ea1aa1fa/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md b/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
index bbcf1c8..d019b82 100644
--- a/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
+++ b/_docs/performance-tuning/performance-tuning-reference/020-physical-operators.md
@@ -1,116 +1,118 @@
----
-title: "Physical Operators"
-date:  
-parent: "Performance Tuning Reference"
---- 
-
-This document describes the physical operators that Drill uses in query plans.
-
-## Distribution Operators  
-
-Drill uses the following operators to perform data distribution over the network:  
-
-* HashToRandomExchange
-* HashToMergeExchange
-* UnionExchange
-* SingleMergeExchange
-* BroadcastExchange
-* UnorderedMuxExchange
-
-## Join Operators  
-
-Drill uses the following operators:
-
-| Operator         | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                |
-|------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Hash Join        | A Hash Join is used for inner joins, left, right and full outer joins.  A hash table is built on the rows produced by the inner child of the Hash Join.  The outer child rows are used to probe the hash table and find matches. This operator Holds the entire dataset for the right hand side of the join in memory  which could be up to 2 billion records per minor fragment.                                                                          |
-| Merge Join       | A Merge Join is used for inner join, left and right outer joins.  Inputs to the Merge Join must be sorted. It reads the sorted input streams from both sides and finds matching rows.  This operator holds the amount of memory of one incoming record batch from each side of the join.   In addition, if there are repeating values in the right hand side of the join, the Merge Join will hold record batches for as long as a repeated value extends. |
-| Nested Loop Join | A Nested Loop Join is used for certain types of cartesian joins and inequality joins.                                                                                                                                                                                                                                                                                                                                                                      |  
-
-## Aggregate Operators  
-
-Drill uses the following aggregate operators:  
-
-| Operator            | Description                                                                                                                                                                                                                                                                                                                                                                                                                           |
-|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Hash Aggregate      | A Hash Aggregate performs grouped aggregation on the input data by building a hash table on the GROUP-BY keys and computing the aggregate values within each group. This operator holds memory for each aggregation grouping and each aggregate value, up to 2 billion values per minor fragment.                                                                                                                                     |
-| Streaming Aggregate | A Streaming Aggregate performs grouped aggregation and non-grouped aggregation.  For grouped aggregation, the data must be sorted on the GROUP-BY keys.  Aggregate values are computed within each group.  For non-grouped aggregation, data does not have to be sorted. This operator maintains a single aggregate grouping (keys and aggregate intermediate values) at a time in addition to the size of one incoming record batch. |  
-
-## Sort and Limit Operators  
-
-Drill uses the following sort and limiter operators:  
-
-| Operator     | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
-|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Sort         | A Sort operator is used to perform an ORDER BY and as an upstream operator for other  operations that require sorted data such as Merge Join, Streaming Aggregate.                                                                                                                                                                                                                                                                                                         |
-| ExternalSort | The ExternalSort operator can potentially hold the entire dataset in memory.  This operator will also start spooling to the disk in the case that there is memory pressure.  In this case, the external sort will continue to try to use as much memory as available.  In all cases, external sort will hold at least one record batch in memory for each record spill.  Spills are currently sized based on the amount of memory available to the external sort operator. |
-| TopN         | A TopN operator is used to perform an ORDER BY with LIMIT.                                                                                                                                                                                                                                                                                                                                                                                                                 |
-| Limit        | A Limit operator is used to restrict the number of rows to a value specified by the LIMIT clause.                                                                                                                                                                                                                                                                                                                                                                          |  
-
-## Projection Operators  
-
-Drill uses the following projection operators: 
-
-| Operator     | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
-|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Project      | A Project operator projects columns and/or expressions involving columns and constants. This operator holds one incoming record batch plus any additional materialized projects for the same number of rows as the incoming record batch.                                                                                                                                                                                                                                  |
-| ExternalSort | The ExternalSort operator can potentially hold the entire dataset in memory.  This operator will also start spooling to the disk in the case that there is memory pressure.  In this case, the external sort will continue to try to use as much memory as available.  In all cases, external sort will hold at least one record batch in memory for each record spill.  Spills are currently sized based on the amount of memory available to the external sort operator. |
-| TopN         | A TopN operator is used to perform an ORDER BY with LIMIT.                                                                                                                                                                                                                                                                                                                                                                                                                 |
-| Limit        | A Limit operator is used to restrict the number of rows to a value specified by the LIMIT clause.                                                                                                                                                                                                                                                                                                                                                                          |  
-
-## Filter and Related Operators  
-
-Drill uses the following filter and related operators:  
-
-| Operator               | Description                                                                                                                                                                                                                                                                                                                                                                                      |
-|------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Filter                 | A Filter operator is used to evaluate the WHERE clause and HAVING clause predicates.  These predicates may consist of join predicates as well as single table predicates.  The join predicates are evaluated by a join operator and the remaining predicates are evaluated by the Filter operator. The amount of memory it consumes is slightly more than the size of one incoming record batch. |
-| SelectionVectorRemover | A SelectionVectorRemover is used in conjunction with either a Sort or Filter operator.  This operator maintains roughly twice the amount of memory as required by a single incoming record batch.                                                                                                                                                                                                |  
-
-## Set Operators  
-
-Drill uses the following set operators:  
-
-| Operator  | Description                                                                                                                                                                                                                                                                                                     |
-|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Union-All | A Union-All operator accepts rows from 2 input streams and produces a single output stream where the left input rows are emitted first followed by the right input rows. The column names of the output stream are inherited from the left input.  The column types of the two child inputs must be compatible. |  
-
-## Scan Operators  
-
-Drill uses the following scan operators:    
-
-| Operator | Description                                                                                                                                                                                 |
-|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Scan     | Performs a scan of the underlying table.  The table may be in one of several formats, such as Parquet, Text, JSON, and so on. The Scan operator encapsulates the formats into one operator. |  
-
-## Receiver Operators 
-
-Drill uses the following receiver operators: 
-
-| Operator          | Description                                                                                                                                                         |
-|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| UnorderedReceiver | The unordered receiver operator can hold up to 5 incoming record batches.                                                                                           |
-| MergingReceiver   | This operator holds up to 5 record batches for each incoming stream (generally either number of nodes or number of sending fragments, depending on use of muxxing). |  
-
-## Sender Operators  
-
-Drill uses the following sender operators:  
-
-| Operator        | Description                                                                                                                                                                                                                                                                    |
-|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| PartitionSender | The PartitionSender operator maintains a queue for each outbound destination.  May be either the number of outbound minor fragments or the number of the nodes, depending on the use of muxxing operations.  Each queue may store up to 3 record batches for each destination. |
-
-## File Writers  
-
-Drill uses the following file writers:  
-
-| Operator          | Description                                                                                                                                    |
-|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
-| ParquetFileWriter | The ParquetFileWriter buffers approximately twice the default Parquet row group size in memory per minor fragment (default in Drill is 512mb). |
-
-
-
-
- 
-
-
+---
+title: "Physical Operators"
+date: 2016-06-03 22:11:51 UTC
+parent: "Performance Tuning Reference"
+--- 
+
+This document describes the physical operators that Drill uses in query plans.
+
+## Distribution Operators  
+
+Drill uses the following operators to perform data distribution over the network:  
+
+| Operator             | Description                                                                                                                                                                                                                                                                                                                                               |
+|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| HashToRandomExchange | A HashToRandomExchange gets an input row, computes a hash value on the distribution key, determines the destination receiver based on the hash value, and sends the row in a batch operation. The join key or aggregation group-by keys are examples of distribution keys. The destination receiver is a minor fragment on a destination node.            |
+| HashToMergeExchange  | A HashToMergeExchange is similar to the HashToRandomExchange operator, except that each destination receiver merges incoming streams of sorted data received from a sender.                                                                                                                                                                               |
+| UnionExchange        | A UnionExchange is a serialization operator in which each sender sends to a single (common) destination. The receiver “unions” the input streams from various senders.                                                                                                                                                                                    |
+| SingleMergeExchange  | A SingleMergeExchange is a distribution operator in which each sender sends a sorted stream of data to a single receiver. The receiver performs a Merge operation to merge all of the incoming streams. This operator is useful when performing an ORDER BY operation that requires a final global ordering.                                              |
+| BroadcastExchange    | A BroadcastExchange is a distribution operation in which each sender sends its input data to all N receivers via a broadcast.                                                                                                                                                                                                                             |
+| UnorderedMuxExchange | An UnorderedMuxExchange is an operation that multiplexes the data from all minor fragments on a node so the data can be sent out on a single channel to a destination receiver. A sender node only needs to maintain buffers for each receiving node instead of each receiving minor fragment on every node.                                              |
+
+## Join Operators  
+
+Drill uses the following join operators:
+
+| Operator         | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Hash Join        | A Hash Join is used for inner joins and for left, right, and full outer joins.  A hash table is built on the rows produced by the inner child of the Hash Join.  The outer child rows are used to probe the hash table and find matches. This operator holds the entire dataset for the right-hand side of the join in memory, which could be up to 2 billion records per minor fragment.                                                                  |
+| Merge Join       | A Merge Join is used for inner joins and for left and right outer joins.  Inputs to the Merge Join must be sorted. It reads the sorted input streams from both sides and finds matching rows.  This operator holds memory for one incoming record batch from each side of the join.  In addition, if there are repeating values in the right-hand side of the join, the Merge Join will hold record batches for as long as a repeated value extends.       |
+| Nested Loop Join | A Nested Loop Join is used for certain types of cartesian joins and inequality joins.                                                                                                                                                                                                                                                                                                                                                                      |  
+
+## Aggregate Operators  
+
+Drill uses the following aggregate operators:  
+
+| Operator            | Description                                                                                                                                                                                                                                                                                                                                                                                                                           |
+|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Hash Aggregate      | A Hash Aggregate performs grouped aggregation on the input data by building a hash table on the GROUP-BY keys and computing the aggregate values within each group. This operator holds memory for each aggregation grouping and each aggregate value, up to 2 billion values per minor fragment.                                                                                                                                     |
+| Streaming Aggregate | A Streaming Aggregate performs grouped aggregation and non-grouped aggregation.  For grouped aggregation, the data must be sorted on the GROUP-BY keys.  Aggregate values are computed within each group.  For non-grouped aggregation, data does not have to be sorted. This operator maintains a single aggregate grouping (keys and aggregate intermediate values) at a time in addition to the size of one incoming record batch. |  
+
+## Sort and Limit Operators  
+
+Drill uses the following sort and limit operators:  
+
+| Operator     | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Sort         | A Sort operator is used to perform an ORDER BY and as an upstream operator for other  operations that require sorted data such as Merge Join, Streaming Aggregate.                                                                                                                                                                                                                                                                                                         |
+| ExternalSort | The ExternalSort operator can potentially hold the entire dataset in memory.  This operator also starts spooling to disk when there is memory pressure.  In this case, the external sort continues to try to use as much memory as is available.  In all cases, the external sort holds at least one record batch in memory for each record spill.  Spills are currently sized based on the amount of memory available to the external sort operator.                      |
+| TopN         | A TopN operator is used to perform an ORDER BY with LIMIT.                                                                                                                                                                                                                                                                                                                                                                                                                 |
+| Limit        | A Limit operator is used to restrict the number of rows to a value specified by the LIMIT clause.                                                                                                                                                                                                                                                                                                                                                                          |  
+
+## Projection Operators  
+
+Drill uses the following projection operators: 
+
+| Operator     | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Project      | A Project operator projects columns and/or expressions involving columns and constants. This operator holds one incoming record batch plus any additional materialized projects for the same number of rows as the incoming record batch.                                                                                                                                                                                                                                  |
+| ExternalSort | The ExternalSort operator can potentially hold the entire dataset in memory.  This operator also starts spooling to disk when there is memory pressure.  In this case, the external sort continues to try to use as much memory as is available.  In all cases, the external sort holds at least one record batch in memory for each record spill.  Spills are currently sized based on the amount of memory available to the external sort operator.                      |
+| TopN         | A TopN operator is used to perform an ORDER BY with LIMIT.                                                                                                                                                                                                                                                                                                                                                                                                                 |
+| Limit        | A Limit operator is used to restrict the number of rows to a value specified by the LIMIT clause.                                                                                                                                                                                                                                                                                                                                                                          |  
+
+## Filter and Related Operators  
+
+Drill uses the following filter and related operators:  
+
+| Operator               | Description                                                                                                                                                                                                                                                                                                                                                                                      |
+|------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Filter                 | A Filter operator evaluates the WHERE clause and HAVING clause predicates. These predicates may consist of join predicates as well as single-table predicates. The join predicates are evaluated by a join operator, and the remaining predicates are evaluated by the Filter operator. The Filter operator consumes slightly more memory than the size of one incoming record batch. |
+| SelectionVectorRemover | A SelectionVectorRemover is used in conjunction with either a Sort or Filter operator. This operator uses roughly twice the memory required by a single incoming record batch. |  
+
+## Set Operators  
+
+Drill uses the following set operators:  
+
+| Operator  | Description                                                                                                                                                                                                                                                                                                     |
+|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Union-All | A Union-All operator accepts rows from two input streams and produces a single output stream in which the left input rows are emitted first, followed by the right input rows. The column names of the output stream are inherited from the left input. The column types of the two child inputs must be compatible. |  
+
+## Scan Operators  
+
+Drill uses the following scan operators:    
+
+| Operator | Description                                                                                                                                                                                 |
+|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Scan     | Performs a scan of the underlying table.  The table may be in one of several formats, such as Parquet, Text, JSON, and so on. The Scan operator encapsulates the formats into one operator. |  
+
+## Receiver Operators 
+
+Drill uses the following receiver operators: 
+
+| Operator          | Description                                                                                                                                                         |
+|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| UnorderedReceiver | The unordered receiver operator can hold up to 5 incoming record batches.                                                                                           |
+| MergingReceiver   | This operator holds up to 5 record batches for each incoming stream. The number of incoming streams is generally either the number of nodes or the number of sending fragments, depending on the use of muxing. |  
+
+## Sender Operators  
+
+Drill uses the following sender operators:  
+
+| Operator        | Description                                                                                                                                                                                                                                                                    |
+|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| PartitionSender | The PartitionSender operator maintains a queue for each outbound destination. The number of destinations may be either the number of outbound minor fragments or the number of nodes, depending on the use of muxing operations. Each queue may store up to 3 record batches. |
+
+## File Writers  
+
+Drill uses the following file writers:  
+
+| Operator          | Description                                                                                                                                    |
+|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
+| ParquetFileWriter | The ParquetFileWriter buffers approximately twice the default Parquet row group size in memory per minor fragment (the default in Drill is 512 MB). |
+
+
+
+
+ 
+
+