Posted to issues@tajo.apache.org by jihoonson <gi...@git.apache.org> on 2015/11/05 08:44:39 UTC

[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

GitHub user jihoonson opened a pull request:

    https://github.com/apache/tajo/pull/844

    TAJO-1963: Add more configuration descriptions to document

    I also fixed a wrong configuration name.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jihoonson/tajo-2 TAJO-1963

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/844.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #844
    
----
commit 0b9bd167440b5e872f7ef02bae366d24e30e475d
Author: Jihoon Son <ji...@apache.org>
Date:   2015-11-05T07:43:47Z

    Add a document and fixed a wrong configuration name

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/844#discussion_r44100456
  
    --- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
    @@ -2,23 +2,455 @@
     The tajo-site.xml File
     **********************
     
    -To the ``core-site.xml`` file on every host in your cluster, you must add the following information:
    +You can add more configurations in the ``tajo-site.xml`` file. Note that you should replicate this file to the whole hosts in your cluster once you edited.
    +If you are looking for the configurations for the master and the worker, please refer to :doc:`tajo_master_configuration` and :doc:`worker_configuration`.
    +Also, catalog configurations are found here :doc:`catalog_configuration`.
    +
    +=========================
    +Join Query Settings
    +=========================
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.auto-broadcast`
    +""""""""""""""""""""""""""""""""""""""
    +
    +A flag to enable or disable the use of broadcast join.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.auto-broadcast</name>
    +    <value>true</value>
    +  </property>
    +
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.broadcast.non-cross-join.threshold-kb`
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
    +
    +  * Property value: Integer
    +  * Unit: KB
    +  * Default value: 5120
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.broadcast.non-cross-join.threshold-kb</name>
    +    <value>5120</value>
    +  </property>
    +
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.broadcast.cross-join.threshold-kb`
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
    +
    +  * Property value: Integer
    +  * Unit: KB
    +  * Default value: 1024
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.broadcast.cross-join.threshold-kb</name>
    +    <value>1024</value>
    +  </property>
    +
    +.. warning::
    +  In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the join query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.task-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +"""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.partition-volume-mb`
    +"""""""""""""""""""""""""""""""""""""""""""
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 128
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.partition-volume-mb</name>
    +    <value>128</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.common.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform a join in a task.
    +If the input data is smaller than this value, join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.common.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.inner.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an inner join in a task.
    +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.inner.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.outer.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an outer join in a task.
    +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.outer.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +"""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.hash-table.size`
    +"""""""""""""""""""""""""""""""""""""
    +
    +The initial size of hash table for in-memory hash join.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.hash-table.size</name>
    +    <value>100000</value>
    +  </property>
     
     ======================
    -System Config
    +Sort Query Settings
     ======================
     
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.sort.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""
    +
    +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the sort query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.sort.task-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.external-sort.buffer-mb`
    +""""""""""""""""""""""""""""""""""""""""
    +
    +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 200
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.external-sort.buffer-mb</name>
    +    <value>200</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.sort.list.size`
    +""""""""""""""""""""""""""""""""""""""
     
    +The initial size of list for in-memory sort.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.sort.list.size</name>
    +    <value>100000</value>
    +  </property>
    +
    +=========================
    +Group by Query Settings
    +=========================
    +
    +""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.multi-level-aggr`
    +""""""""""""""""""""""""""""""""""""""""""""
    +
    +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
    +Otherwise, 2-phase aggregation algorithm is used.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.multi-level-aggr</name>
    +    <value>true</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.partition-volume-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""
    +
    +The aggregation is executed in two stages. When an aggregation query is executed,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 256
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.partition-volume-mb</name>
    +    <value>256</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""
    +
    +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the aggregation query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.partition-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.groupby.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
    +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
    +Otherwise, the sort-based aggregation is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.groupby.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.aggregate.hash-table.size`
    +""""""""""""""""""""""""""""""""""""""""""
    +
    +The initial size of list for in-memory sort.
    --- End diff --
    
    The description refers to a list size, but the property name suggests that it means a hash table size.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/844#issuecomment-155031450
  
    +1
    The patch looks good to me. 



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/844#issuecomment-153982075
  
    You can see the updated document here.
    http://people.apache.org/~jihoonson/tajo-docs/



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/844#issuecomment-155017340
  
    @eminency and @hyunsik, thank you guys for your review!
    I fixed the test failure.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/844#discussion_r44098591
  
    --- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
    @@ -2,23 +2,455 @@
     The tajo-site.xml File
     **********************
     
     [... diff context omitted; identical to the hunk quoted in the first review comment ...]
    +A flag to enable or disable the use of broadcast join.
    +
    +  * Property value: Boolean
    --- End diff --
    
    IMO, 'property value type' looks clearer.
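
    For illustration only, the suggested label applied to the entry quoted above would read roughly as follows. This is a sketch of the wording, not the committed text; the final label is up to the author.

    ```rst
      * Property value type: Boolean
    ```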



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/844#discussion_r44100331
  
    --- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
    @@ -2,23 +2,455 @@
     The tajo-site.xml File
     **********************
     
     [... diff context omitted; identical to the hunk quoted in the first review comment ...]
    +""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""
    +
    +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the aggregation query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.partition-volume-mb</name>
    --- End diff --
    
    The section title uses 'task', but the example uses 'partition'.
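
    A sketch of what the corrected example might look like, assuming the intended property is the one named in the section title and that the documented default of 64 is kept. The actual fix is whatever the author commits.

    ```rst
    .. code-block:: xml

      <property>
        <name>tajo.dist-query.groupby.task-volume-mb</name>
        <value>64</value>
      </property>
    ```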



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/844#discussion_r44103177
  
    --- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
    @@ -2,23 +2,455 @@
     The tajo-site.xml File
     **********************
     
     [... diff context omitted; identical to the hunk quoted in the first review comment ...]
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the join query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.task-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +"""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.partition-volume-mb`
    +"""""""""""""""""""""""""""""""""""""""""""
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 128
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.partition-volume-mb</name>
    +    <value>128</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.common.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform a join in a task.
    +If the input data is smaller than this value, join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.common.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.inner.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an inner join in a task.
    +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.inner.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.outer.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an outer join in a task.
    +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.outer.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +"""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.hash-table.size`
    +"""""""""""""""""""""""""""""""""""""
    +
    +The initial size of hash table for in-memory hash join.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.hash-table.size</name>
    +    <value>100000</value>
    +  </property>
     
     ======================
    -System Config
    +Sort Query Settings
     ======================
     
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.sort.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""
    +
    +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the sort query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.sort.task-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.external-sort.buffer-mb`
    +""""""""""""""""""""""""""""""""""""""""
    +
    +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 200
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.external-sort.buffer-mb</name>
    +    <value>200</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.sort.list.size`
    +""""""""""""""""""""""""""""""""""""""
     
    +The initial size of list for in-memory sort.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.sort.list.size</name>
    +    <value>100000</value>
    +  </property>
    +
    +=========================
    +Group by Query Settings
    +=========================
    +
    +""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.multi-level-aggr`
    +""""""""""""""""""""""""""""""""""""""""""""
    +
    +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
    +Otherwise, 2-phase aggregation algorithm is used.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.multi-level-aggr</name>
    +    <value>true</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.partition-volume-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""
    +
    +The aggregation is executed in two stages. When an aggregation query is executed,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 256
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.partition-volume-mb</name>
    +    <value>256</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""
    +
    +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the aggregation query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.partition-volume-mb</name>
    --- End diff --
    
    My mistake. Thanks.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/844#discussion_r44103181
  
    --- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
    @@ -2,23 +2,455 @@
     The tajo-site.xml File
     **********************
     
    -To the ``core-site.xml`` file on every host in your cluster, you must add the following information:
    +You can add more configurations in the ``tajo-site.xml`` file. Note that you should replicate this file to the whole hosts in your cluster once you edited.
    +If you are looking for the configurations for the master and the worker, please refer to :doc:`tajo_master_configuration` and :doc:`worker_configuration`.
    +Also, catalog configurations are found here :doc:`catalog_configuration`.
    +
    +=========================
    +Join Query Settings
    +=========================
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.auto-broadcast`
    +""""""""""""""""""""""""""""""""""""""
    +
    +A flag to enable or disable the use of broadcast join.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.auto-broadcast</name>
    +    <value>true</value>
    +  </property>
    +
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.broadcast.non-cross-join.threshold-kb`
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
    +
    +  * Property value: Integer
    +  * Unit: KB
    +  * Default value: 5120
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.broadcast.non-cross-join.threshold-kb</name>
    +    <value>5120</value>
    +  </property>
    +
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.broadcast.cross-join.threshold-kb`
    +"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
    +
    +  * Property value: Integer
    +  * Unit: KB
    +  * Default value: 1024
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.broadcast.cross-join.threshold-kb</name>
    +    <value>1024</value>
    +  </property>
    +
    +.. warning::
    +  In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the join query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.task-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +"""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.partition-volume-mb`
    +"""""""""""""""""""""""""""""""""""""""""""
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 128
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.join.partition-volume-mb</name>
    +    <value>128</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.common.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform a join in a task.
    +If the input data is smaller than this value, join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.common.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.inner.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an inner join in a task.
    +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.inner.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.outer.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an outer join in a task.
    +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.outer.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +"""""""""""""""""""""""""""""""""""""
    +`tajo.executor.join.hash-table.size`
    +"""""""""""""""""""""""""""""""""""""
    +
    +The initial size of hash table for in-memory hash join.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.join.hash-table.size</name>
    +    <value>100000</value>
    +  </property>
     
     ======================
    -System Config
    +Sort Query Settings
     ======================
     
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.sort.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""
    +
    +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the sort query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.sort.task-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.external-sort.buffer-mb`
    +""""""""""""""""""""""""""""""""""""""""
    +
    +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 200
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.external-sort.buffer-mb</name>
    +    <value>200</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.sort.list.size`
    +""""""""""""""""""""""""""""""""""""""
     
    +The initial size of list for in-memory sort.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.sort.list.size</name>
    +    <value>100000</value>
    +  </property>
    +
    +=========================
    +Group by Query Settings
    +=========================
    +
    +""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.multi-level-aggr`
    +""""""""""""""""""""""""""""""""""""""""""""
    +
    +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used.
    +Otherwise, 2-phase aggregation algorithm is used.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.multi-level-aggr</name>
    +    <value>true</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.partition-volume-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""
    +
    +The aggregation is executed in two stages. When an aggregation query is executed,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 256
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.partition-volume-mb</name>
    +    <value>256</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.groupby.task-volume-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""
    +
    +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of the parallel processing of the aggregation query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.dist-query.groupby.partition-volume-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.groupby.in-memory-hash-threshold-mb`
    +""""""""""""""""""""""""""""""""""""""""""""""""""""""""
    +
    +This value provides the criterion to decide the algorithm to perform an aggregation in a task.
    +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
    +Otherwise, the sort-based aggregation is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: xml
    +
    +  <property>
    +    <name>tajo.executor.groupby.in-memory-hash-threshold-mb</name>
    +    <value>64</value>
    +  </property>
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +""""""""""""""""""""""""""""""""""""""""""
    +`tajo.executor.aggregate.hash-table.size`
    +""""""""""""""""""""""""""""""""""""""""""
    +
    +The initial size of list for in-memory sort.
    --- End diff --
    
    My mistake. Thanks.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/844



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/844#issuecomment-154281531
  
    Thanks for your comments. I have addressed them.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/844#issuecomment-154697044
  
    Thanks, it looks good. +1



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/844#issuecomment-154274813
  
    I left some comments.



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/844#issuecomment-154932737
  
    Could you check the unit test?
    ```
    Failed tests: 
      TestTajoCli.testHelpSessionVars:410->assertOutputResult:103->assertOutputResult:107->assertOutputResult:125 expected:<...able size
    \set SORT_[HASH_TABLE_SIZE [int value] - Sort hash table size
    \set JOIN_HASH_TABLE_SIZE [int value] - Join hash table size
    \set INDEX_ENABLED [true or false] - index scan enabled
    \set INDEX_SELECTIVITY_THRESHOLD [real value] - the selectivity threshold for index scan
    \set PARTITION_NO_RESULT_OVERWRITE_ENABLED [true or false] - If T]rue, a partitioned t...> but was:<...able size
    \set SORT_[LIST_SIZE [int value] - List size for in-memory sort 
    \set JOIN_HASH_TABLE_SIZE [int value] - Join hash table size
    \set INDEX_ENABLED [true or false] - index scan enabled
    \set INDEX_SELECTIVITY_THRESHOLD [real value] - the selectivity threshold for index scan
    \set PARTITION_NO_RESULT_OVERWRITE_ENABLED [true or false] - If t]rue, a partitioned t...>
    Tests run: 1593, Failures: 1, Errors: 0, Skipped: 0
    ```



[GitHub] tajo pull request: TAJO-1963: Add more configuration descriptions ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/844#discussion_r44103172
  
    --- Diff: tajo-docs/src/main/sphinx/configuration/tajo-site-xml.rst ---
    @@ -2,23 +2,455 @@
     The tajo-site.xml File
     **********************
     
    -To the ``core-site.xml`` file on every host in your cluster, you must add the following information:
    +You can add more configurations in the ``tajo-site.xml`` file. Note that you should replicate this file to the whole hosts in your cluster once you edited.
    +If you are looking for the configurations for the master and the worker, please refer to :doc:`tajo_master_configuration` and :doc:`worker_configuration`.
    +Also, catalog configurations are found here :doc:`catalog_configuration`.
    +
    +=========================
    +Join Query Settings
    +=========================
    +
    +""""""""""""""""""""""""""""""""""""""
    +`tajo.dist-query.join.auto-broadcast`
    +""""""""""""""""""""""""""""""""""""""
    +
    +A flag to enable or disable the use of broadcast join.
    +
    +  * Property value: Boolean
    --- End diff --
    
    Thank you for the good comment.

