Posted to commits@carbondata.apache.org by ra...@apache.org on 2019/01/30 10:39:31 UTC

[carbondata] 27/27: [DOC] Document Update for default sort scope

This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch branch-1.5
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 213577d6159a93c53ecc38e601b81a00b209530f
Author: namanrastogi <na...@gmail.com>
AuthorDate: Tue Jan 29 19:33:58 2019 +0530

    [DOC] Document Update for default sort scope
    
    This closes #3115
---
 docs/configuration-parameters.md | 8 ++++----
 docs/dml-of-carbondata.md        | 1 -
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index 9f13e97..7b31413 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -61,7 +61,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.number.of.cores.while.loading | 2 | Number of cores to be used while loading data. This also determines the number of threads to be used to read the input files (csv) in parallel. **NOTE:** This configured value is used in every data loading step to parallelize the operations. Configuring a higher value can lead to increased early thread pre-emption by the OS and thereby reduce the overall performance. |
 | enable.unsafe.sort | true | CarbonData supports unsafe operations of Java to avoid GC overhead for certain operations. This configuration enables the use of unsafe functions in CarbonData. **NOTE:** For operations like data loading, which generate many short-lived Java objects, Java GC can be a bottleneck. Using unsafe can overcome the GC overhead and improve the overall performance. |
 | enable.offheap.sort | true | CarbonData supports storing data in off-heap memory for certain operations during data loading and query. This helps to avoid the Java GC and thereby improve the overall performance. This configuration enables using off-heap memory for sorting of data during data loading. **NOTE:** The ***enable.unsafe.sort*** configuration needs to be set to true for using off-heap memory. |
-| carbon.load.sort.scope | LOCAL_SORT | CarbonData can support various sorting options to match the balance between load and query performance. LOCAL_SORT:All the data given to an executor in the single load is fully sorted and written to carbondata files. Data loading performance is reduced a little as the entire data needs to be sorted in the executor. BATCH_SORT:Sorts the data in batches of configured size and writes to carbondata files. Data loading performance increases as the entir [...]
+| carbon.load.sort.scope | LOCAL_SORT | CarbonData can support various sorting options to match the balance between load and query performance. LOCAL_SORT:All the data given to an executor in the single load is fully sorted and written to carbondata files. Data loading performance is reduced a little as the entire data needs to be sorted in the executor. BATCH_SORT:Sorts the data in batches of configured size and writes to carbondata files. Data loading performance increases as the entir [...]
 | carbon.load.batch.sort.size.inmb | 0 | When  ***carbon.load.sort.scope*** is configured as ***BATCH_SORT***, this configuration needs to be added to specify the batch size for sorting and writing to carbondata files. **NOTE:** It is recommended to keep the value around 45% of ***carbon.sort.storage.inmemory.size.inmb*** to avoid spill to disk. Also it is recommended to keep the value higher than ***carbon.blockletgroup.size.in.mb***. Refer to *carbon.load.sort.scope* for more informati [...]
 | carbon.global.sort.rdd.storage.level | MEMORY_ONLY | Storage level used to persist the dataset of the RDD/dataframe when loading data with 'sort_scope'='global_sort'. If the user's executor has less memory, set this parameter to 'MEMORY_AND_DISK_SER' or another storage level suited to the environment. [See detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence). |
 | carbon.load.global.sort.partitions | 0 | The number of partitions to use when shuffling data for global sort. The default value 0 means to use the same number of map tasks as reduce tasks. **NOTE:** In general, it is recommended to have 2-3 tasks per CPU core in your cluster. |
@@ -207,9 +207,9 @@ RESET
 | enable.unsafe.sort                        | Specifies whether to use unsafe sort during data loading. Unsafe sort reduces garbage collection during the data load operation, resulting in better performance. |
 | carbon.options.date.format                 | Specifies the date format of the date columns in the data being loaded |
 | carbon.options.timestamp.format            | Specifies the timestamp format of the timestamp columns in the data being loaded |
-| carbon.options.sort.scope                 | Specifies how the current data load should be sorted with. **NOTE:** Refer to [Data Loading Configuration](#data-loading-configuration)#carbon.sort.scope for detailed information. |
-| carbon.table.load.sort.scope.<db>.<table> | Overrides the SORT_SCOPE provided in CREATE TABLE.           |
-| carbon.options.global.sort.partitions     |                                                              |
+| carbon.options.sort.scope                 | Specifies how the current data load should be sorted. This sort parameter applies at the table level. **NOTE:** Refer to [Data Loading Configuration](#data-loading-configuration)#carbon.sort.scope for detailed information. |
+| carbon.table.load.sort.scope.db_name.table_name | Overrides the SORT_SCOPE provided in CREATE TABLE.           |
+| carbon.options.global.sort.partitions     | Specifies the number of partitions to be used during global sort.   |
 | carbon.options.serialization.null.format  | Default Null value representation in the data being loaded. **NOTE:** Refer to [Data Loading Configuration](#data-loading-configuration)#carbon.options.serialization.null.format for detailed information. |
 | carbon.query.directQueryOnDataMap.enabled | Specifies whether datamap can be queried directly. This is useful for debugging purposes. **NOTE:** Refer to [Query Configuration](#query-configuration) for detailed information. |
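
To make the sort-scope knobs above concrete: the system-wide default comes from carbon.load.sort.scope in carbon.properties, while the carbon.options.* and carbon.table.load.sort.scope.* entries in this table can be changed per session. Below is a minimal sketch of such a session, assuming Spark SQL with CarbonData; the database and table names (default, sales) and the chosen values are illustrative only and are not part of this commit.

```
-- Sort scope for subsequent data loads in this session
SET carbon.options.sort.scope=LOCAL_SORT;

-- Table-level override of the SORT_SCOPE given in CREATE TABLE
-- (database and table names here are hypothetical)
SET carbon.table.load.sort.scope.default.sales=NO_SORT;

-- Partitions used for shuffling when the sort scope is GLOBAL_SORT
SET carbon.options.global.sort.partitions=2;
```

As the hunk header above indicates, these dynamic properties can typically be cleared again with RESET.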
 
diff --git a/docs/dml-of-carbondata.md b/docs/dml-of-carbondata.md
index ec2c053..f89c49a 100644
--- a/docs/dml-of-carbondata.md
+++ b/docs/dml-of-carbondata.md
@@ -491,7 +491,6 @@ CarbonData DML statements are documented here,which includes:
   ```
   ALTER TABLE table_name COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (2,3,4)
   ```
-  NOTE: Compaction is unsupported for table containing Complex columns.
 
 
   - **CLEAN SEGMENTS AFTER Compaction**
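
The trailing heading above introduces cleaning up stale segments once compaction finishes; in the CarbonData DML docs this is done with a CLEAN FILES statement. A minimal sketch (carbon_table is a placeholder name):

```
CLEAN FILES FOR TABLE carbon_table
```

This clears segments that have been marked as compacted or deleted, as described in the section this heading opens.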