
Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2019/01/23 05:40:20 UTC

[GitHub] sraghunandan commented on a change in pull request #3093: [CARBONDATA-3263] Update doc for RANGE_COLUMN

URL: https://github.com/apache/carbondata/pull/3093#discussion_r250061294
 
 

 ##########
 File path: docs/dml-of-carbondata.md
 ##########
 @@ -251,15 +252,31 @@ CarbonData DML statements are documented here,which includes:
   - ##### GLOBAL_SORT_PARTITIONS:
 
    If SORT_SCOPE is defined as GLOBAL_SORT, the user can set GLOBAL_SORT_PARTITIONS to specify the number of partitions used when shuffling data for the sort. If it is not configured, or is configured to a value less than 1, the number of map tasks is used as the number of reduce tasks. It is recommended that each reduce task process 512 MB to 1 GB of data.
-
+    For RANGE_COLUMN, GLOBAL_SORT_PARTITIONS also specifies the number of range partitions.
   ```
   OPTIONS('GLOBAL_SORT_PARTITIONS'='2')
   ```
 
-   NOTE:
+   **NOTE:**
    * GLOBAL_SORT_PARTITIONS must be an integer in the range [1, Integer.MAX_VALUE].
    * It is only used when the SORT_SCOPE is GLOBAL_SORT.
 
+   - ##### SCALE_FACTOR
+
+   For RANGE_COLUMN, SCALE_FACTOR controls the number of range partitions as follows:
+   ```
+     splitSize = max(blocklet_size, (block_size - blocklet_size)) * scale_factor
+     numPartitions = total size of input data / splitSize
+   ```
+   The default value is 3, and the range is [1, 300].
+
+   ```
+     OPTIONS('SCALE_FACTOR'='10')
+   ```
+   **NOTE:**
+   * If both GLOBAL_SORT_PARTITIONS and SCALE_FACTOR are specified, only GLOBAL_SORT_PARTITIONS takes effect.
+   * The compaction on RANGE_COLUMN will use LOCAL_SORT by default now.
 
 Review comment:
   Remove the word "now"; it's not required.
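The range-partition rule described in the SCALE_FACTOR section of the diff can be sketched as below. This is a hypothetical Python helper, not CarbonData code. The formula, the default SCALE_FACTOR of 3, its [1, 300] range, and the precedence of GLOBAL_SORT_PARTITIONS come from the diff. The function name `range_partition_count`, the use of MB as the size unit, integer division, the floor of one partition, rejecting out-of-range SCALE_FACTOR values with an error, and treating GLOBAL_SORT_PARTITIONS values below 1 as "not configured" are all assumptions made for illustration.

```python
from typing import Optional

# Values taken from the documentation text in the diff.
SCALE_FACTOR_DEFAULT = 3
SCALE_FACTOR_MIN, SCALE_FACTOR_MAX = 1, 300


def range_partition_count(
    total_input_mb: int,
    block_size_mb: int,
    blocklet_size_mb: int,
    global_sort_partitions: Optional[int] = None,
    scale_factor: int = SCALE_FACTOR_DEFAULT,
) -> int:
    """Hypothetical helper: number of range partitions for a RANGE_COLUMN load."""
    # GLOBAL_SORT_PARTITIONS takes precedence when set to a valid value (>= 1).
    # Treating values < 1 as "not configured" is an assumption of this sketch.
    if global_sort_partitions is not None and global_sort_partitions >= 1:
        return global_sort_partitions

    # Raising an error for out-of-range values is an assumption of this sketch.
    if not SCALE_FACTOR_MIN <= scale_factor <= SCALE_FACTOR_MAX:
        raise ValueError(f"SCALE_FACTOR must be in [1, 300], got {scale_factor}")

    # splitSize = max(blocklet_size, (block_size - blocklet_size)) * scale_factor
    split_size_mb = max(blocklet_size_mb, block_size_mb - blocklet_size_mb) * scale_factor

    # numPartitions = total size of input data / splitSize
    # (integer division and the minimum of 1 partition are assumptions)
    return max(1, total_input_mb // split_size_mb)


# Example sizes only: 10 GB of input, 1024 MB blocks, 64 MB blocklets.
print(range_partition_count(10240, 1024, 64))                            # 3
print(range_partition_count(10240, 1024, 64, scale_factor=10))           # 1
print(range_partition_count(10240, 1024, 64, global_sort_partitions=2))  # 2
```

With these example sizes, splitSize is max(64, 960) * 3 = 2880 MB, so 10240 MB of input yields 3 partitions; raising SCALE_FACTOR to 10 makes each split 9600 MB and leaves a single partition, while setting GLOBAL_SORT_PARTITIONS overrides the calculation entirely.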

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services