Posted to commits@carbondata.apache.org by ja...@apache.org on 2020/02/22 12:55:09 UTC

[carbondata] branch master updated: [CARBONDATA-3717] Fix inconsistent configs in docs

This is an automated email from the ASF dual-hosted git repository.

jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new 7011cf3  [CARBONDATA-3717] Fix inconsistent configs in docs
7011cf3 is described below

commit 7011cf38ad6b51d4fc60a5bd1cbca0e062e2adc8
Author: 勉一 <sh...@antfin.com>
AuthorDate: Fri Feb 21 19:35:30 2020 +0800

    [CARBONDATA-3717] Fix inconsistent configs in docs
    
    Why is this PR needed?
    CarbonData now has more and more configs, perhaps too many to maintain easily.
    
    While using Carbon, I found a number of confusing configs (corrected usage is sketched after this commit message):
    
    - `table_block_size` -> `table_blocksize`
    - `sort.inmemory.size.in.mb` -> `sort.inmemory.size.inmb`
    - unused (useless) configs:
      - carbon.number.of.cores
      - carbon.graph.rowset.size
      - carbon.enableXXHash
      - ....
    
    What changes were proposed in this PR?
    Fix incorrect config names in the docs;
    remove docs for unused or meaningless configs.
    
    Does this PR introduce any user interface change?
    No
    
    Is any new testcase added?
    No
    
    This closes #3632
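    
    For reference, a minimal sketch of the corrected option name in use,
    matching the docs change below (assuming a Spark session `spark` with
    CarbonData on the classpath; the table name and value are illustrative,
    not recommendations):
    
    ```
    // Scala sketch: 'table_blocksize' (not 'table_block_size') sets the
    // table block size in MB when creating a table through Spark SQL.
    spark.sql(
      "CREATE TABLE carbon_table (name STRING) USING CARBON " +
      "OPTIONS('table_blocksize'='256')")
    ```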
---
 conf/carbon.properties.template                               |  2 --
 .../carbondata/core/constants/CarbonCommonConstants.java      | 11 -----------
 docs/carbon-as-spark-datasource-guide.md                      |  2 +-
 docs/usecases.md                                              |  4 +---
 ...\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md" |  1 -
 5 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/conf/carbon.properties.template b/conf/carbon.properties.template
index 1d5331c..eb635d6 100644
--- a/conf/carbon.properties.template
+++ b/conf/carbon.properties.template
@@ -33,8 +33,6 @@ carbon.sort.file.buffer.size=10
 carbon.number.of.cores.while.loading=2
 #Record count to sort and write to temp intermediate files
 carbon.sort.size=100000
-#Algorithm for hashmap for hashkey calculation
-carbon.enableXXHash=true
 #enable prefetch of data during merge sort while reading data from sort temp files in data loading
 #carbon.merge.sort.prefetch=true
 
diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index ef87011..d8194a3 100644
--- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -205,17 +205,6 @@ public final class CarbonCommonConstants {
   public static final String ZOOKEEPER_LOCATION = "/CarbonLocks";
 
   /**
-   * xxhash algorithm property for hashmap
-   */
-  @CarbonProperty
-  public static final String ENABLE_XXHASH = "carbon.enableXXHash";
-
-  /**
-   * xxhash algorithm property for hashmap Default value false
-   */
-  public static final String ENABLE_XXHASH_DEFAULT = "true";
-
-  /**
    * System property to enable or disable local dictionary generation
    */
   @CarbonProperty
diff --git a/docs/carbon-as-spark-datasource-guide.md b/docs/carbon-as-spark-datasource-guide.md
index b61bf43..275d5b1 100644
--- a/docs/carbon-as-spark-datasource-guide.md
+++ b/docs/carbon-as-spark-datasource-guide.md
@@ -55,7 +55,7 @@ Now you can create Carbon table using Spark's datasource DDL syntax.
 ## Example 
 
 ```
- CREATE TABLE CARBON_TABLE (NAME STRING) USING CARBON OPTIONS('table_block_size'='256')
+ CREATE TABLE CARBON_TABLE (NAME STRING) USING CARBON OPTIONS('table_blocksize'='256')
 ```
 
 # Using DataFrame
diff --git a/docs/usecases.md b/docs/usecases.md
index 343fccd..ec07ff3 100644
--- a/docs/usecases.md
+++ b/docs/usecases.md
@@ -83,7 +83,6 @@ Apart from these, the following CarbonData configuration was suggested to be con
 
 | Configuration for | Parameter                               | Value  | Description |
 |------------------ | --------------------------------------- | ------ | ----------- |
-| Data Loading | carbon.graph.rowset.size                | 100000 | Based on the size of each row, this determines the memory required during data loading. A higher value leads to an increased memory footprint |
 | Data Loading | carbon.number.of.cores.while.loading    | 12     | More cores can improve data loading speed |
 | Data Loading | carbon.sort.size                        | 100000 | Number of records to sort at a time. Configuring more records leads to an increased memory footprint |
 | Data Loading | table_blocksize                         | 256  | To efficiently schedule multiple tasks during query |
@@ -134,7 +133,6 @@ Use all columns are no-dictionary as the cardinality is high.
 
 | Configuration for | Parameter                               | Value                   | Description |
 | ------------------| --------------------------------------- | ----------------------- | ------------------|
-| Data Loading | carbon.graph.rowset.size                | 100000                  | Based on the size of each row, this determines the memory required during data loading. A higher value leads to an increased memory footprint |
 | Data Loading | enable.unsafe.sort                      | TRUE                    | Temporary data generated during sort is huge, which causes GC bottlenecks. Using unsafe reduces the pressure on GC |
 | Data Loading | enable.offheap.sort                     | TRUE                    | Temporary data generated during sort is huge, which causes GC bottlenecks. Using offheap reduces the pressure on GC. Offheap can be accessed through Java unsafe; hence enable.unsafe.sort needs to be true |
 | Data Loading | offheap.sort.chunk.size.in.mb           | 128                     | Size of memory to allocate for sorting. This can be increased based on the memory available |
@@ -143,7 +141,7 @@ Use all columns are no-dictionary as the cardinality is high.
 | Data Loading | table_blocksize                         | 512                     | To efficiently schedule multiple tasks during query. This size depends on the data scenario. If the filters would select fewer blocklets to scan, a higher value works well. If more blocklets need to be scanned, it is better to reduce the size so that more tasks can be scheduled in parallel. |
 | Data Loading | carbon.sort.intermediate.files.limit    | 100                     | Increased to 100 as more cores are available. Merging can be performed in the background. With fewer files to merge, sort threads would be idle |
 | Data Loading | carbon.use.local.dir                    | TRUE                    | The YARN application directory is usually on a single disk. YARN is typically configured with multiple disks to be used as temp or assigned randomly to applications. Using the YARN temp directories allows carbon to use multiple disks and improves IO performance |
-| Data Loading | sort.inmemory.size.in.mb                | 92160 | Memory allocated for in-memory sorting. When more memory is available on the node, configuring this retains more sort blocks in memory so that merge sort is faster due to little or no IO |
+| Data Loading | sort.inmemory.size.inmb                | 92160 | Memory allocated for in-memory sorting. When more memory is available on the node, configuring this retains more sort blocks in memory so that merge sort is faster due to little or no IO |
 | Compaction | carbon.major.compaction.size            | 921600                  | Sum of several loads to combine into a single segment |
 | Compaction | carbon.number.of.cores.while.compacting | 12                      | A higher number of cores can improve compaction speed. The data size is huge, so compaction needs more threads to speed up the process |
 | Compaction | carbon.enable.auto.load.merge           | FALSE                   | Auto minor compaction is a costly process as the data size is huge. Perform manual compaction when the cluster is less loaded |
diff --git "a/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md" "b/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md"
index 39b69f2..ee58282 100644
--- "a/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md"
+++ "b/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md"
@@ -89,7 +89,6 @@ LIMIT 5000
 | Main CarbonData configuration         | Value  | Description                                                  |
 | ------------------------------------- | ------ | ------------------------------------------------------------ |
 | carbon.inmemory.record.size           | 480000 | Total number of rows per table to be loaded into memory for queries. |
-| carbon.number.of.cores                | 4      | Number of threads scanning in parallel during carbon queries. |
 | carbon.number.of.cores.while.loading  | 15     | Number of threads scanning in parallel during carbon data loading. |
 | carbon.sort.file.buffer.size          | 20     | Total buffer size, in MB, used for each temporary intermediate file during merge-sort (read/write) operations. |
 | carbon.sort.size                      | 500000 | Number of records sorted at a time during data loading. |
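
As a usage sketch, keys like those in the tables above can also be set
programmatically through the CarbonProperties singleton from carbondata-core
(the values here are illustrative, not recommendations):

```
import org.apache.carbondata.core.util.CarbonProperties

// Scala sketch: register load-time tuning values before triggering a load.
// Keys use the corrected spellings from this commit.
val carbonProps = CarbonProperties.getInstance()
carbonProps.addProperty("carbon.number.of.cores.while.loading", "12")
carbonProps.addProperty("sort.inmemory.size.inmb", "92160")
carbonProps.addProperty("carbon.sort.size", "100000")
```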