You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@carbondata.apache.org by ku...@apache.org on 2020/05/06 11:32:22 UTC

[carbondata] branch master updated: [CARBONDATA-3791] Updated configuration-parameters.md and removed unused configuration

This is an automated email from the ASF dual-hosted git repository.

kunalkapoor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new 636958e  [CARBONDATA-3791] Updated configuration-parameters.md and removed unused configuration
636958e is described below

commit 636958e9be090db746452b414b0309d856db7e1e
Author: Venu Reddy <k....@gmail.com>
AuthorDate: Mon May 4 22:08:29 2020 +0530

    [CARBONDATA-3791] Updated configuration-parameters.md and removed unused configuration
    
    Why is this PR needed?
    Updated configuration-parameters.md and removed unused configuration
    
    What changes were proposed in this PR?
    Updated configuration-parameters.md and removed unused configuration
    
    This closes #3744
---
 .../org/apache/carbondata/core/constants/CarbonCommonConstants.java  | 5 -----
 docs/configuration-parameters.md                                     | 5 ++---
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index b5e7f0d..9d418d4 100644
--- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -959,11 +959,6 @@ public final class CarbonCommonConstants {
   public static final String ENABLE_OFFHEAP_SORT_DEFAULT = "true";
 
   @CarbonProperty
-  public static final String ENABLE_INMEMORY_MERGE_SORT = "enable.inmemory.merge.sort";
-
-  public static final String ENABLE_INMEMORY_MERGE_SORT_DEFAULT = "false";
-
-  @CarbonProperty
   public static final String OFFHEAP_SORT_CHUNK_SIZE_IN_MB = "offheap.sort.chunk.size.inmb";
 
   public static final String OFFHEAP_SORT_CHUNK_SIZE_IN_MB_DEFAULT = "64";
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index 4627cac..dc105a8 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -31,7 +31,7 @@ This section provides the details of all the configurations required for the Car
 
 | Property | Default Value | Description |
 |----------------------------|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [...]
-| carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified,the path defaults to spark.sql.warehouse.dir property. **NOTE:** Store location should be in HDFS or S3. |
+| carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified,the path defaults to spark.sql.warehouse.dir property. **NOTE:** Store location should be in one of the carbon supported filesystems. Like HDFS or S3. It is not recommended to use this property. |
 | carbon.ddl.base.hdfs.url | (none) | To simplify and shorten the path to be specified in DDL/DML commands, this property is supported. This property is used to configure the HDFS relative path, the path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS of core-site.xml. If this path is configured, then user need not pass the complete path while dataload. For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnb [...]
 | carbon.badRecords.location | (none) | CarbonData can detect the records not conforming to defined table schema and isolate them as bad records. This property is used to specify where to store such bad records. |
 | carbon.streaming.auto.handoff.enabled | true | CarbonData supports storing of streaming data. To have high throughput for streaming, the data is written in Row format which is highly optimized for write, but performs poorly for query. When this property is true and when the streaming data size reaches ***carbon.streaming.segment.max.size***, CabonData will automatically convert the data to columnar format and optimize it for faster querying.**NOTE:** It is not recommended to keep the d [...]
@@ -63,7 +63,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.number.of.cores.while.loading | 2 | Number of cores to be used while loading data. This also determines the number of threads to be used to read the input files (csv) in parallel.**NOTE:** This configured value is used in every data loading step to parallelize the operations. Configuring a higher value can lead to increased early thread pre-emption by OS and there by reduce the overall performance. |
 | enable.unsafe.sort | true | CarbonData supports unsafe operations of Java to avoid GC overhead for certain operations. This configuration enables to use unsafe functions in CarbonData. **NOTE:** For operations like data loading, which generates more short lived Java objects, Java GC can be a bottle neck. Using unsafe can overcome the GC overhead and improve the overall performance. |
 | enable.offheap.sort | true | CarbonData supports storing data in off-heap memory for certain operations during data loading and query. This helps to avoid the Java GC and thereby improve the overall performance. This configuration enables using off-heap memory for sorting of data during data loading.**NOTE:**  ***enable.unsafe.sort*** configuration needs to be configured to true for using off-heap |
-| carbon.load.sort.scope | LOCAL_SORT | CarbonData can support various sorting options to match the balance between load and query performance. LOCAL_SORT:All the data given to an executor in the single load is fully sorted and written to carbondata files. Data loading performance is reduced a little as the entire data needs to be sorted in the executor. GLOBAL SORT:Entire data in the data load is fully sorted and written to carbondata files. Data loading performance would get reduced as [...]
+| carbon.load.sort.scope | NO_SORT [If sort columns are not specified while creating table] and LOCAL_SORT [If sort columns are specified] | CarbonData can support various sorting options to match the balance between load and query performance. LOCAL_SORT: All the data given to an executor in the single load is fully sorted and written to carbondata files. Data loading performance is reduced a little as the entire data needs to be sorted in the executor. GLOBAL SORT: Entire data in the d [...]
 | carbon.global.sort.rdd.storage.level | MEMORY_ONLY | Storage level to persist dataset of RDD/dataframe when loading data with 'sort_scope'='global_sort', if user's executor has less memory, set this parameter to 'MEMORY_AND_DISK_SER' or other storage level to correspond to different environment. [See detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence). |
 | carbon.load.global.sort.partitions | 0 | The number of partitions to use when shuffling data for global sort. Default value 0 means to use same number of map tasks as reduce tasks. **NOTE:** In general, it is recommended to have 2-3 tasks per CPU core in your cluster. |
 | carbon.sort.size | 100000 | Number of records to hold in memory to sort and write intermediate sort temp files. **NOTE:** Memory required for data loading will increase if you turn this value bigger. Besides each thread will cache this amout of records. The number of threads is configured by *carbon.number.of.cores.while.loading*. |
@@ -77,7 +77,6 @@ This section provides the details of all the configurations required for the Car
 | carbon.merge.sort.reader.thread | 3 | CarbonData sorts and writes data to intermediate files to limit the memory usage. When the intermediate files reaches ***carbon.sort.intermediate.files.limit***, the files will be merged in another thread pool. This value will control the size of the pool. Each thread will read the intermediate files and do merge sort and finally write the records to another file. **NOTE:** Refer to ***carbon.sort.intermediate.files.limit*** for operation descripti [...]
 | carbon.merge.sort.prefetch | true | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These intermediate temp files will have to be sorted using merge sort before writing into CarbonData format. This configuration enables pre fetching of data from these temp files in order to optimize IO and speed up data loading process. |
 | carbon.prefetch.buffersize | 1000 | When the configuration ***carbon.merge.sort.prefetch*** is configured to true, we need to set the number of records that can be prefetched. This configuration is used specify the number of records to be prefetched.**NOTE: **Configuring more number of records to be prefetched increases memory footprint as more records will have to be kept in memory. |
-| enable.inmemory.merge.sort | false | CarbonData sorts and writes data to intermediate files to limit the memory usage. These intermediate files needs to be sorted again using merge sort before writing to the final carbondata file. Performing merge sort in memory would increase the sorting performance at the cost of increased memory footprint. This Configuration specifies to do in-memory merge sort or to do file based merge sort. |
 | carbon.sort.storage.inmemory.size.inmb | 512 | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. When ***enable.unsafe.sort*** configuration is enabled, instead of using ***carbon.sort.size*** which is based on rows count, size occupied in memory is used to determine when to flush data pages to intermediate temp files. This configuration determines the memory to be used for storin [...]
 | carbon.load.sortmemory.spill.percentage | 0 | During data loading, some data pages are kept in memory upto memory configured in ***carbon.sort.storage.inmemory.size.inmb*** beyond which they are spilled to disk as intermediate temporary sort files. This configuration determines after what percentage data needs to be spilled to disk. **NOTE:** Without this configuration, when the data pages occupy upto configured memory, new data pages would be dumped to disk and old pages are still mai [...]
 | carbon.enable.calculate.size | true | **For Load Operation**: Enabling this property will let carbondata calculate the size of the carbon data file (.carbondata) and the carbon index file (.carbonindex) for each load and update the table status file. **For Describe Formatted**: Enabling this property will let carbondata calculate the total size of the carbon data files and the carbon index files for the each table and display it in describe formatted command. **NOTE:** This is useful t [...]