Posted to commits@carbondata.apache.org by aj...@apache.org on 2020/05/06 14:41:22 UTC

[carbondata] branch master updated: [CARBONDATA-3791] Fix spelling, validating links for quick-start guide and configuration parameters

This is an automated email from the ASF dual-hosted git repository.

ajantha pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new 4fec616  [CARBONDATA-3791] Fix spelling, validating links for quick-start guide and configuration parameters
4fec616 is described below

commit 4fec616dadcc52e867f17ccadfec88e9d0b656ce
Author: Mahesh Raju Somalaraju <ma...@huawei.com>
AuthorDate: Mon May 4 00:00:07 2020 +0530

    [CARBONDATA-3791] Fix spelling, validating links for quick-start guide and configuration parameters
    
    Why is this PR needed?
    Fix spelling, validating links for quick-start guide and configuration parameters
    
    What changes were proposed in this PR?
    .md file changes
    
    Does this PR introduce any user interface change?
    No
    
    Is any new testcase added?
    No
    
    This closes #3740
---
 docs/configuration-parameters.md | 46 ++++++++++------------
 docs/quick-start-guide.md        | 85 +++++++++++++++++++++-------------------
 2 files changed, 65 insertions(+), 66 deletions(-)

diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index dc105a8..9428e00 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -31,10 +31,9 @@ This section provides the details of all the configurations required for the Car
 
 | Property | Default Value | Description |
 |----------------------------|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [...]
-| carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified,the path defaults to spark.sql.warehouse.dir property. **NOTE:** Store location should be in one of the carbon supported filesystems. Like HDFS or S3. It is not recommended to use this property. |
 | carbon.ddl.base.hdfs.url | (none) | To simplify and shorten the path to be specified in DDL/DML commands, this property is supported. This property is used to configure the HDFS relative path; the path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS of core-site.xml. If this path is configured, then the user need not pass the complete path during data load. For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnb [...]
 | carbon.badRecords.location | (none) | CarbonData can detect the records not conforming to defined table schema and isolate them as bad records. This property is used to specify where to store such bad records. |
-| carbon.streaming.auto.handoff.enabled | true | CarbonData supports storing of streaming data. To have high throughput for streaming, the data is written in Row format which is highly optimized for write, but performs poorly for query. When this property is true and when the streaming data size reaches ***carbon.streaming.segment.max.size***, CabonData will automatically convert the data to columnar format and optimize it for faster querying.**NOTE:** It is not recommended to keep the d [...]
+| carbon.streaming.auto.handoff.enabled | true | CarbonData supports storing of streaming data. To have high throughput for streaming, the data is written in Row format which is highly optimized for write, but performs poorly for query. When this property is true and when the streaming data size reaches ***carbon.streaming.segment.max.size***, CarbonData will automatically convert the data to columnar format and optimize it for faster querying. **NOTE:** It is not recommended to keep the [...]
 | carbon.streaming.segment.max.size | 1024000000 | CarbonData writes streaming data in row format which is optimized for high write throughput. This property defines the maximum size of data to be held in row format, beyond which it will be converted to columnar format in order to support high performance query, provided ***carbon.streaming.auto.handoff.enabled*** is true. **NOTE:** Setting a higher value will impact the streaming ingestion. The value has to be configured in bytes. |
 | carbon.segment.lock.files.preserve.hours | 48 | In order to support parallel data loading onto the same table, CarbonData sequences (locks) at the granularity of segments. Operations affecting the segment (like IUD, alter) are blocked from parallel operations. This property value indicates the number of hours the segment lock files will be preserved after dataload. These lock files will be deleted with the clean command after the configured number of hours. |
 | carbon.timestamp.format | yyyy-MM-dd HH:mm:ss | CarbonData can understand data of timestamp type and process it in special manner. It can be so that the format of Timestamp data is different from that understood by CarbonData by default. This configuration allows users to specify the format of Timestamp in their data. |
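For context, streaming ingestion as described above targets a table created with the streaming table property; the handoff to columnar format then follows ***carbon.streaming.auto.handoff.enabled*** and ***carbon.streaming.segment.max.size***. A minimal sketch (the table name and schema are assumptions):

```
CREATE TABLE stream_table (
  id INT,
  name STRING,
  city STRING
)
STORED AS carbondata
TBLPROPERTIES('streaming'='true');
```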
@@ -58,9 +57,9 @@ This section provides the details of all the configurations required for the Car
 | carbon.concurrent.lock.retries | 100 | CarbonData supports concurrent data loading onto the same table. To ensure the loading status is correctly updated into the system, locks are used to sequence the status update step. This configuration specifies the maximum number of retries to obtain the lock for updating the load status. **NOTE:** The more concurrent loads that happen, the higher the chances of failing to obtain the lock. Adjust this value according t [...]
 | carbon.concurrent.lock.retry.timeout.sec | 1 | Specifies the interval between the retries to obtain the lock for concurrent operations. **NOTE:** Refer to ***carbon.concurrent.lock.retries*** for understanding why CarbonData uses locks during data loading operations. |
 | carbon.csv.read.buffersize.byte | 1048576 | CarbonData uses Hadoop InputFormat to read the csv files. This configuration value is used to pass buffer size as input for the Hadoop MR job when reading the csv files. This value is configured in bytes. **NOTE:** Refer to ***org.apache.hadoop.mapreduce. InputFormat*** documentation for additional information. |
-| carbon.loading.prefetch | false | CarbonData uses univocity parser to read csv files. This configuration is used to inform the parser whether it can prefetch the data from csv files to speed up the reading.**NOTE:** Enabling prefetch improves the data loading performance, but needs higher memory to keep more records which are read ahead from disk. |
-| carbon.skip.empty.line | false | The csv files givent to CarbonData for loading can contain empty lines. Based on the business scenario, this empty line might have to be ignored or needs to be treated as NULL value for all columns. In order to define this business behavior, this configuration is provided.**NOTE:** In order to consider NULL values for non string columns and continue with data load, ***carbon.bad.records.action*** need to be set to **FORCE**;else data load will be failed [...]
-| carbon.number.of.cores.while.loading | 2 | Number of cores to be used while loading data. This also determines the number of threads to be used to read the input files (csv) in parallel.**NOTE:** This configured value is used in every data loading step to parallelize the operations. Configuring a higher value can lead to increased early thread pre-emption by OS and there by reduce the overall performance. |
+| carbon.loading.prefetch | false | CarbonData uses univocity parser to read csv files. This configuration is used to inform the parser whether it can prefetch the data from csv files to speed up the reading. **NOTE:** Enabling prefetch improves the data loading performance, but needs higher memory to keep more records which are read ahead from disk. |
+| carbon.skip.empty.line | false | The csv files given to CarbonData for loading can contain empty lines. Based on the business scenario, this empty line might have to be ignored or needs to be treated as NULL value for all columns. In order to define this business behavior, this configuration is provided. **NOTE:** In order to consider NULL values for non-string columns and continue with data load, ***carbon.bad.records.action*** needs to be set to **FORCE**; else data load will be faile [...]
+| carbon.number.of.cores.while.loading | 2 | Number of cores to be used while loading data. This also determines the number of threads to be used to read the input files (csv) in parallel. **NOTE:** This configured value is used in every data loading step to parallelize the operations. Configuring a higher value can lead to increased early thread pre-emption by the OS and thereby reduce the overall performance. |
 | enable.unsafe.sort | true | CarbonData supports unsafe operations of Java to avoid GC overhead for certain operations. This configuration enables the use of unsafe functions in CarbonData. **NOTE:** For operations like data loading, which generate more short-lived Java objects, Java GC can be a bottleneck. Using unsafe can overcome the GC overhead and improve the overall performance. |
 | enable.offheap.sort | true | CarbonData supports storing data in off-heap memory for certain operations during data loading and query. This helps to avoid the Java GC and thereby improve the overall performance. This configuration enables using off-heap memory for sorting of data during data loading. **NOTE:** ***enable.unsafe.sort*** configuration needs to be set to true for using off-heap |
 | carbon.load.sort.scope | NO_SORT [If sort columns are not specified while creating table] and LOCAL_SORT [If sort columns are specified] | CarbonData can support various sorting options to match the balance between load and query performance. LOCAL_SORT: All the data given to an executor in the single load is fully sorted and written to carbondata files. Data loading performance is reduced a little as the entire data needs to be sorted in the executor. GLOBAL SORT: Entire data in the d [...]
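As an illustration of how sort scope and bad-record handling surface in DDL/DML (a sketch; the table name, schema and path are assumptions, not from this commit):

```
CREATE TABLE sales (
  id INT,
  name STRING,
  city STRING
)
STORED AS carbondata
TBLPROPERTIES('SORT_COLUMNS'='city', 'SORT_SCOPE'='LOCAL_SORT');

LOAD DATA INPATH 'hdfs://namenode:9000/data/sales.csv'
INTO TABLE sales
OPTIONS('BAD_RECORDS_ACTION'='FORCE', 'SKIP_EMPTY_LINE'='true');
```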
@@ -84,10 +83,10 @@ This section provides the details of all the configurations required for the Car
 | carbon.timegranularity | SECOND | The configuration is used to specify the data granularity level such as DAY, HOUR, MINUTE, or SECOND. This helps to store more than 68 years of data into CarbonData. |
 | carbon.use.local.dir | true | CarbonData, during data loading, writes files to local temp directories before copying the files to HDFS. This configuration is used to specify whether CarbonData can write locally to tmp directory of the container or to the YARN application directory. |
 | carbon.sort.temp.compressor | SNAPPY | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These temporary files can be compressed and written in order to save the storage space. This configuration specifies the name of compressor to be used to compress the intermediate sort temp files during sort procedure in data loading. The valid values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' a [...]
-| carbon.load.skewedDataOptimization.enabled | false | During data loading,CarbonData would divide the number of blocks equally so as to ensure all executors process same number of blocks. This mechanism satisfies most of the scenarios and ensures maximum parallel processing for optimal data loading performance. In some business scenarios, there might be scenarios where the size of blocks vary significantly and hence some executors would have to do more work if they get blocks containing [...]
+| carbon.load.skewedDataOptimization.enabled | false | During data loading, CarbonData would divide the number of blocks equally so as to ensure all executors process the same number of blocks. This mechanism satisfies most of the scenarios and ensures maximum parallel processing for optimal data loading performance. In some business scenarios, there might be scenarios where the sizes of blocks vary significantly and hence some executors would have to do more work if they get blocks containing [...]
 | enable.data.loading.statistics | false | CarbonData has extensive logging which would be useful for debugging issues related to performance or hard to locate issues. This configuration when made ***true*** would log additional data loading statistics information to more accurately locate the issues being debugged. **NOTE:** Enabling this would log more debug information to log files, thereby increasing the log file size significantly in a short span of time. It is advised to configure [...]
 | carbon.dictionary.chunk.size | 10000 | CarbonData generates dictionary keys and writes them to separate dictionary file during data loading. To optimize the IO, this configuration determines the number of dictionary keys to be persisted to dictionary file at a time. **NOTE:** Writing to file also serves as a commit point to the dictionary generated. Increasing more values in memory causes more data loss during system or application failure. It is advised to alter this configuration jud [...]
-| carbon.load.directWriteToStorePath.enabled | false | During data load, all the carbondata files are written to local disk and finally copied to the target store location in HDFS/S3. Enabling this parameter will make carbondata files to be written directly onto target HDFS/S3 location bypassing the local disk.**NOTE:** Writing directly to HDFS/S3 saves local disk IO(once for writing the files and again for copying to HDFS/S3) there by improving the performance. But the drawback is when  [...]
+| carbon.load.directWriteToStorePath.enabled | false | During data load, all the carbondata files are written to local disk and finally copied to the target store location in HDFS/S3. Enabling this parameter will make carbondata files to be written directly onto target HDFS/S3 location bypassing the local disk. **NOTE:** Writing directly to HDFS/S3 saves local disk IO (once for writing the files and again for copying to HDFS/S3), thereby improving the performance. But the drawback is when [...]
 | carbon.options.serialization.null.format | \N | Based on the business scenarios, some columns might need to be loaded with null values. As null value cannot be written in csv files, some special characters might be adopted to specify null values. This configuration can be used to specify the null values format in the data being loaded. |
 | carbon.column.compressor | snappy | CarbonData will compress the column values using the compressor specified by this configuration. Currently CarbonData supports 'snappy', 'zstd' and 'gzip' compressors. |
 | carbon.minmax.allowed.byte.count | 200 | CarbonData will write the min max values for string/varchar types column using the byte count specified by this configuration. Max value is 1000 bytes (500 characters) and Min value is 10 bytes (5 characters). **NOTE:** This property is useful for reducing the store size thereby improving the query performance but can lead to query degradation if value is not configured properly. |
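The column compressor can also be chosen per table through a table property, overriding the system default above (a sketch; the table name and schema are assumptions):

```
CREATE TABLE compressed_table (
  id INT,
  value STRING
)
STORED AS carbondata
TBLPROPERTIES('carbon.column.compressor'='zstd');
```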
@@ -101,19 +100,19 @@ This section provides the details of all the configurations required for the Car
 | Parameter | Default Value | Description |
 |-----------------------------------------------|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | carbon.number.of.cores.while.compacting | 2 | Number of cores to be used while compacting data. This also determines the number of threads to be used to read carbondata files in parallel. |
-| carbon.compaction.level.threshold | 4, 3 | Each CarbonData load will create one segment, if every load is small in size it will generate many small file over a period of time impacting the query performance. This configuration is for minor compaction which decides how many segments to be merged. Configuration is of the form (x,y). Compaction will be triggered for every x segments and form a single level 1 compacted segment. When the number of compacted level 1 segments reach y, compact [...]
+| carbon.compaction.level.threshold | 4, 3 | Each CarbonData load will create one segment; if every load is small in size, it will generate many small files over a period of time, impacting the query performance. This configuration is for minor compaction which decides how many segments to be merged. Configuration is of the form (x,y). Compaction will be triggered for every x segments and form a single level 1 compacted segment. When the number of compacted level 1 segments reaches y, compact [...]
 | carbon.major.compaction.size | 1024 | To improve query performance, all the segments can be merged and compacted into a single segment up to the configured size. This major compaction size can be configured using this parameter. The sum of the segments which are below this threshold will be merged. This value is expressed in MB. |
-| carbon.horizontal.compaction.enable | true | CarbonData supports DELETE/UPDATE functionality by creating delta data files for existing carbondata files. These delta files would grow as more number of DELETE/UPDATE operations are performed. Compaction of these delta files are termed as horizontal compaction. This configuration is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur in case the delta (DELETE/ UPDATE) files be [...]
+| carbon.horizontal.compaction.enable | true | CarbonData supports DELETE/UPDATE functionality by creating delta data files for existing carbondata files. These delta files would grow as more DELETE/UPDATE operations are performed. Compaction of these delta files is termed horizontal compaction. This configuration is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur in case the delta (DELETE/ UPDATE) files be [...]
 | carbon.horizontal.update.compaction.threshold | 1 | This configuration specifies the threshold limit on the number of UPDATE delta files within a segment. In case the number of delta files goes beyond the threshold, the UPDATE delta files within the segment become eligible for horizontal compaction and are compacted into a single UPDATE delta file. Values range between 1 and 10000. |
 | carbon.horizontal.delete.compaction.threshold | 1 | This configuration specifies the threshold limit on the number of DELETE delta files within a block of a segment. In case the number of delta files goes beyond the threshold, the DELETE delta files for the particular block of the segment become eligible for horizontal compaction and are compacted into a single DELETE delta file. Values range between 1 and 10000. |
-| carbon.update.segment.parallelism | 1 | CarbonData processes the UPDATE operations by grouping records belonging to a segment into a single executor task. When the amount of data to be updated is more, this behavior causes problems like restarting of executor due to low memory and data-spill related errors. This property specifies the parallelism for each segment during update.**NOTE:** It is recommended to set this value to a multiple of the number of executors for balance. Values ran [...]
-| carbon.numberof.preserve.segments | 0 | If the user wants to preserve some number of segments from being compacted then he can set this configuration. Example: carbon.numberof.preserve.segments = 2 then 2 latest segments will always be excluded from the compaction. No segments will be preserved by default.**NOTE:** This configuration is useful when the chances of input data can be wrong due to environment scenarios. Preserving some of the latest segments from being compacted can help t [...]
-| carbon.allowed.compaction.days | 0 | This configuration is used to control on the number of recent segments that needs to be compacted, ignoring the older ones. This configuration is in days. For Example: If the configuration is 2, then the segments which are loaded in the time frame of past 2 days only will get merged. Segments which are loaded earlier than 2 days will not be merged. This configuration is disabled by default.**NOTE:** This configuration is useful when a bulk of histor [...]
-| carbon.enable.auto.load.merge | false | Compaction can be automatically triggered once data load completes. This ensures that the segments are merged in time and thus query times does not increase with increase in segments. This configuration enables to do compaction along with data loading.**NOTE: **Compaction will be triggered once the data load completes. But the status of data load wait till the compaction is completed. Hence it might look like data loading time has increased, but  [...]
+| carbon.update.segment.parallelism | 1 | CarbonData processes the UPDATE operations by grouping records belonging to a segment into a single executor task. When the amount of data to be updated is large, this behavior causes problems like restarting of executors due to low memory and data-spill related errors. This property specifies the parallelism for each segment during update. **NOTE:** It is recommended to set this value to a multiple of the number of executors for balance. Values ra [...]
+| carbon.numberof.preserve.segments | 0 | If the user wants to preserve some number of segments from being compacted then he can set this configuration. Example: carbon.numberof.preserve.segments = 2 then the 2 latest segments will always be excluded from the compaction. No segments will be preserved by default. **NOTE:** This configuration is useful when there are chances of the input data being wrong due to environment scenarios. Preserving some of the latest segments from being compacted can help [...]
+| carbon.allowed.compaction.days | 0 | This configuration is used to control the number of recent segments that need to be compacted, ignoring the older ones. This configuration is in days. For example: If the configuration is 2, then only the segments which are loaded in the time frame of the past 2 days will get merged. Segments which are loaded earlier than 2 days will not be merged. This configuration is disabled by default. **NOTE:** This configuration is useful when a bulk of histo [...]
+| carbon.enable.auto.load.merge | false | Compaction can be automatically triggered once data load completes. This ensures that the segments are merged in time and thus query times do not increase with the increase in segments. This configuration enables compaction along with data loading. **NOTE:** Compaction will be triggered once the data load completes. But the data load status waits till the compaction is completed. Hence it might look like data loading time has increased, but [...]
 | carbon.enable.page.level.reader.in.compaction|true|Enabling page level reader for compaction reduces the memory usage while compacting a larger number of segments. It allows reading only page by page instead of reading the whole blocklet into memory. **NOTE:** Please refer to [file-structure-of-carbondata](./file-structure-of-carbondata.md#carbondata-file-format) to understand the storage format of CarbonData and concepts of pages.|
-| carbon.concurrent.compaction | true | Compaction of different tables can be executed concurrently. This configuration determines whether to compact all qualifying tables in parallel or not. **NOTE: **Compacting concurrently is a resource demanding operation and needs more resources there by affecting the query performance also. This configuration is **deprecated** and might be removed in future releases. |
-| carbon.compaction.prefetch.enable | false | Compaction operation is similar to Query + data load where in data from qualifying segments are queried and data loading performed to generate a new single segment. This configuration determines whether to query ahead data from segments and feed it for data loading. **NOTE: **This configuration is disabled by default as it needs extra resources for querying extra data. Based on the memory availability on the cluster, user can enable it to imp [...]
-| carbon.merge.index.in.segment | true | Each CarbonData file has a companion CarbonIndex file which maintains the metadata about the data. These CarbonIndex files are read and loaded into driver and is used subsequently for pruning of data during queries. These CarbonIndex files are very small in size(few KB) and are many. Reading many small files from HDFS is not efficient and leads to slow IO performance. Hence these CarbonIndex files belonging to a segment can be combined into  a sin [...]
+| carbon.concurrent.compaction | true | Compaction of different tables can be executed concurrently. This configuration determines whether to compact all qualifying tables in parallel or not. **NOTE:** Compacting concurrently is a resource demanding operation and needs more resources, thereby also affecting query performance. This configuration is **deprecated** and might be removed in future releases. |
+| carbon.compaction.prefetch.enable | false | Compaction operation is similar to Query + data load, wherein data from qualifying segments is queried and data loading is performed to generate a new single segment. This configuration determines whether to query ahead data from segments and feed it for data loading. **NOTE:** This configuration is disabled by default as it needs extra resources for querying extra data. Based on the memory availability on the cluster, user can enable it to imp [...]
+| carbon.merge.index.in.segment | true | Each CarbonData file has a companion CarbonIndex file which maintains the metadata about the data. These CarbonIndex files are read and loaded into the driver and used subsequently for pruning of data during queries. These CarbonIndex files are very small in size (a few KB) and are many. Reading many small files from HDFS is not efficient and leads to slow IO performance. Hence these CarbonIndex files belonging to a segment can be combined into a sin [...]
 | carbon.enable.range.compaction | true | Configures whether range-based compaction is to be used for RANGE_COLUMN. If true, the data will remain in ranges after compaction as well. |
 | carbon.si.segment.merge | false | Making this true degrades the LOAD performance. When the number of small files increases for SI segments (it can happen as the number of columns will be less and we store position id and reference columns), the user can either set this to true, which will merge the data files for upcoming loads, or run the SI refresh command which does this job for all segments. (REFRESH INDEX <index_table>) |
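Besides the automatic triggers governed by the properties above, compaction can be invoked manually with standard CarbonData DDL (the table name is an assumption):

```
-- Minor compaction: merges segments per carbon.compaction.level.threshold
ALTER TABLE test_table COMPACT 'MINOR';

-- Major compaction: merges eligible segments up to carbon.major.compaction.size
ALTER TABLE test_table COMPACT 'MAJOR';
```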
 
@@ -125,7 +124,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.max.executor.lru.cache.size | -1 | Maximum memory **(in MB)** up to which the executor process can cache the data (BTree and reverse dictionary values). Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. **NOTE:** If this parameter is not configured, then the value of ***carbon.max.driver.lru.cache.size*** will be used. |
 | max.query.execution.time | 60 | Maximum time allowed for one query to be executed. The value is in minutes. |
 | carbon.enableMinMax | true | CarbonData maintains the metadata which enables to prune unnecessary files from being scanned as per the query conditions. To achieve pruning, the Min and Max of each column are maintained. Based on the filter condition in the query, certain data can be skipped from scanning by matching the filter value against the min/max values of the column(s) present in that carbondata file. This pruning enhances query performance significantly. |
-| carbon.dynamical.location.scheduler.timeout | 5 | CarbonData has its own scheduling algorithm to suggest to Spark on how many tasks needs to be launched and how much work each task need to do in a Spark cluster for any query on CarbonData. To determine the number of tasks that can be scheduled, knowing the count of active executors is necessary. When dynamic allocation is enabled on a YARN based spark cluster, executor processes are shutdown if no request is received for a particular a [...]
+| carbon.dynamical.location.scheduler.timeout | 5 | CarbonData has its own scheduling algorithm to suggest to Spark how many tasks need to be launched and how much work each task needs to do in a Spark cluster for any query on CarbonData. To determine the number of tasks that can be scheduled, knowing the count of active executors is necessary. When dynamic allocation is enabled on a YARN based spark cluster, executor processes are shut down if no request is received for a particular a [...]
 | carbon.scheduler.min.registered.resources.ratio | 0.8 | Specifies the minimum resource (executor) ratio needed for starting the block distribution. The default value is 0.8, which indicates 80% of the requested resource is allocated for starting block distribution. The minimum value is 0.1 and the maximum value is 1.0. |
 | carbon.search.enabled (Alpha Feature) | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using a compute framework like spark, thus avoiding limitations of the compute framework like the SQL optimizer and task scheduling overhead. |
 | carbon.search.query.timeout | 10s | Time within which the result is expected from the workers, beyond which the query is terminated |
@@ -137,7 +136,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.enable.vector.reader | true | Spark added vector processing to optimize CPU cache misses and thereby increase the query performance. This configuration enables fetching data as a columnar batch of size 4*1024 rows instead of fetching data row by row and providing it to spark, so that there is an improvement in select query performance. |
 | carbon.task.distribution | block | CarbonData has its own scheduling algorithm to suggest to Spark how many tasks need to be launched and how much work each task needs to do in a Spark cluster for any query on CarbonData. Each of these task distribution suggestions has its own advantages and disadvantages. Based on the customer use case, appropriate task distribution can be configured. **block**: Setting this value will launch one task per block. This setting is suggested in case of [...]
 | carbon.custom.block.distribution | false | CarbonData has its own scheduling algorithm to suggest to Spark how many tasks need to be launched and how much work each task needs to do in a Spark cluster for any query on CarbonData. When this configuration is true, CarbonData would distribute the available blocks to be scanned among the available number of cores. For example: If there are 10 blocks to be scanned and only 3 tasks can be run (only 3 executor cores available in the cluster) [...]
-| enable.query.statistics | false | CarbonData has extensive logging which would be useful for debugging issues related to performance or hard to locate issues. This configuration when made ***true*** would log additional query statistics information to more accurately locate the issues being debugged.**NOTE:** Enabling this would log more debug information to log files, there by increasing the log files size significantly in short span of time. It is advised to configure the log files s [...]
+| enable.query.statistics | false | CarbonData has extensive logging which would be useful for debugging issues related to performance or hard to locate issues. This configuration when made ***true*** would log additional query statistics information to more accurately locate the issues being debugged. **NOTE:** Enabling this would log more debug information to log files, thereby increasing the log file size significantly in a short span of time. It is advised to configure the log files [...]
 | enable.unsafe.in.query.processing | false | CarbonData supports unsafe operations of Java to avoid GC overhead for certain operations. This configuration enables the use of unsafe functions in CarbonData while scanning the data during query. |
 | carbon.max.driver.threads.for.block.pruning | 4 | Number of threads used for driver pruning when the carbon files number more than 100k. This configuration can be used to set the number of threads between 1 and 4. |
 | carbon.heap.memory.pooling.threshold.bytes | 1048576 | CarbonData supports unsafe operations of Java to avoid GC overhead for certain operations. Using unsafe, memory can be allocated on the Java heap or off heap. This configuration controls the allocation mechanism on the Java heap. If the heap memory allocations of the given size are greater than or equal to this value, they go through the pooling mechanism. But if this size is set to -1, it does not go through the pooling mechanism. Default [...]
@@ -205,21 +204,18 @@ RESET
 
 | Properties                                | Description                                                  |
 | ----------------------------------------- | ------------------------------------------------------------ |
-| carbon.options.bad.records.logger.enable  | CarbonData can identify the records that are not conformant to schema and isolate them as bad records. Enabling this configuration will make CarbonData to log such bad records.**NOTE:** If the input data contains many bad records, logging them will slow down the over all data loading throughput. The data load operation status would depend on the configuration in ***carbon.bad.records.action***. |
-| carbon.options.bad.records.logger.enable  | To enable or disable bad record logger.                      |
-| carbon.options.bad.records.action         | This property can have four types of actions for bad records FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE then it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT then bad records are written to the raw CSV instead of being loaded. If set to IGNORE then bad records are neither loaded nor written to the raw CSV. If set to FAIL then data loading fails if any bad records are found. |
+| carbon.options.bad.records.logger.enable  | To enable or disable the bad record logger. CarbonData can identify the records that do not conform to the schema and isolate them as bad records. Enabling this configuration will make CarbonData log such bad records. **NOTE:** If the input data contains many bad records, logging them will slow down the overall data loading throughput. The data load operation status would depend on the configuration in ***carbon.bad.records.action***. | [...]
+| carbon.options.bad.records.action         | This property has four types of bad record actions: FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE then it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT then bad records are written to the raw CSV instead of being loaded. If set to IGNORE then bad records are neither loaded nor written to the raw CSV. If set to FAIL then data loading fails if any bad records are found. |
 | carbon.options.is.empty.data.bad.record   | If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. |
-| carbon.options.batch.sort.size.inmb       | Size of batch data to keep in memory, as a thumb rule it supposed to be less than 45% of sort.inmemory.size.inmb otherwise it may spill intermediate data to disk. |
 | carbon.options.bad.record.path            | Specifies the HDFS path where bad records need to be stored. |
-| carbon.custom.block.distribution          | Specifies whether to use the Spark or Carbon block distribution feature.**NOTE: **Refer to [Query Configuration](#query-configuration)#carbon.custom.block.distribution for more details on CarbonData scheduler. |
+| carbon.custom.block.distribution          | Specifies whether to use the Spark or Carbon block distribution feature. **NOTE:** Refer to [Query Configuration](#query-configuration)#carbon.custom.block.distribution for more details on CarbonData scheduler. |
 | enable.unsafe.sort                        | Specifies whether to use unsafe sort during data loading. Unsafe sort reduces the garbage collection during data load operation, resulting in better performance. |
 | carbon.options.date.format                 | Specifies the date format of the date columns in the data being loaded |
 | carbon.options.timestamp.format            | Specifies the timestamp format of the time stamp columns in the data being loaded |
 | carbon.options.sort.scope                 | Specifies how the current data load should be sorted. This sort parameter is at the table level. **NOTE:** Refer to [Data Loading Configuration](#data-loading-configuration)#carbon.sort.scope for detailed information. |
-| carbon.table.load.sort.scope.db_name.table_name | Overrides the SORT_SCOPE provided in CREATE TABLE.           |
+| carbon.table.load.sort.scope.<db_name>.<table_name> | Overrides the SORT_SCOPE provided in CREATE TABLE.           |
 | carbon.options.global.sort.partitions     | Specifies the number of partitions to be used during global sort.   |
 | carbon.options.serialization.null.format  | Default Null value representation in the data being loaded. **NOTE:** Refer to [Data Loading Configuration](#data-loading-configuration)#carbon.options.serialization.null.format for detailed information. |
-| carbon.query.directQueryOnDataMap.enabled | Specifies whether datamap can be queried directly. This is useful for debugging purposes.**NOTE: **Refer to [Query Configuration](#query-configuration) for detailed information. |
 
 **Examples:**
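For instance, a session-level override followed by a reset might look like this (the property values are illustrative):

```
SET carbon.options.bad.records.action=FAIL;
SET carbon.options.sort.scope=LOCAL_SORT;
RESET;
```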
 
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index 4635cdb..e08b7b7 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -16,10 +16,10 @@
 -->
 
 # Quick Start
-This tutorial provides a quick introduction to using CarbonData. To follow along with this guide, first download a packaged release of CarbonData from the [CarbonData website](https://dist.apache.org/repos/dist/release/carbondata/).Alternatively it can be created following [Building CarbonData](https://github.com/apache/carbondata/tree/master/build) steps.
+This tutorial provides a quick introduction to using CarbonData. To follow along with this guide, download a packaged release of CarbonData from the [CarbonData website](https://dist.apache.org/repos/dist/release/carbondata/). Alternatively, it can be built by following the [Building CarbonData](https://github.com/apache/carbondata/tree/master/build) steps.
 
 ##  Prerequisites
-* CarbonData supports Spark versions upto 2.2.1.Please download Spark package from [Spark website](https://spark.apache.org/downloads.html)
+* CarbonData supports Spark versions up to 2.4. Please download the Spark package from the [Spark website](https://spark.apache.org/downloads.html)
 
 * Create a sample.csv file using the following commands. The CSV file is required for loading data into CarbonData
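A minimal sketch of such a file; the column names follow the schema used later in this guide, and the row contents are illustrative:

```
cd carbondata
cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
```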
 
@@ -36,10 +36,10 @@ This tutorial provides a quick introduction to using CarbonData. To follow along
 ## Integration
 
 ### Integration with Execution Engines
-CarbonData can be integrated with Spark,Presto and Hive execution engines. The below documentation guides on Installing and Configuring with these execution engines.
+CarbonData can be integrated with Spark, Presto, Flink and Hive execution engines. The documentation below guides you through installing and configuring CarbonData with these execution engines.
 
 #### Spark
-[Installing and Configuring CarbonData to run locally with Spark SQL CLI (version: 2.3+)](#installing-and-configuring-carbondata-to-run-locally-with-spark-sql)
+[Installing and Configuring CarbonData to run locally with Spark SQL CLI](#installing-and-configuring-carbondata-to-run-locally-with-spark-sql-cli)
 
 [Installing and Configuring CarbonData to run locally with Spark Shell](#installing-and-configuring-carbondata-to-run-locally-with-spark-shell)
 
@@ -66,9 +66,9 @@ CarbonData can be integrated with Spark,Presto and Hive execution engines. The b
 #### Alluxio
 [CarbonData supports read and write with Alluxio](./alluxio-guide.md)
 
-## Installing and Configuring CarbonData to run locally with Spark SQL CLI (version: 2.3+)
+## Installing and Configuring CarbonData to run locally with Spark SQL CLI
 
-In Spark SQL CLI, it use CarbonExtensions to customize the SparkSession with CarbonData's parser, analyzer, optimizer and physical planning strategy rules in Spark.
+This works with Spark 2.3+ versions. The Spark SQL CLI uses CarbonExtensions to customize the SparkSession with CarbonData's parser, analyzer, optimizer and physical planning strategy rules in Spark.
 To enable CarbonExtensions, we need to add the following configuration.
 
 |Key|Value|
@@ -95,7 +95,9 @@ STORED AS carbondata;
 ###### Loading Data to a Table
 
 ```
-LOAD DATA INPATH '/path/to/sample.csv' INTO TABLE test_table;
+LOAD DATA INPATH '/local-path/sample.csv' INTO TABLE test_table;
+
+LOAD DATA INPATH 'hdfs://hdfs-path/sample.csv' INTO TABLE test_table;
 ```
 
 ```
@@ -119,7 +121,7 @@ GROUP BY city;
 
 ## Installing and Configuring CarbonData to run locally with Spark Shell
 
-Apache Spark Shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit [Apache Spark Documentation](http://spark.apache.org/docs/latest/) for more details on Spark shell.
+Apache Spark Shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit [Apache Spark Documentation](http://spark.apache.org/docs/latest/) for more details on the Spark shell.
 
 #### Basics
 
@@ -129,7 +131,7 @@ Start Spark shell by running the following command in the Spark directory:
 ```
 ./bin/spark-shell --jars <carbondata assembly jar path>
 ```
-**NOTE**: Path where packaged release of CarbonData was downloaded or assembly jar will be available after [building CarbonData](https://github.com/apache/carbondata/blob/master/build/README.md) and can be copied from `./assembly/target/scala-2.1x/carbondata_xxx.jar`
+**NOTE**: Use the path where the packaged release of CarbonData was downloaded, or the assembly jar, which will be available after [building CarbonData](https://github.com/apache/carbondata/blob/master/build/README.md) and can be copied from `./assembly/target/scala-2.1x/apache-carbondata_xxx.jar`
 
 In this shell, SparkSession is readily available as `spark` and Spark context is readily available as `sc`.
 
@@ -234,9 +236,9 @@ carbon.sql(
 
 ### Procedure
 
-1. [Build the CarbonData](https://github.com/apache/carbondata/blob/master/build/README.md) project and get the assembly jar from `./assembly/target/scala-2.1x/carbondata_xxx.jar`. 
+1. [Build the CarbonData](https://github.com/apache/carbondata/blob/master/build/README.md) project and get the assembly jar from `./assembly/target/scala-2.1x/apache-carbondata_xxx.jar`. 
 
-2. Copy `./assembly/target/scala-2.1x/carbondata_xxx.jar` to `$SPARK_HOME/carbonlib` folder.
+2. Copy `./assembly/target/scala-2.1x/apache-carbondata_xxx.jar` to `$SPARK_HOME/carbonlib` folder.
 
    **NOTE**: Create the carbonlib folder if it does not exist inside `$SPARK_HOME` path.
 
@@ -253,13 +255,7 @@ carbon.sql(
 | spark.driver.extraJavaOptions   | `-Dcarbon.properties.filepath = $SPARK_HOME/conf/carbon.properties` | A string of extra JVM options to pass to the driver. For instance, GC settings or other logging. |
 | spark.executor.extraJavaOptions | `-Dcarbon.properties.filepath = $SPARK_HOME/conf/carbon.properties` | A string of extra JVM options to pass to executors. For instance, GC settings or other logging. **NOTE**: You can enter multiple values separated by space. |
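Equivalently, these JVM options can be passed on the command line when launching the shell (a sketch, assuming the default carbon.properties path):

```
./bin/spark-shell \
--conf "spark.driver.extraJavaOptions=-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties" \
--conf "spark.executor.extraJavaOptions=-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties"
```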
 
-7. Add the following properties in `$SPARK_HOME/conf/carbon.properties` file:
-
-| Property             | Required | Description                                                  | Example                              | Remark                        |
-| -------------------- | -------- | ------------------------------------------------------------ | ------------------------------------ | ----------------------------- |
-| carbon.storelocation | NO       | Location where data CarbonData will create the store and write the data in its own format. If not specified then it takes spark.sql.warehouse.dir path. | hdfs://HOSTNAME:PORT/Opt/CarbonStore | Propose to set HDFS directory |
-
-8. Verify the installation. For example:
+7. Verify the installation. For example:
 
 ```
 ./bin/spark-shell \
@@ -268,7 +264,9 @@ carbon.sql(
 --executor-memory 2G
 ```
 
-**NOTE**: Make sure you have permissions for CarbonData JARs and files through which driver and executor will start.
+**NOTE**: 
+ - The property "carbon.storelocation" is deprecated in CarbonData 2.0. Only users who used this property in previous versions can still use it in CarbonData 2.0.
+ - Make sure you have permissions for CarbonData JARs and files through which driver and executor will start.
 
 ## Installing and Configuring CarbonData on Spark on YARN Cluster
 
@@ -284,7 +282,7 @@ carbon.sql(
 
   The following steps are only for Driver Nodes. (Driver nodes are the ones which start the Spark context.)
 
-1. [Build the CarbonData](https://github.com/apache/carbondata/blob/master/build/README.md) project and get the assembly jar from `./assembly/target/scala-2.1x/carbondata_xxx.jar` and copy to `$SPARK_HOME/carbonlib` folder.
+1. [Build the CarbonData](https://github.com/apache/carbondata/blob/master/build/README.md) project and get the assembly jar from `./assembly/target/scala-2.1x/apache-carbondata_xxx.jar` and copy to `$SPARK_HOME/carbonlib` folder.
 
   **NOTE**: Create the carbonlib folder if it does not exist inside `$SPARK_HOME` path.
 
@@ -310,13 +308,7 @@ mv carbondata.tar.gz carbonlib/
 | spark.driver.extraClassPath     | Extra classpath entries to prepend to the classpath of the driver. **NOTE**: If SPARK_CLASSPATH is defined in spark-env.sh, then comment it out and append the value to the parameter spark.driver.extraClassPath below. | `$SPARK_HOME/carbonlib/*`                                    |
 | spark.driver.extraJavaOptions   | A string of extra JVM options to pass to the driver. For instance, GC settings or other logging. | `-Dcarbon.properties.filepath = $SPARK_HOME/conf/carbon.properties` |
 
-5. Add the following properties in `$SPARK_HOME/conf/carbon.properties`:
-
-| Property             | Required | Description                                                  | Example                              | Default Value                 |
-| -------------------- | -------- | ------------------------------------------------------------ | ------------------------------------ | ----------------------------- |
-| carbon.storelocation | NO       | Location where CarbonData will create the store and write the data in its own format. If not specified then it takes spark.sql.warehouse.dir path. | hdfs://HOSTNAME:PORT/Opt/CarbonStore | Propose to set HDFS directory |
-
-6. Verify the installation.
+5. Verify the installation.
 
 ```
 ./bin/spark-shell \
@@ -327,6 +319,7 @@ mv carbondata.tar.gz carbonlib/
 ```
 
 **NOTE**:
+ - The property "carbon.storelocation" is deprecated in CarbonData 2.0. Only users who used this property in previous versions can still use it in CarbonData 2.0.
  - Make sure you have permissions for CarbonData JARs and files through which driver and executor will start.
 - If using Spark + Hive 1.1.X, add the carbondata assembly jar and the carbondata-hive jar to the parameter 'spark.sql.hive.metastore.jars' in the spark-defaults.conf file.
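For example, in spark-defaults.conf this might look like the following (a sketch; the jar names and paths are assumptions, and any existing metastore jars should be kept on the path):

```
# In spark-defaults.conf (paths are illustrative)
spark.sql.hive.metastore.jars  $SPARK_HOME/carbonlib/apache-carbondata-xxx.jar:$SPARK_HOME/carbonlib/carbondata-hive-xxx.jar
```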
 
@@ -343,13 +336,27 @@ b. Run the following command to start the CarbonData thrift server.
 ```
 ./bin/spark-submit \
 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
-$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
+$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR
+```
+
+| Parameter           | Description                                                  | Example                                                    |
+| ------------------- | ------------------------------------------------------------ | ---------------------------------------------------------- |
+| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | apache-carbondata-xx.jar       |
+
+c. Run the following command to work with S3 storage.
+
+```
+./bin/spark-submit \
+--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
+$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <access_key> <secret_key> <endpoint>
 ```
 
 | Parameter           | Description                                                  | Example                                                    |
 | ------------------- | ------------------------------------------------------------ | ---------------------------------------------------------- |
-| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar       |
-| carbon_store_path   | This is a parameter to the CarbonThriftServer class. This a HDFS path where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation parameter of carbon.properties. If not specified then it takes spark.sql.warehouse.dir path. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store` |
+| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | apache-carbondata-xx.jar       |
+| access_key   | Access key for S3 storage | |
+| secret_key   | Secret key for S3 storage | |
+| endpoint   | Endpoint for connecting to S3 storage | |
 
 **NOTE**: From Spark 1.6, by default the Thrift server runs in multi-session mode, which means each JDBC/ODBC connection owns a copy of its own SQL configuration and temporary function registry. Cached tables are still shared though. If you prefer to run the Thrift server in single-session mode and share all SQL configuration and temporary function registry, please set option `spark.sql.hive.thriftServer.singleSession` to `true`. You may either add this option to `spark-defaults.conf`, [...]
 
@@ -357,7 +364,7 @@ $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
 ./bin/spark-submit \
 --conf spark.sql.hive.thriftServer.singleSession=true \
 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
-$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
+$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR
 ```
 
 **But** in single-session mode, if one user changes the database from one connection, the database of the other connections will be changed too.
@@ -369,8 +376,7 @@ $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
 ```
 ./bin/spark-submit \
 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
-$SPARK_HOME/carbonlib/carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar \
-hdfs://<host_name>:port/user/hive/warehouse/carbon.store
+$SPARK_HOME/carbonlib/apache-carbondata-xxx.jar
 ```
 
 - Start with Fixed executors and resources.
@@ -382,8 +388,7 @@ hdfs://<host_name>:port/user/hive/warehouse/carbon.store
 --driver-memory 20G \
 --executor-memory 250G \
 --executor-cores 32 \
-$SPARK_HOME/carbonlib/carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar \
-hdfs://<host_name>:port/user/hive/warehouse/carbon.store
+$SPARK_HOME/carbonlib/apache-carbondata-xxx.jar
 ```
 
 ### Connecting to CarbonData Thrift Server Using Beeline.
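For example (standard Beeline usage; the host is an assumption and 10000 is the usual Thrift server default port):

```
./bin/beeline -u jdbc:hive2://<thrift_server_host>:10000
```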
@@ -401,9 +406,9 @@ Example
 
 ## Installing and Configuring CarbonData on Presto
 
-**NOTE:** **CarbonData tables cannot be created nor loaded from Presto. User need to create CarbonData Table and load data into it
+**NOTE:** **CarbonData tables can neither be created nor loaded from Presto. Users need to create a CarbonData table and load data into it
 either with [Spark](#installing-and-configuring-carbondata-to-run-locally-with-spark-shell) or [SDK](./sdk-guide.md) or [C++ SDK](./csdk-guide.md).
-Once the table is created,it can be queried from Presto.**
+Once the table is created, it can be queried from Presto.**
 
 Please refer to the Presto guide linked below.
 
@@ -411,7 +416,7 @@ prestodb guide  - [prestodb](./prestodb-guide.md)
 
 prestosql guide - [prestosql](./prestosql-guide.md)
 
-Once installed the presto with carbonData as per above guide,
+Once Presto with CarbonData is installed as per the above guide,
 you can use the Presto CLI on the coordinator to query data sources in the catalog using the Presto workers.
 
 List the schemas (databases) available
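For example, with the Presto CLI (standard Presto SQL):

```
show schemas;
```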
@@ -438,6 +443,4 @@ Query from the available tables
 select * from carbon_table;
 ```
 
-**Note :** Create Tables and data loads should be done before executing queries as we can not create carbon table from this interface.
-
-```
+**Note:** Table creation and data loads should be done before executing queries, as we cannot create carbon tables from this interface.