You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@carbondata.apache.org by ra...@apache.org on 2018/11/30 16:34:15 UTC

[22/26] carbondata git commit: [Documentation] Editorial review

[Documentation] Editorial review

Corrected spelling mistakes and grammer

This closes #2965


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/d28f87c2
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/d28f87c2
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/d28f87c2

Branch: refs/heads/branch-1.5
Commit: d28f87c2896f9b50220c673cfb3d20f588042fee
Parents: 9c149d7
Author: sgururajshetty <sg...@gmail.com>
Authored: Thu Nov 29 18:44:22 2018 +0530
Committer: ravipesala <ra...@gmail.com>
Committed: Fri Nov 30 21:57:21 2018 +0530

----------------------------------------------------------------------
 docs/configuration-parameters.md     | 4 ++--
 docs/ddl-of-carbondata.md            | 4 ++--
 docs/dml-of-carbondata.md            | 6 +++---
 docs/file-structure-of-carbondata.md | 3 +--
 4 files changed, 8 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/d28f87c2/docs/configuration-parameters.md
----------------------------------------------------------------------
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index a41a3d5..4aa2929 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -69,9 +69,9 @@ This section provides the details of all the configurations required for the Car
 | carbon.options.bad.records.logger.enable | false | CarbonData can identify the records that are not conformant to schema and isolate them as bad records. Enabling this configuration will make CarbonData to log such bad records. **NOTE:** If the input data contains many bad records, logging them will slow down the over all data loading throughput. The data load operation status would depend on the configuration in ***carbon.bad.records.action***. |
 | carbon.bad.records.action | FAIL | CarbonData in addition to identifying the bad records, can take certain actions on such data. This configuration can have four types of actions for bad records namely FORCE, REDIRECT, IGNORE and FAIL. If set to FORCE then it auto-corrects the data by storing the bad records as NULL. If set to REDIRECT then bad records are written to the raw CSV instead of being loaded. If set to IGNORE then bad records are neither loaded nor written to the raw CSV. If set to FAIL then data loading fails if any bad records are found. |
 | carbon.options.is.empty.data.bad.record | false | Based on the business scenarios, empty("" or '' or ,,) data can be valid or invalid. This configuration controls how empty data should be treated by CarbonData. If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. |
-| carbon.options.bad.record.path | (none) | Specifies the HDFS path where bad records are to be stored. By default the value is Null. This path must to be configured by the user if ***carbon.options.bad.records.logger.enable*** is **true** or ***carbon.bad.records.action*** is **REDIRECT**. |
+| carbon.options.bad.record.path | (none) | Specifies the HDFS path where bad records are to be stored. By default the value is Null. This path must be configured by the user if ***carbon.options.bad.records.logger.enable*** is **true** or ***carbon.bad.records.action*** is **REDIRECT**. |
 | carbon.blockletgroup.size.in.mb | 64 | Please refer to [file-structure-of-carbondata](./file-structure-of-carbondata.md#carbondata-file-format) to understand the storage format of CarbonData. The data are read as a group of blocklets which are called blocklet groups. This parameter specifies the size of each blocklet group. Higher value results in better sequential IO access. The minimum value is 16MB, any value lesser than 16MB will reset to the default value (64MB). **NOTE:** Configuring a higher value might lead to poor performance as an entire blocklet group will have to read into memory before processing. For filter queries with limit, it is **not advisable** to have a bigger blocklet size. For aggregation queries which need to return more number of rows, bigger blocklet size is advisable. |
-| carbon.sort.file.write.buffer.size | 16384 | CarbonData sorts and writes data to intermediate files to limit the memory usage. This configuration determines the buffer size to be used for reading and writing such files. **NOTE:** This configuration is useful to tune IO and derive optimal performance. Based on the OS and underlying harddisk type, these values can significantly affect the overall performance. It is ideal to tune the buffersize equivalent to the IO buffer size of the OS. Recommended range is between 10240 and 10485760 bytes. |
+| carbon.sort.file.write.buffer.size | 16384 | CarbonData sorts and writes data to intermediate files to limit the memory usage. This configuration determines the buffer size to be used for reading and writing such files. **NOTE:** This configuration is useful to tune IO and derive optimal performance. Based on the OS and underlying harddisk type, these values can significantly affect the overall performance. It is ideal to tune the buffer size equivalent to the IO buffer size of the OS. Recommended range is between 10240 and 10485760 bytes. |
 | carbon.sort.intermediate.files.limit | 20 | CarbonData sorts and writes data to intermediate files to limit the memory usage. Before writing the target carbondata file, the records in these intermediate files needs to be merged to reduce the number of intermediate files. This configuration determines the minimum number of intermediate files after which merged sort is applied on them sort the data. **NOTE:** Intermediate merging happens on a separate thread in the background. Number of threads used is determined by ***carbon.merge.sort.reader.thread***. Configuring a low value will cause more time to be spent in merging these intermediate merged files which can cause more IO. Configuring a high value would cause not to use the idle threads to do intermediate sort merges. Recommended range is between 2 and 50. |
 | carbon.merge.sort.reader.thread | 3 | CarbonData sorts and writes data to intermediate files to limit the memory usage. When the intermediate files reaches ***carbon.sort.intermediate.files.limit***, the files will be merged in another thread pool. This value will control the size of the pool. Each thread will read the intermediate files and do merge sort and finally write the records to another file. **NOTE:** Refer to ***carbon.sort.intermediate.files.limit*** for operation description. Configuring smaller number of threads can cause merging slow down over loading process whereas configuring larger number of threads can cause thread contention with threads in other data loading steps. Hence configure a fraction of ***carbon.number.of.cores.while.loading***. |
 | carbon.merge.sort.prefetch | true | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These intermediate temp files will have to be sorted using merge sort before writing into CarbonData format. This configuration enables pre fetching of data from these temp files in order to optimize IO and speed up data loading process. |

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d28f87c2/docs/ddl-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md
index acdf8db..965f11c 100644
--- a/docs/ddl-of-carbondata.md
+++ b/docs/ddl-of-carbondata.md
@@ -450,7 +450,7 @@ CarbonData DDL statements are documented here,which includes:
    - ##### Compression for table
 
      Data compression is also supported by CarbonData.
-     By default, Snappy is used to compress the data. CarbonData also support ZSTD compressor.
+     By default, Snappy is used to compress the data. CarbonData also supports ZSTD compressor.
      User can specify the compressor in the table property:
 
      ```
@@ -557,7 +557,7 @@ CarbonData DDL statements are documented here,which includes:
 
 ### Create external table on Non-Transactional table data location.
   Non-Transactional table data location will have only carbondata and carbonindex files, there will not be a metadata folder (table status and schema).
-  Our SDK module currently support writing data in this format.
+  Our SDK module currently supports writing data in this format.
 
   **Example:**
   ```

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d28f87c2/docs/dml-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/dml-of-carbondata.md b/docs/dml-of-carbondata.md
index 393ebd3..65654a4 100644
--- a/docs/dml-of-carbondata.md
+++ b/docs/dml-of-carbondata.md
@@ -58,7 +58,7 @@ CarbonData DML statements are documented here,which includes:
 | [COLUMNDICT](#columndict)                               | Path to read the dictionary data from for particular column  |
 | [DATEFORMAT](#dateformattimestampformat)                | Format of date in the input csv file                         |
 | [TIMESTAMPFORMAT](#dateformattimestampformat)           | Format of timestamp in the input csv file                    |
-| [SORT_COLUMN_BOUNDS](#sort-column-bounds)               | How to parititon the sort columns to make the evenly distributed |
+| [SORT_COLUMN_BOUNDS](#sort-column-bounds)               | How to partition the sort columns to make the evenly distributed |
 | [SINGLE_PASS](#single_pass)                             | When to enable single pass data loading                      |
 | [BAD_RECORDS_LOGGER_ENABLE](#bad-records-handling)      | Whether to enable bad records logging                        |
 | [BAD_RECORD_PATH](#bad-records-handling)                | Bad records logging path. Useful when bad record logging is enabled |
@@ -83,7 +83,7 @@ CarbonData DML statements are documented here,which includes:
     ```
 
   - ##### COMMENTCHAR:
-    Comment Characters can be provided in the load command if user want to comment lines.
+    Comment Characters can be provided in the load command if user wants to comment lines.
     ```
     OPTIONS('COMMENTCHAR'='#')
     ```
@@ -184,7 +184,7 @@ CarbonData DML statements are documented here,which includes:
 
     **NOTE:**
     * SORT_COLUMN_BOUNDS will be used only when the SORT_SCOPE is 'local_sort'.
-    * Carbondata will use these bounds as ranges to process data concurrently during the final sort percedure. The records will be sorted and written out inside each partition. Since the partition is sorted, all records will be sorted.
+    * Carbondata will use these bounds as ranges to process data concurrently during the final sort procedure. The records will be sorted and written out inside each partition. Since the partition is sorted, all records will be sorted.
     * Since the actual order and literal order of the dictionary column are not necessarily the same, we do not recommend you to use this feature if the first sort column is 'dictionary_include'.
     * The option works better if your CPU usage during loading is low. If your current system CPU usage is high, better not to use this option. Besides, it depends on the user to specify the bounds. If user does not know the exactly bounds to make the data distributed evenly among the bounds, loading performance will still be better than before or at least the same as before.
     * Users can find more information about this option in the description of PR1953.

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d28f87c2/docs/file-structure-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/file-structure-of-carbondata.md b/docs/file-structure-of-carbondata.md
index 9e656bb..9313593 100644
--- a/docs/file-structure-of-carbondata.md
+++ b/docs/file-structure-of-carbondata.md
@@ -122,8 +122,7 @@ Compared with V2: The blocklet data volume of V2 format defaults to 120,000 line
 
 #### Footer format
 
-Footer records each carbondata
-All blocklet data distribution information and statistical related metadata information (minmax, startkey/endkey) inside the file.
+Footer records each carbondata, all blocklet data distribution information and statistical related metadata information (minmax, startkey/endkey) inside the file.
 
 ![Footer format](../docs/images/2-3_4.png?raw=true)