Posted to commits@carbondata.apache.org by aj...@apache.org on 2020/05/05 16:01:14 UTC

[carbondata] branch master updated: [CARBONDATA-3791] Correct spelling, query, and default value in performance-tuning, prestodb and prestosql documentation.

This is an automated email from the ASF dual-hosted git repository.

ajantha pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new fbf311c  [CARBONDATA-3791] Correct spelling, query, and default value in performance-tuning, prestodb and prestosql documentation.
fbf311c is described below

commit fbf311c83c3e3523c4150133f320c7bcd39b82b8
Author: Nihal kumar ojha <ni...@gmail.com>
AuthorDate: Mon May 4 10:12:45 2020 +0530

    [CARBONDATA-3791] Correct spelling, query, and default value in performance-tuning, prestodb and prestosql documentation.
    
    Why is this PR needed?
    Correct spelling, query, and default value in performance-tuning, prestodb and prestosql documentation.
    
    What changes were proposed in this PR?
    Corrected spelling, query, and default value in performance-tuning, prestodb and prestosql documentation.
    
    Does this PR introduce any user interface change?
    No
    
    Is any new testcase added?
    No
    
    This closes #3737
---
 docs/performance-tuning.md | 14 ++++++++------
 docs/prestodb-guide.md     | 13 ++++++++-----
 docs/prestosql-guide.md    | 14 +++++++++-----
 3 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/docs/performance-tuning.md b/docs/performance-tuning.md
index f485388..05352db 100644
--- a/docs/performance-tuning.md
+++ b/docs/performance-tuning.md
@@ -54,7 +54,7 @@
     BEGIN_TIME bigint,
     HOST String,
     Dime_1 String,
-    counter_1, Decimal
+    counter_1 Decimal,
     ...
     
     )STORED AS carbondata
@@ -79,7 +79,7 @@
       BEGIN_TIME bigint,
       HOST String,
       Dime_1 String,
-      counter_1, Decimal
+      counter_1 Decimal,
       ...
       
       )STORED AS carbondata
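
As an aside for readers of this thread: the fix above turns an invalid column entry (`counter_1, Decimal`) into a valid one (`counter_1 Decimal,`). A minimal, self-contained sketch of the corrected statement is shown below; the table name and the DECIMAL precision/scale are illustrative assumptions, not taken from the doc.

```
CREATE TABLE IF NOT EXISTS carbon_sample (
  BEGIN_TIME BIGINT,
  HOST STRING,
  Dime_1 STRING,
  counter_1 DECIMAL(18,2)
)
STORED AS carbondata
```
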
@@ -128,6 +128,9 @@
 
   **NOTE:**
   + BloomFilter can be created to enhance performance for queries with precise equal/in conditions. You can find more information about it in BloomFilter index [document](./index/bloomfilter-index-guide.md).
+  + Lucene index can be created on string columns that contain long text content, to enhance query performance. You can find more information about it in Lucene index [document](./index/lucene-index-guide.md).
+  + Secondary index can be created on a column based on its position in the main table (recommended for columns placed towards the right), and queries should have a filter on that column to improve filter query performance. You can find more information about it in secondary index [document](./index/secondary-index-guide.md).
+  + Materialized view can be created to improve query performance, provided the storage requirements and loading time are acceptable. You can find more information about it in materialized view [document](./mv-guide.md).
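
For illustration only (not part of the patch): in CarbonData 2.x these index types are created with `CREATE INDEX ... AS '<provider>'` statements. The index/table names, column choices, and bloom property values below are made-up assumptions; confirm the exact syntax and properties against the linked guides.

```
-- Hypothetical names and columns; syntax assumed from the CarbonData 2.x index guides.
CREATE INDEX bloom_on_host ON TABLE carbon_sample (HOST)
AS 'bloomfilter'
PROPERTIES ('BLOOM_SIZE'='640000', 'BLOOM_FPP'='0.00001');

CREATE INDEX si_on_dime1 ON TABLE carbon_sample (Dime_1)
AS 'carbondata';

CREATE INDEX lucene_on_dime1 ON TABLE carbon_sample (Dime_1)
AS 'lucene';
```
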
 
 
 ## Configuration for Optimizing Data Loading performance for Massive Data
@@ -141,12 +144,12 @@
 | Parameter | Default Value | Description/Tuning |
 |-----------|-------------|--------|
 |carbon.number.of.cores.while.loading|Default: 2. This value should be >= 2|Specifies the number of cores used for data processing during data loading in CarbonData. |
-|carbon.sort.size|Default: 100000. The value should be >= 100.|Threshold to write local file in sort step when loading data|
-|carbon.sort.file.write.buffer.size|Default:  16384.|CarbonData sorts and writes data to intermediate files to limit the memory usage. This configuration determines the buffer size to be used for reading and writing such files. |
+|carbon.sort.size|Default: 100000. The value should be >= 1000.|Threshold to write local file in sort step when loading data|
+|carbon.sort.file.write.buffer.size|Default:  16384. The value should be >= 10240 and <= 10485760.|CarbonData sorts and writes data to intermediate files to limit the memory usage. This configuration determines the buffer size to be used for reading and writing such files. |
 |carbon.merge.sort.reader.thread|Default: 3 |Specifies the number of cores used for temp file merging during data loading in CarbonData.|
 |carbon.merge.sort.prefetch|Default: true | You may want to set this value to false if you do not have enough memory|
 
-  For example, if there are 10 million records, and i have only 16 cores, 64GB memory, will be loaded to CarbonData table.
+  For example, suppose 10 million records are to be loaded into a CarbonData table on a machine with only 16 cores and 64 GB memory.
   Using the default configuration, loading always fails in the sort step. Modify carbon.properties as suggested below:
 
   ```
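  # Editorial sketch, not part of the patch: the doc's actual suggested values are
  # cut off by the diff context above. The values below are illustrative only,
  # chosen to stay within the ranges described in the parameter table:
  carbon.number.of.cores.while.loading=6
  carbon.merge.sort.reader.thread=1
  carbon.sort.size=5000
  carbon.sort.file.write.buffer.size=10240
  carbon.merge.sort.prefetch=false
  ```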
@@ -172,7 +175,6 @@
 | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use YARN local directories for multi-table load disk load balance | If this is set to true, CarbonData will use YARN local directories for multi-table load disk load balance, which will improve the data load performance. |
 | carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading | Specify the name of the compressor used to compress the intermediate sort temporary files during the sort procedure in data loading. | The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD', and empty. In particular, empty means that CarbonData will not compress the sort temp files. This parameter is useful if you encounter a disk bottleneck. |
 | carbon.load.skewedDataOptimization.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable size based block allocation strategy for data loading. | When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data -- It's useful if the size of your input data files varies widely, say 1MB to 1GB. |
-| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable node minumun input data size allocation strategy for data loading.| When loading, carbondata will use node minumun input data size allocation strategy for task distribution. It will make sure the nodes load the minimum amount of data -- It's useful if the size of your input data files very small, say 1MB to 256MB,Avoid generating a large number of small files. |
 
  Note: If your CarbonData instance is used only for queries, you may set the property 'spark.speculation=true', which is in the conf directory of Spark.
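
A carbon.properties fragment enabling the loading-related options from this table might look like the following sketch (the parameter names are the ones listed above; the values are illustrative, not recommendations):

```
carbon.use.local.dir=true
carbon.sort.temp.compressor=SNAPPY
carbon.load.skewedDataOptimization.enabled=true
```
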
 
diff --git a/docs/prestodb-guide.md b/docs/prestodb-guide.md
index b048d9d..7b2b2a9 100644
--- a/docs/prestodb-guide.md
+++ b/docs/prestodb-guide.md
@@ -28,8 +28,8 @@ This tutorial provides a quick introduction to using current integration/presto
 ### Installing Presto
 
 To know about which version of presto is supported by this version of carbon, visit 
-https://github.com/apache/carbondata/blob/master/integration/presto/pom.xml
-and look for ```<presto.version>```
+https://github.com/apache/carbondata/blob/master/pom.xml
+and look for ```<presto.version>``` inside the `prestodb` profile.
 
 _Example:_ 
   `<presto.version>0.217</presto.version>`
@@ -139,11 +139,14 @@ Then, `query.max-memory=<30GB * number of nodes>`.
 
 ##### Configuring Carbondata in Presto
 1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes.
+2. As the carbondata connector extends the hive connector, all the configurations (including S3) are the same as for the hive connector.
+Just replace the connector name in the hive configuration and copy the same file to carbondata.properties:
+`connector.name = carbondata`
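
For example, a minimal catalog/carbondata.properties sketch, assuming a Thrift Hive metastore (the metastore host and port are placeholders; any other hive connector properties you already use would carry over unchanged):

```
connector.name=carbondata
hive.metastore.uri=thrift://<metastore-host>:9083
```
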
 
 ### Add Plugins
 
 1. Create a directory named `carbondata` in plugin directory of presto.
-2. Copy `carbondata` jars to `plugin/carbondata` directory on all nodes.
+2. Copy all the jars from `../integration/presto/target/carbondata-presto-X.Y.Z-SNAPSHOT` to the `plugin/carbondata` directory on all nodes.
 
 ### Start Presto Server on all nodes
 
@@ -295,6 +298,6 @@ carbondata files.
 
 ### Supported features of presto carbon
 Presto carbon only supports reading the carbon table which is written by spark carbon or carbon SDK. 
-During reading, it supports the non-distributed datamaps like block datamap and bloom datamap.
+During reading, it supports the non-distributed indexes like block index and bloom index.
 It doesn't support Materialized View as it needs query plan to be changed and presto does not allow it.
-Also Presto carbon supports streaming segment read from streaming table created by spark.
+Also, Presto carbon supports streaming segment read from streaming table created by spark.
diff --git a/docs/prestosql-guide.md b/docs/prestosql-guide.md
index 8832b7a..11bb385 100644
--- a/docs/prestosql-guide.md
+++ b/docs/prestosql-guide.md
@@ -28,8 +28,8 @@ This tutorial provides a quick introduction to using current integration/presto
 ### Installing Presto
 
 To know about which version of presto is supported by this version of carbon, visit 
-https://github.com/apache/carbondata/blob/master/integration/presto/pom.xml
-and look for ```<presto.version>```
+https://github.com/apache/carbondata/blob/master/pom.xml
+and look for ```<presto.version>``` inside the `prestosql` profile.
 
 _Example:_ 
   `<presto.version>316</presto.version>`
@@ -139,11 +139,15 @@ Then, `query.max-memory=<30GB * number of nodes>`.
 
 ##### Configuring Carbondata in Presto
 1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes.
+2. As the carbondata connector extends the hive connector, all the configurations (including S3) are the same as for the hive connector.
+Just replace the connector name in the hive configuration and copy the same file to carbondata.properties:
+`connector.name = carbondata`
 
 ### Add Plugins
 
 1. Create a directory named `carbondata` in plugin directory of presto.
-2. Copy `carbondata` jars to `plugin/carbondata` directory on all nodes.
+2. Copy all the jars from `../integration/presto/target/carbondata-presto-X.Y.Z-SNAPSHOT` to the `plugin/carbondata` directory on all nodes.
+
 
 ### Start Presto Server on all nodes
 
@@ -294,6 +298,6 @@ carbondata files.
 
 ### Supported features of presto carbon
 Presto carbon only supports reading the carbon table which is written by spark carbon or carbon SDK. 
-During reading, it supports the non-distributed datamaps like block datamap and bloom datamap.
+During reading, it supports the non-distributed indexes like block index and bloom index.
 It doesn't support Materialized View as it needs query plan to be changed and presto does not allow it.
-Also Presto carbon supports streaming segment read from streaming table created by spark.
+Also, Presto carbon supports streaming segment read from streaming table created by spark.