You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@carbondata.apache.org by ra...@apache.org on 2018/03/03 12:43:49 UTC

[02/25] carbondata git commit: [HOTFIX] Fix documentation errors.Add examples for pre-aggregate usage

[HOTFIX] Fix documentation errors.Add examples for pre-aggregate usage

Fix documentation errors.Add examples for pre-aggregate usage

This closes #1945


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/ff2a2134
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/ff2a2134
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/ff2a2134

Branch: refs/heads/branch-1.3
Commit: ff2a2134401f54eb6fd57e818500b2401d486c50
Parents: b58de09
Author: Raghunandan S <ca...@gmail.com>
Authored: Wed Feb 7 17:27:51 2018 +0530
Committer: ravipesala <ra...@gmail.com>
Committed: Sat Mar 3 17:39:33 2018 +0530

----------------------------------------------------------------------
 docs/data-management-on-carbondata.md           | 72 ++++++++------------
 .../examples/PreAggregateTableExample.scala     | 24 +++++++
 2 files changed, 51 insertions(+), 45 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/ff2a2134/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index 18ad5b8..c846ffc 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -627,21 +627,21 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   
   ```
   LOAD DATA [LOCAL] INPATH 'folder_path' 
-    INTO TABLE [db_name.]table_name PARTITION (partition_spec) 
-    OPTIONS(property_name=property_value, ...)
-  NSERT INTO INTO TABLE [db_name.]table_name PARTITION (partition_spec) SELECT STATMENT 
+  INTO TABLE [db_name.]table_name PARTITION (partition_spec) 
+  OPTIONS(property_name=property_value, ...)
+    
+  INSERT INTO INTO TABLE [db_name.]table_name PARTITION (partition_spec) <SELECT STATMENT>
   ```
   
   Example:
   ```
-  LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.txt'
-    INTO TABLE locationTable
-    PARTITION (country = 'US', state = 'CA')
+  LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
+  INTO TABLE locationTable
+  PARTITION (country = 'US', state = 'CA')
     
   INSERT INTO TABLE locationTable
-    PARTITION (country = 'US', state = 'AL')
-    SELECT * FROM another_user au 
-    WHERE au.country = 'US' AND au.state = 'AL';
+  PARTITION (country = 'US', state = 'AL')
+  SELECT <columns list excluding partition columns> FROM another_user
   ```
 
 #### Load Data Using Dynamic Partition
@@ -650,12 +650,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
 
   Example:
   ```
-  LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.txt'
-    INTO TABLE locationTable
+  LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
+  INTO TABLE locationTable
           
   INSERT INTO TABLE locationTable
-    SELECT * FROM another_user au 
-    WHERE au.country = 'US' AND au.state = 'AL';
+  SELECT <columns list excluding partition columns> FROM another_user
   ```
 
 #### Show Partitions
@@ -679,19 +678,19 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   
   ```
    INSERT OVERWRITE TABLE table_name
-    PARTITION (column = 'partition_name')
-    select_statement
+   PARTITION (column = 'partition_name')
+   select_statement
   ```
   
   Example:
   ```
   INSERT OVERWRITE TABLE partitioned_user
-    PARTITION (country = 'US')
-    SELECT * FROM another_user au 
-    WHERE au.country = 'US';
+  PARTITION (country = 'US')
+  SELECT * FROM another_user au 
+  WHERE au.country = 'US';
   ```
 
-### CARBONDATA PARTITION(HASH,RANGE,LIST) -- Alpha feature, this partition not supports update and delete data.
+### CARBONDATA PARTITION(HASH,RANGE,LIST) -- Alpha feature, this partition feature does not support update and delete data.
 
   The partition supports three type:(Hash,Range,List), similar to other system's partition features, CarbonData's partition feature can be used to improve query performance by filtering on the partition column.
 
@@ -886,11 +885,11 @@ will be transformed by Query Planner to fetch data from pre-aggregate table **ag
 
 But queries of kind
 ```
-SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY country, sex
+SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country, sex
 
 SELECT sex, avg(quantity) from sales GROUP BY sex
 
-SELECT max(price), country from sales GROUP BY country
+SELECT country, max(price) from sales GROUP BY country
 ```
 
 will fetch the data from the main table **sales**
@@ -910,18 +909,13 @@ pre-aggregate tables satisfy the query condition, the plan is transformed automa
 pre-aggregate table to fetch the data
 
 ##### Compacting pre-aggregate tables
-Compaction is an optional operation for pre-aggregate table. If compaction is performed on main 
-table but not performed on pre-aggregate table, all queries still can benefit from pre-aggregate 
-table.To further improve performance on pre-aggregate table, compaction can be triggered on 
-pre-aggregate tables directly, it will merge the segments inside pre-aggregation table. 
-To do that, use ALTER TABLE COMPACT command on the pre-aggregate table just like the main table
+Compaction command (ALTER TABLE COMPACT) need to be run separately on each pre-aggregate table.
+Running Compaction command on main table will **not automatically** compact the pre-aggregate 
+tables.Compaction is an optional operation for pre-aggregate table. If compaction is performed on
+main table but not performed on pre-aggregate table, all queries still can benefit from 
+pre-aggregate tables.To further improve performance on pre-aggregate tables, compaction can be 
+triggered on pre-aggregate tables directly, it will merge the segments inside pre-aggregate table. 
 
-  NOTE:
-  * If the aggregate function used in the pre-aggregate table creation included distinct-count,
-     during compaction, the pre-aggregate table values are recomputed.This would a costly 
-     operation as compared to the compaction of pre-aggregate tables containing other aggregate 
-     functions alone
- 
 ##### Update/Delete Operations on pre-aggregate tables
 This functionality is not supported.
 
@@ -1005,16 +999,6 @@ roll-up for the queries on these hierarchies.
   ) AS
   SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
    avg(price) FROM sales GROUP BY order_time, country, sex
-    
-  CREATE DATAMAP agg_minute
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time’=’order_time’,
-  'minute_granualrity’=’1’,
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
   ```
   
   For Querying data and automatically roll-up to the desired aggregation level,Carbondata supports 
@@ -1028,9 +1012,7 @@ roll-up for the queries on these hierarchies.
   ```
   
   It is **not necessary** to create pre-aggregate tables for each granularity unless required for 
-  query
-  .Carbondata
-   can roll-up the data and fetch it
+  query.Carbondata can roll-up the data and fetch it.
    
   For Example: For main table **sales** , If pre-aggregate tables were created as  
   

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ff2a2134/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
----------------------------------------------------------------------
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
index fe3a93d..d27eefb 100644
--- a/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
+++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
@@ -135,6 +135,30 @@ object PreAggregateTableExample {
     println("time for query on table without pre-aggregate table:" + time_without_aggTable.toString)
     // scalastyle:on
 
+    // 3. if avg function is defined for a column, sum also can be used on that;but not other way
+    // round
+    val time_without_aggTable_sum = time {
+      spark.sql(
+        s"""
+           | SELECT id, sum(age)
+           | FROM personTableWithoutAgg group by id
+      """.stripMargin).count()
+    }
+
+    val time_with_aggTable_sum = time {
+      spark.sql(
+        s"""
+           | SELECT id, sum(age)
+           | FROM personTable group by id
+      """.stripMargin).count()
+    }
+    // scalastyle:off
+    println("time for query with function sum on table with pre-aggregate table:" +
+      time_with_aggTable_sum.toString)
+    println("time for query with function sum on table without pre-aggregate table:" +
+      time_without_aggTable_sum.toString)
+    // scalastyle:on
+
     spark.sql("DROP TABLE IF EXISTS mainTable")
     spark.sql("DROP TABLE IF EXISTS personTable")
     spark.sql("DROP TABLE IF EXISTS personTableWithoutAgg")