You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@carbondata.apache.org by ra...@apache.org on 2018/03/03 12:43:49 UTC
[02/25] carbondata git commit: [HOTFIX] Fix documentation errors.Add
examples for pre-aggregate usage
[HOTFIX] Fix documentation errors.Add examples for pre-aggregate usage
Fix documentation errors.Add examples for pre-aggregate usage
This closes #1945
Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/ff2a2134
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/ff2a2134
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/ff2a2134
Branch: refs/heads/branch-1.3
Commit: ff2a2134401f54eb6fd57e818500b2401d486c50
Parents: b58de09
Author: Raghunandan S <ca...@gmail.com>
Authored: Wed Feb 7 17:27:51 2018 +0530
Committer: ravipesala <ra...@gmail.com>
Committed: Sat Mar 3 17:39:33 2018 +0530
----------------------------------------------------------------------
docs/data-management-on-carbondata.md | 72 ++++++++------------
.../examples/PreAggregateTableExample.scala | 24 +++++++
2 files changed, 51 insertions(+), 45 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/carbondata/blob/ff2a2134/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index 18ad5b8..c846ffc 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -627,21 +627,21 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
```
LOAD DATA [LOCAL] INPATH 'folder_path'
- INTO TABLE [db_name.]table_name PARTITION (partition_spec)
- OPTIONS(property_name=property_value, ...)
- NSERT INTO INTO TABLE [db_name.]table_name PARTITION (partition_spec) SELECT STATMENT
+ INTO TABLE [db_name.]table_name PARTITION (partition_spec)
+ OPTIONS(property_name=property_value, ...)
+
+ INSERT INTO INTO TABLE [db_name.]table_name PARTITION (partition_spec) <SELECT STATMENT>
```
Example:
```
- LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.txt'
- INTO TABLE locationTable
- PARTITION (country = 'US', state = 'CA')
+ LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
+ INTO TABLE locationTable
+ PARTITION (country = 'US', state = 'CA')
INSERT INTO TABLE locationTable
- PARTITION (country = 'US', state = 'AL')
- SELECT * FROM another_user au
- WHERE au.country = 'US' AND au.state = 'AL';
+ PARTITION (country = 'US', state = 'AL')
+ SELECT <columns list excluding partition columns> FROM another_user
```
#### Load Data Using Dynamic Partition
@@ -650,12 +650,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
Example:
```
- LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.txt'
- INTO TABLE locationTable
+ LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
+ INTO TABLE locationTable
INSERT INTO TABLE locationTable
- SELECT * FROM another_user au
- WHERE au.country = 'US' AND au.state = 'AL';
+ SELECT <columns list excluding partition columns> FROM another_user
```
#### Show Partitions
@@ -679,19 +678,19 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
```
INSERT OVERWRITE TABLE table_name
- PARTITION (column = 'partition_name')
- select_statement
+ PARTITION (column = 'partition_name')
+ select_statement
```
Example:
```
INSERT OVERWRITE TABLE partitioned_user
- PARTITION (country = 'US')
- SELECT * FROM another_user au
- WHERE au.country = 'US';
+ PARTITION (country = 'US')
+ SELECT * FROM another_user au
+ WHERE au.country = 'US';
```
-### CARBONDATA PARTITION(HASH,RANGE,LIST) -- Alpha feature, this partition not supports update and delete data.
+### CARBONDATA PARTITION(HASH,RANGE,LIST) -- Alpha feature, this partition feature does not support update and delete data.
The partition supports three type:(Hash,Range,List), similar to other system's partition features, CarbonData's partition feature can be used to improve query performance by filtering on the partition column.
@@ -886,11 +885,11 @@ will be transformed by Query Planner to fetch data from pre-aggregate table **ag
But queries of kind
```
-SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY country, sex
+SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country, sex
SELECT sex, avg(quantity) from sales GROUP BY sex
-SELECT max(price), country from sales GROUP BY country
+SELECT country, max(price) from sales GROUP BY country
```
will fetch the data from the main table **sales**
@@ -910,18 +909,13 @@ pre-aggregate tables satisfy the query condition, the plan is transformed automa
pre-aggregate table to fetch the data
##### Compacting pre-aggregate tables
-Compaction is an optional operation for pre-aggregate table. If compaction is performed on main
-table but not performed on pre-aggregate table, all queries still can benefit from pre-aggregate
-table.To further improve performance on pre-aggregate table, compaction can be triggered on
-pre-aggregate tables directly, it will merge the segments inside pre-aggregation table.
-To do that, use ALTER TABLE COMPACT command on the pre-aggregate table just like the main table
+Compaction command (ALTER TABLE COMPACT) need to be run separately on each pre-aggregate table.
+Running Compaction command on main table will **not automatically** compact the pre-aggregate
+tables.Compaction is an optional operation for pre-aggregate table. If compaction is performed on
+main table but not performed on pre-aggregate table, all queries still can benefit from
+pre-aggregate tables.To further improve performance on pre-aggregate tables, compaction can be
+triggered on pre-aggregate tables directly, it will merge the segments inside pre-aggregate table.
- NOTE:
- * If the aggregate function used in the pre-aggregate table creation included distinct-count,
- during compaction, the pre-aggregate table values are recomputed.This would a costly
- operation as compared to the compaction of pre-aggregate tables containing other aggregate
- functions alone
-
##### Update/Delete Operations on pre-aggregate tables
This functionality is not supported.
@@ -1005,16 +999,6 @@ roll-up for the queries on these hierarchies.
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
-
- CREATE DATAMAP agg_minute
- ON TABLE sales
- USING "timeseries"
- DMPROPERTIES (
- 'event_time’=’order_time’,
- 'minute_granualrity’=’1’,
- ) AS
- SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
- avg(price) FROM sales GROUP BY order_time, country, sex
```
For Querying data and automatically roll-up to the desired aggregation level,Carbondata supports
@@ -1028,9 +1012,7 @@ roll-up for the queries on these hierarchies.
```
It is **not necessary** to create pre-aggregate tables for each granularity unless required for
- query
- .Carbondata
- can roll-up the data and fetch it
+ query.Carbondata can roll-up the data and fetch it.
For Example: For main table **sales** , If pre-aggregate tables were created as
http://git-wip-us.apache.org/repos/asf/carbondata/blob/ff2a2134/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
----------------------------------------------------------------------
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
index fe3a93d..d27eefb 100644
--- a/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
+++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
@@ -135,6 +135,30 @@ object PreAggregateTableExample {
println("time for query on table without pre-aggregate table:" + time_without_aggTable.toString)
// scalastyle:on
+ // 3. if avg function is defined for a column, sum also can be used on that;but not other way
+ // round
+ val time_without_aggTable_sum = time {
+ spark.sql(
+ s"""
+ | SELECT id, sum(age)
+ | FROM personTableWithoutAgg group by id
+ """.stripMargin).count()
+ }
+
+ val time_with_aggTable_sum = time {
+ spark.sql(
+ s"""
+ | SELECT id, sum(age)
+ | FROM personTable group by id
+ """.stripMargin).count()
+ }
+ // scalastyle:off
+ println("time for query with function sum on table with pre-aggregate table:" +
+ time_with_aggTable_sum.toString)
+ println("time for query with function sum on table without pre-aggregate table:" +
+ time_without_aggTable_sum.toString)
+ // scalastyle:on
+
spark.sql("DROP TABLE IF EXISTS mainTable")
spark.sql("DROP TABLE IF EXISTS personTable")
spark.sql("DROP TABLE IF EXISTS personTableWithoutAgg")