Posted to commits@carbondata.apache.org by ra...@apache.org on 2017/02/17 14:01:25 UTC

[1/9] incubator-carbondata git commit: fix docs issues

Repository: incubator-carbondata
Updated Branches:
  refs/heads/branch-1.0 8a5e44e98 -> 3236c764c


fix docs issues

fix docs issues

fix comments


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/657eccd9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/657eccd9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/657eccd9

Branch: refs/heads/branch-1.0
Commit: 657eccd9ea4c0562907d7db1c232930c0167c4e1
Parents: 8a5e44e
Author: chenliang613 <ch...@huawei.com>
Authored: Sun Jan 22 16:10:22 2017 +0800
Committer: ravipesala <ra...@gmail.com>
Committed: Fri Feb 17 19:23:45 2017 +0530

----------------------------------------------------------------------
 docs/configuration-parameters.md | 12 ++++++------
 docs/data-management.md          |  2 +-
 docs/quick-start-guide.md        | 20 +++++++++-----------
 3 files changed, 16 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/657eccd9/docs/configuration-parameters.md
----------------------------------------------------------------------
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index bc6919a..774734a 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -34,10 +34,10 @@ This section provides the details of all the configurations required for the Car
 | Property | Default Value | Description |
 |----------------------------|-------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | carbon.storelocation | /user/hive/warehouse/carbon.store | Location where CarbonData will create the store, and write the data in its own format. NOTE: Store location should be in HDFS. |
-| carbon.ddl.base.hdfs.url | hdfs://hacluster/opt/data | This property is used to configure the HDFS relative path from the HDFS base path, configured in fs.defaultFS. The path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, then user need not pass the complete path while dataload. For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path "hdfs://10.18.101.155:54310" will come from property fs.defaultFS and user can configure the /data/cnbc/ as carbon.ddl.base.hdfs.url. Now while dataload user can specify the csv path as /2016/xyz.csv. |
+| carbon.ddl.base.hdfs.url | hdfs://hacluster/opt/data | This property is used to configure the HDFS relative path; the path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, then the user need not pass the complete path while loading data. For example: If the absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path "hdfs://10.18.101.155:54310" will come from the property fs.defaultFS and the user can configure /data/cnbc/ as carbon.ddl.base.hdfs.url. Now while loading data the user can specify the csv path as /2016/xyz.csv. |
 | carbon.badRecords.location | /opt/Carbon/Spark/badrecords | Path where the bad records are stored. |
-| carbon.kettle.home | $SPARK_HOME/carbonlib/carbonplugins | Path used by CarbonData internally to create graph for loading the data. |
-| carbon.data.file.version | 2 | If this parameter value is set to 1, then CarbonData will support the data load which is in old format. If the value is set to 2, then CarbonData will support the data load of new format only. NOTE: The file format created before DataSight Spark V100R002C30 is considered as old format. |                    
+| carbon.kettle.home | $SPARK_HOME/carbonlib/carbonplugins | Configuration for loading the data with kettle. |
+| carbon.data.file.version | 2 | If this parameter value is set to 1, then CarbonData will support the data load which is in the old format (0.x versions). If the value is set to 2 (1.x versions onwards), then CarbonData will support the data load of the new format only.|
 
 ##  Performance Configuration
 This section provides the details of all the configurations required for CarbonData Performance Optimization.
@@ -132,7 +132,7 @@ This section provides the details of all the configurations required for CarbonD
 | Parameter | Default Value | Description |
 |---------------------------------------|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | high.cardinality.identify.enable | true | If the parameter is true, the high cardinality columns among the dictionary encoding columns are automatically recognized and these columns will not be used for global dictionary encoding. If the parameter is false, all dictionary encoding columns are used for dictionary encoding. A high cardinality column must meet the following requirements: value of cardinality > configured value of high.cardinality.threshold (equally, the value of cardinality is higher than the threshold); value of cardinality / row number x 100 > configured value of high.cardinality.row.count.percentage (equally, the ratio of the cardinality value to the data row number is higher than the configured percentage). |
-| high.cardinality.threshold | 1000000 | Threshold to identify whether high cardinality column.Configuration value formula: Value of cardinality > configured value of high.cardinality. The minimum value is 10000. |
+| high.cardinality.threshold | 1000000 | It is a threshold to identify high cardinality of the columns. If the value of a column's cardinality > the configured value, then the column is excluded from dictionary encoding. |
 | high.cardinality.row.count.percentage | 80 | Percentage to identify whether column cardinality is more than the configured percent of total row count. Configuration value formula: value of cardinality / row number x 100 > configured value of high.cardinality.row.count.percentage. The value of the parameter must be larger than 0. |
 | carbon.cutOffTimestamp | 1970-01-01 05:30:00 | Sets the start date for calculating the timestamp. Java counts the number of milliseconds from the start of "1970-01-01 00:00:00". This property is used to customize the start position, for example "2000-01-01 00:00:00". The date must be in the form "carbon.timestamp.format". NOTE: CarbonData supports storing data for up to 68 years from the cut-off time defined. For example, if the cut-off time is 1970-01-01 05:30:00, then the data can be stored up to 2038-01-01 05:30:00. |
 | carbon.timegranularity | SECOND | The property used to set the data granularity level DAY, HOUR, MINUTE, or SECOND. |
@@ -142,8 +142,8 @@ This section provides the details of all the configurations required for CarbonD
  
 | Parameter | Default Value | Description |
 |----------------------------------------|--------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| spark.driver.memory | 1g | Amount of memory to use for the driver process, i.e. where SparkContext is initialized. NOTE: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file. |
-| spark.executor.memory | 1g | Amount of memory to use per executor process. |
+| spark.driver.memory | 1g | Amount of memory to be used by the driver process. |
+| spark.executor.memory | 1g | Amount of memory to be used per executor process. |
 | spark.sql.bigdata.register.analyseRule | org.apache.spark.sql.hive.acl.CarbonAccessControlRules | CarbonAccessControlRules need to be set for enabling Access Control. |
    
  
\ No newline at end of file
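
For reference, the relative-path behaviour described in the carbon.ddl.base.hdfs.url row above resolves as in the following sketch. The host, port, and paths are the illustrative values from the table itself, and the table name is borrowed from the quick-start guide; none of them are defaults.

```
// Assuming, purely for illustration:
//   fs.defaultFS             = hdfs://10.18.101.155:54310   (core-site.xml)
//   carbon.ddl.base.hdfs.url = /data/cnbc                    (carbon.properties)
// a relative path given at load time resolves against both values:
scala>carbon.sql("LOAD DATA INPATH '/2016/xyz.csv' INTO TABLE test_table")
// the CSV is then read from hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv
```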

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/657eccd9/docs/data-management.md
----------------------------------------------------------------------
diff --git a/docs/data-management.md b/docs/data-management.md
index 70f4d28..2663aff 100644
--- a/docs/data-management.md
+++ b/docs/data-management.md
@@ -73,7 +73,7 @@ This tutorial is going to introduce you to the conceptual details of data manage
    
    * Delete by Segment ID
       
-      After you get the segment ID of the segment that you want to delete, execute the [DELETE](dml-operation-on-carbondata.md ) command for the selected segment.
+      After you get the segment ID of the segment that you want to delete, execute the delete command for the selected segment.
       The status of deleted segment is updated to Marked for delete / Marked for Update.
       
 | SegmentSequenceId | Status            | Load Start Time      | Load End Time        |

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/657eccd9/docs/quick-start-guide.md
----------------------------------------------------------------------
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index ceeaac0..5a2d6e2 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -70,24 +70,22 @@ val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(
 ##### Creating a Table
 
 ```
-scala>carbon.sql("create table if not exists test_table
-                (id string, name string, city string, age Int)
-                STORED BY 'carbondata'")
+scala>carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string, name string, city string, age Int) STORED BY 'carbondata'")
 ```
 
 ##### Loading Data to a Table
 
 ```
-scala>carbon.sql("load data inpath 'sample.csv file's path' into table test_table")
+scala>carbon.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE test_table")
 ```
 NOTE: Please provide the real file path of sample.csv for the above script.
 
 ###### Query Data from a Table
 
 ```
-scala>spark.sql("select * from test_table").show()
+scala>carbon.sql("SELECT * FROM test_table").show()
 
-scala>spark.sql("select city, avg(age), sum(age) from test_table group by city").show()
+scala>carbon.sql("SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show()
 ```
 
 ## Interactive Analysis with Spark Shell
@@ -122,24 +120,24 @@ NOTE: By default store location is pointed to "../carbon.store", user can provid
 ##### Creating a Table
 
 ```
-scala>cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
+scala>cc.sql("CREATE TABLE IF NOT EXISTS test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
 ```
 To see the table created :
 
 ```
-scala>cc.sql("show tables").show()
+scala>cc.sql("SHOW TABLES").show()
 ```
 
 ##### Loading Data to a Table
 
 ```
-scala>cc.sql("load data inpath 'sample.csv file's path' into table test_table")
+scala>cc.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE test_table")
 ```
 NOTE: Please provide the real file path of sample.csv for the above script.
 
 ##### Query Data from a Table
 
 ```
-scala>cc.sql("select * from test_table").show()
-scala>cc.sql("select city, avg(age), sum(age) from test_table group by city").show()
+scala>cc.sql("SELECT * FROM test_table").show()
+scala>cc.sql("SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show()
 ```


[6/9] incubator-carbondata git commit: Update build command after optimizing thrift compile issues

Posted by ra...@apache.org.
Update build command after optimizing thrift compile issues

fix comment


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/78881d9d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/78881d9d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/78881d9d

Branch: refs/heads/branch-1.0
Commit: 78881d9dabeb1f08b8300fb108afbbb67d2639db
Parents: d399cd3
Author: chenliang613 <ch...@huawei.com>
Authored: Sat Feb 11 07:59:04 2017 -0800
Committer: ravipesala <ra...@gmail.com>
Committed: Fri Feb 17 19:29:02 2017 +0530

----------------------------------------------------------------------
 build/README.md | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/78881d9d/build/README.md
----------------------------------------------------------------------
diff --git a/build/README.md b/build/README.md
index 0115f5d..5fa6814 100644
--- a/build/README.md
+++ b/build/README.md
@@ -26,16 +26,7 @@
 * [Oracle Java 7 or 8](http://www.oracle.com/technetwork/java/javase/downloads/index.html)
 * [Apache Thrift 0.9.3](http://archive.apache.org/dist/thrift/0.9.3/)
 
-## Build release version
-Note:Need install Apache Thrift 0.9.3
-```
-mvn clean -DskipTests -Pbuild-with-format -Pspark-1.6 install
-```
-
-## Build dev version(snapshot version,clone from github)
-Note:Already uploaded format.jar to snapshot repo for facilitating dev users,
-so the compilation command works without "-Pbuild-with-format"
-
+## Build command
 Build without test; by default CarbonData takes Spark 1.6.2 to build the project
 ```
 mvn -DskipTests clean package
@@ -57,3 +48,9 @@ Build with test
 ```
 mvn clean package
 ```
+
+## For contributors: to build the format code after any changes, use the following command
+Note: Apache Thrift 0.9.3 needs to be installed
+```
+mvn clean -DskipTests -Pbuild-with-format -Pspark-1.6 package
+```
\ No newline at end of file


[2/9] incubator-carbondata git commit: added license header for FloatDataTypeTestCase.scala and DateTypeTest.scala

Posted by ra...@apache.org.
added license header for FloatDataTypeTestCase.scala and DateTypeTest.scala


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/c1278c04
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/c1278c04
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/c1278c04

Branch: refs/heads/branch-1.0
Commit: c1278c04973b310d2ad5a8da8342e80ecb3b8f61
Parents: 657eccd
Author: PallaviSingh1992 <pa...@yahoo.co.in>
Authored: Fri Jan 27 12:41:30 2017 +0530
Committer: ravipesala <ra...@gmail.com>
Committed: Fri Feb 17 19:24:21 2017 +0530

----------------------------------------------------------------------
 .../primitiveTypes/FloatDataTypeTestCase.scala      | 16 ++++++++++++++++
 .../spark/testsuite/datetype/DateTypeTest.scala     | 16 ++++++++++++++++
 2 files changed, 32 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/c1278c04/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/FloatDataTypeTestCase.scala
----------------------------------------------------------------------
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/FloatDataTypeTestCase.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/FloatDataTypeTestCase.scala
index 117fa9a..8eaf12c 100644
--- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/FloatDataTypeTestCase.scala
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/FloatDataTypeTestCase.scala
@@ -1,3 +1,19 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
 package org.apache.carbondata.integration.spark.testsuite.primitiveTypes
 
 import org.apache.spark.sql.Row

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/c1278c04/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datetype/DateTypeTest.scala
----------------------------------------------------------------------
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datetype/DateTypeTest.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datetype/DateTypeTest.scala
index 53b138e..37f800d 100644
--- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datetype/DateTypeTest.scala
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datetype/DateTypeTest.scala
@@ -1,3 +1,19 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
 package org.apache.carbondata.spark.testsuite.datetype
 
 import org.apache.spark.sql.common.util.QueryTest


[3/9] incubator-carbondata git commit: [CARBONDATA-686] Extend period coverage in NOTICE

Posted by ra...@apache.org.
[CARBONDATA-686] Extend period coverage in NOTICE


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/b79ec78f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/b79ec78f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/b79ec78f

Branch: refs/heads/branch-1.0
Commit: b79ec78f473fadce7978872cd5ac0a671bbea4e6
Parents: c1278c0
Author: Jean-Baptiste Onofré <jb...@apache.org>
Authored: Mon Jan 30 07:48:17 2017 +0100
Committer: ravipesala <ra...@gmail.com>
Committed: Fri Feb 17 19:27:53 2017 +0530

----------------------------------------------------------------------
 NOTICE | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/b79ec78f/NOTICE
----------------------------------------------------------------------
diff --git a/NOTICE b/NOTICE
index da83581..c2a5289 100644
--- a/NOTICE
+++ b/NOTICE
@@ -1,8 +1,8 @@
 Apache CarbonData (incubating)
-Copyright 2016 The Apache Software Foundation
+Copyright 2016-2017 The Apache Software Foundation
 
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
 
 Based on source code originally developed by
-Huawei (http://www.huawei.com/).
\ No newline at end of file
+Huawei (http://www.huawei.com/).


[5/9] incubator-carbondata git commit: Added carbondata repository to pom file .

Posted by ra...@apache.org.
Added carbondata repository to pom file .


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/d399cd37
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/d399cd37
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/d399cd37

Branch: refs/heads/branch-1.0
Commit: d399cd378f43fad6e2529a8d2d3c00a13246bb24
Parents: ec4ec12
Author: ravipesala <ra...@gmail.com>
Authored: Sat Feb 11 16:31:28 2017 +0530
Committer: ravipesala <ra...@gmail.com>
Committed: Fri Feb 17 19:28:38 2017 +0530

----------------------------------------------------------------------
 pom.xml | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/d399cd37/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index af5cc0b..c1319f0 100644
--- a/pom.xml
+++ b/pom.xml
@@ -127,6 +127,10 @@
       <id>pentaho-releases</id>
       <url>http://repository.pentaho.org/artifactory/repo/</url>
     </repository>
+    <repository>
+      <id>carbondata-releases</id>
+      <url>http://136.243.101.176:9091/repository/carbondata/</url>
+    </repository>
   </repositories>
 
   <dependencyManagement>


[7/9] incubator-carbondata git commit: CARBONDATA-697 Jira single_pass is not used while doing data load

Posted by ra...@apache.org.
CARBONDATA-697 Jira single_pass is not used while doing data load

Written dictionary values in file on shutdown of dictionary server.


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/e0016e28
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/e0016e28
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/e0016e28

Branch: refs/heads/branch-1.0
Commit: e0016e2850ea4b4d9162d66be8a191e9e1bd3949
Parents: 78881d9
Author: BJangir <ba...@gmail.com>
Authored: Mon Feb 6 22:13:55 2017 +0530
Committer: ravipesala <ra...@gmail.com>
Committed: Fri Feb 17 19:29:36 2017 +0530

----------------------------------------------------------------------
 .../dataload/TestLoadDataGeneral.scala          |  18 +++
 .../spark/rdd/CarbonDataRDDFactory.scala        |  14 ++
 .../execution/command/carbonTableSchema.scala   | 158 ++++++++++++++-----
 3 files changed, 152 insertions(+), 38 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/e0016e28/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
----------------------------------------------------------------------
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
index 5d9f750..aa18b8f 100644
--- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
@@ -130,6 +130,24 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterAll {
     sql("DROP TABLE load_test")
   }
 
+  test("test data loading into table with Single Pass") {
+    sql("DROP TABLE IF EXISTS load_test_singlepass")
+    sql(""" CREATE TABLE load_test_singlepass(id int, name string, city string, age int)
+        STORED BY 'org.apache.carbondata.format' """)
+    val testData = s"$resourcesPath/sample.csv"
+    try {
+      sql(s"LOAD DATA LOCAL INPATH '$testData' into table load_test_singlepass options ('USE_KETTLE'='FALSE','SINGLE_PASS'='TRUE')")
+    } catch {
+      case ex: Exception =>
+        assert(false)
+    }
+    checkAnswer(
+      sql("SELECT id,name FROM load_test_singlepass where name='eason'"),
+      Seq(Row(2,"eason"))
+    )
+    sql("DROP TABLE load_test_singlepass")
+  }
+
   override def afterAll {
     sql("DROP TABLE if exists loadtest")
     sql("drop table if exists invalidMeasures")

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/e0016e28/integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
----------------------------------------------------------------------
diff --git a/integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala b/integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
index 9024d57..c7f22cc 100644
--- a/integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
+++ b/integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
@@ -40,6 +40,7 @@ import org.apache.spark.util.SparkUtil
 import org.apache.carbondata.common.logging.LogServiceFactory
 import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.carbondata.core.datastore.block.{Distributable, TableBlockInfo}
+import org.apache.carbondata.core.dictionary.server.DictionaryServer
 import org.apache.carbondata.core.locks.{CarbonLockFactory, ICarbonLock, LockUsage}
 import org.apache.carbondata.core.metadata.{CarbonTableIdentifier, ColumnarFormatVersion}
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
@@ -364,6 +365,7 @@ object CarbonDataRDDFactory {
       columnar: Boolean,
       partitionStatus: String = CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS,
       useKettle: Boolean,
+      result: Future[DictionaryServer],
       dataFrame: Option[DataFrame] = None,
       updateModel: Option[UpdateTableModel] = None): Unit = {
     val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
@@ -962,6 +964,18 @@ object CarbonDataRDDFactory {
           // TODO : Handle it
           LOGGER.info("********Database updated**********")
         }
+
+        // write dictionary file and shutdown dictionary server
+        if (carbonLoadModel.getUseOnePass) {
+          try {
+            result.get().shutdown()
+          } catch {
+            case ex: Exception =>
+              LOGGER.error("Error while closing dictionary server and writing dictionary file for " +
+                s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
+              throw new Exception("Dataload failed due to error while writing dictionary file!")
+          }
+        }
         LOGGER.audit("Data load is successful for " +
             s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
         try {

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/e0016e28/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
----------------------------------------------------------------------
diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
index 6fba830..d1f1771 100644
--- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
+++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
@@ -18,6 +18,10 @@
 package org.apache.spark.sql.execution.command
 
 import java.io.File
+import java.util.concurrent.Callable
+import java.util.concurrent.Executors
+import java.util.concurrent.ExecutorService
+import java.util.concurrent.Future
 
 import scala.collection.JavaConverters._
 import scala.language.implicitConversions
@@ -37,6 +41,7 @@ import org.apache.carbondata.api.CarbonStore
 import org.apache.carbondata.common.logging.LogServiceFactory
 import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.dictionary.server.DictionaryServer
 import org.apache.carbondata.core.locks.{CarbonLockFactory, LockUsage}
 import org.apache.carbondata.core.metadata.{CarbonMetadata, CarbonTableIdentifier}
 import org.apache.carbondata.core.metadata.encoder.Encoding
@@ -440,6 +445,23 @@ case class LoadTable(
         .setBadRecordsAction(
           TableOptionConstant.BAD_RECORDS_ACTION.getName + "," + badRecordsLoggerRedirect)
 
+      val useOnePass = options.getOrElse("single_pass", "false").trim.toLowerCase match {
+        case "true" =>
+          if (!useKettle && StringUtils.isEmpty(allDictionaryPath)) {
+            true
+          } else {
+            LOGGER.error("Can't use single_pass, because SINGLE_PASS and ALL_DICTIONARY_PATH " +
+              "can not be used together, and USE_KETTLE must be set as false")
+            false
+          }
+        case "false" =>
+          false
+        case illegal =>
+          LOGGER.error(s"Can't use single_pass, because illegal syntax found: [" + illegal + "] " +
+            "Please set it as 'true' or 'false'")
+          false
+      }
+      carbonLoadModel.setUseOnePass(useOnePass)
       if (delimiter.equalsIgnoreCase(complex_delimiter_level_1) ||
           complex_delimiter_level_1.equalsIgnoreCase(complex_delimiter_level_2) ||
           delimiter.equalsIgnoreCase(complex_delimiter_level_2)) {
@@ -455,6 +477,9 @@ case class LoadTable(
       carbonLoadModel.setAllDictPath(allDictionaryPath)
 
       val partitionStatus = CarbonCommonConstants.STORE_LOADSTATUS_SUCCESS
+      var result: Future[DictionaryServer] = null
+      var executorService: ExecutorService = null
+
       try {
         // First system has to partition the data first and then call the load data
         LOGGER.info(s"Initiating Direct Load for the Table : ($dbName.$tableName)")
@@ -466,54 +491,105 @@ case class LoadTable(
         carbonLoadModel.setCsvHeaderColumns(CommonUtil.getCsvHeaderColumns(carbonLoadModel))
         GlobalDictionaryUtil.updateTableMetadataFunc = LoadTable.updateTableMetadata
 
-        val (dictionaryDataFrame, loadDataFrame) = if (updateModel.isDefined) {
-          val fields = dataFrame.get.schema.fields
-          import org.apache.spark.sql.functions.udf
-          // extracting only segment from tupleId
-          val getSegIdUDF = udf((tupleId: String) =>
-            CarbonUpdateUtil.getRequiredFieldFromTID(tupleId, TupleIdEnum.SEGMENT_ID))
-          // getting all fields except tupleId field as it is not required in the value
-          var otherFields = fields.toSeq
-            .filter(field => !field.name
-              .equalsIgnoreCase(CarbonCommonConstants.CARBON_IMPLICIT_COLUMN_TUPLEID))
-            .map(field => {
-              if (field.name.endsWith(CarbonCommonConstants.UPDATED_COL_EXTENSION) && false) {
-                new Column(field.name
-                  .substring(0,
-                    field.name.lastIndexOf(CarbonCommonConstants.UPDATED_COL_EXTENSION)))
-              } else {
-
-                new Column(field.name)
-              }
-            })
-
-          // extract tupleId field which will be used as a key
-          val segIdColumn = getSegIdUDF(new Column(UnresolvedAttribute
-            .quotedString(CarbonCommonConstants.CARBON_IMPLICIT_COLUMN_TUPLEID))).as("segId")
-          // use dataFrameWithoutTupleId as dictionaryDataFrame
-          val dataFrameWithoutTupleId = dataFrame.get.select(otherFields: _*)
-          otherFields = otherFields :+ segIdColumn
-          // use dataFrameWithTupleId as loadDataFrame
-          val dataFrameWithTupleId = dataFrame.get.select(otherFields: _*)
-          (Some(dataFrameWithoutTupleId), Some(dataFrameWithTupleId))
-        } else {
-          (dataFrame, dataFrame)
-        }
-        GlobalDictionaryUtil
-          .generateGlobalDictionary(
-            sparkSession.sqlContext,
+        if (carbonLoadModel.getUseOnePass) {
+          val colDictFilePath = carbonLoadModel.getColDictFilePath
+          if (colDictFilePath != null) {
+            val storePath = relation.tableMeta.storePath
+            val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+            val carbonTableIdentifier = carbonTable.getAbsoluteTableIdentifier
+              .getCarbonTableIdentifier
+            val carbonTablePath = CarbonStorePath
+              .getCarbonTablePath(storePath, carbonTableIdentifier)
+            val dictFolderPath = carbonTablePath.getMetadataDirectoryPath
+            val dimensions = carbonTable.getDimensionByTableName(
+              carbonTable.getFactTableName).asScala.toArray
+            carbonLoadModel.initPredefDictMap()
+            // generate predefined dictionary
+            GlobalDictionaryUtil
+              .generatePredefinedColDictionary(colDictFilePath, carbonTableIdentifier,
+                dimensions, carbonLoadModel, sparkSession.sqlContext, storePath, dictFolderPath)
+          }
+          // dictionaryServerClient dictionary generator
+          val dictionaryServerPort = CarbonProperties.getInstance()
+            .getProperty(CarbonCommonConstants.DICTIONARY_SERVER_PORT,
+              CarbonCommonConstants.DICTIONARY_SERVER_PORT_DEFAULT)
+          carbonLoadModel.setDictionaryServerPort(Integer.parseInt(dictionaryServerPort))
+          val sparkDriverHost = sparkSession.sqlContext.sparkContext.
+            getConf.get("spark.driver.host")
+          carbonLoadModel.setDictionaryServerHost(sparkDriverHost)
+          // start dictionary server when use one pass load.
+          executorService = Executors.newFixedThreadPool(1)
+          result = executorService.submit(new Callable[DictionaryServer]() {
+            @throws[Exception]
+            def call: DictionaryServer = {
+              Thread.currentThread().setName("Dictionary server")
+              val server: DictionaryServer = new DictionaryServer
+              server.startServer(dictionaryServerPort.toInt)
+              server
+            }
+          })
+          CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext,
             carbonLoadModel,
             relation.tableMeta.storePath,
-            dictionaryDataFrame)
-        CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext,
+            kettleHomePath,
+            columnar,
+            partitionStatus,
+            useKettle,
+            result,
+            dataFrame,
+            updateModel)
+        }
+        else {
+          val (dictionaryDataFrame, loadDataFrame) = if (updateModel.isDefined) {
+            val fields = dataFrame.get.schema.fields
+            import org.apache.spark.sql.functions.udf
+            // extracting only segment from tupleId
+            val getSegIdUDF = udf((tupleId: String) =>
+              CarbonUpdateUtil.getRequiredFieldFromTID(tupleId, TupleIdEnum.SEGMENT_ID))
+            // getting all fields except tupleId field as it is not required in the value
+            var otherFields = fields.toSeq
+              .filter(field => !field.name
+                .equalsIgnoreCase(CarbonCommonConstants.CARBON_IMPLICIT_COLUMN_TUPLEID))
+              .map(field => {
+                if (field.name.endsWith(CarbonCommonConstants.UPDATED_COL_EXTENSION) && false) {
+                  new Column(field.name
+                    .substring(0,
+                      field.name.lastIndexOf(CarbonCommonConstants.UPDATED_COL_EXTENSION)))
+                } else {
+
+                  new Column(field.name)
+                }
+              })
+
+            // extract tupleId field which will be used as a key
+            val segIdColumn = getSegIdUDF(new Column(UnresolvedAttribute
+              .quotedString(CarbonCommonConstants.CARBON_IMPLICIT_COLUMN_TUPLEID))).as("segId")
+            // use dataFrameWithoutTupleId as dictionaryDataFrame
+            val dataFrameWithoutTupleId = dataFrame.get.select(otherFields: _*)
+            otherFields = otherFields :+ segIdColumn
+            // use dataFrameWithTupleId as loadDataFrame
+            val dataFrameWithTupleId = dataFrame.get.select(otherFields: _*)
+            (Some(dataFrameWithoutTupleId), Some(dataFrameWithTupleId))
+          } else {
+            (dataFrame, dataFrame)
+          }
+          GlobalDictionaryUtil
+            .generateGlobalDictionary(
+              sparkSession.sqlContext,
+              carbonLoadModel,
+              relation.tableMeta.storePath,
+              dictionaryDataFrame)
+          CarbonDataRDDFactory.loadCarbonData(sparkSession.sqlContext,
             carbonLoadModel,
             relation.tableMeta.storePath,
             kettleHomePath,
             columnar,
             partitionStatus,
             useKettle,
+            result,
             loadDataFrame,
             updateModel)
+        }
       } catch {
         case ex: Exception =>
           LOGGER.error(ex)
@@ -522,6 +598,12 @@ case class LoadTable(
       } finally {
         // Once the data load is successful delete the unwanted partition files
         try {
+
+          // shutdown dictionary server thread
+          if (carbonLoadModel.getUseOnePass) {
+            executorService.shutdownNow()
+          }
+
           val fileType = FileFactory.getFileType(partitionLocation)
           if (FileFactory.isFileExist(partitionLocation, fileType)) {
             val file = FileFactory


[9/9] incubator-carbondata git commit: Added documentation for new features in the DML, DDL Section and added content to troubleshooting.

Posted by ra...@apache.org.
Added documentation for new features in the DML, DDL Section and added content to troubleshooting.


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/3236c764
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/3236c764
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/3236c764

Branch: refs/heads/branch-1.0
Commit: 3236c764c0e383dee6d17d49bdaaf49f1a683bf0
Parents: babe5f8
Author: PallaviSingh1992 <pa...@yahoo.co.in>
Authored: Mon Jan 30 16:12:03 2017 +0530
Committer: ravipesala <ra...@gmail.com>
Committed: Fri Feb 17 19:30:34 2017 +0530

----------------------------------------------------------------------
 docs/ddl-operation-on-carbondata.md | 139 ++++++++++++-------
 docs/dml-operation-on-carbondata.md |  26 +++-
 docs/installation-guide.md          |  29 ++--
 docs/troubleshooting.md             | 231 ++++++++++++++++++++++++++++++-
 4 files changed, 351 insertions(+), 74 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/3236c764/docs/ddl-operation-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/ddl-operation-on-carbondata.md b/docs/ddl-operation-on-carbondata.md
index d261963..ca2107e 100644
--- a/docs/ddl-operation-on-carbondata.md
+++ b/docs/ddl-operation-on-carbondata.md
@@ -27,18 +27,18 @@ The following DDL operations are supported in CarbonData :
 * [SHOW TABLE](#show-table)
 * [DROP TABLE](#drop-table)
 * [COMPACTION](#compaction)
+* [BUCKETING](#bucketing)
 
 ## CREATE TABLE
   This command can be used to create a CarbonData table by specifying the list of fields along with the table properties.
-  
 ```
-   CREATE TABLE [IF NOT EXISTS] [db_name.]table_name 
-                    [(col_name data_type , ...)]               
+   CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+                    [(col_name data_type , ...)]
    STORED BY 'carbondata'
    [TBLPROPERTIES (property_name=property_value, ...)]
    // All Carbon's additional table options will go into properties
 ```
-   
+
 ### Parameter Description
 
 | Parameter | Description | Optional |
@@ -48,93 +48,86 @@ The following DDL operations are supported in CarbonData :
 | table_name | The name of the table in Database. Table Name should consist of alphanumeric characters and underscore(_) special character. | No |
 | STORED BY | "org.apache.carbondata.format", identifies and creates a CarbonData table. | No |
 | TBLPROPERTIES | List of CarbonData table properties. |  |
- 
- 
+
 ### Usage Guidelines
-            
+
    Following are the guidelines for using table properties.
-     
+
    - **Dictionary Encoding Configuration**
-   
+
        Dictionary encoding is enabled by default for all String columns, and disabled for non-String columns. You can include and exclude columns for dictionary encoding.
-     
 ```
-       TBLPROPERTIES ("DICTIONARY_EXCLUDE"="column1, column2") 
-       TBLPROPERTIES ("DICTIONARY_INCLUDE"="column1, column2") 
+       TBLPROPERTIES ("DICTIONARY_EXCLUDE"="column1, column2")
+       TBLPROPERTIES ("DICTIONARY_INCLUDE"="column1, column2")
 ```
-       
+
    Here, DICTIONARY_EXCLUDE will exclude dictionary creation. This is applicable for high-cardinality columns and is an optional parameter. DICTIONARY_INCLUDE will generate dictionary for the columns specified in the list.
-     
+
    - **Row/Column Format Configuration**
-     
+
        Column groups with more than one column are stored in row format, instead of columnar format. By default, each column is a separate column group.
-     
 ```
-TBLPROPERTIES ("COLUMN_GROUPS"="(column1,column3),
-(Column4,Column5,Column6)") 
+TBLPROPERTIES ("COLUMN_GROUPS"="(column1, column3),
+(Column4,Column5,Column6)")
 ```
-   
+
    - **Table Block Size Configuration**
-   
+
      The block size of table files can be defined using the property TABLE_BLOCKSIZE. It accepts only integer values. The default value is 1024 MB and supports a range of 1 MB to 2048 MB.
-     If you do not specify this value in the DDL command , default value is used. 
-     
+     If you do not specify this value in the DDL command, default value is used.
 ```
        TBLPROPERTIES ("TABLE_BLOCKSIZE"="512 MB")
 ```
-     
+
   Here 512 MB means the block size of this table is 512 MB, you can also set it as 512M or 512.
-   
+
    - **Inverted Index Configuration**
-     
+
       Inverted index is very useful to improve compression ratio and query speed, especially for those low-cardinality columns who are in reward position.
       By default inverted index is enabled. The user can disable the inverted index creation for some columns.
-     
 ```
-       TBLPROPERTIES ("NO_INVERTED_INDEX"="column1,column3")
+       TBLPROPERTIES ("NO_INVERTED_INDEX"="column1, column3")
 ```
 
   No inverted index shall be generated for the columns specified in NO_INVERTED_INDEX. This property is applicable on columns with high-cardinality and is an optional parameter.
 
    NOTE:
-     
-   - By default all columns other than numeric datatype are treated as dimensions and all columns of numeric datatype are treated as measures. 
-    
+
+   - By default all columns other than numeric datatype are treated as dimensions and all columns of numeric datatype are treated as measures.
+
    - All dimensions except complex datatype columns are part of multi dimensional key(MDK). This behavior can be overridden by using TBLPROPERTIES. If the user wants to keep any column (except columns of complex datatype) in multi dimensional key then he can keep the columns either in DICTIONARY_EXCLUDE or DICTIONARY_INCLUDE.
-     
-     
+
 ### Example:
 ```
    CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
                                 productNumber Int,
-                                productName String, 
-                                storeCity String, 
-                                storeProvince String, 
-                                productCategory String, 
+                                productName String,
+                                storeCity String,
+                                storeProvince String,
+                                productCategory String,
                                 productBatch String,
                                 saleQuantity Int,
-                                revenue Int)       
-   STORED BY 'carbondata' 
+                                revenue Int)
+   STORED BY 'carbondata'
    TBLPROPERTIES ('COLUMN_GROUPS'='(productName,productCategory)',
                   'DICTIONARY_EXCLUDE'='productName',
                   'DICTIONARY_INCLUDE'='productNumber',
                   'NO_INVERTED_INDEX'='productBatch')
 ```
-    
+
 ## SHOW TABLE
 
   This command can be used to list all the tables in current database or all the tables of a specific database.
 ```
   SHOW TABLES [IN db_Name];
 ```
-  
+
 ### Parameter Description
 | Parameter  | Description                                                                               | Optional |
 |------------|-------------------------------------------------------------------------------------------|----------|
 | IN db_Name | Name of the database. Required only if tables of this specific database are to be listed. | Yes      |
 
 ### Example:
-  
 ```
   SHOW TABLES IN ProductSchema;
 ```
@@ -142,7 +135,6 @@ TBLPROPERTIES ("COLUMN_GROUPS"="(column1,column3),
 ## DROP TABLE
 
  This command is used to delete an existing table.
-
 ```
   DROP TABLE [IF EXISTS] [db_name.]table_name;
 ```
@@ -154,7 +146,6 @@ TBLPROPERTIES ("COLUMN_GROUPS"="(column1,column3),
 | table_name | Name of the table to be deleted. | NO |
 
 ### Example:
-
 ```
   DROP TABLE IF EXISTS productSchema.productSalesTable;
 ```
@@ -162,13 +153,12 @@ TBLPROPERTIES ("COLUMN_GROUPS"="(column1,column3),
 ## COMPACTION
 
 This command merges the specified number of segments into one segment. This enhances the query performance of the table.
-
 ```
   ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR';
 ```
-  
+
   To get details about Compaction refer to [Data Management](data-management.md)
-  
+
 ### Parameter Description
 
 | Parameter | Description | Optional |
@@ -179,15 +169,64 @@ This command merges the specified number of segments into one segment. This enha
 ### Syntax
 
 - **Minor Compaction**
-
 ```
 ALTER TABLE table_name COMPACT 'MINOR';
 ```
 - **Major Compaction**
-
 ```
 ALTER TABLE table_name COMPACT 'MAJOR';
 ```
 
-  
-  
+## BUCKETING
+
+The Bucketing feature can be used to distribute/organize the table/partition data into multiple files,
+so that similar records are present in the same file. While creating a table, the user needs to specify
+the columns to be used for bucketing and the number of buckets. The bucket for a record is selected
+based on the hash value of the bucketing columns.
+```
+   CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
+                    [(col_name data_type, ...)]
+   STORED BY 'carbondata'
+   TBLPROPERTIES("BUCKETNUMBER"="noOfBuckets",
+   "BUCKETCOLUMNS"="columnname", "TABLENAME"="tablename")
+
+```
+
+## Parameter Description
+
+| Parameter 	| Description 	| Optional 	|
+|---------------	|------------------------------------------------------------------------------------------------------------------------------	|----------	|
+| BUCKETNUMBER 	| Specifies the number of Buckets to be created. 	| No 	|
+| BUCKETCOLUMNS 	| Specify the columns to be considered for Bucketing  	| No 	|
+| TABLENAME 	| The name of the table in Database. Table Name should consist of alphanumeric characters and underscore(_) special character. 	| Yes 	|
+
+## Usage Guidelines
+
+- The feature is supported for Spark 1.6.2 onwards, but the performance optimization is evident from Spark 2.1 onwards.
+
+- Bucketing can not be performed for columns of Complex Data Types.
+
+- Columns in the BUCKETCOLUMNS parameter must be either dimensions or measures; a combination of both is not supported.
+
+## Example :
+```
+ CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
+                                productNumber Int,
+                                productName String,
+                                storeCity String,
+                                storeProvince String,
+                                productCategory String,
+                                productBatch String,
+                                saleQuantity Int,
+                                revenue Int)
+   STORED BY 'carbondata'
+   TBLPROPERTIES ('COLUMN_GROUPS'='(productName,productCategory)',
+                  'DICTIONARY_EXCLUDE'='productName',
+                  'DICTIONARY_INCLUDE'='productNumber',
+                  'NO_INVERTED_INDEX'='productBatch',
+                  'BUCKETNUMBER'='4',
+                  'BUCKETCOLUMNS'='productNumber,saleQuantity',
+                  'TABLENAME'='productSalesTable')
+
+  ```
+
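
The bucketing description in the diff above says the bucket for a record comes from the hash of the bucketing columns. A rough, illustrative sketch of that idea follows; this is not CarbonData's actual internal hashing.

```
// Illustrative only: derive a bucket id from the hash of the bucket column values.
def bucketId(bucketColumnValues: Seq[Any], numBuckets: Int): Int = {
  val h = bucketColumnValues.hashCode()
  // normalise to a non-negative value before taking the modulo
  ((h % numBuckets) + numBuckets) % numBuckets
}

// With BUCKETNUMBER = 4, rows sharing the same productNumber and saleQuantity
// values always map to the same bucket, and therefore to the same file.
val sameBucket = bucketId(Seq(1005, 3), 4) == bucketId(Seq(1005, 3), 4)  // true
```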

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/3236c764/docs/dml-operation-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/dml-operation-on-carbondata.md b/docs/dml-operation-on-carbondata.md
index 0523d95..74fa0b0 100644
--- a/docs/dml-operation-on-carbondata.md
+++ b/docs/dml-operation-on-carbondata.md
@@ -133,7 +133,27 @@ You can use the following options to load data:
 
     NOTE: Date formats are specified by date pattern strings. The date pattern letters in CarbonData are same as in JAVA. Refer to [SimpleDateFormat](http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html).
 
+- **USE_KETTLE:** This option is used to specify whether to use kettle for loading data or not. By default kettle is not used for data loading.
 
+    ```
+    OPTIONS('USE_KETTLE'='FALSE')
+    ```
+
+   Note: It is recommended to set the value of this option to false.
+
+- **SINGLE_PASS:** Single Pass Loading enables single job to finish data loading with dictionary generation on the fly. It enhances performance in the scenarios where the subsequent data loading after initial load involves fewer incremental updates on the dictionary.
+
+   This option specifies whether to use single pass for loading data or not. By default this option is set to FALSE.
+
+    ```
+    OPTIONS('SINGLE_PASS'='TRUE')
+    ```
+
+   Note:
+
+   * If this option is set to TRUE then data loading will take less time.
+
+   * If this option is set to some invalid value other than TRUE or FALSE then it uses the default value.
 ### Example:
 
 ```
@@ -142,9 +162,11 @@ options('DELIMITER'=',', 'QUOTECHAR'='"','COMMENTCHAR'='#',
 'FILEHEADER'='empno,empname,designation,doj,workgroupcategory,
  workgroupcategoryname,deptno,deptname,projectcode,
  projectjoindate,projectenddate,attendance,utilization,salary',
-'MULTILINE'='true','ESCAPECHAR'='\','COMPLEX_DELIMITER_LEVEL_1'='$', 
+'MULTILINE'='true','ESCAPECHAR'='\','COMPLEX_DELIMITER_LEVEL_1'='$',
 'COMPLEX_DELIMITER_LEVEL_2'=':',
-'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary'
+'ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary',
+'USE_KETTLE'='FALSE',
+'SINGLE_PASS'='TRUE'
 )
 ```
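
The new test added in commit [7/9] above (TestLoadDataGeneral) exercises the same USE_KETTLE and SINGLE_PASS options from SQL; an equivalent spark-shell call would look roughly like this, where the table name and file path are illustrative only:

```
scala>carbon.sql("LOAD DATA INPATH 'hdfs://localhost:54310/data/sample.csv' INTO TABLE test_table OPTIONS('USE_KETTLE'='FALSE', 'SINGLE_PASS'='TRUE')")
```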
 

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/3236c764/docs/installation-guide.md
----------------------------------------------------------------------
diff --git a/docs/installation-guide.md b/docs/installation-guide.md
index 7c1f7eb..d8f1b5e 100644
--- a/docs/installation-guide.md
+++ b/docs/installation-guide.md
@@ -53,14 +53,13 @@ followed by :
     NOTE: carbonplugins will contain .kettle folder.
     
 * In Spark node, configure the properties mentioned in the following table in ``"<SPARK_HOME>/conf/spark-defaults.conf"`` file.
-  
-| Property | Description | Value |
-|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|
-| carbon.kettle.home | Path that will be used by CarbonData internally to create graph for loading the data | $SPARK_HOME /carbonlib/carbonplugins |
-| spark.driver.extraJavaOptions | A string of extra JVM options to pass to the driver. For instance, GC settings or other logging. | -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties |
-| spark.executor.extraJavaOptions | A string of extra JVM options to pass to executors. For instance, GC settings or other logging. NOTE: You can enter multiple values separated by space. | -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties |
 
-  
+| Property | Value | Description |
+|---------------------------------|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
+| carbon.kettle.home | $SPARK_HOME /carbonlib/carbonplugins | Path that will be used by CarbonData internally to create graph for loading the data |
+| spark.driver.extraJavaOptions | -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties | A string of extra JVM options to pass to the driver. For instance, GC settings or other logging. |
+| spark.executor.extraJavaOptions | -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties | A string of extra JVM options to pass to executors. For instance, GC settings or other logging. NOTE: You can enter multiple values separated by space. |
+
 * Add the following properties in ``"<SPARK_HOME>/conf/" carbon.properties``:
 
 | Property             | Required | Description                                                                            | Example                             | Remark  |
@@ -78,7 +77,7 @@ followed by :
 
 NOTE: Make sure you have permissions for CarbonData JARs and files through which driver and executor will start.
 
-To get started with CarbonData : [Quick Start](quick-start-guide.md) , [DDL Operations on CarbonData](ddl-operation-on-carbondata.md)
+To get started with CarbonData : [Quick Start](quick-start-guide.md), [DDL Operations on CarbonData](ddl-operation-on-carbondata.md)
 
 ## Installing and Configuring CarbonData on "Spark on YARN" Cluster
 
@@ -123,14 +122,14 @@ To get started with CarbonData : [Quick Start](quick-start-guide.md) , [DDL Oper
 
 
 * Verify the installation.
-   
+
 ```
-     ./bin/spark-shell --master yarn-client --driver-memory 1g 
+     ./bin/spark-shell --master yarn-client --driver-memory 1g
      --executor-cores 2 --executor-memory 2G
 ```
   NOTE: Make sure you have permissions for CarbonData JARs and files through which driver and executor will start.
 
-  Getting started with CarbonData : [Quick Start](quick-start-guide.md) , [DDL Operations on CarbonData](ddl-operation-on-carbondata.md)
+  Getting started with CarbonData : [Quick Start](quick-start-guide.md), [DDL Operations on CarbonData](ddl-operation-on-carbondata.md)
 
 ## Query Execution Using CarbonData Thrift Server
 
@@ -139,17 +138,17 @@ To get started with CarbonData : [Quick Start](quick-start-guide.md) , [DDL Oper
    a. cd ``<SPARK_HOME>``
 
    b. Run the following command to start the CarbonData thrift server.
-     
+
 ```
-./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true 
+./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true \
 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
 $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
 ```
-  
+
 | Parameter | Description | Example |
 |---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
 | CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the ``"<SPARK_HOME>"/carbonlib/`` folder. | carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar |
-| carbon_store_path | This is a parameter to the CarbonThriftServer class. This a HDFS path where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation parameter of carbon.properties. | hdfs//<host_name>:54310/user/hive/warehouse/carbon.store |
+| carbon_store_path | This is a parameter to the CarbonThriftServer class. This is an HDFS path where CarbonData files will be kept. It is strongly recommended to set it to the same value as the carbon.storelocation parameter of carbon.properties. | ``hdfs://<host_name>:54310/user/hive/warehouse/carbon.store`` |
 
 ### Examples
    

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/3236c764/docs/troubleshooting.md
----------------------------------------------------------------------
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index 21d3db6..9181d83 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -18,13 +18,230 @@
 -->
 
 # Troubleshooting
-This tutorial is designed to provide troubleshooting for end users and developers 
+This tutorial is designed to provide troubleshooting for end users and developers
 who are building, deploying, and using CarbonData.
 
-### General Prevention and Best Practices
- * When trying to create a table with a single numeric column, table creation fails: 
-   One column that can be considered as dimension is mandatory for table creation.
-         
- * "Files locked for updation" when same table is accessed from two or more instances: 
-    Remove metastore_db from the examples folder.
+## Failed to load thrift libraries
 
+  **Symptom**
+
+  Thrift throws the following exception:
+
+  ```
+  thrift: error while loading shared libraries:
+  libthriftc.so.0: cannot open shared object file: No such file or directory
+  ```
+
+  **Possible Cause**
+
+  The complete path to the directory containing the libraries is not configured correctly.
+
+  **Procedure**
+
+  Follow the Apache thrift docs at [https://thrift.apache.org/docs/install](https://thrift.apache.org/docs/install) to install thrift correctly.
+
+## Failed to launch the Spark Shell
+
+  **Symptom**
+
+  The shell reports the following error:
+
+  ```
+  org.apache.spark.sql.CarbonContext$$anon$$apache$spark$sql$catalyst$analysis
+  $OverrideCatalog$_setter_$org$apache$spark$sql$catalyst$analysis
+  $OverrideCatalog$$overrides_$e
+  ```
+
+  **Possible Cause**
+
+  The Spark Version and the selected Spark Profile do not match.
+
+  **Procedure**
+
+  1. Ensure that your Spark version and the selected Spark profile match.
+
+  2. Use the following command:
+
+    ```
+     "mvn -Pspark-2.1 -Dspark.version {yourSparkVersion} clean package"
+    ```
+
+    Note: Refrain from using "mvn clean package" without specifying the profile.
+
+## Failed to execute load query on cluster.
+
+  **Symptom**
+
+  Load query failed with the following exception:
+
+  ```
+  Dictionary file is locked for updation.
+  ```
+
+  **Possible Cause**
+
+  The carbon.properties file is not identical in all the nodes of the cluster.
+
+  **Procedure**
+
+  Follow the steps to ensure the carbon.properties file is consistent across all the nodes:
+
+  1. Copy the carbon.properties file from the master node to all the other nodes in the cluster.
+     For example, you can use scp to copy this file to all the nodes, as in the sketch after this list.
+
+  2. For the changes to take effect, restart the Spark cluster.
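+
+  A sketch of step 1 using scp, assuming hypothetical worker host names and ``/opt/spark`` as the Spark home on every node:
+
+  ```
+  scp /opt/spark/conf/carbon.properties user@worker1:/opt/spark/conf/
+  scp /opt/spark/conf/carbon.properties user@worker2:/opt/spark/conf/
+  ```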
+
+## Failed to execute insert query on cluster.
+
+  **Symptom**
+
+  Insert query failed with the following exception:
+
+  ```
+  Dictionary file is locked for updation.
+  ```
+
+  **Possible Cause**
+
+  The carbon.properties file is not identical in all the nodes of the cluster.
+
+  **Procedure**
+
+  Follow the steps to ensure the carbon.properties file is consistent across all the nodes:
+
+  1. Copy the carbon.properties file from the master node to all the other nodes in the cluster.
+       For example, you can use scp to copy this file to all the nodes.
+
+  2. For the changes to take effect, restart the Spark cluster.
+
+## Failed to connect to hiveuser with thrift
+
+  **Symptom**
+
+  We get the following exception:
+
+  ```
+  Cannot connect to hiveuser.
+  ```
+
+  **Possible Cause**
+
+  The external process does not have permission to access the hiveuser in MySQL.
+
+  **Procedure**
+
+  Ensure that the hiveuser in MySQL allows access from external processes, for example by granting the required privileges as in the sketch below.
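+
+  A sketch of one way to do this in MySQL (hypothetical user name and password; scope the grant according to your security policy):
+
+  ```
+  GRANT ALL PRIVILEGES ON *.* TO 'hiveuser'@'%' IDENTIFIED BY '<password>';
+  FLUSH PRIVILEGES;
+  ```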
+
+## Failure to read the metastore db during table creation.
+
+  **Symptom**
+
+  We get the following exception when trying to connect:
+
+  ```
+  Cannot read the metastore db
+  ```
+
+  **Possible Cause**
+
+  The metastore db is dysfunctional.
+
+  **Procedure**
+
+  Remove the metastore db from the carbon.metastore path in the Spark directory.
+
+## Failed to load data on the cluster
+
+  **Symptom**
+
+  Data loading fails with the following exception:
+
+   ```
+   Data Load failure exeception
+   ```
+
+  **Possible Cause**
+
+  The following issues can cause the failure:
+
+  1. The core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties files are not consistent across all nodes of the cluster.
+
+  2. The HDFS DDL base path (carbon.ddl.base.hdfs.url) is not configured correctly in carbon.properties.
+
+  **Procedure**
+
+   Follow the steps to ensure the following configuration files are consistent across all the nodes:
+
+   1. Copy the core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties files from the master node to all the other nodes in the cluster.
+      For example, you can use scp to copy these files to all the nodes.
+
+      Note: Set the HDFS DDL base path (carbon.ddl.base.hdfs.url) in carbon.properties on the master node, as in the sketch after this list.
+
+   2. For the changes to take effect, restart the Spark cluster.
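+
+   A sketch of the relevant carbon.properties entries, assuming a hypothetical namenode address and data directories; adjust them to your cluster:
+
+   ```
+   # Hypothetical values; substitute your own namenode host, port and directories.
+   carbon.storelocation=hdfs://<host_name>:54310/user/hive/warehouse/carbon.store
+   carbon.ddl.base.hdfs.url=/opt/data
+   ```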
+
+
+
+## Failed to insert data on the cluster
+
+  **Symptom**
+
+  Insertion fails with the following exception:
+
+   ```
+   Data Load failure exeception
+   ```
+
+  **Possible Cause**
+
+  The following issues can cause the failure:
+
+  1. The core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties files are not consistent across all nodes of the cluster.
+
+  2. The HDFS DDL base path (carbon.ddl.base.hdfs.url) is not configured correctly in carbon.properties.
+
+  **Procedure**
+
+   Follow the steps to ensure the following configuration files are consistent across all the nodes:
+
+   1. Copy the core-site.xml, hive-site.xml, yarn-site.xml and carbon.properties files from the master node to all the other nodes in the cluster.
+      For example, you can use scp to copy these files to all the nodes.
+
+      Note: Set the HDFS DDL base path (carbon.ddl.base.hdfs.url) in carbon.properties on the master node.
+
+   2. For the changes to take effect, restart the Spark cluster.
+
+## Failed to execute concurrent operations (load, insert, update) on a table from multiple workers.
+
+  **Symptom**
+
+  Execution fails with the following exception:
+
+   ```
+   Table is locked for updation.
+   ```
+
+  **Possible Cause**
+
+  Concurrent operations on the same table are not supported.
+
+  **Procedure**
+
+  A worker must wait for the running query to complete and the table lock to be released before another query on the same table can succeed.
+
+## Failed to create a table with a single numeric column.
+
+  **Symptom**
+
+  Execution fails with the following exception:
+
+   ```
+   Table creation fails.
+   ```
+
+  **Possible Cause**
+
+  Behavior not supported.
+
+  **Procedure**
+
+  At least one column that can be treated as a dimension (for example, a String column) is mandatory for table creation; see the sketch below.
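+
+  A minimal sketch of a CREATE TABLE statement that satisfies this requirement (hypothetical table and column names):
+
+   ```
+   CREATE TABLE IF NOT EXISTS sample_table (
+     id INT,
+     name STRING
+   )
+   STORED BY 'carbondata'
+   ```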


[4/9] incubator-carbondata git commit: Create DataFrame example in example/spark2, read carbon data to dataframe

Posted by ra...@apache.org.
Create DataFrame example in example/spark2, read carbon data to dataframe

Create DataFrame example in example/spark2, read carbon data to dataframe

Create DataFrame example in example/spark2, read carbon data to dataframe

Create CarbonDataFrameExample in example/spark2

fix scalastyle

trigger travis ci


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/ec4ec129
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/ec4ec129
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/ec4ec129

Branch: refs/heads/branch-1.0
Commit: ec4ec1294de330d85e1ea9b7422bb161c19905c3
Parents: b79ec78
Author: chenliang613 <ch...@huawei.com>
Authored: Wed Feb 8 00:06:29 2017 -0500
Committer: ravipesala <ra...@gmail.com>
Committed: Fri Feb 17 19:28:20 2017 +0530

----------------------------------------------------------------------
 .../examples/CarbonDataFrameExample.scala       | 89 ++++++++++++++++++++
 1 file changed, 89 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/ec4ec129/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonDataFrameExample.scala
----------------------------------------------------------------------
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonDataFrameExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonDataFrameExample.scala
new file mode 100644
index 0000000..e4d1646
--- /dev/null
+++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonDataFrameExample.scala
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import java.io.File
+
+import org.apache.spark.sql.{SaveMode, SparkSession}
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+// scalastyle:off println
+object CarbonDataFrameExample {
+
+  def main(args: Array[String]) {
+    val rootPath = new File(this.getClass.getResource("/").getPath
+                            + "../../../..").getCanonicalPath
+    val storeLocation = s"$rootPath/examples/spark2/target/store"
+    val warehouse = s"$rootPath/examples/spark2/target/warehouse"
+    val metastoredb = s"$rootPath/examples/spark2/target"
+
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "yyyy/MM/dd")
+
+    import org.apache.spark.sql.CarbonSession._
+    val spark = SparkSession
+      .builder()
+      .master("local")
+      .appName("CarbonDataFrameExample")
+      .config("spark.sql.warehouse.dir", warehouse)
+      .getOrCreateCarbonSession(storeLocation, metastoredb)
+
+    spark.sparkContext.setLogLevel("ERROR")
+
+    // Writes Dataframe to CarbonData file:
+    import spark.implicits._
+    val df = spark.sparkContext.parallelize(1 to 100)
+      .map(x => ("a", "b", x))
+      .toDF("c1", "c2", "number")
+
+    // Saves dataframe to carbondata file
+    df.write
+      .format("carbondata")
+      .option("tableName", "carbon_table")
+      .option("compress", "true")
+      .option("tempCSV", "false")
+      .mode(SaveMode.Overwrite)
+      .save()
+
+    spark.sql(""" SELECT * FROM carbon_table """).show()
+
+    // Specify schema
+    import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
+    val customSchema = StructType(Array(
+      StructField("c1", StringType),
+      StructField("c2", StringType),
+      StructField("number", IntegerType)))
+
+    // Reads carbondata to dataframe
+    val carbondf = spark.read
+      .format("carbondata")
+      .schema(customSchema)
+      .option("tableName", "carbon_table")
+      .load()
+
+    // Dataframe operations
+    carbondf.printSchema()
+    carbondf.select($"c1", $"number" + 10).show()
+    carbondf.filter($"number" > 31).show()
+
+    spark.sql("DROP TABLE IF EXISTS carbon_table")
+  }
+}
+// scalastyle:on println


[8/9] incubator-carbondata git commit: [CARBONDATA-694] update quick start document

Posted by ra...@apache.org.
[CARBONDATA-694] update quick start document


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/babe5f86
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/babe5f86
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/babe5f86

Branch: refs/heads/branch-1.0
Commit: babe5f86cbed9d9f2eba56be9670e3941f582d1f
Parents: e0016e2
Author: hexiaoqiao <he...@meituan.com>
Authored: Wed Feb 15 15:05:24 2017 +0530
Committer: ravipesala <ra...@gmail.com>
Committed: Fri Feb 17 19:29:59 2017 +0530

----------------------------------------------------------------------
 docs/quick-start-guide.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/babe5f86/docs/quick-start-guide.md
----------------------------------------------------------------------
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index 5a2d6e2..e6ef742 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -62,8 +62,9 @@ import org.apache.spark.sql.CarbonSession._
 * Create a CarbonSession :
 
 ```
-val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession()
+val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<hdfs store path>")
 ```
+NOTE: By default the metastore location points to "../carbon.metastore". Users can provide their own metastore location to CarbonSession, e.g. `SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<hdfs store path>", "<local metastore path>")`, as in the sketch below.
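+
+For example, a minimal sketch with hypothetical store and metastore paths:
+
+```
+val carbon = SparkSession.builder().config(sc.getConf)
+  .getOrCreateCarbonSession("hdfs://<host_name>:54310/user/hive/warehouse/carbon.store", "/tmp/carbon.metastore")
+```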
 
 #### Executing Queries