Posted to commits@carbondata.apache.org by ch...@apache.org on 2018/09/07 15:53:47 UTC

[2/4] carbondata git commit: [CARBONDATA-2915] Reformat Documentation of CarbonData

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/file-structure-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/file-structure-of-carbondata.md b/docs/file-structure-of-carbondata.md
index 303d0e0..ba9004c 100644
--- a/docs/file-structure-of-carbondata.md
+++ b/docs/file-structure-of-carbondata.md
@@ -6,35 +6,173 @@
     (the "License"); you may not use this file except in compliance with 
     the License.  You may obtain a copy of the License at
 
-      http://www.apache.org/licenses/LICENSE-2.0
+```
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software 
+distributed under the License is distributed on an "AS IS" BASIS, 
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and 
+limitations under the License.
+```
 
-    Unless required by applicable law or agreed to in writing, software 
-    distributed under the License is distributed on an "AS IS" BASIS, 
-    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-    See the License for the specific language governing permissions and 
-    limitations under the License.
 -->
 
-# CarbonData File Structure
+# CarbonData table structure
 
 CarbonData files contain groups of data called blocklets, along with all required information like schema, offsets and indices etc, in a file header and footer, co-located in HDFS.
 
 The file footer can be read once to build the indices in memory, which can be utilized for optimizing the scans and processing for all subsequent queries.
 
-### Understanding CarbonData File Structure
-* Block : It would be as same as HDFS block, CarbonData creates one file for each data block, user can specify TABLE_BLOCKSIZE during creation table. Each file contains File Header, Blocklets and File Footer.
+This document describes what a CarbonData table looks like in an HDFS directory, the files written and the content of each file.
+
+- [File Directory Structure](#file-directory-structure)
+
+- [File Content details](#file-content-details)
+  - [Schema file format](#schema-file-format)
+  - [CarbonData file format](#carbondata-file-format)
+    - [Blocklet format](#blocklet-format)
+      - [V1](#v1)
+      - [V2](#v2)
+      - [V3](#v3)
+    - [Footer format](#footer-format)
+  - [carbonindex file format](#carbonindex-file-format)
+  - [Dictionary file format](#dictionary-file-format)
+  - [tablestatus file format](#tablestatus-file-format)
+
+## File Directory Structure
+
+The CarbonData files are stored in the location specified by the ***carbon.storelocation*** configuration (configured in carbon.properties; if not configured, the default is ../carbon.store).
+
+  The file directory structure is as below: 
+
+![File Directory Structure](../docs/images/2-1_1.png?raw=true)
+
+1. ModifiedTime.mdt records the metadata timestamp through the modification time attribute of the file. When drop table or create table is executed, the modification time of the file is updated. This file is common to all databases and hence is kept at the same level as the database directories.
+2. **default** is the database name, and its directory contains the user tables. default is used when the user doesn't specify a database name; otherwise the user-configured database name is the directory name. user_table is the table name.
+3. The Metadata directory stores schema files, the tablestatus file and dictionary files (including .dict, .dictmeta and .sortindex), that is, three types of metadata files.
+4. Data and index files are stored under the directory named **Fact**. The Fact directory has a Part0 partition directory, where 0 is the partition number.
+5. There is a Segment_0 directory under the Part0 directory, where 0 is the segment number.
+6. There are two types of files, carbondata and carbonindex, in the Segment_0 directory.
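As a rough illustration of the layout described in the list above, the sketch below assembles the expected directory paths for one table. The function `table_layout` and its parameters are hypothetical helpers for illustration, not part of CarbonData; only the file and directory names (ModifiedTime.mdt, Metadata, Fact, Part0, Segment_0) come from the text.

```python
from pathlib import PurePosixPath


def table_layout(store, database="default", table="user_table",
                 partition=0, segment=0):
    """Return the paths described above for one table (illustrative only)."""
    store_dir = PurePosixPath(store)
    table_dir = store_dir / database / table
    return {
        # Common to all databases, kept alongside the database directories.
        "modified_time": store_dir / "ModifiedTime.mdt",
        # Schema, tablestatus and dictionary files live here.
        "metadata": table_dir / "Metadata",
        # carbondata and carbonindex files live in the segment directory.
        "segment": table_dir / "Fact" / f"Part{partition}" / f"Segment_{segment}",
    }


layout = table_layout("/carbon.store")
print(layout["segment"])
```

Each new batch load would correspond to a different `segment` number under the same Part0 directory.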
+
+
+
+## File Content details
+
+When the table is created, the user_table directory is generated, and a schema file is generated in the Metadata directory for recording the table structure.
+
+When loading data in batches, each batch load generates a new segment directory. The scheduler tries to run one data loading task on each node. Each task generates multiple carbondata files and one carbonindex file.
+
+During global dictionary generation, if the two-pass scheme is used, the corresponding dict, dictmeta and sortindex files are generated for each dictionary-encoded column before the data is loaded. Partial dictionary files can be provided through the pre-defined dictionary method to reduce the need to generate dictionary values by scanning the full data set; dictionary files for all dictionary-encoded columns can also be provided through the all-dictionary method to avoid scanning the data. If the single-pass scheme is adopted, the global dictionary codes are generated in real time during data loading, and after the data is loaded, the dictionary is persisted into a dictionary file.
+
+The following sections use the Java objects generated from the thrift files that describe the carbondata file format to explain the contents of each file one by one (you can also directly read the format defined in the [thrift file](https://github.com/apache/carbondata/tree/master/format/src/main/thrift)).
+
+### Schema file format
+
+The contents of the schema file are shown below.
+
+![Schema file format](../docs/images/2-2_1.png?raw=true)
+
+1. TableSchema class
+    The TableSchema class does not store the table name; it is inferred from the directory name (user_table).
+    tableProperties is used to record table-related properties, such as: table_blocksize.
+2. ColumnSchema class
+    Encoders are used to record the encoding used in column storage.
+    columnProperties is used to record column related properties.
+3. BucketingInfo class
+    When creating a bucket table, you can specify the number of buckets in the table and the column used to split the buckets.
+4. DataType class
+    Describes the data types supported by CarbonData.
+5. Encoding class
+    Several encodings that may be used in CarbonData files.
+
+### CarbonData file format
+
+#### File Header
+
+It contains the CarbonData file version number, the list of column schemas and the schema update timestamp.
+
+![File Header](../docs/images/carbon_data_file_structure_new.png?raw=true)
+
+The carbondata file consists of multiple blocklets and a footer. The blocklet is the unit of data inside the carbondata file (in the latest V3 format, the default size is 64MB). Each blocklet contains a ColumnChunk for each column, and a ColumnChunk may contain one or more Column Pages.
+
+The carbondata file currently supports the V1, V2 and V3 versions. The main difference between them is the blocklet part, which is described for each version below.
+
+#### Blocklet format
+
+##### V1
+
+The blocklet consists of all column data pages, RLE pages, and rowID pages. Since the pages in the blocklet are grouped according to the page type, the three pieces of data of each column are scattered across the blocklet, and the offset and length of every page must be recorded in the footer.
+
+![V1](../docs/images/2-3_1.png?raw=true)
+
+##### V2
+
+The blocklet consists of the ColumnChunks for all columns. The ColumnChunk for a column consists of a ColumnPage, which includes the data chunk header, data page, RLE page, and rowID page. Since a ColumnChunk aggregates the three types of page data of the column together, the column data can be read using fewer readers. Since the header records the length information of all the pages, the footer only needs to record the offset and length of each ColumnChunk, which also reduces the amount of footer data.
+
+![V2](../docs/images/2-3_2.png?raw=true)
+
+##### V3
+
+The blocklet is also composed of the ColumnChunks of all columns. What has changed is that a ColumnChunk consists of one or more Column Pages, and the Column Page adds a new BlockletMinMaxIndex.
+
+Compared with V2: the blocklet data volume of the V2 format defaults to 120,000 rows, and the blocklet data volume of the V3 format defaults to 64MB. For a data file of the same size, the index metadata in the footer may be further reduced. Meanwhile, the V3 format adds page-level data filtering, and the amount of data per page is only 32,000 rows by default, far fewer than the 120,000 rows of the V2 format. Filtering is therefore more precise, and more data can be filtered out before decompression.
+
+![V3](../docs/images/2-3_3.png?raw=true)
+
+#### Footer format
+
+The footer records the data distribution information of all blocklets and the statistics-related metadata (minmax, startkey/endkey) inside each carbondata file.
+
+![Footer format](../docs/images/2-3_4.png?raw=true)
+
+1.  BlockletInfo3 is used to record the offset and length of all ColumnChunk3.
+2.  SegmentInfo is used to record the number of columns and the cardinality of each column.
+3.  BlockletIndex includes BlockletMinMaxIndex and BlockletBTreeIndex.
+
+BlockletBTreeIndex is used to record the startkey/endkey of all blocklets in the block. When querying, the startkey/endkey of the query is generated from the filter conditions combined with the mdkey. With BlockletBTreeIndex, the range of blocklets satisfying the conditions in each block can be delineated.
+
+BlockletMinMaxIndex is used to record the min/max value of all columns in the blocklet. By using the min/max check on the filter condition, you can skip the block/blocklet that does not satisfy the condition.
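The min/max check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not CarbonData's implementation: `BlockletStats` and `prune_equals` are hypothetical names, and a single integer column with an equality filter is assumed.

```python
from dataclasses import dataclass


@dataclass
class BlockletStats:
    """Hypothetical stand-in for one blocklet's min/max entry for one column."""
    blocklet_id: int
    min_value: int
    max_value: int


def prune_equals(blocklets, value):
    """Keep only blocklets whose [min, max] range could contain `value`."""
    return [b.blocklet_id for b in blocklets
            if b.min_value <= value <= b.max_value]


stats = [
    BlockletStats(0, 1, 100),
    BlockletStats(1, 101, 200),
    BlockletStats(2, 150, 300),
]
# A filter such as `col = 160` can skip blocklet 0 without reading its data.
print(prune_equals(stats, 160))
```

Overlapping ranges (blocklets 1 and 2 here) both survive the check, so min/max pruning narrows the scan but does not by itself guarantee a match.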
+
+### carbonindex file format
+
+The carbonindex file is generated by extracting the BlockletIndex part of the footer. When loading data in batches, the scheduler tries to start one task per node; each task generates multiple carbondata files and one carbonindex file. The carbonindex file records the index information of all the blocklets in all the carbondata files generated by the task.
+
+As shown in the figure, the index information corresponding to a block is recorded by a BlockIndex object, including the carbondata filename, footer offset and BlockletIndex. The BlockIndex data volume is smaller than that of the footer. When querying, the file is used directly to build the index on the driver side, without having to read the larger footer parts of multiple data files.
+
+![carbonindex file format](../docs/images/2-4_1.png?raw=true)
+
+### Dictionary file format
+
+
+For each dictionary encoded column, a dictionary file is used to store the dictionary metadata for that column.
+
+1. dict file records the distinct value list of a column
+
+On the first data load, the file is generated from the distinct value list of the column. The values in the file are unordered; subsequent loads append to it. In the second step of data loading (Data Convert Step), dictionary-encoded columns have their actual values replaced with dictionary keys.
+
+![Dictionary file format](../docs/images/2-5_1.png?raw=true)
+
+
+2.  dictmeta records the metadata describing the new distinct values added by each data load
+
+The dictionary cache uses this information to incrementally flush the cache.
+
+![Dictionary Chunk](../docs/images/2-5_2.png?raw=true)
+	
+
+3.  sortindex records the dictionary keys sorted by value.
+
+During data loading, if there are new dictionary values, the sortindex file is regenerated using all the dictionary codes.
+
+Filter queries on dictionary-encoded columns need to convert the value filter condition into a key filter condition. Using the sortindex file, you can quickly construct an ordered value sequence and find the key corresponding to a value, thus speeding up the conversion.
+
+![sortindex file format](../docs/images/2-5_3.png?raw=true)
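To make the value-to-key conversion concrete, the sketch below builds a sort index over an unordered dictionary and binary-searches it. It is illustrative only: the sample values are invented, and using the list position as the dictionary key is an assumption rather than CarbonData's actual key assignment.

```python
from bisect import bisect_left

# Hypothetical dictionary: position in the list acts as the key (values are unordered).
dict_values = ["china", "usa", "france", "india"]

# Sort index: dictionary keys ordered by the value they map to.
sort_index = sorted(range(len(dict_values)), key=lambda k: dict_values[k])


def key_for_value(value):
    """Binary-search the value-ordered keys; return the key or None if absent."""
    ordered_values = [dict_values[k] for k in sort_index]
    pos = bisect_left(ordered_values, value)
    if pos < len(ordered_values) and ordered_values[pos] == value:
        return sort_index[pos]
    return None


print(sort_index)
print(key_for_value("india"))
```

Rebuilding `ordered_values` on every call keeps the sketch short; a real reader would construct the ordered sequence once and reuse it.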
 
-![CarbonData File Structure](../docs/images/carbon_data_file_structure_new.png?raw=true)
+### tablestatus file format
 
-* File Header : It contains CarbonData file version number, list of column schema and schema updation timestamp.
-* File Footer : it contains Number of rows, segmentinfo ,all blocklets’ info and index, you can find the detail from the below diagram.
-* Blocklet : Rows are grouped to form a blocklet, the size of the blocklet is configurable and default size is 64MB, Blocklet contains Column Page groups for each column.
-* Column Page Group : Data of one column and it is further divided into pages, it is guaranteed to be contiguous in file.
-* Page : It has the data of one column and the number of row is fixed to 32000 size.
+Tablestatus records the segment-related information (in gson format) for each load and merge, including load time, load status, segment name, whether it was deleted, and the merged segment name. The tablestatus file is regenerated after each load or merge.
 
-![CarbonData File Format](../docs/images/carbon_data_format_new.png?raw=true)
+![tablestatus file format](../docs/images/2-6_1.png?raw=true)
 
-### Each page contains three types of data
-* Data Page: Contains the encoded data of a column of columns.
-* Row ID Page (optional): Contains the row ID mappings used when the data page is stored as an inverted index.
-* RLE Page (optional): Contains additional metadata used when the data page is RLE coded.

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/hive-guide.md
----------------------------------------------------------------------
diff --git a/docs/hive-guide.md b/docs/hive-guide.md
new file mode 100644
index 0000000..c38a539
--- /dev/null
+++ b/docs/hive-guide.md
@@ -0,0 +1,100 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more 
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership. 
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with 
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software 
+    distributed under the License is distributed on an "AS IS" BASIS, 
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and 
+    limitations under the License.
+-->
+
+# Quick Start
+This tutorial provides a quick introduction to using the current integration/hive module.
+
+## Build (In 1.2.0, hive integration only supports spark2.1 and hadoop2.7.2)
+mvn -DskipTests -Pspark-2.1 -Phadoop-2.7.2 clean package
+
+## Prepare CarbonData in Spark
+* Create a sample.csv file using the following commands. The CSV file is required for loading data into CarbonData.
+
+  ```
+  cd carbondata
+  cat > sample.csv << EOF
+  id,name,scale,country,salary
+  1,yuhai,1.77,china,33000.1
+  2,runlin,1.70,china,33000.2
+  EOF
+  ```
+
+* Copy data to HDFS
+
+```
+$HADOOP_HOME/bin/hadoop fs -put sample.csv <hdfs store path>/sample.csv
+```
+
+* Add the following params to $SPARK_CONF_DIR/conf/hive-site.xml
+```xml
+<property>
+  <name>hive.metastore.pre.event.listeners</name>
+  <value>org.apache.carbondata.hive.CarbonHiveMetastoreListener</value>
+</property>
+```
+* Start Spark shell by running the following command in the Spark directory
+
+```
+./bin/spark-shell --jars <carbondata assembly jar path, carbon hive jar path>
+```
+
+```
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.CarbonSession._
+val rootPath = "hdfs:////user/hadoop/carbon"
+val storeLocation = s"$rootPath/store"
+val warehouse = s"$rootPath/warehouse"
+val metastoredb = s"$rootPath/metastore_db"
+
+val carbon = SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation).getOrCreateCarbonSession(storeLocation, metastoredb)
+
+carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
+carbon.sql("LOAD DATA INPATH '<hdfs store path>/sample.csv' INTO TABLE hive_carbon")
+carbon.sql("SELECT * FROM hive_carbon").show()
+```
+
+## Query Data in Hive
+### Configure hive classpath
+```
+mkdir hive/auxlibs/
+cp carbondata/assembly/target/scala-2.11/carbondata_2.11*.jar hive/auxlibs/
+cp carbondata/integration/hive/target/carbondata-hive-*.jar hive/auxlibs/
+cp $SPARK_HOME/jars/spark-catalyst*.jar hive/auxlibs/
+cp $SPARK_HOME/jars/scala*.jar hive/auxlibs/
+export HIVE_AUX_JARS_PATH=hive/auxlibs/
+```
+### Fix snappy issue
+```
+# copy snappy-java-xxx.jar from "./<SPARK_HOME>/jars/" to "./Library/Java/Extensions"
+export HADOOP_OPTS="-Dorg.xerial.snappy.lib.path=/Library/Java/Extensions -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib -Dorg.xerial.snappy.tempdir=/Users/apple/DEMO/tmp"
+```
+
+### Start hive client
+$HIVE_HOME/bin/hive
+
+### Query data from hive table
+
+```
+set hive.mapred.supports.subdirectories=true;
+set mapreduce.input.fileinputformat.input.dir.recursive=true;
+
+select * from hive_carbon;
+select count(*) from hive_carbon;
+select * from hive_carbon order by id;
+```
+
+

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/how-to-contribute-to-apache-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/how-to-contribute-to-apache-carbondata.md b/docs/how-to-contribute-to-apache-carbondata.md
new file mode 100644
index 0000000..f64c948
--- /dev/null
+++ b/docs/how-to-contribute-to-apache-carbondata.md
@@ -0,0 +1,192 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more 
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership. 
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with 
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software 
+    distributed under the License is distributed on an "AS IS" BASIS, 
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and 
+    limitations under the License.
+-->
+
+# How to contribute to Apache CarbonData
+
+The Apache CarbonData community welcomes all kinds of contributions from anyone with a passion for
+faster data formats! Apache CarbonData is a new file format for faster interactive queries, using
+advanced columnar storage, index, compression and encoding techniques to improve computing
+efficiency. In turn, it helps speed up queries by an order of magnitude over petabytes of data.
+
+We use a review-then-commit workflow in CarbonData for all contributions.
+
+* Engage -> Design -> Code -> Review -> Commit
+
+## Engage
+
+### Mailing list(s)
+
+We discuss design and implementation issues on dev@carbondata.apache.org. Join by
+emailing dev-subscribe@carbondata.apache.org.
+
+### Apache JIRA
+
+We use [Apache JIRA](https://issues.apache.org/jira/browse/CARBONDATA) as an issue tracking and
+project management tool, as well as a way to communicate among a very diverse and distributed set
+of contributors. To be able to gather feedback, avoid frustration, and avoid duplicated efforts all
+CarbonData-related work should be tracked there.
+
+If you do not already have an Apache JIRA account, sign up [here](https://issues.apache.org/jira/).
+
+If a quick search doesn’t turn up an existing JIRA issue for the work you want to contribute,
+create it. Please discuss your proposal with a committer or the component lead in JIRA or,
+alternatively, on the developer mailing list (dev@carbondata.apache.org).
+
+If there’s an existing JIRA issue for your intended contribution, please comment about your
+intended work. Once the work is understood, a committer will assign the issue to you.
+(If you don’t have a JIRA role yet, you’ll be added to the “contributor” role.) If an issue is
+currently assigned, please check with the current assignee before reassigning.
+
+For moderate or large contributions, you should not start coding or writing a design doc unless
+there is a corresponding JIRA issue assigned to you for that work. Simple changes,
+like fixing typos, do not require an associated issue.
+
+### Design
+
+To clearly express your thoughts and get early feedback from other community members, we encourage you to clearly scope and document the design of non-trivial contributions and discuss it with the CarbonData community before you start coding.
+
+Generally, the JIRA issue is the best place to gather relevant design docs, comments, or references. It’s great to explicitly include relevant stakeholders early in the conversation. For designs that may be generally interesting, we also encourage conversations on the developer’s mailing list.
+
+### Code
+
+We use GitHub’s pull request functionality to review proposed code changes.
+If you do not already have a personal GitHub account, sign up [here](https://github.com).
+
+#### Git config
+
+Make sure to complete the following configuration (user.email, user.name) before starting work on a PR.
+```
+$ git config --global user.email "you@example.com"
+$ git config --global user.name "Your Name"
+```
+
+#### Fork the repository on GitHub
+
+Go to the [Apache CarbonData GitHub mirror](https://github.com/apache/carbondata) and
+fork the repository to your account.
+This will be your private workspace for staging changes.
+
+#### Clone the repository locally
+
+You are now ready to create the development environment on your local machine.
+Clone CarbonData’s read-only GitHub mirror.
+```
+$ git clone https://github.com/apache/carbondata.git
+$ cd carbondata
+```
+Add your forked repository as an additional Git remote, where you’ll push your changes.
+```
+$ git remote add <GitHub_user> https://github.com/<GitHub_user>/carbondata.git
+```
+You are now ready to start developing!
+
+#### Create a branch in your fork
+
+You’ll work on your contribution in a branch in your own (forked) repository. Create a local branch,
+initialized with the state of the branch you expect your changes to be merged into.
+Keep in mind that we use several branches, including master, feature-specific, and
+release-specific branches. If you are unsure, initialize with the state of the master branch.
+```
+$ git fetch --all
+$ git checkout -b <my-branch> origin/master
+```
+At this point, you can start making and committing changes to this branch in a standard way.
+
+#### Syncing and pushing your branch
+
+Periodically while you work, and certainly before submitting a pull request, you should update
+your branch with the most recent changes to the target branch.
+```
+$ git pull --rebase
+```
+Remember to always use the --rebase parameter to avoid extraneous merge commits.
+
+To push your local, committed changes to your (forked) repository on GitHub, run:
+```
+$ git push <GitHub_user> <my-branch>
+```
+#### Testing
+
+All code should have appropriate unit testing coverage. New code should have new tests in the
+same contribution. Bug fixes should include a regression test to prevent the issue from reoccurring.
+
+For contributions to the Java code, run unit tests locally via Maven.
+```
+$ mvn clean verify
+```
+
+### Review
+
+Once the initial code is complete and the tests pass, it’s time to start the code review process.
+We review and discuss all code, no matter who authors it. It’s a great way to build community,
+since you can learn from other developers, and they become familiar with your contribution.
+It also builds a strong project by encouraging a high quality bar and keeping code consistent
+throughout the project.
+
+#### Create a pull request
+
+Organize your commits to make your reviewer’s job easier. Use the following command to
+re-order, squash, edit, or change description of individual commits.
+```
+$ git rebase -i origin/master
+```
+Navigate to the CarbonData GitHub mirror to create a pull request. The title of the pull request
+should be strictly in the following format:
+```
+[CARBONDATA-JiraTicketNumber][FeatureName] Description of pull request
+```
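A title in this format can be checked mechanically before opening the pull request. The sketch below is a hypothetical helper, not part of the CarbonData tooling; it assumes the `[FeatureName]` tag is optional, since the commit title quoted at the top of this message ("[CARBONDATA-2915] Reformat Documentation of CarbonData") omits it.

```python
import re

# Pattern for "[CARBONDATA-<number>][<FeatureName>] <description>".
# Assumption: the feature tag is optional (see the lead-in above).
TITLE_RE = re.compile(r"^\[CARBONDATA-\d+\](\[[^\]]+\])? \S.*$")


def is_valid_title(title):
    """Return True if the pull request title matches the assumed format."""
    return TITLE_RE.match(title) is not None


print(is_valid_title("[CARBONDATA-2915] Reformat Documentation of CarbonData"))
print(is_valid_title("Reformat Documentation"))
```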
+Please include a descriptive pull request message to help make the reviewer’s job easier:
+```
+ - The root cause/problem statement
+ - What is the implemented solution
+ ```
+
+If you know a good committer to review your pull request, please make a comment like the following.
+If not, don’t worry, a committer will pick it up.
+```
+Hi @<committer/reviewer name>, can you please take a look?
+```
+
+#### Code Review and Revision
+
+During the code review process, don’t rebase your branch or otherwise modify published commits,
+since this can remove existing comment history and be confusing to the reviewer.
+When you make a revision, always push it in a new commit.
+
+Our GitHub mirror automatically provides pre-commit testing coverage using Jenkins.
+Please make sure those tests pass; the contribution cannot be merged otherwise.
+
+#### LGTM
+Once the reviewer is happy with the change, they’ll respond with an LGTM (“looks good to me!”).
+At this point, the committer will take over, possibly make some additional touch ups,
+and merge your changes into the codebase.
+
+In the case both the author and the reviewer are committers, either can merge the pull request.
+Just be sure to communicate clearly whose responsibility it is in this particular case.
+
+Thank you for your contribution to Apache CarbonData!
+
+#### Deleting your branch (optional)
+Once the pull request is merged into the Apache CarbonData repository, you can safely delete the
+branch locally and purge it from your forked repository.
+
+From another local branch, run:
+```
+$ git fetch --all
+$ git branch -d <my-branch>
+$ git push <GitHub_user> --delete <my-branch>
+```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-1_1.png
----------------------------------------------------------------------
diff --git a/docs/images/2-1_1.png b/docs/images/2-1_1.png
new file mode 100644
index 0000000..676e041
Binary files /dev/null and b/docs/images/2-1_1.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-2_1.png
----------------------------------------------------------------------
diff --git a/docs/images/2-2_1.png b/docs/images/2-2_1.png
new file mode 100644
index 0000000..3369d45
Binary files /dev/null and b/docs/images/2-2_1.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-3_1.png
----------------------------------------------------------------------
diff --git a/docs/images/2-3_1.png b/docs/images/2-3_1.png
new file mode 100644
index 0000000..bdd346e
Binary files /dev/null and b/docs/images/2-3_1.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-3_2.png
----------------------------------------------------------------------
diff --git a/docs/images/2-3_2.png b/docs/images/2-3_2.png
new file mode 100644
index 0000000..b5b33aa
Binary files /dev/null and b/docs/images/2-3_2.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-3_3.png
----------------------------------------------------------------------
diff --git a/docs/images/2-3_3.png b/docs/images/2-3_3.png
new file mode 100644
index 0000000..be39323
Binary files /dev/null and b/docs/images/2-3_3.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-3_4.png
----------------------------------------------------------------------
diff --git a/docs/images/2-3_4.png b/docs/images/2-3_4.png
new file mode 100644
index 0000000..6da4cc1
Binary files /dev/null and b/docs/images/2-3_4.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-4_1.png
----------------------------------------------------------------------
diff --git a/docs/images/2-4_1.png b/docs/images/2-4_1.png
new file mode 100644
index 0000000..52b3b42
Binary files /dev/null and b/docs/images/2-4_1.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-5_1.png
----------------------------------------------------------------------
diff --git a/docs/images/2-5_1.png b/docs/images/2-5_1.png
new file mode 100644
index 0000000..b219d8b
Binary files /dev/null and b/docs/images/2-5_1.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-5_2.png
----------------------------------------------------------------------
diff --git a/docs/images/2-5_2.png b/docs/images/2-5_2.png
new file mode 100644
index 0000000..ca9d627
Binary files /dev/null and b/docs/images/2-5_2.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-5_3.png
----------------------------------------------------------------------
diff --git a/docs/images/2-5_3.png b/docs/images/2-5_3.png
new file mode 100644
index 0000000..27aaca8
Binary files /dev/null and b/docs/images/2-5_3.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/2-6_1.png
----------------------------------------------------------------------
diff --git a/docs/images/2-6_1.png b/docs/images/2-6_1.png
new file mode 100644
index 0000000..d61c084
Binary files /dev/null and b/docs/images/2-6_1.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/images/carbondata-performance.png
----------------------------------------------------------------------
diff --git a/docs/images/carbondata-performance.png b/docs/images/carbondata-performance.png
new file mode 100644
index 0000000..eee7bf6
Binary files /dev/null and b/docs/images/carbondata-performance.png differ

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/introduction.md
----------------------------------------------------------------------
diff --git a/docs/introduction.md b/docs/introduction.md
new file mode 100644
index 0000000..434ccfa
--- /dev/null
+++ b/docs/introduction.md
@@ -0,0 +1,117 @@
+## What is CarbonData
+
+CarbonData is a fully indexed columnar and Hadoop native data-store for processing heavy analytical workloads and detailed queries on big data with Spark SQL. CarbonData allows faster interactive queries over PetaBytes of data.
+
+
+
+## What does this mean
+
+CarbonData has specially engineered optimizations such as multi-level indexing, compression and encoding techniques, targeted at improving the performance of analytical queries. These queries can include filters, aggregations and distinct counts, for which users expect sub-second response times on TB-level data using commodity hardware clusters with just a few nodes.
+
+CarbonData has 
+
+- **Unique data organisation** for faster retrievals and to minimise the amount of data retrieved
+
+- **Advanced push down optimisations** for deep integration with Spark, making use of the Spark DataSource API and other experimental features to ensure computing is performed close to the data and to minimise the amount of data read, processed, converted and transmitted (shuffled)
+
+- **Multi level indexing** to efficiently prune the files and data to be scanned and hence reduce I/O scans and CPU processing
+
+## CarbonData Features & Functions
+
+CarbonData has a rich set of features to support various use cases in Big Data analytics. The list below covers the major features supported by CarbonData.
+
+
+
+### Table Management
+
+- ##### DDL (Create, Alter, Drop, CTAS)
+
+  CarbonData provides its own DDL to create and manage carbondata tables. These DDL statements conform to the Hive and Spark SQL formats and support additional properties and configuration to take advantage of CarbonData functionality.
+
+- ##### DML (Load, Insert)
+
+  CarbonData provides its own DML to manage data in carbondata tables. Its behavior can be customized through many configuration options to suit user requirements.
+
+- ##### Update and Delete
+
+  CarbonData supports Update and Delete on Big Data. It provides a syntax similar to Hive's to support IUD (insert, update, delete) operations on CarbonData tables.
+
+- ##### Segment Management
+
+  CarbonData has a unique concept of segments to manage incremental loads to CarbonData tables effectively. Segment management makes it easy to control the table and perform retention, and is also used to provide transaction capability for the operations being performed.
+
+- ##### Partition
+
+  CarbonData supports two kinds of partitions: (1) partitions similar to Hive partitions, and (2) CarbonData partitions supporting hash, list and range partitioning.
+
+- ##### Compaction
+
+  CarbonData manages incremental loads as segments. Compaction helps to compact the growing number of segments and also improves query filter pruning.
+
+- ##### External Tables
+
+  CarbonData can read any carbondata file, automatically infer the schema from the file, and provide a relational table view for running SQL queries using Spark or any other application.
+
+### DataMaps
+
+- ##### Pre-Aggregate
+
+  CarbonData has the concept of datamaps to assist in pruning data while querying, so that queries run faster. Pre-aggregate tables are a kind of datamap that can improve query performance by an order of magnitude. CarbonData automatically pre-aggregates the incremental data and rewrites the query to fetch from the most appropriate pre-aggregate table, serving the query faster.
+
+- ##### Time Series
+
+  CarbonData has a built-in understanding of time order (year, month, day, hour, minute, second). A time series is a pre-aggregate table that can automatically roll up the data to the desired level during incremental load and serve the query from the most appropriate pre-aggregate table.
+
+- ##### Bloom filter
+
+  CarbonData supports bloom filters as a datamap in order to quickly and efficiently prune the data to be scanned and achieve faster query performance.
+
+- ##### Lucene
+
+  Lucene is popular for indexing long text data. CarbonData provides a Lucene datamap so that text columns can be indexed using Lucene and the index results used to efficiently prune the data to be retrieved during a query.
+
+- ##### MV (Materialized Views)
+
+  MVs are a kind of pre-aggregate table that supports efficient query rewrite and processing. CarbonData provides MVs that can rewrite a query to fetch from any table (including non-carbondata tables). A typical use case is to store the aggregated data of a non-carbondata fact table in carbondata and use an MV to rewrite the query to fetch from carbondata.
+
+### Streaming
+
+- ##### Spark Streaming
+
+  CarbonData supports streaming data into carbondata in near real time and makes it immediately available for query. CarbonData provides a DSL to create source and sink tables easily, without requiring users to write their own application.
+
+### SDK
+
+- ##### CarbonData writer
+
+  CarbonData supports writing data from non-Spark applications using the SDK. Users can use the SDK to generate carbondata files from custom applications. A typical use case is a streaming application plugged in to Kafka that uses carbondata as the sink (target) table for storage.
+
+- ##### CarbonData reader
+
+  CarbonData supports reading data from non-Spark applications using the SDK. Users can use the SDK to read carbondata files from their application and do custom processing.
+
+### Storage
+
+- ##### S3
+
+  CarbonData can write to S3, OBS or any cloud storage conforming to the S3 protocol. CarbonData uses the HDFS API to write to cloud object stores.
+
+- ##### HDFS
+
+  CarbonData uses the HDFS API to write and read data from HDFS. CarbonData can take advantage of locality information to suggest that Spark run tasks near the data.
+
+
+
+## Integration with Big Data ecosystem
+
+Refer to Integration with [Spark](./quick-start-guide.md#spark), [Presto](./quick-start-guide.md#presto) for detailed information on integrating CarbonData with these execution engines.
+
+## Scenarios where CarbonData is suitable
+
+CarbonData is useful in various analytical workloads. Some of the most typical use cases where CarbonData is being used are [documented here](./usecases.md).
+
+
+
+## Performance Results
+
+![Performance Results](../docs/images/carbondata-performance.png?raw=true)

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/language-manual.md
----------------------------------------------------------------------
diff --git a/docs/language-manual.md b/docs/language-manual.md
new file mode 100644
index 0000000..123cae3
--- /dev/null
+++ b/docs/language-manual.md
@@ -0,0 +1,39 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more 
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership. 
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with 
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software 
+    distributed under the License is distributed on an "AS IS" BASIS, 
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and 
+    limitations under the License.
+-->
+
+# Overview
+
+
+
+CarbonData has its own parser, in addition to Spark's SQL parser, to parse and process certain commands related to CarbonData table handling. You can interact with the SQL interface using the [command-line](https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-spark-sql-cli) or over [JDBC/ODBC](https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server).
+
+- [Data Types](./supported-data-types-in-carbondata.md)
+- Data Definition Statements
+  - [DDL:](./ddl-of-carbondata.md) [Create](./ddl-of-carbondata.md#create-table), [Drop](./ddl-of-carbondata.md#drop-table), [Partition](./ddl-of-carbondata.md#partition), [Bucketing](./ddl-of-carbondata.md#bucketing), [Alter](./ddl-of-carbondata.md#alter-table), [CTAS](./ddl-of-carbondata.md#create-table-as-select), [External Table](./ddl-of-carbondata.md#create-external-table)
+  - [DataMaps](./datamap/datamap-management.md)
+    - [Bloom](./datamap/bloomfilter-datamap-guide.md)
+    - [Lucene](./datamap/lucene-datamap-guide.md)
+    - [Pre-Aggregate](./datamap/preaggregate-datamap-guide.md)
+    - [Time Series](./datamap/timeseries-datamap-guide.md)
+  - Materialized Views (MV)
+  - [Streaming](./streaming-guide.md)
+- Data Manipulation Statements
+  - [DML:](./dml-of-carbondata.md) [Load](./dml-of-carbondata.md#load-data), [Insert](./ddl-of-carbondata.md#insert-overwrite), [Update](./dml-of-carbondata.md#update), [Delete](./dml-of-carbondata.md#delete)
+  - [Segment Management](./segment-management-on-carbondata.md)
+- [Configuration Properties](./configuration-parameters.md)
+
+

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/performance-tuning.md
----------------------------------------------------------------------
diff --git a/docs/performance-tuning.md b/docs/performance-tuning.md
new file mode 100644
index 0000000..f56a63b
--- /dev/null
+++ b/docs/performance-tuning.md
@@ -0,0 +1,246 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more 
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership. 
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with 
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software 
+    distributed under the License is distributed on an "AS IS" BASIS, 
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and 
+    limitations under the License.
+-->
+
+# Useful Tips
+  This tutorial guides you through creating CarbonData tables and optimizing performance.
+  The following sections elaborate on these topics:
+
+  * [Suggestions to create CarbonData Table](#suggestions-to-create-carbondata-table)
+  * [Configuration for Optimizing Data Loading performance for Massive Data](#configuration-for-optimizing-data-loading-performance-for-massive-data)
+  * [Optimizing Query Performance](#configurations-for-optimizing-carbondata-performance)
+  * [Compaction Configurations for Optimizing CarbonData Query Performance](#compaction-configurations-for-optimizing-carbondata-query-performance)
+
+## Suggestions to Create CarbonData Table
+
+  As an example, the results of an analysis of table creation, for tables ranging from 10 thousand to 10 billion rows and from 100 to 300 columns, are summarized below.
+  The following table describes some of the columns from the table used.
+
+  - **Table Column Description**
+
+| Column Name | Data Type     | Cardinality | Attribution |
+|-------------|---------------|-------------|-------------|
+| msisdn      | String        | 30 million  | Dimension   |
+| BEGIN_TIME  | BigInt        | 10 Thousand | Dimension   |
+| HOST        | String        | 1 million   | Dimension   |
+| Dime_1      | String        | 1 Thousand  | Dimension   |
+| counter_1   | Decimal       | NA          | Measure     |
+| counter_2   | Numeric(20,0) | NA          | Measure     |
+| ...         | ...           | NA          | Measure     |
+| counter_100 | Decimal       | NA          | Measure     |
+
+
+  - **Put the most frequently used filter column at the beginning of SORT_COLUMNS**
+
+  For example, if MSISDN is used as a filter in most queries, put MSISDN as the first column in the SORT_COLUMNS property.
+  The create table command can be modified as suggested below:
+
+  ```
+  create table carbondata_table(
+    msisdn String,
+    BEGIN_TIME bigint,
+    HOST String,
+    Dime_1 String,
+    counter_1 Decimal,
+    ...
+    
+    )STORED AS carbondata
+    TBLPROPERTIES ('SORT_COLUMNS'='msisdn, Dime_1')
+  ```
+
+  Now the query with MSISDN in the filter will be more efficient.
+
+  - **Put the frequently-used columns in the order of low to high cardinality in SORT_COLUMNS**
+
+  If the table in the specified query has multiple columns which are frequently used to filter the results, it is suggested to put
+  the columns in the order of cardinality low to high in SORT_COLUMNS configuration. This ordering of frequently used columns improves the compression ratio and
+  enhances the performance of queries with filter on these columns.
+
+  For example, if MSISDN, HOST and Dime_1 are frequently-used columns, then the column order of table is suggested as
+  Dime_1>HOST>MSISDN, because Dime_1 has the lowest cardinality.
+  The create table command can be modified as suggested below :
+
+  ```
+  create table carbondata_table(
+      msisdn String,
+      BEGIN_TIME bigint,
+      HOST String,
+      Dime_1 String,
+      counter_1 Decimal,
+      ...
+      
+      )STORED AS carbondata
+      TBLPROPERTIES ('SORT_COLUMNS'='Dime_1, HOST, MSISDN')
+  ```
+
+  - **For measure columns that do not require high accuracy, replace the Numeric(20,0) data type with the Double data type**
+
+  For measure columns that do not require high accuracy, it is suggested to replace the Numeric data type with Double to enhance query performance.
+  The create table command can be modified as below:
+
+```
+  create table carbondata_table(
+    Dime_1 String,
+    BEGIN_TIME bigint,
+    END_TIME bigint,
+    HOST String,
+    MSISDN String,
+    counter_1 decimal,
+    counter_2 double,
+    ...
+    )STORED AS carbondata
+    TBLPROPERTIES ('SORT_COLUMNS'='Dime_1, HOST, MSISDN')
+```
+  In a test case, this change reduced query execution time from 15 to 3 seconds, improving performance by nearly 5 times.
+
+ - **Columns with incremental values should be placed at the end of the dimensions**
+
+  Consider a scenario where data is loaded each day and begin_time increases with each load. In this case it is suggested to put begin_time at the end of the dimensions.
+  Incremental values make efficient use of the min/max index. The create table command can be modified as below:
+
+  ```
+  create table carbondata_table(
+    Dime_1 String,
+    HOST String,
+    MSISDN String,
+    counter_1 double,
+    counter_2 double,
+    BEGIN_TIME bigint,
+    END_TIME bigint,
+    ...
+    counter_100 double
+    )STORED AS carbondata
+    TBLPROPERTIES ('SORT_COLUMNS'='Dime_1, HOST, MSISDN')
+  ```
+
+  **NOTE:**
+  + BloomFilter can be created to enhance performance for queries with precise equal/in conditions. You can find more information about it in BloomFilter datamap [document](./datamap/bloomfilter-datamap-guide.md).
+
+
+## Configuration for Optimizing Data Loading performance for Massive Data
+
+
+  CarbonData supports loading large volumes of data. During loading, sorting the data consumes a lot of memory and disk I/O,
+  which can sometimes result in an "Out Of Memory" exception.
+  If you do not have much memory available, you may prefer a slower data load over a failed one.
+  You can configure CarbonData by tuning the following properties in the carbon.properties file to get better performance.
+
+| Parameter | Default Value | Description/Tuning |
+|-----------|-------------|--------|
+|carbon.number.of.cores.while.loading|2 (value should be >= 2)|Specifies the number of cores used for data processing during data loading in CarbonData. |
+|carbon.sort.size|100000 (value should be >= 100)|Threshold to write local file in sort step when loading data|
+|carbon.sort.file.write.buffer.size|50000|DataOutputStream buffer. |
+|carbon.merge.sort.reader.thread|3|Specifies the number of cores used for temp file merging during data loading in CarbonData.|
+|carbon.merge.sort.prefetch|true|You may want to set this value to false if you do not have enough memory.|
+
+  For example, suppose 10 million records are to be loaded into a CarbonData table on a machine with only 16 cores and 64 GB of memory.
+  With the default configuration, the load always fails in the sort step. Modify carbon.properties as suggested below:
+
+  ```
+  carbon.merge.sort.reader.thread=1
+  carbon.sort.size=5000
+  carbon.sort.file.write.buffer.size=5000
+  carbon.merge.sort.prefetch=false
+  ```
+
+## Configurations for Optimizing CarbonData Performance
+
+  We recently ran performance POCs on CarbonData in the finance and telecommunication fields. They involved detailed queries and aggregation
+  scenarios. After the POCs were completed, some of the configurations impacting performance were identified and are tabulated below:
+
+| Parameter | Location | Used For  | Description | Tuning |
+|-----------|----------|----------|-------------|--------|
+| carbon.sort.intermediate.files.limit | spark/carbonlib/carbon.properties | Data loading | During the loading of data, local temp files are used to sort the data. This number specifies the minimum number of intermediate files after which the merge sort is initiated. | Increasing the parameter to a higher value will improve the load performance. For example, increasing the value from 20 to 100 raised the data load performance from 35MB/s to more than 50MB/s. Higher values of this parameter consume more memory during the load. |
+| carbon.number.of.cores.while.loading | spark/carbonlib/carbon.properties | Data loading | Specifies the number of cores used for data processing during data loading in CarbonData. | If more CPU cores are available, increasing this value will improve performance. For example, if we increase the value from 2 to 4, the CSV reading performance can increase about 1 times. |
+| carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data loading and Querying | For minor compaction, specifies the number of segments to be merged in stage 1 and the number of compacted segments to be merged in stage 2. | Each CarbonData load creates one segment. If every load is small, many small files are generated over time, impacting query performance. Configuring this parameter merges the small segments into one big segment, which sorts the data and improves performance. For example, in one telecommunication scenario, performance improved about 2 times after minor compaction. |
+| spark.sql.shuffle.partitions | spark/conf/spark-defaults.conf | Querying | The number of tasks started when Spark shuffles data. | The value can be 1 to 2 times the number of executor cores. In an aggregation scenario, reducing the number from 200 to 32 reduced the query time from 17 to 9 seconds. |
+| spark.executor.instances/spark.executor.cores/spark.executor.memory | spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, and memory used for CarbonData query. | In the bank scenario, 4 CPU cores and 15 GB of memory per executor gave good performance. For these values, more is not necessarily better; they must be balanced when resources are limited. In the bank scenario each node had plenty of CPU (32 cores) but relatively little memory (64 GB), so executors could not be given more cores without also giving them enough memory. For example, with 4 cores and 12 GB per executor, GC sometimes occurred during queries, degrading query time from 3 seconds to more than 15 seconds. In that situation, increase the memory or decrease the CPU cores. |
+| carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading | The buffer size to store records returned from the block scan. | This parameter is very important in limit scenarios. For example, suppose the query limit is 1000. If this value is set to 3000, the scan returns 3000 records but Spark takes only 1000 rows, so the remaining 2000 are wasted. In one finance test case with limit 1000, setting it to 100 improved performance about 2 times compared with setting it to 12000. |
+| carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use YARN local directories for multi-table load disk load balance | If this is set to true, CarbonData will use YARN local directories for multi-table load disk load balance, which will improve the data load performance. |
+| carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use multiple YARN local directories during table data loading for disk load balance | After enabling 'carbon.use.local.dir', if this is set to true, CarbonData will use all YARN local directories during data load for disk load balance, which will improve the data load performance. Please enable this property when you encounter disk hotspot problems during data loading. |
+| carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading | Specifies the name of the compressor used to compress the intermediate sort temporary files during the sort procedure in data loading. | The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. The default is empty, which means that CarbonData will not compress the sort temp files. This parameter is useful if you encounter a disk bottleneck. |
+| carbon.load.skewedDataOptimization.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable size based block allocation strategy for data loading. | When loading, carbondata will use a file size based block allocation strategy for task distribution. It makes sure that all the executors process the same size of data. This is useful if the size of your input data files varies widely, say 1MB~1GB. |
+| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable node minimum input data size allocation strategy for data loading.| When loading, carbondata will use a node minimum input data size allocation strategy for task distribution. It makes sure that each node loads at least the minimum amount of data. This is useful if your input data files are very small, say 1MB~256MB, as it avoids generating a large number of small files. |
+
+  Note: If your CarbonData instance is used only for queries, you may set the property 'spark.speculation=true' in the conf directory of Spark.
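+
+  As a rough illustration, the values reported in the table above could be written down as follows. This is only a sketch: the numbers are the ones observed in the POC scenarios described above, not general recommendations, and should be validated against your own cluster and workload.
+
+  ```
+  # spark/carbonlib/carbon.properties (values taken from the POC examples above)
+  carbon.sort.intermediate.files.limit=100
+  carbon.number.of.cores.while.loading=4
+  carbon.detail.batch.size=100
+
+  # spark/conf/spark-defaults.conf (values taken from the POC examples above)
+  spark.sql.shuffle.partitions 32
+  spark.executor.cores 4
+  spark.executor.memory 15g
+  ```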
+
+## Compaction Configurations for Optimizing CarbonData Query Performance
+
+CarbonData provides many configurations to tune the compaction behavior so that query performance is improved.
+
+
+
+Based on the number of cores available in the node, it is recommended to tune the configuration ***carbon.number.of.cores.while.compacting*** appropriately. Configuring a higher value will improve the overall compaction performance.
+
+<p>&nbsp;</p>
+<table style="width: 777px;">
+<tbody>
+<tr style="height: 23px;">
+<td style="height: 23px; width: 95.375px;">No</td>
+<td style="height: 23px; width: 299.625px;">&nbsp;Data Loading frequency</td>
+<td style="height: 23px; width: 144px;">Data Size of each load</td>
+<td style="height: 23px; width: 204px;">Minor Compaction configuration</td>
+<td style="height: 23px; width: 197px;">&nbsp;Major compaction configuration</td>
+</tr>
+<tr style="height: 29.5px;">
+<td style="height: 29.5px; width: 95.375px;">1</td>
+<td style="height: 29.5px; width: 299.625px;">&nbsp;Batch(Once in several hours)</td>
+<td style="height: 29.5px; width: 144px;">Big</td>
+<td style="height: 29.5px; width: 204px;">&nbsp;Not Suggested</td>
+<td style="height: 29.5px; width: 197px;">Configure Major Compaction size of 3-4 times the load size. Perform Major compaction once a day</td>
+</tr>
+<tr style="height: 23px;">
+<td style="height: 23px; width: 95.375px;" rowspan="2">2</td>
+<td style="height: 23px; width: 299.625px;" rowspan="2">&nbsp;Batch(Once in a few minutes)&nbsp;</td>
+<td style="height: 23px; width: 144px;">Big&nbsp;</td>
+<td style="height: 23px; width: 204px;">
+<p>&nbsp;Minor compaction (2,2).</p>
+<p>Enable Auto compaction, if high rate data loading speed is not required or the time between loads is sufficient to run the compaction</p>
+</td>
+<td style="height: 23px; width: 197px;">Major compaction size of 10 times the load size. Perform Major compaction once a day</td>
+</tr>
+<tr style="height: 23px;">
+<td style="height: 23px; width: 144px;">Small</td>
+<td style="height: 23px; width: 204px;">
+<p>Minor compaction (6,6).</p>
+<p>Enable Auto compaction, if high rate data loading speed is not required or the time between loads is sufficient to run the compaction</p>
+</td>
+<td style="height: 23px; width: 197px;">Major compaction size of 10 times the load size. Perform Major compaction once a day</td>
+</tr>
+<tr style="height: 23px;">
+<td style="height: 23px; width: 95.375px;">3</td>
+<td style="height: 23px; width: 299.625px;">&nbsp;History data loaded as a single load, incremental loads match&nbsp;(1) or (2)</td>
+<td style="height: 23px; width: 144px;">Big</td>
+<td style="height: 23px; width: 204px;">
+<p>&nbsp;Configure ALLOWED_COMPACTION_DAYS to exclude the History load.</p>
+<p>Configure Minor compaction configuration based on&nbsp;condition (1) or (2)</p>
+</td>
+<td style="height: 23px; width: 197px;">&nbsp;Configure Major compaction size smaller than the history load size.</td>
+</tr>
+<tr style="height: 23px;">
+<td style="height: 23px; width: 95.375px;">4</td>
+<td style="height: 23px; width: 299.625px;">&nbsp;Recently loaded data may contain errors and sometimes needs to be reloaded</td>
+<td style="height: 23px; width: 144px;">&nbsp;(1) or (2)</td>
+<td style="height: 23px; width: 204px;">
+<p>&nbsp;Configure COMPACTION_PRESERVE_SEGMENTS</p>
+<p>to exclude the recent few segments from compacting.</p>
+<p>Configure Minor compaction configuration based on&nbsp;condition (1) or (2)</p>
+</td>
+<td style="height: 23px; width: 197px;">Same as (1) or (2)&nbsp;</td>
+</tr>
+</tbody>
+</table>
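+
+As a sketch, row 2 of the table (frequent, small batch loads) might translate into carbon.properties entries like the ones below. The property names are assumed to match those documented in [Configuration Properties](./configuration-parameters.md); the core count, major compaction size and preserved-segment count are hypothetical values chosen only for illustration.
+
+```
+# Minor compaction (6,6), triggered automatically after loads
+carbon.enable.auto.load.merge=true
+carbon.compaction.level.threshold=6,6
+# Hypothetical value: tune to the cores available on the node
+carbon.number.of.cores.while.compacting=4
+# Hypothetical value, assumed to be in MB; the table suggests about 10 times the size of one load
+carbon.major.compaction.size=10240
+# Hypothetical value: keep the 2 most recent segments out of compaction (row 4 of the table)
+carbon.numberof.preserve.segments=2
+```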
+

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/quick-start-guide.md
----------------------------------------------------------------------
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index 1b3ffc2..37c398c 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -19,9 +19,9 @@
 This tutorial provides a quick introduction to using CarbonData.To follow along with this guide, first download a packaged release of CarbonData from the [CarbonData website](https://dist.apache.org/repos/dist/release/carbondata/).Alternatively it can be created following [Building CarbonData](https://github.com/apache/carbondata/tree/master/build) steps.
 
 ##  Prerequisites
-* Spark 2.2.1 version is installed and running.CarbonData supports Spark versions upto 2.2.1.Please follow steps described in [Spark docs website](https://spark.apache.org/docs/latest) for installing and running Spark.
+* CarbonData supports Spark versions up to 2.2.1. Please download the Spark package from the [Spark website](https://spark.apache.org/downloads.html)
 
-* Create a sample.csv file using the following commands. The CSV file is required for loading data into CarbonData.
+* Create a sample.csv file using the following commands. The CSV file is required for loading data into CarbonData
 
   ```
   cd carbondata
@@ -33,7 +33,27 @@ This tutorial provides a quick introduction to using CarbonData.To follow along
   EOF
   ```
 
-## Interactive Analysis with Spark Shell Version 2.1
+## Integration
+
+CarbonData can be integrated with the Spark and Presto execution engines. The documentation below describes how to install and configure CarbonData with these execution engines.
+
+### Spark
+
+[Installing and Configuring CarbonData to run locally with Spark Shell](#installing-and-configuring-carbondata-to-run-locally-with-spark-shell)
+
+[Installing and Configuring CarbonData on Standalone Spark Cluster](#installing-and-configuring-carbondata-on-standalone-spark-cluster)
+
+[Installing and Configuring CarbonData on Spark on YARN Cluster](#installing-and-configuring-carbondata-on-spark-on-yarn-cluster)
+
+[Installing and Configuring CarbonData Thrift Server for Query Execution](#query-execution-using-carbondata-thrift-server)
+
+
+### Presto
+[Installing and Configuring CarbonData on Presto](#installing-and-configuring-carbondata-on-presto)
+
+
+
+## Installing and Configuring CarbonData to run locally with Spark Shell
 
 Apache Spark Shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit [Apache Spark Documentation](http://spark.apache.org/docs/latest/) for more details on Spark shell.
 
@@ -72,12 +92,12 @@ val carbon = SparkSession.builder().config(sc.getConf)
 
 ```
 scala>carbon.sql("CREATE TABLE
-                        IF NOT EXISTS test_table(
-                                  id string,
-                                  name string,
-                                  city string,
-                                  age Int)
-                       STORED BY 'carbondata'")
+                    IF NOT EXISTS test_table(
+                    id string,
+                    name string,
+                    city string,
+                    age Int)
+                  STORED AS carbondata")
 ```
 
 ###### Loading Data to a Table
@@ -87,7 +107,7 @@ scala>carbon.sql("LOAD DATA INPATH '/path/to/sample.csv'
                   INTO TABLE test_table")
 ```
 **NOTE**: Please provide the real file path of `sample.csv` for the above script. 
-If you get "tablestatus.lock" issue, please refer to [troubleshooting](troubleshooting.md)
+If you encounter a "tablestatus.lock" issue, please refer to the [FAQ](faq.md)
 
 ###### Query Data from a Table
 
@@ -98,3 +118,341 @@ scala>carbon.sql("SELECT city, avg(age), sum(age)
                   FROM test_table
                   GROUP BY city").show()
 ```
+
+
+
+## Installing and Configuring CarbonData on Standalone Spark Cluster
+
+### Prerequisites
+
+- Hadoop HDFS and Yarn should be installed and running.
+- Spark should be installed and running on all the cluster nodes.
+- CarbonData user should have permission to access HDFS.
+
+### Procedure
+
+1. [Build the CarbonData](https://github.com/apache/carbondata/blob/master/build/README.md) project and get the assembly jar from `./assembly/target/scala-2.1x/carbondata_xxx.jar`. 
+
+2. Copy `./assembly/target/scala-2.1x/carbondata_xxx.jar` to `$SPARK_HOME/carbonlib` folder.
+
+   **NOTE**: Create the carbonlib folder if it does not exist inside `$SPARK_HOME` path.
+
+3. Add the carbonlib folder path in the Spark classpath. (Edit `$SPARK_HOME/conf/spark-env.sh` file and modify the value of `SPARK_CLASSPATH` by appending `$SPARK_HOME/carbonlib/*` to the existing value)
+
+4. Copy the `./conf/carbon.properties.template` file from CarbonData repository to `$SPARK_HOME/conf/` folder and rename the file to `carbon.properties`.
+
+5. Repeat Step 2 to Step 4 on all the nodes of the cluster.
+
+6. On the Spark master node, configure the properties mentioned in the following table in the `$SPARK_HOME/conf/spark-defaults.conf` file.
+
+| Property                        | Value                                                        | Description                                                  |
+| ------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| spark.driver.extraJavaOptions   | `-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties` | A string of extra JVM options to pass to the driver. For instance, GC settings or other logging. |
+| spark.executor.extraJavaOptions | `-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties` | A string of extra JVM options to pass to executors. For instance, GC settings or other logging. **NOTE**: You can enter multiple values separated by space. |
+
+7. Add the following properties in `$SPARK_HOME/conf/carbon.properties` file:
+
+| Property             | Required | Description                                                  | Example                              | Remark                        |
+| -------------------- | -------- | ------------------------------------------------------------ | ------------------------------------ | ----------------------------- |
+| carbon.storelocation | NO       | Location where CarbonData will create the store and write the data in its own format. If not specified, the spark.sql.warehouse.dir path is used. | hdfs://HOSTNAME:PORT/Opt/CarbonStore | An HDFS directory is recommended |
+
+8. Verify the installation. For example:
+
+```
+./spark-shell --master spark://HOSTNAME:PORT --total-executor-cores 2 \
+--executor-memory 2G
+```
+
+**NOTE**: Make sure you have permissions for CarbonData JARs and files through which driver and executor will start.
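+
+As a sketch, the spark-defaults.conf and carbon.properties settings described above might end up looking like the following. `/opt/spark` is a hypothetical absolute path standing in for `$SPARK_HOME`, and `HOSTNAME:PORT` is the same placeholder used in the table above.
+
+```
+# $SPARK_HOME/conf/spark-defaults.conf
+spark.driver.extraJavaOptions   -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
+spark.executor.extraJavaOptions -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
+
+# $SPARK_HOME/conf/carbon.properties
+carbon.storelocation=hdfs://HOSTNAME:PORT/Opt/CarbonStore
+```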
+
+
+
+## Installing and Configuring CarbonData on Spark on YARN Cluster
+
+   This section provides the procedure to install CarbonData on "Spark on YARN" cluster.
+
+### Prerequisites
+
+- Hadoop HDFS and YARN should be installed and running.
+- Spark should be installed and running in all the clients.
+- CarbonData user should have permission to access HDFS.
+
+### Procedure
+
+   The following steps are only for driver nodes. (Driver nodes are the ones that start the Spark context.)
+
+1. [Build the CarbonData](https://github.com/apache/carbondata/blob/master/build/README.md) project and get the assembly jar from `./assembly/target/scala-2.1x/carbondata_xxx.jar` and copy to `$SPARK_HOME/carbonlib` folder.
+
+   **NOTE**: Create the carbonlib folder if it does not exist inside the `$SPARK_HOME` path.
+
+2. Copy the `./conf/carbon.properties.template` file from CarbonData repository to `$SPARK_HOME/conf/` folder and rename the file to `carbon.properties`.
+
+3. Create `tar.gz` file of carbonlib folder and move it inside the carbonlib folder.
+
+```
+cd $SPARK_HOME
+tar -zcvf carbondata.tar.gz carbonlib/
+mv carbondata.tar.gz carbonlib/
+```
+
+4. Configure the properties mentioned in the following table in `$SPARK_HOME/conf/spark-defaults.conf` file.
+
+| Property                        | Description                                                  | Value                                                        |
+| ------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| spark.master                    | Set this value to run Spark on YARN.                         | `yarn-client` (runs Spark on YARN in client mode).           |
+| spark.yarn.dist.files           | Comma-separated list of files to be placed in the working directory of each executor. | `$SPARK_HOME/conf/carbon.properties`                         |
+| spark.yarn.dist.archives        | Comma-separated list of archives to be extracted into the working directory of each executor. | `$SPARK_HOME/carbonlib/carbondata.tar.gz`                    |
+| spark.executor.extraJavaOptions | A string of extra JVM options to pass to executors. For instance, GC settings or other logging. **NOTE**: You can enter multiple values separated by space. | `-Dcarbon.properties.filepath=carbon.properties`             |
+| spark.executor.extraClassPath   | Extra classpath entries to prepend to the classpath of executors. **NOTE**: If SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append its values to the spark.driver.extraClassPath parameter below. | `carbondata.tar.gz/carbonlib/*`                              |
+| spark.driver.extraClassPath     | Extra classpath entries to prepend to the classpath of the driver. **NOTE**: If SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append its value to spark.driver.extraClassPath. | `$SPARK_HOME/carbonlib/*`                                    |
+| spark.driver.extraJavaOptions   | A string of extra JVM options to pass to the driver. For instance, GC settings or other logging. | `-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties` |
+
+5. Add the following properties in `$SPARK_HOME/conf/carbon.properties`:
+
+| Property             | Required | Description                                                  | Example                              | Remark                        |
+| -------------------- | -------- | ------------------------------------------------------------ | ------------------------------------ | ----------------------------- |
+| carbon.storelocation | NO       | Location where CarbonData will create the store and write the data in its own format. If not specified then it takes spark.sql.warehouse.dir path. | hdfs://HOSTNAME:PORT/Opt/CarbonStore | Propose to set HDFS directory |
+
+6. Verify the installation.
+
+```
+ ./bin/spark-shell --master yarn-client --driver-memory 1g \
+ --executor-cores 2 --executor-memory 2G
+```
+
+  **NOTE**: Make sure you have permissions for CarbonData JARs and files through which driver and executor will start.
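
The permission note above can be turned into a quick pre-flight check. This is a sketch under assumptions: `check_carbon_files` is a hypothetical helper, and the two paths it tests are the ones referenced in the `spark-defaults.conf` table of this section.

```shell
#!/bin/sh
# Hypothetical helper: report CarbonData files under a Spark home
# that are missing or not readable by the current user.
check_carbon_files() {
  home="$1"
  status=0
  for f in conf/carbon.properties carbonlib/carbondata.tar.gz; do
    if [ ! -r "$home/$f" ]; then
      echo "not readable: $home/$f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (not executed here):
#   check_carbon_files "$SPARK_HOME" && ./bin/spark-shell --master yarn-client
```

The function returns a non-zero status if any file is missing or unreadable, so it can gate the `spark-shell` launch with `&&`.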
+
+
+
+## Query Execution Using CarbonData Thrift Server
+
+### Starting CarbonData Thrift Server
+
+   a. cd `$SPARK_HOME`
+
+   b. Run the following command to start the CarbonData thrift server.
+
+```
+./bin/spark-submit \
+--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
+$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
+```
+
+| Parameter           | Description                                                  | Example                                                    |
+| ------------------- | ------------------------------------------------------------ | ---------------------------------------------------------- |
+| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/` folder. | carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar       |
+| carbon_store_path   | This is a parameter to the CarbonThriftServer class. It is an HDFS path where CarbonData files will be kept. It is strongly recommended to set it to the same value as the carbon.storelocation parameter of carbon.properties. If not specified, it takes the spark.sql.warehouse.dir path. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store` |
+
+**NOTE**: From Spark 1.6, the Thrift server runs in multi-session mode by default, which means each JDBC/ODBC connection owns a copy of its own SQL configuration and temporary function registry. Cached tables are still shared though. If you prefer to run the Thrift server in single-session mode and share all SQL configuration and temporary function registry, set the option `spark.sql.hive.thriftServer.singleSession` to `true`. You may either add this option to `spark-defaults.conf`, or pass it to `spark-submit` via `--conf`:
+
+```
+./bin/spark-submit \
+--conf spark.sql.hive.thriftServer.singleSession=true \
+--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
+$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
+```
+
+However, in single-session mode, if one user changes the database from one connection, the database of the other connections changes too.
+
+**Examples**
+
+- Start with default memory and executors.
+
+```
+./bin/spark-submit \
+--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
+$SPARK_HOME/carbonlib/carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar \
+hdfs://<host_name>:port/user/hive/warehouse/carbon.store
+```
+
+- Start with fixed executors and resources.
+
+```
+./bin/spark-submit \
+--class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
+--num-executors 3 --driver-memory 20g --executor-memory 250g \
+--executor-cores 32 \
+/srv/OSCON/BigData/HACluster/install/spark/sparkJdbc/lib/carbondata_2.xx-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar \
+hdfs://<host_name>:port/user/hive/warehouse/carbon.store
+```
+
+### Connecting to CarbonData Thrift Server Using Beeline
+
+```
+     cd $SPARK_HOME
+     ./sbin/start-thriftserver.sh
+     ./bin/beeline -u jdbc:hive2://<thriftserver_host>:port
+
+     # Example
+     ./bin/beeline -u jdbc:hive2://10.10.10.10:10000
+```
+
+
+
+## Installing and Configuring CarbonData on Presto
+
+**NOTE:** **CarbonData tables cannot be created or loaded from Presto. Users need to create the CarbonData table and load data into it
+either with [Spark](#installing-and-configuring-carbondata-to-run-locally-with-spark-shell) or [SDK](./sdk-guide.md).
+Once the table is created, it can be queried from Presto.**
+
+
+### Installing Presto
+
+ 1. Download the 0.187 version of Presto using:
+    `wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz`
+
+ 2. Extract Presto tar file: `tar zxvf presto-server-0.187.tar.gz`.
+
+ 3. Download the Presto CLI for the coordinator and name it presto.
+
+  ```
+    wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar
+
+    mv presto-cli-0.187-executable.jar presto
+
+    chmod +x presto
+  ```
+
+### Create Configuration Files
+
+  1. Create `etc` folder in presto-server-0.187 directory.
+  2. Create `config.properties`, `jvm.config`, `log.properties`, and `node.properties` files.
+  3. Install uuid to generate a node.id.
+
+      ```
+      sudo apt-get install uuid
+
+      uuid
+      ```
+
+
+##### Contents of your node.properties file
+
+  ```
+  node.environment=production
+  node.id=<generated uuid>
+  node.data-dir=/home/ubuntu/data
+  ```
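
Writing `node.properties` can be scripted so that each node gets its own `node.id`. The snippet below is a sketch, not an official Presto tool: `gen_uuid` is a hypothetical helper that falls back to `uuidgen` or the Linux kernel UUID source when the `uuid` command is not installed, and `ETC_DIR` defaults to a scratch directory (point it at `presto-server-0.187/etc` for real use).

```shell
#!/bin/sh
# Sketch only: ETC_DIR falls back to a scratch directory for a safe dry run.
ETC_DIR="${ETC_DIR:-$(mktemp -d)}"

# Hypothetical helper: use whichever UUID source exists on this machine.
gen_uuid() {
  if command -v uuid >/dev/null 2>&1; then
    uuid
  elif command -v uuidgen >/dev/null 2>&1; then
    uuidgen
  else
    cat /proc/sys/kernel/random/uuid   # Linux fallback
  fi
}

# Write the three properties shown above, with a freshly generated node.id.
cat > "$ETC_DIR/node.properties" <<EOF
node.environment=production
node.id=$(gen_uuid)
node.data-dir=/home/ubuntu/data
EOF
```

Run it once per node. Re-running it on the same node would generate a new `node.id`, so keep the generated file in place once the node has been started.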
+
+##### Contents of your jvm.config file
+
+  ```
+  -server
+  -Xmx16G
+  -XX:+UseG1GC
+  -XX:G1HeapRegionSize=32M
+  -XX:+UseGCOverheadLimit
+  -XX:+ExplicitGCInvokesConcurrent
+  -XX:+HeapDumpOnOutOfMemoryError
+  -XX:OnOutOfMemoryError=kill -9 %p
+  ```
+
+##### Contents of your log.properties file
+  ```
+  com.facebook.presto=INFO
+  ```
+
+ The default minimum level is `INFO`. There are four levels: `DEBUG`, `INFO`, `WARN` and `ERROR`.
+
+### Coordinator Configurations
+
+##### Contents of your config.properties
+  ```
+  coordinator=true
+  node-scheduler.include-coordinator=false
+  http-server.http.port=8086
+  query.max-memory=50GB
+  query.max-memory-per-node=2GB
+  discovery-server.enabled=true
+  discovery.uri=http://<coordinator_ip>:8086
+  ```
+The options `coordinator=true` and `node-scheduler.include-coordinator=false` indicate that the node is the coordinator and tell it not to do any of the computation work itself but to use the workers.
+
+**Note**: It is recommended to set `query.max-memory-per-node` to half of the JVM config max memory. If the workload is highly concurrent, use a lower value for `query.max-memory-per-node`.
+
+Also, the two memory properties should be related as follows:
+if `query.max-memory-per-node=30GB`,
+then `query.max-memory=<30GB * number of nodes>`.
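
The two sizing rules above can be worked through with a little shell arithmetic. This is only an illustration: the heap size mirrors the `-Xmx16G` shown in `jvm.config`, while the node count of 4 is an assumed example value.

```shell
#!/bin/sh
JVM_HEAP_GB=16   # matches -Xmx16G in jvm.config
NODES=4          # assumed number of nodes (example value)

# Rule of thumb: per-node query memory is half of the JVM heap.
PER_NODE_GB=$((JVM_HEAP_GB / 2))
# Cluster-wide limit: per-node limit multiplied by the number of nodes.
TOTAL_GB=$((PER_NODE_GB * NODES))

echo "query.max-memory-per-node=${PER_NODE_GB}GB"
echo "query.max-memory=${TOTAL_GB}GB"
```

With these inputs the per-node limit is 8GB and the cluster-wide limit is 32GB. For a highly concurrent workload, lower `PER_NODE_GB` and recompute the total.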
+
+### Worker Configurations
+
+##### Contents of your config.properties
+
+  ```
+  coordinator=false
+  http-server.http.port=8086
+  query.max-memory=50GB
+  query.max-memory-per-node=2GB
+  discovery.uri=http://<coordinator_ip>:8086
+  ```
+
+**Note**: The `jvm.config` and `node.properties` files are the same for all the nodes (workers and coordinator), except that each node must have a different `node.id` (generated by the `uuid` command).
+
+### Catalog Configurations
+
+1. Create a folder named `catalog` in etc directory of presto on all the nodes of the cluster including the coordinator.
+
+##### Configuring Carbondata in Presto
+1. Create a file named `carbondata.properties` in the `catalog` folder and set the required properties on all the nodes.
+
+### Add Plugins
+
+1. Create a directory named `carbondata` in plugin directory of presto.
+2. Copy `carbondata` jars to `plugin/carbondata` directory on all nodes.
+
+### Start Presto Server on all nodes
+
+```
+./presto-server-0.187/bin/launcher start
+```
+To run it as a background process.
+
+```
+./presto-server-0.187/bin/launcher run
+```
+To run it in foreground.
+
+### Start Presto CLI
+```
+./presto
+```
+To connect to carbondata catalog use the following command:
+
+```
+./presto --server <coordinator_ip>:8086 --catalog carbondata --schema <schema_name>
+```
+Execute the following command to ensure the workers are connected.
+
+```
+select * from system.runtime.nodes;
+```
+Now you can use the Presto CLI on the coordinator to query data sources in the catalog using the Presto workers.
+
+List the available schemas (databases)
+
+```
+show schemas;
+```
+
+Select the schema where the CarbonData table resides
+
+```
+use carbonschema;
+```
+
+List the available tables
+
+```
+show tables;
+```
+
+Query from the available tables
+
+```
+select * from carbon_table;
+```
+
+**Note:** Tables must be created and data loaded before executing queries, because CarbonData tables cannot be created from this interface.
+

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/s3-guide.md
----------------------------------------------------------------------
diff --git a/docs/s3-guide.md b/docs/s3-guide.md
index 7e989ac..a2e5f07 100644
--- a/docs/s3-guide.md
+++ b/docs/s3-guide.md
@@ -46,7 +46,7 @@ For example:
 CREATE TABLE IF NOT EXISTS db1.table1(col1 string, col2 int) STORED AS carbondata LOCATION 's3a://mybucket/carbonstore'
 ``` 
 
-For more details on create table, Refer [data-management-on-carbondata](./data-management-on-carbondata.md#create-table)
+For more details on create table, refer to [DDL of CarbonData](ddl-of-carbondata.md#create-table)
 
 # Authentication
 
@@ -84,6 +84,7 @@ sparkSession.sparkContext.hadoopConfiguration.set("fs.s3a.access.key","456")
 
 1. Object Storage like S3 does not support file leasing mechanism(supported by HDFS) that is 
 required to take locks which ensure consistency between concurrent operations therefore, it is 
-recommended to set the configurable lock path property([carbon.lock.path](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md#miscellaneous-configuration))
+recommended to set the configurable lock path property([carbon.lock.path](./configuration-parameters.md#system-configuration))
  to a HDFS directory.
 2. Concurrent data manipulation operations are not supported. Object stores follow eventual consistency semantics, i.e., any put request might take some time to reflect when trying to list. This behaviour causes the data read is always not consistent or not the latest.
+

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/sdk-guide.md
----------------------------------------------------------------------
diff --git a/docs/sdk-guide.md b/docs/sdk-guide.md
index 7ed8fc2..d786406 100644
--- a/docs/sdk-guide.md
+++ b/docs/sdk-guide.md
@@ -7,7 +7,7 @@
     the License.  You may obtain a copy of the License at
 
       http://www.apache.org/licenses/LICENSE-2.0
-
+    
     Unless required by applicable law or agreed to in writing, software 
     distributed under the License is distributed on an "AS IS" BASIS, 
     WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -16,8 +16,16 @@
 -->
 
 # SDK Guide
-In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader.
+
+CarbonData provides SDK to facilitate
+
+1. [Writing carbondata files from other application which does not use Spark](#sdk-writer)
+2. [Reading carbondata files from other application which does not use Spark](#sdk-reader)
+
 # SDK Writer
+
+In the carbon jars package, there exists a carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader.
+
 This SDK writer, writes carbondata file and carbonindex file at a given path.
 External client can make use of this writer to convert other format data or live data to create carbondata and index files.
 These SDK writer output contains just a carbondata and carbonindex files. No metadata folder will be present.
@@ -865,4 +873,5 @@ public String getProperty(String key);
 */
 public String getProperty(String key, String defaultValue);
 ```
-Reference : [list of carbon properties](http://carbondata.apache.org/configuration-parameters.html)
+Reference : [list of carbon properties](./configuration-parameters.md)
+

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6e50c1c6/docs/segment-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/segment-management-on-carbondata.md b/docs/segment-management-on-carbondata.md
new file mode 100644
index 0000000..fe0cbd4
--- /dev/null
+++ b/docs/segment-management-on-carbondata.md
@@ -0,0 +1,142 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more 
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership. 
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with 
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software 
+    distributed under the License is distributed on an "AS IS" BASIS, 
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and 
+    limitations under the License.
+-->
+
+
+## SEGMENT MANAGEMENT
+
+Each load into CarbonData is written into a separate folder called a segment. Segments are a powerful
+concept that helps maintain data consistency and simplifies transaction management. CarbonData provides DML (Data Manipulation Language) commands to maintain the segments.
+
+- [Show Segments](#show-segment)
+- [Delete Segment by ID](#delete-segment-by-id)
+- [Delete Segment by Date](#delete-segment-by-date)
+- [Query Data with Specified Segments](#query-data-with-specified-segments)
+
+### SHOW SEGMENT
+
+  This command is used to list the segments of CarbonData table.
+
+  ```
+  SHOW [HISTORY] SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments
+  ```
+
+  Example:
+  Show visible segments
+  ```
+  SHOW SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 4
+  ```
+  Show all segments, including invisible segments
+  ```
+  SHOW HISTORY SEGMENTS FOR TABLE CarbonDatabase.CarbonTable LIMIT 4
+  ```
+
+### DELETE SEGMENT BY ID
+
+  This command is used to delete segment by using the segment ID. Each segment has a unique segment ID associated with it. 
+  Using this segment ID, you can remove the segment.
+
+  The following command lists the segments along with their segment IDs.
+
+  ```
+  SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments
+  ```
+
+  After you retrieve the segment ID of the segment that you want to delete, execute the following command to delete the selected segment.
+
+  ```
+  DELETE FROM TABLE [db_name.]table_name WHERE SEGMENT.ID IN (segment_id1, segment_id2, ...)
+  ```
+
+  Example:
+
+  ```
+  DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0)
+  DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0,5,8)
+  ```
+
+### DELETE SEGMENT BY DATE
+
+  This command deletes CarbonData segment(s) from the store based on the date provided by the user in the DML command.
+  Segments created before the specified date are removed from the store.
+
+  ```
+  DELETE FROM TABLE [db_name.]table_name WHERE SEGMENT.STARTTIME BEFORE DATE_VALUE
+  ```
+
+  Example:
+  ```
+  DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' 
+  ```
+
+### QUERY DATA WITH SPECIFIED SEGMENTS
+
+  This command is used to read data from specified segments during CarbonScan.
+
+  Get the Segment ID:
+  ```
+  SHOW SEGMENTS FOR TABLE [db_name.]table_name LIMIT number_of_segments
+  ```
+
+  Set the segment IDs for table
+  ```
+  SET carbon.input.segments.<database_name>.<table_name> = <list of segment IDs>
+  ```
+
+  **NOTE:**
+  carbon.input.segments: Specifies the segment IDs to be queried. This property allows you to query specified segments of the specified table. The CarbonScan will read data from specified segments only.
+
+  To query specified segments in multi-threaded mode, use `CarbonSession.threadSet` instead of the SET query.
+  ```
+  CarbonSession.threadSet ("carbon.input.segments.<database_name>.<table_name>","<list of segment IDs>");
+  ```
+
+  Reset the segment IDs
+  ```
+  SET carbon.input.segments.<database_name>.<table_name> = *;
+  ```
+
+  To reset the segment IDs in multi-threaded mode, use `CarbonSession.threadSet` instead of the SET query.
+  ```
+  CarbonSession.threadSet ("carbon.input.segments.<database_name>.<table_name>","*");
+  ```
+
+  **Examples:**
+
+  * Example to show the list of segment IDs, segment status, and other required details, and then specify the list of segments to be read.
+
+  ```
+  SHOW SEGMENTS FOR TABLE db.carbontable1;
+  
+  SET carbon.input.segments.db.carbontable1 = 1,3,9;
+  ```
+
+  * Example to query with segments reading in multi threading mode:
+
+  ```
+  CarbonSession.threadSet ("carbon.input.segments.db.carbontable_Multi_Thread","1,3");
+  ```
+
+  * Example of using `threadSet` in a multi-threaded environment (the following shows how it is used in Scala code):
+
+  ```
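+  import scala.concurrent.Future
+  import scala.concurrent.ExecutionContext.Implicits.global
+  import org.apache.spark.sql.CarbonSession
+
+  // `spark` is assumed to be an existing CarbonSession.
+  def main(args: Array[String]): Unit = {
+    Future {
+      // The setting applies only to the thread that runs this Future.
+      CarbonSession.threadSet("carbon.input.segments.db.carbontable_Multi_Thread", "1")
+      spark.sql("select count(empno) from db.carbontable_Multi_Thread").show()
+    }
+  }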
+  ```