Posted to commits@carbondata.apache.org by ra...@apache.org on 2017/03/23 04:33:24 UTC

[1/2] incubator-carbondata git commit: Update file structure info as per V3 format definition

Repository: incubator-carbondata
Updated Branches:
  refs/heads/master 9f38a3dde -> e441ab0d4


Update file structure info as per V3 format definition


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/b128868c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/b128868c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/b128868c

Branch: refs/heads/master
Commit: b128868cd80aab2704b7e9958af6bce67739c92a
Parents: 9f38a3d
Author: chenliang613 <ch...@huawei.com>
Authored: Wed Mar 22 16:32:06 2017 +0530
Committer: ravipesala <ra...@gmail.com>
Committed: Thu Mar 23 10:02:32 2017 +0530

----------------------------------------------------------------------
 docs/file-structure-of-carbondata.md           |  21 ++++++++++++++------
 docs/images/carbon_data_file_structure_new.png | Bin 78374 -> 9477 bytes
 docs/images/carbon_data_format_new.png         | Bin 73708 -> 35510 bytes
 3 files changed, 15 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/b128868c/docs/file-structure-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/file-structure-of-carbondata.md b/docs/file-structure-of-carbondata.md
index 63e34ec..bfbcee4 100644
--- a/docs/file-structure-of-carbondata.md
+++ b/docs/file-structure-of-carbondata.md
@@ -17,20 +17,29 @@
     under the License.
 -->
 
-#  CarbonData File Structure
+# CarbonData File Structure
 
-CarbonData files contain groups of data called blocklets, along with all required information like schema, offsets and indices etc, in a file footer, co-located in HDFS.
+CarbonData files contain groups of data called blocklets, along with all required information such as schema, offsets, and indices, in a file header and footer, co-located in HDFS.
 
 The file footer can be read once to build the indices in memory, which can be utilized for optimizing the scans and processing for all subsequent queries.
 
-Each blocklet in the file is further divided into chunks of data called data chunks. Each data chunk is organized either in columnar format or row format, and stores the data of either a single column or a set of columns. All blocklets in a file contain the same number and type of data chunks.
+### Understanding CarbonData File Structure
+* Block: Same as an HDFS block. CarbonData creates one file for each data block, and the user can specify TABLE_BLOCKSIZE when creating the table. Each file contains a File Header, Blocklets, and a File Footer.
 
 ![CarbonData File Structure](../docs/images/carbon_data_file_structure_new.png?raw=true)
 
-Each data chunk contains multiple groups of data called as pages. There are three types of pages.
+* File Header: Contains the CarbonData file version number, the list of column schemas, and the schema update timestamp.
+* File Footer: Contains the number of rows, segment info, and the info and index of all blocklets; see the diagram below for details.
+* Blocklet: Rows are grouped to form a blocklet. The blocklet size is configurable, with a default of 64 MB. Each blocklet contains a Column Page Group for each column.
+* Column Page Group: The data of one column, further divided into pages. It is guaranteed to be contiguous in the file.
+* Page: Holds the data of one column; the number of rows per page is fixed at 32000.
 
-* Data Page: Contains the encoded data of a column/group of columns.
+![CarbonData File Format](../docs/images/carbon_data_format_new.png?raw=true)
+
+### Each page contains three types of data
+* Data Page: Contains the encoded data of a column.
 * Row ID Page (optional): Contains the row ID mappings used when the data page is stored as an inverted index.
 * RLE Page (optional): Contains additional metadata used when the data page is RLE coded.
 
-![CarbonData File Format](../docs/images/carbon_data_format_new.png?raw=true)
+
+

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/b128868c/docs/images/carbon_data_file_structure_new.png
----------------------------------------------------------------------
diff --git a/docs/images/carbon_data_file_structure_new.png b/docs/images/carbon_data_file_structure_new.png
index 3f9241b..1c6f22b 100644
Binary files a/docs/images/carbon_data_file_structure_new.png and b/docs/images/carbon_data_file_structure_new.png differ

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/b128868c/docs/images/carbon_data_format_new.png
----------------------------------------------------------------------
diff --git a/docs/images/carbon_data_format_new.png b/docs/images/carbon_data_format_new.png
index 9d0b194..f0fc553 100644
Binary files a/docs/images/carbon_data_format_new.png and b/docs/images/carbon_data_format_new.png differ
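
Editorial illustration (not part of the commit): the updated doc states that each page holds the data of one column with the row count fixed at 32000. The Python sketch below shows how the rows of a blocklet would split into pages under that rule. The function names are hypothetical, and the handling of a final partial page is an assumption; the doc does not say how the last page is filled, and this is not CarbonData's actual writer logic.

```python
import math

ROWS_PER_PAGE = 32000  # page row count stated in the updated doc


def pages_per_column(rows_in_blocklet: int) -> int:
    """Return how many pages one column needs for the given row count.

    Assumption: the last page may hold fewer than ROWS_PER_PAGE rows.
    """
    if rows_in_blocklet < 0:
        raise ValueError("row count must be non-negative")
    return math.ceil(rows_in_blocklet / ROWS_PER_PAGE)


def page_row_ranges(rows_in_blocklet: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) row ranges, one per page."""
    return [
        (start, min(start + ROWS_PER_PAGE, rows_in_blocklet))
        for start in range(0, rows_in_blocklet, ROWS_PER_PAGE)
    ]


if __name__ == "__main__":
    # Hypothetical blocklet of 70000 rows.
    print(pages_per_column(70000))
    print(page_row_ranges(70000))
```

Because every column in a blocklet covers the same rows, each Column Page Group in that blocklet would contain the same number of pages under this model.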


[2/2] incubator-carbondata git commit: [CARBONDATA-804] Update file structure info as per V3 format definition This closes #684

Posted by ra...@apache.org.
[CARBONDATA-804] Update file structure info as per V3 format definition This closes #684


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/e441ab0d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/e441ab0d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/e441ab0d

Branch: refs/heads/master
Commit: e441ab0d45e54069aa08f555d0aed8253b72290e
Parents: 9f38a3d b128868
Author: ravipesala <ra...@gmail.com>
Authored: Thu Mar 23 10:03:06 2017 +0530
Committer: ravipesala <ra...@gmail.com>
Committed: Thu Mar 23 10:03:06 2017 +0530

----------------------------------------------------------------------
 docs/file-structure-of-carbondata.md           |  21 ++++++++++++++------
 docs/images/carbon_data_file_structure_new.png | Bin 78374 -> 9477 bytes
 docs/images/carbon_data_format_new.png         | Bin 73708 -> 35510 bytes
 3 files changed, 15 insertions(+), 6 deletions(-)
----------------------------------------------------------------------