Posted to commits@kylin.apache.org by sh...@apache.org on 2015/10/26 10:08:00 UTC

[14/45] incubator-kylin git commit: Small update on blog for hybrid model

Small update on blog for hybrid model


Project: http://git-wip-us.apache.org/repos/asf/incubator-kylin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kylin/commit/e924a2db
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kylin/tree/e924a2db
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kylin/diff/e924a2db

Branch: refs/heads/master
Commit: e924a2db0e9025d9e370403579e4f789ec4412f4
Parents: 38f4fd0
Author: shaofengshi <sh...@apache.org>
Authored: Sat Sep 26 22:22:08 2015 +0800
Committer: shaofengshi <sh...@apache.org>
Committed: Mon Sep 28 09:09:03 2015 +0800

----------------------------------------------------------------------
 website/_posts/blog/2015-09-22-hybrid-model.md | 56 ++++++++++++---------
 1 file changed, 32 insertions(+), 24 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kylin/blob/e924a2db/website/_posts/blog/2015-09-22-hybrid-model.md
----------------------------------------------------------------------
diff --git a/website/_posts/blog/2015-09-22-hybrid-model.md b/website/_posts/blog/2015-09-22-hybrid-model.md
index 1e0f4c7..d966fa5 100644
--- a/website/_posts/blog/2015-09-22-hybrid-model.md
+++ b/website/_posts/blog/2015-09-22-hybrid-model.md
@@ -6,21 +6,22 @@ author: Shaofeng Shi
 categories: blog
 ---
 
-**Apache Kylin v1.0 introduces a new realization "hybrid model" (also called "dynamic model"); This post introduces the concept and how to define it.**
+**Apache Kylin v1.0 introduces a new realization, the "hybrid model" (also called "dynamic model"). This post introduces the concept and explains how to create a hybrid instance.**
 
 # Problem
 
-For incoming SQL queries, Kylin picks ONE (and only ONE) realization to serve the query; Before the "hybrid", there is only one type of realization open for user: Cube. That to say, only 1 Cube would be selected to answer a query;
+For each incoming SQL query, Kylin picks one (and only one) realization to serve it. Before the "hybrid", only one type of realization was open to users: the Cube. That is to say, only one Cube would be selected to answer a query.
 
 Now let's start with a sample case. Assume a user has a Cube called "Cube_V1" that has been built for a couple of months. Now the user wants to add new dimensions or metrics to fulfill a business need, so they create a new Cube named "Cube_V2".
 
-Due to some reason user wants to keep the data of "Cube_V1", and expects to build "Cube_V2" from the end date of "Cube_V1"; The possible reasons include:
+For some reason the user wants to keep "Cube_V1" and expects to build "Cube_V2" from the end date of "Cube_V1". Possible reasons include:
 
 * The historical source data has been dropped from Hadoop, so it is not possible to build "Cube_V2" from the very beginning;
 * The cube is large, and rebuilding it takes a very long time;
-* New dimension/metrics is only feasible for the new date, or user feels fine if they were absent for old cube; etc.
+* The new dimensions/metrics are only available or applicable from a certain date onward;
+* The user accepts that the result is empty for older dates when a query uses the new dimensions/metrics.
 
-For some queries that don't use the new measure and metrics, user hopes both "Cube_V1" and "Cube_V2" can be scanned to get a full result, such as "select count(*)...", "select sum(price)..."; With such a background, the "hybrid model" is introduced in Kylin;
+For queries against the common dimensions/metrics, the user expects both "Cube_V1" and "Cube_V2" to be scanned to get a full result set. Against this background, the "hybrid model" was introduced to solve the problem.
 
 ## Hybrid Model
 
@@ -28,11 +29,11 @@ Hybrid model is a new realization which is a composite of one or multiple other
 
 ![]( /images/blog/hybrid-model.png)
 
-Hybrid doesn't have its real storage; It is just like a virtual database view over tables; It acts as a delegator who delegates the requests to its children realizations.
+A hybrid doesn't have real storage of its own; it is like a virtual database view over tables. A hybrid instance acts as a delegator that forwards requests to its child realizations and then unions the results it gets back from them.
 
-## How to add a Hybrid model
+## How to add a hybrid instance
 
-As there is no UI for creating/editing hybrid model, if have the need, you need manually edit Kylin metadata;
+So far there is no UI for creating/editing a hybrid; if you need one, you have to edit the Kylin metadata manually.
 
 ### Step 1: Take a backup of kylin metadata store 
 
@@ -45,20 +46,20 @@ $KYLIN_HOME/bin/metastore.sh backup
 
 A backup folder will be created, assume it is $KYLIN_HOME/metadata_backup/2015-09-25/
  
-### Step 2: Create sub-folder "hybrid" in the metadata folder,
+### Step 2: Create sub-folder "hybrid"
 
 ```
 mkdir -p $KYLIN_HOME/metadata_backup/2015-09-25/hybrid
 ```
 
-### Step 3: Create a hybrid json file: 
+### Step 3: Create a hybrid instance json file: 
 
 ```
 vi $KYLIN_HOME/metadata_backup/2015-09-25/hybrid/my_hybrid.json
 
 ```
 
-Input content like this:
+Enter content like the following; the "name" and "uuid" need to be unique:
 
 ```
 {
@@ -80,7 +81,7 @@ Input content like this:
 Here "Cube_V1" and "Cube_V2" are the cubes that you want to combine.
 
 
-### Step 4: Add hybrid model to project
+### Step 4: Add hybrid instance to project
 
 Open project json file (for example project "default") with text editor:
 
@@ -89,7 +90,7 @@ vi $KYLIN_HOME/metadata_backup/2015-09-25/project/default.json
 
 ```
 
-In the "realizations" array, add one entry like:
+In the "realizations" array, add one entry like the one below; the "type" needs to be "HYBRID", and "realization" is the name of the hybrid instance:
 
 ```
     {
@@ -105,24 +106,31 @@ In the "realizations" array, add one entry like:
   $KYLIN_HOME/bin/metastore.sh restore $KYLIN_HOME/metadata_backup/2015-09-25/
 
 ```
+Please note that the "restore" action uploads the metadata from the local folder to the remote HBase store, which may overwrite changes made remotely. So do this only when there is no metadata change from the Kylin server during this period (no building job, no cube creation/update, etc.), or copy only the changed files to an empty local folder before running "restore".
 
 ### Step 6: Reload metadata
 
-Restart Kylin server, or click "Reload metadata" in the "Admin" tab on Kylin web UI to load the changes; Ideally the hybrid will start to work; You can do some verifications.
+Restart the Kylin server, or click "Reload metadata" in the "Admin" tab of the Kylin web UI to load the changes. Ideally the hybrid will then start to work; you can verify it by running some SQL queries.
 
 ## FAQ:
 
-**Question 1**: when will hybrid be selected to serve query?
-If one of the cube can answer the query, the hybrid which has it as a child will be selected;
+**Question 1**: When will a hybrid be selected to answer a SQL query?
+If one of its underlying cubes can answer the query, the hybrid will be selected;
 
-**Question 2**: how hybrid to answer the query?
-Hybrid will delegate the query to each of its child realization (if it is capable); And then return all the results to query engine; Query engine will aggregate before return to user;
+**Question 2**: How does a hybrid answer the query?
+The hybrid delegates the query to each of its child realizations. If a child cube is capable of answering the query (it matches all dimensions/metrics), it returns its results to the hybrid; otherwise it is skipped. Finally, the query engine aggregates the data from the hybrid before returning it to the user.
 
-**Question 3**: will hybrid check the data duplication?
-No; it depends on you to ensure the cubes in a hybrid don't have date/time range duplication; For example, the "Cube_V1" is ended at 2015-9-20 (including), the "Cube_V2" should start from 2015-9-21 or later;
+**Question 3**: Will a hybrid check for date/time duplication?
+No; it is up to the user to ensure that the cubes in a hybrid don't have overlapping date/time ranges. For example, if "Cube_V1" ends at 2015-9-20 (exclusive), "Cube_V2" should start from 2015-9-20 (inclusive).
 
-**Question 4**: will hybrid restrict the children cubes having the same data model?
-No; hybrid doesn't check the cube's fact/lookup tables and join conditions at all;
+**Question 4**: Will a hybrid require its child cubes to have the same data model?
+No; to provide as much flexibility as possible, the hybrid doesn't check whether the child cubes' fact/lookup tables and join conditions are the same. Users should understand what they're doing to avoid unexpected behavior.
 
-**Question 5**: can hybrid have another hybrid as child?
-No; didn't see the need; so far it assumes all children are Cubes;
+**Question 5**: Can a hybrid have another hybrid as a child?
+No; we don't see the need. So far it assumes all children are Cubes.
+
+**Question 6**: Can I use a hybrid to join multiple cubes?
+No; the purpose of a hybrid is to consolidate a historical cube and a new cube, something like a "union", not a "join".
+
+**Question 7**: If a child cube is disabled, will it be scanned via the hybrid?
+No; a hybrid instance checks the child realization's status before sending a query to it, so a disabled cube will not be scanned.
\ No newline at end of file
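
Question 3 in the post leaves it to the user to keep the child cubes' date ranges from overlapping, with the end date exclusive and the start date inclusive. A minimal sketch of such a check might look like the following. The function name `ranges_overlap` and every date other than the 2015-09-20 boundary are illustrative assumptions; nothing here is part of Kylin itself, and the cube ranges would have to be looked up by hand.

```python
from datetime import date


def ranges_overlap(a_start, a_end, b_start, b_end):
    """Return True if half-open ranges [a_start, a_end) and [b_start, b_end) share a day."""
    return a_start < b_end and b_start < a_end


# Hypothetical ranges: only the 2015-09-20 boundary is taken from the post.
cube_v1 = (date(2015, 1, 1), date(2015, 9, 20))   # end date is exclusive
cube_v2 = (date(2015, 9, 20), date(2015, 10, 1))  # start date is inclusive

if ranges_overlap(*cube_v1, *cube_v2):
    print("Cube_V1 and Cube_V2 overlap; a hybrid over them may return duplicated data")
else:
    print("Cube_V1 and Cube_V2 do not overlap")
```

Because the ranges are treated as half-open, a "Cube_V2" that starts on the same day "Cube_V1" ends is not reported as an overlap, which matches the including/excluding convention described in Question 3.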