Posted to commits@kylin.apache.org by li...@apache.org on 2016/03/12 14:18:11 UTC

svn commit: r1734700 [14/14] - in /kylin/site: ./ blog/2016/02/03/streaming-cubing/ blog/2016/02/18/new-aggregation-group/ development/ docs/ docs/gettingstarted/ docs/howto/ docs15/ docs15/gettingstarted/ docs15/howto/ docs15/install/ docs15/tutorial/

Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1734700&r1=1734699&r2=1734700&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Sat Mar 12 13:18:10 2016
@@ -19,8 +19,8 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Thu, 10 Mar 2016 12:01:13 -0800</pubDate>
-    <lastBuildDate>Thu, 10 Mar 2016 12:01:13 -0800</lastBuildDate>
+    <pubDate>Sat, 12 Mar 2016 13:16:43 -0800</pubDate>
+    <lastBuildDate>Sat, 12 Mar 2016 13:16:43 -0800</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>
@@ -29,7 +29,7 @@
 
 &lt;h2 id=&quot;abstract&quot;&gt;Abstract&lt;/h2&gt;
 
-&lt;p&gt;The curse of dimensionality is an infamous problem for all OLAP engines based on pre-calculation. In versions prior to 2.1, Kylin tried to address the problem with some simple techniques, which relieved it to some degree. During our open source practice, we found that these techniques lacked systematic design thinking and were incapable of addressing many common issues. In Kylin 2.1 we redesigned the aggregation group mechanism to make it better serve all kinds of cube design scenarios.&lt;/p&gt;
+&lt;p&gt;The curse of dimensionality is an infamous problem for all OLAP engines based on pre-calculation. In versions prior to v1.5, Kylin tried to address the problem with some simple techniques, which relieved it to some degree. During our open source practice, we found that these techniques lacked systematic design thinking and were incapable of addressing many common issues. In Kylin v1.5 we redesigned the aggregation group mechanism to make it better serve all kinds of cube design scenarios.&lt;/p&gt;
 
 &lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
 
@@ -39,7 +39,7 @@
 
 &lt;p&gt;First, we can remove dimensions that do NOT necessarily have to be dimensions. For example, imagine a date lookup table where cal_dt is the PK column and there are lots of derived columns like week_begin_dt and month_begin_dt. Even though analysts need week_begin_dt as a dimension, we can prune it because it can always be calculated from dimension cal_dt. This is the “derived” optimization.&lt;/p&gt;
 
-&lt;p&gt;Second, some of the combinations between dimensions can be pruned. This is the main topic of this article, and let’s call it “combination pruning”. For example, if a dimension is specified as “mandatory”, then all of the combinations without that dimension can be pruned. If dimensions A, B, C form a “hierarchy” relation, then only combinations with A, AB or ABC shall remain. Prior to 2.1, Kylin also had an “aggregation group” concept, which also served combination pruning. However, it was poorly documented and hard to understand (I also found it difficult to explain). Anyway, we’ll skip it as we will redefine what “aggregation group” really is.&lt;/p&gt;
+&lt;p&gt;Second, some of the combinations between dimensions can be pruned. This is the main topic of this article, and let’s call it “combination pruning”. For example, if a dimension is specified as “mandatory”, then all of the combinations without that dimension can be pruned. If dimensions A, B, C form a “hierarchy” relation, then only combinations with A, AB or ABC shall remain. Prior to v1.5, Kylin also had an “aggregation group” concept, which also served combination pruning. However, it was poorly documented and hard to understand (I also found it difficult to explain). Anyway, we’ll skip it as we will redefine what “aggregation group” really is.&lt;/p&gt;
 
 &lt;p&gt;During our open source practice we found some significant drawbacks in the original combination pruning techniques. Firstly, these techniques are isolated rather than systematically designed. Secondly, the original aggregation group is so poorly designed and documented that it is hardly used outside eBay. Thirdly, and most importantly, it’s not expressive enough in terms of describing semantics.&lt;/p&gt;
 
@@ -95,15 +95,15 @@
   &lt;/tbody&gt;
 &lt;/table&gt;
 
-&lt;p&gt;Unfortunately, there is no way to express such pruning settings with the semantic tools that existed prior to Kylin 2.1.&lt;/p&gt;
+&lt;p&gt;Unfortunately, there is no way to express such pruning settings with the semantic tools that existed prior to Kylin v1.5.&lt;/p&gt;
 
 &lt;h2 id=&quot;new-aggregation-group-design&quot;&gt;New Aggregation Group Design&lt;/h2&gt;
 
-&lt;p&gt;In Kylin 2.1 we redesigned the aggregation group mechanism in the JIRA issue https://issues.apache.org/jira/browse/KYLIN-242. The issue was named “Kylin Cuboid Whitelist” because the new design even enables the cube designer to specify the expected cuboids by keeping a whitelist. Imagine how expressive it can be!&lt;/p&gt;
+&lt;p&gt;In Kylin v1.5 we redesigned the aggregation group mechanism in the JIRA issue https://issues.apache.org/jira/browse/KYLIN-242. The issue was named “Kylin Cuboid Whitelist” because the new design even enables the cube designer to specify the expected cuboids by keeping a whitelist. Imagine how expressive it can be!&lt;/p&gt;
 
 &lt;p&gt;In the new design, an aggregation group (abbr. AGG) is defined as a cluster of cuboids that are subject to shared rules. The cube designer can define one or more AGGs for a cube, and the union of the cuboids contributed by all AGGs makes up the valid combinations for the cube. Notice that a cuboid is allowed to appear in multiple AGGs, and it will only be computed once during cube building.&lt;/p&gt;
 
-&lt;p&gt;If you look into the internals of AGG ( https://github.com/apache/kylin/blob/2.x-staging/core-cube/src/main/java/org/apache/kylin/cube/model/AggregationGroup.java) there are two important properties defined: &lt;code class=&quot;highlighter-rouge&quot;&gt;@JsonProperty(&quot;includes&quot;)&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;@JsonProperty(&quot;select_rule&quot;)&lt;/code&gt;.&lt;/p&gt;
+&lt;p&gt;If you look into the internals of AGG (https://github.com/apache/kylin/blob/kylin-1.5.0/core-cube/src/main/java/org/apache/kylin/cube/model/AggregationGroup.java) there are two important properties defined: &lt;code class=&quot;highlighter-rouge&quot;&gt;@JsonProperty(&quot;includes&quot;)&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;@JsonProperty(&quot;select_rule&quot;)&lt;/code&gt;.&lt;/p&gt;
 
 &lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;@JsonProperty(&quot;includes&quot;)&lt;/code&gt;&lt;br /&gt;
 This property is for specifying which dimensions are included in the AGG. The value of the property must be a subset of the complete dimensions. Keep the property minimal by including only necessary dimensions.&lt;/p&gt;
@@ -114,7 +114,7 @@ Select rules are the rules that all vali
 &lt;ul&gt;
   &lt;li&gt;Hierarchy rules, described above&lt;/li&gt;
   &lt;li&gt;Mandatory rule, described above&lt;/li&gt;
-  &lt;li&gt;Joint rules. This is a newly introduced rule. If two or more dimensions are “joint”, then any valid cuboid will either contain none of these dimensions or contain them all. In other words, these dimensions will always be “together”. This is useful when the cube designer is sure some of the dimensions will always be queried together. It is also a nuclear weapon for combination pruning on less-likely-to-use dimensions. Suppose you have 20 dimensions, of which the first 10 are frequently used and the latter 10 are less likely to be used. By declaring the latter 10 dimensions as “joint”, you’re effectively reducing the number of cuboids from 2&lt;sup&gt;20&lt;/sup&gt; to 2&lt;sup&gt;11&lt;/sup&gt;. Actually this is pretty much what the old “aggregation group” mechanism was for. If you were using it prior to Kylin 2.1, our metadata upgrade tool will automatically translate it to joint semantics.&lt;br /&gt;
+  &lt;li&gt;Joint rules. This is a newly introduced rule. If two or more dimensions are “joint”, then any valid cuboid will either contain none of these dimensions or contain them all. In other words, these dimensions will always be “together”. This is useful when the cube designer is sure some of the dimensions will always be queried together. It is also a nuclear weapon for combination pruning on less-likely-to-use dimensions. Suppose you have 20 dimensions, of which the first 10 are frequently used and the latter 10 are less likely to be used. By declaring the latter 10 dimensions as “joint”, you’re effectively reducing the number of cuboids from 2&lt;sup&gt;20&lt;/sup&gt; to 2&lt;sup&gt;11&lt;/sup&gt;. Actually this is pretty much what the old “aggregation group” mechanism was for. If you were using it prior to Kylin v1.5, our metadata upgrade tool will automatically translate it to joint semantics.&lt;br /&gt;
 By flexibly using the new aggregation group you can in theory control which cuboids to compute or skip. This could significantly reduce the computation and storage overhead, especially when the cube is serving a fixed dashboard, which will reproduce SQL queries that only require some specific cuboids. In extreme cases you can configure each AGG to contain only one cuboid, and a handful of AGGs will make up the cuboid whitelist that you need.&lt;/li&gt;
 &lt;/ul&gt;
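
Editorial sketch (not part of the committed feed.xml): the select rules described in the post above can be illustrated with a small Python enumeration. This is a minimal sketch under stated assumptions, not Kylin's implementation, which lives in AggregationGroup.java. The function and parameter names (valid_cuboids, mandatory, hierarchies, joints) are hypothetical, and the empty combination is counted only so that the totals come out as clean powers of two.

```python
from itertools import combinations


def _is_prefix(hierarchy, dims):
    """True if dims contains only a leading prefix of the hierarchy (A, AB, ABC)."""
    present = [d in dims for d in hierarchy]
    # Once a level is missing, no deeper level may be present.
    return present == sorted(present, reverse=True)


def valid_cuboids(includes, mandatory=(), hierarchies=(), joints=()):
    """Enumerate the dimension combinations that one aggregation group allows.

    All names here are hypothetical illustrations of the rules in the post.
    """
    result = set()
    for size in range(0, len(includes) + 1):
        for combo in combinations(includes, size):
            dims = frozenset(combo)
            # Mandatory rule: every mandatory dimension must be present.
            if not set(mandatory) <= dims:
                continue
            # Hierarchy rule: only prefixes of each hierarchy may appear.
            if any(not _is_prefix(h, dims) for h in hierarchies):
                continue
            # Joint rule: joint dimensions appear all together or not at all.
            if any(0 < len(dims & set(j)) < len(j) for j in joints):
                continue
            result.add(dims)
    return result


# Scaled-down version of the 20-dimension example: 6 dimensions, last 3 joint.
dims = ["A", "B", "C", "D", "E", "F"]
unpruned = valid_cuboids(dims)
pruned = valid_cuboids(dims, joints=[["D", "E", "F"]])
print(len(unpruned), len(pruned))  # 64 16

# The same arithmetic for 20 dimensions with the latter 10 joint:
# 10 free dimensions plus one joint unit.
print(2 ** 20, 2 ** (10 + 1))  # 1048576 2048

# A cube's valid cuboids are the union over its AGGs; using a set means a
# cuboid shared by several AGGs is kept only once.
cube_cuboids = valid_cuboids(["A", "B"]) | valid_cuboids(["B", "C"])
```

Representing each cuboid as a frozenset makes the union over AGGs a plain set union, which mirrors the statement that a cuboid appearing in multiple AGGs is computed only once.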
 
@@ -135,9 +135,9 @@ By flexibly using the new aggregation gr
 
 &lt;h2 id=&quot;start-using-it&quot;&gt;Start using it&lt;/h2&gt;
 
-&lt;p&gt;The new aggregation group mechanism should be available in Kylin 2.1. As of today (2016.2.18), Kylin has not released version 2.1 yet. Use it at your own risk by compiling the latest 2.x-staging code branch.&lt;/p&gt;
+&lt;p&gt;The new aggregation group mechanism should be available in Kylin v1.5. As of today (2016.2.18), Kylin has not released v1.5 yet. Use it at your own risk by compiling the latest master code branch.&lt;/p&gt;
 
-&lt;p&gt;Legacy users will need to upgrade their metadata store from Kylin 2.0 to Kylin 2.1. A cube rebuild is not required if you’re upgrading from Kylin 2.0.&lt;/p&gt;
+&lt;p&gt;Legacy users will need to upgrade their metadata store from Kylin v1.4 to Kylin v1.5. A cube rebuild is not required if you’re upgrading from Kylin v1.4.&lt;/p&gt;
 </description>
         <pubDate>Thu, 18 Feb 2016 08:30:00 -0800</pubDate>
         <link>http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/</link>
@@ -150,7 +150,7 @@ By flexibly using the new aggregation gr
     
       <item>
         <title>Streaming cubing (Prototype)</title>
-        <description>&lt;p&gt;One of the most important features in 2.x branches is streaming cubing, which enables OLAP analysis on streaming data. Streaming cubing delivers faster insights on the data to help make business decisions more promptly. Even though there are already many real-time analysis tools in the open source community, Kylin Streaming cubing still differs from them in several ways:&lt;/p&gt;
+        <description>&lt;p&gt;One of the most important features in v1.5 is streaming cubing, which enables OLAP analysis on streaming data. Streaming cubing delivers faster insights on the data to help make business decisions more promptly. Even though there are already many real-time analysis tools in the open source community, Kylin Streaming cubing still differs from them in several ways:&lt;/p&gt;
 
 &lt;p&gt;Firstly, Kylin Streaming Cubing aligns with Kylin traditional cubing to provide a unified ANSI SQL interface. Actually Kylin Streaming shares the storage engine and query engine with traditional Kylin cubes, so in theory all of the optimization techniques to save storage and speed up query performance can also be applied to streaming cubes. Besides, all the supported aggregations/filters/UDFs still work for streaming cubes. By unifying the storage engine and query engine we are also freed from doubling the maintenance work.&lt;/p&gt;
 
@@ -168,7 +168,7 @@ By flexibly using the new aggregation gr
   &lt;li&gt;Job Scheduling Module to trigger Streaming Batch Ingestion. Kylin does not put much effort into job scheduling, and streaming cubing is no exception. Currently we provide a simple implementation based on Linux Crontab.&lt;/li&gt;
 &lt;/ul&gt;
 
-&lt;p&gt;We’ll publish more detailed documents on how to use Kylin Streaming soon. In the latest 2.x branch we are also working on more sophisticated load balancing schemes for streaming cubing. Please stay tuned.&lt;/p&gt;
+&lt;p&gt;We’ll publish more detailed documents on how to use Kylin Streaming soon. In the latest v1.5 we are also working on more sophisticated load balancing schemes for streaming cubing. Please stay tuned.&lt;/p&gt;
 
 </description>
         <pubDate>Wed, 03 Feb 2016 08:30:00 -0800</pubDate>