Posted to commits@kylin.apache.org by li...@apache.org on 2016/08/04 06:02:20 UTC

svn commit: r1755142 [2/2] - in /kylin/site: blog/2016/08/ blog/2016/08/01/ blog/2016/08/01/count-distinct-in-kylin/ blog/2016/08/01/count-distinct-in-kylin/index.html blog/index.html feed.xml

Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1755142&r1=1755141&r2=1755142&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Thu Aug  4 06:02:20 2016
@@ -19,11 +19,118 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Fri, 29 Jul 2016 06:59:27 -0700</pubDate>
-    <lastBuildDate>Fri, 29 Jul 2016 06:59:27 -0700</lastBuildDate>
+    <pubDate>Thu, 04 Aug 2016 06:59:16 -0700</pubDate>
+    <lastBuildDate>Thu, 04 Aug 2016 06:59:16 -0700</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>
+        <title>Use Count Distinct in Apache Kylin</title>
+        <description>&lt;p&gt;Since v1.5.3&lt;/p&gt;
+
+&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
+&lt;p&gt;Count distinct is a common measure in OLAP analysis, typically used for metrics such as UV (unique visitors). Apache Kylin offers two kinds of count distinct, approximate and precise, which differ in resource usage and performance.&lt;/p&gt;
+
+&lt;h2 id=&quot;approximately-count-distinct&quot;&gt;Approximate Count Distinct&lt;/h2&gt;
+&lt;p&gt;Apache Kylin implements approximate count distinct with the HyperLogLog algorithm and offers several precision levels, with error rates ranging from 9.75% down to 1.22%. &lt;br /&gt;
+The size of a measure result has a theoretical upper limit of 2^N bytes. At the maximum precision, N=16, the upper limit is 64 KB and the maximum error rate is 1.22%. &lt;br /&gt;
+The advantages of this implementation are fast calculation and low storage cost, but it can’t be used when exact results are required.&lt;/p&gt;
+
+&lt;h2 id=&quot;precisely-count-distinct&quot;&gt;Precise Count Distinct&lt;/h2&gt;
+&lt;p&gt;Apache Kylin also implements precise count distinct based on a bitmap. For data of type tinyint (byte), smallint (short) and int, the value is projected into the bitmap directly. For data of type long, string and others, the value is encoded as a string into a dictionary, and the dictionary ID is projected into the bitmap.&lt;br /&gt;
+The measure result is the serialized bitmap, not just the count value. This ensures that the result is always correct under any roll-up, even across segments.&lt;br /&gt;
+The advantage of this implementation is an exact result with no error, but it needs more storage. A single result may be hundreds of MB when the distinct count runs into the millions.&lt;/p&gt;
+
+&lt;h2 id=&quot;global-dictionary&quot;&gt;Global Dictionary&lt;/h2&gt;
+&lt;p&gt;By default, Apache Kylin encodes values into a dictionary at the segment level. That means the same value may be encoded into different IDs in different segments, so the result of precise count distinct across segments may be incorrect.&lt;br /&gt;
+To resolve this problem, we introduced the Global Dictionary, which ensures that the same value is always encoded into the same ID across segments. Its capacity is also much larger, supporting up to 2G values in one dictionary. It can also replace the default dictionary, which is limited to 5M values.&lt;br /&gt;
+The current version has no UI for the global dictionary yet, so the cube desc JSON should be modified to enable it:&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&quot;dictionaries&quot;: [
+    {
+      &quot;column&quot;: &quot;SUCPAY_USERID&quot;,
+      &quot;reuse&quot;: &quot;USER_ID&quot;,
+      &quot;builder&quot;: &quot;org.apache.kylin.dict.GlobalDictionaryBuilder&quot;
+    }
+]
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;column&lt;/code&gt; is the column to be encoded, and &lt;code class=&quot;highlighter-rouge&quot;&gt;builder&lt;/code&gt; specifies the dictionary builder; only &lt;code class=&quot;highlighter-rouge&quot;&gt;org.apache.kylin.dict.GlobalDictionaryBuilder&lt;/code&gt; is available for now.&lt;br /&gt;
+&lt;code class=&quot;highlighter-rouge&quot;&gt;reuse&lt;/code&gt; is used to optimize the dictionaries of multiple columns that are based on one dataset; please refer to the next section, ‘Example’, for more details.&lt;br /&gt;
+The global dictionary can’t be used for dimension encoding for now, which means that if one column is used as both a dimension and a count distinct measure in one cube, its dimension encoding must be something other than dict.&lt;/p&gt;
+
+&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
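+&lt;p&gt;Here is some example data:&lt;/p&gt;
+&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;DT&lt;/th&gt;&lt;th&gt;USER_ID&lt;/th&gt;&lt;th&gt;FLAG1&lt;/th&gt;&lt;th&gt;FLAG2&lt;/th&gt;&lt;th&gt;USER_ID_FLAG1&lt;/th&gt;&lt;th&gt;USER_ID_FLAG2&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
+&lt;tbody&gt;
+&lt;tr&gt;&lt;td&gt;2016-06-08&lt;/td&gt;&lt;td&gt;AAA&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;AAA&lt;/td&gt;&lt;td&gt;AAA&lt;/td&gt;&lt;/tr&gt;
+&lt;tr&gt;&lt;td&gt;2016-06-08&lt;/td&gt;&lt;td&gt;BBB&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;BBB&lt;/td&gt;&lt;td&gt;BBB&lt;/td&gt;&lt;/tr&gt;
+&lt;tr&gt;&lt;td&gt;2016-06-08&lt;/td&gt;&lt;td&gt;CCC&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;NULL&lt;/td&gt;&lt;td&gt;CCC&lt;/td&gt;&lt;/tr&gt;
+&lt;tr&gt;&lt;td&gt;2016-06-09&lt;/td&gt;&lt;td&gt;AAA&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;NULL&lt;/td&gt;&lt;td&gt;AAA&lt;/td&gt;&lt;/tr&gt;
+&lt;tr&gt;&lt;td&gt;2016-06-09&lt;/td&gt;&lt;td&gt;CCC&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;CCC&lt;/td&gt;&lt;td&gt;NULL&lt;/td&gt;&lt;/tr&gt;
+&lt;tr&gt;&lt;td&gt;2016-06-10&lt;/td&gt;&lt;td&gt;BBB&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;td&gt;NULL&lt;/td&gt;&lt;td&gt;BBB&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;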
+
+&lt;p&gt;The basic columns are &lt;code class=&quot;highlighter-rouge&quot;&gt;DT&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;USER_ID&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;FLAG1&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;FLAG2&lt;/code&gt;, and the conditional columns are &lt;code class=&quot;highlighter-rouge&quot;&gt;USER_ID_FLAG1=if(FLAG1=1,USER_ID,null)&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;USER_ID_FLAG2=if(FLAG2=1,USER_ID,null)&lt;/code&gt;. Suppose the cube is built by day, so it has 3 segments.&lt;/p&gt;
+
+&lt;p&gt;Without the global dictionary, precise count distinct within a single segment is correct, but the roll-up result across segments is wrong. Here’s an example:&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;select count(distinct user_id_flag1) from table where dt in (&#39;2016-06-08&#39;, &#39;2016-06-09&#39;)
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+&lt;p&gt;The result is 2 rather than 3. The reason is that the dict in the 2016-06-08 segment is AAA=&amp;gt;1, BBB=&amp;gt;2, and the dict in the 2016-06-09 segment is CCC=&amp;gt;1, so CCC and AAA share ID 1 and are counted as one value when the segment bitmaps are merged.&lt;br /&gt;
+With the global dictionary configured as below, the dict becomes AAA=&amp;gt;1, BBB=&amp;gt;2, CCC=&amp;gt;3, which produces the correct result.&lt;br /&gt;
+&lt;code class=&quot;highlighter-rouge&quot;&gt;
+&quot;dictionaries&quot;: [
+    {
+      &quot;column&quot;: &quot;USER_ID_FLAG1&quot;,
+      &quot;builder&quot;: &quot;org.apache.kylin.dict.GlobalDictionaryBuilder&quot;
+    }
+]
+&lt;/code&gt;&lt;/p&gt;
+
+&lt;p&gt;In fact, the values of USER_ID_FLAG1 and USER_ID_FLAG2 are both subsets of the USER_ID dataset, which makes it possible to reuse the dictionary. Just encode the USER_ID dataset, and configure USER_ID_FLAG1 and USER_ID_FLAG2 to reuse the USER_ID dict:&lt;br /&gt;
+&lt;code class=&quot;highlighter-rouge&quot;&gt;
+&quot;dictionaries&quot;: [
+    {
+      &quot;column&quot;: &quot;USER_ID&quot;,
+      &quot;builder&quot;: &quot;org.apache.kylin.dict.GlobalDictionaryBuilder&quot;
+    },
+    {
+      &quot;column&quot;: &quot;USER_ID_FLAG1&quot;,
+      &quot;reuse&quot;: &quot;USER_ID&quot;,
+      &quot;builder&quot;: &quot;org.apache.kylin.dict.GlobalDictionaryBuilder&quot;
+    },
+    {
+      &quot;column&quot;: &quot;USER_ID_FLAG2&quot;,
+      &quot;reuse&quot;: &quot;USER_ID&quot;,
+      &quot;builder&quot;: &quot;org.apache.kylin.dict.GlobalDictionaryBuilder&quot;
+    }
+]
+&lt;/code&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;performance-tunning&quot;&gt;Performance Tuning&lt;/h2&gt;
+&lt;p&gt;When the global dictionary is used and the dictionary is large, the ‘Build Base Cuboid Data’ step may take a long time. This is mainly caused by the cost of loading and evicting the dictionary cache, since the dictionary is bigger than the mapper memory. To solve this problem, override the cube configuration as follows to raise the mapper memory to 8 GB:&lt;br /&gt;
+&lt;code class=&quot;highlighter-rouge&quot;&gt;
+kylin.job.mr.config.override.mapred.map.child.java.opts=-Xmx8g
+kylin.job.mr.config.override.mapreduce.map.memory.mb=8500
+&lt;/code&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h2&gt;
+&lt;p&gt;Here are some basic principles for deciding which kind of count distinct to use:&lt;br /&gt;
+ - If a result with some error rate is acceptable, the approximate way is always the better choice&lt;br /&gt;
+ - If you need an exact result, precise count distinct is the only way&lt;br /&gt;
+ - If you don’t need roll-up across segments, or the column data type is tinyint/smallint/int, or the number of distinct values is less than 5M, just use the default dictionary; otherwise configure the global dictionary, and consider the reuse-column optimization&lt;/p&gt;
+</description>
+        <pubDate>Mon, 01 Aug 2016 11:30:00 -0700</pubDate>
+        <link>http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/</link>
+        <guid isPermaLink="true">http://kylin.apache.org/blog/2016/08/01/count-distinct-in-kylin/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Apache Kylin v1.5.3 正式发布</title>
         <description>&lt;p&gt;Apache Kylin社区非常高兴宣布Apache Kylin v1.5.3正式发布。&lt;/p&gt;
 
@@ -394,131 +501,6 @@ Check the regionserver log, there should
       </item>
     
       <item>
-        <title>Apache Kylin v1.5.2 Release Announcement</title>
-        <description>&lt;p&gt;The Apache Kylin community is pleased to announce the release of Apache Kylin v1.5.2.&lt;/p&gt;
-
-&lt;p&gt;Apache Kylin is an open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets; it was originally contributed by eBay Inc.&lt;/p&gt;
-
-&lt;p&gt;To download Apache Kylin v1.5.2 source code or binary package:&lt;br /&gt;
-please visit the &lt;a href=&quot;http://kylin.apache.org/download&quot;&gt;download&lt;/a&gt; page.&lt;/p&gt;
-
-&lt;p&gt;This is a major release that brings a more stable, robust and well-managed version; the Apache Kylin community resolved about 76 issues, including bug fixes, improvements, and a few new features.&lt;/p&gt;
-
-&lt;h2 id=&quot;change-highlights&quot;&gt;Change Highlights&lt;/h2&gt;
-
-&lt;p&gt;&lt;strong&gt;New Feature&lt;/strong&gt;&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Count distinct on any dimension should work even not a predefined measure &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1016&quot;&gt;KYLIN-1016&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Support Hive View as Lookup Table &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1077&quot;&gt;KYLIN-1077&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Display time column as partition column &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1441&quot;&gt;KYLIN-1441&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Make Kylin run on MapR &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1515&quot;&gt;KYLIN-1515&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Download diagnosis zip from GUI &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1600&quot;&gt;KYLIN-1600&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;support kylin on cdh 5.7 &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1672&quot;&gt;KYLIN-1672&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;&lt;strong&gt;Improvement&lt;/strong&gt;&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Enhance mail notification &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-869&quot;&gt;KYLIN-869&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;HiveColumnCardinalityJob should use configurations in conf/kylin_job_conf.xml &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-955&quot;&gt;KYLIN-955&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Enable deriving dimensions on non PK/FK &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1313&quot;&gt;KYLIN-1313&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Improve performance of converting data to hfile &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1323&quot;&gt;KYLIN-1323&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Tools to extract all cube/hybrid/project related metadata to facilitate diagnosing/debugging/* sharing &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1340&quot;&gt;KYLIN-1340&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;change RealizationCapacity from three profiles to specific numbers &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1381&quot;&gt;KYLIN-1381&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;quicker and better response to v2 storage engine’s rpc timeout exception &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1391&quot;&gt;KYLIN-1391&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Memory hungry cube should select LAYER and INMEM cubing smartly &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1418&quot;&gt;KYLIN-1418&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;For GUI, to add one option “yyyy-MM-dd HH:MM:ss” for Partition Date Column &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1432&quot;&gt;KYLIN-1432&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;cuboid sharding based on specific column &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1453&quot;&gt;KYLIN-1453&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;attach a hyperlink to introduce new aggregation group &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1487&quot;&gt;KYLIN-1487&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Move query cache back to query controller level &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1526&quot;&gt;KYLIN-1526&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Hfile owner is not hbase &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1542&quot;&gt;KYLIN-1542&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Make hbase encoding and block size configurable just like hbase compression &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1544&quot;&gt;KYLIN-1544&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Refactor storage engine(v2) to be extension friendly &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1561&quot;&gt;KYLIN-1561&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Add and use a separate kylin_job_conf.xml for in-mem cubing &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1566&quot;&gt;KYLIN-1566&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Front-end work for KYLIN-1557 &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1567&quot;&gt;KYLIN-1567&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Coprocessor thread voluntarily stop itself when it reaches timeout &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1578&quot;&gt;KYLIN-1578&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;IT preparation classes like BuildCubeWithEngine should exit with status code upon build * exception &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1579&quot;&gt;KYLIN-1579&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Use 1 byte instead of 8 bytes as column indicator in fact distinct MR job &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1580&quot;&gt;KYLIN-1580&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Specify region cut size in cubedesc and leave the RealizationCapacity in model as a hint &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1584&quot;&gt;KYLIN-1584&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;make MAX_HBASE_FUZZY_KEYS in GTScanRangePlanner configurable &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1585&quot;&gt;KYLIN-1585&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;show cube level configuration overwrites properties in CubeDesigner &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1587&quot;&gt;KYLIN-1587&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;enabling different block size setting for small column families &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1591&quot;&gt;KYLIN-1591&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Add “isShardBy” flag in rowkey panel &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1599&quot;&gt;KYLIN-1599&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Need not to shrink scan cache when hbase rows can be large &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1601&quot;&gt;KYLIN-1601&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;User could dump hbase usage for diagnosis &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1602&quot;&gt;KYLIN-1602&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Bring more information in diagnosis tool &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1614&quot;&gt;KYLIN-1614&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Use deflate level 1 to enable compression “on the fly” &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1621&quot;&gt;KYLIN-1621&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Make the hll precision for data samping configurable &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1623&quot;&gt;KYLIN-1623&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;HyperLogLogPlusCounter will become inaccurate when there’re billions of entries &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1624&quot;&gt;KYLIN-1624&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;GC log overwrites old one after restart Kylin service &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1625&quot;&gt;KYLIN-1625&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;add backdoor toggle to dump binary cube storage response for further analysis &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1627&quot;&gt;KYLIN-1627&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;&lt;strong&gt;Bug&lt;/strong&gt;&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;column width is too narrow for timestamp field &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-989&quot;&gt;KYLIN-989&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;cube data not updated after purge &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1197&quot;&gt;KYLIN-1197&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Can not get more than one system admin email in config &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1305&quot;&gt;KYLIN-1305&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Should check and ensure TopN measure has two parameters specified &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1551&quot;&gt;KYLIN-1551&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Unsafe check of initiated in HybridInstance#init() &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1563&quot;&gt;KYLIN-1563&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Select any column when adding a custom aggregation in GUI &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1569&quot;&gt;KYLIN-1569&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Unclosed ResultSet in QueryService#getMetadata() &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1574&quot;&gt;KYLIN-1574&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;NPE in Job engine when execute MR job &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1581&quot;&gt;KYLIN-1581&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Agg group info will be blank when trying to edit cube &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1593&quot;&gt;KYLIN-1593&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;columns in metric could also be in filter/groupby &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1595&quot;&gt;KYLIN-1595&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;UT fail, due to String encoding CharsetEncoder mismatch &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1596&quot;&gt;KYLIN-1596&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;cannot run complete UT at windows dev machine &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1598&quot;&gt;KYLIN-1598&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Concurrent write issue on hdfs when deploy coprocessor &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1604&quot;&gt;KYLIN-1604&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Cube is ready but insight tables not result &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1612&quot;&gt;KYLIN-1612&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;UT ‘HiveCmdBuilderTest’ fail on ‘testBeeline’ &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1615&quot;&gt;KYLIN-1615&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Can’t find any realization coursed by Top-N measure &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1619&quot;&gt;KYLIN-1619&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;sql not executed and report topN error &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1622&quot;&gt;KYLIN-1622&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Web UI of TopN, “group by” column couldn’t be a dimension column &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1631&quot;&gt;KYLIN-1631&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Unclosed OutputStream in SSHClient#scpFileToLocal() &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1634&quot;&gt;KYLIN-1634&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Sample cube build error &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1637&quot;&gt;KYLIN-1637&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Unclosed HBaseAdmin in ToolUtil#getHBaseMetaStoreId() &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1638&quot;&gt;KYLIN-1638&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Wrong logging of JobID in MapReduceExecutable.java &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1639&quot;&gt;KYLIN-1639&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Kylin’s hll counter count “NULL” as a value &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1643&quot;&gt;KYLIN-1643&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Purge a cube, and then build again, the start date is not updated &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1647&quot;&gt;KYLIN-1647&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;java.io.IOException: Filesystem closed - in Cube Build Step 2 (MapR) &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1650&quot;&gt;KYLIN-1650&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;function name ‘getKylinPropertiesAsInputSteam’ misspelt &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1655&quot;&gt;KYLIN-1655&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Streaming/kafka config not match with table name &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1660&quot;&gt;KYLIN-1660&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;tableName got truncated during request mapping for /tables/tableName &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1662&quot;&gt;KYLIN-1662&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Should check project selection before add a stream table &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1666&quot;&gt;KYLIN-1666&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Streaming table name should allow enter “DB.TABLE” format &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1667&quot;&gt;KYLIN-1667&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;make sure metadata in 1.5.2 compatible with 1.5.1 &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1673&quot;&gt;KYLIN-1673&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;MetaData clean just clean FINISHED and DISCARD jobs,but job correct status is SUCCEED &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1678&quot;&gt;KYLIN-1678&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;error happens while execute a sql contains ‘?’ using Statement &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1685&quot;&gt;KYLIN-1685&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Illegal char on result dataset table &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1688&quot;&gt;KYLIN-1688&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;KylinConfigExt lost base properties when store into file &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1721&quot;&gt;KYLIN-1721&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;IntegerDimEnc serialization exception inside coprocessor &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1722&quot;&gt;KYLIN-1722&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;&lt;/p&gt;
-
-&lt;p&gt;Data and metadata of this version are backward compatible with v1.5.1, but you may need to &lt;a href=&quot;/docs15/howto/howto_update_coprocessor.html&quot;&gt;redeploy the HBase coprocessor&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;&lt;strong&gt;Support&lt;/strong&gt;&lt;/p&gt;
-
-&lt;p&gt;For any issue or question, please&lt;br /&gt;
-open a JIRA issue in the Kylin project: &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN/&quot;&gt;https://issues.apache.org/jira/browse/KYLIN/&lt;/a&gt;&lt;br /&gt;
-or&lt;br /&gt;
-send mail to Apache Kylin dev mailing list: &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;&lt;em&gt;Great thanks to everyone who contributed!&lt;/em&gt;&lt;/p&gt;
-</description>
-        <pubDate>Thu, 26 May 2016 08:00:00 -0700</pubDate>
-        <link>http://kylin.apache.org/blog/2016/05/26/release-v1.5.2/</link>
-        <guid isPermaLink="true">http://kylin.apache.org/blog/2016/05/26/release-v1.5.2/</guid>
-        
-        
-        <category>blog</category>
-        
-      </item>
-    
-      <item>
         <title>Apache Kylin v1.5.2 正式发布</title>
         <description>&lt;p&gt;Apache Kylin社区非常高兴宣布Apache Kylin v1.5.2正式发布。&lt;/p&gt;
 
@@ -644,6 +626,131 @@ send mail to Apache Kylin dev mailing li
       </item>
     
       <item>
+        <title>Apache Kylin v1.5.2 Release Announcement</title>
+        <description>&lt;p&gt;The Apache Kylin community is pleased to announce the release of Apache Kylin v1.5.2.&lt;/p&gt;
+
+&lt;p&gt;Apache Kylin is an open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets; it was originally contributed by eBay Inc.&lt;/p&gt;
+
+&lt;p&gt;To download Apache Kylin v1.5.2 source code or binary package:&lt;br /&gt;
+please visit the &lt;a href=&quot;http://kylin.apache.org/download&quot;&gt;download&lt;/a&gt; page.&lt;/p&gt;
+
+&lt;p&gt;This is a major release that brings a more stable, robust and well-managed version; the Apache Kylin community resolved about 76 issues, including bug fixes, improvements, and a few new features.&lt;/p&gt;
+
+&lt;h2 id=&quot;change-highlights&quot;&gt;Change Highlights&lt;/h2&gt;
+
+&lt;p&gt;&lt;strong&gt;New Feature&lt;/strong&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Count distinct on any dimension should work even not a predefined measure &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1016&quot;&gt;KYLIN-1016&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Support Hive View as Lookup Table &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1077&quot;&gt;KYLIN-1077&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Display time column as partition column &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1441&quot;&gt;KYLIN-1441&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Make Kylin run on MapR &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1515&quot;&gt;KYLIN-1515&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Download diagnosis zip from GUI &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1600&quot;&gt;KYLIN-1600&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;support kylin on cdh 5.7 &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1672&quot;&gt;KYLIN-1672&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;strong&gt;Improvement&lt;/strong&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Enhance mail notification &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-869&quot;&gt;KYLIN-869&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;HiveColumnCardinalityJob should use configurations in conf/kylin_job_conf.xml &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-955&quot;&gt;KYLIN-955&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Enable deriving dimensions on non PK/FK &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1313&quot;&gt;KYLIN-1313&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Improve performance of converting data to hfile &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1323&quot;&gt;KYLIN-1323&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Tools to extract all cube/hybrid/project related metadata to facilitate diagnosing/debugging/* sharing &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1340&quot;&gt;KYLIN-1340&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;change RealizationCapacity from three profiles to specific numbers &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1381&quot;&gt;KYLIN-1381&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;quicker and better response to v2 storage engine’s rpc timeout exception &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1391&quot;&gt;KYLIN-1391&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Memory hungry cube should select LAYER and INMEM cubing smartly &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1418&quot;&gt;KYLIN-1418&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;For GUI, to add one option “yyyy-MM-dd HH:MM:ss” for Partition Date Column &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1432&quot;&gt;KYLIN-1432&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;cuboid sharding based on specific column &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1453&quot;&gt;KYLIN-1453&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;attach a hyperlink to introduce new aggregation group &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1487&quot;&gt;KYLIN-1487&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Move query cache back to query controller level &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1526&quot;&gt;KYLIN-1526&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Hfile owner is not hbase &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1542&quot;&gt;KYLIN-1542&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Make hbase encoding and block size configurable just like hbase compression &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1544&quot;&gt;KYLIN-1544&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Refactor storage engine(v2) to be extension friendly &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1561&quot;&gt;KYLIN-1561&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Add and use a separate kylin_job_conf.xml for in-mem cubing &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1566&quot;&gt;KYLIN-1566&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Front-end work for KYLIN-1557 &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1567&quot;&gt;KYLIN-1567&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Coprocessor thread voluntarily stop itself when it reaches timeout &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1578&quot;&gt;KYLIN-1578&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;IT preparation classes like BuildCubeWithEngine should exit with status code upon build * exception &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1579&quot;&gt;KYLIN-1579&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Use 1 byte instead of 8 bytes as column indicator in fact distinct MR job &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1580&quot;&gt;KYLIN-1580&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Specify region cut size in cubedesc and leave the RealizationCapacity in model as a hint &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1584&quot;&gt;KYLIN-1584&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;make MAX_HBASE_FUZZY_KEYS in GTScanRangePlanner configurable &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1585&quot;&gt;KYLIN-1585&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;show cube level configuration overwrites properties in CubeDesigner &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1587&quot;&gt;KYLIN-1587&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;enabling different block size setting for small column families &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1591&quot;&gt;KYLIN-1591&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Add “isShardBy” flag in rowkey panel &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1599&quot;&gt;KYLIN-1599&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Need not to shrink scan cache when hbase rows can be large &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1601&quot;&gt;KYLIN-1601&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;User could dump hbase usage for diagnosis &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1602&quot;&gt;KYLIN-1602&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Bring more information in diagnosis tool &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1614&quot;&gt;KYLIN-1614&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Use deflate level 1 to enable compression “on the fly” &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1621&quot;&gt;KYLIN-1621&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Make the hll precision for data sampling configurable &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1623&quot;&gt;KYLIN-1623&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;HyperLogLogPlusCounter will become inaccurate when there’re billions of entries &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1624&quot;&gt;KYLIN-1624&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;GC log overwrites old one after restart Kylin service &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1625&quot;&gt;KYLIN-1625&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;add backdoor toggle to dump binary cube storage response for further analysis &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1627&quot;&gt;KYLIN-1627&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;strong&gt;Bug&lt;/strong&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;column width is too narrow for timestamp field &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-989&quot;&gt;KYLIN-989&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;cube data not updated after purge &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1197&quot;&gt;KYLIN-1197&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Can not get more than one system admin email in config &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1305&quot;&gt;KYLIN-1305&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Should check and ensure TopN measure has two parameters specified &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1551&quot;&gt;KYLIN-1551&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Unsafe check of initiated in HybridInstance#init() &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1563&quot;&gt;KYLIN-1563&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Select any column when adding a custom aggregation in GUI &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1569&quot;&gt;KYLIN-1569&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Unclosed ResultSet in QueryService#getMetadata() &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1574&quot;&gt;KYLIN-1574&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;NPE in Job engine when execute MR job &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1581&quot;&gt;KYLIN-1581&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Agg group info will be blank when trying to edit cube &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1593&quot;&gt;KYLIN-1593&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;columns in metric could also be in filter/groupby &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1595&quot;&gt;KYLIN-1595&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;UT fail, due to String encoding CharsetEncoder mismatch &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1596&quot;&gt;KYLIN-1596&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;cannot run complete UT at windows dev machine &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1598&quot;&gt;KYLIN-1598&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Concurrent write issue on hdfs when deploy coprocessor &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1604&quot;&gt;KYLIN-1604&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Cube is ready but insight tables not result &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1612&quot;&gt;KYLIN-1612&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;UT ‘HiveCmdBuilderTest’ fail on ‘testBeeline’ &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1615&quot;&gt;KYLIN-1615&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Can’t find any realization caused by Top-N measure &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1619&quot;&gt;KYLIN-1619&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;sql not executed and report topN error &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1622&quot;&gt;KYLIN-1622&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Web UI of TopN, “group by” column couldn’t be a dimension column &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1631&quot;&gt;KYLIN-1631&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Unclosed OutputStream in SSHClient#scpFileToLocal() &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1634&quot;&gt;KYLIN-1634&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Sample cube build error &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1637&quot;&gt;KYLIN-1637&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Unclosed HBaseAdmin in ToolUtil#getHBaseMetaStoreId() &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1638&quot;&gt;KYLIN-1638&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Wrong logging of JobID in MapReduceExecutable.java &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1639&quot;&gt;KYLIN-1639&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Kylin’s hll counter count “NULL” as a value &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1643&quot;&gt;KYLIN-1643&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Purge a cube, and then build again, the start date is not updated &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1647&quot;&gt;KYLIN-1647&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;java.io.IOException: Filesystem closed - in Cube Build Step 2 (MapR) &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1650&quot;&gt;KYLIN-1650&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;function name ‘getKylinPropertiesAsInputSteam’ misspelt &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1655&quot;&gt;KYLIN-1655&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Streaming/kafka config not match with table name &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1660&quot;&gt;KYLIN-1660&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;tableName got truncated during request mapping for /tables/tableName &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1662&quot;&gt;KYLIN-1662&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Should check project selection before add a stream table &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1666&quot;&gt;KYLIN-1666&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Streaming table name should allow enter “DB.TABLE” format &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1667&quot;&gt;KYLIN-1667&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;make sure metadata in 1.5.2 compatible with 1.5.1 &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1673&quot;&gt;KYLIN-1673&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;MetaData clean just cleans FINISHED and DISCARD jobs, but the correct job status is SUCCEED &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1678&quot;&gt;KYLIN-1678&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;error happens while execute a sql contains ‘?’ using Statement &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1685&quot;&gt;KYLIN-1685&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;Illegal char on result dataset table &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1688&quot;&gt;KYLIN-1688&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;KylinConfigExt lost base properties when store into file &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1721&quot;&gt;KYLIN-1721&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;IntegerDimEnc serialization exception inside coprocessor &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN-1722&quot;&gt;KYLIN-1722&lt;/a&gt;&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;strong&gt;Upgrade&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;Data and metadata of this version are backward compatible with v1.5.1, but you may need to &lt;a href=&quot;/docs15/howto/howto_update_coprocessor.html&quot;&gt;redeploy the HBase coprocessor&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Support&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;For any issue or question, please&lt;br /&gt;
+open a JIRA issue in the Kylin project: &lt;a href=&quot;https://issues.apache.org/jira/browse/KYLIN/&quot;&gt;https://issues.apache.org/jira/browse/KYLIN/&lt;/a&gt;&lt;br /&gt;
+or&lt;br /&gt;
+send mail to Apache Kylin dev mailing list: &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;em&gt;Great thanks to everyone who contributed!&lt;/em&gt;&lt;/p&gt;
+</description>
+        <pubDate>Thu, 26 May 2016 08:00:00 -0700</pubDate>
+        <link>http://kylin.apache.org/blog/2016/05/26/release-v1.5.2/</link>
+        <guid isPermaLink="true">http://kylin.apache.org/blog/2016/05/26/release-v1.5.2/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>Apache Kylin v1.5.1 Officially Released</title>
         <description>&lt;p&gt;The Apache Kylin community is pleased to announce the official release of Apache Kylin v1.5.1.&lt;/p&gt;
 
@@ -816,397 +923,6 @@ send mail to Apache Kylin dev mailing li
         
         
         <category>blog</category>
-        
-      </item>
-    
-      <item>
-        <title>Approximate Top-N support in Kylin</title>
-        <description>&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;
-
-&lt;p&gt;Finding the Top-N (or Top-K) entities in a dataset is a common requirement in data mining. We often see reports or news such as “Top 100 companies in the world” or “Most popular 20 electronics” sold on a big e-commerce platform. Exploring and analyzing the top entities often reveals high-value information.&lt;/p&gt;
-
-&lt;p&gt;In the era of big data, this need is stronger than ever, as both the raw dataset and the number of entities can be vast. Without some pre-calculation, getting the Top-K entities from a distributed big dataset may take a long time, which makes ad-hoc queries inefficient.&lt;/p&gt;
-
-&lt;p&gt;In v1.5.0, Apache Kylin introduces the “Top-N” measure, which pre-calculates the top entities during the cube build phase; in the query phase, Kylin can quickly fetch and return the top records. The performance is much better than that of a cube without “Top-N”, giving the analyst more power to inspect data.&lt;/p&gt;
-
-&lt;p&gt;Please note that this “Top-N” measure is an approximate implementation; to use it well, you need a good understanding of the algorithm as well as of the data distribution.&lt;/p&gt;
-
-&lt;h2 id=&quot;top-n-query&quot;&gt;Top-N query&lt;/h2&gt;
-
-&lt;p&gt;Let’s start with the sample table shipped in the Kylin binary package. If you haven’t set it up yet, follow this tutorial to create it: &lt;a href=&quot;https://kylin.apache.org/docs15/tutorial/kylin_sample.html&quot;&gt;Quick Start with Sample Cube&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;The sample fact table “default.kylin_sales” mocks the transactions of an online marketplace. It has a number of dimension and measure columns. For simplicity, we use only four here: “PART_DT”, “LSTG_SITE_ID”, “SELLER_ID” and “PRICE”. The table below describes these columns with their rough cardinality; “SELLER_ID” is a high-cardinality column.&lt;/p&gt;
-
-&lt;table&gt;
-  &lt;thead&gt;
-    &lt;tr&gt;
-      &lt;th style=&quot;text-align: left&quot;&gt;Column&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;Description&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;Cardinality&lt;/th&gt;
-    &lt;/tr&gt;
-  &lt;/thead&gt;
-  &lt;tbody&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;PART_DT&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;Transaction Date&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;730: two years&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;LSTG_SITE_ID&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;Site ID, 0 represents ‘US’&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;50&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;SELLER_ID&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;Seller ID&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;About one million&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;PRICE&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;Sold amount&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;-&lt;/td&gt;
-    &lt;/tr&gt;
-  &lt;/tbody&gt;
-&lt;/table&gt;
-
-&lt;p&gt;Very often this online marketplace company needs to identify the top sellers (say, the top 100) in a given time period in some countries. The query looks like:&lt;/p&gt;
-
-&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SELECT SELLER_ID, SUM(PRICE) FROM KYLIN_SALES
- WHERE 
-	PART_DT &amp;gt;= date&#39;2016-02-18&#39; AND PART_DT &amp;lt; date&#39;2016-03-18&#39; 
-		AND LSTG_SITE_ID in (0) 
-	group by SELLER_ID 
-	order by SUM(PRICE) DESC limit 100;
-&lt;/code&gt;&lt;/pre&gt;
-&lt;/div&gt;
-
-&lt;h2 id=&quot;without-top-n-pre-calculation&quot;&gt;Without Top-N pre-calculation&lt;/h2&gt;
-
-&lt;p&gt;Before Kylin v1.5.0, all “group by” columns had to be dimensions, so we come up with a design that uses PART_DT, LSTG_SITE_ID and SELLER_ID as dimensions and defines SUM(PRICE) as the measure. After the build, the base cuboid of the cube looks like:&lt;/p&gt;
-
-&lt;table&gt;
-  &lt;thead&gt;
-    &lt;tr&gt;
-      &lt;th style=&quot;text-align: left&quot;&gt;Rowkey of base cuboid&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;SUM(PRICE)&lt;/th&gt;
-    &lt;/tr&gt;
-  &lt;/thead&gt;
-  &lt;tbody&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;20140318_00_seller0000001&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;xx.xx&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;20140318_00_seller0000002&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;xx.xx&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;…&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;…&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;20140318_00_seller0999999&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;xx.xx&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;20140318_01_seller0999999&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;xx.xx&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;…&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;…&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;…&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;…&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;20160318_49_seller0999999&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;xx.xx&lt;/td&gt;
-    &lt;/tr&gt;
-  &lt;/tbody&gt;
-&lt;/table&gt;
-
-&lt;p&gt;Assume these dimensions are independent. The number of rows in the base cuboid is 730 * 50 * 1 million = 36.5 billion. Other cuboids that include “SELLER_ID” will also have millions of rows. At this point you may notice that the cube expansion rate is high, and the situation gets worse with more dimensions or higher cardinality. But this is not the real challenge.&lt;/p&gt;
-
-&lt;p&gt;Soon you will find that the Top-N query doesn’t work, or takes an unacceptably long time. Assume you want the top sellers in the past 30 days in the US: Kylin needs to read 30 million rows from storage, aggregate and sort them, and finally return the top 100.&lt;/p&gt;
-
-&lt;p&gt;Now we see that, without pre-calculation, the memory footprint and I/O in between are heavy even though the final result set is small.&lt;/p&gt;
-
-&lt;h2 id=&quot;with-top-n-pre-calculation&quot;&gt;With Top-N pre-calculation&lt;/h2&gt;
-
-&lt;p&gt;With the Top-N measure, Kylin pre-calculates the top entities for each dimension combination during the cube build, saving the result (both entity ID and measure value) as a column in storage. The entity ID (“SELLER_ID” in this case) can now be moved from the dimensions to the measure, so it no longer participates in the rowkey. For the sample scenario described above, the newly designed cube has 2 dimensions (PART_DT, LSTG_SITE_ID) and 1 Top-N measure.&lt;/p&gt;
-
-&lt;table&gt;
-  &lt;thead&gt;
-    &lt;tr&gt;
-      &lt;th style=&quot;text-align: left&quot;&gt;Rowkey of base cuboid&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;Top-N measure&lt;/th&gt;
-    &lt;/tr&gt;
-  &lt;/thead&gt;
-  &lt;tbody&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;20140318_00&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;seller0010091:xx.xx, seller0005002:xx.xx, …, seller0001789:xx.xx&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;20140318_01&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;seller0032036:xx.xx, seller0010091:xx.xx, …, seller000699:xx.xx&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;…&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;…&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: left&quot;&gt;20160318_49&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;seller0061016:xx.xx, seller0665091:xx.xx, …, seller000699:xx.xx&lt;/td&gt;
-    &lt;/tr&gt;
-  &lt;/tbody&gt;
-&lt;/table&gt;
-
-&lt;p&gt;The base cuboid now has 730 * 50 = 36.5 k rows. In each measure cell, the top records are stored in a container in descending order; the tail entities have been filtered out.&lt;/p&gt;
-
-&lt;p&gt;The same query, “top sellers in the past 30 days in the US”, now only needs to read 30 rows from storage. The measure objects, also called counter containers, are further aggregated/merged on the storage side, so that finally only one container is returned to Kylin. Kylin extracts “SELLER_ID” and “SUM(PRICE)” from it before returning the result to the client. The cost is much lower than before, and the performance is greatly improved.&lt;/p&gt;
-
-&lt;h2 id=&quot;algorithm&quot;&gt;Algorithm&lt;/h2&gt;
-
-&lt;p&gt;Kylin’s Top-N implementation refers to &lt;a href=&quot;https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/StreamSummary.java&quot;&gt;stream-lib&lt;/a&gt;, which is based on the Space-Saving algorithm and the Stream-Summary data structure as described in &lt;i&gt;[1] Efficient Computation of Frequent and Top-k Elements in Data Streams&lt;/i&gt; by Metwally, Agrawal, and Abbadi.&lt;/p&gt;
-
-&lt;p&gt;A few modifications were made so that it fits Kylin better:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Using double as the counter data type;&lt;/li&gt;
-  &lt;li&gt;Simplified data structure, using one linked list for all entries;&lt;/li&gt;
-  &lt;li&gt;A more compact serializer;&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Besides, in order to run SpaceSaving in parallel on Hadoop, we make it mergeable using the algorithm introduced in &lt;i&gt;[2] A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution&lt;/i&gt;.&lt;/p&gt;
-
-&lt;h2 id=&quot;accuracy&quot;&gt;Accuracy&lt;/h2&gt;
-
-&lt;p&gt;Although the experiments in paper [1] have shown SpaceSaving’s efficiency and accuracy for realistic Zipfian data, it doesn’t ensure 100% accuracy in all scenarios. SpaceSaving uses a fixed space to hold the most frequent candidates; when the number of entities exceeds the space size, the tail entities are truncated, causing data loss. The parallel algorithm merges multiple SpaceSavings into one; for entities that appear in one but not in the other it makes some assumptions, which also causes some data distortion. As a result, the output of the Top-N measure may differ slightly from the real result.&lt;/p&gt;
-
-&lt;p&gt;Several factors can affect the accuracy:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Zipfian distribution&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Many rankings in the world follow the &lt;strong&gt;[3] Zipfian distribution&lt;/strong&gt;, such as the population ranks of cities in various countries, corporation sizes, income rankings, etc. But the exponent of the distribution varies between scenarios, and this affects the correctness of the result to some extent. The higher the exponent (the sharper the distribution), the more accurate the answer. If the distribution is very flat and the entities’ values are very close, the rankings from SpaceSaving will be less accurate. Before using SpaceSaving, you should estimate your data distribution.&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Space in SpaceSaving&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;As mentioned above, SpaceSaving uses a limited space to hold the most frequent elements. Given more space, it provides a more accurate answer. For example, to calculate the Top N elements, using 100 * N space gives a more accurate answer than 50 * N space. If the space is larger than the entity cardinality, the result is exact. More space takes more CPU, memory and storage, so this needs to be balanced.&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Entity cardinality&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Element cardinality is also a factor to consider. Calculating the Top 100 among 10 thousand entities is easier than among 10 million.&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Dataset size&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;The error ratio on a big dataset is lower than on a small dataset. The same holds for Top-N calculation.&lt;/p&gt;
-
-&lt;h2 id=&quot;statistics&quot;&gt;Statistics&lt;/h2&gt;
-
-&lt;p&gt;We designed a test case that calculates the top 100 elements with the parallel SpaceSaving on a generated dataset (using commons-math3’s ZipfDistribution), in which the entities’ occurrences follow the Zipfian distribution. We varied the Zipfian exponent, the space, the entity cardinality and the dataset size from run to run, and compared the output with the exact result (computed with mergesort) to collect the statistics. The rough accuracy report is shown below.&lt;/p&gt;
-
-&lt;p&gt;The first column is the entity cardinality, that is, the number of entities among which the top 100 elements are identified. The other three columns show how much space the algorithm uses: 20X means a space of 2,000, 50X means 5,000, and so on. Each cell shows the percentage of records that match the real result; a record counts as matched if its error (difference) is less than 5 per million of the total data size, e.g. a difference &amp;lt; 5 for a dataset of 1 million records. SpaceSaving is calculated in parallel with 10 threads.&lt;/p&gt;
-
-&lt;h3 id=&quot;test-1-calculate-top-100-in-1-million-records-zipf-distribution-exponent--05-error-tolerance--5&quot;&gt;Test 1. Calculate top-100 in 1 million records, Zipf distribution exponent = 0.5, error tolerance &amp;lt; 5&lt;/h3&gt;
-
-&lt;table&gt;
-  &lt;thead&gt;
-    &lt;tr&gt;
-      &lt;th style=&quot;text-align: right&quot;&gt;Element cardinality&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;20X space&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;50X space&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;100X space&lt;/th&gt;
-    &lt;/tr&gt;
-  &lt;/thead&gt;
-  &lt;tbody&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;1,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;10,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;78%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;100,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;12%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;50%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;95%&lt;/td&gt;
-    &lt;/tr&gt;
-  &lt;/tbody&gt;
-&lt;/table&gt;
-
-&lt;p&gt;Conclusion: More space can get better accuracy.&lt;/p&gt;
-
-&lt;h3 id=&quot;test-2-calculate-top-100-in-1-million-records-zipf-distribution-exponent--06-error-tolerance--5&quot;&gt;Test 2. Calculate top-100 in 1 million records, Zipf distribution exponent = 0.6, error tolerance &amp;lt; 5&lt;/h3&gt;
-
-&lt;table&gt;
-  &lt;thead&gt;
-    &lt;tr&gt;
-      &lt;th style=&quot;text-align: right&quot;&gt;Element cardinality&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;20X space&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;50X space&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;100X space&lt;/th&gt;
-    &lt;/tr&gt;
-  &lt;/thead&gt;
-  &lt;tbody&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;1,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;10,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;93%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;100,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;30%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;89%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;99%&lt;/td&gt;
-    &lt;/tr&gt;
-  &lt;/tbody&gt;
-&lt;/table&gt;
-
-&lt;p&gt;Conclusion: The sharper the entity distribution, the better the answer SpaceSaving provides.&lt;/p&gt;
-
-&lt;h3 id=&quot;test-3-calculate-top-100-in-20-million-records-zif-distribution-exponent--05-error-tolerance--100&quot;&gt;Test 3. Calculate top-100 in 20 million records, Zipf distribution exponent = 0.5, error tolerance &amp;lt; 100&lt;/h3&gt;
-
-&lt;table&gt;
-  &lt;thead&gt;
-    &lt;tr&gt;
-      &lt;th style=&quot;text-align: right&quot;&gt;Element cardinality&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;20X space&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;50X space&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;100X space&lt;/th&gt;
-    &lt;/tr&gt;
-  &lt;/thead&gt;
-  &lt;tbody&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;1,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;10,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;100,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;1,000,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;99%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-  &lt;/tbody&gt;
-&lt;/table&gt;
-
-&lt;p&gt;Conclusion: The result from SpaceSaving will be close to the actual result when the dataset is big enough.&lt;/p&gt;
-
-&lt;h3 id=&quot;test-4-calculate-top-100-in-20-million-records-zif-distribution-exponent--06-error-tolerance--100&quot;&gt;Test 4. Calculate top-100 in 20 million records, Zipf distribution exponent = 0.6, error tolerance &amp;lt; 100&lt;/h3&gt;
-
-&lt;table&gt;
-  &lt;thead&gt;
-    &lt;tr&gt;
-      &lt;th style=&quot;text-align: right&quot;&gt;Element cardinality&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;20X space&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;50X space&lt;/th&gt;
-      &lt;th style=&quot;text-align: center&quot;&gt;100X space&lt;/th&gt;
-    &lt;/tr&gt;
-  &lt;/thead&gt;
-  &lt;tbody&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;10,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;20,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;100,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-    &lt;tr&gt;
-      &lt;td style=&quot;text-align: right&quot;&gt;1,000,000&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;99%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-      &lt;td style=&quot;text-align: center&quot;&gt;100%&lt;/td&gt;
-    &lt;/tr&gt;
-  &lt;/tbody&gt;
-&lt;/table&gt;
-
-&lt;p&gt;Conclusion: same conclusion as test 3.&lt;/p&gt;
-
-&lt;p&gt;These statistics match what we expected above. They only give a rough estimate of the result correctness. To use this feature well in Kylin, you need to know about all these variables and run some pilots before publishing it to the analysts.&lt;/p&gt;
-
-&lt;h2 id=&quot;query-performance&quot;&gt;Query performance&lt;/h2&gt;
-
-&lt;p&gt;Coming soon.&lt;/p&gt;
-
-&lt;h2 id=&quot;further-work&quot;&gt;Further work&lt;/h2&gt;
-
-&lt;p&gt;This feature in v1.5.0 is a basic version, which may cover 80% of the cases. It has some limitations and hard-coded settings that deserve your attention:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;SUM() is the default aggregation function;&lt;/li&gt;
-  &lt;li&gt;Sort in descending order always;&lt;/li&gt;
-  &lt;li&gt;Use 50X space always;&lt;/li&gt;
-  &lt;li&gt;Use dictionary encoding for the literal column;&lt;/li&gt;
-  &lt;li&gt;The UI only allows selecting topn(10), topn(100) and topn(1000) as the return type;&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Note that selecting “topn(10)” as the return type doesn’t mean you have to use “limit 10” in your query. You can use other limit numbers: Kylin can return at most the top 500 entities for one combination, but the precision beyond the top 10 is not tested.&lt;/p&gt;
-
-&lt;p&gt;Whether to support more aggregations, sort orders, or encodings depends entirely on user needs. If you have any comments or suggestions, please subscribe and then send an email to our dev mailing list &lt;a href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#100;&amp;#101;&amp;#118;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;. Thanks for your feedback.&lt;/p&gt;
-
-&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
-
-&lt;p&gt;[1] &lt;a href=&quot;https://dl.acm.org/citation.cfm?id=2131596&quot;&gt;Efficient Computation of Frequent and Top-k Elements in Data Streams&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;[2] &lt;a href=&quot;http://arxiv.org/pdf/1401.0702.pdf&quot;&gt;A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;[3] &lt;a href=&quot;https://en.wikipedia.org/wiki/Zipf%27s_law&quot;&gt;Zipf’s law on Wikipedia&lt;/a&gt;&lt;/p&gt;
-</description>
-        <pubDate>Sat, 19 Mar 2016 09:30:00 -0700</pubDate>
-        <link>http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/</link>
-        <guid isPermaLink="true">http://kylin.apache.org/blog/2016/03/19/approximate-topn-measure/</guid>
-        
-        
-        <category>blog</category>
         
       </item>