Posted to commits@kylin.apache.org by li...@apache.org on 2015/03/28 01:03:59 UTC

[11/37] incubator-kylin git commit: KYLIN-650 minor change

KYLIN-650 minor change


Project: http://git-wip-us.apache.org/repos/asf/incubator-kylin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kylin/commit/4bd54078
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kylin/tree/4bd54078
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kylin/diff/4bd54078

Branch: refs/heads/master
Commit: 4bd54078af6dd7c60c4bb5eb8aab4a99e2a6d5f3
Parents: eff1393
Author: honma <ho...@ebay.com>
Authored: Thu Mar 19 22:27:23 2015 -0700
Committer: honma <ho...@ebay.com>
Committed: Thu Mar 19 22:27:23 2015 -0700

----------------------------------------------------------------------
 README.md                                       | 12 ++++--
 ...requently Asked Questions on Installation.md | 26 ------------
 .../MISC/FAQ on Kylin Installation and Usage.md | 40 +++++++++++++++++++
 docs/MISC/How to Contribute.md                  | 42 ++++++++++++++++++++
 4 files changed, 90 insertions(+), 30 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-kylin/blob/4bd54078/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 232758c..2489576 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,10 @@ Tutorial
 Please follow this installation tutorial to start with Kylin: [Installation Tutorial](docs/Installation/Installation.md)
 
 
+Advanced Usage
+-------
+
+
 Get Help
 ------------
 
@@ -26,13 +30,13 @@ The fastest way to get response from our developers is to send email to our mail
 Resources
 ------------
 
-* Web Site: <http://kylin.io>
+* [FAQ](docs/MISC/FAQ on Kylin Installation and Usage.md)
 
-* Developer Mail: <de...@kylin.incubator.apache.org>
+* Web Site: <http://kylin.incubator.apache.org/>
 
-* How To Contribute: See [wiki](https://github.com/KylinOLAP/Kylin/wiki/How-to-Contribute)
+* Developer Mail: <de...@kylin.incubator.apache.org>
 
-* Presentation: [Kylin Hadoop OLAP Engine v1.0](https://github.com/KylinOLAP/Kylin/blob/master/docs/Kylin_Hadoop_OLAP_Engine_v1.0.pdf?raw=true)
+* How To Contribute: See [this](docs/MISC/How to Contribute.md)
 
 *  Apache Proposal: [Apache Kylin](https://wiki.apache.org/incubator/KylinProposal)
 

http://git-wip-us.apache.org/repos/asf/incubator-kylin/blob/4bd54078/docs/Installation/Frequently Asked Questions on Installation.md
----------------------------------------------------------------------
diff --git a/docs/Installation/Frequently Asked Questions on Installation.md b/docs/Installation/Frequently Asked Questions on Installation.md
deleted file mode 100644
index c160403..0000000
--- a/docs/Installation/Frequently Asked Questions on Installation.md	
+++ /dev/null
@@ -1,26 +0,0 @@
-Frequently Asked Questions on Installation
----
-* Some NPM error causes ERROR exit (users in mainland China, please pay special attention to this issue)?
-> Check out https://github.com/KylinOLAP/Kylin/issues/35
-
-* Can't get master address from ZooKeeper" when installing Kylin on Hortonworks Sandbox
-> Check out https://github.com/KylinOLAP/Kylin/issues/9.
-
-* Install scripted finished in my virtual machine, but cannot visit via http://localhost:9080
-> Check out https://github.com/KylinOLAP/Kylin/issues/12.
-
-* Map Reduce Job information can't display on sandbox deployment
-> Check out https://github.com/KylinOLAP/Kylin/issues/40
-
-* Install Kylin on CDH 5.2 or Hadoop 2.5.x
-> Check out discussion: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/kylin-olap/X0GZfsX1jLc/nzs6xAhNpLkJ
-> 
-```
-I was able to deploy Kylin with following option in POM.
-<hadoop2.version>2.5.0</hadoop2.version>
-<yarn.version>2.5.0</yarn.version>
-<hbase-hadoop2.version>0.98.6-hadoop2</hbase-hadoop2.version>
-<zookeeper.version>3.4.5</zookeeper.version>
-<hive.version>0.13.1</hive.version>
-My Cluster is running on Cloudera Distribution CDH 5.2.0.
-```

http://git-wip-us.apache.org/repos/asf/incubator-kylin/blob/4bd54078/docs/MISC/FAQ on Kylin Installation and Usage.md
----------------------------------------------------------------------
diff --git a/docs/MISC/FAQ on Kylin Installation and Usage.md b/docs/MISC/FAQ on Kylin Installation and Usage.md
new file mode 100644
index 0000000..1c1291e
--- /dev/null
+++ b/docs/MISC/FAQ on Kylin Installation and Usage.md	
@@ -0,0 +1,40 @@
+FAQ on Kylin Installation and Usage
+---
+#### Some NPM error causes ERROR exit (users in mainland China, please pay special attention to this issue)?
+Check out https://github.com/KylinOLAP/Kylin/issues/35
+
+#### "Can't get master address from ZooKeeper" when installing Kylin on Hortonworks Sandbox
+Check out https://github.com/KylinOLAP/Kylin/issues/9.
+
+#### Install script finished in my virtual machine, but Kylin cannot be reached at http://localhost:9080
+Check out https://github.com/KylinOLAP/Kylin/issues/12.
+
+#### MapReduce job information is not displayed on a sandbox deployment
+Check out https://github.com/KylinOLAP/Kylin/issues/40
+
+#### Install Kylin on CDH 5.2 or Hadoop 2.5.x
+Check out discussion: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/kylin-olap/X0GZfsX1jLc/nzs6xAhNpLkJ
+```
+I was able to deploy Kylin with following option in POM.
+<hadoop2.version>2.5.0</hadoop2.version>
+<yarn.version>2.5.0</yarn.version>
+<hbase-hadoop2.version>0.98.6-hadoop2</hbase-hadoop2.version>
+<zookeeper.version>3.4.5</zookeeper.version>
+<hive.version>0.13.1</hive.version>
+My Cluster is running on Cloudera Distribution CDH 5.2.0.
+```
+
+#### Unable to load a big cube as an HTable, with java.lang.OutOfMemoryError: unable to create new native thread
+HBase (as of this writing) allocates one thread per region when bulk loading an HTable. Try reducing the number of regions of your cube by setting its "capacity" to "MEDIUM" or "LARGE". Tweaking OS and JVM settings can also allow more threads; for an example, see [this article](http://blog.egilh.com/2006/06/2811aspx.html).
+
+#### BuildCubeWithEngineTest fails, saying it failed to connect to HBase while HBase is active
+You may get this error the first time you run the HBase client. Check the error trace for a message saying that a folder like "/hadoop/hbase/local/jars" could not be accessed. If that folder doesn't exist, create it.
+
+#### SUM(field) returns a negative result while all the numbers in this field are > 0
+If a column is declared as integer in Hive, the SQL engine (Calcite) uses the column's type (integer) as the data type for "SUM(field)", while the aggregated value on this field may exceed the range of an integer; in that case the cast causes a negative value to be returned. The workaround is to alter that column's type to BIGINT in Hive and then sync the table schema to Kylin (the cube doesn't need to be rebuilt). As a rule, always declare an integer column as BIGINT in Hive if it will be used as a measure in Kylin. See Hive numeric types: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-NumericTypes
+
+#### Why does Kylin need to extract the distinct columns from the fact table before building a cube?
+Kylin uses a dictionary to encode the values in each column, which greatly reduces the cube's storage size. To build the dictionary, Kylin needs to fetch the distinct values for each column.
+
+#### Why does Kylin calculate the Hive table cardinality?
+The cardinality of dimensions is an important measure of cube complexity. The higher the cardinality, the bigger the cube, and thus the longer it takes to build and the slower it is to query. A cardinality > 1,000 is worth attention, and > 1,000,000 should be avoided if at all possible. For optimal cube performance, try to reduce high cardinality by categorizing values or deriving features.
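
Note on the SUM(field) FAQ entry above: the negative result is ordinary 32-bit integer wraparound. A minimal Java sketch of the effect follows. The class and method names are hypothetical and are not part of Kylin or Calcite; the sketch only assumes that the INTEGER-typed sum behaves like a Java `int` accumulator and the BIGINT-typed sum like a `long`.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative, not Kylin or Calcite APIs.
final class SumOverflowDemo {

    // 32-bit accumulator, standing in for a SUM typed as INTEGER:
    // Java int addition wraps silently on overflow.
    static int sumAsInt(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // 64-bit accumulator, standing in for a SUM over a BIGINT column.
    static long sumAsLong(int[] values) {
        long total = 0L;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three positive values whose true sum (3,000,000,000) exceeds Integer.MAX_VALUE.
        int[] values = new int[3];
        Arrays.fill(values, 1_000_000_000);

        System.out.println(sumAsInt(values));  // prints -1294967296
        System.out.println(sumAsLong(values)); // prints 3000000000
    }
}
```

Every input is positive, yet the 32-bit sum comes out negative, which matches the symptom in the FAQ; widening the accumulator, as the BIGINT workaround does, avoids the wraparound.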

http://git-wip-us.apache.org/repos/asf/incubator-kylin/blob/4bd54078/docs/MISC/How to Contribute.md
----------------------------------------------------------------------
diff --git a/docs/MISC/How to Contribute.md b/docs/MISC/How to Contribute.md
new file mode 100644
index 0000000..e270606
--- /dev/null
+++ b/docs/MISC/How to Contribute.md	
@@ -0,0 +1,42 @@
+#### Setup Dev Env
+* Subscribe to our developers' mailing list via <de...@kylin.incubator.apache.org>
+* Fork on [GitHub](https://github.com/KylinOLAP)
+* ...
+
+#### Making Changes
+* Raise an issue on GitHub describing the feature/enhancement/bug
+* Discuss with others in the Google group or in issue comments to make sure the proposed changes fit in with what others are doing and have planned for the project
+* Make changes in your fork
+* Write unit tests if no existing tests cover your change
+* Push to GitHub under your fork
+
+
+#### Contribute The Work
+* Raise a pull request on GitHub that includes both **code** and **test**, and link it to the **related issue**
+* A committer will review it in terms of correctness, performance, design, coding style, and test coverage
+* Discuss and revise if necessary
+* Finally, a committer merges the code into the main branch
+
+
+#### Wish List
+Some potential work items
+* Query Engine
+  * Cache generated classes to reduce delay to the ms level and avoid full GCs triggered by the permanent generation
+  * [Issue #14](https://github.com/KylinOLAP/Kylin/issues/14) Derive meaningful cost in OLAP relational operator
+  * [Issue #15](https://github.com/KylinOLAP/Kylin/issues/15) Implement multi-column distinct count
+* Metadata
+  * [Issue #7](https://github.com/KylinOLAP/Kylin/issues/7) Merge multiple hbase tables
+* Job Engine
+  * [Issue #16](https://github.com/KylinOLAP/Kylin/issues/16) Tune HDFS & HBase parameters
+  * [Issue #17](https://github.com/KylinOLAP/Kylin/issues/17) Increase HDFS block size to 1GB or close
+  * Shell command to support kill operation
+  * Use DoubleDouble instead of BigDecimal during cube build
+  * Drop quartz dependency, assess the cost/benefit first
+  * Run cardinality as a one-step job to allow progress tracking
+* ODBC/JDBC
+  * Test Kylin remote JDBC with Java reporting tools
+  * Implement ODBC async mode, fetching from Kylin and feeding to client in parallel
+* Benchmark
+  * [Issue #18](https://github.com/KylinOLAP/Kylin/issues/18) Benchmark on standard dataset
+
+