Posted to general@incubator.apache.org by Luke Han <lu...@gmail.com> on 2014/11/14 16:36:48 UTC

[PROPOSAL] Kylin for Incubation

Hi all,
We would like to propose Kylin as an Apache Incubator project. The
complete proposal can be found at
https://wiki.apache.org/incubator/KylinProposal, and the text of the
proposal is posted below.

Thanks.
Luke


Kylin Proposal
==============

# Abstract

Kylin is a distributed and scalable OLAP engine built on Hadoop to
support extremely large datasets.

# Proposal

Kylin is an open source Distributed Analytics Engine that provides a
SQL interface and multi-dimensional analysis (MOLAP) on Hadoop. It is
designed to accelerate analytics on Hadoop by allowing the use of
SQL-compatible tools, to support extremely large datasets, and to
integrate tightly with the Hadoop ecosystem.

## Overview of Kylin

The Kylin platform has two parts: offline data processing and
interactive querying. First, Kylin reads data from a source (Hive) and
runs a set of tasks, including MapReduce jobs and shell scripts, to
pre-calculate results for a specified data model, then saves the
resulting OLAP cube into storage such as HBase. Once these OLAP cubes
are ready, a user can submit a request from any SQL-based tool or
third-party application to Kylin's REST server. The server calls the
Query Engine to determine whether the target dataset already exists.
If so, the engine directly accesses the target data in the form of a
predefined cube and returns the result with sub-second latency.
Otherwise, the engine is designed to route non-matching queries to
whichever SQL-on-Hadoop tool is already available on the Hadoop
cluster, such as Hive.
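
The routing decision described above can be illustrated with a small
sketch. This is not Kylin's actual Query Engine code: the Cube class,
the route_query function and the cube and column names are all
hypothetical, and the matching rule (a cube serves a query when it
covers every requested dimension and measure) is an assumption made
for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cube:
    """Hypothetical description of a pre-built cube."""
    name: str
    dimensions: frozenset
    measures: frozenset


def route_query(query_dims, query_measures, cubes):
    """Pick a cube that covers the query, or fall back to Hive.

    Assumption: a cube can serve a query when it contains every
    dimension and every measure the query asks for.
    """
    for cube in cubes:
        covers_dims = set(query_dims) <= cube.dimensions
        covers_measures = set(query_measures) <= cube.measures
        if covers_dims and covers_measures:
            return ("cube", cube.name)
    # No pre-calculated cube matches: route to a SQL-on-Hadoop tool.
    return ("fallback", "hive")


cubes = [
    Cube(
        name="sales_cube",
        dimensions=frozenset({"year", "country", "category"}),
        measures=frozenset({"sum_price", "count"}),
    )
]

print(route_query(["year", "country"], ["sum_price"], cubes))
print(route_query(["seller_id"], ["sum_price"], cubes))
```

The first query is covered by the hypothetical sales_cube and would be
answered from pre-calculated data; the second asks for a dimension the
cube does not contain and is routed to the fallback.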

The Kylin platform includes:

- Metadata Manager: Kylin is a metadata-driven application. The Kylin
Metadata Manager is the key component that manages all metadata stored
in Kylin including all cube metadata. All other components rely on the
Metadata Manager.

- Job Engine: This engine is designed to handle all of the offline
jobs, including shell script, Java API, and MapReduce jobs. The Job
Engine manages and coordinates all of the jobs in Kylin to make sure
each job executes and failures are handled.

- Storage Engine: This engine manages the underlying storage –
specifically, the cuboids, which are stored as key-value pairs. The
Storage Engine uses HBase – the best solution from the Hadoop
ecosystem for leveraging an existing K-V system. Kylin can also be
extended to support other K-V systems, such as Redis.

- Query Engine: Once the cube is ready, the Query Engine can receive
and parse user queries. It then interacts with other components to
return the results to the user.

- REST Server: The REST Server is an entry point for applications to
develop against Kylin. Applications can submit queries, get results,
trigger cube build jobs, get metadata, get user privileges, and so on.

- ODBC Driver: To support third-party tools and applications – such as
Tableau – we have built and open-sourced an ODBC Driver. The goal is
to make it easy for users to onboard.

# Background

The challenge we face at eBay is that our data volume is becoming
bigger and bigger while our user base is becoming more diverse. For
example, our business users and analysts consistently ask for minimal
latency when visualizing data in Tableau and Excel. So, we worked
closely with our internal analyst community and outlined the product
requirements for Kylin:

- Sub-second query latency on billions of rows
- ANSI SQL availability for those using SQL-compatible tools
- Full OLAP capability to offer advanced functionality
- Support for high cardinality and very large dimensions
- High concurrency for thousands of users
- Distributed and scale-out architecture for analysis in the TB to PB size range

Existing SQL-on-Hadoop solutions commonly need to perform partial or
full table or file scans to compute the results of queries. The cost
of these large data scans can make many queries very slow (more than a
minute). The core idea of MOLAP (multi-dimensional OLAP) is to
pre-compute data along dimensions of interest and store the resulting
aggregates as a "cube". MOLAP is much faster but is inflexible. We
realized that no existing external product met our exact requirements,
especially in the open source Hadoop community. To meet our emerging
business needs, we built a platform from scratch to support MOLAP for
these business requirements first and then to support other
approaches, including ROLAP. With an excellent development team and
several pilot customers, we have been able to bring the Kylin platform
into production as well as open source it.

# Rationale

When data grows to petabyte scale, pre-calculating the results of a
query takes a long time and requires costly, powerful hardware.
However, with the benefit of Hadoop's distributed computing
architecture, jobs can leverage hundreds or thousands of Hadoop data
nodes. There still exists a big gap between the growing volume of data
and interactive analytics:

- Existing Business Intelligence (OLAP) platforms cannot scale out to
support fast-growing data.
- Existing SQL-on-Hadoop projects are not designed for OLAP use cases;
joins across huge tables will always take a long time to scan and
calculate.
- No mature OLAP solution exists on Hadoop.

As mentioned in the background, the business requirements triggered by
the increase in data volume drove eBay to invest in building a
solution from scratch to offer analytics capability on Hadoop
clusters. With Hadoop's power of distributed computing, Kylin can
perform pre-calculations in parallel and merge the final results,
thereby significantly reducing the processing time.
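
As a rough illustration of "pre-calculate in parallel, then merge",
the sketch below aggregates each data partition independently and
merges the partial results. It uses local threads rather than Hadoop,
and the function names and sample data are hypothetical; it only shows
why partial aggregates for additive measures can be combined
afterwards.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def aggregate_partition(rows):
    """Partial aggregation: sum the measure per key in one partition."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def merge_partials(partials):
    """Merge step: add the per-partition sums together."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts per key
    return total


# Hypothetical input: (country, price) rows split across two partitions.
partitions = [
    [("US", 10), ("CN", 5)],
    [("US", 7), ("DE", 3)],
]

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(aggregate_partition, partitions))

result = merge_partials(partials)
print(dict(result))
```

This works because sums are associative: the order in which partitions
are processed or merged does not change the final totals.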

To serve queries by the analyst community, Kylin generates cuboids
with all possible combinations of dimensions, and calculates all
metrics at different levels. The cuboids are then integrated to form a
pre-calculated OLAP cube. All cuboids are key-value structured: keys
are composites formed from combinations of multiple dimensions, and
values are the aggregation results for that particular combination of
dimensions. Kylin uses HBase to store cubes. HBase is useful because
it supports efficient searches across ranges of data.
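
The cuboid structure described above can be sketched in a few lines.
This is an in-memory illustration only, not Kylin's storage format:
the build_cube function, the sample rows and the choice of a single
summed measure are assumptions. For N dimensions it produces 2^N
cuboids, each a mapping from a composite dimension key to an
aggregate.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Build every cuboid for the given dimensions.

    Returns a dict keyed by the tuple of dimension names; each cuboid
    maps a composite key (tuple of dimension values) to the summed
    measure for that combination.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


# Hypothetical fact rows with two dimensions and one measure.
rows = [
    {"year": 2014, "country": "US", "price": 10},
    {"year": 2014, "country": "CN", "price": 5},
    {"year": 2013, "country": "US", "price": 7},
]

cube = build_cube(rows, ["year", "country"], "price")
print(len(cube))                  # number of cuboids
print(cube[("year",)])            # aggregated by year only
print(cube[("year", "country")])  # aggregated by both dimensions
print(cube[()])                   # grand total
```

A query that groups by year only can then be answered by reading the
("year",) cuboid instead of scanning the original rows.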

# Current Status

## Meritocracy

Kylin has been deployed in production at eBay and is processing
extremely large datasets. The platform has demonstrated great
performance benefits and has proved to be a better way for analysts to
leverage data on Hadoop with a more convenient approach using their
favorite tool.

## Community

Kylin seeks to develop developer and user communities during incubation.

## Core Developers

Kylin is currently being designed and developed by six engineers from
eBay Inc. – Jiang Xu, Luke Han, Yang Li, George Song, Hongbin Ma and
Xiaodong Duo. In addition, some outside contributors are actively
contributing to design and development. Among them, Julian Hyde from
Hortonworks is a very important contributor. All of these core
developers have deep expertise in Hadoop and the Hadoop ecosystem in
general.

## Alignment

The ASF is a natural host for Kylin given that it is already the home
of Hadoop, Pig, Hive, and other emerging cloud software projects.
Kylin was designed to offer OLAP capability on Hadoop from the
beginning in order to solve data access and analysis challenges in
Hadoop clusters. Kylin complements the existing Hadoop analytics area
by providing a comprehensive solution based on pre-computed views.

In Kylin, we are leveraging an open-source dynamic data management
framework called Apache Calcite to parse SQL and plug in our code.
Apache Calcite was previously called Optiq, was originally authored by
Julian Hyde and is now an Apache Incubator project.

# Known Risks

## Orphaned Products

The core developers of the Kylin team plan to work full time on this
project. There is very little risk of Kylin getting orphaned since at
least one large company (eBay) is extensively using it in its
production Hadoop clusters. For example, there are currently 3 use
cases with more than 12 billion rows and 1000 activity requests per
day using Kylin in production. Furthermore, since Kylin was open
sourced at the beginning of October 2014, it has received more than
280 stars and been forked nearly 100 times. Kylin has had one major
release so far and received 5 pull requests from external contributors
in the first month, which further demonstrates that Kylin is a very
active project. We plan to extend and diversify this community further
through Apache.

## Inexperience with Open Source

The core developers are all active users and followers of open source.
They are already committers and contributors to the Kylin GitHub
project. All have been involved with the source code that has been
released under an open source license, and several of them also have
experience developing code in an open source environment. Though the
core set of developers do not have Apache open source experience,
there are plans to onboard individuals with Apache open source
experience on to the project.

## Homogenous Developers

The core developers include developers from eBay, Ctrip and
Hortonworks. The Apache Incubation process encourages an open and
diverse meritocratic community. Apache Kylin has the required amount
of diversity with committers from three different organizations, but
is also aware that the bulk of the commits come from a single entity.
Kylin intends to make every possible effort to build a diverse,
vibrant and involved community and has already received substantial
interest from various organizations.

## Reliance on Salaried Developers

eBay invested in Kylin as the OLAP solution on top of Hadoop clusters
and some of its key engineers are working full time on the project. In
addition, since there is a growing Big Data need for scalable OLAP
solutions on Hadoop, we look forward to other Apache developers and
researchers contributing to the project. Additional contributors,
including Apache committers, have plans to join this effort shortly.
Also key to addressing the risk associated with relying on salaried
developers from a single entity is to increase the diversity of the
contributors and actively lobby for domain experts in the BI space to
contribute. Apache Kylin intends to do this. One step already taken is
to approach the Apache Drill project to explore possible cooperation.

## Relationships with Other Apache Products

Kylin has a strong relationship with, and dependency on, Apache
Hadoop, HBase, Hive and Calcite. Being part of Apache's Incubation
community could help with closer collaboration among these four
projects as well as others.

Kylin is likely to have substantial value to Apache Drill due to the
common use of Calcite as a query optimization engine and the
similarity between Kylin's approach to cubing and Drill's approach to
input sources.

## An Excessive Fascination with the Apache Brand

Kylin is proposing to enter incubation at Apache in order to help
efforts to diversify the committer base, not so much to capitalize on
the Apache brand. The Kylin project is in production use already
inside eBay, but is not expected to be an eBay product for external
customers. As such, the Kylin project is not seeking to use the Apache
brand as a marketing tool.

# Documentation

Information about Kylin can be found at
https://github.com/KylinOLAP/Kylin. The following links provide more
information about Kylin in open source:

- Kylin web site: http://kylin.io
- Codebase at Github: https://github.com/KylinOLAP/Kylin
- Issue Tracking: https://github.com/KylinOLAP/Kylin/issues
- User community: https://groups.google.com/forum/#!forum/kylin-olap

## Initial Source

Kylin has been under development since 2013 by a team of engineers at
eBay Inc. It is currently hosted on GitHub under an Apache license
at https://github.com/KylinOLAP/Kylin.

## External Dependencies

Kylin has the following external dependencies.

* Basic

- JDK 1.6+
- Apache Maven
- JUnit
- DBUnit
- Log4j
- Slf4j
- Apache Commons
- Google Guava
- Jackson

* Hadoop

- Apache Hadoop
- Apache HBase
- Apache Hive
- Apache Zookeeper
- Apache Curator

* Utility

- H2
- JSCH

* REST Service

- Spring

* Query

- Antlr
- Apache Calcite (formerly Optiq)
- Linq4j

* Job

- Quartz

* Web build tool

- NPM
- Grunt
- bower

* Web

- Angular JS
- jQuery
- Bootstrap
- D3 JS
- ACE

## Cryptography

Kylin will eventually support encryption on the wire. This is not one
of the initial goals, and we do not expect Kylin to be a controlled
export item due to the use of encryption. Kylin supports but does not
require the Kerberos authentication mechanism to access secured Hadoop
services.

# Required Resources

## Mailing List

- kylin-private for private PMC discussions (with moderated subscriptions)
- kylin-dev
- kylin-commits

## Subversion Directory

Git is the preferred source control system: git://git.apache.org/Kylin

## Issue Tracking

JIRA Kylin (KYLIN)

## Other Resources

The existing code already has unit tests so we will make use of
existing Apache continuous testing infrastructure. The resulting load
should not be very large.

# Initial Committers

- Jiang Xu <jiangxu.china at gmail dot com>
- Luke Han <lukhan at ebay dot com>
- Yang Li <yangli9 at ebay dot com>
- George Song <ysong1 at ebay dot com>
- Hongbin Ma <honma at ebay dot com>
- Xiaodong Duo <oranjedog at gmail dot com>
- Julian Hyde <jhyde at apache dot org>
- Ankur Bansal <abansal at ebay dot com>

## Affiliations

The initial committers are employees of eBay Inc., Ctrip and
Hortonworks. The nominated mentors are employees of Hortonworks, MapR
Technologies and Pivotal.

# Sponsors

## Champion

- Owen O’Malley <omalley at apache dot org>
- Ted Dunning <tdunning at apache dot org>

## Nominated Mentors

- Owen O’Malley <omalley at apache dot org> - Apache IPMC member,
Co-founder and Senior Architect, Hortonworks
- Ted Dunning <tdunning at apache dot org> - Apache IPMC member,
Chief Architect, MapR Technologies
- Henry Saputra <hsaputra at apache dot org> - Apache IPMC member, Pivotal
- Jacques Nadeau <jacques at apache dot org> (pending admission to
IPMC) - Apache Drill PMC Chair, MapR Technologies

# Sponsoring Entity

We are requesting the Incubator to sponsor this project.

Re: [PROPOSAL] Kylin for Incubation

Posted by Luke Han <lu...@gmail.com>.
Checking again with Apache Trademarks is a safer way to continue using
this name.
We will contact them and do the check again.

Thank you very much for pointing this out.
Luke

2014-11-15 0:01 GMT+08:00 Ross Gardler (MS OPEN TECH) <
Ross.Gardler@microsoft.com>:

> Please check with VP Trademarks here at Apache.
>
> Sent from my Windows Phone
> ________________________________
> From: Luke Han<ma...@gmail.com>
> Sent: 11/14/2014 8:00 AM
> To: general@incubator.apache.org<ma...@incubator.apache.org>
> Subject: Re: [PROPOSAL] Kylin for Incubation
>
> We noticed this from the beginning; below are the comments from our
> Legal team:
> "We’ve done a preliminary trademark search for Kylin in the US, and there
> weren’t any directly conflicting brands. "
>
> I think it should be ok to use:)
>
> Thanks.
>
> Luke
>
> 2014-11-14 23:47 GMT+08:00 Ross Gardler (MS OPEN TECH) <
> Ross.Gardler@microsoft.com>:
>
> > Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
> >
> > Sent from my Windows Phone
> > ________________________________
> > From: Luke Han<ma...@gmail.com>
> > Sent: 11/14/2014 7:38 AM
> > To: general@incubator.apache.org<ma...@incubator.apache.org>
> > Subject: [PROPOSAL] Kylin for Incubation
> >
>
>
>
> --
>
> Best Regards!
> ---------------------
>
> Luke Han
>



-- 

Best Regards!
---------------------

Luke Han

RE: [PROPOSAL] Kylin for Incubation

Posted by "Ross Gardler (MS OPEN TECH)" <Ro...@microsoft.com>.
Please check with VP Trademarks here at Apache.

Sent from my Windows Phone
________________________________
From: Luke Han<ma...@gmail.com>
Sent: 11/14/2014 8:00 AM
To: general@incubator.apache.org<ma...@incubator.apache.org>
Subject: Re: [PROPOSAL] Kylin for Incubation

We noticed this from the beginning; below are the comments from our
Legal team:
"We’ve done a preliminary trademark search for Kylin in the US, and there
weren’t any directly conflicting brands. "

I think it should be ok to use:)

Thanks.

Luke

2014-11-14 23:47 GMT+08:00 Ross Gardler (MS OPEN TECH) <
Ross.Gardler@microsoft.com>:

> Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
>
> Sent from my Windows Phone
> ________________________________
> From: Luke Han<ma...@gmail.com>
> Sent: 11/14/2014 7:38 AM
> To: general@incubator.apache.org<ma...@incubator.apache.org>
> Subject: [PROPOSAL] Kylin for Incubation
>



--

Best Regards!
---------------------

Luke Han

Re: [PROPOSAL] Kylin for Incubation

Posted by Luke Han <lu...@gmail.com>.
We noticed this from the beginning; below are the comments from our
legal team:
"We’ve done a preliminary trademark search for Kylin in the US, and there
weren’t any directly conflicting brands."

I think it should be OK to use :)

Thanks.

Luke

2014-11-14 23:47 GMT+08:00 Ross Gardler (MS OPEN TECH) <
Ross.Gardler@microsoft.com>:

> Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
>
> Sent from my Windows Phone
> ________________________________
> From: Luke Han<ma...@gmail.com>
> Sent: 11/14/2014 7:38 AM
> To: general@incubator.apache.org<ma...@incubator.apache.org>
> Subject: [PROPOSAL] Kylin for Incubation
>



-- 

Best Regards!
---------------------

Luke Han

Re: [PROPOSAL] Kylin for Incubation

Posted by Ted Dunning <te...@gmail.com>.
Sounds good.

I have started the discussion to get Jacques onto the IPMC.



On Thu, Nov 20, 2014 at 9:27 AM, Luke Han <lu...@gmail.com> wrote:

> Hi all,
>       Thank you for reviewing the proposal. With the discussion winding
> down, we would like to send the VOTE email next.
>
> Thanks
> Luke
>
>
> 2014-11-15 11:40 GMT+08:00 Ted Dunning <te...@gmail.com>:
>
> >
> > Also, a Chinese-localized operating system is pretty clearly different
> > from an OLAP engine.
> >
> > For comparison, see the recent non-issue regarding Amazon Aurora versus
> > Apache Aurora.
> >
> > Sent from my iPhone
> >
> > > On Nov 14, 2014, at 9:55, Henry Saputra <he...@gmail.com> wrote:
> > >
> > > Thanks for the reminder Ross.
> > > Hopefully we can take the same route as Apache Spark, Apache
> > > Storm, and Apache MetaModel, where the trademark is used as
> > > 'Apache Kylin'.
> > >
> > >
> > > - Henry
> > >
> > > On Fri, Nov 14, 2014 at 7:47 AM, Ross Gardler (MS OPEN TECH)
> > > <Ro...@microsoft.com> wrote:
> > >> Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
> > >>
> > >> Sent from my Windows Phone
> > >> ________________________________
> > >> From: Luke Han<ma...@gmail.com>
> > >> Sent: 11/14/2014 7:38 AM
> > >> To: general@incubator.apache.org<ma...@incubator.apache.org>
> > >> Subject: [PROPOSAL] Kylin for Incubation
> > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > For additional commands, e-mail: general-help@incubator.apache.org
> > >
> >
> >
>
>
> --
>
> Best Regards!
> ---------------------
>
> Luke Han
>

Re: [PROPOSAL] Kylin for Incubation

Posted by Luke Han <lu...@gmail.com>.
Hi all,
      Thank you for reviewing the proposal. With the discussion winding
down, we would like to send the VOTE email next.

Thanks
Luke


2014-11-15 11:40 GMT+08:00 Ted Dunning <te...@gmail.com>:

>
> Also, a Chinese localized operating system is pretty clearly different
> from an OLAP engine.
>
> For comparison see the recent non-issue regarding Amazon Aurora versus
> Apache Aurora.
>
> Sent from my iPhone
>
> > On Nov 14, 2014, at 9:55, Henry Saputra <he...@gmail.com> wrote:
> >
> > Thanks for the reminder Ross.
> > > Hopefully we could go a similar route to Apache Spark, Apache
> > > Storm, and Apache MetaModel, where the trademark should be used as
> > > 'Apache Kylin'.
> >
> >
> > - Henry
> >
> > On Fri, Nov 14, 2014 at 7:47 AM, Ross Gardler (MS OPEN TECH)
> > <Ro...@microsoft.com> wrote:
> >> Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
> >>
> >> Sent from my Windows Phone
> >> ________________________________
> >> From: Luke Han<ma...@gmail.com>
> >> Sent: 11/14/2014 7:38 AM
> >> To: general@incubator.apache.org<ma...@incubator.apache.org>
> >> Subject: [PROPOSAL] Kylin for Incubation
> >>
> >> Hi all,
> >> We would like to propose Kylin as an Apache Incubator project. The
> >> complete proposal can be found at
> >> https://wiki.apache.org/incubator/KylinProposal and the text of the
> >> proposal is posted below.
> >>
> >> Thanks.
> >> Luke
> >>
> >>
> >> Kylin Proposal
> >> ==============
> >>
> >> # Abstract
> >>
> >> Kylin is a distributed and scalable OLAP engine built on Hadoop to
> >> support extremely large datasets.
> >>
> >> # Proposal
> >>
> >> Kylin is an open source Distributed Analytics Engine that provides a
> >> SQL interface and multi-dimensional analysis (MOLAP) on Hadoop. Kylin
> >> is designed to accelerate analytics on Hadoop by allowing the use of
> >> SQL-compatible tools, to support extremely large datasets, and to
> >> integrate tightly with the Hadoop ecosystem.
> >>
> >> ## Overview of Kylin
> >>
> >> The Kylin platform has two parts: data processing and interactive
> >> querying. First, Kylin reads data from the source, Hive, and runs a
> >> set of tasks, including MapReduce jobs and shell scripts, to
> >> pre-calculate results for a specified data model, then saves the
> >> resulting OLAP cube into storage such as HBase. Once these OLAP cubes
> >> are ready, a user can submit a request from any SQL-based tool or
> >> third-party application to Kylin’s REST server. The server calls the
> >> Query Engine to determine if the target dataset already exists. If so,
> >> the engine directly accesses the target data in the form of a
> >> predefined cube and returns the result with sub-second latency.
> >> Otherwise, the engine is designed to route non-matching queries to
> >> whichever SQL-on-Hadoop tool is already available on the Hadoop
> >> cluster, such as Hive.
> >>
> >> The Kylin platform includes:
> >>
> >> - Metadata Manager: Kylin is a metadata-driven application. The Kylin
> >> Metadata Manager is the key component that manages all metadata stored
> >> in Kylin including all cube metadata. All other components rely on the
> >> Metadata Manager.
> >>
> >> - Job Engine: This engine is designed to handle all of the offline
> >> jobs, including shell script, Java API, and MapReduce jobs. The Job
> >> Engine manages and coordinates all of the jobs in Kylin to make sure
> >> each job executes, and it handles failures.
> >>
> >> - Storage Engine: This engine manages the underlying storage –
> >> specifically, the cuboids, which are stored as key-value pairs. The
> >> Storage Engine uses HBase – the best solution from the Hadoop
> >> ecosystem for leveraging an existing K-V system. Kylin can also be
> >> extended to support other K-V systems, such as Redis.
> >>
> >> - Query Engine: Once the cube is ready, the Query Engine can receive
> >> and parse user queries. It then interacts with other components to
> >> return the results to the user.
> >>
> >> - REST Server: The REST Server is an entry point for applications to
> >> develop against Kylin. Applications can submit queries, get results,
> >> trigger cube build jobs, get metadata, get user privileges, and so on.
> >>
> >> - ODBC Driver: To support third-party tools and applications – such as
> >> Tableau – we have built and open-sourced an ODBC Driver. The goal is
> >> to make it easy for users to onboard.
> >>
> >> # Background
> >>
> >> The challenge we face at eBay is that our data volume is becoming
> >> bigger and bigger while our user base is becoming more diverse. For
> >> example, our business users and analysts consistently ask for minimal
> >> latency when visualizing data in Tableau and Excel. So, we worked
> >> closely with our internal analyst community and outlined the product
> >> requirements for Kylin:
> >>
> >> - Sub-second query latency on billions of rows
> >> - ANSI SQL availability for those using SQL-compatible tools
> >> - Full OLAP capability to offer advanced functionality
> >> - Support for high cardinality and very large dimensions
> >> - High concurrency for thousands of users
> >> - Distributed and scale-out architecture for analysis in the TB to PB
> >> size range
> >>
> >> Existing SQL-on-Hadoop solutions commonly need to perform partial or
> >> full table or file scans to compute the results of queries. The cost
> >> of these large data scans can make many queries very slow (more than a
> >> minute). The core idea of MOLAP (multi-dimensional OLAP) is to
> >> pre-compute data along dimensions of interest and store resulting
> >> aggregates as a "cube". MOLAP is much faster but is inflexible. We
> >> realized that no existing external product met our exact requirements,
> >> especially in the open source Hadoop community. To meet our emerging
> >> business needs, we built a platform from scratch to support MOLAP for
> >> these business requirements and later to support others, including
> >> ROLAP. With an excellent development team and several pilot
> >> customers, we have been able to bring the Kylin platform into
> >> production as well as open source it.
> >>
> >> # Rationale
> >>
> >> When data grows to petabyte scale, pre-calculating a query takes a
> >> long time and requires costly, powerful hardware. However, with the
> >> benefit of Hadoop’s distributed computing architecture, jobs can
> >> leverage hundreds or thousands of Hadoop data nodes. There still
> >> exists a big gap between the growing volume of data and interactive
> >> analytics:
> >>
> >> - Existing Business Intelligence (OLAP) platforms cannot scale out to
> >> support fast-growing data.
> >> - Existing SQL-on-Hadoop projects are not designed for OLAP use cases;
> >> joins between huge tables always take a long time to scan and calculate.
> >> - No mature OLAP solution exists on Hadoop.
> >>
> >> As mentioned in the background, the business requirements triggered by
> >> the increase in data volume drove eBay to invest in building a solution
> >> from scratch to offer analytics capability on Hadoop clusters. With
> >> Hadoop’s power of distributed computing, Kylin can perform
> >> pre-calculations in parallel and merge the final results, thereby
> >> significantly reducing the processing time.
> >>
> >> To serve queries by the analyst community, Kylin generates cuboids
> >> with all possible combinations of dimensions, and calculates all
> >> metrics at different levels. The cuboids are then integrated to form a
> >> pre-calculated OLAP cube. All cuboids are key-value structured: keys
> >> are composites formed from combinations of multiple dimensions and
> >> values are aggregation results for that particular combination of
> >> dimensions. Kylin uses HBase to store cubes. HBase is useful because
> >> it supports efficient searches across ranges of data.
> >>
> >> # Current Status
> >>
> >> ## Meritocracy
> >>
> >> Kylin has been deployed in production at eBay and is processing
> >> extremely large datasets. The platform has demonstrated great
> >> performance benefits and has proved to be a better way for analysts to
> >> leverage data on Hadoop with a more convenient approach using their
> >> favorite tool.
> >>
> >> ## Community
> >>
> >> Kylin seeks to develop developer and user communities during incubation.
> >>
> >> ## Core Developers
> >>
> >> Kylin is currently being designed and developed by six engineers from
> >> eBay Inc. – Jiang Xu, Luke Han, Yang Li, George Song, Hongbin Ma and
> >> Xiaodong Duo. In addition, some outside contributors are actively
> >> contributing in design and development. Among them, Julian Hyde from
> >> Hortonworks is a very important contributor. All of these core
> >> developers have deep expertise in Hadoop and the Hadoop Ecosystem in
> >> general.
> >>
> >> ## Alignment
> >>
> >> The ASF is a natural host for Kylin given that it is already the home
> >> of Hadoop, Pig, Hive, and other emerging cloud software projects.
> >> Kylin was designed to offer OLAP capability on Hadoop from the
> >> beginning in order to solve data access and analysis challenges in
> >> Hadoop clusters. Kylin complements the existing Hadoop analytics area
> >> by providing a comprehensive solution based on pre-computed views.
> >>
> >> In Kylin, we are leveraging an open-source dynamic data management
> >> framework called Apache Calcite to parse SQL and plug in our code.
> >> Apache Calcite was previously called Optiq, was originally authored by
> >> Julian Hyde and is now an Apache Incubator project.
> >>
> >> # Known Risks
> >>
> >> ## Orphaned Products
> >>
> >> The core developers of the Kylin team plan to work full time on this
> >> project. There is very little risk of Kylin getting orphaned since at
> >> least one large company (eBay) is extensively using it in their
> >> production Hadoop clusters. For example, there are currently 3 use
> >> cases with more than 12 billion rows and 1000 activity requests per
> >> day using Kylin in production. Furthermore, since Kylin was open
> >> sourced at the beginning of October 2014, it has received more than
> >> 280 stars and been forked nearly 100 times. Kylin has had one major
> >> release so far and received 5 pull requests from external
> >> contributors in the first month, which further demonstrates that
> >> Kylin is a very active project. We plan to extend and diversify this
> >> community further through Apache.
> >>
> >> ## Inexperience with Open Source
> >>
> >> The core developers are all active users and followers of open source.
> >> They are already committers and contributors to the Kylin Github
> >> project. All have been involved with the source code that has been
> >> released under an open source license, and several of them also have
> >> experience developing code in an open source environment. Though the
> >> core set of developers does not have Apache open source experience,
> >> there are plans to onboard individuals with Apache open source
> >> experience onto the project.
> >>
> >> ## Homogenous Developers
> >>
> >> The core developers include developers from eBay, Ctrip and
> >> Hortonworks. The Apache Incubation process encourages an open and
> >> diverse meritocratic community. Apache Kylin has the required amount
> >> of diversity with committers from three different organizations, but
> >> is also aware that the bulk of the commits come from a single entity.
> >> Kylin intends to make every possible effort to build a diverse,
> >> vibrant and involved community and has already received substantial
> >> interest from various organizations.
> >>
> >> ## Reliance on Salaried Developers
> >>
> >> eBay invested in Kylin as the OLAP solution on top of Hadoop clusters
> >> and some of its key engineers are working full time on the project. In
> >> addition, since there is a growing Big Data need for scalable OLAP
> >> solutions on Hadoop, we look forward to other Apache developers and
> >> researchers contributing to the project. Additional contributors,
> >> including Apache committers, have plans to join this effort shortly.
> >> Also key to addressing the risk associated with relying on salaried
> >> developers from a single entity is to increase the diversity of the
> >> contributors and actively lobby for domain experts in the BI space to
> >> contribute. Apache Kylin intends to do this. One step already taken
> >> is to approach the Apache Drill project to explore possible
> >> cooperation.
> >>
> >> ## Relationships with Other Apache Products
> >>
> >> Kylin has a strong relationship with, and dependency on, Apache Hadoop,
> >> HBase, Hive and Calcite. Being part of Apache’s Incubation community
> >> could help with closer collaboration among these four projects as
> >> well as others.
> >>
> >> Kylin is likely to have substantial value to Apache Drill due to the
> >> common use of Calcite as a query optimization engine and the
> >> similarity between Kylin's approach to cubing and Drill's approach to
> >> input sources.
> >>
> >> ## An Excessive Fascination with the Apache Brand
> >>
> >> Kylin is proposing to enter incubation at Apache in order to help
> >> efforts to diversify the committer-base, not so much to capitalize on
> >> the Apache brand. The Kylin project is in production use already
> >> inside eBay, but is not expected to be an eBay product for external
> >> customers. As such, the Kylin project is not seeking to use the Apache
> >> brand as a marketing tool.
> >>
> >> # Documentation
> >>
> >> Information about Kylin can be found at
> >> https://github.com/KylinOLAP/Kylin. The following links provide more
> >> information about Kylin in open source:
> >>
> >> - Kylin web site: http://kylin.io
> >> - Codebase at Github: https://github.com/KylinOLAP/Kylin
> >> - Issue Tracking: https://github.com/KylinOLAP/Kylin/issues
> >> - User community: https://groups.google.com/forum/#!forum/kylin-olap
> >>
> >> ## Initial Source
> >>
> >> Kylin has been under development since 2013 by a team of engineers at
> >> eBay Inc. It is currently hosted on Github.com under an Apache license
> >> at https://github.com/KylinOLAP/Kylin
> >>
> >> ## External Dependencies
> >>
> >> Kylin has the following external dependencies.
> >>
> >> * Basic
> >>
> >> - JDK 1.6+
> >> - Apache Maven
> >> - JUnit
> >> - DBUnit
> >> - Log4j
> >> - Slf4j
> >> - Apache Commons
> >> - Google Guava
> >> - Jackson
> >>
> >> * Hadoop
> >>
> >> - Apache Hadoop
> >> - Apache HBase
> >> - Apache Hive
> >> - Apache Zookeeper
> >> - Apache Curator
> >>
> >> * Utility
> >>
> >> - H2
> >> - JSCH
> >>
> >> * REST Service
> >>
> >> - Spring
> >>
> >> * Query
> >>
> >> - Antlr
> >> - Apache Calcite (formerly Optiq)
> >> - Linq4j
> >>
> >> * Job
> >>
> >> - Quartz
> >>
> >> * Web build tool
> >>
> >> - NPM
> >> - Grunt
> >> - bower
> >>
> >> * Web
> >>
> >> - Angular JS
> >> - jQuery
> >> - Bootstrap
> >> - D3 JS
> >> - ACE
> >>
> >> ## Cryptography
> >>
> >> Kylin will eventually support encryption on the wire. This is not one
> >> of the initial goals, and we do not expect Kylin to be a controlled
> >> export item due to the use of encryption. Kylin supports but does not
> >> require the Kerberos authentication mechanism to access secured Hadoop
> >> services.
> >>
> >> # Required Resources
> >>
> >> ## Mailing List
> >>
> >> - kylin-private for private PMC discussions (with moderated subscriptions)
> >> - kylin-dev
> >> - kylin-commits
> >>
> >> ## Subversion Directory
> >>
> >> Git is the preferred source control system: git://git.apache.org/Kylin
> >>
> >> ## Issue Tracking
> >>
> >> JIRA Kylin (KYLIN)
> >>
> >> ## Other Resources
> >>
> >> The existing code already has unit tests so we will make use of
> >> existing Apache continuous testing infrastructure. The resulting load
> >> should not be very large.
> >>
> >> # Initial Committers
> >>
> >> - Jiang Xu <jiangxu.china at gmail dot com>
> >> - Luke Han <lukhan at ebay dot com>
> >> - Yang Li <yangli9 at ebay dot com>
> >> - George Song <ysong1 at ebay dot com>
> >> - Hongbin Ma <honma at ebay dot com>
> >> - Xiaodong Duo <oranjedog at gmail dot com>
> >> - Julian Hyde <jhyde at apache dot org>
> >> - Ankur Bansal <abansal at ebay dot com>
> >>
> >> ## Affiliations
> >>
> >> The initial committers are employees of eBay Inc., Ctrip and
> >> Hortonworks. The nominated mentors are employees of Hortonworks, MapR
> >> Technologies and Pivotal.
> >>
> >> # Sponsors
> >>
> >> ## Champion
> >>
> >> - Owen O’Malley <omalley at apache dot org>
> >> - Ted Dunning <tdunning at apache dot org>
> >>
> >> ## Nominated Mentors
> >>
> >> - Owen O’Malley <omalley at apache dot org> - Apache IPMC member,
> >> Co-founder and Senior Architect, Hortonworks
> >> - Ted Dunning <tdunning at apache dot org> - Apache IPMC member,
> >> Chief Architect, MapR Technologies
> >> - Henry Saputra <hsaputra at apache dot org> - Apache IPMC member,
> >> Pivotal
> >> - Jacques Nadeau <jacques at apache dot org> (pending admission to
> >> IPMC) - Apache Drill PMC Chair, MapR Technologies
> >>
> >> # Sponsoring Entity
> >>
> >> We are requesting the Incubator to sponsor this project.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >


-- 

Best Regards!
---------------------

Luke Han

Re: [PROPOSAL] Kylin for Incubation

Posted by Ted Dunning <te...@gmail.com>.
Also, a Chinese localized operating system is pretty clearly different from an OLAP engine.

For comparison, see the recent non-issue regarding Amazon Aurora versus Apache Aurora.

Sent from my iPhone

> On Nov 14, 2014, at 9:55, Henry Saputra <he...@gmail.com> wrote:
> 
> Thanks for the reminder, Ross.
> Hopefully we can take a similar route to Apache Spark, Apache
> Storm, and Apache MetaModel, where the trademark is used as
> 'Apache Kylin'.
> 
> 
> - Henry
> 
> On Fri, Nov 14, 2014 at 7:47 AM, Ross Gardler (MS OPEN TECH)
> <Ro...@microsoft.com> wrote:
>> Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
>> 
>> Sent from my Windows Phone
>> ________________________________
>> From: Luke Han<ma...@gmail.com>
>> Sent: 11/14/2014 7:38 AM
>> To: general@incubator.apache.org<ma...@incubator.apache.org>
>> Subject: [PROPOSAL] Kylin for Incubation
>> 
> 


Re: [PROPOSAL] Kylin for Incubation

Posted by Henry Saputra <he...@gmail.com>.
Thanks for the reminder, Ross.
Hopefully we can take a similar route to Apache Spark, Apache
Storm, and Apache MetaModel, where the trademark is used as
'Apache Kylin'.


- Henry

On Fri, Nov 14, 2014 at 7:47 AM, Ross Gardler (MS OPEN TECH)
<Ro...@microsoft.com> wrote:
> Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
>
> Sent from my Windows Phone
> ________________________________
> From: Luke Han<ma...@gmail.com>
> Sent: 11/14/2014 7:38 AM
> To: general@incubator.apache.org<ma...@incubator.apache.org>
> Subject: [PROPOSAL] Kylin for Incubation
>
> Hi all,
> We would like to propose Kylin as an Apache Incubator project. The
> complete proposal can be found at
> https://wiki.apache.org/incubator/KylinProposal, and the text of the
> proposal is posted below.
>
> Thanks.
> Luke
>
>
> Kylin Proposal
> ==============
>
> # Abstract
>
> Kylin is a distributed and scalable OLAP engine built on Hadoop to
> support extremely large datasets.
>
> # Proposal
>
> Kylin is an open source distributed analytics engine that provides a
> SQL interface and multi-dimensional analysis (MOLAP) on Hadoop. Kylin
> is designed to accelerate analytics on Hadoop by allowing the use of
> SQL-compatible tools, to support extremely large datasets, and to
> integrate tightly with the Hadoop ecosystem.
>
> ## Overview of Kylin
>
> The Kylin platform has two parts: offline data processing and
> interactive querying. First, Kylin reads data from its source, Hive,
> and runs a set of tasks, including MapReduce jobs and shell scripts,
> to pre-calculate results for a specified data model. It then saves the
> resulting OLAP cube into storage such as HBase. Once these OLAP cubes
> are ready, a user can submit a request from any SQL-based tool or
> third-party application to Kylin’s REST server. The server calls the
> Query Engine to determine whether the target dataset already exists.
> If so, the engine directly accesses the target data in the form of a
> predefined cube and returns the result with sub-second latency.
> Otherwise, the engine is designed to route non-matching queries to
> whichever SQL-on-Hadoop tool is already available on the Hadoop
> cluster, such as Hive.
>
> The Kylin platform includes:
>
> - Metadata Manager: Kylin is a metadata-driven application. The Kylin
> Metadata Manager is the key component that manages all metadata stored
> in Kylin including all cube metadata. All other components rely on the
> Metadata Manager.
>
> - Job Engine: This engine is designed to handle all of the offline
> jobs, including shell scripts, Java API calls, and MapReduce jobs. The
> Job Engine manages and coordinates all of the jobs in Kylin to make
> sure each job executes and that failures are handled.
>
> - Storage Engine: This engine manages the underlying storage –
> specifically, the cuboids, which are stored as key-value pairs. The
> Storage Engine uses HBase – the best solution from the Hadoop
> ecosystem for leveraging an existing K-V system. Kylin can also be
> extended to support other K-V systems, such as Redis.
>
> - Query Engine: Once the cube is ready, the Query Engine can receive
> and parse user queries. It then interacts with other components to
> return the results to the user.
>
> - REST Server: The REST Server is an entry point for applications to
> develop against Kylin. Applications can submit queries, get results,
> trigger cube build jobs, get metadata, get user privileges, and so on.
>
> - ODBC Driver: To support third-party tools and applications – such as
> Tableau – we have built and open-sourced an ODBC Driver. The goal is
> to make it easy for users to onboard.
>
> # Background
>
> The challenge we face at eBay is that our data volume is becoming
> bigger and bigger while our user base is becoming more diverse. For
> example, our business users and analysts consistently ask for minimal
> latency when visualizing data in Tableau and Excel. So, we worked
> closely with our internal analyst community and outlined the product
> requirements for Kylin:
>
> - Sub-second query latency on billions of rows
> - ANSI SQL availability for those using SQL-compatible tools
> - Full OLAP capability to offer advanced functionality
> - Support for high cardinality and very large dimensions
> - High concurrency for thousands of users
> - Distributed and scale-out architecture for analysis in the TB to PB size range
>
> Existing SQL-on-Hadoop solutions commonly need to perform partial or
> full table or file scans to compute the results of queries. The cost
> of these large data scans can make many queries very slow (more than a
> minute). The core idea of MOLAP (multi-dimensional OLAP) is to
> pre-compute data along dimensions of interest and store resulting
> aggregates as a "cube". MOLAP is much faster but is inflexible. We
> realized that no existing external product met our exact
> requirements, especially in the open source Hadoop community. To meet
> our emerging business needs, we built a platform from scratch to
> support MOLAP for these business requirements, and later to support
> other approaches, including ROLAP. With an excellent development team
> and several pilot customers, we have been able to bring the Kylin
> platform into production as well as open source it.
>
> # Rationale
>
> When data grows to petabyte scale, pre-calculating query results
> takes a long time and requires costly, powerful hardware. However,
> with the benefit of Hadoop’s distributed computing architecture, jobs
> can leverage hundreds or thousands of Hadoop data nodes. There still
> exists a big gap between the growing volume of data and interactive
> analytics:
>
> - Existing Business Intelligence (OLAP) platforms cannot scale out to
> support fast-growing data.
> - Existing SQL-on-Hadoop projects are not designed for OLAP use cases;
> joins across huge tables will always take a long time to scan and
> calculate.
> - No mature OLAP solution exists on Hadoop.
>
> As mentioned in the background, the business requirements triggered by
> the increase in data volume drove eBay to invest in building a
> solution from scratch to offer analytics capability on Hadoop
> clusters. With Hadoop’s power of distributed computing, Kylin can
> perform pre-calculations in parallel and merge the final results,
> thereby significantly reducing the processing time.
>
> To serve queries by the analyst community, Kylin generates cuboids
> with all possible combinations of dimensions, and calculates all
> metrics at different levels. The cuboids are then integrated to form a
> pre-calculated OLAP cube. All cuboids are key-value structured: keys
> are composites formed from combinations of multiple dimensions and
> values are aggregation results for that particular combination of
> dimensions. Kylin uses HBase to store cubes. HBase is useful because
> it supports efficient searches across ranges of data.
>
> # Current Status
>
> ## Meritocracy
>
> Kylin has been deployed in production at eBay and is processing
> extremely large datasets. The platform has demonstrated great
> performance benefits and has proved to be a better way for analysts to
> leverage data on Hadoop with a more convenient approach using their
> favorite tool.
>
> ## Community
>
> Kylin seeks to develop developer and user communities during incubation.
>
> ## Core Developers
>
> Kylin is currently being designed and developed by six engineers from
> eBay Inc. – Jiang Xu, Luke Han, Yang Li, George Song, Hongbin Ma and
> Xiaodong Duo. In addition, some outside contributors are actively
> contributing to design and development. Among them, Julian Hyde from
> Hortonworks is a very important contributor. All of these core
> developers have deep expertise in Hadoop and the Hadoop Ecosystem in
> general.
>
> ## Alignment
>
> The ASF is a natural host for Kylin given that it is already the home
> of Hadoop, Pig, Hive, and other emerging cloud software projects.
> Kylin was designed to offer OLAP capability on Hadoop from the
> beginning in order to solve data access and analysis challenges in
> Hadoop clusters. Kylin complements the existing Hadoop analytics area
> by providing a comprehensive solution based on pre-computed views.
>
> In Kylin, we are leveraging an open-source dynamic data management
> framework called Apache Calcite to parse SQL and plug in our code.
> Apache Calcite was previously called Optiq, was originally authored by
> Julian Hyde and is now an Apache Incubator project.
>
> # Known Risks
>
> ## Orphaned Products
>
> The core developers of the Kylin team plan to work full time on this
> project. There is very little risk of Kylin getting orphaned since at
> least one large company (eBay) is extensively using it in their
> production Hadoop clusters. For example, there are currently 3 use
> cases with more than 12 billion rows and 1000 activity requests per
> day using Kylin in production. Furthermore, since Kylin was open
> sourced at the beginning of October 2014, it has received more than
> 280 stars and been forked nearly 100 times. Kylin has had one major
> release so far and received 5 pull requests from external
> contributors in the first month, which further demonstrates that Kylin
> is a very active project. We plan to extend and diversify this
> community further through Apache.
>
> ## Inexperience with Open Source
>
> The core developers are all active users and followers of open source.
> They are already committers and contributors to the Kylin GitHub
> project. All have been involved with the source code that has been
> released under an open source license, and several of them also have
> experience developing code in an open source environment. Though the
> core set of developers does not have Apache open source experience,
> there are plans to onboard individuals with Apache open source
> experience on to the project.
>
> ## Homogenous Developers
>
> The core developers include developers from eBay, Ctrip and
> Hortonworks. The Apache Incubation process encourages an open and
> diverse meritocratic community. Apache Kylin has the required amount
> of diversity with committers from three different organizations, but
> is also aware that the bulk of the commits come from a single entity.
> Kylin intends to make every possible effort to build a diverse,
> vibrant and involved community and has already received substantial
> interest from various organizations.
>
> ## Reliance on Salaried Developers
>
> eBay invested in Kylin as the OLAP solution on top of Hadoop clusters
> and some of its key engineers are working full time on the project. In
> addition, since there is a growing Big Data need for scalable OLAP
> solutions on Hadoop, we look forward to other Apache developers and
> researchers contributing to the project. Additional contributors,
> including Apache committers, have plans to join this effort shortly.
> Also key to addressing the risk associated with relying on salaried
> developers from a single entity is to increase the diversity of the
> contributors and actively lobby for domain experts in the BI space to
> contribute. Apache Kylin intends to do this. One step already
> taken is to approach the Apache Drill project to explore possible
> cooperation.
>
> ## Relationships with Other Apache Products
>
> Kylin has a strong relationship with, and dependency on, Apache Hadoop,
> HBase, Hive and Calcite. Being part of Apache’s Incubation community
> could help with closer collaboration among these four projects as
> well as others.
>
> Kylin is likely to have substantial value to Apache Drill due to the
> common use of Calcite as a query optimization engine and the
> similarities between Kylin's approach to cubing and Drill's approach to
> input sources.
>
> ## An Excessive Fascination with the Apache Brand
>
> Kylin is proposing to enter incubation at Apache in order to help
> efforts to diversify the committer base, not so much to capitalize on
> the Apache brand. The Kylin project is in production use already
> inside eBay, but is not expected to be an eBay product for external
> customers. As such, the Kylin project is not seeking to use the Apache
> brand as a marketing tool.
>
> # Documentation
>
> Information about Kylin can be found at
> https://github.com/KylinOLAP/Kylin. The following links provide more
> information about Kylin in open source:
>
> - Kylin web site: http://kylin.io
> - Codebase at GitHub: https://github.com/KylinOLAP/Kylin
> - Issue Tracking: https://github.com/KylinOLAP/Kylin/issues
> - User community: https://groups.google.com/forum/#!forum/kylin-olap
>
> ## Initial Source
>
> Kylin has been under development since 2013 by a team of engineers at
> eBay Inc. It is currently hosted on Github.com under an Apache license
> at https://github.com/KylinOLAP/Kylin
>
> ## External Dependencies
>
> Kylin has the following external dependencies.
>
> * Basic
>
> - JDK 1.6+
> - Apache Maven
> - JUnit
> - DBUnit
> - Log4j
> - Slf4j
> - Apache Commons
> - Google Guava
> - Jackson
>
> * Hadoop
>
> - Apache Hadoop
> - Apache HBase
> - Apache Hive
> - Apache ZooKeeper
> - Apache Curator
>
> * Utility
>
> - H2
> - JSCH
>
> * REST Service
>
> - Spring
>
> * Query
>
> - Antlr
> - Apache Calcite (formerly Optiq)
> - Linq4j
>
> * Job
>
> - Quartz
>
> * Web build tool
>
> - NPM
> - Grunt
> - bower
>
> * Web
>
> - Angular JS
> - jQuery
> - Bootstrap
> - D3 JS
> - ACE
>
> ## Cryptography
>
> Kylin will eventually support encryption on the wire. This is not one
> of the initial goals, and we do not expect Kylin to be a controlled
> export item due to the use of encryption. Kylin supports but does not
> require the Kerberos authentication mechanism to access secured Hadoop
> services.
>
> # Required Resources
>
> ## Mailing List
>
> - kylin-private for private PMC discussions (with moderated subscriptions)
> - kylin-dev
> - kylin-commits
>
> ## Subversion Directory
>
> Git is the preferred source control system: git://git.apache.org/Kylin
>
> ## Issue Tracking
>
> JIRA Kylin (KYLIN)
>
> ## Other Resources
>
> The existing code already has unit tests so we will make use of
> existing Apache continuous testing infrastructure. The resulting load
> should not be very large.
>
> # Initial Committers
>
> - Jiang Xu <jiangxu.china at gmail dot com>
> - Luke Han <lukhan at ebay dot com>
> - Yang Li <yangli9 at ebay dot com>
> - George Song <ysong1 at ebay dot com>
> - Hongbin Ma <honma at ebay dot com>
> - Xiaodong Duo <oranjedog at gmail dot com>
> - Julian Hyde <jhyde at apache dot org>
> - Ankur Bansal <abansal at ebay dot com>
>
> ## Affiliations
>
> The initial committers are employees of eBay Inc., Ctrip and
> Hortonworks. The nominated mentors are employees of Hortonworks, MapR
> Technologies and Pivotal.
>
> # Sponsors
>
> ## Champion
>
> - Owen O’Malley <omalley at apache dot org>
> - Ted Dunning <tdunning at apache dot org>
>
> ## Nominated Mentors
>
> - Owen O’Malley <omalley at apache dot org> - Apache IPMC member,
> Co-founder and Senior Architect, Hortonworks
> - Ted Dunning <tdunning at apache dot org> - Apache IPMC member,
> Chief Architect, MapR Technologies
> - Henry Saputra <hsaputra at apache dot org> - Apache IPMC member, Pivotal
> - Jacques Nadeau <jacques at apache dot org> (pending admission to
> IPMC) - Apache Drill PMC Chair, MapR Technologies
>
> # Sponsoring Entity
>
> We are requesting the Incubator to sponsor this project.



RE: [PROPOSAL] Kylin for Incubation

Posted by "Ross Gardler (MS OPEN TECH)" <Ro...@microsoft.com>.
Potential trademark clash: http://www.ubuntu.com/desktop/ubuntu-kylin
