Posted to dev@kylin.apache.org by Li Yang <li...@apache.org> on 2022/01/11 05:59:13 UTC

[DISCUSS] The future of Apache Kylin

Hi All

Apache Kylin has been stable for quite a while, and it may be a good time
to think about its future. Below are thoughts from my team and me. We would
love to hear yours as well. Ideas and comments are very welcome.  :-)

*APACHE KYLIN TODAY*

Currently, the latest release of Apache Kylin is 4.0.1. Apache Kylin 4.0 is
a major version update after Kylin 3.x (HBase storage). Kylin 4.0 replaces
HBase with Parquet as the storage engine to improve file scanning
performance. At the same time, Kylin 4.0 reimplements the build engine and
query engine on Spark, making it possible to separate compute and storage
and to better fit the cloud-native technology trend. Kylin 4.0 also enables
a deployment mode without a Hadoop dependency, which reduces deployment
complexity. However, Kylin still has a lot to improve: the business
semantic layer needs to be strengthened, and modifying a model/cube is not
flexible. With these in mind, we are thinking of a few things to do:

   - Multi-dimensional query ability friendly to non-technical personnel.
   The multi-dimensional model is what distinguishes Kylin from general
   OLAP engines. A model based on dimensions and measures is friendlier to
   non-technical personnel and closer to the goal of the citizen analyst.
   Multi-dimensional query capability that non-technical personnel can use
   should be the new focus of Kylin technology.


   - Native engine. Kylin's query engine still has much room for
   improvement in vectorization and CPU instruction-level optimization. The
   Spark community, which Kylin relies on, also has a strong demand for a
   native engine. We are optimistic that a native engine can improve
   Kylin's performance by at least three times, which makes it worth the
   investment.


   - More cloud-native capabilities. Kylin 4.0 has only completed the
   initial cloud deployment, with rapid deployment and dynamic resource
   scaling on the cloud. Many cloud-native capabilities remain to be
   developed.

More explanation follows.

*KYLIN AS A MULTI-DIMENSIONAL DATABASE*

At its core, Kylin is a multi-dimensional database, which is a special kind
of OLAP engine. Although Kylin has had the abilities of a relational
database since its birth and is often compared with other relational OLAP
engines, what really makes Kylin different is its multi-dimensional model
and multi-dimensional database ability. Considering the essence of Kylin
and its wide range of future business uses (not only technical uses),
positioning Kylin as a multi-dimensional database makes perfect sense. With
business semantics and precomputation technology, Apache Kylin helps
non-technical people understand and afford big data, and so realizes data
democratization.

*THE SEMANTIC LAYER*

The key difference between a multi-dimensional database and a relational
database is the ability to express business meaning. Although SQL is highly
expressive and is a basic skill for data analysts, SQL and the RDB are
still too difficult for non-technical personnel if we aim at "everyone is a
data analyst". From the perspective of non-technical personnel, the data
lake and data warehouse are like a dark room. They know there is a lot of
data, but they cannot see it clearly, understand it, or use it, because
they do not understand database theory and SQL.

How can we make the data lake (and data warehouse) clear to non-technical
personnel? This requires introducing a data model that is friendlier to
non-technical personnel: the multi-dimensional data model. While the
relational model describes the technical form of data, the
multi-dimensional model describes the business form of data. In an MDB
(multi-dimensional database), a measure corresponds to a business indicator
that everyone understands, and a dimension is a perspective for comparing
and observing these indicators. Comparing a KPI with last month's value, or
comparing performance between parallel business units, are concepts that
every non-technical person understands. Mapping the relational model to the
multi-dimensional model is, in essence, enriching the technical data with
business semantics, forming a business semantic layer that helps
non-technical personnel understand, explore and use the data. To enhance
Kylin's ability as the semantic layer, supporting multi-dimensional query
languages such as MDX and DAX is a key item on the Kylin roadmap. MDX can
express the data model in Kylin in a business-friendly language, endow data
with business value, and facilitate multi-dimensional analysis on Kylin
with BI tools such as Excel and Tableau.
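
To make the mapping from the relational model to the multi-dimensional
model concrete, here is a minimal sketch in Python. It is only an
illustration and not Kylin's API or metadata format: the table rows, the
column names (order_dt, bu_cd, amt) and the functions query and
change_vs_previous are all hypothetical. The point is that the caller asks
for "Revenue" by "Business Unit" or by "Month" and never touches the
technical column names.

```python
from collections import defaultdict

# Hypothetical fact rows, as they might sit in a relational table.
ROWS = [
    {"order_dt": "2021-11", "bu_cd": "east", "amt": 120.0},
    {"order_dt": "2021-11", "bu_cd": "west", "amt": 80.0},
    {"order_dt": "2021-12", "bu_cd": "east", "amt": 150.0},
    {"order_dt": "2021-12", "bu_cd": "west", "amt": 60.0},
]

# Hypothetical semantic layer: business names mapped onto technical columns.
DIMENSIONS = {"Month": "order_dt", "Business Unit": "bu_cd"}
MEASURES = {"Revenue": ("amt", sum)}


def query(measure, by):
    """Aggregate one measure along one dimension, using business names only."""
    column, aggregate = MEASURES[measure]
    dim_column = DIMENSIONS[by]
    groups = defaultdict(list)
    for row in ROWS:
        groups[row[dim_column]].append(row[column])
    return {member: aggregate(values) for member, values in groups.items()}


def change_vs_previous(measure, by, current, previous):
    """Compare a measure between two members of a dimension, e.g. two months."""
    result = query(measure, by)
    return result[current] - result[previous]


# "Compare performance between parallel business units."
revenue_by_unit = query("Revenue", by="Business Unit")
# "Compare a KPI with last month."
monthly_change = change_vs_previous("Revenue", "Month", "2021-12", "2021-11")
```

A real semantic layer would add hierarchies, calculated measures and a
query language such as MDX on top, but the division of labour is the same:
the model owner maintains the mapping once, and business users work with
the business names.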

*PRECOMPUTATION AND MODEL FLEXIBILITY*

It is Kylin's unchanging mission to keep reducing the cost of a single
query through precomputation, so that ordinary people can afford big data.
If the multi-dimensional model solves the problem of letting non-technical
personnel understand data, then precomputation solves the problem of
letting ordinary people afford data. Both are necessary conditions for data
democratization. By computing once and using many times, the data cost is
shared among multiple users, achieving a scale effect: the more users, the
cheaper. Precomputation is Kylin's traditional strength, but changing a
precomputation model lacks flexibility. To strengthen Kylin's ability to
change models flexibly and to open up more room for optimization, the Kylin
community expects to propose a new metadata format in the future that makes
precomputation more flexible and able to cope with table formats or
business requirements that may change at any time.
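
The "compute once, use many times" idea can be sketched in a few lines of
Python. This is a toy illustration under stated assumptions, not Kylin's
build engine: the rows, the dimension names and the functions build_cube,
lookup and cost_per_query are hypothetical, and the only measure is
SUM(amount). The cube is built once by aggregating every combination of
dimensions (every cuboid); afterwards each query is a dictionary lookup
instead of a scan over the raw rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (month, unit, channel, amount).
ROWS = [
    ("2021-11", "east", "web", 100.0),
    ("2021-11", "west", "app", 80.0),
    ("2021-12", "east", "web", 150.0),
    ("2021-12", "east", "app", 20.0),
    ("2021-12", "west", "web", 60.0),
]
DIMS = ("month", "unit", "channel")


def build_cube(rows, dims):
    """Precompute SUM(amount) for every subset of dimensions (every cuboid)."""
    cube = {}
    for size in range(len(dims) + 1):
        for idx in combinations(range(len(dims)), size):
            cuboid = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in idx)
                cuboid[key] += row[-1]
            cube[tuple(dims[i] for i in idx)] = dict(cuboid)
    return cube


def lookup(cube, **filters):
    """Answer a query from the matching cuboid without scanning the raw rows."""
    group = tuple(d for d in DIMS if d in filters)
    key = tuple(filters[d] for d in group)
    return cube[group].get(key, 0.0)


def cost_per_query(build_cost, lookup_cost, n_queries):
    """Amortized cost: the one-off build cost is shared by all queries."""
    return build_cost / n_queries + lookup_cost


CUBE = build_cube(ROWS, DIMS)  # paid once
december_total = lookup(CUBE, month="2021-12")
east_web_total = lookup(CUBE, unit="east", channel="web")
```

With three dimensions the cube holds 2^3 = 8 cuboids. In cost_per_query,
the share of the build cost falls as n_queries grows, which is the scale
effect described above. The sketch also shows why model changes are costly
in a naive design: adding a dimension changes every cuboid key, so the
whole cube has to be rebuilt.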

*SUMMARY*

To sum up, we would like to position Kylin as a multi-dimensional database.
Through the multi-dimensional model and precomputation technology, ordinary
people can understand and afford big data, finally realizing the vision of
data democratization. Meanwhile, for today's users who use Kylin as a SQL
acceleration layer, Kylin will continue to enhance its SQL engine to ensure
that precomputation can be used by both the relational model and the
multi-dimensional model. In the figure below, we picture the future of
Kylin. The newly added and modified parts are roughly marked in blue and
orange.

*FURTHER READING*

   - https://en.wikipedia.org/wiki/Data_model
   - https://en.wikipedia.org/wiki/Semantic_layer
   - https://en.wikipedia.org/wiki/Multidimensional_analysis
   - https://en.wikipedia.org/wiki/MultiDimensional_eXpressions
   - https://en.wikipedia.org/wiki/XML_for_Analysis
   - https://en.wikipedia.org/wiki/SIMD
   - https://en.wikipedia.org/wiki/Cloud_native_computing
   - https://blogs.gartner.com/carlie-idoine/2018/05/13/citizen-data-scientists-and-why-they-matter/


Please share your ideas and comments.  :-)

Cheers
Yang

Re: [DISCUSS] The future of Apache Kylin

Posted by ShaoFeng Shi <sh...@apache.org>.
+1

Kylin has been a multi-dimensional OLAP (MOLAP) engine from day one, but
because SQL is its main query language, it is a little confusing for users
to differentiate it from other technologies. Introducing the new semantic
layer will make Kylin a more complete solution.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org






Re: [DISCUSS] The future of Apache Kylin

Posted by Yaqian Zhang <Ya...@126.com>.
Cool! 
Looking forward to the new features of the next generation Apache Kylin.



Re:[DISCUSS] The future of Apache Kylin

Posted by Xiaoxiang Yu <xx...@apache.org>.
Thanks Yang, there are two new features that I am really looking forward to:

1. The new SEMANTIC LAYER will make Kylin accessible from Excel (via MDX) and from more BI tools.
2. The new flexible Model will let Kylin users modify a Model/Cube whose status is Ready (for example, adding or deleting dimensions/measures) without purging any useful cuboids/segments.
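
To illustrate what point 2 could mean in practice, here is a minimal sketch. It is only a toy model and not Kylin's actual metadata or API: a cuboid is represented by nothing more than its set of dimension names, and the function name `split_on_dimension_change` and the sample dimensions are made up for this example. Under those assumptions, removing a dimension only invalidates the cuboids that contain it, and adding a dimension invalidates none.

```python
from typing import FrozenSet, Set, Tuple

# Toy assumption: a cuboid is identified only by its set of dimension names.
Cuboid = FrozenSet[str]


def split_on_dimension_change(
    cuboids: Set[Cuboid], removed_dims: Set[str]
) -> Tuple[Set[Cuboid], Set[Cuboid]]:
    """Return (reusable, stale) cuboids after dimensions are removed from a model.

    In this toy model, adding a dimension never invalidates an existing
    cuboid, so only removed dimensions are checked.
    """
    # A cuboid stays reusable if it shares no dimension with the removed set.
    reusable = {c for c in cuboids if not (c & removed_dims)}
    stale = cuboids - reusable
    return reusable, stale


# Hypothetical cuboids that were already built for a Ready model.
existing = {
    frozenset({"country"}),
    frozenset({"country", "category"}),
    frozenset({"category", "seller"}),
}

# Dropping the "seller" dimension should only affect the last cuboid.
reusable, stale = split_on_dimension_change(existing, removed_dims={"seller"})
print(len(reusable), "cuboids kept,", len(stale), "cuboid needs to be dropped")
```

A real implementation would also have to track measures and segments, which this sketch leaves out on purpose.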




--

Best wishes to you!
From: Xiaoxiang Yu




