Posted to dev@kylin.apache.org by ShaoFeng Shi <sh...@apache.org> on 2020/07/24 11:23:11 UTC

[DISCUSS] Kylin Parquet storage and 4.0 plan

Hello, Kylin users,

Regarding the Kylin Parquet storage, we would like to give an update on the
progress. At present, we have completed the main development work[1], the
design document[2], and the benchmark. With the new architecture, Kylin will
be more efficient and more cloud-friendly: it runs fully on Spark and depends
less on the Hadoop stack, which makes DevOps easier.

Here we discuss the future plan, which covers two aspects.

1. The plan for Kylin 4.0

In Kylin 3.x, we have released some important features, such as real-time
analysis, the Flink build engine, global dictionary with Hive, etc. In the
next phase, we hope to concentrate on the Parquet storage engine and to
release it in Kylin v4.0 within this year. During this period, 3.x will stay
in maintenance for bug fixes and security vulnerabilities, but won't receive
big changes or major features.

2. Backward compatibility for HBase storage.

While developing the Parquet storage engine, we found it very difficult to
make the Parquet and HBase engines co-exist. The codebase becomes very
complicated and ugly, which inevitably brings big challenges to maintenance
and release. Besides, HBase has several different APIs (v1.1, v2.0; in
addition, CDH's differ from the community's), which has doubled or tripled
the testing and release effort in the past years.

So, we plan to remove the HBase storage engine in Kylin 4.0; the Kylin
metadata will also migrate to MySQL. For existing users: if you want to keep
using the HBase engine, stay on Kylin 3.x; if you want to upgrade to the
Parquet storage, a migration tool can be provided later (to be discussed in
another thread).

You are welcome to tell us your concerns and suggestions! Thank you for your
participation.

## Reference
[1] https://github.com/apache/kylin/tree/kylin-on-parquet-v2
[2] https://cwiki.apache.org/confluence/display/KYLIN/KIP-1%3A+Parquet+storage

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org

Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Kang, it will still be KV; changing to relational would require too much
work.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

On Fri, Jul 24, 2020 at 10:21 PM, Zhou Kang <zh...@outlook.com> wrote:

> I have a question:
> With the metadata based on MySQL, is the data in MySQL stored as KV or relational?
>
> Thank you!

Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

Posted by Zhou Kang <zh...@outlook.com>.
I have a question: 
With the metadata based on MySQL, is the data in MySQL stored as KV or relational?

Thank you!




Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

Posted by Xiaoxiang Yu <xx...@apache.org>.
Parquet storage is a major feature for Kylin, and I have heard that a lot of users are waiting for the upcoming Kylin 4.0.
The Kylin team is working on this feature, and we hope to provide a technical preview as soon as possible. I have opened a ticket to keep track of this: https://issues.apache.org/jira/browse/KYLIN-4659 . If you are interested in its progress, you can watch that issue.

--

Best wishes to you!
From: Xiaoxiang Yu






Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Xiao,

The 3.x line will continue to be released, especially for bug fixes and
security issues. For new features and enhancements, it depends. The main
consideration is the testing and release effort: each 3.x release now needs
to be built and tested against 4 HBase API versions; even so, many users
still encounter environment problems on even newer Hadoop platforms like CDH
6.3, CDP 7, etc. So we will slow down the 3.x release frequency in order to
put more effort into the Parquet storage. The Parquet storage has much better
compatibility across different platforms.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

On Sat, Jul 25, 2020 at 12:12 PM, chuxiao <cr...@163.com> wrote:

> Will 3.x continue to be released? For example, to support HBase rsgroup.

Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

Posted by chuxiao <cr...@163.com>.
Will 3.x continue to be released? For example, to support HBase rsgroup.













Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

Posted by ShaoFeng Shi <sh...@apache.org>.
Hi Cinto,

Currently, it uses native Parquet with no additional indexing; in the
future, if Parquet enhances its indexing, Kylin can benefit from it.

== "are we using any metastore (like Hive) along with this?"
I'm not sure whether I understand the question properly. The cube Parquet
files are persisted directly on HDFS or object storage, with no dependency on
the Hive metastore.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

On Fri, Jul 24, 2020 at 10:47 PM, Cinto Sunny <ci...@gmail.com> wrote:

> Is there any documentation on the additional indexing (if any) we are
> doing on Parquet? Also, are we using any metastore (like Hive) along with
> this?
>
> - Cinto

Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

Posted by Cinto Sunny <ci...@gmail.com>.
Is there any documentation on the additional indexing (if any) we are doing
on Parquet? Also, are we using any metastore (like Hive) along with this?

- Cinto


Re:[DISCUSS] Kylin Parquet storage and 4.0 plan

Posted by chuxiao <cr...@163.com>.
Will there be further 3.x releases? For example, one that adds support for HBase rsgroup.


Re: [DISCUSS] Kylin Parquet storage and 4.0 plan

Posted by Zhou Kang <zh...@outlook.com>.
I have a question:
For the MySQL-based metadata, will the data in MySQL be stored as key-value pairs or in a relational schema?

Thank you!

