Posted to dev@kylin.apache.org by Li Yang <li...@apache.org> on 2016/02/01 08:46:47 UTC

[DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Just to add more color.

The 2.0 rc1 has been stabilizing in the 2.0-rc branch for a few months. The
2.0 rc1 contains:

- A plugin-able architecture that allows alternative cube engines, storage
engines, and data sources.
- A better MR cubing algorithm, about 1.5 times faster than 1.x, measured by
comparing hundreds of jobs.
- A better storage engine that makes queries roughly 2 times faster
(especially slow queries) than 1.x, measured by comparing tens of thousands
of SQL queries.
- Streaming cubing experimental support, sourcing from Kafka, building cubes
in memory at a minutes interval
- TopN pre-calculation (more UDFs coming)
- ODBC compatibility with Tableau 9.1, MS Excel, and MS PowerBI
- SAML authentication support

As the release manager, I will kick off the release process in two weeks
(once back from vacation). ETA is the end of Feb.

Would love to hear more feedback from our community.  :-)


Yang



On Monday, February 1, 2016, Adunuthula, Seshu <sa...@ebay.com> wrote:

> Hello Folks,
>
> We are actively working towards Apache Kylin 2.0 Release and would like a
> discussion with the community on what they would like to see in 2.0 release
> of the product. We have three big rock items we are working towards in 2.0
> and lot of additional minor feature enhancements.
>
> Streaming Data Source support.
> This feature is semi baked in where the source of Kylin Cubes is Kafka
> Topics. Cube Segment are built on micro batches of messages arriving on
> Kafka topics. Currently a lot of work is going on to productize this
> feature. Primary areas of work are Stream Processing Engines/Frameworks to
> process the micro batches and UI to support out of the box integration of
> Kafka topics with Kylin Cubes.
>
> Spark based Cube building Engine.
> The initial performance numbers for a Spark based cubing engine did not
> show substantial improvement over MR based engine, but would like this
> feature to be baked in for the 2.0 Release. Lot of work underway to
> stabilize this feature.
>
> Amazon EMR Integration
> We had initial conversations with Amazon EMR to support Apache Kylin on
> Amazon EMR which was received well. With Kylin 2.0 Apache Kylin will be
> enabled feature on Amazon EMR. Limited work has gone into this area, but
> this will be an important milestone for 2.0
>
> We are also working towards creating an area for community driven
> improvements page similar to Apache Kafka’s KIP
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals.
> Stay tuned.
>
> Regards
> Seshu Adunuthula
>
>
>
>
>

Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Posted by Li Yang <li...@apache.org>.
Great to see so many expectations for 2.0 too.

I'm continuing with the 2.0 release plan, but at a more measured and steady
pace, as suggested.

Still, I'm fully confident in the quality of the 2.x branch. Its cubing
engine, storage, and query modules are the most thoroughly tested.

On Wed, Feb 3, 2016 at 5:30 PM, Jerome liu <je...@gmail.com> wrote:

> I am looking forward the release of 2.0 , and support the streaming process
> .  I hope it will be test to us .
>
> 2016-02-03 9:45 GMT+08:00 yu feng <ol...@gmail.com>:
>
> > We are looking forward the release of 2.0,  fast cubing algorithm, spark
> > supporting and streaming cube is very useful to us.
> > I have test 2.0-rc in our environment and it works fine, wish the release
> > comes soon.
> >
> > 2016-02-02 18:02 GMT+08:00 Yerui Sun <su...@gmail.com>:
> >
> > > We’ve been looking forward the release of 2.0 for a long time.
> > > We also have tested the 2.0-rc internally for a quite while, and proved
> > > it’s stable.
> > >
> > > We’re confident the release for now.
> > >
> > > > On Feb 2, 2016, at 17:22, 杨海乐 <ya...@letv.com> wrote:
> > > >
> > > > hello all,
> > > >       As users of kylin.We all help Kylin released version 2.0 as
> soon
> > as
> > > > possible in order to get better performance。As a member of the kylin
> > > > community , I sincerely hope Kylin will be more powerful。
> > > >
> > > > --
> > > > View this message in context:
> > >
> >
> http://apache-kylin.74782.x6.nabble.com/DISCUSS-Apache-Kylin-2-0-Release-Features-Criteria-tp3524p3555.html
> > > > Sent from the Apache Kylin mailing list archive at Nabble.com.
> > >
> > >
> >
>
>
>
> --
> Welcome to jerome.liuheng@gmail.com !
>

Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Posted by Jerome liu <je...@gmail.com>.
I am looking forward to the release of 2.0 and its support for stream
processing. I hope it will be made available for us to test.

2016-02-03 9:45 GMT+08:00 yu feng <ol...@gmail.com>:

> We are looking forward the release of 2.0,  fast cubing algorithm, spark
> supporting and streaming cube is very useful to us.
> I have test 2.0-rc in our environment and it works fine, wish the release
> comes soon.
>
> 2016-02-02 18:02 GMT+08:00 Yerui Sun <su...@gmail.com>:
>
> > We’ve been looking forward the release of 2.0 for a long time.
> > We also have tested the 2.0-rc internally for a quite while, and proved
> > it’s stable.
> >
> > We’re confident the release for now.
> >
> > > On Feb 2, 2016, at 17:22, 杨海乐 <ya...@letv.com> wrote:
> > >
> > > hello all,
> > >       As users of kylin.We all help Kylin released version 2.0 as soon
> as
> > > possible in order to get better performance。As a member of the kylin
> > > community , I sincerely hope Kylin will be more powerful。
> > >
> > > --
> > > View this message in context:
> >
> http://apache-kylin.74782.x6.nabble.com/DISCUSS-Apache-Kylin-2-0-Release-Features-Criteria-tp3524p3555.html
> > > Sent from the Apache Kylin mailing list archive at Nabble.com.
> >
> >
>



-- 
Welcome to jerome.liuheng@gmail.com !

Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Posted by yu feng <ol...@gmail.com>.
We are looking forward to the release of 2.0; the fast cubing algorithm,
Spark support, and streaming cubes are very useful to us.
I have tested 2.0-rc in our environment and it works fine. I hope the
release comes soon.

2016-02-02 18:02 GMT+08:00 Yerui Sun <su...@gmail.com>:

> We’ve been looking forward the release of 2.0 for a long time.
> We also have tested the 2.0-rc internally for a quite while, and proved
> it’s stable.
>
> We’re confident the release for now.
>
> > On Feb 2, 2016, at 17:22, 杨海乐 <ya...@letv.com> wrote:
> >
> > hello all,
> >       As users of kylin.We all help Kylin released version 2.0 as soon as
> > possible in order to get better performance。As a member of the kylin
> > community , I sincerely hope Kylin will be more powerful。
> >
> > --
> > View this message in context:
> http://apache-kylin.74782.x6.nabble.com/DISCUSS-Apache-Kylin-2-0-Release-Features-Criteria-tp3524p3555.html
> > Sent from the Apache Kylin mailing list archive at Nabble.com.
>
>

Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Posted by Yerui Sun <su...@gmail.com>.
We’ve been looking forward to the release of 2.0 for a long time.
We have also tested the 2.0-rc internally for quite a while and found it to be stable.

We’re confident in the release now.

> On Feb 2, 2016, at 17:22, 杨海乐 <ya...@letv.com> wrote:
> 
> hello all,
>       As users of kylin.We all help Kylin released version 2.0 as soon as
> possible in order to get better performance。As a member of the kylin
> community , I sincerely hope Kylin will be more powerful。
> 
> --
> View this message in context: http://apache-kylin.74782.x6.nabble.com/DISCUSS-Apache-Kylin-2-0-Release-Features-Criteria-tp3524p3555.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.


Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Posted by 杨海乐 <ya...@letv.com>.
Hello all,
       As users of Kylin, we all hope Kylin 2.0 will be released as soon as
possible so that we can get better performance. As a member of the Kylin
community, I sincerely hope Kylin will become more powerful.

--
View this message in context: http://apache-kylin.74782.x6.nabble.com/DISCUSS-Apache-Kylin-2-0-Release-Features-Criteria-tp3524p3555.html
Sent from the Apache Kylin mailing list archive at Nabble.com.

Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Posted by "Adunuthula, Seshu" <sa...@ebay.com>.
Yes, we will be filing a whole bunch of JIRAs. This release is not Done,
so no point in arguing about whether it is perfect. Luke, I do not want
you to push this release through.

 

On 2/1/16, 7:54 PM, "Luke Han" <lu...@gmail.com> wrote:

>Hi Seshu,
>      "Done is better than Perfect" is one practice in our development:
>release early, ask users
>to try and test, then fix bugs, bring other features if any, and then
>release a new one...
>It works very well in the past and I believe it will continue benefit
>further development.
>
>      And you could see, the 2.x branch is active development code base
>over several months,
>as Yang mentioned, we are confident to release first version now. Also
>there are already many
>users in community are building package from 2.0 and reported many tickets
>to help improve Kylin,
>they are looking forward for the first release very much. With the Apache
> release process,
>the entire community will help to test and try with each release candidate
>for sure there's
>no critical issues, please also help log JIRA if any.
>
>  Back to Spark Cubing, as previous discussed with Spark community,
>there's
>still one pending
>JIRA for performance, so Spark Cubing already be excluded from the first
>release. But with plug-able architecture, it could be very easy to
>introduce back to coming version once the community happy for it.
>
>And, for Amazon EMR part, it's more about how to deploy rather than one
>"feature", it not  make
>sense to set this as one criteria.
>
>        Thanks to bring this discussion to help community:-)
>
>Luke
>
>
>Best Regards!
>---------------------
>
>Luke Han
>
>On Tue, Feb 2, 2016 at 8:48 AM, Adunuthula, Seshu <sa...@ebay.com>
>wrote:
>
>> Yang,
>>
>> Implementing the old MR engine on the pluggable architecture does not
>> prove that the architecture works. You need two points to draw a line. A
>> single point does not prove that the architecture works.
>>
>> Improving the MR engine performance can be done on 1.0 code are without
>> making it pluggable
>>
>>
>> External talks and POCs are not the release criteria for a feature.
>>
>> Regards
>> Seshu
>>
>> Sent from my iPhone
>>
>> > On Feb 1, 2016, at 6:01 PM, Li Yang <li...@apache.org> wrote:
>> >
>> > Seshu's understanding of the 2.0 and its plugin-able architecture is
>>very
>> > wrong. Let me correct. :-)
>> >
>> > The plugin-able architecture is rock solid. Its first commit went
>>back to
>> > Jul 2015. On top it, we built MR cube engine V2 and storage engine V2,
>> > which give much improved build and query performance. At the same
>>time,
>> the
>> > old V1 engines are still available on 2.0 branch. The plugin-able
>> > architecture allows coexistence of alternative engines. And user is
>>free
>> to
>> > choose any of the engines that suits the need.
>> >
>> > In the last few month, thorough testing has been done on the 2.0-rc
>> branch.
>> > Like mentioned, we have rebuild hundreds of jobs on the V2 engines and
>> > compare the results by running tens of thousands of queries against
>>both
>> V1
>> > and V2 cubes. The correctness is confirmed and performance
>>improvement is
>> > measured. The 2.0-rc branch is definitely the most well tested branch
>>so
>> > far. I am very confident of its quality.
>> >
>> > I believe Seshu also agrees with the improved performance and its
>> quality,
>> > as he proposed to release as v1.3. However he didn't know the improved
>> > results are right on top of plugin-able architecture.
>> >
>> > So the saying plugin-able architecture is
>> >> "POC quality features that should not be part of a release. We have
>>not
>> > built a single of these plugins that are production quality."
>> > is very wrong.
>> >
>> > Streaming cubing is a less mature feature. It's in semi-production
>> > quality.  As shared in a few public talks, eBay has a SEO dashboard
>>case
>> > that leverages the streaming cubing feature and achieves 5 minutes
>>data
>> > latency.
>> >
>> > And I made the point very clear -- "Streaming cubing experimental
>> support,
>> > ... minutes interval" -- think no one will be confused.
>> >
>> > If more concerns about 2.0 quality, I suggest JIRA be opened and test
>> case
>> > be created. So we have evidence and can collaborate to improve.
>> >
>> > Still many thanks to the comments. Things become clearer through
>>healthy
>> > discussions. :-)
>> >
>> > Cheers
>> > Yang
>> >
>> > On Tuesday, February 2, 2016, Adunuthula, Seshu <sadunuthula@ebay.com> wrote:
>> >
>> >> A strong -1 on this.
>> >>
>> >> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
>> >> comparing hundreds of jobs.
>> >> - TopN pre-calculation (more UDFs coming)
>> >> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>> >>
>> >>
>> >>
>> >> These are incremental enhancements and does not warrant bumping up to
>> 2.0
>> >> release. We should release them as in 1.3
>> >>
>> >>
>> >> - Streaming cubing experimental support, source from kafka, build
>>cube
>> >> in-mem at minutes interval
>> >> - A plugin-able architecture, to allow alternative cube engine /
>>storage
>> >> engine / data source.
>> >>
>> >>
>> >>
>> >> These are POC quality features that should not be part of a release.
>>We
>> >> have not built a single of these plugins that are production quality.
>> >>
>> >> Luke/Yang I have told you multiple times not to push out a release
>>when
>> it
>> >> is not ready. We nearly got down the entire HBase cluster in eBay
>>with
>> the
>> >> bad design for the Streaming. If we scale this up to 100s of
>>Streaming
>> >> Cubes this design will render an HBase cluster unusable.
>> >>
>> >> I have spent substantial time looking into the release and it does
>>not
>> >> meet eBay's standards for a quality release.
>> >>
>> >> We will be doing the community a huge disservice by pushing this out
>>by
>> >> end of February.
>> >>
>> >> Regards
>> >> Seshu Adunuthula
>> >>
>> >>
>> >>> On 1/31/16, 11:46 PM, "Li Yang" <li...@apache.org> wrote:
>> >>>
>> >>> Just  to add more colors.
>> >>>
>> >>> The 2.0 rc1 has been stabilizing in the 2.0-rc branch for a few
>>month.
>> The
>> >>> 2.0 rc1 contains:
>> >>>
>> >>> - A plugin-able architecture, to allow alternative cube engine /
>> storage
>> >>> engine / data source.
>> >>> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
>> >>> comparing hundreds of jobs.
>> >>> - A better storage engine, makes query roughly 2 times faster
>> (especially
>> >>> for slow queries) than 1.x by comparing tens of thousands sqls.
>> >>> - Streaming cubing experimental support, source from kafka, build
>>cube
>> >>> in-mem at minutes interval
>> >>> - TopN pre-calculation (more UDFs coming)
>> >>> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>> >>> - SAML authentication support
>> >>>
>> >>> As the release manager, I will kickoff the release process in two
>>weeks
>> >>> (once back from vacation). ETA by end of Feb.
>> >>>
>> >>> Would love to hear more feedback from our community.  :-)
>> >>>
>> >>>
>> >>> Yang
>> >>>
>> >>>
>> >>>
>> >>> On Monday, February 1, 2016, Adunuthula, Seshu
>><sa...@ebay.com>
>> >>> wrote:
>> >>>
>> >>>> Hello Folks,
>> >>>>
>> >>>> We are actively working towards Apache Kylin 2.0 Release and would
>> like
>> >>>> a
>> >>>> discussion with the community on what they would like to see in 2.0
>> >>>> release
>> >>>> of the product. We have three big rock items we are working
>>towards in
>> >>>> 2.0
>> >>>> and lot of additional minor feature enhancements.
>> >>>>
>> >>>> Streaming Data Source support.
>> >>>> This feature is semi baked in where the source of Kylin Cubes is
>>Kafka
>> >>>> Topics. Cube Segment are built on micro batches of messages
>>arriving
>> on
>> >>>> Kafka topics. Currently a lot of work is going on to productize
>>this
>> >>>> feature. Primary areas of work are Stream Processing
>> Engines/Frameworks
>> >>>> to
>> >>>> process the micro batches and UI to support out of the box
>>integration
>> >>>> of
>> >>>> Kafka topics with Kylin Cubes.
>> >>>>
>> >>>> Spark based Cube building Engine.
>> >>>> The initial performance numbers for a Spark based cubing engine did
>> not
>> >>>> show substantial improvement over MR based engine, but would like
>>this
>> >>>> feature to be baked in for the 2.0 Release. Lot of work underway to
>> >>>> stabilize this feature.
>> >>>>
>> >>>> Amazon EMR Integration
>> >>>> We had initial conversations with Amazon EMR to support Apache
>>Kylin
>> on
>> >>>> Amazon EMR which was received well. With Kylin 2.0 Apache Kylin
>>will
>> be
>> >>>> enabled feature on Amazon EMR. Limited work has gone into this
>>area,
>> but
>> >>>> this will be an important milestone for 2.0
>> >>>>
>> >>>> We are also working towards creating an area for community driven
>> >>>> improvements page similar to Apache Kafka's KIP
>> >>>>
>> >>>>
>> >>
>> 
>>https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Propo
>> >>>> sals.
>> >>>> Stay tuned.
>> >>>>
>> >>>> Regards
>> >>>> Seshu Adunuthula
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>
>> >>
>>


Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Posted by Luke Han <lu...@gmail.com>.
Hi Seshu,
      "Done is better than perfect" is one practice in our development:
release early, ask users to try and test, fix bugs, bring in other features
if any, and then release a new version. It has worked very well in the past,
and I believe it will continue to benefit further development.

      As you can see, the 2.x branch has been the active development code
base for several months and, as Yang mentioned, we are confident in
releasing the first version now. Many users in the community are already
building packages from 2.0 and have reported many tickets to help improve
Kylin; they are very much looking forward to the first release. With the
Apache release process, the entire community will help test and try each
release candidate to make sure there are no critical issues. Please also
help by logging a JIRA if you find any.

  Back to Spark cubing: as previously discussed with the Spark community,
there is still one pending JIRA for performance, so Spark cubing has already
been excluded from the first release. But with the pluggable architecture,
it should be very easy to bring it back in a coming version once the
community is happy with it.

As for the Amazon EMR part, it is more about how to deploy than a "feature",
so it does not make sense to set it as a release criterion.

        Thanks for bringing up this discussion to help the community :-)

Luke


Best Regards!
---------------------

Luke Han

On Tue, Feb 2, 2016 at 8:48 AM, Adunuthula, Seshu <sa...@ebay.com>
wrote:

> Yang,
>
> Implementing the old MR engine on the pluggable architecture does not
> prove that the architecture works. You need two points to draw a line. A
> single point does not prove that the architecture works.
>
> Improving the MR engine performance can be done on 1.0 code are without
> making it pluggable
>
>
> External talks and POCs are not the release criteria for a feature.
>
> Regards
> Seshu
>
> Sent from my iPhone
>
> > On Feb 1, 2016, at 6:01 PM, Li Yang <li...@apache.org> wrote:
> >
> > Seshu's understanding of the 2.0 and its plugin-able architecture is very
> > wrong. Let me correct. :-)
> >
> > The plugin-able architecture is rock solid. Its first commit went back to
> > Jul 2015. On top it, we built MR cube engine V2 and storage engine V2,
> > which give much improved build and query performance. At the same time,
> the
> > old V1 engines are still available on 2.0 branch. The plugin-able
> > architecture allows coexistence of alternative engines. And user is free
> to
> > choose any of the engines that suits the need.
> >
> > In the last few month, thorough testing has been done on the 2.0-rc
> branch.
> > Like mentioned, we have rebuild hundreds of jobs on the V2 engines and
> > compare the results by running tens of thousands of queries against both
> V1
> > and V2 cubes. The correctness is confirmed and performance improvement is
> > measured. The 2.0-rc branch is definitely the most well tested branch so
> > far. I am very confident of its quality.
> >
> > I believe Seshu also agrees with the improved performance and its
> quality,
> > as he proposed to release as v1.3. However he didn't know the improved
> > results are right on top of plugin-able architecture.
> >
> > So the saying plugin-able architecture is
> >> "POC quality features that should not be part of a release. We have not
> > built a single of these plugins that are production quality."
> > is very wrong.
> >
> > Streaming cubing is a less mature feature. It's in semi-production
> > quality.  As shared in a few public talks, eBay has a SEO dashboard case
> > that leverages the streaming cubing feature and achieves 5 minutes data
> > latency.
> >
> > And I made the point very clear -- "Streaming cubing experimental
> support,
> > ... minutes interval" -- think no one will be confused.
> >
> > If more concerns about 2.0 quality, I suggest JIRA be opened and test
> case
> > be created. So we have evidence and can collaborate to improve.
> >
> > Still many thanks to the comments. Things become clearer through healthy
> > discussions. :-)
> >
> > Cheers
> > Yang
> >
> > On Tuesday, February 2, 2016, Adunuthula, Seshu <sadunuthula@ebay.com> wrote:
> >
> >> A strong -1 on this.
> >>
> >> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
> >> comparing hundreds of jobs.
> >> - TopN pre-calculation (more UDFs coming)
> >> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
> >>
> >>
> >>
> >> These are incremental enhancements and does not warrant bumping up to
> 2.0
> >> release. We should release them as in 1.3
> >>
> >>
> >> - Streaming cubing experimental support, source from kafka, build cube
> >> in-mem at minutes interval
> >> - A plugin-able architecture, to allow alternative cube engine / storage
> >> engine / data source.
> >>
> >>
> >>
> >> These are POC quality features that should not be part of a release. We
> >> have not built a single of these plugins that are production quality.
> >>
> >> Luke/Yang I have told you multiple times not to push out a release when
> it
> >> is not ready. We nearly got down the entire HBase cluster in eBay with
> the
> >> bad design for the Streaming. If we scale this up to 100s of Streaming
> >> Cubes this design will render an HBase cluster unusable.
> >>
> >> I have spent substantial time looking into the release and it does not
> >> meet eBay's standards for a quality release.
> >>
> >> We will be doing the community a huge disservice by pushing this out by
> >> end of February.
> >>
> >> Regards
> >> Seshu Adunuthula
> >>
> >>
> >>> On 1/31/16, 11:46 PM, "Li Yang" <li...@apache.org> wrote:
> >>>
> >>> Just  to add more colors.
> >>>
> >>> The 2.0 rc1 has been stabilizing in the 2.0-rc branch for a few month.
> The
> >>> 2.0 rc1 contains:
> >>>
> >>> - A plugin-able architecture, to allow alternative cube engine /
> storage
> >>> engine / data source.
> >>> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
> >>> comparing hundreds of jobs.
> >>> - A better storage engine, makes query roughly 2 times faster
> (especially
> >>> for slow queries) than 1.x by comparing tens of thousands sqls.
> >>> - Streaming cubing experimental support, source from kafka, build cube
> >>> in-mem at minutes interval
> >>> - TopN pre-calculation (more UDFs coming)
> >>> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
> >>> - SAML authentication support
> >>>
> >>> As the release manager, I will kickoff the release process in two weeks
> >>> (once back from vacation). ETA by end of Feb.
> >>>
> >>> Would love to hear more feedback from our community.  :-)
> >>>
> >>>
> >>> Yang
> >>>
> >>>
> >>>
> >>> On Monday, February 1, 2016, Adunuthula, Seshu <sa...@ebay.com>
> >>> wrote:
> >>>
> >>>> Hello Folks,
> >>>>
> >>>> We are actively working towards Apache Kylin 2.0 Release and would
> like
> >>>> a
> >>>> discussion with the community on what they would like to see in 2.0
> >>>> release
> >>>> of the product. We have three big rock items we are working towards in
> >>>> 2.0
> >>>> and lot of additional minor feature enhancements.
> >>>>
> >>>> Streaming Data Source support.
> >>>> This feature is semi baked in where the source of Kylin Cubes is Kafka
> >>>> Topics. Cube Segment are built on micro batches of messages arriving
> on
> >>>> Kafka topics. Currently a lot of work is going on to productize this
> >>>> feature. Primary areas of work are Stream Processing
> Engines/Frameworks
> >>>> to
> >>>> process the micro batches and UI to support out of the box integration
> >>>> of
> >>>> Kafka topics with Kylin Cubes.
> >>>>
> >>>> Spark based Cube building Engine.
> >>>> The initial performance numbers for a Spark based cubing engine did
> not
> >>>> show substantial improvement over MR based engine, but would like this
> >>>> feature to be baked in for the 2.0 Release. Lot of work underway to
> >>>> stabilize this feature.
> >>>>
> >>>> Amazon EMR Integration
> >>>> We had initial conversations with Amazon EMR to support Apache Kylin
> on
> >>>> Amazon EMR which was received well. With Kylin 2.0 Apache Kylin will
> be
> >>>> enabled feature on Amazon EMR. Limited work has gone into this area,
> but
> >>>> this will be an important milestone for 2.0
> >>>>
> >>>> We are also working towards creating an area for community driven
> >>>> improvements page similar to Apache Kafka's KIP
> >>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Propo
> >>>> sals.
> >>>> Stay tuned.
> >>>>
> >>>> Regards
> >>>> Seshu Adunuthula
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
>

Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Posted by "Adunuthula, Seshu" <sa...@ebay.com>.
Yang,

Implementing the old MR engine on the pluggable architecture does not prove that the architecture works. You need two points to draw a line. A single point does not prove that the architecture works.

Improving the MR engine performance can be done on the 1.0 code base without making it pluggable.

External talks and POCs are not the release criteria for a feature.

Regards
Seshu

Sent from my iPhone

> On Feb 1, 2016, at 6:01 PM, Li Yang <li...@apache.org> wrote:
> 
> Seshu's understanding of the 2.0 and its plugin-able architecture is very
> wrong. Let me correct. :-)
> 
> The plugin-able architecture is rock solid. Its first commit went back to
> Jul 2015. On top it, we built MR cube engine V2 and storage engine V2,
> which give much improved build and query performance. At the same time, the
> old V1 engines are still available on 2.0 branch. The plugin-able
> architecture allows coexistence of alternative engines. And user is free to
> choose any of the engines that suits the need.
> 
> In the last few month, thorough testing has been done on the 2.0-rc branch.
> Like mentioned, we have rebuild hundreds of jobs on the V2 engines and
> compare the results by running tens of thousands of queries against both V1
> and V2 cubes. The correctness is confirmed and performance improvement is
> measured. The 2.0-rc branch is definitely the most well tested branch so
> far. I am very confident of its quality.
> 
> I believe Seshu also agrees with the improved performance and its quality,
> as he proposed to release as v1.3. However he didn't know the improved
> results are right on top of plugin-able architecture.
> 
> So the saying plugin-able architecture is
>> "POC quality features that should not be part of a release. We have not
> built a single of these plugins that are production quality."
> is very wrong.
> 
> Streaming cubing is a less mature feature. It's in semi-production
> quality.  As shared in a few public talks, eBay has a SEO dashboard case
> that leverages the streaming cubing feature and achieves 5 minutes data
> latency.
> 
> And I made the point very clear -- "Streaming cubing experimental support,
> ... minutes interval" -- think no one will be confused.
> 
> If more concerns about 2.0 quality, I suggest JIRA be opened and test case
> be created. So we have evidence and can collaborate to improve.
> 
> Still many thanks to the comments. Things become clearer through healthy
> discussions. :-)
> 
> Cheers
> Yang
> 
> On Tuesday, February 2, 2016, Adunuthula, Seshu <sadunuthula@ebay.com> wrote:
> 
>> A strong -1 on this.
>> 
>> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
>> comparing hundreds of jobs.
>> - TopN pre-calculation (more UDFs coming)
>> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>> 
>> 
>> 
>> These are incremental enhancements and does not warrant bumping up to 2.0
>> release. We should release them as in 1.3
>> 
>> 
>> - Streaming cubing experimental support, source from kafka, build cube
>> in-mem at minutes interval
>> - A plugin-able architecture, to allow alternative cube engine / storage
>> engine / data source.
>> 
>> 
>> 
>> These are POC quality features that should not be part of a release. We
>> have not built a single of these plugins that are production quality.
>> 
>> Luke/Yang I have told you multiple times not to push out a release when it
>> is not ready. We nearly got down the entire HBase cluster in eBay with the
>> bad design for the Streaming. If we scale this up to 100s of Streaming
>> Cubes this design will render an HBase cluster unusable.
>> 
>> I have spent substantial time looking into the release and it does not
>> meet eBay's standards for a quality release.
>> 
>> We will be doing the community a huge disservice by pushing this out by
>> end of February.
>> 
>> Regards
>> Seshu Adunuthula
>> 
>> 
>>> On 1/31/16, 11:46 PM, "Li Yang" <li...@apache.org> wrote:
>>> 
>>> Just  to add more colors.
>>> 
>>> The 2.0 rc1 has been stabilizing in the 2.0-rc branch for a few month. The
>>> 2.0 rc1 contains:
>>> 
>>> - A plugin-able architecture, to allow alternative cube engine / storage
>>> engine / data source.
>>> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
>>> comparing hundreds of jobs.
>>> - A better storage engine, makes query roughly 2 times faster (especially
>>> for slow queries) than 1.x by comparing tens of thousands sqls.
>>> - Streaming cubing experimental support, source from kafka, build cube
>>> in-mem at minutes interval
>>> - TopN pre-calculation (more UDFs coming)
>>> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>>> - SAML authentication support
>>> 
>>> As the release manager, I will kickoff the release process in two weeks
>>> (once back from vacation). ETA by end of Feb.
>>> 
>>> Would love to hear more feedback from our community.  :-)
>>> 
>>> 
>>> Yang
>>> 
>>> 
>>> 
>>> On Monday, February 1, 2016, Adunuthula, Seshu <sa...@ebay.com>
>>> wrote:
>>> 
>>>> Hello Folks,
>>>> 
>>>> We are actively working towards Apache Kylin 2.0 Release and would like
>>>> a
>>>> discussion with the community on what they would like to see in 2.0
>>>> release
>>>> of the product. We have three big rock items we are working towards in
>>>> 2.0
>>>> and lot of additional minor feature enhancements.
>>>> 
>>>> Streaming Data Source support.
>>>> This feature is semi baked in where the source of Kylin Cubes is Kafka
>>>> Topics. Cube Segment are built on micro batches of messages arriving on
>>>> Kafka topics. Currently a lot of work is going on to productize this
>>>> feature. Primary areas of work are Stream Processing Engines/Frameworks
>>>> to
>>>> process the micro batches and UI to support out of the box integration
>>>> of
>>>> Kafka topics with Kylin Cubes.
>>>> 
>>>> Spark based Cube building Engine.
>>>> The initial performance numbers for a Spark based cubing engine did not
>>>> show substantial improvement over MR based engine, but would like this
>>>> feature to be baked in for the 2.0 Release. Lot of work underway to
>>>> stabilize this feature.
>>>> 
>>>> Amazon EMR Integration
>>>> We had initial conversations with Amazon EMR to support Apache Kylin on
>>>> Amazon EMR which was received well. With Kylin 2.0 Apache Kylin will be
>>>> enabled feature on Amazon EMR. Limited work has gone into this area, but
>>>> this will be an important milestone for 2.0
>>>> 
>>>> We are also working towards creating an area for community driven
>>>> improvements page similar to Apache Kafka's KIP
>>>> 
>>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals.
>>>> Stay tuned.
>>>> 
>>>> Regards
>>>> Seshu Adunuthula
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 

[DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Posted by Li Yang <li...@apache.org>.
Seshu's understanding of 2.0 and its plugin-able architecture is very
wrong. Let me correct it. :-)

The plugin-able architecture is rock solid. Its first commit dates back to
Jul 2015. On top of it, we built MR cube engine V2 and storage engine V2,
which give much improved build and query performance. At the same time, the
old V1 engines are still available on the 2.0 branch. The plugin-able
architecture allows coexistence of alternative engines, and users are free
to choose whichever engine suits their needs.

In the last few months, thorough testing has been done on the 2.0-rc branch.
As mentioned, we have rebuilt hundreds of jobs on the V2 engines and
compared the results by running tens of thousands of queries against both V1
and V2 cubes. The correctness is confirmed and the performance improvement
is measured. The 2.0-rc branch is definitely the most well-tested branch so
far. I am very confident of its quality.

I believe Seshu also agrees with the improved performance and its quality,
as he proposed to release it as v1.3. However, he didn't know the improved
results are built right on top of the plugin-able architecture.

So saying that the plugin-able architecture is
> "POC quality features that should not be part of a release. We have not
> built a single of these plugins that are production quality."
is very wrong.

Streaming cubing is a less mature feature. It is of semi-production
quality. As shared in a few public talks, eBay has an SEO dashboard use case
that leverages the streaming cubing feature and achieves 5-minute data
latency.

And I made the point very clear -- "Streaming cubing experimental support,
... minutes interval" -- I think no one will be confused.

If there are more concerns about 2.0 quality, I suggest that JIRAs be opened
and test cases be created, so we have evidence and can collaborate to improve.

Still, many thanks for the comments. Things become clearer through healthy
discussions. :-)

Cheers
Yang

On Tuesday, February 2, 2016, Adunuthula, Seshu <sadunuthula@ebay.com> wrote:

> A strong -1 on this.
>
> - A better MR cubing algorithm, about 1.5 times faster than 1.x by
> comparing hundreds of jobs.
> - TopN pre-calculation (more UDFs coming)
> - ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>
>
>
> These are incremental enhancements and does not warrant bumping up to 2.0
> release. We should release them as in 1.3
>
>
> - Streaming cubing experimental support, source from kafka, build cube
> in-mem at minutes interval
> - A plugin-able architecture, to allow alternative cube engine / storage
> engine / data source.
>
>
>
> These are POC quality features that should not be part of a release. We
> have not built a single of these plugins that are production quality.
>
> Luke/Yang I have told you multiple times not to push out a release when it
> is not ready. We nearly got down the entire HBase cluster in eBay with the
> bad design for the Streaming. If we scale this up to 100s of Streaming
> Cubes this design will render an HBase cluster unusable.
>
> I have spent substantial time looking into the release and it does not
> meet eBay's standards for a quality release.
>
> We will be doing the community a huge disservice by pushing this out by
> end of February.
>
> Regards
> Seshu Adunuthula
>
>
> On 1/31/16, 11:46 PM, "Li Yang" <li...@apache.org> wrote:
>
> >Just  to add more colors.
> >
> >The 2.0 rc1 has been stabilizing in the 2.0-rc branch for a few month. The
> >2.0 rc1 contains:
> >
> >- A plugin-able architecture, to allow alternative cube engine / storage
> >engine / data source.
> >- A better MR cubing algorithm, about 1.5 times faster than 1.x by
> >comparing hundreds of jobs.
> >- A better storage engine, makes query roughly 2 times faster (especially
> >for slow queries) than 1.x by comparing tens of thousands sqls.
> >- Streaming cubing experimental support, source from kafka, build cube
> >in-mem at minutes interval
> >- TopN pre-calculation (more UDFs coming)
> >- ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
> >- SAML authentication support
> >
> >As the release manager, I will kickoff the release process in two weeks
> >(once back from vacation). ETA by end of Feb.
> >
> >Would love to hear more feedback from our community.  :-)
> >
> >
> >Yang
> >
> >
> >
> >On Monday, February 1, 2016, Adunuthula, Seshu <sa...@ebay.com>
> >wrote:
> >
> >> Hello Folks,
> >>
> >> We are actively working towards Apache Kylin 2.0 Release and would like
> >>a
> >> discussion with the community on what they would like to see in 2.0
> >>release
> >> of the product. We have three big rock items we are working towards in
> >>2.0
> >> and lot of additional minor feature enhancements.
> >>
> >> Streaming Data Source support.
> >> This feature is semi baked in where the source of Kylin Cubes is Kafka
> >> Topics. Cube Segment are built on micro batches of messages arriving on
> >> Kafka topics. Currently a lot of work is going on to productize this
> >> feature. Primary areas of work are Stream Processing Engines/Frameworks
> >>to
> >> process the micro batches and UI to support out of the box integration
> >>of
> >> Kafka topics with Kylin Cubes.
> >>
> >> Spark based Cube building Engine.
> >> The initial performance numbers for a Spark based cubing engine did not
> >> show substantial improvement over MR based engine, but would like this
> >> feature to be baked in for the 2.0 Release. Lot of work underway to
> >> stabilize this feature.
> >>
> >> Amazon EMR Integration
> >> We had initial conversations with Amazon EMR to support Apache Kylin on
> >> Amazon EMR which was received well. With Kylin 2.0 Apache Kylin will be
> >> enabled feature on Amazon EMR. Limited work has gone into this area, but
> >> this will be an important milestone for 2.0
> >>
> >> We are also working towards creating an area for community driven
> >> improvements page similar to Apache Kafka's KIP
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals.
> >> Stay tuned.
> >>
> >> Regards
> >> Seshu Adunuthula
> >>
> >>
> >>
> >>
> >>
>
>

Re: [DISCUSS]Apache Kylin 2.0 Release Features & Criteria

Posted by "Adunuthula, Seshu" <sa...@ebay.com>.
A strong -1 on this.

- A better MR cubing algorithm, about 1.5 times faster than 1.x by
comparing hundreds of jobs.
- TopN pre-calculation (more UDFs coming)
- ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI



These are incremental enhancements and do not warrant bumping up to a 2.0
release. We should release them in 1.3.


- Streaming cubing experimental support, source from kafka, build cube
in-mem at minutes interval
- A plugin-able architecture, to allow alternative cube engine / storage
engine / data source.



These are POC quality features that should not be part of a release. We
have not built a single of these plugins that are production quality.

Luke/Yang, I have told you multiple times not to push out a release when it
is not ready. We nearly brought down the entire HBase cluster at eBay with
the bad design for Streaming. If we scale this up to 100s of Streaming
Cubes, this design will render an HBase cluster unusable.

I have spent substantial time looking into the release and it does not
meet eBay's standards for a quality release.

We will be doing the community a huge disservice by pushing this out by
end of February.

Regards
Seshu Adunuthula


On 1/31/16, 11:46 PM, "Li Yang" <li...@apache.org> wrote:

>Just  to add more colors.
>
>The 2.0 rc1 has been stabilizing in the 2.0-rc branch for a few month. The
>2.0 rc1 contains:
>
>- A plugin-able architecture, to allow alternative cube engine / storage
>engine / data source.
>- A better MR cubing algorithm, about 1.5 times faster than 1.x by
>comparing hundreds of jobs.
>- A better storage engine, makes query roughly 2 times faster (especially
>for slow queries) than 1.x by comparing tens of thousands sqls.
>- Streaming cubing experimental support, source from kafka, build cube
>in-mem at minutes interval
>- TopN pre-calculation (more UDFs coming)
>- ODBC compatible with Tableau 9.1, MS Excel, MS PowerBI
>- SAML authentication support
>
>As the release manager, I will kickoff the release process in two weeks
>(once back from vacation). ETA by end of Feb.
>
>Would love to hear more feedback from our community.  :-)
>
>
>Yang
>
>
>
>On Monday, February 1, 2016, Adunuthula, Seshu <sa...@ebay.com>
>wrote:
>
>> Hello Folks,
>>
>> We are actively working towards Apache Kylin 2.0 Release and would like
>>a
>> discussion with the community on what they would like to see in 2.0
>>release
>> of the product. We have three big rock items we are working towards in
>>2.0
>> and lot of additional minor feature enhancements.
>>
>> Streaming Data Source support.
>> This feature is semi baked in where the source of Kylin Cubes is Kafka
>> Topics. Cube Segment are built on micro batches of messages arriving on
>> Kafka topics. Currently a lot of work is going on to productize this
>> feature. Primary areas of work are Stream Processing Engines/Frameworks
>>to
>> process the micro batches and UI to support out of the box integration
>>of
>> Kafka topics with Kylin Cubes.
>>
>> Spark based Cube building Engine.
>> The initial performance numbers for a Spark based cubing engine did not
>> show substantial improvement over MR based engine, but would like this
>> feature to be baked in for the 2.0 Release. Lot of work underway to
>> stabilize this feature.
>>
>> Amazon EMR Integration
>> We had initial conversations with Amazon EMR to support Apache Kylin on
>> Amazon EMR which was received well. With Kylin 2.0 Apache Kylin will be
>> enabled feature on Amazon EMR. Limited work has gone into this area, but
>> this will be an important milestone for 2.0
>>
>> We are also working towards creating an area for community driven
>> improvements page similar to Apache Kafka's KIP
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals.
>> Stay tuned.
>>
>> Regards
>> Seshu Adunuthula
>>
>>
>>
>>
>>