Posted to dev@kylin.apache.org by Luke Han <lu...@gmail.com> on 2015/10/21 16:33:54 UTC

Re: why not Spark?

Is anyone still interested in this question?

We have some rough benchmark results comparing Spark cubing with MR cubing.

We would like to hear from the community about requirements and use cases.

Thanks.




Best Regards!
---------------------

Luke Han

On Mon, Apr 13, 2015 at 9:29 PM, Luke Han <lu...@gmail.com> wrote:

> Using this JIRA to track: https://issues.apache.org/jira/browse/KYLIN-679
>
> Thanks.
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> 2015-04-11 9:15 GMT+08:00 Li Yang <li...@apache.org>:
>
>> Spark could greatly improve cube builds when the data fits in memory. The
>> design already has an extension point for this. If anyone would like to
>> contribute effort here, let us know.
>>
>> On Thu, Apr 9, 2015 at 10:35 PM, Luke Han <lu...@gmail.com> wrote:
>>
>> > Subscribe to this mailing list and follow @ApacheKylin on Twitter.
>> > The Kylin website is also being reworked to add more content about such
>> > materials, coming soon.
>> >
>> > Thanks.
>> >
>> >
>> > Best Regards!
>> > ---------------------
>> >
>> > Luke Han
>> >
>> > 2015-04-09 21:14 GMT+08:00 林澍荣 <li...@gmail.com>:
>> >
>> > > Thanks, Luke! But how can I get such materials in the future?
>> > >
>> > > 2015-04-09 16:59 GMT+08:00 Luke Han <lu...@gmail.com>:
>> > >
>> > > > Hi Rong,
>> > > >     Spark is actually a hot topic around Kylin. Please refer to my
>> > > > presentation last month at the Spark Meetup in the Bay Area:
>> > > >
>> > > > http://www.slideshare.net/lukehan/adding-spark-support-to-kylin-spark-meetupv11
>> > > >
>> > > >     We would also welcome more comments, input, and ideas from the
>> > > > community on adding Spark support to Kylin.
>> > > >
>> > > >     Thanks.
>> > > >
>> > > > Luke
>> > > >
>> > > >
>> > > > Best Regards!
>> > > > ---------------------
>> > > >
>> > > > Luke Han
>> > > >
>> > > > 2015-04-09 16:34 GMT+08:00 林澍荣 <li...@gmail.com>:
>> > > >
>> > > > > I have a question: why does Kylin not use Spark for the cube
>> > > > > building job? Spark features DAG execution and in-memory computing,
>> > > > > and these features should improve on the cube building speed
>> > > > > achieved under MapReduce.
>> > > > > Thanks for any response, Shon
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Re: why not Spark?

Posted by "250635732@qq.com" <25...@qq.com>.

Hi, Luke

Did the benchmark run in your production environment?

Did it use the Spark RDD API or the DataFrame API?

Best,
Sun.



250635732@qq.com
 
From: Luke Han
Date: 2015-10-22 09:56
To: dev@kylin.incubator.apache.org
Subject: Re: why not Spark?
Well, the result is really interesting...
 
Is Spark really faster than MR?

Qianhao will share more details very soon ;-)
 
 
Best Regards!
---------------------
 
Luke Han
 

Re: why not Spark?

Posted by Luke Han <lu...@gmail.com>.

Well, the result is really interesting...

Is Spark really faster than MR?

Qianhao will share more details very soon ;-)


Best Regards!
---------------------

Luke Han

On Thu, Oct 22, 2015 at 9:33 AM, Xiaoyu Wang <wa...@jd.com> wrote:

> +1

Re: why not Spark?

Posted by Xiaoyu Wang <wa...@jd.com>.

+1

On 2015-10-22 09:27, 250635732@qq.com wrote:
> Hi, Luke
>
> Yes, we are interested in this progress. We would love to see the results
> of the performance test comparing these two engines.
>
> Best,
> Sun.
>
>
>
> 250635732@qq.com

Re: Re: why not Spark?

Posted by "250635732@qq.com" <25...@qq.com>.

Hi, Luke

Yes, we are interested in this progress. We would love to see the results
of the performance test comparing these two engines.

Best,
Sun.



250635732@qq.com
 
From: Luke Han
Date: 2015-10-21 22:33
To: dev@kylin.incubator.apache.org
Subject: Re: why not Spark?
Is anyone still interested in this question?

We have some rough benchmark results comparing Spark cubing with MR cubing.

We would like to hear from the community about requirements and use cases.
 
Thanks.
 
 
 
 
Best Regards!
---------------------
 
Luke Han
 

Re: why not Spark?

Posted by Shailesh Dangi <sd...@datalenz.com>.

Spark and Flink are becoming de facto standards, replacing traditional
MapReduce jobs given their significant performance improvements.  Kylin,
as I have seen in some of the roadmap presentations, does plan to embrace
Spark.  This would matter for some of our customers/prospects, for whom ETAs
around data/cube refresh are key.  We would definitely like to see this
built out as early as possible.

Thanks,
Regards,

On Wed, Oct 21, 2015 at 10:33 AM, Luke Han <lu...@gmail.com> wrote:

> Is anyone still interested in this question?
>
> We have some rough benchmark results comparing Spark cubing with MR cubing.
>
> We would like to hear from the community about requirements and use cases.
>
> Thanks.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>