Posted to dev@kylin.apache.org by 蒋旭 <ji...@qq.com> on 2015/09/15 07:17:58 UTC

Re: Kylin Real time

I also think that it's mini-batch cubing. It's time to bring the inverted index back onto the roadmap. The inverted index would be the true real-time solution, and it can provide low-level query capability on the raw data.


Thanks!
JiangXu


------------------ Original Message ------------------
From: "Henry Saputra";<he...@gmail.com>;
Sent: Tuesday, September 15, 2015, 12:39 PM
To: "dev@kylin.incubator.apache.org"<de...@kylin.incubator.apache.org>;

Subject: Re: Kylin Real time



Ok, but that still seems like mini batch to me.

There will be incremental updates on the existing cubes, but during
those updates I suppose no queries will be run against them?

- Henry

On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org> wrote:
> Streaming OLAP provides near-real-time analysis where the data delay can be
> as short as a few minutes.
>
> A traditional daily build lets users analyze yesterday's data. If we
> increase the frequency to hourly, users can analyze the last hour's data.
> Further down the line, how about an incremental build every 5 minutes from
> a streaming source? Then users can analyze data from 5 minutes ago. That's
> Streaming OLAP!
>
> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <he...@gmail.com>
> wrote:
>
>> Hi Luke,
>>
>> Could you clarify again what streaming OLAP means here?
>>
>> By definition, OLAP works with historical data.
>>
>> Maybe I missed it, but was there any discussion or proposed design for it?
>>
>> Thanks,
>>
>> - Henry
>>
>> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
>>
>> > Hi Siddharth,
>> >     Kylin's next major release (0.8.x) will support Streaming OLAP. It is
>> > coming in Q4, since it is still under development now, as Hongbin
>> > mentioned above.
>> >     Could you please drop me a mail about your case? I would like to
>> > better understand your scenario so that we can plan the coming features
>> > well.
>> >
>> >     Thanks.
>> >
>> >
>> >
>> >
>> > Best Regards!
>> > ---------------------
>> >
>> > Luke Han
>> >
>> > On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <mahongbin@apache.org> wrote:
>> >
>> > > For the current 0.7 releases, you cannot.
>> > >
>> > > Real-time data processing and querying will be added in the 0.8 release.
>> > > It is still under development and testing. We have made good progress on
>> > > it; please wait for announcements.
>> > >
>> > > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
>> > > siddharth.ubale@syncoms.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I would like to ask whether Kylin can be used as a real-time querying
>> > > > system.
>> > > > The process of building a cube makes it look like a batch process,
>> > > > after which queries run with low latency. However, can we get a
>> > > > real-time view of the OLAP system's state at the instant of the query?
>> > > >
>> > > > Thanks,
>> > > > Siddharth
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > >
>> > > *Bin Mahone | 马洪宾*
>> > > Apache Kylin: http://kylin.io
>> > > Github: https://github.com/binmahone
>> > >
>> >
>>

Re: Re: Kylin Real time

Posted by Sarnath <st...@gmail.com>.

Hi,

Can you share some reasons why "Inverted Index" did not work?
I ask because I am trying to do precisely the same thing for storing cubes
in our own private implementation.
I am wondering what problems were found upstream.

Thanks,
Best,
Sarnath

Re: Re: Kylin Real time

Posted by hongbin ma <ma...@apache.org>.

Hi Luke,

I'm afraid your answer might be a little confusing to outside customers.
Cube, Streaming, and Inverted Index are not concepts in the same context.
My understanding is:

1. "Cube" and "Inverted Index" are the two options we have for storing
digested data. This is what we let the modeler specify in the data model.
Cube is Kylin's original choice for storage; later we introduced "Inverted
Index" in an attempt to serve near-real-time requirements (because with the
v1 engine the cube build process is very time-consuming, whereas putting
digested data into an inverted index is much faster). However, development
on Inverted Index is paused for several reasons.

2. "Streaming" is a concept that contrasts with "Batch". Before the 2.x
versions, Kylin used the v1 engine to build cubes, which only supports
loading data from Hive tables in a batch fashion; this is why it is called
"Batch" mode. In the 2.x versions, we built the new v2 engine and started to
support building cubes from streaming queues like Kafka. As previously
explained, the current streaming solution is not strictly "real-time
streaming", because it basically consumes the streaming data to build mini
cubes.
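
To make the contrast in point 1 concrete, here is a toy Python sketch. It is
not Kylin code: the rows, the function names and the use of None as an "all
values" wildcard are assumptions made only for illustration. The cube pays
the aggregation cost at build time, while the inverted index only records
which rows contain each value and aggregates the raw rows at query time.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, year, quantity). Not Kylin data.
ROWS = [
    ("A", 2014, 3),
    ("A", 2015, 5),
    ("B", 2015, 7),
    ("A", 2015, 2),
]


def build_cube(rows):
    """Pre-aggregate SUM(quantity) for every (product, year) combination.

    None stands for a rolled-up ("all values") dimension.
    """
    cube = defaultdict(int)
    for product, year, qty in rows:
        for key in ((product, year), (product, None), (None, year), (None, None)):
            cube[key] += qty
    return dict(cube)


def build_inverted_index(rows):
    """Map each (column, value) pair to the ids of the rows containing it."""
    index = defaultdict(set)
    for row_id, (product, year, _qty) in enumerate(rows):
        index[("product", product)].add(row_id)
        index[("year", year)].add(row_id)
    return dict(index)


def query_index(rows, index, product, year):
    """Intersect the posting lists, then aggregate raw rows at query time."""
    ids = index.get(("product", product), set()) & index.get(("year", year), set())
    return sum(rows[i][2] for i in ids)


cube = build_cube(ROWS)
index = build_inverted_index(ROWS)
```

Building the index is a single cheap pass with no combinatorial roll-up, which
matches the "much faster to load" argument; the price is that every query has
to touch raw rows.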

On Wed, Sep 23, 2015 at 9:39 PM, Luke Han <lu...@gmail.com> wrote:

> Hi gaspare,
>     You have raised a great discussion about those things.
>     In the original idea there was only the cube, but we came up with a new
> concept, the Data Model, since "Cube" itself is just one kind of storage.
>
>     There is an option for the modeler to define/pick which kind of storage
> to use for the Data Model. We call it the Realization interface; it covers
> Cube, Streaming and Inverted Index and is extensible to others in the future.
>
>    So you are right, there will be a UI setting for this in the Data Model.
> It will come later, since 2.x is under heavy refactoring and tuning, just
> as Hongbin mentioned.
>
>     Please stay tuned for the latest update of streaming/realtime
> capability of Kylin.
>
>     Thanks.
>
> Luke
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Re: Kylin Real time

Posted by Luke Han <lu...@gmail.com>.

Hi Gaspare,
    You have raised a great discussion about those things.
    In the original idea there was only the cube, but we came up with a new
concept, the Data Model, since "Cube" itself is just one kind of storage.

    There is an option for the modeler to define/pick which kind of storage
to use for the Data Model. We call it the Realization interface; it covers
Cube, Streaming and Inverted Index and is extensible to others in the future.
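
    As a rough illustration of that idea, here is a hypothetical Python
sketch. It is not Kylin's actual interface: the class names, the two methods
and the registry are all assumptions made for this example. The point is only
that the data model names a storage kind and the rest of the system talks to
one common interface.

```python
from abc import ABC, abstractmethod


class Realization(ABC):
    """Hypothetical storage backend for a data model (illustrative names)."""

    @abstractmethod
    def load(self, rows):
        """Ingest (product, quantity) rows."""

    @abstractmethod
    def total(self, product):
        """Return SUM(quantity) for one product."""


class CubeRealization(Realization):
    """Aggregates at load time and answers from the stored sums."""

    def __init__(self):
        self._sums = {}

    def load(self, rows):
        for product, qty in rows:
            self._sums[product] = self._sums.get(product, 0) + qty

    def total(self, product):
        return self._sums.get(product, 0)


class InvertedIndexRealization(Realization):
    """Keeps raw rows plus posting lists and aggregates at query time."""

    def __init__(self):
        self._rows = []
        self._postings = {}

    def load(self, rows):
        for product, qty in rows:
            self._postings.setdefault(product, []).append(len(self._rows))
            self._rows.append((product, qty))

    def total(self, product):
        return sum(self._rows[i][1] for i in self._postings.get(product, []))


# Registry that a modeler-facing setting could select from (assumed design).
REALIZATIONS = {"cube": CubeRealization, "inverted_index": InvertedIndexRealization}


class DataModel:
    def __init__(self, name, storage):
        self.name = name
        self.realization = REALIZATIONS[storage]()
```

A further storage kind would then be one more entry in the registry, which is
the sense of "extensible" in this sketch.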

   So you are right, there will be a UI setting for this in the Data Model.
It will come later, since 2.x is under heavy refactoring and tuning, just
as Hongbin mentioned.

    Please stay tuned for the latest update of streaming/realtime
capability of Kylin.

    Thanks.

Luke


Best Regards!
---------------------

Luke Han

On Wed, Sep 23, 2015 at 2:55 PM, hongbin ma <ma...@apache.org> wrote:

> Hi Gaspare,
>
> Actually we do have a similar solution in the 2.x-staging code base. It is
> called "Streaming Cubing" (in contrast to Inverted Index, it uses a
> mini-batch cubing solution to tackle the near-real-time problem).
>
> Daemon threads start up periodically to consume a batch of data (maybe a
> five-minute batch) from Kafka and build a mini-cube in memory before
> writing it into HBase. We have not officially announced the functionality
> because:
>
> 1. Currently we do not have a front-end UI for the configuration,
> including specifying Kafka settings, etc. This makes Streaming Cubing
> difficult to use now. The good news is that we're actively working on it
> (https://issues.apache.org/jira/browse/KYLIN-1041).
> 2. Lack of documentation.
> 3. Currently we have not leveraged Spark Streaming (or other alternatives)
> to process the data batch. Our daemon thread is a simple Java thread, and
> it will be problematic when the data batch grows too large. We intend to
> migrate to a horizontally scalable solution like Spark Streaming, but
> haven't had enough bandwidth to start
> (https://issues.apache.org/jira/browse/KYLIN-1042).
>
> Anyway, customers should be able to use Streaming Cubing when we officially
> announce the 2.x versions.
>
>
>
>
>
> On Wed, Sep 23, 2015 at 6:00 AM, Gaspare Maria <
> gaspare.maria@gfmintegration.it> wrote:
>
> > Hi,
> >
> > One more question/feedback regarding Kylin real time.
> >
> > There are many use cases (in particular in the telco environment) where
> > streams of data arrive at regular intervals (usually every 5 or 15
> > minutes) and "real-time" aggregations can always be done per interval
> > (for example, SUM(upLink) per node in the last interval). In such use
> > cases, maybe the CUBE could be updated in near real time after
> > pre-aggregation with Spark Streaming (of course without creating the
> > HFiles, but using parallel PUTs on HBase with Spark). In our experience,
> > for "simple" CUBEs this should be faster than Inverted Indexes.
> >
> > Of course there are use cases where this approach is not applicable; in
> > those cases Inverted Indexes are still valid.
> >
> > It would be good if Kylin could give the "CUBE Administrator" the
> > possibility to choose how to do the "Real-time CUBE Update", for example
> > an option to choose either "Inverted Indexes" or "HBase".
> >
> > Do you think such an approach could be applicable to Kylin?
> >
> > Regards,
> >
> > -- gas
> >
> >
> >
> > On 09/21/2015 11:36 AM, Li Yang wrote:
> >
> >> Gas is mostly right, with one addition: a query can hit both the
> >> inverted index and the cube if it asks for both the latest and historic
> >> data. The results from the two sources get aggregated at query time.
> >>
> >> On Fri, Sep 18, 2015 at 11:26 PM, Gaspare Maria <
> >> gaspare.maria@gfmintegration.it> wrote:
> >>
> >>> Hi,
> >>>
> >>> So if I understood correctly, the idea behind Kylin Real Time is:
> >>>
> >>>   * Inverted Indexes (maybe Lucene, or inverted indexes on HBase) will
> >>>     be built according to the CUBE schema in near real time by using
> >>>     Spark (Streaming) Kafka consumers;
> >>>   * At query time, if the query touches the latest data it will be
> >>>     routed to the Inverted Indexes, otherwise to the CUBE on HBase;
> >>>   * Queries that touch the latest data should be limited, due to the
> >>>     limitations of inverted indexes;
> >>>   * Queries over a long period of time (e.g. from now back to 2 months
> >>>     ago) will be routed partly to HBase and partly to the Inverted
> >>>     Indexes.
> >>>
> >>>
> >>> Am I right?
> >>>
> >>> Regards,
> >>>
> >>> -- gas
> >>>
> >>>
> >>>
> >>> On 09/18/2015 12:35 AM, Henry Saputra wrote:
> >>>
> >>>> Awesome, thanks Luke
> >>>>
> >>>> On Thu, Sep 17, 2015 at 2:37 AM, Luke Han <lu...@gmail.com> wrote:
> >>>>
> >>>>> Here's JIRA: https://issues.apache.org/jira/browse/KYLIN-599
> >>>>>
> >>>>>
> >>>>> Best Regards!
> >>>>> ---------------------
> >>>>>
> >>>>> Luke Han
> >>>>>
> >>>>> On Thu, Sep 17, 2015 at 1:09 AM, Henry Saputra <
> >>>>> henry.saputra@gmail.com> wrote:
> >>>>>
> >>>>>> That is good to know. Li Yang, Luke, could one of you share the
> >>>>>> design document for this realtime OLAP query in the JIRA?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> - Henry
> >>>>>>
> >>>>>> On Tue, Sep 15, 2015 at 11:12 PM, Li Yang <li...@apache.org> wrote:
> >>>>>>
> >>>>>>>> There will be incremental updates on the existing cubes, but during
> >>>>>>>> those updates I suppose no queries will be run against them?
> >>>>>>>
> >>>>>>> Yes, it's mini batch, usually at intervals of minutes. And of course
> >>>>>>> the cube CAN serve queries while the mini incremental build is in
> >>>>>>> progress. How could we take the cube offline every few minutes?
> >>>>>>> That's impossible.  :-)
> >>>>>>>
> >>>>>>> On Tue, Sep 15, 2015 at 6:41 PM, Sarnath <st...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Inverted index? That sounds interesting. We use an inverted index to
> >>>>>>>> serve the cubes in our internal implementation.
> >>>>>>>>
> >>>>>>>> I come from the Big Data Center of Excellence at an Indian IT major.
> >>>>>>>>
> >>>>>>>> We have been experimenting with the idea of serving cubes through
> >>>>>>>> the ElasticSearch REST API. This is not related to Kylin. This is our
> >>>>>>>> own internal development.
> >>>>>>>>
> >>>>>>>> The motivation for this is: once the cube is built, it needs to be
> >>>>>>>> served.
> >>>>>>>>
> >>>>>>>> The query looks somewhat like this:
> >>>>>>>>
> >>>>>>>> "Given ProductID=*, Year=2015, Fetch All Quantities Sold"
> >>>>>>>>
> >>>>>>>> "Given ProductID=XX, Fetch how much it has sold every Month"
> >>>>>>>>
> >>>>>>>> Find all entries that match K1=V1, K2=V2
> >>>>>>>>
> >>>>>>>> This relieves us of a lot of things (storage, REST API, etc.) and
> >>>>>>>> makes the cubes easily searchable.
> >>>>>>>>
> >>>>>>>> However, we don't do SQL/MDX on top of it. Tableau 9.1 Beta is
> >>>>>>>> experimenting with a Web Data Connector, which we believe can be used
> >>>>>>>> for visualization. Apart from that, we experimented with a few
> >>>>>>>> auto-generated Kibana dashboards, which were just okay. But Kibana
> >>>>>>>> was not designed for cubes, so it has its own limitations.
> >>>>>>>>
> >>>>>>>> Appreciate any feedback!
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Sarnath

Re: Re: Kylin Real time

Posted by hongbin ma <ma...@apache.org>.

Hi Gaspare,

Actually we do have a similar solution in the 2.x-staging code base. It is
called "Streaming Cubing" (in contrast to Inverted Index, it uses a
mini-batch cubing solution to tackle the near-real-time problem).

Daemon threads start up periodically to consume a batch of data (maybe a
five-minute batch) from Kafka and build a mini-cube in memory before
writing it into HBase. We have not officially announced the functionality
because:

1. Currently we do not have a front-end UI for the configuration,
including specifying Kafka settings, etc. This makes Streaming Cubing
difficult to use now. The good news is that we're actively working on it
(https://issues.apache.org/jira/browse/KYLIN-1041).
2. Lack of documentation.
3. Currently we have not leveraged Spark Streaming (or other alternatives)
to process the data batch. Our daemon thread is a simple Java thread, and
it will be problematic when the data batch grows too large. We intend to
migrate to a horizontally scalable solution like Spark Streaming, but
haven't had enough bandwidth to start
(https://issues.apache.org/jira/browse/KYLIN-1042).

Anyway, customers should be able to use Streaming Cubing when we officially
announce the 2.x versions.
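
The mini-batch idea described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
2.x-staging implementation: a plain list stands in for the Kafka batch, a
dict stands in for HBase, the periodic daemon thread is left out, and the
event layout (timestamp, node, value) and all function names are made up
for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # assumed five-minute batch, as mentioned above


def window_start(ts):
    """Floor an epoch timestamp (seconds) to the start of its window."""
    return ts - ts % WINDOW_SECONDS


def build_mini_cubes(events):
    """Group (timestamp, node, value) events by window and pre-aggregate
    SUM(value) per node inside each window: one mini cube per window."""
    mini_cubes = defaultdict(lambda: defaultdict(int))
    for ts, node, value in events:
        mini_cubes[window_start(ts)][node] += value
    return {start: dict(cells) for start, cells in mini_cubes.items()}


def query_sum(mini_cubes, node, start, end):
    """Answer SUM(value) for one node by merging every mini cube whose
    window start falls in [start, end)."""
    return sum(
        cells.get(node, 0)
        for window, cells in mini_cubes.items()
        if start <= window < end
    )


# Hypothetical events; in the real system they would come from Kafka.
events = [(0, "n1", 10), (120, "n1", 5), (299, "n2", 1), (300, "n1", 7), (650, "n2", 4)]
cubes = build_mini_cubes(events)
```

Each window is aggregated independently, so already-built mini cubes can keep
serving queries while the next one is being built; the data delay is bounded
by the window length, which is why this is near real time rather than real
time.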





On Wed, Sep 23, 2015 at 6:00 AM, Gaspare Maria <
gaspare.maria@gfmintegration.it> wrote:

> Hi,
>
> one more question/feedback regarding Kylin Real time.
>
> There are many use-cases (in particular in the TELCO environment) where
> streams of data arrive at regular intervals (usually every 5 or 15 minutes)
> and "real-time" aggregations can always be done per interval (for
> example, SUM(upLink) per node in the last interval). In such use-cases,
> "maybe" the CUBE could be updated in near real-time after
> pre-aggregation with Spark Streaming (of course without creating the HFiles,
> but using parallel PUTs on HBase with Spark). In our experience,
> for "simple" CUBEs this should be faster than Inverted Indexes.
>
> Of course there are use-cases where this approach is not applicable, in
> those cases Inverted Indexes are still valid.
>
> It would be good if Kylin could give the "CUBE Administrator"
> the possibility to choose how to do the "Real-time CUBE Update". For example,
> give the option to choose either "Inverted Indexes" or "HBase".
>
> Do you think such an approach could be applicable to Kylin?
>
> Regards,
>
> -- gas
>
>
>
> On 09/21/2015 11:36 AM, Li Yang wrote:
>
>> Gas is mostly right, with one addition: a query can hit both the
>> inverted index and the cube if it asks for both latest and historical data.
>> The results from the two sources will be aggregated at query time.
>>
>> On Fri, Sep 18, 2015 at 11:26 PM, Gaspare Maria <
>> gaspare.maria@gfmintegration.it> wrote:
>>
>> Hi,
>>>
>>> so, if I understood correctly, the idea behind Kylin Real Time is:
>>>
>>>   * Inverted Indexes (maybe Lucene, or inverted indexes on HBase) will
>>>     be built according to the CUBE schema in near-realtime by using Spark
>>>     (Streaming) Kafka consumers;
>>>   * At query time, if the query touches the latest data it will be routed
>>>     to the Inverted Indexes, otherwise to the CUBE on HBase;
>>>   * Queries that touch the latest data should be limited, due to the
>>>     limitations of inverted indexes;
>>>   * A query over a long period back in time (e.g. from now back to 2
>>>     months ago) will be routed partly to HBase and partly to the
>>>     Inverted Indexes.
>>>
>>>
>>> Am I right?
>>>
>>> Regards,
>>>
>>> -- gas
>>>
>>>
>>>
>>> On 09/18/2015 12:35 AM, Henry Saputra wrote:
>>>
>>> Awesome, thanks Luke
>>>>
>>>> On Thu, Sep 17, 2015 at 2:37 AM, Luke Han <lu...@gmail.com> wrote:
>>>>
>>>> Here's JIRA: https://issues.apache.org/jira/browse/KYLIN-599
>>>>>
>>>>>
>>>>> Best Regards!
>>>>> ---------------------
>>>>>
>>>>> Luke Han
>>>>>
>>>>> On Thu, Sep 17, 2015 at 1:09 AM, Henry Saputra <
>>>>> henry.saputra@gmail.com>
>>>>> wrote:
>>>>>
>>>>> That is good to know. Li Yang, Luke, could one of you share the design
>>>>>
>>>>>> document for this realtime OLAP query in the JIRA?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> - Henry
>>>>>>
>>>>>> On Tue, Sep 15, 2015 at 11:12 PM, Li Yang <li...@apache.org> wrote:
>>>>>>
>>>>>> There will be incremental updates on the existing cubes, but during
>>>>>>>
>>>>>>>> that updates I suppose no queries will be ran against them?
>>>>>>>>
>>>>>>>> Yes, it's mini batch, usually at minutes interval. And of course
>>>>>>> cube
>>>>>>> CAN
>>>>>>> serve query while the mini incremental is under built. How can we let
>>>>>>> the
>>>>>>> cube offline every few minutes, that's impossible.  :-)
>>>>>>>
>>>>>>> On Tue, Sep 15, 2015 at 6:41 PM, Sarnath <st...@gmail.com> wrote:
>>>>>>>
>>>>>>> Inverted index? That sounds interesting. We use inverted index to
>>>>>>> serve
>>>>>>> the
>>>>>>> cubes in our internal implementation.
>>>>>>>
>>>>>>>> I come from Big Data Center of excellence from an Indian IT major.
>>>>>>>>
>>>>>>>> We have been experimenting with the idea of serving cubes through
>>>>>>>> ElasticSearch REST API. This is not related to Kylin. This is our
>>>>>>>> own
>>>>>>>> internal development.
>>>>>>>>
>>>>>>>> The motivation for this is --- Once the cube is built, it needs to
>>>>>>>> be
>>>>>>>> served.
>>>>>>>>
>>>>>>>> The query looks somewhat like this:
>>>>>>>>
>>>>>>>> "Given ProductID=*, Year=2015, Fetch All Quantities Sold"
>>>>>>>>
>>>>>>>> "Given ProductID=XX, Fetch how much it has sold every Month"
>>>>>>>>
>>>>>>>> Find all entries that match K1=V1, K2=V2
>>>>>>>>
>>>>>>>> This relieves us from lot of things - storage, REST API etc. and
>>>>>>>> makes
>>>>>>>>
>>>>>>>> the
>>>>>>> cubes easily searchable.
>>>>>>>
>>>>>>>> However, we don't do SQL/MDX on top of it.  Tableau 9.1Beta is
>>>>>>>> experimenting with Web-Data-Connector which we believe can be used
>>>>>>>> for
>>>>>>>> Visualization... Apart from that, we experimented with a few
>>>>>>>> auto-generated Kibana dashboards which were just okay. But Kibana
>>>>>>>> was
>>>>>>>>
>>>>>>>> not
>>>>>>> designed for Cubes and so it has its own limitations.
>>>>>>>
>>>>>>>> Appreciate any feedback!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Sarnath
>>>>>>>> I also think that it's a mini batch cubing.   It's time to bring
>>>>>>>> back
>>>>>>>>
>>>>>>>> the
>>>>>>> inverted index into roadmap. The inverted index will be the true
>>>>>>> real-time
>>>>>>> solution and can provide the low-level query capability on the raw
>>>>>>>
>>>>>>>> data.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> JiangXu
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------ Original Message ------------------
>>>>>>>> From: "Henry Saputra";<he...@gmail.com>;
>>>>>>>> Sent: Tuesday, September 15, 2015, 12:39 PM
>>>>>>>> To: "dev@kylin.incubator.apache.org"<dev@kylin.incubator.apache.org>;
>>>>>>>>
>>>>>>>> Subject: Re: Kylin Real time
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ok, but that still seems like mini batch to me.
>>>>>>>>
>>>>>>>> There will be incremental updates on the existing cubes, but during
>>>>>>>> that updates I suppose no queries will be ran against them?
>>>>>>>>
>>>>>>>> - Henry
>>>>>>>>
>>>>>>>> On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Streaming OLAP provides Near-Realtime analysis where data delay can
>>>>>>>>>
>>>>>>>>> be as
>>>>>>>>
>>>>>>> short as a few minutes.
>>>>>>>
>>>>>>>> Traditional daily build allows user to analyze yesterday's data. If
>>>>>>>>> increase the frequency to hourly, then user can analyze last hour's
>>>>>>>>>
>>>>>>>>> data.
>>>>>>>>
>>>>>>> Further down the line, how about incremental build every 5 minutes
>>>>>>>
>>>>>>>> from a
>>>>>>>>
>>>>>>> streaming source? Then user can analyze data 5 minutes ago. That's
>>>>>>>
>>>>>>>> Streaming OLAP!
>>>>>>>>>
>>>>>>>>> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <
>>>>>>>>>
>>>>>>>>> henry.saputra@gmail.com
>>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Luke,
>>>>>>>>>
>>>>>>>>>> Could you clarify again what is the streaming OLAP means here?
>>>>>>>>>>
>>>>>>>>>> By definition OLAP work with historical data.
>>>>>>>>>>
>>>>>>>>>> Maybe I missed it but was there any discussions or proposed design
>>>>>>>>>>
>>>>>>>>>> for
>>>>>>>>>
>>>>>>>> it?
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>>> - Henry
>>>>>>>>>>
>>>>>>>>>> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Siddharth,
>>>>>>>>>>
>>>>>>>>>>>       Kylin's next majority release (0.8.x) will support
>>>>>>>>>>> Streaming
>>>>>>>>>>>
>>>>>>>>>>> OLAP
>>>>>>>>>>
>>>>>>>>> which
>>>>>>>
>>>>>>>> will coming in Q4 since it still under development now, as Hongbin
>>>>>>>>>>> mentioned above.
>>>>>>>>>>>       Could  you please drop me a mail about your case? I would
>>>>>>>>>>> like
>>>>>>>>>>>
>>>>>>>>>>> to
>>>>>>>>>>
>>>>>>>>> better understand your scenario to well manage coming features?
>>>>>>>
>>>>>>>>       Thanks.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best Regards!
>>>>>>>>>>> ---------------------
>>>>>>>>>>>
>>>>>>>>>>> Luke Han
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <
>>>>>>>>>>> mahongbin@apache.org
>>>>>>>>>>> <javascript:;>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> For current 0.7  releases, you cannot.
>>>>>>>>>>>
>>>>>>>>>>>> Real time data processing and querying will be added in 0.8
>>>>>>>>>>>>
>>>>>>>>>>>> release.
>>>>>>>>>>>
>>>>>>>>>> It
>>>>>>>
>>>>>>>> is
>>>>>>>>>
>>>>>>>>>> still under development and testing. We have achieved good
>>>>>>>>>>>>
>>>>>>>>>>>> progress
>>>>>>>>>>>
>>>>>>>>>> on
>>>>>>>
>>>>>>>> it,
>>>>>>>>>
>>>>>>>>>> please wait for announcements.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
>>>>>>>>>>>> siddharth.ubale@syncoms.com <javascript:;>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi ,
>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to ask whether Kylin can be used as a real time
>>>>>>>>>>>>>
>>>>>>>>>>>>> querying
>>>>>>>>>>>>
>>>>>>>>>>> system?
>>>>>>>>>
>>>>>>>>>> The process of building a cube , makes it look like a batch
>>>>>>>>>>>>>
>>>>>>>>>>>>> process
>>>>>>>>>>>>
>>>>>>>>>>> after
>>>>>>>>>
>>>>>>>>>> which the queries are with low latency.. however can
>>>>>>>>>>>>
>>>>>>>>>>>>> We get a real time idea of what the OLAP system's state is at
>>>>>>>>>>>>>
>>>>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>> query
>>>>>>>
>>>>>>>> instance?
>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Siddharth
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> *Bin Mahone | 马洪宾*
>>>>>>>>>>>> Apache Kylin: http://kylin.io
>>>>>>>>>>>> Github: https://github.com/binmahone
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: 回复： Kylin Real time

Posted by Gaspare Maria <ga...@gfmintegration.it>.

Hi,

one more question/feedback regarding Kylin Real time.

There are many use-cases (in particular in the TELCO environment) where
streams of data arrive at regular intervals (usually every 5 or 15
minutes) and "real-time" aggregations can always be done per interval
(for example, SUM(upLink) per node in the last interval). In such
use-cases, "maybe" the CUBE could be updated in near real-time after
pre-aggregation with Spark Streaming (of course without creating the
HFiles, but using parallel PUTs on HBase with Spark). In our
experience, for "simple" CUBEs this should be faster than Inverted Indexes.

Of course there are use-cases where this approach is not applicable, in 
those cases Inverted Indexes are still valid.
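
As a rough illustration of the pre-aggregation described above, the following Python sketch computes SUM(upLink) per node for one interval and turns the totals into put-style (row key, value) pairs. It does not use Spark or the HBase client: the record layout, the "<interval>|<node>" row key format, and the dict standing in for an HBase table are all hypothetical choices made for this example.

```python
from collections import defaultdict


def pre_aggregate(records, interval_start):
    """SUM(upLink) per node for one interval, as put-style (row_key, value) pairs."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["node"]] += rec["upLink"]
    # Row key layout "<interval>|<node>" is a made-up convention for this sketch.
    return [
        (f"{interval_start}|{node}", total)
        for node, total in sorted(totals.items())
    ]


def apply_puts(store, puts):
    """Stand-in for HBase PUTs: writing the same key again overwrites the cell."""
    for key, value in puts:
        store[key] = value
    return store
```

Because each put carries the final total for its interval rather than an increment, replaying the same interval after a failure overwrites the cells with identical values instead of double counting.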

It would be good if Kylin could give the "CUBE Administrator"
the possibility to choose how to do the "Real-time CUBE Update". For
example, give the option to choose either "Inverted Indexes" or "HBase".

Do you think such an approach could be applicable to Kylin?

Regards,

-- gas


On 09/21/2015 11:36 AM, Li Yang wrote:
> Gas is mostly right, with one addition: a query can hit both the
> inverted index and the cube if it asks for both latest and historical data.
> The results from the two sources will be aggregated at query time.
>
> On Fri, Sep 18, 2015 at 11:26 PM, Gaspare Maria <
> gaspare.maria@gfmintegration.it> wrote:
>
>> Hi,
>>
>> so, if I understood correctly, the idea behind Kylin Real Time is:
>>
>>   * Inverted Indexes (maybe Lucene, or inverted indexes on HBase) will
>>     be built according to the CUBE schema in near-realtime by using Spark
>>     (Streaming) Kafka consumers;
>>   * At query time, if the query touches the latest data it will be routed
>>     to the Inverted Indexes, otherwise to the CUBE on HBase;
>>   * Queries that touch the latest data should be limited, due to the
>>     limitations of inverted indexes;
>>   * A query over a long period back in time (e.g. from now back to 2
>>     months ago) will be routed partly to HBase and partly to the
>>     Inverted Indexes.
>>
>>
>> Am I right?
>>
>> Regards,
>>
>> -- gas
>>
>>
>>
>> On 09/18/2015 12:35 AM, Henry Saputra wrote:
>>
>>> Awesome, thanks Luke
>>>
>>> On Thu, Sep 17, 2015 at 2:37 AM, Luke Han <lu...@gmail.com> wrote:
>>>
>>>> Here's JIRA: https://issues.apache.org/jira/browse/KYLIN-599
>>>>
>>>>
>>>> Best Regards!
>>>> ---------------------
>>>>
>>>> Luke Han
>>>>
>>>> On Thu, Sep 17, 2015 at 1:09 AM, Henry Saputra <he...@gmail.com>
>>>> wrote:
>>>>
>>>> That is good to know. Li Yang, Luke, could one of you share the design
>>>>> document for this realtime OLAP query in the JIRA?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> - Henry
>>>>>
>>>>> On Tue, Sep 15, 2015 at 11:12 PM, Li Yang <li...@apache.org> wrote:
>>>>>
>>>>>> There will be incremental updates on the existing cubes, but during
>>>>>>> that updates I suppose no queries will be ran against them?
>>>>>>>
>>>>>> Yes, it's mini batch, usually at minutes interval. And of course cube
>>>>>> CAN
>>>>>> serve query while the mini incremental is under built. How can we let
>>>>>> the
>>>>>> cube offline every few minutes, that's impossible.  :-)
>>>>>>
>>>>>> On Tue, Sep 15, 2015 at 6:41 PM, Sarnath <st...@gmail.com> wrote:
>>>>>>
>>>>>> Inverted index? That sounds interesting. We use inverted index to serve
>>>>>> the
>>>>>> cubes in our internal implementation.
>>>>>>> I come from Big Data Center of excellence from an Indian IT major.
>>>>>>>
>>>>>>> We have been experimenting with the idea of serving cubes through
>>>>>>> ElasticSearch REST API. This is not related to Kylin. This is our own
>>>>>>> internal development.
>>>>>>>
>>>>>>> The motivation for this is --- Once the cube is built, it needs to be
>>>>>>> served.
>>>>>>>
>>>>>>> The query looks somewhat like this:
>>>>>>>
>>>>>>> "Given ProductID=*, Year=2015, Fetch All Quantities Sold"
>>>>>>>
>>>>>>> "Given ProductID=XX, Fetch how much it has sold every Month"
>>>>>>>
>>>>>>> Find all entries that match K1=V1, K2=V2
>>>>>>>
>>>>>>> This relieves us from lot of things - storage, REST API etc. and makes
>>>>>>>
>>>>>> the
>>>>>> cubes easily searchable.
>>>>>>> However, we don't do SQL/MDX on top of it.  Tableau 9.1Beta is
>>>>>>> experimenting with Web-Data-Connector which we believe can be used for
>>>>>>> Visualization... Apart from that, we experimented with a few
>>>>>>> auto-generated Kibana dashboards which were just okay. But Kibana was
>>>>>>>
>>>>>> not
>>>>>> designed for Cubes and so it has its own limitations.
>>>>>>> Appreciate any feedback!
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Sarnath
>>>>>>> I also think that it's a mini batch cubing.   It's time to bring back
>>>>>>>
>>>>>> the
>>>>>> inverted index into roadmap. The inverted index will be the true
>>>>>> real-time
>>>>>> solution and can provide the low-level query capability on the raw
>>>>>>> data.
>>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>> JiangXu
>>>>>>>
>>>>>>>
>>>>>>> ------------------ Original Message ------------------
>>>>>>> From: "Henry Saputra";<he...@gmail.com>;
>>>>>>> Sent: Tuesday, September 15, 2015, 12:39 PM
>>>>>>> To: "dev@kylin.incubator.apache.org"<dev@kylin.incubator.apache.org>;
>>>>>>> Subject: Re: Kylin Real time
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Ok, but that still seems like mini batch to me.
>>>>>>>
>>>>>>> There will be incremental updates on the existing cubes, but during
>>>>>>> that updates I suppose no queries will be ran against them?
>>>>>>>
>>>>>>> - Henry
>>>>>>>
>>>>>>> On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org> wrote:
>>>>>>>
>>>>>>>> Streaming OLAP provides Near-Realtime analysis where data delay can
>>>>>>>>
>>>>>>> be as
>>>>>> short as a few minutes.
>>>>>>>> Traditional daily build allows user to analyze yesterday's data. If
>>>>>>>> increase the frequency to hourly, then user can analyze last hour's
>>>>>>>>
>>>>>>> data.
>>>>>> Further down the line, how about incremental build every 5 minutes
>>>>>>> from a
>>>>>> streaming source? Then user can analyze data 5 minutes ago. That's
>>>>>>>> Streaming OLAP!
>>>>>>>>
>>>>>>>> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <
>>>>>>>>
>>>>>>> henry.saputra@gmail.com
>>>>>> wrote:
>>>>>>>> Hi Luke,
>>>>>>>>> Could you clarify again what is the streaming OLAP means here?
>>>>>>>>>
>>>>>>>>> By definition OLAP work with historical data.
>>>>>>>>>
>>>>>>>>> Maybe I missed it but was there any discussions or proposed design
>>>>>>>>>
>>>>>>>> for
>>>>>> it?
>>>>>>>> Thanks,
>>>>>>>>> - Henry
>>>>>>>>>
>>>>>>>>> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Siddharth,
>>>>>>>>>>       Kylin's next majority release (0.8.x) will support Streaming
>>>>>>>>>>
>>>>>>>>> OLAP
>>>>>> which
>>>>>>>>>> will coming in Q4 since it still under development now, as Hongbin
>>>>>>>>>> mentioned above.
>>>>>>>>>>       Could  you please drop me a mail about your case? I would like
>>>>>>>>>>
>>>>>>>>> to
>>>>>> better understand your scenario to well manage coming features?
>>>>>>>>>>       Thanks.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best Regards!
>>>>>>>>>> ---------------------
>>>>>>>>>>
>>>>>>>>>> Luke Han
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <mahongbin@apache.org
>>>>>>>>>> <javascript:;>> wrote:
>>>>>>>>>>
>>>>>>>>>> For current 0.7  releases, you cannot.
>>>>>>>>>>> Real time data processing and querying will be added in 0.8
>>>>>>>>>>>
>>>>>>>>>> release.
>>>>>> It
>>>>>>>> is
>>>>>>>>>>> still under development and testing. We have achieved good
>>>>>>>>>>>
>>>>>>>>>> progress
>>>>>> on
>>>>>>>> it,
>>>>>>>>>>> please wait for announcements.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
>>>>>>>>>>> siddharth.ubale@syncoms.com <javascript:;>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi ,
>>>>>>>>>>>> I would like to ask whether Kylin can be used as a real time
>>>>>>>>>>>>
>>>>>>>>>>> querying
>>>>>>>> system?
>>>>>>>>>>>> The process of building a cube , makes it look like a batch
>>>>>>>>>>>>
>>>>>>>>>>> process
>>>>>>>> after
>>>>>>>>>>> which the queries are with low latency.. however can
>>>>>>>>>>>> We get a real time idea of what the OLAP system's state is at
>>>>>>>>>>>>
>>>>>>>>>>> the
>>>>>> query
>>>>>>>>>> instance?
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Siddharth
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> *Bin Mahone | 马洪宾*
>>>>>>>>>>> Apache Kylin: http://kylin.io
>>>>>>>>>>> Github: https://github.com/binmahone
>>>>>>>>>>>
>>>>>>>>>>>

Re: 回复： Kylin Real time

Posted by Li Yang <li...@apache.org>.

Gas is mostly right, with one addition: a query can hit both the
inverted index and the cube if it asks for both latest and historical data.
The results from the two sources will be aggregated at query time.
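
A minimal Python sketch of that query-time aggregation, assuming each source returns partial SUMs keyed by the same group-by tuple. The function name and the dict-based result format are illustrative assumptions, not Kylin's actual interfaces.

```python
def merge_sums(cube_rows, index_rows):
    """Combine partial SUM results from the cube and the inverted index.

    Both arguments map a group-by key to a partial sum over disjoint time ranges.
    """
    merged = dict(cube_rows)
    for key, value in index_rows.items():
        merged[key] = merged.get(key, 0) + value
    return merged
```

This simple addition only works for aggregates that can be combined from partial results, such as SUM or COUNT. An average, for instance, would have to be carried as a (sum, count) pair and divided after merging, and the two sources must cover non-overlapping time ranges to avoid double counting.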

On Fri, Sep 18, 2015 at 11:26 PM, Gaspare Maria <
gaspare.maria@gfmintegration.it> wrote:

> Hi,
>
> so, if I understood correctly, the idea behind Kylin Real Time is:
>
>  * Inverted Indexes (maybe Lucene, or inverted indexes on HBase) will
>    be built according to the CUBE schema in near-realtime by using Spark
>    (Streaming) Kafka consumers;
>  * At query time, if the query touches the latest data it will be routed
>    to the Inverted Indexes, otherwise to the CUBE on HBase;
>  * Queries that touch the latest data should be limited, due to the
>    limitations of inverted indexes;
>  * A query over a long period back in time (e.g. from now back to 2
>    months ago) will be routed partly to HBase and partly to the
>    Inverted Indexes.
>
>
> Am I right?
>
> Regards,
>
> -- gas
>
>
>
> On 09/18/2015 12:35 AM, Henry Saputra wrote:
>
>> Awesome, thanks Luke
>>
>> On Thu, Sep 17, 2015 at 2:37 AM, Luke Han <lu...@gmail.com> wrote:
>>
>>> Here's JIRA: https://issues.apache.org/jira/browse/KYLIN-599
>>>
>>>
>>> Best Regards!
>>> ---------------------
>>>
>>> Luke Han
>>>
>>> On Thu, Sep 17, 2015 at 1:09 AM, Henry Saputra <he...@gmail.com>
>>> wrote:
>>>
>>> That is good to know. Li Yang, Luke, could one of you share the design
>>>> document for this realtime OLAP query in the JIRA?
>>>>
>>>> Thanks,
>>>>
>>>> - Henry
>>>>
>>>> On Tue, Sep 15, 2015 at 11:12 PM, Li Yang <li...@apache.org> wrote:
>>>>
>>>>> There will be incremental updates on the existing cubes, but during
>>>>>> that updates I suppose no queries will be ran against them?
>>>>>>
>>>>> Yes, it's mini batch, usually at minutes interval. And of course cube
>>>>> CAN
>>>>> serve query while the mini incremental is under built. How can we let
>>>>> the
>>>>> cube offline every few minutes, that's impossible.  :-)
>>>>>
>>>>> On Tue, Sep 15, 2015 at 6:41 PM, Sarnath <st...@gmail.com> wrote:
>>>>>
>>>>> Inverted index? That sounds interesting. We use inverted index to serve
>>>>>>
>>>>> the
>>>>
>>>>> cubes in our internal implementation.
>>>>>>
>>>>>> I come from Big Data Center of excellence from an Indian IT major.
>>>>>>
>>>>>> We have been experimenting with the idea of serving cubes through
>>>>>> ElasticSearch REST API. This is not related to Kylin. This is our own
>>>>>> internal development.
>>>>>>
>>>>>> The motivation for this is --- Once the cube is built, it needs to be
>>>>>> served.
>>>>>>
>>>>>> The query looks somewhat like this:
>>>>>>
>>>>>> "Given ProductID=*, Year=2015, Fetch All Quantities Sold"
>>>>>>
>>>>>> "Given ProductID=XX, Fetch how much it has sold every Month"
>>>>>>
>>>>>> Find all entries that match K1=V1, K2=V2
>>>>>>
>>>>>> This relieves us from lot of things - storage, REST API etc. and makes
>>>>>>
>>>>> the
>>>>
>>>>> cubes easily searchable.
>>>>>>
>>>>>> However, we don't do SQL/MDX on top of it.  Tableau 9.1Beta is
>>>>>> experimenting with Web-Data-Connector which we believe can be used for
>>>>>> Visualization... Apart from that, we experimented with a few
>>>>>> auto-generated Kibana dashboards which were just okay. But Kibana was
>>>>>>
>>>>> not
>>>>
>>>>> designed for Cubes and so it has its own limitations.
>>>>>>
>>>>>> Appreciate any feedback!
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Sarnath
>>>>>> I also think that it's a mini batch cubing.   It's time to bring back
>>>>>>
>>>>> the
>>>>
>>>>> inverted index into roadmap. The inverted index will be the true
>>>>>>
>>>>> real-time
>>>>
>>>>> solution and can provide the low-level query capability on the raw
>>>>>> data.
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>> JiangXu
>>>>>>
>>>>>>
>>>>>> ------------------ Original Message ------------------
>>>>>> From: "Henry Saputra";<he...@gmail.com>;
>>>>>> Sent: Tuesday, September 15, 2015, 12:39 PM
>>>>>> To: "dev@kylin.incubator.apache.org"<dev@kylin.incubator.apache.org>;
>>>>>>
>>>>>> Subject: Re: Kylin Real time
>>>>>>
>>>>>>
>>>>>>
>>>>>> Ok, but that still seems like mini batch to me.
>>>>>>
>>>>>> There will be incremental updates on the existing cubes, but during
>>>>>> that updates I suppose no queries will be ran against them?
>>>>>>
>>>>>> - Henry
>>>>>>
>>>>>> On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org> wrote:
>>>>>>
>>>>>>> Streaming OLAP provides Near-Realtime analysis where data delay can
>>>>>>>
>>>>>> be as
>>>>
>>>>> short as a few minutes.
>>>>>>>
>>>>>>> Traditional daily build allows user to analyze yesterday's data. If
>>>>>>> increase the frequency to hourly, then user can analyze last hour's
>>>>>>>
>>>>>> data.
>>>>
>>>>> Further down the line, how about incremental build every 5 minutes
>>>>>>>
>>>>>> from a
>>>>
>>>>> streaming source? Then user can analyze data 5 minutes ago. That's
>>>>>>> Streaming OLAP!
>>>>>>>
>>>>>>> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <
>>>>>>>
>>>>>> henry.saputra@gmail.com
>>>>
>>>>> wrote:
>>>>>>>
>>>>>>> Hi Luke,
>>>>>>>>
>>>>>>>> Could you clarify again what is the streaming OLAP means here?
>>>>>>>>
>>>>>>>> By definition OLAP work with historical data.
>>>>>>>>
>>>>>>>> Maybe I missed it but was there any discussions or proposed design
>>>>>>>>
>>>>>>> for
>>>>
>>>>> it?
>>>>>>
>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> - Henry
>>>>>>>>
>>>>>>>> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Siddharth,
>>>>>>>>>      Kylin's next majority release (0.8.x) will support Streaming
>>>>>>>>>
>>>>>>>> OLAP
>>>>
>>>>> which
>>>>>>>>
>>>>>>>>> will coming in Q4 since it still under development now, as Hongbin
>>>>>>>>> mentioned above.
>>>>>>>>>      Could  you please drop me a mail about your case? I would like
>>>>>>>>>
>>>>>>>> to
>>>>
>>>>> better understand your scenario to well manage coming features?
>>>>>>>>>
>>>>>>>>>      Thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards!
>>>>>>>>> ---------------------
>>>>>>>>>
>>>>>>>>> Luke Han
>>>>>>>>>
>>>>>>>>> On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <mahongbin@apache.org
>>>>>>>>> <javascript:;>> wrote:
>>>>>>>>>
>>>>>>>>> For current 0.7  releases, you cannot.
>>>>>>>>>>
>>>>>>>>>> Real time data processing and querying will be added in 0.8
>>>>>>>>>>
>>>>>>>>> release.
>>>>
>>>>> It
>>>>>>
>>>>>>> is
>>>>>>>>>
>>>>>>>>>> still under development and testing. We have achieved good
>>>>>>>>>>
>>>>>>>>> progress
>>>>
>>>>> on
>>>>>>
>>>>>>> it,
>>>>>>>>>
>>>>>>>>>> please wait for announcements.
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
>>>>>>>>>> siddharth.ubale@syncoms.com <javascript:;>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi ,
>>>>>>>>>>>
>>>>>>>>>>> I would like to ask whether Kylin can be used as a real time
>>>>>>>>>>>
>>>>>>>>>> querying
>>>>>>
>>>>>>> system?
>>>>>>>>>>> The process of building a cube , makes it look like a batch
>>>>>>>>>>>
>>>>>>>>>> process
>>>>>>
>>>>>>> after
>>>>>>>>>
>>>>>>>>>> which the queries are with low latency.. however can
>>>>>>>>>>> We get a real time idea of what the OLAP system's state is at
>>>>>>>>>>>
>>>>>>>>>> the
>>>>
>>>>> query
>>>>>>>>
>>>>>>>>> instance?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Siddharth
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> *Bin Mahone | 马洪宾*
>>>>>>>>>> Apache Kylin: http://kylin.io
>>>>>>>>>> Github: https://github.com/binmahone
>>>>>>>>>>
>>>>>>>>>>
>

Re: 回复： Kylin Real time

Posted by Gaspare Maria <ga...@gfmintegration.it>.

Hi,

so, if I understood correctly, the idea behind Kylin Real Time is:

  * Inverted Indexes (maybe Lucene, or inverted indexes on HBase) will
    be built according to the CUBE schema in near-realtime by using Spark
    (Streaming) Kafka consumers;
  * At query time, if the query touches the latest data it will be routed
    to the Inverted Indexes, otherwise to the CUBE on HBase;
  * Queries that touch the latest data should be limited, due to the
    limitations of inverted indexes;
  * A query over a long period back in time (e.g. from now back to 2
    months ago) will be routed partly to HBase and partly to the
    Inverted Indexes.
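
The routing in the list above could be sketched in Python as follows. This is only an illustration of the idea as understood here: the function name, the half-open [start, end) time range, and the single "cube_built_until" boundary are assumptions, not a description of Kylin's query router.

```python
def route_query(start, end, cube_built_until):
    """Split the time range [start, end) at the point up to which the cube is built.

    Returns a list of (source, start, end) parts; older data goes to the cube,
    newer data to the inverted index.
    """
    plan = []
    if start < cube_built_until:
        plan.append(("cube", start, min(end, cube_built_until)))
    if end > cube_built_until:
        plan.append(("inverted_index", max(start, cube_built_until), end))
    return plan
```

A query that lies entirely on one side of the boundary yields a single part, while a long look-back (such as the two-month example) yields two parts whose results then have to be combined.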


Am I right?

Regards,

-- gas


On 09/18/2015 12:35 AM, Henry Saputra wrote:
> Awesome, thanks Luke
>
> On Thu, Sep 17, 2015 at 2:37 AM, Luke Han <lu...@gmail.com> wrote:
>> Here's JIRA: https://issues.apache.org/jira/browse/KYLIN-599
>>
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Thu, Sep 17, 2015 at 1:09 AM, Henry Saputra <he...@gmail.com>
>> wrote:
>>
>>> That is good to know. Li Yang, Luke, could one of you share the design
>>> document for this realtime OLAP query in the JIRA?
>>>
>>> Thanks,
>>>
>>> - Henry
>>>
>>> On Tue, Sep 15, 2015 at 11:12 PM, Li Yang <li...@apache.org> wrote:
>>>>> There will be incremental updates on the existing cubes, but during
>>>>> that updates I suppose no queries will be ran against them?
>>>> Yes, it's mini batch, usually at minutes interval. And of course cube CAN
>>>> serve query while the mini incremental is under built. How can we let the
>>>> cube offline every few minutes, that's impossible.  :-)
>>>>
>>>> On Tue, Sep 15, 2015 at 6:41 PM, Sarnath <st...@gmail.com> wrote:
>>>>
>>>>> Inverted index? That sounds interesting. We use inverted index to serve
>>> the
>>>>> cubes in our internal implementation.
>>>>>
>>>>> I come from Big Data Center of excellence from an Indian IT major.
>>>>>
>>>>> We have been experimenting with the idea of serving cubes through
>>>>> ElasticSearch REST API. This is not related to Kylin. This is our own
>>>>> internal development.
>>>>>
>>>>> The motivation for this is --- Once the cube is built, it needs to be
>>>>> served.
>>>>>
>>>>> The query looks somewhat like this:
>>>>>
>>>>> "Given ProductID=*, Year=2015, Fetch All Quantities Sold"
>>>>>
>>>>> "Given ProductID=XX, Fetch how much it has sold every Month"
>>>>>
>>>>> Find all entries that match K1=V1, K2=V2
>>>>>
>>>>> This relieves us from lot of things - storage, REST API etc. and makes
>>> the
>>>>> cubes easily searchable.
>>>>>
>>>>> However, we don't do SQL/MDX on top of it.  Tableau 9.1Beta is
>>>>> experimenting with Web-Data-Connector which we believe can be used for
>>>>> Visualization... Apart from that, we experimented with a few
>>>>> auto-generated Kibana dashboards which were just okay. But Kibana was
>>> not
>>>>> designed for Cubes and so it has its own limitations.
>>>>>
>>>>> Appreciate any feedback!
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Best,
>>>>>
>>>>> Sarnath
>>>>> I also think that it's a mini batch cubing.   It's time to bring back
>>> the
>>>>> inverted index into roadmap. The inverted index will be the true
>>> real-time
>>>>> solution and can provide the low-level query capability on the raw data.
>>>>>
>>>>>
>>>>> Thanks!
>>>>> JiangXu
>>>>>
>>>>>
>>>>> ------------------ Original Message ------------------
>>>>> From: "Henry Saputra";<he...@gmail.com>;
>>>>> Sent: Tuesday, September 15, 2015, 12:39 PM
>>>>> To: "dev@kylin.incubator.apache.org"<de...@kylin.incubator.apache.org>;
>>>>>
>>>>> Subject: Re: Kylin Real time
>>>>>
>>>>>
>>>>>
>>>>> Ok, but that still seems like mini batch to me.
>>>>>
>>>>> There will be incremental updates on the existing cubes, but during
>>>>> that updates I suppose no queries will be ran against them?
>>>>>
>>>>> - Henry
>>>>>
>>>>> On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org> wrote:
>>>>>> Streaming OLAP provides Near-Realtime analysis where data delay can
>>> be as
>>>>>> short as a few minutes.
>>>>>>
>>>>>> Traditional daily build allows user to analyze yesterday's data. If
>>>>>> increase the frequency to hourly, then user can analyze last hour's
>>> data.
>>>>>> Further down the line, how about incremental build every 5 minutes
>>> from a
>>>>>> streaming source? Then user can analyze data 5 minutes ago. That's
>>>>>> Streaming OLAP!
>>>>>>
>>>>>> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <
>>> henry.saputra@gmail.com
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Luke,
>>>>>>>

Re: Reply: Kylin Real time

Posted by Henry Saputra <he...@gmail.com>.

Awesome, thanks Luke

On Thu, Sep 17, 2015 at 2:37 AM, Luke Han <lu...@gmail.com> wrote:
> Here's the JIRA: https://issues.apache.org/jira/browse/KYLIN-599
>
>
> Best Regards!
> ---------------------
>
> Luke Han

Re: Reply: Kylin Real time

Posted by Luke Han <lu...@gmail.com>.

Here's the JIRA: https://issues.apache.org/jira/browse/KYLIN-599


Best Regards!
---------------------

Luke Han

On Thu, Sep 17, 2015 at 1:09 AM, Henry Saputra <he...@gmail.com>
wrote:

> That is good to know. Li Yang, Luke, could one of you share the design
> document for this realtime OLAP query in the JIRA?
>
> Thanks,
>
> - Henry

Re: Reply: Kylin Real time

Posted by Henry Saputra <he...@gmail.com>.

That is good to know. Li Yang, Luke, could one of you share the design
document for this realtime OLAP query in the JIRA?

Thanks,

- Henry

On Tue, Sep 15, 2015 at 11:12 PM, Li Yang <li...@apache.org> wrote:
>> There will be incremental updates on the existing cubes, but during
>> those updates I suppose no queries will be run against them?
>
> Yes, it's mini-batch, usually at intervals of a few minutes. And of course the
> cube CAN serve queries while the incremental mini-batch is being built. How
> could we take the cube offline every few minutes? That's impossible.  :-)

Re: Reply: Kylin Real time

Posted by Luke Han <lu...@gmail.com>.

Inverted index development has been paused for a while. I agree with Xu: it's
time to resume it for extremely low-latency cases.


Best Regards!
---------------------

Luke Han

On Wed, Sep 16, 2015 at 2:12 PM, Li Yang <li...@apache.org> wrote:

> > There will be incremental updates on the existing cubes, but during
> > those updates I suppose no queries will be run against them?
>
> Yes, it's mini-batch, usually at intervals of a few minutes. And of course the
> cube CAN serve queries while the incremental mini-batch is being built. How
> could we take the cube offline every few minutes? That's impossible.  :-)

Re: Reply: Kylin Real time

Posted by Li Yang <li...@apache.org>.

> There will be incremental updates on the existing cubes, but during
> those updates I suppose no queries will be run against them?

Yes, it's mini-batch, usually at intervals of a few minutes. And of course the
cube CAN serve queries while the incremental mini-batch is being built. How
could we take the cube offline every few minutes? That's impossible.  :-)
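
One way to picture serving queries during a mini-batch build is the sketch below. It is only an illustration under stated assumptions: each mini batch is built into its own immutable segment off to the side and becomes visible to queries only once it is published. The names `Cube`, `build_segment`, `publish`, and `query` are hypothetical and are not Kylin's actual API or design.

```python
class Cube:
    """Toy cube made of published segments (hypothetical, not Kylin code)."""

    def __init__(self):
        # Fully built segments; this is the only state queries read.
        self.segments = []

    def build_segment(self, events):
        # Aggregate one mini batch off to the side; queries cannot see it yet.
        totals = {}
        for product, quantity in events:
            totals[product] = totals.get(product, 0) + quantity
        return totals

    def publish(self, segment):
        # Rebind to a new list so a reader holding the old list keeps a
        # consistent snapshot while the new segment becomes visible.
        self.segments = self.segments + [segment]

    def query(self, product):
        # Sum the measure across all published segments.
        return sum(segment.get(product, 0) for segment in self.segments)


cube = Cube()
cube.publish(cube.build_segment([("XX", 5), ("YY", 2)]))

pending = cube.build_segment([("XX", 3)])  # next mini batch, not yet published
during_build = cube.query("XX")            # answered from published segments only
cube.publish(pending)
after_build = cube.query("XX")             # now includes the new mini batch
```

The point of the sketch is that the cube never goes offline: a query during the build sees slightly older data, and the data delay is bounded by the mini-batch interval.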


Re: Reply: Kylin Real time

Posted by Sarnath <st...@gmail.com>.

Inverted index? That sounds interesting. We use an inverted index to serve
the cubes in our internal implementation.

I come from the Big Data Center of Excellence at a major Indian IT company.

We have been experimenting with the idea of serving cubes through the
ElasticSearch REST API. This is not related to Kylin. This is our own
internal development.

The motivation for this is that once the cube is built, it needs to be
served.

The queries look somewhat like this:

"Given ProductID=*, Year=2015, fetch all quantities sold"

"Given ProductID=XX, fetch how much it has sold every month"

Find all entries that match K1=V1, K2=V2

This relieves us of a lot of things (storage, REST API, etc.) and makes the
cubes easily searchable.
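
The "match K1=V1, K2=V2" lookups above can be sketched with a tiny in-memory inverted index. This is a minimal illustration, not the ElasticSearch-based implementation described in this message and not Kylin code; the class `CubeIndex`, its methods, and the sample rows are all made up for the example.

```python
from collections import defaultdict


class CubeIndex:
    """Toy inverted index over pre-aggregated cube rows (hypothetical)."""

    def __init__(self):
        self.rows = []                     # row id -> dimensions plus measures
        self.postings = defaultdict(set)   # (dimension, value) -> row ids

    def add(self, dims, measures):
        row_id = len(self.rows)
        self.rows.append({**dims, **measures})
        for key, value in dims.items():
            self.postings[(key, value)].add(row_id)
        return row_id

    def match(self, **filters):
        # Intersect the posting sets of every K=V filter.
        # A dimension left out of the filters acts as a wildcard (K=*).
        ids = set(range(len(self.rows)))
        for key, value in filters.items():
            ids &= self.postings.get((key, value), set())
        return [self.rows[i] for i in sorted(ids)]


index = CubeIndex()
index.add({"ProductID": "XX", "Year": 2015, "Month": 1}, {"Quantity": 10})
index.add({"ProductID": "XX", "Year": 2015, "Month": 2}, {"Quantity": 7})
index.add({"ProductID": "YY", "Year": 2015, "Month": 1}, {"Quantity": 4})
index.add({"ProductID": "XX", "Year": 2014, "Month": 12}, {"Quantity": 3})

# "Given ProductID=*, Year=2015, fetch all quantities sold"
sold_2015 = [row["Quantity"] for row in index.match(Year=2015)]

# "Given ProductID=XX, fetch how much it has sold every month"
monthly_xx = {}
for row in index.match(ProductID="XX"):
    key = (row["Year"], row["Month"])
    monthly_xx[key] = monthly_xx.get(key, 0) + row["Quantity"]
```

Each filter costs one dictionary lookup plus a set intersection, which is why this kind of index suits point lookups on cube cells; a search engine adds persistence, sharding, and a REST layer on top of the same idea.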

However, we don't do SQL/MDX on top of it. Tableau 9.1 Beta is
experimenting with a Web Data Connector, which we believe can be used for
visualization. Apart from that, we experimented with a few
auto-generated Kibana dashboards, which were just okay. But Kibana was not
designed for cubes, so it has its own limitations.

Appreciate any feedback!

Thanks,

Best,

Sarnath
I also think that it's mini-batch cubing. It's time to bring the inverted
index back into the roadmap. The inverted index will be the true real-time
solution and can provide low-level query capability on the raw data.

Thanks!
JiangXu

------------------ Original Message ------------------
From: "Henry Saputra";<he...@gmail.com>;
Sent: Tuesday, September 15, 2015, 12:39 PM
To: "dev@kylin.incubator.apache.org"<de...@kylin.incubator.apache.org>;

Subject: Re: Kylin Real time

Ok, but that still seems like mini batch to me.

There will be incremental updates on the existing cubes, but during
those updates I suppose no queries will be run against them?

- Henry

On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org> wrote:
> Streaming OLAP provides near-real-time analysis, where data delay can be as
> short as a few minutes.
>
> A traditional daily build allows users to analyze yesterday's data. If we
> increase the frequency to hourly, users can analyze the last hour's data.
> Further down the line, how about an incremental build every 5 minutes from a
> streaming source? Then users can analyze data from 5 minutes ago. That's
> Streaming OLAP!
>
> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <he...@gmail.com>
> wrote:
>
>> Hi Luke,
>>
>> Could you clarify again what streaming OLAP means here?
>>
>> By definition, OLAP works with historical data.
>>
>> Maybe I missed it, but was there any discussion or proposed design for it?
>>
>> Thanks,
>>
>> - Henry
>>
>> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
>>
>> > Hi Siddharth,
>> >     Kylin's next major release (0.8.x) will support Streaming OLAP,
>> > which is coming in Q4; it is still under development now, as Hongbin
>> > mentioned above.
>> >     Could you please drop me a mail about your case? I would like to
>> > better understand your scenario so we can better plan the coming features.
>> >
>> >     Thanks.
>> >
>> >
>> >
>> >
>> > Best Regards!
>> > ---------------------
>> >
>> > Luke Han
>> >
>> > On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <mahongbin@apache.org> wrote:
>> >
>> > > For the current 0.7 releases, you cannot.
>> > >
>> > > Real-time data processing and querying will be added in the 0.8 release.
>> > > It is still under development and testing. We have made good progress
>> > > on it; please wait for announcements.
>> > >
>> > > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <siddharth.ubale@syncoms.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I would like to ask whether Kylin can be used as a real-time querying
>> > > > system. The process of building a cube makes it look like a batch
>> > > > process, after which queries run with low latency. However, can we
>> > > > get a real-time view of the OLAP system's state at the moment of the
>> > > > query?
>> > > >
>> > > > Thanks,
>> > > > Siddharth
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > >
>> > > *Bin Mahone | 马洪宾*
>> > > Apache Kylin: http://kylin.io
>> > > Github: https://github.com/binmahone
>> > >
>> >
>>