Posted to dev@kylin.apache.org by hongbin ma <ma...@apache.org> on 2015/07/29 08:08:54 UTC

Re: Kylin Real time

With the current 0.7 releases, you cannot.

Real-time data processing and querying will be added in the 0.8 release. It
is still under development and testing. We have made good progress on it;
please wait for the announcements.

On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
siddharth.ubale@syncoms.com> wrote:

> Hi,
>
> I would like to ask whether Kylin can be used as a real-time querying
> system.
> The process of building a cube makes it look like a batch process, after
> which queries run with low latency. However, can
> we get a real-time view of the OLAP system's state at the moment of the
> query?
>
> Thanks,
> Siddharth
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Kylin Real time

Posted by George Lu <lu...@gmail.com>.
Hey Li Yang,

Thanks for your reply. I am new to Kylin, and we are currently considering
using it for data analysis.
We use Storm and store the results in MySQL for low-latency queries now, but
as you said, that is quite low-level and needs custom Storm jobs.
May I ask whether Kylin offers any custom hooks into the cube build
process?

Thank you!

George

On Mon, Sep 14, 2015 at 4:53 PM, Li Yang <li...@apache.org> wrote:

> The Streaming OLAP feature comes with Kylin 2.x, which is to be released
> this year. Documentation is limited at the moment. Watch the related JIRAs
> if you want to track our progress: KYLIN-972
> <https://issues.apache.org/jira/browse/KYLIN-972>
>
> Storm + Kafka are just low-level components and are far from a query engine
> that accepts ad-hoc queries [1], whereas Kylin Streaming OLAP provides a full
> SQL interface on historic and real-time data with response times in seconds.
>
> [1]
>
> http://nguyentantrieu.info/blog/building-an-query-engine-for-time-series-data-using-redis-kafka-and-storm/
>
>
>
>

Re: Kylin Real time

Posted by Li Yang <li...@apache.org>.
The Streaming OLAP feature comes with Kylin 2.x, which is to be released
this year. Documentation is limited at the moment. Watch the related JIRAs
if you want to track our progress: KYLIN-972
<https://issues.apache.org/jira/browse/KYLIN-972>

Storm + Kafka are just low-level components and are far from a query engine
that accepts ad-hoc queries [1], whereas Kylin Streaming OLAP provides a full
SQL interface on historic and real-time data with response times in seconds.

[1]
http://nguyentantrieu.info/blog/building-an-query-engine-for-time-series-data-using-redis-kafka-and-storm/




On Mon, Sep 14, 2015 at 4:28 PM, George Lu <lu...@gmail.com> wrote:

> Hey all,
>
> Currently, storm and kafka can be used to achieve near real-time query.
> Can you share any doc on how to use streaming in Kylin and how does Kylin
> Streaming differs to Storm + Kafka?
>
> Thanks!
>
> George Lu
>

Re: Kylin Real time

Posted by George Lu <lu...@gmail.com>.
Hey all,

Currently, Storm and Kafka can be used to achieve near-real-time queries.
Can you share any docs on how to use streaming in Kylin, and on how Kylin
Streaming differs from Storm + Kafka?

Thanks!

George Lu

On Mon, Sep 14, 2015 at 3:33 PM, Li Yang <li...@apache.org> wrote:

> Streaming OLAP provides near-real-time analysis where the data delay can be
> as short as a few minutes.
>
> A traditional daily build allows users to analyze yesterday's data. If we
> increase the frequency to hourly, users can analyze the last hour's data.
> Further down the line, how about an incremental build every 5 minutes from a
> streaming source? Then users can analyze data from 5 minutes ago. That's
> Streaming OLAP!
>
> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <he...@gmail.com>
> wrote:
>
> > Hi Luke,
> >
> > Could you clarify again what streaming OLAP means here?
> >
> > By definition, OLAP works with historical data.
> >
> > Maybe I missed it, but were there any discussions or a proposed design
> > for it?
> >
> > Thanks,
> >
> > - Henry
> >
> > On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
> >
> > > Hi Siddharth,
> > >     Kylin's next major release (0.8.x) will support Streaming OLAP,
> > > which will arrive in Q4 since it is still under development, as Hongbin
> > > mentioned above.
> > >     Could you please drop me a mail about your case? I would like to
> > > better understand your scenario so that we can manage the coming
> > > features well.
> > >
> > >     Thanks.
> > >
> > >
> > >
> > >
> > > Best Regards!
> > > ---------------------
> > >
> > > Luke Han
> > >

Re: Re: Kylin Real time

Posted by Sarnath <st...@gmail.com>.
Hi,

Can you share some of the reasons why "Inverted Index" did not work out?
I am trying to do precisely the same thing for storing cubes in our own
private implementation.
I am wondering what problems you ran into upstream.

Thanks,
Best,
Sarnath

Re: Re: Kylin Real time

Posted by hongbin ma <ma...@apache.org>.
Hi Luke,

I'm afraid your answer might be a little confusing to outside users.
Cube, Streaming, and Inverted Index are not concepts of the same kind.
My understanding is:

1. "Cube" and "Inverted Index" are the two options we have for storing
digested data. This is what we let the modeler specify in the data model.
Cube is Kylin's original choice for storage; later we introduced "Inverted
Index" in an attempt to serve near-real-time requirements (with the v1
engine, building a cube is very time-consuming, whereas putting digested data
into an inverted index is much faster). However, development on Inverted
Index is paused for several reasons.

2. "Streaming" is a concept that contrasts with "Batch". Before the 2.x
versions, Kylin used the v1 engine to build cubes, which only supports loading
data from Hive tables in a batch fashion; this is why it is called "Batch"
mode. In the 2.x versions, we built the new v2 engine and started to support
building cubes from streaming queues like Kafka. As previously explained, the
current streaming solution is not strictly "real-time streaming", because it
basically consumes the streaming data to build mini cubes.
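
To make the mini-cube idea concrete, here is a minimal Python sketch. It is
not Kylin's code: the function names (split_into_batches, build_mini_cube),
the event fields and the 300-second window are all assumptions made up for
illustration. It only shows the general technique of cutting a stream into
fixed intervals and pre-aggregating each interval over every combination of
dimensions.

```python
from collections import defaultdict
from itertools import combinations


def split_into_batches(events, interval_seconds, ts_field="ts"):
    """Group events by the start of the time window they fall into."""
    batches = defaultdict(list)
    for event in events:
        window_start = event[ts_field] - event[ts_field] % interval_seconds
        batches[window_start].append(event)
    return dict(sorted(batches.items()))


def build_mini_cube(events, dimensions, measure):
    """Pre-aggregate SUM(measure) for every combination of dimensions."""
    cube = defaultdict(float)
    for event in events:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                # The key is a tuple of (dimension, value) pairs; () is the grand total.
                key = tuple((d, event[d]) for d in dims)
                cube[key] += event[measure]
    return dict(cube)


# Hypothetical events, as they might arrive from a streaming queue.
events = [
    {"ts": 10, "node": "a", "region": "eu", "uplink": 5.0},
    {"ts": 200, "node": "b", "region": "eu", "uplink": 3.0},
    {"ts": 310, "node": "a", "region": "us", "uplink": 2.0},
]

batches = split_into_batches(events, interval_seconds=300)
mini_cubes = {
    start: build_mini_cube(batch, ["node", "region"], "uplink")
    for start, batch in batches.items()
}
```

Each entry of mini_cubes stands for one small segment covering one interval.
The data delay seen by a query is then roughly the interval length plus the
build time, which is why this is near real time rather than real time.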




-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Re: Kylin Real time

Posted by Luke Han <lu...@gmail.com>.
Hi Gaspare,
    You have raised a great discussion about these things.
    In the original design there was only the cube, but we came up with a new
concept, the Data Model, since "Cube" itself is just one kind of storage.

    The modeler has an option to define or pick which kind of storage to use
for the Data Model. We call this the Realization interface; it covers
Cube, Streaming and Inverted Index
and is extensible to others in the future.
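
A rough Python sketch of what such a pluggable interface could look like is
given here. The class name Realization echoes the term used above, but its
methods (time_range, sum_by), the InMemoryRealization toy back end and the
query router are hypothetical and are not Kylin's API. The merge step follows
Li Yang's remark, quoted further down, that a query asking for both latest and
historic data hits both sources and the results are aggregated at query time.

```python
from abc import ABC, abstractmethod
from collections import Counter


class Realization(ABC):
    """Hypothetical storage back end that can answer SUM ... GROUP BY queries."""

    @abstractmethod
    def time_range(self):
        """Return the half-open (start, end) range of timestamps held here."""

    @abstractmethod
    def sum_by(self, dimension, start, end):
        """Return {dimension value: SUM(measure)} for rows with start <= ts < end."""


class InMemoryRealization(Realization):
    """Toy back end over (ts, dims, value) rows; stands in for a cube or an index."""

    def __init__(self, name, start, end, rows):
        self.name = name
        self._range = (start, end)
        self._rows = rows

    def time_range(self):
        return self._range

    def sum_by(self, dimension, start, end):
        totals = Counter()
        for ts, dims, value in self._rows:
            if start <= ts < end:
                totals[dims[dimension]] += value
        return totals


def query(realizations, dimension, start, end):
    """Ask every back end that overlaps [start, end) and merge the partial sums."""
    merged = Counter()
    for realization in realizations:
        r_start, r_end = realization.time_range()
        lo, hi = max(start, r_start), min(end, r_end)
        if lo < hi:
            # Counter.update adds the partial sums instead of replacing them.
            merged.update(realization.sum_by(dimension, lo, hi))
    return dict(merged)


# Historic data lives in one back end, the latest data in another.
cube = InMemoryRealization("cube", 0, 100, [(10, {"node": "a"}, 4), (50, {"node": "b"}, 1)])
recent = InMemoryRealization("inverted-index", 100, 200, [(120, {"node": "a"}, 2)])

result = query([cube, recent], "node", 0, 200)
```

Because the router only depends on the abstract class, another kind of storage
could be added without touching the query path, which is the extensibility
point made above.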

    So you are right: there will be a UI setting for this in the Data Model.
It will come later, since 2.x is under heavy refactoring and
tuning, just as Hongbin mentioned.

    Please stay tuned for the latest updates on Kylin's streaming/real-time
capability.

    Thanks.

Luke


Best Regards!
---------------------

Luke Han

On Wed, Sep 23, 2015 at 2:55 PM, hongbin ma <ma...@apache.org> wrote:

> Hi Gaspare,
>
> Actually, we do have a similar solution in the 2.x-staging code base. It is
> called "Streaming Cubing" (in contrast to Inverted Index, it uses a
> mini-batch cubing approach to tackle the near-real-time problem).
>
> Daemon threads start up periodically to consume a batch of data (perhaps a
> five-minute batch) from Kafka and build a mini-cube in memory before
> writing it into HBase. We have not officially announced the functionality
> because:
>
> 1. Currently we do not have a front-end UI for the configuration,
> including the Kafka settings. This makes Streaming Cubing difficult to use
> for now. The good news is that we're actively working on it
> (https://issues.apache.org/jira/browse/KYLIN-1041).
> 2. Documentation is lacking.
> 3. Currently we do not leverage Spark Streaming (or other alternatives)
> to process the data batch. Our daemon thread is a simple Java thread, and it
> will be problematic when the data batch grows too large. We intend to
> migrate to a horizontally scalable solution like Spark Streaming, but
> haven't had enough bandwidth to start on it
> (https://issues.apache.org/jira/browse/KYLIN-1042).
>
> Anyway, customers should be able to use Streaming Cubing when we officially
> announce the 2.x versions.
>
>
>
>
>
> On Wed, Sep 23, 2015 at 6:00 AM, Gaspare Maria <
> gaspare.maria@gfmintegration.it> wrote:
>
> > Hi,
> >
> > One more question, and some feedback, regarding Kylin Real time.
> >
> > There are many use cases (in particular in the telco environment) where
> > streams of data arrive at regular intervals (usually every 5 or 15
> > minutes) and "real-time" aggregations can always be done per interval (for
> > example, SUM(upLink) per node in the last interval). In such use cases,
> > the CUBE could perhaps be updated in near real time after
> > pre-aggregation with Spark Streaming (of course without creating the
> > HFiles, but using parallel PUTs on HBase with Spark). In our experience,
> > for "simple" CUBEs this should be faster than Inverted Indexes.
> >
> > Of course, there are use cases where this approach is not applicable; in
> > those cases Inverted Indexes are still valid.
> >
> > It would be good if Kylin gave the "CUBE Administrator"
> > the possibility to choose how to do the "Real-time CUBE Update", for
> > example an option to choose either "Inverted Indexes" or "HBase".
> >
> > Do you think such an approach could be applicable to Kylin?
> >
> > Regards,
> >
> > -- gas
> >
> >
> >
> > On 09/21/2015 11:36 AM, Li Yang wrote:
> >
> >> Gas is mostly right, with one addition: a query can hit both the
> >> inverted index and the cube if it asks for both the latest and historic
> >> data. The results from the two sources are aggregated at query time.
> >>
> >> On Fri, Sep 18, 2015 at 11:26 PM, Gaspare Maria <
> >> gaspare.maria@gfmintegration.it> wrote:
> >>
> >>> Hi,
> >>>
> >>> so, if I understood correctly, the idea behind Kylin Real Time is:
> >>>
> >>>   * Inverted Indexes (maybe Lucene, or inverted indexes on HBase) will
> >>>     be built according to the CUBE schema in near real time by using
> >>>     Spark (streaming) Kafka consumers;
> >>>   * At query time, if the query touches the latest data, it will be
> >>>     routed to the Inverted Indexes, otherwise to the CUBE on HBase;
> >>>   * Queries that touch the latest data should be limited, due to the
> >>>     limitations of inverted indexes;
> >>>   * Queries over a long period back in time (e.g. from now back to 2
> >>>     months ago) will be routed partly to HBase and partly to the
> >>>     Inverted Indexes.
> >>>
> >>>
> >>> Am I right?
> >>>
> >>> Regards,
> >>>
> >>> -- gas
> >>>
> >>>
> >>>
> >>> On 09/18/2015 12:35 AM, Henry Saputra wrote:
> >>>
> >>>> Awesome, thanks Luke
> >>>>
> >>>> On Thu, Sep 17, 2015 at 2:37 AM, Luke Han <lu...@gmail.com> wrote:
> >>>>
> >>>>> Here's JIRA: https://issues.apache.org/jira/browse/KYLIN-599
> >>>>>
> >>>>>
> >>>>> Best Regards!
> >>>>> ---------------------
> >>>>>
> >>>>> Luke Han
> >>>>>
> >>>>> On Thu, Sep 17, 2015 at 1:09 AM, Henry Saputra <
> >>>>> henry.saputra@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> That is good to know. Li Yang, Luke, could one of you share the
> >>>>>> design document for this realtime OLAP query in the JIRA?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> - Henry
> >>>>>>
> >>>>>> On Tue, Sep 15, 2015 at 11:12 PM, Li Yang <li...@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>>> There will be incremental updates on the existing cubes, but during
> >>>>>>>> that updates I suppose no queries will be ran against them?
> >>>>>>>
> >>>>>>> Yes, it's mini batch, usually at intervals of minutes. And of course
> >>>>>>> the cube CAN serve queries while the mini incremental build is in
> >>>>>>> progress. How could we take the cube offline every few minutes?
> >>>>>>> That's impossible.  :-)
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Sep 15, 2015 at 6:41 PM, Sarnath <st...@gmail.com>
> wrote:
> >>>>>>>
> >>>>>>> Inverted index? That sounds interesting. We use inverted index to
> >>>>>>> serve
> >>>>>>> the
> >>>>>>> cubes in our internal implementation.
> >>>>>>>
> >>>>>>>> I come from Big Data Center of excellence from an Indian IT major.
> >>>>>>>>
> >>>>>>>> We have been experimenting with the idea of serving cubes through
> >>>>>>>> ElasticSearch REST API. This is not related to Kylin. This is our
> >>>>>>>> own
> >>>>>>>> internal development.
> >>>>>>>>
> >>>>>>>> The motivation for this is --- Once the cube is built, it needs to
> >>>>>>>> be
> >>>>>>>> served.
> >>>>>>>>
> >>>>>>>> The query looks somewhat like this:
> >>>>>>>>
> >>>>>>>> "Given ProductID=*, Year=2015, Fetch All Quantities Sold"
> >>>>>>>>
> >>>>>>>> "Given ProductID=XX, Fetch how much it has sold every Month"
> >>>>>>>>
> >>>>>>>> Find all entries that match K1=V1, K2=V2
> >>>>>>>>
> >>>>>>>> This relieves us from lot of things - storage, REST API etc. and
> >>>>>>>> makes
> >>>>>>>>
> >>>>>>>> the
> >>>>>>> cubes easily searchable.
> >>>>>>>
> >>>>>>>> However, we don't do SQL/MDX on top of it.  Tableau 9.1Beta is
> >>>>>>>> experimenting with Web-Data-Connector which we believe can be used
> >>>>>>>> for
> >>>>>>>> Visualization... Apart from that, we experimented with a few
> >>>>>>>> auto-generated Kibana dashboards which were just okay. But Kibana
> >>>>>>>> was
> >>>>>>>>
> >>>>>>>> not
> >>>>>>> designed for Cubes and so it has its own limitations.
> >>>>>>>
> >>>>>>>> Appreciate any feedback!
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Sarnath
> >>>>>>>> I also think that it's a mini batch cubing.   It's time to bring
> >>>>>>>> back
> >>>>>>>>
> >>>>>>>> the
> >>>>>>> inverted index into roadmap. The inverted index will be the true
> >>>>>>> real-time
> >>>>>>> solution and can provide the low-level query capability on the raw
> >>>>>>>
> >>>>>>>> data.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>> JiangXu
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ------------------ 原始邮件 ------------------
> >>>>>>>> 发件人: "Henry Saputra";<he...@gmail.com>;
> >>>>>>>> 发送时间: 2015年9月15日(星期二) 中午12:39
> >>>>>>>> 收件人: "dev@kylin.incubator.apache.org"<
> >>>>>>>> dev@kylin.incubator.apache.org
> >>>>>>>>
> >>>>>>>>> ;
> >>>>>>>>>
> >>>>>>>> 主题: Re: Kylin Real time
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Ok, but that still seems like mini batch to me.
> >>>>>>>>
> >>>>>>>> There will be incremental updates on the existing cubes, but
> during
> >>>>>>>> that updates I suppose no queries will be ran against them?
> >>>>>>>>
> >>>>>>>> - Henry
> >>>>>>>>
> >>>>>>>> On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Streaming OLAP provides Near-Realtime analysis where data delay
> can
> >>>>>>>>>
> >>>>>>>>> be as
> >>>>>>>>
> >>>>>>> short as a few minutes.
> >>>>>>>
> >>>>>>>> Traditional daily build allows user to analyze yesterday's data.
> If
> >>>>>>>>> increase the frequency to hourly, then user can analyze last
> hour's
> >>>>>>>>>
> >>>>>>>>> data.
> >>>>>>>>
> >>>>>>> Further down the line, how about incremental build every 5 minutes
> >>>>>>>
> >>>>>>>> from a
> >>>>>>>>
> >>>>>>> streaming source? Then user can analyze data 5 minutes ago. That's
> >>>>>>>
> >>>>>>>> Streaming OLAP!
> >>>>>>>>>
> >>>>>>>>> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <
> >>>>>>>>>
> >>>>>>>>> henry.saputra@gmail.com
> >>>>>>>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Luke,
> >>>>>>>>>
> >>>>>>>>>> Could you clarify again what is the streaming OLAP means here?
> >>>>>>>>>>
> >>>>>>>>>> By definition OLAP work with historical data.
> >>>>>>>>>>
> >>>>>>>>>> Maybe I missed it but was there any discussions or proposed
> design
> >>>>>>>>>>
> >>>>>>>>>> for
> >>>>>>>>>
> >>>>>>>> it?
> >>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>>> - Henry
> >>>>>>>>>>
> >>>>>>>>>> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Siddharth,
> >>>>>>>>>>
> >>>>>>>>>>>       Kylin's next majority release (0.8.x) will support
> >>>>>>>>>>> Streaming
> >>>>>>>>>>>
> >>>>>>>>>>> OLAP
> >>>>>>>>>>
> >>>>>>>>> which
> >>>>>>>
> >>>>>>>> will coming in Q4 since it still under development now, as Hongbin
> >>>>>>>>>>> mentioned above.
> >>>>>>>>>>>       Could  you please drop me a mail about your case? I would
> >>>>>>>>>>> like
> >>>>>>>>>>>
> >>>>>>>>>>> to
> >>>>>>>>>>
> >>>>>>>>> better understand your scenario to well manage coming features?
> >>>>>>>
> >>>>>>>>       Thanks.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Best Regards!
> >>>>>>>>>>> ---------------------
> >>>>>>>>>>>
> >>>>>>>>>>> Luke Han
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <
> >>>>>>>>>>> mahongbin@apache.org
> >>>>>>>>>>> <javascript:;>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> For current 0.7  releases, you cannot.
> >>>>>>>>>>>
> >>>>>>>>>>>> Real time data processing and querying will be added in 0.8
> >>>>>>>>>>>>
> >>>>>>>>>>>> release.
> >>>>>>>>>>>
> >>>>>>>>>> It
> >>>>>>>
> >>>>>>>> is
> >>>>>>>>>
> >>>>>>>>>> still under development and testing. We have achieved good
> >>>>>>>>>>>>
> >>>>>>>>>>>> progress
> >>>>>>>>>>>
> >>>>>>>>>> on
> >>>>>>>
> >>>>>>>> it,
> >>>>>>>>>
> >>>>>>>>>> please wait for announcements.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
> >>>>>>>>>>>> siddharth.ubale@syncoms.com <javascript:;>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi ,
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I would like to ask whether Kylin can be used as a real time
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> querying
> >>>>>>>>>>>>
> >>>>>>>>>>> system?
> >>>>>>>>>
> >>>>>>>>>> The process of building a cube , makes it look like a batch
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> process
> >>>>>>>>>>>>
> >>>>>>>>>>> after
> >>>>>>>>>
> >>>>>>>>>> which the queries are with low latency.. however can
> >>>>>>>>>>>>
> >>>>>>>>>>>>> We get a real time idea of what the OLAP system's state is at
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>
> >>>>>>>>>>> query
> >>>>>>>
> >>>>>>>> instance?
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Siddharth
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>>
> >>>>>>>>>>>> *Bin Mahone | 马洪宾*
> >>>>>>>>>>>> Apache Kylin: http://kylin.io
> >>>>>>>>>>>> Github: https://github.com/binmahone
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>

Re: Re: Kylin Real time

Posted by hongbin ma <ma...@apache.org>.
Hi Gaspare,

Actually, we do have a similar solution in the 2.x-staging code base. It is
called "Streaming Cubing". (In contrast to the Inverted Index, it uses a
mini-batch cubing approach to tackle the near-real-time problem.)

Daemon threads start up periodically, consume a batch of data (perhaps a
five-minute batch) from Kafka, and build a mini-cube in memory before
writing it into HBase. We have not officially announced the functionality
because:

1. We currently have no front-end UI for the configuration, including the
Kafka settings. This makes Streaming Cubing difficult to use for now. The
good news is that we are actively working on it
(https://issues.apache.org/jira/browse/KYLIN-1041).
2. Documentation is lacking.
3. We have not yet leveraged Spark Streaming (or an alternative) to process
the data batch. Our daemon thread is a simple Java thread, and it will
become a problem when the data batch grows too large. We intend to migrate
to a horizontally scalable solution such as Spark Streaming, but haven't had
enough bandwidth to start
(https://issues.apache.org/jira/browse/KYLIN-1042).

Anyway, users should be able to use Streaming Cubing once we officially
announce the 2.x versions.
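
The mini-batch flow described above (consume a batch from Kafka, build a
mini-cube in memory, write it to HBase) can be sketched roughly as follows.
This is only an illustration under stated assumptions, not Kylin's actual
code: the dimension names, the measure, and the helper functions are
hypothetical, a plain list stands in for the Kafka batch, and a dict stands
in for the HBase table.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("node", "region")   # hypothetical cube dimensions
MEASURE = "uplink"                # hypothetical measure, aggregated with SUM


def build_mini_cube(events):
    """Aggregate one batch of events into every cuboid (dimension subset)."""
    cube = defaultdict(int)
    for event in events:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                # Key = (which dimensions, their values in this event).
                key = (dims, tuple(event[d] for d in dims))
                cube[key] += event[MEASURE]
    return dict(cube)


def flush(mini_cube, store):
    """Merge the mini-cube into the store (a dict standing in for HBase)."""
    for key, value in mini_cube.items():
        store[key] = store.get(key, 0) + value


# One five-minute batch, as it might be read from a Kafka topic.
batch = [
    {"node": "n1", "region": "eu", "uplink": 10},
    {"node": "n2", "region": "eu", "uplink": 5},
    {"node": "n1", "region": "eu", "uplink": 7},
]
store = {}
flush(build_mini_cube(batch), store)
print(store[(("node",), ("n1",))])   # SUM(uplink) for node n1
print(store[((), ())])               # grand total over the batch
```

Because the whole batch is aggregated inside a single process here, the
sketch also shows why a large batch becomes a problem for one thread, which
is the motivation given in point 3 for moving to a horizontally scalable
engine.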





On Wed, Sep 23, 2015 at 6:00 AM, Gaspare Maria <
gaspare.maria@gfmintegration.it> wrote:

> Hi,
>
> one more question/feedback regarding Kylin Real time.
>
> There are many use cases (in particular in the TELCO environment) where
> streams of data arrive at regular intervals (usually every 5 or 15 minutes)
> and "real-time" aggregations can always be done per interval (for
> example, SUM(upLink) per node in the last interval). In such use cases the
> CUBE could perhaps be updated in near real time after
> pre-aggregation with Spark Streaming (of course without creating the HFiles,
> but using parallel PUTs on HBase with Spark). In our experience,
> for "simple" CUBEs this should be faster than Inverted Indexes.
>
> Of course there are use cases where this approach is not applicable; in
> those cases Inverted Indexes are still valid.
>
> It would be good if Kylin gave the "CUBE Administrator"
> the option to choose how to do the "Real-time CUBE Update", for example
> choosing either "Inverted Indexes" or "HBase".
>
> Do you think such an approach could be applicable to Kylin?
>
> Regards,
>
> -- gas
>
>
>


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Re: Kylin Real time

Posted by Gaspare Maria <ga...@gfmintegration.it>.
Hi,

one more question/feedback regarding Kylin Real time.

There are many use cases (in particular in the TELCO environment) where 
streams of data arrive at regular intervals (usually every 5 or 15 
minutes) and "real-time" aggregations can always be done per interval 
(for example, SUM(upLink) per node in the last interval). In such 
use cases the CUBE could perhaps be updated in near real time after 
pre-aggregation with Spark Streaming (of course without creating the 
HFiles, but using parallel PUTs on HBase with Spark). In our 
experience, for "simple" CUBEs this should be faster than Inverted Indexes.
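
A minimal sketch of the per-interval idea, under assumptions: records are
(timestamp, node, upLink) tuples, the interval is 5 minutes, and a dict
stands in for the HBase table that would receive the PUTs. The function
names are hypothetical and nothing here uses the real Spark or HBase APIs.

```python
from collections import defaultdict

INTERVAL_SECONDS = 300  # hypothetical 5-minute interval


def pre_aggregate(records):
    """SUM(upLink) per (interval start, node) for one micro-batch."""
    sums = defaultdict(int)
    for ts, node, uplink in records:
        interval_start = ts - ts % INTERVAL_SECONDS
        sums[(interval_start, node)] += uplink
    return sums


def put_increments(table, sums):
    """Apply the partial sums as increments; `table` stands in for HBase."""
    for row_key, delta in sums.items():
        table[row_key] = table.get(row_key, 0) + delta


table = {}
put_increments(table, pre_aggregate([(0, "n1", 4), (120, "n1", 6), (299, "n2", 1)]))
# The second micro-batch contains one late record for the first interval.
put_increments(table, pre_aggregate([(300, "n1", 3), (10, "n1", 2)]))
print(table[(0, "n1")])
```

This works because SUM is additive, so partial sums from different
micro-batches can simply be added to the stored value. A measure that is not
additive (a distinct count, for example) could not be updated this way, which
is one case where the approach would not apply.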

Of course there are use cases where this approach is not applicable; in 
those cases Inverted Indexes are still valid.

It would be good if Kylin gave the "CUBE Administrator" 
the option to choose how to do the "Real-time CUBE Update", for 
example choosing either "Inverted Indexes" or "HBase".

Do you think such an approach could be applicable to Kylin?

Regards,

-- gas


On 09/21/2015 11:36 AM, Li Yang wrote:
> Gas is mostly right, with one addition: a query can hit both the
> inverted index and the cube if it asks for both the latest and historic
> data. The results from the two sources are aggregated at query time.
>
> On Fri, Sep 18, 2015 at 11:26 PM, Gaspare Maria <
> gaspare.maria@gfmintegration.it> wrote:
>
>> Hi,
>>
>> so if I understood the idea behind Kylin Real Time is:
>>
>>   *   Inverted Indexes (maybe Lucene or inverted indexes on HBase) will
>>     be built according to CUBE Schema in near-realtime by using Spark
>>     (streaming) Kafka Consumers;
>>   * On query Time if the query impacts latest data it will be routed to
>>     Inverted Indexes otherwise on the CUBE on HBase.
>>   * Query that impacts latest data should be limited due to limitation
>>     of inverted indexes;
>>   * Query on long period of time back (e.g. from now back to 2 months
>>     ago) will be routed part on HBase and part on Inverted Indexes.
>>
>>
>> Am I right?
>>
>> Regards,
>>
>> -- gas
>>
>>
>>


Re: Re: Kylin Real time

Posted by Li Yang <li...@apache.org>.
Gas is mostly right, with one addition: a query can hit both the
inverted index and the cube if it asks for both the latest and historic
data. The results from the two sources are aggregated at query time.
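
As a rough illustration of that query-time merge (a sketch under
assumptions, not Kylin's implementation): rows are (timestamp, key, value)
tuples, `cube_built_until` is a hypothetical boundary up to which data has
already been built into the cube, and plain lists stand in for the cube on
HBase and for the inverted index.

```python
def query_sum(start, end, cube_rows, index_rows, cube_built_until):
    """SUM(value) per key over [start, end), merging both sources."""
    result = {}
    # Historic part: answered from the pre-aggregated cube rows.
    for ts, key, value in cube_rows:
        if start <= ts < min(end, cube_built_until):
            result[key] = result.get(key, 0) + value
    # Latest part: answered from rows not yet built into the cube.
    for ts, key, value in index_rows:
        if max(start, cube_built_until) <= ts < end:
            result[key] = result.get(key, 0) + value
    return result


cube_rows = [(0, "n1", 10), (300, "n1", 3), (300, "n2", 8)]
index_rows = [(600, "n1", 2), (610, "n2", 1), (620, "n1", 4)]

# Spans both sources: partial sums from each are combined per key.
print(query_sum(0, 900, cube_rows, index_rows, cube_built_until=600))
# Only the latest data: answered entirely from the inverted-index rows.
print(query_sum(600, 900, cube_rows, index_rows, cube_built_until=600))
```

Splitting the time range at a single boundary keeps each row counted exactly
once, so the two partial results can be added together without
double-counting.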

On Fri, Sep 18, 2015 at 11:26 PM, Gaspare Maria <
gaspare.maria@gfmintegration.it> wrote:

> Hi,
>
> so if I understood the idea behind Kylin Real Time is:
>
>  *   Inverted Indexes (maybe Lucene or inverted indexes on HBase) will
>    be built according to CUBE Schema in near-realtime by using Spark
>    (streaming) Kafka Consumers;
>  * On query Time if the query impacts latest data it will be routed to
>    Inverted Indexes otherwise on the CUBE on HBase.
>  * Query that impacts latest data should be limited due to limitation
>    of inverted indexes;
>  * Query on long period of time back (e.g. from now back to 2 months
>    ago) will be routed part on HBase and part on Inverted Indexes.
>
>
> Am I right?
>
> Regards,
>
> -- gas
>
>
>

Re: Reply: Kylin Real time

Posted by Gaspare Maria <ga...@gfmintegration.it>.
Hi,

So, if I understood correctly, the idea behind Kylin Real Time is:

  * Inverted indexes (maybe Lucene, or inverted indexes on HBase) will
    be built according to the cube schema in near real time by Spark
    (Streaming) Kafka consumers.
  * At query time, if the query touches the latest data it will be routed
    to the inverted indexes; otherwise it goes to the cube on HBase.
  * Queries that touch the latest data should be limited, because of the
    limitations of inverted indexes.
  * A query over a long period (e.g. from now back to 2 months ago) will
    be routed partly to HBase and partly to the inverted indexes.
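
To make the routing idea concrete, here is a minimal Python sketch of how I
picture it. It is only an illustration of the points above, not Kylin code:
the names TimeRange, route_query and cube_built_until are hypothetical, and
I am assuming a single watermark that separates data already in the cube
from data that is only in the inverted indexes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TimeRange:
    start: datetime
    end: datetime  # exclusive upper bound


def route_query(query: TimeRange, cube_built_until: datetime) -> dict:
    """Split a query's time range between the cube and the inverted index.

    cube_built_until is a hypothetical watermark: data before it is assumed
    to be in the cube on HBase, data at or after it only in the inverted index.
    """
    plan = {}
    if query.start < cube_built_until:
        # Historical part: answered from the pre-built cube.
        plan["cube"] = TimeRange(query.start, min(query.end, cube_built_until))
    if query.end > cube_built_until:
        # Latest part: answered from the near-real-time inverted index.
        plan["inverted_index"] = TimeRange(max(query.start, cube_built_until), query.end)
    return plan
```

With this, a query from two months ago up to now gets both a "cube" part
and an "inverted_index" part, while a query over the last minute only hits
the inverted index.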


Am I right?

Regards,

-- gas




Re: Reply: Kylin Real time

Posted by Henry Saputra <he...@gmail.com>.
Awesome, thanks Luke


Re: Reply: Kylin Real time

Posted by Luke Han <lu...@gmail.com>.
Here's the JIRA: https://issues.apache.org/jira/browse/KYLIN-599


Best Regards!
---------------------

Luke Han


Re: Reply: Kylin Real time

Posted by Henry Saputra <he...@gmail.com>.
That is good to know. Li Yang, Luke, could one of you share the design
document for this realtime OLAP query in the JIRA?

Thanks,

- Henry


Re: Reply: Kylin Real time

Posted by Luke Han <lu...@gmail.com>.
Inverted index development has been paused for a while. I agree with Xu:
it's time to resume it for extremely low-latency cases.


Best Regards!
---------------------

Luke Han


Re: Reply: Kylin Real time

Posted by Li Yang <li...@apache.org>.
> There will be incremental updates on the existing cubes, but during
> that updates I suppose no queries will be ran against them?

Yes, it's mini-batch, usually at intervals of a few minutes. And of course
the cube CAN serve queries while the mini incremental build is in progress.
We can't take the cube offline every few minutes; that would be impossible.  :-)
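
As a rough illustration of why an incremental build need not block queries,
here is a toy Python sketch. It is not Kylin's implementation, only one
common pattern under my own assumptions: the class Cube and its methods are
hypothetical, segments are treated as immutable once published, and a new
segment is built off to the side and then made visible in a single step.

```python
import threading
from collections import Counter


class Cube:
    """Toy cube: a tuple of immutable segments, each a Counter of pre-aggregated sums."""

    def __init__(self):
        self._segments = ()            # the tuple is never mutated in place
        self._lock = threading.Lock()  # serialises builders only, not readers

    def query(self, key):
        segments = self._segments      # readers take a snapshot; no lock needed
        return sum(seg[key] for seg in segments)

    def build_increment(self, rows):
        segment = Counter()            # built off to the side, invisible to queries
        for key, value in rows:
            segment[key] += value
        with self._lock:
            # Publish the finished segment by replacing the tuple reference.
            self._segments = self._segments + (segment,)
```

Queries that start during build_increment simply see the old set of
segments; once the new segment is published, later queries include it.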

On Tue, Sep 15, 2015 at 6:41 PM, Sarnath <st...@gmail.com> wrote:

> Inverted index? That sounds interesting. We use inverted index to serve the
> cubes in our internal implementation.
>
> I come from Big Data Center of excellence from an Indian IT major.
>
> We have been experimenting with the idea of serving cubes through
> ElasticSearch REST API. This is not related to Kylin. This is our own
> internal development.
>
> The motivation for this is --- Once the cube is built, it needs to be
> served.
>
> The query looks somewhat like this:
>
> "Given ProductID=*, Year=2015, Fetch All Quantities Sold"
>
> "Given ProductID=XX, Fetch how much it has sold every Month"
>
> Find all entries that match K1=V1, K2=V2
>
> This relieves us from lot of things - storage, REST API etc. and makes the
> cubes easily searchable.
>
> However, we don't do SQL/MDX on top of it.  Tableau 9.1Beta is
> experimenting with Web-Data-Connector which we believe can be used for
> Visualization... Apart from that, we experimented with a few
> auto-generated Kibana dashboards which were just okay. But Kibana was not
> designed for Cubes and so it has its own limitations.
>
> Appreciate any feedback!
>
> Thanks,
>
> Best,
>
> Sarnath
> I also think that it's a mini batch cubing.   It's time to bring back the
> inverted index into roadmap. The inverted index will be the true real-time
> solution and can provide the low-level query capability on the raw data.
>
>
> Thanks!
> JiangXu
>
>
> ------------------ Original Message ------------------
> From: "Henry Saputra";<he...@gmail.com>;
> Sent: Tuesday, September 15, 2015, 12:39 PM
> To: "dev@kylin.incubator.apache.org"<de...@kylin.incubator.apache.org>;
>
> Subject: Re: Kylin Real time
>
>
>
> Ok, but that still seems like mini batch to me.
>
> There will be incremental updates on the existing cubes, but during
> that updates I suppose no queries will be ran against them?
>
> - Henry
>
> On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org> wrote:
> > Streaming OLAP provides Near-Realtime analysis where data delay can be as
> > short as a few minutes.
> >
> > Traditional daily build allows user to analyze yesterday's data. If
> > increase the frequency to hourly, then user can analyze last hour's data.
> > Further down the line, how about incremental build every 5 minutes from a
> > streaming source? Then user can analyze data 5 minutes ago. That's
> > Streaming OLAP!
> >
> > On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <henry.saputra@gmail.com
> >
> > wrote:
> >
> >> Hi Luke,
> >>
> >> Could you clarify again what is the streaming OLAP means here?
> >>
> >> By definition OLAP work with historical data.
> >>
> >> Maybe I missed it but was there any discussions or proposed design for
> it?
> >>
> >> Thanks,
> >>
> >> - Henry
> >>
> >> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
> >>
> >> > Hi Siddharth,
> >> >     Kylin's next majority release (0.8.x) will support Streaming OLAP
> >> which
> >> > will coming in Q4 since it still under development now, as Hongbin
> >> > mentioned above.
> >> >     Could  you please drop me a mail about your case? I would like to
> >> > better understand your scenario to well manage coming features?
> >> >
> >> >     Thanks.
> >> >
> >> >
> >> >
> >> >
> >> > Best Regards!
> >> > ---------------------
> >> >
> >> > Luke Han
> >> >
> >> > On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <mahongbin@apache.org> wrote:
> >> >
> >> > > For current 0.7  releases, you cannot.
> >> > >
> >> > > Real time data processing and querying will be added in 0.8 release.
> It
> >> > is
> >> > > still under development and testing. We have achieved good progress
> on
> >> > it,
> >> > > please wait for announcements.
> >> > >
> >> > > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
> >> > > siddharth.ubale@syncoms.com> wrote:
> >> > >
> >> > > > Hi ,
> >> > > >
> >> > > > I would like to ask whether Kylin can be used as a real time
> querying
> >> > > > system?
> >> > > > The process of building a cube , makes it look like a batch
> process
> >> > after
> >> > > > which the queries are with low latency.. however can
> >> > > > We get a real time idea of what the OLAP system's state is at the
> >> query
> >> > > > instance?
> >> > > >
> >> > > > Thanks,
> >> > > > Siddharth
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Regards,
> >> > >
> >> > > *Bin Mahone | 马洪宾*
> >> > > Apache Kylin: http://kylin.io
> >> > > Github: https://github.com/binmahone
> >> > >
> >> >
> >>
>

Re: Re: Kylin Real time

Posted by Sarnath <st...@gmail.com>.
Inverted index? That sounds interesting. We use an inverted index to serve the
cubes in our internal implementation.

I come from the Big Data Center of Excellence at an Indian IT major.

We have been experimenting with the idea of serving cubes through the
ElasticSearch REST API. This is not related to Kylin. This is our own
internal development.

The motivation for this is: once the cube is built, it needs to be
served.

The queries look somewhat like this:

"Given ProductID=*, Year=2015, Fetch All Quantities Sold"

"Given ProductID=XX, Fetch how much it has sold every Month"

Find all entries that match K1=V1, K2=V2

This relieves us of a lot of things (storage, REST API, etc.) and makes the
cubes easily searchable.
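
As a rough illustration of the lookups described above, here is a toy
in-memory sketch. It is not the ElasticSearch setup mentioned here and not
anything from Kylin; the cell layout, the field names, and the find helper
are assumptions made up for this example. An inverted index maps each
dimension=value pair to the set of cube cells that contain it, and a query
intersects those sets:

```python
from collections import defaultdict

# Hypothetical pre-aggregated cube cells: dimension values plus one measure.
cells = [
    {"ProductID": "XX", "Year": 2015, "Month": 1, "Quantity": 10},
    {"ProductID": "XX", "Year": 2015, "Month": 2, "Quantity": 7},
    {"ProductID": "YY", "Year": 2015, "Month": 1, "Quantity": 4},
    {"ProductID": "XX", "Year": 2014, "Month": 12, "Quantity": 9},
]
DIMENSIONS = ("ProductID", "Year", "Month")

# Inverted index: (dimension, value) -> ids of the cells holding that value.
index = defaultdict(set)
for cell_id, cell in enumerate(cells):
    for dim in DIMENSIONS:
        index[(dim, cell[dim])].add(cell_id)


def find(**filters):
    """Return cells matching every K=V filter; omitted dimensions act as '*'."""
    ids = set(range(len(cells)))
    for dim, value in filters.items():
        ids &= index.get((dim, value), set())
    return [cells[i] for i in sorted(ids)]


# "Given ProductID=*, Year=2015, fetch all quantities sold"
sold_2015 = sum(c["Quantity"] for c in find(Year=2015))

# "Given ProductID=XX, fetch how much it has sold every month"
monthly_xx = {(c["Year"], c["Month"]): c["Quantity"] for c in find(ProductID="XX")}
```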

However, we don't do SQL/MDX on top of it.  Tableau 9.1 Beta is
experimenting with a Web Data Connector, which we believe can be used for
visualization. Apart from that, we experimented with a few
auto-generated Kibana dashboards, which were just okay. But Kibana was not
designed for cubes, so it has its own limitations.

Appreciate any feedback!

Thanks,

Best,

Sarnath
I also think that it's a mini batch cubing.   It's time to bring back the
inverted index into roadmap. The inverted index will be the true real-time
solution and can provide the low-level query capability on the raw data.


Thanks!
JiangXu


------------------ Original Message ------------------
From: "Henry Saputra";<he...@gmail.com>;
Sent: Tuesday, September 15, 2015, 12:39 PM
To: "dev@kylin.incubator.apache.org"<de...@kylin.incubator.apache.org>;

Subject: Re: Kylin Real time



Ok, but that still seems like mini batch to me.

There will be incremental updates on the existing cubes, but during
that updates I suppose no queries will be ran against them?

- Henry

On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org> wrote:
> Streaming OLAP provides Near-Realtime analysis where data delay can be as
> short as a few minutes.
>
> Traditional daily build allows user to analyze yesterday's data. If
> increase the frequency to hourly, then user can analyze last hour's data.
> Further down the line, how about incremental build every 5 minutes from a
> streaming source? Then user can analyze data 5 minutes ago. That's
> Streaming OLAP!
>
> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <he...@gmail.com>
> wrote:
>
>> Hi Luke,
>>
>> Could you clarify again what is the streaming OLAP means here?
>>
>> By definition OLAP work with historical data.
>>
>> Maybe I missed it but was there any discussions or proposed design for
it?
>>
>> Thanks,
>>
>> - Henry
>>
>> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
>>
>> > Hi Siddharth,
>> >     Kylin's next majority release (0.8.x) will support Streaming OLAP
>> which
>> > will coming in Q4 since it still under development now, as Hongbin
>> > mentioned above.
>> >     Could  you please drop me a mail about your case? I would like to
>> > better understand your scenario to well manage coming features?
>> >
>> >     Thanks.
>> >
>> >
>> >
>> >
>> > Best Regards!
>> > ---------------------
>> >
>> > Luke Han
>> >
>> > On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <mahongbin@apache.org> wrote:
>> >
>> > > For current 0.7  releases, you cannot.
>> > >
>> > > Real time data processing and querying will be added in 0.8 release.
It
>> > is
>> > > still under development and testing. We have achieved good progress
on
>> > it,
>> > > please wait for announcements.
>> > >
>> > > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
>> > > siddharth.ubale@syncoms.com> wrote:
>> > >
>> > > > Hi ,
>> > > >
>> > > > I would like to ask whether Kylin can be used as a real time
querying
>> > > > system?
>> > > > The process of building a cube , makes it look like a batch process
>> > after
>> > > > which the queries are with low latency.. however can
>> > > > We get a real time idea of what the OLAP system's state is at the
>> query
>> > > > instance?
>> > > >
>> > > > Thanks,
>> > > > Siddharth
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > >
>> > > *Bin Mahone | 马洪宾*
>> > > Apache Kylin: http://kylin.io
>> > > Github: https://github.com/binmahone
>> > >
>> >
>>

Re: Kylin Real time

Posted by 蒋旭 <ji...@qq.com>.
I also think that it's mini-batch cubing. It's time to bring the inverted index back onto the roadmap. The inverted index would be the true real-time solution and can provide low-level query capability on the raw data.


Thanks!
JiangXu


------------------ Original Message ------------------
From: "Henry Saputra";<he...@gmail.com>;
Sent: Tuesday, September 15, 2015, 12:39 PM
To: "dev@kylin.incubator.apache.org"<de...@kylin.incubator.apache.org>;

Subject: Re: Kylin Real time



Ok, but that still seems like mini batch to me.

There will be incremental updates on the existing cubes, but during
that updates I suppose no queries will be ran against them?

- Henry

On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org> wrote:
> Streaming OLAP provides Near-Realtime analysis where data delay can be as
> short as a few minutes.
>
> Traditional daily build allows user to analyze yesterday's data. If
> increase the frequency to hourly, then user can analyze last hour's data.
> Further down the line, how about incremental build every 5 minutes from a
> streaming source? Then user can analyze data 5 minutes ago. That's
> Streaming OLAP!
>
> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <he...@gmail.com>
> wrote:
>
>> Hi Luke,
>>
>> Could you clarify again what is the streaming OLAP means here?
>>
>> By definition OLAP work with historical data.
>>
>> Maybe I missed it but was there any discussions or proposed design for it?
>>
>> Thanks,
>>
>> - Henry
>>
>> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
>>
>> > Hi Siddharth,
>> >     Kylin's next majority release (0.8.x) will support Streaming OLAP
>> which
>> > will coming in Q4 since it still under development now, as Hongbin
>> > mentioned above.
>> >     Could  you please drop me a mail about your case? I would like to
>> > better understand your scenario to well manage coming features?
>> >
>> >     Thanks.
>> >
>> >
>> >
>> >
>> > Best Regards!
>> > ---------------------
>> >
>> > Luke Han
>> >
>> > On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <mahongbin@apache.org> wrote:
>> >
>> > > For current 0.7  releases, you cannot.
>> > >
>> > > Real time data processing and querying will be added in 0.8 release. It
>> > is
>> > > still under development and testing. We have achieved good progress on
>> > it,
>> > > please wait for announcements.
>> > >
>> > > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
>> > > siddharth.ubale@syncoms.com> wrote:
>> > >
>> > > > Hi ,
>> > > >
>> > > > I would like to ask whether Kylin can be used as a real time querying
>> > > > system?
>> > > > The process of building a cube , makes it look like a batch process
>> > after
>> > > > which the queries are with low latency.. however can
>> > > > We get a real time idea of what the OLAP system's state is at the
>> query
>> > > > instance?
>> > > >
>> > > > Thanks,
>> > > > Siddharth
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > >
>> > > *Bin Mahone | 马洪宾*
>> > > Apache Kylin: http://kylin.io
>> > > Github: https://github.com/binmahone
>> > >
>> >
>>

Re: Kylin Real time

Posted by Henry Saputra <he...@gmail.com>.
Ok, but that still seems like mini batch to me.

There will be incremental updates on the existing cubes, but during
those updates I suppose no queries will be run against them?

- Henry

On Mon, Sep 14, 2015 at 12:33 AM, Li Yang <li...@apache.org> wrote:
> Streaming OLAP provides Near-Realtime analysis where data delay can be as
> short as a few minutes.
>
> Traditional daily build allows user to analyze yesterday's data. If
> increase the frequency to hourly, then user can analyze last hour's data.
> Further down the line, how about incremental build every 5 minutes from a
> streaming source? Then user can analyze data 5 minutes ago. That's
> Streaming OLAP!
>
> On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <he...@gmail.com>
> wrote:
>
>> Hi Luke,
>>
>> Could you clarify again what is the streaming OLAP means here?
>>
>> By definition OLAP work with historical data.
>>
>> Maybe I missed it but was there any discussions or proposed design for it?
>>
>> Thanks,
>>
>> - Henry
>>
>> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
>>
>> > Hi Siddharth,
>> >     Kylin's next majority release (0.8.x) will support Streaming OLAP
>> which
>> > will coming in Q4 since it still under development now, as Hongbin
>> > mentioned above.
>> >     Could  you please drop me a mail about your case? I would like to
>> > better understand your scenario to well manage coming features?
>> >
>> >     Thanks.
>> >
>> >
>> >
>> >
>> > Best Regards!
>> > ---------------------
>> >
>> > Luke Han
>> >
>> > On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <mahongbin@apache.org> wrote:
>> >
>> > > For current 0.7  releases, you cannot.
>> > >
>> > > Real time data processing and querying will be added in 0.8 release. It
>> > is
>> > > still under development and testing. We have achieved good progress on
>> > it,
>> > > please wait for announcements.
>> > >
>> > > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
>> > > siddharth.ubale@syncoms.com> wrote:
>> > >
>> > > > Hi ,
>> > > >
>> > > > I would like to ask whether Kylin can be used as a real time querying
>> > > > system?
>> > > > The process of building a cube , makes it look like a batch process
>> > after
>> > > > which the queries are with low latency.. however can
>> > > > We get a real time idea of what the OLAP system's state is at the
>> query
>> > > > instance?
>> > > >
>> > > > Thanks,
>> > > > Siddharth
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > >
>> > > *Bin Mahone | 马洪宾*
>> > > Apache Kylin: http://kylin.io
>> > > Github: https://github.com/binmahone
>> > >
>> >
>>

Re: Kylin Real time

Posted by Li Yang <li...@apache.org>.
Streaming OLAP provides near-real-time analysis, where the data delay can be
as short as a few minutes.

A traditional daily build allows users to analyze yesterday's data. If we
increase the frequency to hourly, then users can analyze the last hour's data.
Further down the line, how about an incremental build every 5 minutes from a
streaming source? Then users can analyze data from 5 minutes ago. That's
Streaming OLAP!
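
The arithmetic behind this can be sketched as follows. This is a
back-of-the-envelope model, not a description of how Kylin schedules builds:
the function name and the build durations are assumptions chosen for
illustration, and only the daily, hourly, and 5-minute intervals come from
the text above.

```python
from datetime import timedelta


def worst_case_data_delay(build_interval, build_duration):
    """Longest wait before a new event becomes queryable.

    Assumes builds start every `build_interval` and each takes `build_duration`.
    An event that arrives just after a build starts must wait for the next build
    to start (one interval) and then to finish (one duration).
    """
    return build_interval + build_duration


# Build durations below are made-up numbers, not measurements.
daily = worst_case_data_delay(timedelta(days=1), timedelta(hours=2))
hourly = worst_case_data_delay(timedelta(hours=1), timedelta(minutes=10))
streaming = worst_case_data_delay(timedelta(minutes=5), timedelta(minutes=1))
```

Under these assumptions, shrinking the build interval is what shrinks the
data delay, provided each build still finishes within its interval.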

On Mon, Sep 14, 2015 at 12:43 AM, Henry Saputra <he...@gmail.com>
wrote:

> Hi Luke,
>
> Could you clarify again what is the streaming OLAP means here?
>
> By definition OLAP work with historical data.
>
> Maybe I missed it but was there any discussions or proposed design for it?
>
> Thanks,
>
> - Henry
>
> On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:
>
> > Hi Siddharth,
> >     Kylin's next majority release (0.8.x) will support Streaming OLAP
> which
> > will coming in Q4 since it still under development now, as Hongbin
> > mentioned above.
> >     Could  you please drop me a mail about your case? I would like to
> > better understand your scenario to well manage coming features?
> >
> >     Thanks.
> >
> >
> >
> >
> > Best Regards!
> > ---------------------
> >
> > Luke Han
> >
> > On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <mahongbin@apache.org> wrote:
> >
> > > For current 0.7  releases, you cannot.
> > >
> > > Real time data processing and querying will be added in 0.8 release. It
> > is
> > > still under development and testing. We have achieved good progress on
> > it,
> > > please wait for announcements.
> > >
> > > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
> > > siddharth.ubale@syncoms.com> wrote:
> > >
> > > > Hi ,
> > > >
> > > > I would like to ask whether Kylin can be used as a real time querying
> > > > system?
> > > > The process of building a cube , makes it look like a batch process
> > after
> > > > which the queries are with low latency.. however can
> > > > We get a real time idea of what the OLAP system's state is at the
> query
> > > > instance?
> > > >
> > > > Thanks,
> > > > Siddharth
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > *Bin Mahone | 马洪宾*
> > > Apache Kylin: http://kylin.io
> > > Github: https://github.com/binmahone
> > >
> >
>

Re: Kylin Real time

Posted by Henry Saputra <he...@gmail.com>.
Hi Luke,

Could you clarify again what streaming OLAP means here?

By definition, OLAP works with historical data.

Maybe I missed it, but was there any discussion or proposed design for it?

Thanks,

- Henry

On Monday, August 3, 2015, Luke Han <lu...@gmail.com> wrote:

> Hi Siddharth,
>     Kylin's next majority release (0.8.x) will support Streaming OLAP which
> will coming in Q4 since it still under development now, as Hongbin
> mentioned above.
>     Could  you please drop me a mail about your case? I would like to
> better understand your scenario to well manage coming features?
>
>     Thanks.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <mahongbin@apache.org> wrote:
>
> > For current 0.7  releases, you cannot.
> >
> > Real time data processing and querying will be added in 0.8 release. It
> is
> > still under development and testing. We have achieved good progress on
> it,
> > please wait for announcements.
> >
> > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
> > siddharth.ubale@syncoms.com> wrote:
> >
> > > Hi ,
> > >
> > > I would like to ask whether Kylin can be used as a real time querying
> > > system?
> > > The process of building a cube , makes it look like a batch process
> after
> > > which the queries are with low latency.. however can
> > > We get a real time idea of what the OLAP system's state is at the query
> > > instance?
> > >
> > > Thanks,
> > > Siddharth
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>

Re: Kylin Real time

Posted by Li Yang <li...@apache.org>.
Exactly. Second-level data latency is on the roadmap, but we are giving higher
priority to faster cubing, TopN, and more flexible aggregation groups.

On Wed, Aug 5, 2015 at 11:42 PM, 蒋旭 <ji...@qq.com> wrote:

> I think that Kylin can support second-level data latency if the inverted
> index is ready.
>
>
> The biggest problem is that inverted index is not in the high priority
> tasks. :)
>
>
> Thanks
> JiangXu
>
>
> ------------------ Original Message ------------------
> From: Luke Han <lu...@gmail.com>
> Sent: August 5, 2015, 23:02
> To: dev@kylin.incubator.apache.org <de...@kylin.incubator.apache.org>
> Subject: Re: Kylin Real time
>
>
>
> Kylin is OLAP not OLTP, even with streaming feature, it still support Near
> real-time, do not expected sub-second data latency (query latency still
> will be sub-second latency).
> For example, if your customer want to see user's action information on your
> website "right now" (like you said milliseconds), I do not recommend to
> leverage Kylin this monument, but if your user are ok to see the data
> several minutes even seconds before, that could be benefited by Kylin
> streaming.
>
> It's really depends on how fast your user want to see the result from the
> source data.
>
> Realtime monitoring and alert is not Kylin's target to serve:)
>
> Thanks.
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Wed, Aug 5, 2015 at 2:43 PM, Siddharth Ubale <
> siddharth.ubale@syncoms.com
> > wrote:
>
> > Hi Luke,
> >
> > Thanks for the update!
> >
> > Our case is a real time streaming OLAP. We want to query the the data in
> > OLAP system and it should respond with low latency(probably 1-2 seconds)
> > and the state of the OLAP table is real time and not pre-calculated. In
> > short, OLAP will function like OLTP..
> > It is probably wishful to think that the system should provide data in
> > milliseconds , however, if that can be provided it will suffice our
> needs.
> >
> >
> > Thanks,
> > Siddharth Ubale
> >
> > -----Original Message-----
> > From: Luke Han [mailto:luke.hq@gmail.com]
> > Sent: Monday, August 03, 2015 7:22 PM
> > To: dev@kylin.incubator.apache.org
> > Subject: Re: Kylin Real time
> >
> > Hi Siddharth,
> >     Kylin's next majority release (0.8.x) will support Streaming OLAP
> > which will coming in Q4 since it still under development now, as Hongbin
> > mentioned above.
> >     Could  you please drop me a mail about your case? I would like to
> > better understand your scenario to well manage coming features?
> >
> >     Thanks.
> >
> >
> >
> >
> > Best Regards!
> > ---------------------
> >
> > Luke Han
> >
> > On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <ma...@apache.org>
> wrote:
> >
> > > For current 0.7  releases, you cannot.
> > >
> > > Real time data processing and querying will be added in 0.8 release.
> > > It is still under development and testing. We have achieved good
> > > progress on it, please wait for announcements.
> > >
> > > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
> > > siddharth.ubale@syncoms.com> wrote:
> > >
> > > > Hi ,
> > > >
> > > > I would like to ask whether Kylin can be used as a real time
> > > > querying system?
> > > > The process of building a cube , makes it look like a batch process
> > > > after which the queries are with low latency.. however can We get a
> > > > real time idea of what the OLAP system's state is at the query
> > > > instance?
> > > >
> > > > Thanks,
> > > > Siddharth
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > *Bin Mahone | 马洪宾*
> > > Apache Kylin: http://kylin.io
> > > Github: https://github.com/binmahone
> > >
> >
>

Re: Kylin Real time

Posted by 蒋旭 <ji...@qq.com>.
I think that Kylin can support second-level data latency once the inverted index is ready.


The biggest problem is that the inverted index is not among the high-priority tasks. :)


Thanks
JiangXu


------------------ Original Message ------------------
From: Luke Han <lu...@gmail.com>
Sent: August 5, 2015, 23:02
To: dev@kylin.incubator.apache.org <de...@kylin.incubator.apache.org>
Subject: Re: Kylin Real time



Kylin is OLAP not OLTP, even with streaming feature, it still support Near
real-time, do not expected sub-second data latency (query latency still
will be sub-second latency).
For example, if your customer want to see user's action information on your
website "right now" (like you said milliseconds), I do not recommend to
leverage Kylin this monument, but if your user are ok to see the data
several minutes even seconds before, that could be benefited by Kylin
streaming.

It's really depends on how fast your user want to see the result from the
source data.

Realtime monitoring and alert is not Kylin's target to serve:)

Thanks.


Best Regards!
---------------------

Luke Han

On Wed, Aug 5, 2015 at 2:43 PM, Siddharth Ubale <siddharth.ubale@syncoms.com> wrote:

> Hi Luke,
>
> Thanks for the update!
>
> Our case is a real time streaming OLAP. We want to query the the data in
> OLAP system and it should respond with low latency(probably 1-2 seconds)
> and the state of the OLAP table is real time and not pre-calculated. In
> short, OLAP will function like OLTP..
> It is probably wishful to think that the system should provide data in
> milliseconds , however, if that can be provided it will suffice our needs.
>
>
> Thanks,
> Siddharth Ubale
>
> -----Original Message-----
> From: Luke Han [mailto:luke.hq@gmail.com]
> Sent: Monday, August 03, 2015 7:22 PM
> To: dev@kylin.incubator.apache.org
> Subject: Re: Kylin Real time
>
> Hi Siddharth,
>     Kylin's next majority release (0.8.x) will support Streaming OLAP
> which will coming in Q4 since it still under development now, as Hongbin
> mentioned above.
>     Could  you please drop me a mail about your case? I would like to
> better understand your scenario to well manage coming features?
>
>     Thanks.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <ma...@apache.org> wrote:
>
> > For current 0.7  releases, you cannot.
> >
> > Real time data processing and querying will be added in 0.8 release.
> > It is still under development and testing. We have achieved good
> > progress on it, please wait for announcements.
> >
> > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
> > siddharth.ubale@syncoms.com> wrote:
> >
> > > Hi ,
> > >
> > > I would like to ask whether Kylin can be used as a real time
> > > querying system?
> > > The process of building a cube , makes it look like a batch process
> > > after which the queries are with low latency.. however can We get a
> > > real time idea of what the OLAP system's state is at the query
> > > instance?
> > >
> > > Thanks,
> > > Siddharth
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>

Re: Kylin Real time

Posted by Luke Han <lu...@gmail.com>.
Kylin is OLAP, not OLTP. Even with the streaming feature, it still supports
only near real-time; do not expect sub-second data latency (query latency
will still be sub-second).
For example, if your customer wants to see users' action information on your
website "right now" (milliseconds, as you said), I do not recommend
using Kylin at this moment. But if your users are OK with seeing data from
several minutes or even seconds ago, they could benefit from Kylin
streaming.

It really depends on how fast your users want to see results from the
source data.

Real-time monitoring and alerting are not what Kylin aims to serve :)

Thanks.


Best Regards!
---------------------

Luke Han

On Wed, Aug 5, 2015 at 2:43 PM, Siddharth Ubale <siddharth.ubale@syncoms.com> wrote:

> Hi Luke,
>
> Thanks for the update!
>
> Our case is a real time streaming OLAP. We want to query the the data in
> OLAP system and it should respond with low latency(probably 1-2 seconds)
> and the state of the OLAP table is real time and not pre-calculated. In
> short, OLAP will function like OLTP..
> It is probably wishful to think that the system should provide data in
> milliseconds , however, if that can be provided it will suffice our needs.
>
>
> Thanks,
> Siddharth Ubale
>
> -----Original Message-----
> From: Luke Han [mailto:luke.hq@gmail.com]
> Sent: Monday, August 03, 2015 7:22 PM
> To: dev@kylin.incubator.apache.org
> Subject: Re: Kylin Real time
>
> Hi Siddharth,
>     Kylin's next majority release (0.8.x) will support Streaming OLAP
> which will coming in Q4 since it still under development now, as Hongbin
> mentioned above.
>     Could  you please drop me a mail about your case? I would like to
> better understand your scenario to well manage coming features?
>
>     Thanks.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <ma...@apache.org> wrote:
>
> > For current 0.7  releases, you cannot.
> >
> > Real time data processing and querying will be added in 0.8 release.
> > It is still under development and testing. We have achieved good
> > progress on it, please wait for announcements.
> >
> > On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
> > siddharth.ubale@syncoms.com> wrote:
> >
> > > Hi ,
> > >
> > > I would like to ask whether Kylin can be used as a real time
> > > querying system?
> > > The process of building a cube , makes it look like a batch process
> > > after which the queries are with low latency.. however can We get a
> > > real time idea of what the OLAP system's state is at the query
> > > instance?
> > >
> > > Thanks,
> > > Siddharth
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>

RE: Kylin Real time

Posted by Siddharth Ubale <si...@syncoms.com>.
Hi Luke,

Thanks for the update!

Our case is real-time streaming OLAP. We want to query the data in the OLAP system, and it should respond with low latency (probably 1-2 seconds), with the state of the OLAP table being real time and not pre-calculated. In short, OLAP would function like OLTP.
It is probably wishful thinking to expect the system to provide data in milliseconds; however, if that can be provided, it will suffice for our needs.


Thanks,
Siddharth Ubale

-----Original Message-----
From: Luke Han [mailto:luke.hq@gmail.com] 
Sent: Monday, August 03, 2015 7:22 PM
To: dev@kylin.incubator.apache.org
Subject: Re: Kylin Real time

Hi Siddharth,
    Kylin's next majority release (0.8.x) will support Streaming OLAP which will coming in Q4 since it still under development now, as Hongbin mentioned above.
    Could  you please drop me a mail about your case? I would like to better understand your scenario to well manage coming features?

    Thanks.




Best Regards!
---------------------

Luke Han

On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <ma...@apache.org> wrote:

> For current 0.7  releases, you cannot.
>
> Real time data processing and querying will be added in 0.8 release. 
> It is still under development and testing. We have achieved good 
> progress on it, please wait for announcements.
>
> On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale < 
> siddharth.ubale@syncoms.com> wrote:
>
> > Hi ,
> >
> > I would like to ask whether Kylin can be used as a real time 
> > querying system?
> > The process of building a cube , makes it look like a batch process 
> > after which the queries are with low latency.. however can We get a 
> > real time idea of what the OLAP system's state is at the query 
> > instance?
> >
> > Thanks,
> > Siddharth
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>

Re: Kylin Real time

Posted by Luke Han <lu...@gmail.com>.
Hi Siddharth,
    Kylin's next major release (0.8.x) will support Streaming OLAP, which
is expected in Q4 since it is still under development, as Hongbin
mentioned above.
    Could you please drop me a mail about your case? I would like to
better understand your scenario so that we can plan the coming features well.

    Thanks.




Best Regards!
---------------------

Luke Han

On Wed, Jul 29, 2015 at 2:08 PM, hongbin ma <ma...@apache.org> wrote:

> For current 0.7  releases, you cannot.
>
> Real time data processing and querying will be added in 0.8 release. It is
> still under development and testing. We have achieved good progress on it,
> please wait for announcements.
>
> On Wed, Jul 29, 2015 at 2:02 PM, Siddharth Ubale <
> siddharth.ubale@syncoms.com> wrote:
>
> > Hi ,
> >
> > I would like to ask whether Kylin can be used as a real time querying
> > system?
> > The process of building a cube , makes it look like a batch process after
> > which the queries are with low latency.. however can
> > We get a real time idea of what the OLAP system's state is at the query
> > instance?
> >
> > Thanks,
> > Siddharth
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>