Posted to dev@kylin.apache.org by 蒋旭 <ji...@qq.com> on 2015/02/09 16:25:21 UTC

proposal for real time support in kylin

Hi Guys,

I have written a simple proposal for real-time support in Kylin, below. Please help review it!


1. Kafka + Storm will build an inverted index in memory. The index will be inserted into HBase in batches (e.g. every 5 minutes).


2. The inverted index in HBase will keep the short-term data (e.g. 7 days). This index will be converted into a data cube in batches (e.g. every 7 days).


3. The data cube in HBase will keep the long-term data.


4. The query engine will decide whether to use the inverted index or the data cube in HBase based on the query's time range. In the future, the query engine could also use the in-memory inverted index in Storm, which would reduce data latency from minutes to seconds.
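
To make point 4 concrete, here is a minimal sketch of routing a query by time range. It is not Kylin code. It assumes the system tracks a timestamp up to which data has already been converted into the cube (called `cube_built_until` here, a hypothetical name); everything after that timestamp is assumed to live only in the inverted index. The `Scan` type and the store labels are also made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Scan:
    store: str        # "cube" or "inverted_index" (hypothetical labels)
    start: datetime   # inclusive
    end: datetime     # exclusive


def plan_query(start, end, cube_built_until):
    """Split the half-open range [start, end) into per-store scans.

    Assumption: data before cube_built_until is served by the data cube,
    data at or after it is served by the inverted index.
    """
    if start >= end:
        return []
    scans = []
    if start < cube_built_until:
        # Long-term part of the range: read from the data cube.
        scans.append(Scan("cube", start, min(end, cube_built_until)))
    if end > cube_built_until:
        # Short-term part of the range: read from the inverted index.
        scans.append(Scan("inverted_index", max(start, cube_built_until), end))
    return scans


if __name__ == "__main__":
    boundary = datetime(2015, 2, 2)
    for scan in plan_query(datetime(2015, 1, 30), datetime(2015, 2, 5), boundary):
        print(scan.store, scan.start.date(), scan.end.date())
```

A query that straddles the boundary yields two scans whose results the query engine would have to merge. How that merge is done is outside this sketch.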


Thanks
Jiang Xu

Re: proposal for real time support in kylin

Posted by Li Yang <li...@apache.org>.

I created a JIRA for this: https://issues.apache.org/jira/browse/KYLIN-599

Further discussion goes there. :-)

On Tue, Feb 10, 2015 at 10:15 AM, vipul jhawar <vi...@gmail.com>
wrote:

> Hi Xu
>
> Could we just use Kafka and its consumers alone, without Storm, in this
> setup? Could you provide some more details on how Kafka + Storm would be a
> better fit? It introduces more complexity into the system compared with the
> simple log queue we get from Kafka alone.
>
> Thanks
>

Re: proposal for real time support in kylin

Posted by vipul jhawar <vi...@gmail.com>.

Hi Xu

Could we just use Kafka and its consumers alone, without Storm, in this
setup? Could you provide some more details on how Kafka + Storm would be a
better fit? It introduces more complexity into the system compared with the
simple log queue we get from Kafka alone.
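
As a rough illustration of the consumer-only option, the sketch below builds an inverted index in memory from a stream of messages and hands it off in batches. It makes several assumptions: `messages` is any Python iterable of `(row_id, record)` pairs standing in for a Kafka consumer, `flush` is a callback standing in for a batched HBase write, and batches are cut by row count rather than by the 5-minute interval in the proposal so that the example is deterministic. No real Kafka or HBase API is used.

```python
from collections import defaultdict


class InvertedIndexBuffer:
    """In-memory inverted index: (column, value) -> list of row ids."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.rows = 0

    def add(self, row_id, record):
        # Record the row id under every (column, value) pair it contains.
        for column, value in record.items():
            self.postings[(column, value)].append(row_id)
        self.rows += 1


def consume(messages, flush, batch_size=3):
    """Index incoming rows and pass each full batch to flush()."""
    buf = InvertedIndexBuffer()
    for row_id, record in messages:
        buf.add(row_id, record)
        if buf.rows >= batch_size:
            flush(dict(buf.postings))
            buf = InvertedIndexBuffer()  # start a fresh batch
    if buf.rows:
        flush(dict(buf.postings))        # flush the final partial batch


if __name__ == "__main__":
    stream = [
        (0, {"country": "CN"}),
        (1, {"country": "US"}),
        (2, {"country": "CN"}),
        (3, {"country": "US"}),
    ]
    consume(stream, print, batch_size=3)
```

This single-process loop leaves out what a stream processor would otherwise help with, such as parallelism across partitions and recovery after a failure, which is part of the trade-off being asked about.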

Thanks


Re: proposal for real time support in kylin

Posted by Luke Han <lu...@gmail.com>.

Thanks Xu, you are back again :-)


Best Regards!
---------------------

Luke Han
