Posted to dev@kylin.apache.org by zhong zhang <zz...@gmail.com> on 2016/01/18 04:04:40 UTC

internal hive table and build the cube backward

Hi All,

I'm wondering whether I can build a Kylin cube backward in time. More
specifically, can I build the cube from the current time back to six months
ago, then from six months ago back to 12 months ago, and so on? That way, I
would get the cube results for the latest six months first.
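
To make the requested order concrete, here is a small Python sketch. It is not Kylin code and only computes the six-month date ranges, newest first. The helper names `add_months` and `backward_segments` are hypothetical. The half-open ranges aligned to month starts are an assumption for illustration. Whether Kylin accepts segments built in this order is exactly what is being asked.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Move a first-of-month date by a whole number of months (day stays 1)."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, 1)


def backward_segments(end: date, months_per_segment: int, count: int):
    """Return half-open [start, end) ranges, newest first, each months_per_segment long."""
    segments = []
    seg_end = date(end.year, end.month, 1)  # assumption: align to the start of the month
    for _ in range(count):
        seg_start = add_months(seg_end, -months_per_segment)
        segments.append((seg_start, seg_end))
        seg_end = seg_start  # the next (older) segment ends where this one starts
    return segments


if __name__ == "__main__":
    for start, stop in backward_segments(date(2016, 1, 18), 6, 3):
        print(start.isoformat(), "->", stop.isoformat())
```

Aligning to month starts leaves the current partial month out of the first range. That keeps the arithmetic simple, but it is a design choice, not a requirement.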

The input of a Kylin cube is a Hive table. Does it make any difference
whether an internal (managed) or an external Hive table is used when
building the cube?

Best regards,
Zhong

Re: internal hive table and build the cube backward

Posted by yu feng <ol...@gmail.com>.
ShaoFeng Shi is right. Kylin uses a Hive command to generate the intermediate
table (treat it as the source data), and in step 2 it uses HCatalog to read
the data from Hive. So Hive's performance does affect Kylin's performance,
and a newer Hive version is recommended.

2016-01-20 8:05 GMT+08:00 ShaoFeng Shi <sh...@apache.org>:

> Only the first step, actually. Kylin runs a "hive -e" command to create an
> intermediate table; the subsequent steps run MapReduce jobs over the files
> under that table.

Re: internal hive table and build the cube backward

Posted by ShaoFeng Shi <sh...@apache.org>.
Only the first step, actually. Kylin runs a "hive -e" command to create an
intermediate table; the subsequent steps run MapReduce jobs over the files
under that table.
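
For readers less familiar with this first step, here is a minimal Python sketch of driving Hive through `hive -e`. The table names, column names, the date filter, and the simplified CREATE TABLE ... AS SELECT form are all hypothetical. The statement Kylin actually generates, including its lookup-table joins and storage settings, is not reproduced here.

```python
import shlex
from typing import List, Sequence


def flat_table_sql(flat_table: str, fact_table: str, columns: Sequence[str],
                   date_column: str, start: str, end: str) -> str:
    """Build a simplified statement that materializes one date range of the
    source table into a flat intermediate table (hypothetical; no lookup joins)."""
    column_list = ", ".join(columns)
    return (
        f"DROP TABLE IF EXISTS {flat_table}; "
        f"CREATE TABLE {flat_table} AS "
        f"SELECT {column_list} FROM {fact_table} "
        f"WHERE {date_column} >= '{start}' AND {date_column} < '{end}';"
    )


def hive_e_command(sql: str) -> List[str]:
    """Argument vector for running the statement through the Hive CLI's -e option."""
    return ["hive", "-e", sql]


if __name__ == "__main__":
    sql = flat_table_sql("demo_intermediate_flat", "demo_sales_fact",
                         ["part_dt", "seller_id", "price"],
                         "part_dt", "2015-07-01", "2016-01-01")
    cmd = hive_e_command(sql)
    # Print a shell-safe version; subprocess.run(cmd, check=True) would execute it
    # on a machine with the Hive CLI on PATH.
    print(shlex.join(cmd))
```

The values are interpolated without escaping. That is acceptable only for trusted, internally generated names.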

2016-01-20 4:18 GMT+08:00 zhong zhang <zz...@gmail.com>:

> Hi Yu and everyone,
>
> Just a small addition: Hive is definitely involved in the "Create
> Intermediate Flat Hive Table" and "Build Dimension Dictionary" steps. The
> question is whether Hive is involved in the subsequent cuboid-building
> steps.
>
> Best regards,
> Zhong

-- 
Best regards,

Shaofeng Shi

Re: internal hive table and build the cube backward

Posted by zhong zhang <zz...@gmail.com>.
Hi Yu and everyone,

Just a small addition: Hive is definitely involved in the "Create
Intermediate Flat Hive Table" and "Build Dimension Dictionary" steps. The
question is whether Hive is involved in the subsequent cuboid-building steps.

Best regards,
Zhong

On Sun, Jan 17, 2016 at 10:35 PM, yu feng <ol...@gmail.com> wrote:

> Firstly, Kylin does not distinguish between kinds of Hive tables: as long as
> you can query it in Hive, the table can be a normal (managed) table, an
> external table, a view, or a table with a custom SerDe.
> As for building backward, I think it is hard to build a cube backward in
> time in Kylin; maybe someone has good ideas on this point.

Re: internal hive table and build the cube backward

Posted by zhong zhang <zz...@gmail.com>.
Hi Yu,

How does Kylin retrieve the data? Does it use Hive only for the metadata,
or does it also use Hive to retrieve the data?
If Kylin uses Hive to retrieve the data for the build, won't Hive's
performance affect Kylin's performance as well?

I've also done some research on the above questions.

Based on reference [1] (slides 28 and 29), the cube build process looks
like this:

Cube build - Steps

1. Build dictionaries from the dimension tables (Hive tables) on local disk,
and copy the dictionaries to HDFS.

2. Run a Hive query to build a joined, flattened table, also called the
intermediate Hive table.

3. Run MapReduce jobs to build cuboids in HDFS sequence files, from tier 1
to tier N.

4. Calculate the key distribution of the HDFS sequence files, and evenly
split the key space into K regions.

5. Convert the HDFS sequence files into HBase HFiles.

6. Bulk load the HFiles into HBase.
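
As a rough, in-memory illustration of steps 1, 3 and 4 above, here is a Python sketch. It is not Kylin's implementation, which runs these steps on local disk, HDFS and MapReduce. The helper names are hypothetical. The following details are assumptions made for illustration:
- the order-preserving dictionary;
- the numbering in which tier 1 is the base cuboid;
- choosing split keys by equal row counts over a sample of row keys.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def build_dictionary(values: Sequence[str]) -> Dict[str, int]:
    """Step 1 (toy): give each distinct dimension value an integer ID.
    IDs follow the sorted order of the values, so the encoding preserves order."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


def cuboids_by_tier(dimensions: Sequence[str]) -> List[List[Tuple[str, ...]]]:
    """Step 3 (toy): list index 0 is the base cuboid with all N dimensions;
    each following tier has one dimension fewer, ending with the empty cuboid."""
    n = len(dimensions)
    return [list(combinations(dimensions, n - tier)) for tier in range(n + 1)]


def split_points(row_keys: Sequence[bytes], regions: int) -> List[bytes]:
    """Step 4 (toy): pick up to regions-1 boundary keys from the sorted keys so
    that each region receives roughly the same number of rows."""
    if regions < 1:
        raise ValueError("regions must be at least 1")
    ordered = sorted(row_keys)
    if not ordered:
        return []
    step = len(ordered) / regions
    # Deduplicate: heavily repeated keys could otherwise yield identical boundaries.
    return sorted({ordered[int(step * i)] for i in range(1, regions)})


if __name__ == "__main__":
    print(build_dictionary(["US", "CN", "US", "DE"]))  # {'CN': 0, 'DE': 1, 'US': 2}
    tiers = cuboids_by_tier(["time", "region", "product"])
    print([len(t) for t in tiers])  # [1, 3, 3, 1]
    print(split_points([bytes([i]) for i in range(100)], 4))
```

The toy enumerates every subset of dimensions, 2**N cuboids including the empty "grand total" one. A real cube may not materialize every combination.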



Question 1:

In step 2, Kylin runs a Hive query to generate the intermediate Hive table,
so Kylin does use Hive to retrieve the data for the cube build. Am I right?


Question 2:

Based on my understanding, Kylin only needs to work with Hive in steps 1
and 2. After that, does Kylin no longer need to retrieve data from Hive
tables for the MapReduce jobs?


[1]
http://www.slideshare.net/XuJiang2/kylin-hadoop-olap-engine/28?utm_source=slideview&utm_medium=ssemail&utm_campaign=share_clip


Best regards,

Zhong


On Sun, Jan 17, 2016 at 10:35 PM, yu feng <ol...@gmail.com> wrote:

> Firstly, Kylin does not distinguish between kinds of Hive tables: as long as
> you can query it in Hive, the table can be a normal (managed) table, an
> external table, a view, or a table with a custom SerDe.
> As for building backward, I think it is hard to build a cube backward in
> time in Kylin; maybe someone has good ideas on this point.

Re: internal hive table and build the cube backward

Posted by yu feng <ol...@gmail.com>.
Firstly, Kylin does not distinguish between kinds of Hive tables: as long as
you can query it in Hive, the table can be a normal (managed) table, an
external table, a view, or a table with a custom SerDe.
As for building backward, I think it is hard to build a cube backward in
time in Kylin; maybe someone has good ideas on this point.

2016-01-18 11:04 GMT+08:00 zhong zhang <zz...@gmail.com>:

> Hi All,
>
> I'm wondering whether I can build a Kylin cube backward in time. More
> specifically, can I build the cube from the current time back to six months
> ago, then from six months ago back to 12 months ago, and so on? That way, I
> would get the cube results for the latest six months first.
>
> The input of a Kylin cube is a Hive table. Does it make any difference
> whether an internal (managed) or an external Hive table is used when
> building the cube?
>
> Best regards,
> Zhong
>