Posted to dev@kylin.apache.org by ShaoFeng Shi <sh...@apache.org> on 2016/07/01 03:30:25 UTC

Re: Questions / Clarifications for 1.5.x

Hi Abhilash, welcome back;

Kylin 1.5 delivers a new pluggable architecture with many enhancements,
especially for performance, such as the fast cubing algorithm and the sharded
storage engine. I think the behaviour for 1) to 4) is unchanged between 1.2
and 1.5. For issues like memory usage, you can open a JIRA or submit a patch
if you have solved it;

1) Use Kylin's REST API to trigger a cube rebuild/refresh when the data
changes in Hive;
2) No change, I think; a JIRA is welcome;
3) What do you mean by "reload" here? When a new segment is ready and a query
comes in, its dimension snapshots will be loaded; as it is immutable, a
"reload"/"refresh" is not needed;
4) The cap on snapshot size is still there; you can customize the max size in
kylin.properties;
5) When deployed in a cluster, the query servers are identical in role, which
means all servers can serve all cubes;
6) You can play with the sample cube shipped with the Kylin binary, which is
partitioned by a date column; then you can try the incremental build
function: https://kylin.apache.org/docs15/tutorial/kylin_sample.html
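
For point 1), here is a minimal sketch of how such a REST call could be
prepared from a script. It assumes the "PUT /cubes/{cubeName}/rebuild"
endpoint with a JSON body of startTime, endTime and buildType, as I understand
the Kylin 1.5 REST documentation; please check it against your version. The
base URL, the cube name, the credentials and the helper names are all
placeholders for illustration. The code only builds the request and does not
send it.

```python
import base64
import json
import urllib.request
from datetime import datetime, timezone


def to_epoch_millis(day):
    """Convert 'YYYY-MM-DD' (taken as UTC midnight) to epoch milliseconds."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def build_rebuild_request(base_url, cube, start_day, end_day,
                          user, password, build_type="REFRESH"):
    """Prepare (but do not send) a rebuild/refresh request for one date range.

    Assumption: the endpoint path and the body fields follow the Kylin 1.5
    REST API docs; base_url, cube, user and password are hypothetical inputs.
    """
    payload = {
        "startTime": to_epoch_millis(start_day),
        "endTime": to_epoch_millis(end_day),
        "buildType": build_type,  # assumed values: BUILD, MERGE or REFRESH
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/cubes/{cube}/rebuild",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json;charset=UTF-8",
        },
    )


# Placeholder values; replace them with your own server, cube and account.
req = build_rebuild_request(
    "http://localhost:7070/kylin/api", "kylin_sales_cube",
    "2012-01-01", "2012-01-02", "ADMIN", "KYLIN",
)
```

Passing req to urllib.request.urlopen() would submit the job; a scheduler
could run this for the affected date range after each Hive load.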



2016-06-29 13:56 GMT+08:00 Abhilash L L <ab...@infoworks.io>:

> Hello,
>
>    We plan to upgrade to Kylin 1.5.x series from 1.2 series. We want to ask
> the team a few questions / clarification before proceeding. Since we are
> looking at upgrading, please do let us know the behaviour for the latest
> 1.5.x release.
>
>
> 1) Updating facts
> Eg:
> Let's say I'm building a cube for JIRA tickets. The list of tickets is my
> fact table. The list of ticket statuses is my dimension table. Now, if a
> ticket is updated from 'open' to 'in progress', how do I tell Kylin about
> this change?
>
> 2) Multiple dimensions in memory
> When we were on the 1.2 versions, all the dimensions were kept in memory, so
> memory usage kept increasing. The in-memory dimensions were not swapped
> in / out, causing Tomcat OOM issues. Is this still the same in 1.5.x?
>
> 3) Do these dimension snapshots get reloaded on a cube refresh / data
> appended ?
>
> 4) Maximum size for dimension
> There was an older thread talking about a 300 MB max snapshot size. Does this
> limitation still hold? Do very high cardinality dimensions still build the
> trees on one node?
>
>
> 5) Multiple query servers
> When there are multiple query servers, will one query server serve only one
> cube, or do all servers serve all cubes? If it's only one cube per server,
> how does Kylin handle a server going down?
>
>
> 6) Incremental build
> Is there a document on how incremental build works ?  We want to understand
> limitations / assumptions for this.
>
> Regards,
> Abhilash
>



-- 
Best regards,

Shaofeng Shi

Re: Questions / Clarifications for 1.5.x

Posted by Abhilash L L <ab...@infoworks.io>.

Thanks for taking the time to reply, ShaoFeng.

1)
"Using Kylin's rest API to trigger a cube rebuild/refresh when the data
is changed in Hive;"
Rebuild / refresh builds the whole cube. I was looking for a way to somehow
present only the changed (updated / inserted) fact records.

2) "No change I think; JIRA is welcomed"
Okay, added to JIRA:
https://issues.apache.org/jira/browse/KYLIN-1843
https://issues.apache.org/jira/browse/KYLIN-1844

3) "Do these dimension snapshots get reloaded on a cube refresh / data
appended ?"
I wanted to know whether the dimension snapshots are reloaded when data is
appended or a segment is refreshed. Pointers to the code would also help.

4) "The cap on snapshot is still there, you can customize the max size in
kylin.properties"
Okay

5) "When deploy into cluster, the query servers are identical in role, that
means all servers can serve all cubes"
Okay. Can you please let us know where to look to figure out how Kylin
decides which server to pass the request to? Also, could you let us know how
it handles one of the query servers going down?

6) "You can play the sample cube shipped with kylin binary, which is
partitioned by a date column, then you can try the incremental build function:
https://kylin.apache.org/docs15/tutorial/kylin_sample.html"
Okay, we will try it out. But I feel it would help someone evaluating the
tool if we listed the limitations / assumptions of an incremental/segment
build. E.g.: how does a pure distinct count (for int) work when only part of
the data is refreshed? Also, how are updates to dimension keys in the fact
table handled?
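
To make the distinct count concern in 6) concrete, here is a small sketch. It
is illustrative only and is not how Kylin stores or merges segments; the
segment keys and user ids are made up. It shows that per-segment distinct
counts cannot simply be added up, while a mergeable per-segment structure (a
plain set here) still gives the right answer after one segment is refreshed.

```python
def summed_count(segments):
    """Add up per-segment distinct counts (overcounts values seen in several segments)."""
    return sum(len(values) for values in segments.values())


def merged_count(segments):
    """Union the per-segment value sets first, then count."""
    merged = set()
    for values in segments.values():
        merged |= values
    return len(merged)


def refresh_segment(segments, key, new_values):
    """Return a copy of the segments with one segment's values replaced."""
    refreshed = dict(segments)
    refreshed[key] = set(new_values)
    return refreshed


# Hypothetical daily segments holding the distinct user ids seen on each day.
segments = {
    "2016-06-01": {"u1", "u2"},
    "2016-06-02": {"u2", "u3"},
}

# Only the second day's data changes and only that segment is rebuilt.
after_refresh = refresh_segment(segments, "2016-06-02", {"u1", "u3"})
```

Here summed_count(after_refresh) gives 4 while merged_count(after_refresh)
gives 3, because u1 now appears in both segments. It would be good to have
documented what Kylin keeps per segment so that this case stays correct.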

Regards,
Abhilash
