Posted to dev@kylin.apache.org by hongbin ma <ma...@apache.org> on 2015/11/04 04:42:28 UTC

Wish list for new cluster management & job dispatcher scheme

Since we're designing a new cluster management scheme to manage LB servers
and streaming job slaves, I think it's a good opportunity for Kylin users to
share their pain points and wish lists to help improve the Kylin user
experience.

Here are mine:

1. Cluster configuration is troublesome. Currently we have to write down
the server list in kylin.properties and assign a role to each server. This
is hard to maintain. The new cluster management should automate server
discovery, leader election and failover.

2. Log analysis is not easy when multiple servers are running at the same
time (see https://issues.apache.org/jira/browse/KYLIN-1124 for example). On
the query side, we should be able to answer questions like "I submitted
query XXXXX at 10:00, please check why it's slow" and "What are the most
time-consuming recent queries (and their related cube names)?". On the
streaming job dispatcher side, we should be able to identify failed batches
more quickly (and resume them), and to manage each batch's build log better
(when you have tens of slaves, it's difficult to find where a batch's build
log is). A related JIRA ticket is
https://issues.apache.org/jira/browse/KYLIN-1079

3. Streaming batch jobs should be horizontally scalable. If a batch is
found to be too big to fit into a single JVM, we should detect it and
divide the batch into smaller pieces so that we can dispatch the job to
multiple JVMs, and let a subsequent auto-merge job merge them. Related
JIRA is https://issues.apache.org/jira/browse/KYLIN-1042

4. Auto-merge job failures lead to an accumulation of hundreds of segments,
which greatly harms query performance. Related JIRA:
https://issues.apache.org/jira/browse/KYLIN-1038
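
To make #3 a bit more concrete, here is a rough sketch of the split-and-dispatch
idea, written in Python only for brevity. Everything in it is an assumption for
illustration and not existing Kylin code: the names Batch, split_batch and
dispatch are hypothetical, a batch is modelled as a half-open offset range, and
"too big" is approximated by a message count rather than a real JVM memory
estimate.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Batch:
    """A streaming batch covering the half-open offset range [start, end)."""
    start: int
    end: int

    @property
    def size(self) -> int:
        return self.end - self.start


def split_batch(batch: Batch, max_size: int) -> List[Batch]:
    """Split a batch into contiguous pieces of at most max_size messages each."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    pieces = []
    start = batch.start
    while start < batch.end:
        end = min(start + max_size, batch.end)
        pieces.append(Batch(start, end))
        start = end
    return pieces


def dispatch(pieces: List[Batch], workers: List[str]) -> Dict[str, List[Batch]]:
    """Assign pieces to workers round-robin (hypothetical dispatcher policy)."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[Batch]] = {w: [] for w in workers}
    for i, piece in enumerate(pieces):
        assignment[workers[i % len(workers)]].append(piece)
    return assignment
```

Because the pieces are contiguous and non-overlapping, each one could be built
as its own small segment, and a later auto-merge job could combine adjacent
segments back into one. Round-robin is just the simplest placement policy; a
real dispatcher would probably also look at the load on each slave.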


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Wish list for new cluster management & job dispatcher scheme

Posted by Luke Han <lu...@gmail.com>.

Cool, please set the fix version to 2.1 first, and let's try to arrange them
into a further release plan.


Best Regards!
---------------------

Luke Han

On Thu, Nov 12, 2015 at 11:00 AM, hongbin ma <ma...@apache.org> wrote:

> I intended to finalize these requirements through discussion here and then
> convert them to JIRA.

Re: Wish list for new cluster management & job dispatcher scheme

Posted by hongbin ma <ma...@apache.org>.

I intended to finalize these requirements through discussion here and then
convert them to JIRA.

On Wed, Nov 11, 2015 at 5:54 PM, Li Yang <li...@apache.org> wrote:

> Should these be converted into JIRA issues to ensure we don't forget?



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Wish list for new cluster management & job dispatcher scheme

Posted by Li Yang <li...@apache.org>.

Should these be converted into JIRA issues to ensure we don't forget?

On Fri, Nov 6, 2015 at 2:15 PM, Luke Han <lu...@gmail.com> wrote:

> #5 should keep the same logic as today's cube notification: each
> cube/streaming could have its own notification mail list.

Re: Wish list for new cluster management & job dispatcher scheme

Posted by Luke Han <lu...@gmail.com>.

#5 should keep the same logic as today's cube notification: each
cube/streaming could have its own notification mail list.




Best Regards!
---------------------

Luke Han

On Fri, Nov 6, 2015 at 10:26 AM, hongbin ma <ma...@apache.org> wrote:

> 5. For each streaming case, maintain a receiver mail list (supporting
> multiple receivers) for all notification emails (including gap
> notifications, etc.).

Re: Wish list for new cluster management & job dispatcher scheme

Posted by hongbin ma <ma...@apache.org>.

5. For each streaming case, maintain a receiver mail list (supporting
multiple receivers) for all notification emails (including gap
notifications, etc.).
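
A minimal sketch of what I have in mind, in Python just to keep it short. The
class name NotificationLists, its methods, the streaming name and the example
addresses are all hypothetical and not part of Kylin today; a real version
would send mail instead of returning strings.

```python
from collections import defaultdict
from typing import Dict, List


class NotificationLists:
    """Per-streaming receiver lists (hypothetical, not an existing Kylin class)."""

    def __init__(self) -> None:
        self._receivers: Dict[str, List[str]] = defaultdict(list)

    def add_receiver(self, streaming: str, email: str) -> None:
        """Register a receiver for one streaming case, ignoring duplicates."""
        if email not in self._receivers[streaming]:
            self._receivers[streaming].append(email)

    def receivers_for(self, streaming: str) -> List[str]:
        """Return a copy of the receiver list; empty if none is configured."""
        return list(self._receivers.get(streaming, []))

    def notify(self, streaming: str, kind: str, message: str) -> List[str]:
        """Build one notification line per receiver of the given streaming."""
        return [
            f"to={r} streaming={streaming} kind={kind}: {message}"
            for r in self.receivers_for(streaming)
        ]
```

Every kind of notification (gap, build failure, and so on) goes through the
same list, so adding a receiver once is enough to get all of them.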

On Thu, Nov 5, 2015 at 11:19 AM, Li Yang <li...@apache.org> wrote:

> Very good inputs.



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

Re: Wish list for new cluster management & job dispatcher scheme

Posted by Li Yang <li...@apache.org>.

Very good inputs.
